Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter

2018-04-10 Thread Ravi Bangoria
Hi Oleg,

On 04/10/2018 04:36 PM, Oleg Nesterov wrote:
> Hi Ravi,
>
> On 04/10, Ravi Bangoria wrote:
>>> and what if __mmu_notifier_register() fails simply because signal_pending() 
>>> == T?
>>> see mm_take_all_locks().
>>>
>>> at first glance this all looks suspicious and sub-optimal,
>> Yes. I should have added checks for failure cases.
>> Will fix them in v3.
> And what can you do if it fails? Nothing except report the problem. But
> signal_pending() is not the unlikely or error condition, it should not
> cause the tracing errors.

...

> Plus mm_take_all_locks() is very heavy... BTW, uprobe_mmap_callback() is
> called unconditionally. Whatever it does, can we at least move it after
> the no_uprobe_events() check? Can't we also check MMF_HAS_UPROBES?

Sure, I'll move it after these conditions.

> Either way, I do not feel that mmu_notifier is the right tool... Did you
> consider the uprobe_clear_state() hook we already have?

Ah! This is really a good idea. We don't need mmu_notifier then.

Thanks for the suggestion,
Ravi

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers

2018-04-10 Thread Guenter Roeck
On Tue, Apr 10, 2018 at 11:32:11AM -0700, Jae Hyun Yoo wrote:
> This commit adds PECI cputemp and dimmtemp hwmon drivers.
> 
> Signed-off-by: Jae Hyun Yoo 
> Reviewed-by: Haiyue Wang 
> Reviewed-by: James Feist 
> Reviewed-by: Vernon Mauery 
> Cc: Alan Cox 
> Cc: Andrew Jeffery 
> Cc: Andrew Lunn 
> Cc: Andy Shevchenko 
> Cc: Arnd Bergmann 
> Cc: Benjamin Herrenschmidt 
> Cc: Fengguang Wu 
> Cc: Greg KH 
> Cc: Guenter Roeck 
> Cc: Jason M Biils 
> Cc: Jean Delvare 
> Cc: Joel Stanley 
> Cc: Julia Cartwright 
> Cc: Miguel Ojeda 
> Cc: Milton Miller II 
> Cc: Pavel Machek 
> Cc: Randy Dunlap 
> Cc: Stef van Os 
> Cc: Sumeet R Pawnikar 
> ---
>  drivers/hwmon/Kconfig |  28 ++
>  drivers/hwmon/Makefile|   2 +
>  drivers/hwmon/peci-cputemp.c  | 783 ++
>  drivers/hwmon/peci-dimmtemp.c | 432 +++
>  4 files changed, 1245 insertions(+)
>  create mode 100644 drivers/hwmon/peci-cputemp.c
>  create mode 100644 drivers/hwmon/peci-dimmtemp.c
> 
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index f249a4428458..c52f610f81d0 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1259,6 +1259,34 @@ config SENSORS_NCT7904
> This driver can also be built as a module.  If so, the module
> will be called nct7904.
>  
> +config SENSORS_PECI_CPUTEMP
> + tristate "PECI CPU temperature monitoring support"
> + depends on OF
> + depends on PECI
> + help
> +   If you say yes here you get support for the generic Intel PECI
> +   cputemp driver which provides Digital Thermal Sensor (DTS) thermal
> +   readings of the CPU package and CPU cores that are accessible using
> +   the PECI Client Command Suite via the processor PECI client.
> +   Check Documentation/hwmon/peci-cputemp for details.
> +
> +   This driver can also be built as a module.  If so, the module
> +   will be called peci-cputemp.
> +
> +config SENSORS_PECI_DIMMTEMP
> + tristate "PECI DIMM temperature monitoring support"
> + depends on OF
> + depends on PECI
> + help
> +   If you say yes here you get support for the generic Intel PECI hwmon
> +   driver which provides Digital Thermal Sensor (DTS) thermal readings of
> +   DIMM components that are accessible using the PECI Client Command
> +   Suite via the processor PECI client.
> +   Check Documentation/hwmon/peci-dimmtemp for details.
> +
> +   This driver can also be built as a module.  If so, the module
> +   will be called peci-dimmtemp.
> +
>  config SENSORS_NSA320
>   tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors"
>   depends on GPIOLIB && OF
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index e7d52a36e6c4..48d9598fcd3a 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -136,6 +136,8 @@ obj-$(CONFIG_SENSORS_NCT7802) += nct7802.o
>  obj-$(CONFIG_SENSORS_NCT7904)+= nct7904.o
>  obj-$(CONFIG_SENSORS_NSA320) += nsa320-hwmon.o
>  obj-$(CONFIG_SENSORS_NTC_THERMISTOR) += ntc_thermistor.o
> +obj-$(CONFIG_SENSORS_PECI_CPUTEMP)   += peci-cputemp.o
> +obj-$(CONFIG_SENSORS_PECI_DIMMTEMP)  += peci-dimmtemp.o
>  obj-$(CONFIG_SENSORS_PC87360)+= pc87360.o
>  obj-$(CONFIG_SENSORS_PC87427)+= pc87427.o
>  obj-$(CONFIG_SENSORS_PCF8591)+= pcf8591.o
> diff --git a/drivers/hwmon/peci-cputemp.c b/drivers/hwmon/peci-cputemp.c
> new file mode 100644
> index ..f0bc92687512
> --- /dev/null
> +++ b/drivers/hwmon/peci-cputemp.c
> @@ -0,0 +1,783 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2018 Intel Corporation
> +
> +#include 
> +#include 
> +#include 

Is this include needed ?

> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define TEMP_TYPE_PECI 6  /* Sensor type 6: Intel PECI */
> +
> +#define CORE_MAX_ON_HSX   18 /* Max number of cores on Haswell */
> +#define CORE_MAX_ON_BDX   24 /* Max number of cores on Broadwell */
> +#define CORE_MAX_ON_SKX   28 /* Max number of cores on Skylake */
> +
> +#define DEFAULT_CHANNEL_NUMS  5
> +#define CORETEMP_CHANNEL_NUMS CORE_MAX_ON_SKX
> +#define CPUTEMP_CHANNEL_NUMS  (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS)
> +
> +#define CLIENT_CPU_ID_MASK 0xf0ff0  /* Mask for Family / Model info */
> +
> +#define UPDATE_INTERVAL_MIN   HZ
> +
> +enum cpu_gens {
> + 
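
A minimal sketch of how a Family / Model mask such as CLIENT_CPU_ID_MASK
(0xf0ff0) could be applied to a CPUID signature. The field layout used here
(model in bits 4-7, family in bits 8-11, extended model in bits 16-19) is the
usual x86 CPUID signature layout. The helper names and the sample signature
0x306f2 are illustrative assumptions and are not taken from the patch, whose
code is cut off above.

```c
#include <stdint.h>

#define CLIENT_CPU_ID_MASK 0xf0ff0u /* value quoted in the patch */

/* Hypothetical helper: drop the stepping bits, keep family and model. */
static uint32_t cpu_id_family_model(uint32_t cpuid_sig)
{
	return cpuid_sig & CLIENT_CPU_ID_MASK;
}

/* Hypothetical helper: base family field, bits 8-11. */
static uint32_t cpu_id_family(uint32_t cpuid_sig)
{
	return (cpuid_sig >> 8) & 0xf;
}

/*
 * Hypothetical helper: display model, combining the extended model
 * (bits 16-19) with the base model (bits 4-7). This combination is
 * only meaningful for family 6 and family 15 parts.
 */
static uint32_t cpu_id_model(uint32_t cpuid_sig)
{
	uint32_t model = (cpuid_sig >> 4) & 0xf;
	uint32_t ext_model = (cpuid_sig >> 16) & 0xf;

	return (ext_model << 4) | model;
}
```

With the sample signature 0x306f2, the masked value is 0x306f0, so two parts
that differ only in stepping compare equal after masking.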

Re: [RFC bpf-next v2 3/8] bpf: add documentation for eBPF helpers (12-22)

2018-04-10 Thread Alexei Starovoitov
On Tue, Apr 10, 2018 at 03:41:52PM +0100, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
> 
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
> 
> This patch contains descriptions for the following helper functions, all
> written by Alexei:
> 
> - bpf_get_current_pid_tgid()
> - bpf_get_current_uid_gid()
> - bpf_get_current_comm()
> - bpf_skb_vlan_push()
> - bpf_skb_vlan_pop()
> - bpf_skb_get_tunnel_key()
> - bpf_skb_set_tunnel_key()
> - bpf_redirect()
> - bpf_perf_event_output()
> - bpf_get_stackid()
> - bpf_get_current_task()
> 
> Cc: Alexei Starovoitov 
> Signed-off-by: Quentin Monnet 
> ---
>  include/uapi/linux/bpf.h | 237 +++
>  1 file changed, 237 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2bc653a3a20f..f3ea8824efbc 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -580,6 +580,243 @@ union bpf_attr {
>   *   performed again.
>   *   Return
>   *   0 on success, or a negative error in case of failure.
> + *
> + * u64 bpf_get_current_pid_tgid(void)
> + *   Return
> + *   A 64-bit integer containing the current tgid and pid, and
> + *   created as such:
> + *   *current_task*\ **->tgid << 32 \|**
> + *   *current_task*\ **->pid**.
> + *
> + * u64 bpf_get_current_uid_gid(void)
> + *   Return
> + *   A 64-bit integer containing the current GID and UID, and
> + *   created as such: *current_gid* **<< 32 \|** *current_uid*.
> + *
> + * int bpf_get_current_comm(char *buf, u32 size_of_buf)
> + *   Description
> + *   Copy the **comm** attribute of the current task into *buf* of
> + *   *size_of_buf*. The **comm** attribute contains the name of
> + *   the executable (excluding the path) for the current task. The
> + *   *size_of_buf* must be strictly positive. On success, the

that reminds me that we probably should relax it to ARG_CONST_SIZE_OR_ZERO.
The programs won't be passing an actual zero into it, but it helps
a lot to tell verifier that zero is also valid, since programs
become much simpler.

> + *   helper makes sure that the *buf* is NUL-terminated. On failure,
> + *   it is filled with zeroes.
> + *   Return
> + *   0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
> + *   Description
> + *   Push a *vlan_tci* (VLAN tag control information) of protocol
> + *   *vlan_proto* to the packet associated to *skb*, then update
> + *   the checksum. Note that if *vlan_proto* is different from
> + *   **ETH_P_8021Q** and **ETH_P_8021AD**, it is considered to
> + *   be **ETH_P_8021Q**.
> + *
> + *   A call to this helper is susceptible to change data from the
> + *   packet. Therefore, at load time, all checks on pointers
> + *   previously done by the verifier are invalidated and must be
> + *   performed again.
> + *   Return
> + *   0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_vlan_pop(struct sk_buff *skb)
> + *   Description
> + *   Pop a VLAN header from the packet associated to *skb*.
> + *
> + *   A call to this helper is susceptible to change data from the
> + *   packet. Therefore, at load time, all checks on pointers
> + *   previously done by the verifier are invalidated and must be
> + *   performed again.
> + *   Return
> + *   0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key *key, u32 size, u64 flags)
> + *   Description
> + *   Get tunnel metadata. This helper takes a pointer *key* to an
> + *   empty **struct bpf_tunnel_key** of **size**, that will be
> + *   filled with tunnel metadata for the packet associated to *skb*.
> + *   The *flags* can be set to **BPF_F_TUNINFO_IPV6**, which
> + *   indicates that the tunnel is based on IPv6 protocol instead of
> + *   IPv4.
> + *
> + *   This is typically used on the receive path to perform a lookup
> + *   or a packet redirection based on the value of *key*:

The above is correct, but feels a bit cryptic.
Maybe give a more concrete example for a particular tunneling protocol like GRE,
and say that tunnel_key.remote_ip[46] is an essential part of the encap and that
the bpf prog will make decisions based on the contents of the encap header,
where bpf_tunnel_key is a 
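
As an aside on the bpf_get_current_pid_tgid() description quoted earlier in
this message (tgid << 32 | pid), the packing can be sketched in plain C as
follows. This is a user-space illustration only; the function names are
hypothetical and are not part of the BPF API.

```c
#include <stdint.h>

/* Hypothetical helper: pack tgid into the upper and pid into the lower 32 bits. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return ((uint64_t)tgid << 32) | pid;
}

/* Hypothetical helper: recover the tgid (upper 32 bits). */
static uint32_t tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* Hypothetical helper: recover the pid (lower 32 bits). */
static uint32_t pid_of(uint64_t pid_tgid)
{
	return (uint32_t)pid_tgid;
}
```

A caller of the real helper would apply the same shift and truncation to the
returned u64 to split it into its two halves.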

Re: [PATCH v2 1/2] mm: introduce ARCH_HAS_PTE_SPECIAL

2018-04-10 Thread David Rientjes
On Tue, 10 Apr 2018, Laurent Dufour wrote:

> > On Tue, Apr 10, 2018 at 05:25:50PM +0200, Laurent Dufour wrote:
> >>  arch/powerpc/include/asm/pte-common.h  | 3 ---
> >>  arch/riscv/Kconfig | 1 +
> >>  arch/s390/Kconfig  | 1 +
> > 
> > You forgot to delete __HAVE_ARCH_PTE_SPECIAL from
> > arch/riscv/include/asm/pgtable-bits.h
> 
> Damned !
> Thanks for catching it.
> 

Squashing the two patches together at least allowed it to be caught 
easily.  After it's fixed, feel free to add

Acked-by: David Rientjes 

Thanks for doing this!


Re: [PATCH v2 1/2] mm: introduce ARCH_HAS_PTE_SPECIAL

2018-04-10 Thread Palmer Dabbelt

On Tue, 10 Apr 2018 09:09:32 PDT (-0700), wi...@infradead.org wrote:
> On Tue, Apr 10, 2018 at 05:25:50PM +0200, Laurent Dufour wrote:
>>  arch/powerpc/include/asm/pte-common.h  | 3 ---
>>  arch/riscv/Kconfig | 1 +
>>  arch/s390/Kconfig  | 1 +
>
> You forgot to delete __HAVE_ARCH_PTE_SPECIAL from
> arch/riscv/include/asm/pgtable-bits.h

Thanks -- I was looking for that but couldn't find it and assumed I'd just 
misunderstood something.



Re: [PATCH] gpiolib: add hogs support for machine code

2018-04-10 Thread Bartosz Golaszewski
2018-04-10 19:05 GMT+02:00 kbuild test robot <l...@intel.com>:
> Hi Bartosz,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on gpio/for-next]
> [also build test ERROR on v4.16 next-20180410]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Bartosz-Golaszewski/gpiolib-add-hogs-support-for-machine-code/20180410-232047
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git 
> for-next
> config: i386-randconfig-a0-201814 (attached as .config)
> compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
> All errors (new ones prefixed by >>):
>
>In file included from drivers//mfd/sm501.c:23:0:
>>> include/linux/gpio/machine.h:56:19: error: field 'dflags' has incomplete 
>>> type
>  enum gpiod_flags dflags;
>   ^
>
> vim +/dflags +56 include/linux/gpio/machine.h
>
> 41
> 42  /**
> 43   * struct gpiod_hog - GPIO line hog table
> 44   * @chip_label: name of the chip the GPIO belongs to
> 45   * @chip_hwnum: hardware number (i.e. relative to the chip) of the GPIO
> 46   * @line_name: consumer name for the hogged line
> 47   * @lflags: mask of GPIO lookup flags
> 48   * @dflags: GPIO flags used to specify the direction and value
> 49   */
> 50  struct gpiod_hog {
> 51  struct list_head list;
> 52  const char *chip_label;
> 53  u16 chip_hwnum;
> 54  const char *line_name;
> 55  enum gpio_lookup_flags lflags;
>   > 56  enum gpiod_flags dflags;
> 57  };
> 58
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation

Superseded by v2.


[PATCH v2] gpiolib: add hogs support for machine code

2018-04-10 Thread Bartosz Golaszewski
Board files constitute a significant part of the users of the legacy
GPIO framework. In many cases they only export a line and set its
desired value. We could use GPIO hogs for that like we do for DT and
ACPI but there's no support for that in machine code.

This patch proposes to extend the machine.h API with support for
registering hog tables in board files.

Signed-off-by: Bartosz Golaszewski 
---
v1 -> v2:
- kbuild bot complains about enum gpiod_flags having incomplete type
  although it builds fine for me locally: change the type of dflags
  to int

 Documentation/driver-api/gpio/board.rst | 16 ++
 drivers/gpio/gpiolib.c  | 67 +
 include/linux/gpio/machine.h| 31 
 3 files changed, 114 insertions(+)

diff --git a/Documentation/driver-api/gpio/board.rst b/Documentation/driver-api/gpio/board.rst
index 25d62b2e9fd0..2c112553df84 100644
--- a/Documentation/driver-api/gpio/board.rst
+++ b/Documentation/driver-api/gpio/board.rst
@@ -177,3 +177,19 @@ mapping and is thus transparent to GPIO consumers.
 
 A set of functions such as gpiod_set_value() is available to work with
 the new descriptor-oriented interface.
+
+Boards using platform data can also hog GPIO lines by defining GPIO hog tables.
+
+.. code-block:: c
+
+struct gpiod_hog gpio_hog_table[] = {
+GPIO_HOG("gpio.0", 10, "foo", GPIO_ACTIVE_LOW, GPIOD_OUT_HIGH),
+{ }
+};
+
+And the table can be added to the board code as follows::
+
+gpiod_add_hogs(gpio_hog_table);
+
+The line will be hogged as soon as the gpiochip is created or - in case the
+chip was created earlier - when the hog table is registered.
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 43aeb07343ec..547adc149b62 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -71,6 +71,9 @@ static DEFINE_MUTEX(gpio_lookup_lock);
 static LIST_HEAD(gpio_lookup_list);
 LIST_HEAD(gpio_devices);
 
+static DEFINE_MUTEX(gpio_machine_hogs_mutex);
+static LIST_HEAD(gpio_machine_hogs);
+
 static void gpiochip_free_hogs(struct gpio_chip *chip);
 static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
struct lock_class_key *lock_key,
@@ -1171,6 +1174,41 @@ static int gpiochip_setup_dev(struct gpio_device *gdev)
return status;
 }
 
+static void gpiochip_machine_hog(struct gpio_chip *chip, struct gpiod_hog *hog)
+{
+   struct gpio_desc *desc;
+   int rv;
+
+   desc = gpiochip_get_desc(chip, hog->chip_hwnum);
+   if (IS_ERR(desc)) {
+   pr_err("%s: unable to get GPIO desc: %ld\n",
+  __func__, PTR_ERR(desc));
+   return;
+   }
+
+   if (desc->flags & FLAG_IS_HOGGED)
+   return;
+
+   rv = gpiod_hog(desc, hog->line_name, hog->lflags, hog->dflags);
+   if (rv)
+   pr_err("%s: unable to hog GPIO line (%s:%u): %d\n",
+  __func__, chip->label, hog->chip_hwnum, rv);
+}
+
+static void machine_gpiochip_add(struct gpio_chip *chip)
+{
+   struct gpiod_hog *hog;
+
+   mutex_lock(&gpio_machine_hogs_mutex);
+
+   list_for_each_entry(hog, &gpio_machine_hogs, list) {
+   if (!strcmp(chip->label, hog->chip_label))
+   gpiochip_machine_hog(chip, hog);
+   }
+
+   mutex_unlock(&gpio_machine_hogs_mutex);
+}
+
 static void gpiochip_setup_devs(void)
 {
struct gpio_device *gdev;
@@ -1326,6 +1364,8 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data,
 
acpi_gpiochip_add(chip);
 
+   machine_gpiochip_add(chip);
+
/*
 * By first adding the chardev, and then adding the device,
 * we get a device node entry in sysfs under
@@ -3462,6 +3502,33 @@ void gpiod_remove_lookup_table(struct gpiod_lookup_table *table)
 }
 EXPORT_SYMBOL_GPL(gpiod_remove_lookup_table);
 
+/**
+ * gpiod_add_hogs() - register a set of GPIO hogs from machine code
+ * @hogs: table of gpio hog entries with a zeroed sentinel at the end
+ */
+void gpiod_add_hogs(struct gpiod_hog *hogs)
+{
+   struct gpio_chip *chip;
+   struct gpiod_hog *hog;
+
+   mutex_lock(&gpio_machine_hogs_mutex);
+
+   for (hog = &hogs[0]; hog->chip_label; hog++) {
+   list_add_tail(&hog->list, &gpio_machine_hogs);
+
+   /*
+* The chip may have been registered earlier, so check if it
+* exists and, if so, try to hog the line now.
+*/
+   chip = find_chip_by_name(hog->chip_label);
+   if (chip)
+   gpiochip_machine_hog(chip, hog);
+   }
+
+   mutex_unlock(&gpio_machine_hogs_mutex);
+}
+EXPORT_SYMBOL_GPL(gpiod_add_hogs);
+
 static struct gpiod_lookup_table *gpiod_find_lookup_table(struct device *dev)
 {
const char *dev_id = dev ? dev_name(dev) : NULL;
diff --git a/include/linux/gpio/machine.h b/include/linux/gpio/machine.h
index b2f2dc638463..daa44eac9241 
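
The sentinel-terminated table walk used by gpiod_add_hogs() above can be
modelled in user space roughly as follows. This is only a sketch: struct
hog_entry is a simplified, hypothetical stand-in for struct gpiod_hog, the
table contents are made up, and no locking or real GPIO handling is involved.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct gpiod_hog. */
struct hog_entry {
	const char *chip_label;   /* NULL marks the zeroed sentinel */
	unsigned int chip_hwnum;
	const char *line_name;
};

/*
 * Walk the table until the sentinel and count the entries whose
 * chip_label matches the given chip, mirroring the strcmp() match
 * used in machine_gpiochip_add().
 */
static size_t count_hogs_for_chip(const struct hog_entry *hogs,
				  const char *label)
{
	const struct hog_entry *hog;
	size_t n = 0;

	for (hog = &hogs[0]; hog->chip_label; hog++) {
		if (!strcmp(hog->chip_label, label))
			n++;
	}

	return n;
}

/* Made-up example table, terminated by a zeroed entry. */
static const struct hog_entry example_hogs[] = {
	{ "gpio.0", 10, "foo" },
	{ "gpio.1", 3, "bar" },
	{ "gpio.0", 12, "baz" },
	{ NULL, 0, NULL },
};
```

The zeroed last entry is what lets the loop stop without a separate length
argument, which is why the kernel-doc for gpiod_add_hogs() asks for a zeroed
sentinel at the end of the table.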

[PATCH v3 01/10] Documentations: dt-bindings: Add documents of generic PECI bus, adapter and client drivers

2018-04-10 Thread Jae Hyun Yoo
This commit adds documents of generic PECI bus, adapter and client drivers.

Signed-off-by: Jae Hyun Yoo 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Cc: Alan Cox 
Cc: Andrew Jeffery 
Cc: Andrew Lunn 
Cc: Andy Shevchenko 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Fengguang Wu 
Cc: Greg KH 
Cc: Guenter Roeck 
Cc: Jason M Biils 
Cc: Jean Delvare 
Cc: Joel Stanley 
Cc: Julia Cartwright 
Cc: Miguel Ojeda 
Cc: Milton Miller II 
Cc: Pavel Machek 
Cc: Randy Dunlap 
Cc: Stef van Os 
Cc: Sumeet R Pawnikar 
---
 .../devicetree/bindings/peci/peci-adapter.txt  | 23 
 .../devicetree/bindings/peci/peci-bus.txt  | 15 +
 .../devicetree/bindings/peci/peci-client.txt   | 25 ++
 3 files changed, 63 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/peci/peci-adapter.txt
 create mode 100644 Documentation/devicetree/bindings/peci/peci-bus.txt
 create mode 100644 Documentation/devicetree/bindings/peci/peci-client.txt

diff --git a/Documentation/devicetree/bindings/peci/peci-adapter.txt b/Documentation/devicetree/bindings/peci/peci-adapter.txt
new file mode 100644
index ..9221374f6b11
--- /dev/null
+++ b/Documentation/devicetree/bindings/peci/peci-adapter.txt
@@ -0,0 +1,23 @@
+Generic device tree configuration for PECI adapters.
+
+Required properties:
+- compatible : Should contain hardware specific definition strings that can
+  match an adapter driver implementation.
+- reg: Should contain PECI controller registers location and length.
+- #address-cells : Should be <1>.
+- #size-cells: Should be <0>.
+
+Example:
+   peci: peci@1000 {
+   compatible = "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0x0 0x1000 0x1000>;
+
+   peci0: peci-bus@0 {
+   compatible = "soc,soc-peci";
+   reg = <0x0 0x1000>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   };
+   };
diff --git a/Documentation/devicetree/bindings/peci/peci-bus.txt b/Documentation/devicetree/bindings/peci/peci-bus.txt
new file mode 100644
index ..90bcc791ccb0
--- /dev/null
+++ b/Documentation/devicetree/bindings/peci/peci-bus.txt
@@ -0,0 +1,15 @@
+Generic device tree configuration for PECI buses.
+
+Required properties:
+- compatible : Should be "simple-bus".
+- #address-cells : Should be <1>.
+- #size-cells: Should be <1>.
+- ranges : Should contain PECI controller registers ranges.
+
+Example:
+   peci: peci@1000 {
+   compatible = "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0x0 0x1000 0x1000>;
+   };
diff --git a/Documentation/devicetree/bindings/peci/peci-client.txt b/Documentation/devicetree/bindings/peci/peci-client.txt
new file mode 100644
index ..8e2bfd8532f6
--- /dev/null
+++ b/Documentation/devicetree/bindings/peci/peci-client.txt
@@ -0,0 +1,25 @@
+Generic device tree configuration for PECI clients.
+
+Required properties:
+- compatible : Should contain target device specific definition strings that can
+  match a client driver implementation.
+- reg: Should contain address of a client CPU. Address range of CPU
+  clients is starting from 0x30 based on PECI specification.
+  <0x30> .. <0x37> (depends on the PECI_OFFSET_MAX definition)
+
+Example:
+   peci-bus@0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   < more properties >
+
+   function@cpu0 {
+   compatible = "device,function";
+   reg = <0x30>;
+   };
+
+   function@cpu1 {
+   compatible = "device,function";
+   reg = <0x31>;
+   };
+   };
-- 
2.16.2



[PATCH v3 00/10] PECI device driver introduction

2018-04-10 Thread Jae Hyun Yoo
Introduction of the Platform Environment Control Interface (PECI) bus
device driver. PECI is a one-wire bus interface that provides a
communication channel between an Intel processor and chipset components to
external monitoring or control devices. PECI is designed to support the
following sideband functions:

* Processor and DRAM thermal management
  - Processor fan speed control is managed by comparing Digital Thermal
Sensor (DTS) thermal readings acquired via PECI against the
processor-specific fan speed control reference point, or TCONTROL. Both
TCONTROL and DTS thermal readings are accessible via the processor PECI
client. These variables are referenced to a common temperature, the TCC
activation point, and are both defined as negative offsets from that
reference.
  - PECI based access to the processor package configuration space provides
a means for Baseboard Management Controllers (BMC) or other platform
management devices to actively manage the processor and memory power
and thermal features.

* Platform Manageability
  - Platform manageability functions including thermal, power, and error
monitoring. Note that platform 'power' management includes monitoring
and control for both the processor and DRAM subsystem to assist with
data center power limiting.
  - PECI allows read access to certain error registers in the processor MSR
space and status monitoring registers in the PCI configuration space
within the processor and downstream devices.
  - PECI permits writes to certain registers in the processor PCI
configuration space.

* Processor Interface Tuning and Diagnostics
  - Processor interface tuning and diagnostics capabilities
(Intel Interconnect BIST). The processors Intel Interconnect Built In
Self Test (Intel IBIST) allows for infield diagnostic capabilities in
the Intel UPI and memory controller interfaces. PECI provides a port to
execute these diagnostics via its PCI Configuration read and write
capabilities.

* Failure Analysis
  - Output the state of the processor after a failure for analysis via
Crashdump.
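
To illustrate the thermal management item above: since both the DTS reading
and TCONTROL are negative offsets from the TCC activation point, they can be
compared directly without knowing the absolute temperature. The sketch below
assumes millidegree Celsius units and uses hypothetical function names; it is
not code from this patch set.

```c
/*
 * Hypothetical helper: convert a negative DTS offset into an absolute
 * temperature, given the TCC activation temperature. All values are
 * assumed to be in millidegrees Celsius.
 */
static long dts_to_absolute_mdeg(long tcc_activation_mdeg, long dts_offset_mdeg)
{
	return tcc_activation_mdeg + dts_offset_mdeg;
}

/*
 * Hypothetical helper: both arguments are negative offsets from the
 * same reference, so an offset closer to zero means a hotter part.
 * Returns 1 when the DTS reading is above the TCONTROL point.
 */
static int dts_above_tcontrol(long dts_offset_mdeg, long tcontrol_offset_mdeg)
{
	return dts_offset_mdeg > tcontrol_offset_mdeg;
}
```

For example, with an assumed TCC activation point of 100000 and a DTS offset
of -25000, the absolute reading would be 75000 millidegrees.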

PECI uses a single wire for self-clocking and data transfer. The bus
requires no additional control lines. The physical layer is a self-clocked
one-wire bus that begins each bit with a driven, rising edge from an idle
level near zero volts. The duration of the signal driven high depends on
whether the bit value is a logic '0' or logic '1'. PECI also includes
variable data transfer rate established with every message. In this way, it
is highly flexible even though underlying logic is simple.

The interface design was optimized for interfacing between an Intel
processor and chipset components in both single processor and multiple
processor environments. The single wire interface provides low board
routing overhead for the multiple load connections in the congested routing
area near the processor and chipset components. Bus speed, error checking,
and low protocol overhead provides adequate link bandwidth and reliability
to transfer critical device operating conditions and configuration
information.

This implementation provides the basic framework to add PECI extensions to
the Linux bus and device models. A hardware specific 'Adapter' driver can
be attached to the PECI bus to provide sideband functions described above.
It is also possible to access all devices on an adapter from userspace
through the /dev interface. A device specific 'Client' driver also can be
attached to the PECI bus so each processor client's features can be
supported by the 'Client' driver through an adapter connection in the bus.
This patch set includes Aspeed 24xx/25xx PECI driver and PECI
cputemp/dimmtemp drivers as the first implementation for both adapter and
client drivers on the PECI bus framework.

Please review.

Thanks,

-Jae

Changes from v2:
* Divided peci-hwmon driver into two drivers, peci-cputemp and
  peci-dimmtemp.
* Added generic dt binding documents for PECI bus, adapter and client.
* Removed in_atomic() call from the PECI core driver.
* Improved PECI commands masking logic.
* Added permission check logic for PECI ioctls.
* Removed unnecessary type casts.
* Fixed some invalid error return codes.
* Added the mark_updated() function to improve update interval checking
  logic.
* Fixed a bug in populated DIMM checking function.
* Fixed some typo, grammar and style issues in documents.
* Rewrote hwmon drivers to use devm_hwmon_device_register_with_info API.
* Made peci_match_id() function as a static.
* Replaced a deprecated create_singlethread_workqueue() call with an
  alloc_ordered_workqueue() call.
* Reordered local variable definitions in reversed xmas tree notation.
* Listed up client CPUs that can be supported by peci-cputemp and
  peci-dimmtemp hwmon drivers.
* Added CPU generation detection logic which checks CPUID signature through
  PECI connection.
* Improved interrupt handling logic in the Aspeed PECI 

[PATCH v3 03/10] drivers/peci: Add support for PECI bus driver core

2018-04-10 Thread Jae Hyun Yoo
This commit adds driver implementation for PECI bus core into linux
driver framework.

Signed-off-by: Jae Hyun Yoo 
Signed-off-by: Fengguang Wu 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Cc: Alan Cox 
Cc: Andrew Jeffery 
Cc: Andrew Lunn 
Cc: Andy Shevchenko 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Fengguang Wu 
Cc: Greg KH 
Cc: Guenter Roeck 
Cc: Jason M Biils 
Cc: Jean Delvare 
Cc: Joel Stanley 
Cc: Julia Cartwright 
Cc: Miguel Ojeda 
Cc: Milton Miller II 
Cc: Pavel Machek 
Cc: Randy Dunlap 
Cc: Stef van Os 
Cc: Sumeet R Pawnikar 
---
 drivers/Kconfig |2 +
 drivers/Makefile|1 +
 drivers/peci/Kconfig|   17 +
 drivers/peci/Makefile   |6 +
 drivers/peci/peci-core.c| 1291 +++
 include/linux/peci.h|  107 
 include/uapi/linux/peci-ioctl.h |  200 ++
 7 files changed, 1624 insertions(+)
 create mode 100644 drivers/peci/Kconfig
 create mode 100644 drivers/peci/Makefile
 create mode 100644 drivers/peci/peci-core.c
 create mode 100644 include/linux/peci.h
 create mode 100644 include/uapi/linux/peci-ioctl.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index 95b9ccc08165..8c44d9738377 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -217,4 +217,6 @@ source "drivers/siox/Kconfig"
 
 source "drivers/slimbus/Kconfig"
 
+source "drivers/peci/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 24cd47014657..250fe3d0fa7e 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -185,3 +185,4 @@ obj-$(CONFIG_TEE)   += tee/
 obj-$(CONFIG_MULTIPLEXER)  += mux/
 obj-$(CONFIG_UNISYS_VISORBUS)  += visorbus/
 obj-$(CONFIG_SIOX) += siox/
+obj-$(CONFIG_PECI) += peci/
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
new file mode 100644
index ..1fbc13f9e6c2
--- /dev/null
+++ b/drivers/peci/Kconfig
@@ -0,0 +1,17 @@
+#
+# Platform Environment Control Interface (PECI) subsystem configuration
+#
+
+menu "PECI support"
+
+config PECI
+   bool "PECI support"
+   select RT_MUTEXES
+   select CRC8
+   help
+ The Platform Environment Control Interface (PECI) is a one-wire bus
+ interface that provides a communication channel between Intel
+ processors and chipset components to external monitoring or control
+ devices.
+
+endmenu
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
new file mode 100644
index ..9e8615e0d3ff
--- /dev/null
+++ b/drivers/peci/Makefile
@@ -0,0 +1,6 @@
+#
+# Makefile for the PECI core and bus drivers.
+#
+
+# Core functionality
+obj-$(CONFIG_PECI) += peci-core.o
diff --git a/drivers/peci/peci-core.c b/drivers/peci/peci-core.c
new file mode 100644
index ..9b45869b7c39
--- /dev/null
+++ b/drivers/peci/peci-core.c
@@ -0,0 +1,1291 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Intel Corporation
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Device Specific Completion Code (CC) Definition */
+#define DEV_PECI_CC_SUCCESS  0x40
+#define DEV_PECI_CC_TIMEOUT  0x80
+#define DEV_PECI_CC_OUT_OF_RESOURCE  0x81
+#define DEV_PECI_CC_UNAVAIL_RESOURCE 0x82
+#define DEV_PECI_CC_INVALID_REQ  0x90
+
+/* Completion Code mask to check retry needs */
+#define DEV_PECI_CC_RETRY_CHECK_MASK 0xf0
+#define DEV_PECI_CC_NEED_RETRY   0x80
+
+/* Skylake EDS says to retry for 250ms */
+#define DEV_PECI_RETRY_TIME_MS 250
+#define DEV_PECI_RETRY_INTERVAL_MS 10
+#define DEV_PECI_RETRY_BIT 0x01
+
+#define GET_TEMP_WR_LEN   1
+#define GET_TEMP_RD_LEN   2
+#define GET_TEMP_PECI_CMD 0x01
+
+#define GET_DIB_WR_LEN   1
+#define GET_DIB_RD_LEN   8
+#define GET_DIB_PECI_CMD 0xf7
+
+#define RDPKGCFG_WRITE_LEN 5
+#define RDPKGCFG_READ_LEN_BASE 1
+#define RDPKGCFG_PECI_CMD  0xa1
+
+#define WRPKGCFG_WRITE_LEN_BASE 6
+#define WRPKGCFG_READ_LEN   1
+#define WRPKGCFG_PECI_CMD   0xa5
+
+#define RDIAMSR_WRITE_LEN 5
+#define RDIAMSR_READ_LEN  9
+#define RDIAMSR_PECI_CMD  0xb1
+
+#define WRIAMSR_PECI_CMD  0xb5
+
+#define RDPCICFG_WRITE_LEN 6
+#define RDPCICFG_READ_LEN  5
+#define RDPCICFG_PECI_CMD  0x61
+
+#define WRPCICFG_PECI_CMD  0x65
+
+#define RDPCICFGLOCAL_WRITE_LEN 5
+#define RDPCICFGLOCAL_READ_LEN_BASE 1
+#define 
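
The code that consumes the completion code masks is cut off above. A plausible
reading of DEV_PECI_CC_RETRY_CHECK_MASK and DEV_PECI_CC_NEED_RETRY is sketched
below. The helper names are hypothetical, and the fixed-interval retry budget
is an assumption; the actual driver may schedule retries differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the hunk above. */
#define DEV_PECI_CC_RETRY_CHECK_MASK 0xf0
#define DEV_PECI_CC_NEED_RETRY       0x80
#define DEV_PECI_RETRY_TIME_MS       250
#define DEV_PECI_RETRY_INTERVAL_MS   10

/*
 * Hypothetical helper: a completion code asks for a retry when its
 * upper nibble is 0x8, which covers 0x80 (timeout), 0x81 (out of
 * resource) and 0x82 (unavailable resource).
 */
static bool peci_cc_needs_retry(uint8_t cc)
{
	return (cc & DEV_PECI_CC_RETRY_CHECK_MASK) == DEV_PECI_CC_NEED_RETRY;
}

/*
 * Hypothetical helper: number of attempts that fit in the retry
 * window, assuming a fixed interval between attempts.
 */
static unsigned int peci_max_retries(void)
{
	return DEV_PECI_RETRY_TIME_MS / DEV_PECI_RETRY_INTERVAL_MS;
}
```

Under this reading, 0x40 (success) and 0x90 (invalid request) end the command
immediately, while the 0x8x codes are retried within the 250 ms window.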

[PATCH v3 02/10] Documentations: ioctl: Add ioctl numbers for PECI subsystem

2018-04-10 Thread Jae Hyun Yoo
This commit updates ioctl-number.txt to reflect ioctl numbers being
used by the PECI subsystem.

Signed-off-by: Jae Hyun Yoo 
Cc: Alan Cox 
Cc: Andrew Jeffery 
Cc: Andrew Lunn 
Cc: Andy Shevchenko 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Fengguang Wu 
Cc: Greg KH 
Cc: Guenter Roeck 
Cc: Haiyue Wang 
Cc: James Feist 
Cc: Jason M Biils 
Cc: Jean Delvare 
Cc: Joel Stanley 
Cc: Julia Cartwright 
Cc: Miguel Ojeda 
Cc: Milton Miller II 
Cc: Pavel Machek 
Cc: Randy Dunlap 
Cc: Stef van Os 
Cc: Sumeet R Pawnikar 
Cc: Vernon Mauery 
---
 Documentation/ioctl/ioctl-number.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 84bb74dcae12..4bc3a65d7204 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -323,6 +323,8 @@ Code  Seq#(hex) Include FileComments
 0xB3   00  linux/mmc/ioctl.h
 0xB4   00-0F   linux/gpio.h
 0xB5   00-0F   uapi/linux/rpmsg.h  

+0xB6   00-0F   uapi/linux/peci-ioctl.h PECI subsystem
+   
 0xC0   00-0F   linux/usb/iowarrior.h
 0xCA   00-0F   uapi/misc/cxl.h
 0xCA   10-2F   uapi/misc/ocxl.h
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 05/10] ARM: dts: aspeed: peci: Add PECI node

2018-04-10 Thread Jae Hyun Yoo
This commit adds PECI bus/adapter node of AST24xx/AST25xx into
aspeed-g4 and aspeed-g5.

Signed-off-by: Jae Hyun Yoo 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Cc: Alan Cox 
Cc: Andrew Jeffery 
Cc: Andrew Lunn 
Cc: Andy Shevchenko 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Fengguang Wu 
Cc: Greg KH 
Cc: Guenter Roeck 
Cc: Jason M Biils 
Cc: Jean Delvare 
Cc: Joel Stanley 
Cc: Julia Cartwright 
Cc: Miguel Ojeda 
Cc: Milton Miller II 
Cc: Pavel Machek 
Cc: Randy Dunlap 
Cc: Stef van Os 
Cc: Sumeet R Pawnikar 
---
 arch/arm/boot/dts/aspeed-g4.dtsi | 25 +
 arch/arm/boot/dts/aspeed-g5.dtsi | 25 +
 2 files changed, 50 insertions(+)

diff --git a/arch/arm/boot/dts/aspeed-g4.dtsi b/arch/arm/boot/dts/aspeed-g4.dtsi
index 518d2bc7c7fc..f7992eee4d1f 100644
--- a/arch/arm/boot/dts/aspeed-g4.dtsi
+++ b/arch/arm/boot/dts/aspeed-g4.dtsi
@@ -29,6 +29,7 @@
serial3 = 
serial4 = 
serial5 = 
+   peci0 = 
};
 
cpus {
@@ -270,6 +271,13 @@
};
};
 
+   peci: peci@1e78b000 {
+   compatible = "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0x0 0x1e78b000 0x60>;
+   };
+
uart2: serial@1e78d000 {
compatible = "ns16550a";
reg = <0x1e78d000 0x20>;
@@ -313,6 +321,23 @@
};
 };
 
+ {
+   peci0: peci-bus@0 {
+   compatible = "aspeed,ast2400-peci";
+   reg = <0x0 0x60>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   interrupts = <15>;
+   clocks = < ASPEED_CLK_GATE_REFCLK>;
+   clock-frequency = <2400>;
+   msg-timing-nego = <1>;
+   addr-timing-nego = <1>;
+   rd-sampling-point = <8>;
+   cmd-timeout-ms = <1000>;
+   status = "disabled";
+   };
+};
+
  {
i2c_ic: interrupt-controller@0 {
#interrupt-cells = <1>;
diff --git a/arch/arm/boot/dts/aspeed-g5.dtsi b/arch/arm/boot/dts/aspeed-g5.dtsi
index f9917717dd08..278791dba8a0 100644
--- a/arch/arm/boot/dts/aspeed-g5.dtsi
+++ b/arch/arm/boot/dts/aspeed-g5.dtsi
@@ -29,6 +29,7 @@
serial3 = 
serial4 = 
serial5 = 
+   peci0 = 
};
 
cpus {
@@ -320,6 +321,13 @@
};
};
 
+   peci: peci@1e78b000 {
+   compatible = "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0x0 0x1e78b000 0x60>;
+   };
+
uart2: serial@1e78d000 {
compatible = "ns16550a";
reg = <0x1e78d000 0x20>;
@@ -363,6 +371,23 @@
};
 };
 
+ {
+   peci0: peci-bus@0 {
+   compatible = "aspeed,ast2500-peci";
+   reg = <0x0 0x60>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   interrupts = <15>;
+   clocks = < ASPEED_CLK_GATE_REFCLK>;
+   clock-frequency = <2400>;
+   msg-timing-nego = <1>;
+   addr-timing-nego = <1>;
+   rd-sampling-point = <8>;
+   cmd-timeout-ms = <1000>;
+   status = "disabled";
+   };
+};
+
  {
i2c_ic: interrupt-controller@0 {
#interrupt-cells = <1>;
-- 
2.16.2



[PATCH v3 06/10] drivers/peci: Add a PECI adapter driver for Aspeed AST24xx/AST25xx

2018-04-10 Thread Jae Hyun Yoo
This commit adds PECI adapter driver implementation for Aspeed
AST24xx/AST25xx.

Signed-off-by: Jae Hyun Yoo 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Cc: Alan Cox 
Cc: Andrew Jeffery 
Cc: Andrew Lunn 
Cc: Andy Shevchenko 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Fengguang Wu 
Cc: Greg KH 
Cc: Guenter Roeck 
Cc: Jason M Biils 
Cc: Jean Delvare 
Cc: Joel Stanley 
Cc: Julia Cartwright 
Cc: Miguel Ojeda 
Cc: Milton Miller II 
Cc: Pavel Machek 
Cc: Randy Dunlap 
Cc: Stef van Os 
Cc: Sumeet R Pawnikar 
---
 drivers/peci/Kconfig   |  28 +++
 drivers/peci/Makefile  |   3 +
 drivers/peci/peci-aspeed.c | 504 +
 3 files changed, 535 insertions(+)
 create mode 100644 drivers/peci/peci-aspeed.c

diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
index 1fbc13f9e6c2..0e33420365de 100644
--- a/drivers/peci/Kconfig
+++ b/drivers/peci/Kconfig
@@ -14,4 +14,32 @@ config PECI
  processors and chipset components to external monitoring or control
  devices.
 
+ If you want PECI support, you should say Y here and also to the
+ specific driver for your bus adapter(s) below.
+
+if PECI
+
+#
+# PECI hardware bus configuration
+#
+
+menu "PECI Hardware Bus support"
+
+config PECI_ASPEED
+   tristate "Aspeed AST24xx/AST25xx PECI support"
+   select REGMAP_MMIO
+   depends on OF
+   depends on ARCH_ASPEED || COMPILE_TEST
+   help
+ Say Y here if you want support for the Platform Environment Control
+ Interface (PECI) bus adapter driver on the Aspeed AST24XX and AST25XX
+ SoCs.
+
+ This support is also available as a module.  If so, the module
+ will be called peci-aspeed.
+
+endmenu
+
+endif # PECI
+
 endmenu
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
index 9e8615e0d3ff..886285e69765 100644
--- a/drivers/peci/Makefile
+++ b/drivers/peci/Makefile
@@ -4,3 +4,6 @@
 
 # Core functionality
 obj-$(CONFIG_PECI) += peci-core.o
+
+# Hardware specific bus drivers
+obj-$(CONFIG_PECI_ASPEED)  += peci-aspeed.o
diff --git a/drivers/peci/peci-aspeed.c b/drivers/peci/peci-aspeed.c
new file mode 100644
index ..be2a1f327eb1
--- /dev/null
+++ b/drivers/peci/peci-aspeed.c
@@ -0,0 +1,504 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2012-2017 ASPEED Technology Inc.
+// Copyright (c) 2018 Intel Corporation
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DUMP_DEBUG 0
+
+/* Aspeed PECI Registers */
+#define AST_PECI_CTRL 0x00
+#define AST_PECI_TIMING   0x04
+#define AST_PECI_CMD  0x08
+#define AST_PECI_CMD_CTRL 0x0c
+#define AST_PECI_EXP_FCS  0x10
+#define AST_PECI_CAP_FCS  0x14
+#define AST_PECI_INT_CTRL 0x18
+#define AST_PECI_INT_STS  0x1c
+#define AST_PECI_W_DATA0  0x20
+#define AST_PECI_W_DATA1  0x24
+#define AST_PECI_W_DATA2  0x28
+#define AST_PECI_W_DATA3  0x2c
+#define AST_PECI_R_DATA0  0x30
+#define AST_PECI_R_DATA1  0x34
+#define AST_PECI_R_DATA2  0x38
+#define AST_PECI_R_DATA3  0x3c
+#define AST_PECI_W_DATA4  0x40
+#define AST_PECI_W_DATA5  0x44
+#define AST_PECI_W_DATA6  0x48
+#define AST_PECI_W_DATA7  0x4c
+#define AST_PECI_R_DATA4  0x50
+#define AST_PECI_R_DATA5  0x54
+#define AST_PECI_R_DATA6  0x58
+#define AST_PECI_R_DATA7  0x5c
+
+/* AST_PECI_CTRL - 0x00 : Control Register */
+#define PECI_CTRL_SAMPLING_MASK     GENMASK(19, 16)
+#define PECI_CTRL_SAMPLING(x)       (((x) << 16) & PECI_CTRL_SAMPLING_MASK)
+#define PECI_CTRL_SAMPLING_GET(x)   (((x) & PECI_CTRL_SAMPLING_MASK) >> 16)
+#define PECI_CTRL_READ_MODE_MASK    GENMASK(13, 12)
+#define PECI_CTRL_READ_MODE(x)      (((x) << 12) & PECI_CTRL_READ_MODE_MASK)
+#define PECI_CTRL_READ_MODE_GET(x)  (((x) & PECI_CTRL_READ_MODE_MASK) >> 12)
+#define PECI_CTRL_READ_MODE_COUNT   BIT(12)
+#define PECI_CTRL_READ_MODE_DBG     BIT(13)
+#define PECI_CTRL_CLK_SOURCE_MASK   BIT(11)
+#define PECI_CTRL_CLK_SOURCE(x)     (((x) << 11) & PECI_CTRL_CLK_SOURCE_MASK)
+#define PECI_CTRL_CLK_SOURCE_GET(x) (((x) & PECI_CTRL_CLK_SOURCE_MASK) >> 11)
+#define PECI_CTRL_CLK_DIV_MASK      GENMASK(10, 8)
+#define PECI_CTRL_CLK_DIV(x)        (((x) << 8) & PECI_CTRL_CLK_DIV_MASK)
+#define PECI_CTRL_CLK_DIV_GET(x)    (((x) & PECI_CTRL_CLK_DIV_MASK) >> 8)
+#define PECI_CTRL_INVERT_OUT        BIT(7)
+#define PECI_CTRL_INVERT_IN         BIT(6)
+#define 
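
Editor's note: the control-register macros above follow the usual mask/shift pattern for register bit fields. As a rough illustration only, the sketch below reproduces the sampling-point field in user space. BIT() and GENMASK() are 32-bit stand-ins for the kernel macros, and peci_ctrl_set_sampling() is a hypothetical helper that is not part of the patch.

```c
#include <stdint.h>

/* User-space stand-ins for the kernel's BIT() and GENMASK(), 32-bit only. */
#define BIT(n)        (UINT32_C(1) << (n))
#define GENMASK(h, l) (((~UINT32_C(0)) >> (31 - (h))) & ((~UINT32_C(0)) << (l)))

/* Field definitions copied from the patch above. */
#define PECI_CTRL_SAMPLING_MASK   GENMASK(19, 16)
#define PECI_CTRL_SAMPLING(x)     (((x) << 16) & PECI_CTRL_SAMPLING_MASK)
#define PECI_CTRL_SAMPLING_GET(x) (((x) & PECI_CTRL_SAMPLING_MASK) >> 16)

/* Hypothetical helper: replace the sampling field and keep all other bits. */
static uint32_t peci_ctrl_set_sampling(uint32_t reg, uint32_t point)
{
	reg &= ~PECI_CTRL_SAMPLING_MASK;   /* clear bits 19..16 */
	reg |= PECI_CTRL_SAMPLING(point);  /* insert the new 4-bit value */
	return reg;
}
```

Because the setter masks the shifted value, a sampling point wider than four bits is silently truncated rather than spilling into neighbouring fields.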

[PATCH v3 08/10] Documentation: hwmon: Add documents for PECI hwmon client drivers

2018-04-10 Thread Jae Hyun Yoo
This commit adds hwmon documents for PECI cputemp and dimmtemp drivers.

Signed-off-by: Jae Hyun Yoo 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Cc: Alan Cox 
Cc: Andrew Jeffery 
Cc: Andrew Lunn 
Cc: Andy Shevchenko 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Fengguang Wu 
Cc: Greg KH 
Cc: Guenter Roeck 
Cc: Jason M Biils 
Cc: Jean Delvare 
Cc: Joel Stanley 
Cc: Julia Cartwright 
Cc: Miguel Ojeda 
Cc: Milton Miller II 
Cc: Pavel Machek 
Cc: Randy Dunlap 
Cc: Stef van Os 
Cc: Sumeet R Pawnikar 
---
 Documentation/hwmon/peci-cputemp  | 88 +++
 Documentation/hwmon/peci-dimmtemp | 50 ++
 2 files changed, 138 insertions(+)
 create mode 100644 Documentation/hwmon/peci-cputemp
 create mode 100644 Documentation/hwmon/peci-dimmtemp

diff --git a/Documentation/hwmon/peci-cputemp b/Documentation/hwmon/peci-cputemp
new file mode 100644
index ..cdd5ea49a4a2
--- /dev/null
+++ b/Documentation/hwmon/peci-cputemp
@@ -0,0 +1,88 @@
+Kernel driver peci-cputemp
+==========================
+
+Supported chips:
+   One of the Intel server CPUs listed below, connected to a PECI bus.
+       * Intel Xeon E5/E7 v3 server processors
+           Intel Xeon E5-14xx v3 family
+           Intel Xeon E5-24xx v3 family
+           Intel Xeon E5-16xx v3 family
+           Intel Xeon E5-26xx v3 family
+           Intel Xeon E5-46xx v3 family
+           Intel Xeon E7-48xx v3 family
+           Intel Xeon E7-88xx v3 family
+       * Intel Xeon E5/E7 v4 server processors
+           Intel Xeon E5-16xx v4 family
+           Intel Xeon E5-26xx v4 family
+           Intel Xeon E5-46xx v4 family
+           Intel Xeon E7-48xx v4 family
+           Intel Xeon E7-88xx v4 family
+       * Intel Xeon Scalable server processors
+           Intel Xeon Bronze family
+           Intel Xeon Silver family
+           Intel Xeon Gold family
+           Intel Xeon Platinum family
+   Addresses scanned: PECI client address 0x30 - 0x37
+   Datasheet: Available from http://www.intel.com/design/literature.htm
+
+Author:
+   Jae Hyun Yoo 
+
+Description
+-----------
+
+This driver implements a generic PECI hwmon feature which provides Digital
+Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that are
+accessible using the PECI Client Command Suite via the processor PECI client.
+
+All temperature values are given in millidegree Celsius and will be measurable
+only when the target CPU is powered on.
+
+sysfs attributes
+----------------
+
+temp1_label        "Die"
+temp1_input        Provides current die temperature of the CPU package.
+temp1_max          Provides thermal control temperature of the CPU package
+                   which is also known as Tcontrol.
+temp1_crit         Provides shutdown temperature of the CPU package which
+                   is also known as the maximum processor junction
+                   temperature, Tjmax or Tprochot.
+temp1_crit_hyst    Provides the hysteresis value from Tcontrol to Tjmax of
+                   the CPU package.
+
+temp2_label        "DTS margin"
+temp2_input        Provides current DTS thermal margin to Tcontrol of the
+                   CPU package. Value 0 means it reaches Tcontrol
+                   temperature. Sub-zero value means the die temperature
+                   goes across Tcontrol to Tjmax.
+temp2_min          Provides the minimum DTS thermal margin to Tcontrol of
+                   the CPU package.
+temp2_lcrit        Provides the value when the CPU package temperature
+                   reaches Tjmax.
+
+temp3_label        "Tcontrol"
+temp3_input        Provides current Tcontrol temperature of the CPU
+                   package which is also known as Fan Temperature target.
+                   Indicates the relative value from thermal monitor trip
+                   temperature at which fans should be engaged.
+temp3_crit         Provides Tcontrol critical value of the CPU package
+                   which is the same as Tjmax.
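
Editor's note: the temp2_input description above gives the sign convention for the DTS margin. The sketch below is one possible way for a user-space consumer to interpret a value read from that attribute. The enum and function names are hypothetical, and the only semantics assumed are the ones stated in the documentation text: zero means Tcontrol has been reached, and a negative value means the die temperature lies between Tcontrol and Tjmax.

```c
/* Hypothetical classification of a temp2_input reading (millidegree Celsius). */
enum dts_zone {
	DTS_BELOW_TCONTROL,  /* positive margin: die is cooler than Tcontrol */
	DTS_AT_TCONTROL,     /* margin of 0: die has reached Tcontrol */
	DTS_ABOVE_TCONTROL   /* negative margin: die is between Tcontrol and Tjmax */
};

static enum dts_zone dts_margin_zone(long margin_mdeg)
{
	if (margin_mdeg > 0)
		return DTS_BELOW_TCONTROL;
	if (margin_mdeg == 0)
		return DTS_AT_TCONTROL;
	return DTS_ABOVE_TCONTROL;
}
```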
+

[PATCH v3 04/10] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs

2018-04-10 Thread Jae Hyun Yoo
This commit adds a dt-bindings document of PECI adapter driver for Aspeed
AST24xx/25xx SoCs.

Signed-off-by: Jae Hyun Yoo 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Cc: Alan Cox 
Cc: Andrew Jeffery 
Cc: Andrew Lunn 
Cc: Andy Shevchenko 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Fengguang Wu 
Cc: Greg KH 
Cc: Guenter Roeck 
Cc: Jason M Biils 
Cc: Jean Delvare 
Cc: Joel Stanley 
Cc: Julia Cartwright 
Cc: Miguel Ojeda 
Cc: Milton Miller II 
Cc: Pavel Machek 
Cc: Randy Dunlap 
Cc: Stef van Os 
Cc: Sumeet R Pawnikar 
---
 .../devicetree/bindings/peci/peci-aspeed.txt   | 60 ++
 1 file changed, 60 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.txt

diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.txt b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
new file mode 100644
index ..4598bb8c20fa
--- /dev/null
+++ b/Documentation/devicetree/bindings/peci/peci-aspeed.txt
@@ -0,0 +1,60 @@
+Device tree configuration for PECI buses on the AST24XX and AST25XX SoCs.
+
+Required properties:
+- compatible        : Should be "aspeed,ast2400-peci" or "aspeed,ast2500-peci"
+                      - aspeed,ast2400-peci: Aspeed AST2400 family PECI
+                                             controller
+                      - aspeed,ast2500-peci: Aspeed AST2500 family PECI
+                                             controller
+- reg               : Should contain PECI controller registers location and
+                      length.
+- #address-cells    : Should be <1>.
+- #size-cells       : Should be <0>.
+- interrupts        : Should contain PECI controller interrupt.
+- clocks            : Should contain clock source for PECI controller.
+                      Should reference clkin.
+- clock-frequency   : Should contain the operation frequency of PECI controller
+                      in units of Hz.
+                      187500 ~ 2400
+
+Optional properties:
+- msg-timing-nego   : Message timing negotiation period. This value will
+                      determine the period of message timing negotiation to be
+                      issued by PECI controller. The unit of the programmed
+                      value is four times the PECI clock period.
+                      0 ~ 255 (default: 1)
+- addr-timing-nego  : Address timing negotiation period. This value will
+                      determine the period of address timing negotiation to be
+                      issued by PECI controller. The unit of the programmed
+                      value is four times the PECI clock period.
+                      0 ~ 255 (default: 1)
+- rd-sampling-point : Read sampling point selection. The whole period of a bit
+                      time will be divided into 16 time frames. This value will
+                      determine the time frame in which the controller will
+                      sample PECI signal for data read back. Usually the
+                      middle of a bit time is best.
+                      0 ~ 15 (default: 8)
+- cmd-timeout-ms    : Command timeout in units of ms.
+                      1 ~ 6 (default: 1000)
+
+Example:
+   peci: peci@1e78b000 {
+   compatible = "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0x0 0x1e78b000 0x60>;
+
+   peci0: peci-bus@0 {
+   compatible = "aspeed,ast2500-peci";
+   reg = <0x0 0x60>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   interrupts = <15>;
+   clocks = <_clkin>;
+   clock-frequency = <2400>;
+   msg-timing-nego = <1>;
+   addr-timing-nego = <1>;
+   rd-sampling-point = <8>;
+   cmd-timeout-ms = <1000>;
+   };
+   };
-- 
2.16.2
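
Editor's note: the optional properties in the binding above each carry a valid range and a default. The sketch below shows one way such values might be sanitized after being read from the device tree. It is not taken from the driver: falling back to the default on an out-of-range value is an assumption (a driver could equally reject the node), and the struct and function names are hypothetical. The ranges and defaults are the ones listed in the binding text.

```c
#include <stdint.h>

/* Hypothetical container for the three optional timing properties. */
struct peci_timing {
	uint32_t msg_timing_nego;    /* 0 ~ 255, default 1 */
	uint32_t addr_timing_nego;   /* 0 ~ 255, default 1 */
	uint32_t rd_sampling_point;  /* 0 ~ 15,  default 8 */
};

/* Return v when it is within [0, max], otherwise the documented default. */
static uint32_t value_or_default(uint32_t v, uint32_t max, uint32_t def)
{
	return v <= max ? v : def;
}

/* Assumed policy: replace out-of-range values with their defaults. */
static struct peci_timing peci_timing_sanitize(struct peci_timing t)
{
	t.msg_timing_nego = value_or_default(t.msg_timing_nego, 255, 1);
	t.addr_timing_nego = value_or_default(t.addr_timing_nego, 255, 1);
	t.rd_sampling_point = value_or_default(t.rd_sampling_point, 15, 8);
	return t;
}
```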



[PATCH v3 07/10] Documentation: dt-bindings: Add documents for PECI hwmon client drivers

2018-04-10 Thread Jae Hyun Yoo
This commit adds dt-bindings documents for PECI cputemp and dimmtemp client
drivers.

Signed-off-by: Jae Hyun Yoo 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Cc: Alan Cox 
Cc: Andrew Jeffery 
Cc: Andrew Lunn 
Cc: Andy Shevchenko 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Fengguang Wu 
Cc: Greg KH 
Cc: Guenter Roeck 
Cc: Jason M Biils 
Cc: Jean Delvare 
Cc: Joel Stanley 
Cc: Julia Cartwright 
Cc: Miguel Ojeda 
Cc: Milton Miller II 
Cc: Pavel Machek 
Cc: Randy Dunlap 
Cc: Stef van Os 
Cc: Sumeet R Pawnikar 
---
 .../devicetree/bindings/hwmon/peci-cputemp.txt | 24 +
 .../devicetree/bindings/hwmon/peci-dimmtemp.txt| 25 ++
 2 files changed, 49 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/peci-cputemp.txt
 create mode 100644 Documentation/devicetree/bindings/hwmon/peci-dimmtemp.txt

diff --git a/Documentation/devicetree/bindings/hwmon/peci-cputemp.txt b/Documentation/devicetree/bindings/hwmon/peci-cputemp.txt
new file mode 100644
index ..d5530ef9cfd2
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/peci-cputemp.txt
@@ -0,0 +1,24 @@
+Bindings for Intel PECI (Platform Environment Control Interface) cputemp driver.
+
+Required properties:
+- compatible : Should be "intel,peci-cputemp".
+- reg        : Should contain address of a client CPU. Address range of CPU
+               clients starts from 0x30 based on PECI specification.
+               <0x30> .. <0x37> (depends on the PECI_OFFSET_MAX definition)
+
+Example:
+   peci-bus@0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   < more properties >
+
+   peci-cputemp@cpu0 {
+   compatible = "intel,peci-cputemp";
+   reg = <0x30>;
+   };
+
+   peci-cputemp@cpu1 {
+   compatible = "intel,peci-cputemp";
+   reg = <0x31>;
+   };
+   };
diff --git a/Documentation/devicetree/bindings/hwmon/peci-dimmtemp.txt b/Documentation/devicetree/bindings/hwmon/peci-dimmtemp.txt
new file mode 100644
index ..56e5deb61e5c
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/peci-dimmtemp.txt
@@ -0,0 +1,25 @@
+Bindings for Intel PECI (Platform Environment Control Interface) dimmtemp
+driver.
+
+Required properties:
+- compatible : Should be "intel,peci-dimmtemp".
+- reg        : Should contain address of a client CPU. Address range of CPU
+               clients starts from 0x30 based on PECI specification.
+               <0x30> .. <0x37> (depends on the PECI_OFFSET_MAX definition)
+
+Example:
+   peci-bus@0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   < more properties >
+
+   peci-dimmtemp@cpu0 {
+   compatible = "intel,peci-dimmtemp";
+   reg = <0x30>;
+   };
+
+   peci-dimmtemp@cpu1 {
+   compatible = "intel,peci-dimmtemp";
+   reg = <0x31>;
+   };
+   };
-- 
2.16.2
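
Editor's note: both bindings above place CPU clients at addresses 0x30 through 0x37. The sketch below illustrates how a client address could be mapped to a zero-based CPU index. PECI_BASE_ADDR is a hypothetical name, the value 8 for PECI_OFFSET_MAX is inferred from the documented range rather than taken from the headers, and peci_cpu_index() is not part of the patch set.

```c
/* Assumed values: base address per the binding text, 8 clients for 0x30..0x37. */
#define PECI_BASE_ADDR  0x30
#define PECI_OFFSET_MAX 8

/* Hypothetical helper: CPU index for a client address, or -1 if out of range. */
static int peci_cpu_index(unsigned int addr)
{
	if (addr < PECI_BASE_ADDR || addr >= PECI_BASE_ADDR + PECI_OFFSET_MAX)
		return -1;
	return (int)(addr - PECI_BASE_ADDR);
}
```

With these assumptions, the two example nodes with reg = <0x30> and reg = <0x31> would correspond to CPU 0 and CPU 1.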



[PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers

2018-04-10 Thread Jae Hyun Yoo
This commit adds PECI cputemp and dimmtemp hwmon drivers.

Signed-off-by: Jae Hyun Yoo 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Cc: Alan Cox 
Cc: Andrew Jeffery 
Cc: Andrew Lunn 
Cc: Andy Shevchenko 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Fengguang Wu 
Cc: Greg KH 
Cc: Guenter Roeck 
Cc: Jason M Biils 
Cc: Jean Delvare 
Cc: Joel Stanley 
Cc: Julia Cartwright 
Cc: Miguel Ojeda 
Cc: Milton Miller II 
Cc: Pavel Machek 
Cc: Randy Dunlap 
Cc: Stef van Os 
Cc: Sumeet R Pawnikar 
---
 drivers/hwmon/Kconfig |  28 ++
 drivers/hwmon/Makefile|   2 +
 drivers/hwmon/peci-cputemp.c  | 783 ++
 drivers/hwmon/peci-dimmtemp.c | 432 +++
 4 files changed, 1245 insertions(+)
 create mode 100644 drivers/hwmon/peci-cputemp.c
 create mode 100644 drivers/hwmon/peci-dimmtemp.c

diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index f249a4428458..c52f610f81d0 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1259,6 +1259,34 @@ config SENSORS_NCT7904
  This driver can also be built as a module.  If so, the module
  will be called nct7904.
 
+config SENSORS_PECI_CPUTEMP
+   tristate "PECI CPU temperature monitoring support"
+   depends on OF
+   depends on PECI
+   help
+ If you say yes here you get support for the generic Intel PECI
+ cputemp driver which provides Digital Thermal Sensor (DTS) thermal
+ readings of the CPU package and CPU cores that are accessible using
+ the PECI Client Command Suite via the processor PECI client.
+ Check Documentation/hwmon/peci-cputemp for details.
+
+ This driver can also be built as a module.  If so, the module
+ will be called peci-cputemp.
+
+config SENSORS_PECI_DIMMTEMP
+   tristate "PECI DIMM temperature monitoring support"
+   depends on OF
+   depends on PECI
+   help
+ If you say yes here you get support for the generic Intel PECI hwmon
+ driver which provides Digital Thermal Sensor (DTS) thermal readings of
+ DIMM components that are accessible using the PECI Client Command
+ Suite via the processor PECI client.
+ Check Documentation/hwmon/peci-dimmtemp for details.
+
+ This driver can also be built as a module.  If so, the module
+ will be called peci-dimmtemp.
+
 config SENSORS_NSA320
tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors"
depends on GPIOLIB && OF
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index e7d52a36e6c4..48d9598fcd3a 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -136,6 +136,8 @@ obj-$(CONFIG_SENSORS_NCT7802)   += nct7802.o
 obj-$(CONFIG_SENSORS_NCT7904)  += nct7904.o
 obj-$(CONFIG_SENSORS_NSA320)   += nsa320-hwmon.o
 obj-$(CONFIG_SENSORS_NTC_THERMISTOR)   += ntc_thermistor.o
+obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
+obj-$(CONFIG_SENSORS_PECI_DIMMTEMP)+= peci-dimmtemp.o
 obj-$(CONFIG_SENSORS_PC87360)  += pc87360.o
 obj-$(CONFIG_SENSORS_PC87427)  += pc87427.o
 obj-$(CONFIG_SENSORS_PCF8591)  += pcf8591.o
diff --git a/drivers/hwmon/peci-cputemp.c b/drivers/hwmon/peci-cputemp.c
new file mode 100644
index ..f0bc92687512
--- /dev/null
+++ b/drivers/hwmon/peci-cputemp.c
@@ -0,0 +1,783 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Intel Corporation
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define TEMP_TYPE_PECI        6  /* Sensor type 6: Intel PECI */
+
+#define CORE_MAX_ON_HSX   18 /* Max number of cores on Haswell */
+#define CORE_MAX_ON_BDX   24 /* Max number of cores on Broadwell */
+#define CORE_MAX_ON_SKX   28 /* Max number of cores on Skylake */
+
+#define DEFAULT_CHANNEL_NUMS  5
+#define CORETEMP_CHANNEL_NUMS CORE_MAX_ON_SKX
+#define CPUTEMP_CHANNEL_NUMS  (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS)
+
+#define CLIENT_CPU_ID_MASK    0xf0ff0  /* Mask for Family / Model info */
+
+#define UPDATE_INTERVAL_MIN   HZ
+
+enum cpu_gens {
+   CPU_GEN_HSX, /* Haswell Xeon */
+   CPU_GEN_BRX, /* Broadwell Xeon */
+   CPU_GEN_SKX, /* Skylake Xeon */
+   CPU_GEN_MAX
+};
+
+struct cpu_gen_info {
+   u32 type;
+   u32 cpu_id;
+   u32 core_max;
+};
+
+struct temp_data {
+   bool valid;
+   s32  value;
+  

[PATCH v3 10/10] Add a maintainer for the PECI subsystem

2018-04-10 Thread Jae Hyun Yoo
This commit adds maintainer information for the PECI subsystem.

Signed-off-by: Jae Hyun Yoo 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Cc: Alan Cox 
Cc: Andrew Jeffery 
Cc: Andrew Lunn 
Cc: Andy Shevchenko 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Fengguang Wu 
Cc: Greg KH 
Cc: Guenter Roeck 
Cc: Jason M Biils 
Cc: Jean Delvare 
Cc: Joel Stanley 
Cc: Julia Cartwright 
Cc: Miguel Ojeda 
Cc: Milton Miller II 
Cc: Pavel Machek 
Cc: Randy Dunlap 
Cc: Stef van Os 
Cc: Sumeet R Pawnikar 
---
 MAINTAINERS | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5cd5ff0e4428..3e6917e1ad31 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10965,6 +10965,16 @@ L: platform-driver-...@vger.kernel.org
 S: Maintained
 F: drivers/platform/x86/peaq-wmi.c
 
+PECI SUBSYSTEM
+M: Jae Hyun Yoo 
+M: Jason M Biils 
+S: Maintained
+F: Documentation/devicetree/bindings/peci/
+F: drivers/peci/
+F: drivers/hwmon/peci-*.c
+F: include/linux/peci.h
+F: include/uapi/linux/peci-ioctl.h
+
 PER-CPU MEMORY ALLOCATOR
 M: Tejun Heo 
 M: Christoph Lameter 
-- 
2.16.2



Re: [RFC bpf-next v2 1/8] bpf: add script and prepare bpf.h for new helpers documentation

2018-04-10 Thread Alexei Starovoitov
On Tue, Apr 10, 2018 at 03:41:50PM +0100, Quentin Monnet wrote:
> Remove previous "overview" of eBPF helpers from user bpf.h header.
> Replace it by a comment explaining how to process the new documentation
> (to come in following patches) with a Python script to produce RST, then
> man page documentation.
> 
> Also add the aforementioned Python script under scripts/. It is used to
> process include/uapi/linux/bpf.h and to extract helper descriptions, to
> turn it into a RST document that can further be processed with rst2man
> to produce a man page. The script takes one "--filename "
> option. If the script is launched from scripts/ in the kernel root
> directory, it should be able to find the location of the header to
> parse, and "--filename " is then optional. If it cannot
> find the file, then the option becomes mandatory. RST-formatted
> documentation is printed to standard output.
> 
> Typical workflow for producing the final man page would be:
> 
> $ ./scripts/bpf_helpers_doc.py \
> --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
> $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
> $ man /tmp/bpf-helpers.7
> 
> Note that the tool kernel-doc cannot be used to document eBPF helpers,
> whose signatures are not available directly in the header files
> (pre-processor directives are used to produce them at the beginning of
> the compilation process).
> 
> Signed-off-by: Quentin Monnet 
> ---
>  include/uapi/linux/bpf.h   | 406 ++--
>  scripts/bpf_helpers_doc.py | 414 
> +
>  2 files changed, 430 insertions(+), 390 deletions(-)
>  create mode 100755 scripts/bpf_helpers_doc.py
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c5ec89732a8d..45f77f01e672 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -365,396 +365,22 @@ union bpf_attr {
>   } raw_tracepoint;
>  } __attribute__((aligned(8)));
>  
> -/* BPF helper function descriptions:
> - *
> - * void *bpf_map_lookup_elem(, )
> - * Return: Map value or NULL
> - *
> - * int bpf_map_update_elem(, , , flags)
> - * Return: 0 on success or negative error
> - *
> - * int bpf_map_delete_elem(, )
> - * Return: 0 on success or negative error
> - *
> - * int bpf_probe_read(void *dst, int size, void *src)
> - * Return: 0 on success or negative error
> - *
> - * u64 bpf_ktime_get_ns(void)
> - * Return: current ktime
> - *
> - * int bpf_trace_printk(const char *fmt, int fmt_size, ...)
> - * Return: length of buffer written or negative error
> - *
> - * u32 bpf_prandom_u32(void)
> - * Return: random value
> - *
> - * u32 bpf_raw_smp_processor_id(void)
> - * Return: SMP processor ID
> - *
> - * int bpf_skb_store_bytes(skb, offset, from, len, flags)
> - * store bytes into packet
> - * @skb: pointer to skb
> - * @offset: offset within packet from skb->mac_header
> - * @from: pointer where to copy bytes from
> - * @len: number of bytes to store into packet
> - * @flags: bit 0 - if true, recompute skb->csum
> - * other bits - reserved
> - * Return: 0 on success or negative error
> - *
> - * int bpf_l3_csum_replace(skb, offset, from, to, flags)
> - * recompute IP checksum
> - * @skb: pointer to skb
> - * @offset: offset within packet where IP checksum is located
> - * @from: old value of header field
> - * @to: new value of header field
> - * @flags: bits 0-3 - size of header field
> - * other bits - reserved
> - * Return: 0 on success or negative error
> - *
> - * int bpf_l4_csum_replace(skb, offset, from, to, flags)
> - * recompute TCP/UDP checksum
> - * @skb: pointer to skb
> - * @offset: offset within packet where TCP/UDP checksum is located
> - * @from: old value of header field
> - * @to: new value of header field
> - * @flags: bits 0-3 - size of header field
> - * bit 4 - is pseudo header
> - * other bits - reserved
> - * Return: 0 on success or negative error
> - *
> - * int bpf_tail_call(ctx, prog_array_map, index)
> - * jump into another BPF program
> - * @ctx: context pointer passed to next program
> - * @prog_array_map: pointer to map which type is BPF_MAP_TYPE_PROG_ARRAY
> - * @index: 32-bit index inside array that selects specific program to run
> - * Return: 0 on success or negative error
> - *
> - * int bpf_clone_redirect(skb, ifindex, flags)
> - * redirect to another netdev
> - * @skb: pointer to skb
> - * @ifindex: ifindex of the net device
> - * @flags: bit 0 - if set, redirect to ingress instead of egress
> - * other bits - reserved
> - * Return: 0 on success or negative error
> - *
> - * u64 bpf_get_current_pid_tgid(void)
> - * Return: current->tgid << 32 | current->pid
> - *
> - * u64 bpf_get_current_uid_gid(void)
> - * 
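
Editor's note: the removed comment quoted above documents the return value of bpf_get_current_pid_tgid() as current->tgid << 32 | current->pid. The sketch below only illustrates that packing in plain C so the two halves can be separated. The helper names are hypothetical and none of this is BPF program code.

```c
#include <stdint.h>

/* Pack two 32-bit IDs the way the quoted comment describes. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return ((uint64_t)tgid << 32) | pid;
}

/* Upper 32 bits: thread group ID. */
static uint32_t tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: the kernel's per-task pid. */
static uint32_t pid_of(uint64_t pid_tgid)
{
	return (uint32_t)pid_tgid;
}
```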

Re: [PATCH 1/2] perf: riscv: preliminary RISC-V support

2018-04-10 Thread Alex Solomatnikov
Alan,

I merged SBI emulation for perf counters and config:
https://github.com/riscv/riscv-pk/pull/98

You should be able to write these CSRs.

Thanks,
Alex

On Mon, Apr 9, 2018 at 12:07 AM, Alan Kao  wrote:
> On Thu, Apr 05, 2018 at 09:47:50AM -0700, Palmer Dabbelt wrote:
>> On Mon, 26 Mar 2018 00:57:54 PDT (-0700), alan...@andestech.com wrote:
>> >This patch provides a basic PMU, riscv_base_pmu, which supports two
>> >general hardware events, instructions and cycles.  Furthermore, this
>> >PMU serves as a reference implementation to ease porting in
>> >the future.
>> >
>> >riscv_base_pmu should be able to run on any RISC-V machine that
>> >conforms to the Priv-Spec.  Note that the latest qemu model doesn't
>> >fully support the proper behavior of Priv-Spec 1.10 yet, but a
>> >workaround should be easy with very small fixes.  Please check
>> >https://github.com/riscv/riscv-qemu/pull/115 for future updates.
>> >
>> >Cc: Nick Hu 
>> >Cc: Greentime Hu 
>> >Signed-off-by: Alan Kao 
>>
>> We should really be able to detect PMU types at runtime (via a device tree
>> entry) rather than requiring that a single PMU is built in to the kernel.
>> This will require a handful of modifications to how this patch works, which
>> I'll try to list below.
>
>> >+menu "PMU type"
>> >+depends on PERF_EVENTS
>> >+
>> >+config RISCV_BASE_PMU
>> >+bool "Base Performance Monitoring Unit"
>> >+def_bool y
>> >+help
>> >+  A base PMU that serves as a reference implementation and has limited
>> >+  feature of perf.
>> >+
>> >+endmenu
>> >+
>>
>> Rather than a menu where a single PMU can be selected, there should be
>> options to enable or disable support for each PMU type -- this is just like
>> how all our other drivers work.
>>
>
> I see.  Sure.  The descriptions and implementation will be refined in v3.
>
>> >+struct pmu * __weak __init riscv_init_platform_pmu(void)
>> >+{
>> >+riscv_pmu = _base_pmu;
>> >+return riscv_pmu->pmu;
>> >+}
>>
>> Rather than relying on a weak symbol that gets overridden by other PMU
>> types, this should look through the device tree for a compatible PMU (in the
>> case of just the base PMU it could be any RISC-V hart) and install a PMU
>> handler for it.  There'd probably be some sort of priority scheme here, like
>> there are for other driver subsystems, where we'd pick the best PMU driver
>> that's compatible with the PMUs on every hart.
>>
>> >+
>> >+int __init init_hw_perf_events(void)
>> >+{
>> >+struct pmu *pmu = riscv_init_platform_pmu();
>> >+
>> >+perf_irq = NULL;
>> >+perf_pmu_register(pmu, "cpu", PERF_TYPE_RAW);
>> >+return 0;
>> >+}
>> >+arch_initcall(init_hw_perf_events);
>>
>> Since we only have a single PMU type right now this isn't critical to handle
>> right away, but we will have to refactor this before adding another PMU.
>
> I see.  My rough plan is to do the device tree parsing here, and if no specific
> PMU string is found then just register the base PMU proposed in this patch.
> How about this idea?
>
> Thanks.
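
Editor's note: the plan discussed above (look for a specific PMU in the device tree and fall back to the base PMU when none is found) can be modelled as a table lookup with a default. The sketch below is a user-space illustration of that idea only. The table contents, the "vendor,example-pmu" compatible string and the select_pmu() function are all hypothetical, and no kernel device tree API is used or implied.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a device tree compatible string to a PMU name. */
struct pmu_entry {
	const char *compatible;
	const char *pmu_name;
};

static const struct pmu_entry known_pmus[] = {
	{ "vendor,example-pmu", "example_pmu" },  /* placeholder entry */
};

/* Return the matching PMU name, or the base PMU when nothing matches. */
static const char *select_pmu(const char *dt_compatible)
{
	size_t i;

	if (dt_compatible != NULL) {
		for (i = 0; i < sizeof(known_pmus) / sizeof(known_pmus[0]); i++) {
			if (strcmp(known_pmus[i].compatible, dt_compatible) == 0)
				return known_pmus[i].pmu_name;
		}
	}
	return "riscv_base_pmu";  /* fallback proposed in the thread */
}
```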


Re: [RFC bpf-next v2 2/8] bpf: add documentation for eBPF helpers (01-11)

2018-04-10 Thread Alexei Starovoitov
On Tue, Apr 10, 2018 at 03:41:51PM +0100, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
> 
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
> 
> This patch contains descriptions for the following helper functions, all
> written by Alexei:
> 
> - bpf_map_lookup_elem()
> - bpf_map_update_elem()
> - bpf_map_delete_elem()
> - bpf_probe_read()
> - bpf_ktime_get_ns()
> - bpf_trace_printk()
> - bpf_skb_store_bytes()
> - bpf_l3_csum_replace()
> - bpf_l4_csum_replace()
> - bpf_tail_call()
> - bpf_clone_redirect()
> 
> Cc: Alexei Starovoitov 
> Signed-off-by: Quentin Monnet 
> ---
>  include/uapi/linux/bpf.h | 199 +++
>  1 file changed, 199 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 45f77f01e672..2bc653a3a20f 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -381,6 +381,205 @@ union bpf_attr {
>   * intentional, removing them would break paragraphs for rst2man.
>   *
>   * Start of BPF helper function descriptions:
> + *
> + * void *bpf_map_lookup_elem(struct bpf_map *map, void *key)
> + *   Description
> + *   Perform a lookup in *map* for an entry associated to *key*.
> + *   Return
> + *   Map value associated to *key*, or **NULL** if no entry was
> + *   found.
> + *
> + * int bpf_map_update_elem(struct bpf_map *map, void *key, void *value, u64 flags)
> + *   Description
> + *   Add or update the value of the entry associated to *key* in
> + *   *map* with *value*. *flags* is one of:
> + *
> + *   **BPF_NOEXIST**
> + *   The entry for *key* must not exist in the map.
> + *   **BPF_EXIST**
> + *   The entry for *key* must already exist in the map.
> + *   **BPF_ANY**
> + *   No condition on the existence of the entry for *key*.
> + *
> + *   These flags are only useful for maps of type
> + *   **BPF_MAP_TYPE_HASH**. For all other map types, **BPF_ANY**
> + *   should be used.

I think that's not entirely accurate.
The flags work as expected for all other map types as well
and for lru map, sockmap, map in map the flags have practical use cases.

> + *   Return
> + *   0 on success, or a negative error in case of failure.
> + *
> + * int bpf_map_delete_elem(struct bpf_map *map, void *key)
> + *   Description
> + *   Delete entry with *key* from *map*.
> + *   Return
> + *   0 on success, or a negative error in case of failure.
> + *
> + * int bpf_probe_read(void *dst, u32 size, const void *src)
> + *   Description
> + *   For tracing programs, safely attempt to read *size* bytes from
> + *   address *src* and store the data in *dst*.
> + *   Return
> + *   0 on success, or a negative error in case of failure.
> + *
> + * u64 bpf_ktime_get_ns(void)
> + *   Description
> + *   Return the time elapsed since system boot, in nanoseconds.
> + *   Return
> + *   Current *ktime*.
> + *
> + * int bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
> + *   Description
> + *   This helper is a "printk()-like" facility for debugging. It
> + *   prints a message defined by format *fmt* (of size *fmt_size*)
> + *   to file *\/sys/kernel/debug/tracing/trace* from DebugFS, if
> + *   available. It can take up to three additional **u64**
> + *   arguments (as an eBPF helper, the total number of arguments is
> + *   limited to five). Each time the helper is called, it appends a
> + *   line that looks like the following:
> + *
> + *   ::
> + *
> + *   telnet-470   [001] .N.. 419421.045894: 0x0001: BPF command: 2
> + *
> + *   In the above:
> + *
> + *   * ``telnet`` is the name of the current task.
> + *   * ``470`` is the PID of the current task.
> + *   * ``001`` is the CPU number on which the task is
> + * running.
> + *   * In ``.N..``, each character refers to a set of
> + * options (whether irqs are enabled, scheduling
> + * options, whether hard/softirqs are running, level of
> + * preempt_disabled respectively). **N** means that
> + * **TIF_NEED_RESCHED** and **PREEMPT_NEED_RESCHED**
> + * are set.
> + *   * ``419421.045894`` is a timestamp.
> + *   * ``0x0001`` is a fake value used by BPF for the
> + * 

Re: [RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)

2018-04-10 Thread Andrey Ignatov
Quentin Monnet  [Tue, 2018-04-10 07:43 -0700]:
> + * int bpf_bind(struct bpf_sock_addr_kern *ctx, struct sockaddr *addr, int addr_len)
> + *   Description
> + *   Bind the socket associated to *ctx* to the address pointed by
> + *   *addr*, of length *addr_len*. This allows for making outgoing
> + *   connection from the desired IP address, which can be useful for
> + *   example when all processes inside a cgroup should use one
> + *   single IP address on a host that has multiple IP configured.
> + *
> + *   This helper works for IPv4 and IPv6, TCP and UDP sockets. The
> + *   domain (*addr*\ **->sa_family**) must be **AF_INET** (or
> + *   **AF_INET6**). Looking for a free port to bind to can be
> + *   expensive, therefore binding to port is not permitted by the
> + *   helper: *addr*\ **->sin_port** (or **sin6_port**, respectively)
> + *   must be set to zero.
> + *
> + *   As for the remote end, both parts of it can be overridden,
> + *   remote IP and remote port. This can be useful if an application
> + *   inside a cgroup wants to connect to another application inside
> + *   the same cgroup or to itself, but knows nothing about the IP
> + *   address assigned to the cgroup.

The last paragraph ("As for the remote end ...") is not relevant to
bpf_bind() and should be removed. It describes the sys_connect hook
itself, which can call bpf_bind() but also has other functionality, and
that other functionality is what this paragraph describes.


-- 
Andrey Ignatov


[RFC 01/10] PCI: dwc: Add MSI-X callbacks handler

2018-04-10 Thread Gustavo Pimentel
Changes the pcie_raise_irq function signature, namely the interrupt_num
variable type from u8 to u16 to accommodate the MSI-X maximum interrupts
of 2048.

Implements a PCIe config space capability iterator function to search and
save the MSI and MSI-X pointers. With this method the code becomes more
generic and flexible.
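
A capability walk of this kind can be sketched in user space over a
plain byte array standing in for configuration space. This is an
illustration under assumptions, not the code from this patch:
find_cap_addr() and struct cap_addr are hypothetical names, the register
offsets and capability IDs are the standard ones from pci_regs.h, and
the iteration guard of 48 is an arbitrary bound against malformed lists.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard configuration space offsets and capability IDs. */
#define PCI_STATUS		0x06
#define PCI_STATUS_CAP_LIST	0x10
#define PCI_CAPABILITY_LIST	0x34
#define PCI_CAP_ID_MSI		0x05
#define PCI_CAP_ID_MSIX		0x11

struct cap_addr {
	uint8_t msi_addr;	/* 0 when the capability is absent */
	uint8_t msix_addr;
};

/* cfg must point at 256 bytes of (little-endian) configuration space. */
static void find_cap_addr(const uint8_t *cfg, struct cap_addr *out)
{
	uint8_t ptr;
	int guard = 48;		/* bounds the walk if the list loops */

	assert(cfg != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return;

	/* Capabilities live above the 64-byte header, dword aligned. */
	ptr = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (ptr >= 0x40 && guard--) {
		uint8_t id = cfg[ptr];

		if (id == PCI_CAP_ID_MSI)
			out->msi_addr = ptr;
		else if (id == PCI_CAP_ID_MSIX)
			out->msix_addr = ptr;

		ptr = cfg[ptr + 1] & 0xfc;	/* next pointer */
	}
}
```

The sketch tests the current pointer rather than the next one, so the
final entry of the chain (whose next pointer is zero) is still examined.
As quoted, the loop in dw_pcie_ep_find_cap_addr() checks next_ptr before
handling the current entry, which would seem to skip that last
capability.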

Implements MSI-X set/get functions for the sysfs interface in order to
change the number of EP entries.
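
For reference, the Table Size field of the MSI-X Message Control
register holds the number of entries minus one. The helpers below are a
hedged sketch of that encoding with hypothetical names
(msix_set_table_size(), msix_get_table_size()); whether callers of the
set/get functions in this patch are expected to pass the already-encoded
value is not visible in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_MSIX_FLAGS_QSIZE	0x07ff	/* Table Size field, encoded as N - 1 */
#define PCI_MSIX_FLAGS_ENABLE	0x8000	/* MSI-X Enable bit */

/* Return the Message Control value with the table size set to nvec. */
static uint16_t msix_set_table_size(uint16_t flags, unsigned int nvec)
{
	assert(nvec >= 1 && nvec <= 2048);

	flags &= ~PCI_MSIX_FLAGS_QSIZE;
	flags |= (uint16_t)(nvec - 1);
	return flags;
}

/* Decode the number of table entries from a Message Control value. */
static unsigned int msix_get_table_size(uint16_t flags)
{
	return (flags & PCI_MSIX_FLAGS_QSIZE) + 1u;
}
```

The maximum of 2048 entries is therefore stored as 0x7ff, and a raw
field value of 0 means one entry.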

Implements the EP MSI-X interface for triggering interrupts.

Signed-off-by: Gustavo Pimentel 
---
 drivers/pci/dwc/pci-dra7xx.c   |   2 +-
 drivers/pci/dwc/pcie-artpec6.c |   2 +-
 drivers/pci/dwc/pcie-designware-ep.c   | 145 -
 drivers/pci/dwc/pcie-designware-plat.c |   6 +-
 drivers/pci/dwc/pcie-designware.h  |  23 +-
 5 files changed, 173 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index ed8558d..5265725 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -369,7 +369,7 @@ static void dra7xx_pcie_raise_msi_irq(struct dra7xx_pcie *dra7xx,
 }
 
 static int dra7xx_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
-enum pci_epc_irq_type type, u8 interrupt_num)
+enum pci_epc_irq_type type, u16 interrupt_num)
 {
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci);
diff --git a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c
index e66cede..96dc259 100644
--- a/drivers/pci/dwc/pcie-artpec6.c
+++ b/drivers/pci/dwc/pcie-artpec6.c
@@ -428,7 +428,7 @@ static void artpec6_pcie_ep_init(struct dw_pcie_ep *ep)
 }
 
 static int artpec6_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
- enum pci_epc_irq_type type, u8 interrupt_num)
+ enum pci_epc_irq_type type, u16 interrupt_num)
 {
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
 
diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c
index 15b22a6..874d4c2 100644
--- a/drivers/pci/dwc/pcie-designware-ep.c
+++ b/drivers/pci/dwc/pcie-designware-ep.c
@@ -40,6 +40,44 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
__dw_pcie_ep_reset_bar(pci, bar, 0);
 }
 
+void dw_pcie_ep_find_cap_addr(struct dw_pcie_ep *ep)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   u8 next_ptr, curr_ptr, cap_id;
+   u16 reg;
+
+   memset(&ep->cap_addr, 0, sizeof(ep->cap_addr));
+
+   reg = dw_pcie_readw_dbi(pci, PCI_STATUS);
+   if (!(reg & PCI_STATUS_CAP_LIST))
+   return;
+
+   reg = dw_pcie_readw_dbi(pci, PCI_CAPABILITY_LIST);
+   next_ptr = (reg & 0x00ff);
+   if (!next_ptr)
+   return;
+
+   reg = dw_pcie_readw_dbi(pci, next_ptr);
+   curr_ptr = next_ptr;
+   next_ptr = (reg & 0xff00) >> 8;
+   cap_id = (reg & 0x00ff);
+
+   while (next_ptr && (cap_id <= PCI_CAP_ID_MAX)) {
+   switch (cap_id) {
+   case PCI_CAP_ID_MSI:
+   ep->cap_addr.msi_addr = curr_ptr;
+   break;
+   case PCI_CAP_ID_MSIX:
+   ep->cap_addr.msix_addr = curr_ptr;
+   break;
+   }
+   reg = dw_pcie_readw_dbi(pci, next_ptr);
+   curr_ptr = next_ptr;
+   next_ptr = (reg & 0xff00) >> 8;
+   cap_id = (reg & 0x00ff);
+   }
+}
+
 static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,
   struct pci_epf_header *hdr)
 {
@@ -241,8 +279,47 @@ static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 func_no, u8 encode_int)
return 0;
 }
 
+static int dw_pcie_ep_get_msix(struct pci_epc *epc, u8 func_no)
+{
+   struct dw_pcie_ep *ep = epc_get_drvdata(epc);
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   u32 val, reg;
+
+   if (ep->cap_addr.msix_addr == 0)
+   return 0;
+
+   reg = ep->cap_addr.msix_addr + PCI_MSIX_FLAGS;
+   val = dw_pcie_readw_dbi(pci, reg);
+   if (!(val & PCI_MSIX_FLAGS_ENABLE))
+   return -EINVAL;
+
+   val &= PCI_MSIX_FLAGS_QSIZE;
+
+   return val;
+}
+
+static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u16 interrupts)
+{
+   struct dw_pcie_ep *ep = epc_get_drvdata(epc);
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   u32 val, reg;
+
+   if (ep->cap_addr.msix_addr == 0)
+   return 0;
+
+   reg = ep->cap_addr.msix_addr + PCI_MSIX_FLAGS;
+   val = dw_pcie_readw_dbi(pci, reg);
+   val &= ~PCI_MSIX_FLAGS_QSIZE;
+   val |= interrupts;
+   dw_pcie_dbi_ro_wr_en(pci);
+   dw_pcie_writew_dbi(pci, reg, val);
+   dw_pcie_dbi_ro_wr_dis(pci);
+
+   return 0;
+}
+
 static int dw_pcie_ep_raise_irq(struct pci_epc *epc, u8 

[RFC 05/10] PCI: dwc: Add legacy interrupt callback handler

2018-04-10 Thread Gustavo Pimentel
Adds a legacy interrupt callback handler. Currently Designware IP doesn't
allow triggering the legacy interrupt.

Signed-off-by: Gustavo Pimentel 
---
 drivers/pci/dwc/pcie-designware-ep.c   | 10 ++
 drivers/pci/dwc/pcie-designware-plat.c |  3 +--
 drivers/pci/dwc/pcie-designware.h  |  6 ++
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c
index e352786..fb55259 100644
--- a/drivers/pci/dwc/pcie-designware-ep.c
+++ b/drivers/pci/dwc/pcie-designware-ep.c
@@ -375,6 +375,16 @@ static const struct pci_epc_ops epc_ops = {
.stop   = dw_pcie_ep_stop,
 };
 
+int dw_pcie_ep_raise_legacy_irq(struct dw_pcie_ep *ep, u8 func_no)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   struct device *dev = pci->dev;
+
+   dev_err(dev, "EP cannot trigger legacy IRQs\n");
+
+   return -EINVAL;
+}
+
 int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 u8 interrupt_num)
 {
diff --git a/drivers/pci/dwc/pcie-designware-plat.c b/drivers/pci/dwc/pcie-designware-plat.c
index c3a4707..3874b02 100644
--- a/drivers/pci/dwc/pcie-designware-plat.c
+++ b/drivers/pci/dwc/pcie-designware-plat.c
@@ -86,8 +86,7 @@ static int dw_plat_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
 
switch (type) {
case PCI_EPC_IRQ_LEGACY:
-   dev_err(pci->dev, "EP cannot trigger legacy IRQs\n");
-   return -EINVAL;
+   return dw_pcie_ep_raise_legacy_irq(ep, func_no);
case PCI_EPC_IRQ_MSI:
return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
case PCI_EPC_IRQ_MSIX:
diff --git a/drivers/pci/dwc/pcie-designware.h b/drivers/pci/dwc/pcie-designware.h
index 2acf18b0..808b280 100644
--- a/drivers/pci/dwc/pcie-designware.h
+++ b/drivers/pci/dwc/pcie-designware.h
@@ -354,6 +354,7 @@ static inline int dw_pcie_allocate_domains(struct pcie_port *pp)
 void dw_pcie_ep_linkup(struct dw_pcie_ep *ep);
 int dw_pcie_ep_init(struct dw_pcie_ep *ep);
 void dw_pcie_ep_exit(struct dw_pcie_ep *ep);
+int dw_pcie_ep_raise_legacy_irq(struct dw_pcie_ep *ep, u8 func_no);
 int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 u8 interrupt_num);
 int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
@@ -374,6 +375,11 @@ static inline void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
 {
 }
 
+static inline int dw_pcie_ep_raise_legacy_irq(struct dw_pcie_ep *ep, u8 func_no)
+{
+   return 0;
+}
+
 static inline int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
   u8 interrupt_num)
 {
-- 
2.7.4




[RFC 07/10] misc: pci_endpoint_test: Replace lower into upper case characters

2018-04-10 Thread Gustavo Pimentel
Replaces lower case with upper case initial characters in comments and debug printks.

Signed-off-by: Gustavo Pimentel 
---
 drivers/misc/pci_endpoint_test.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index a7d9354..7212a7d 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -259,7 +259,7 @@ static bool pci_endpoint_test_copy(struct pci_endpoint_test *test, size_t size)
orig_src_addr = dma_alloc_coherent(dev, size + alignment,
   &orig_src_phys_addr, GFP_KERNEL);
if (!orig_src_addr) {
-   dev_err(dev, "failed to allocate source buffer\n");
+   dev_err(dev, "Failed to allocate source buffer\n");
ret = false;
goto err;
}
@@ -285,7 +285,7 @@ static bool pci_endpoint_test_copy(struct pci_endpoint_test *test, size_t size)
orig_dst_addr = dma_alloc_coherent(dev, size + alignment,
   &orig_dst_phys_addr, GFP_KERNEL);
if (!orig_dst_addr) {
-   dev_err(dev, "failed to allocate destination address\n");
+   dev_err(dev, "Failed to allocate destination address\n");
ret = false;
goto err_orig_src_addr;
}
@@ -349,7 +349,7 @@ static bool pci_endpoint_test_write(struct pci_endpoint_test *test, size_t size)
orig_addr = dma_alloc_coherent(dev, size + alignment, &orig_phys_addr,
   GFP_KERNEL);
if (!orig_addr) {
-   dev_err(dev, "failed to allocate address\n");
+   dev_err(dev, "Failed to allocate address\n");
ret = false;
goto err;
}
@@ -412,7 +412,7 @@ static bool pci_endpoint_test_read(struct pci_endpoint_test *test, size_t size)
orig_addr = dma_alloc_coherent(dev, size + alignment, &orig_phys_addr,
   GFP_KERNEL);
if (!orig_addr) {
-   dev_err(dev, "failed to allocate destination address\n");
+   dev_err(dev, "Failed to allocate destination address\n");
ret = false;
goto err;
}
@@ -550,7 +550,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
case IRQ_TYPE_MSI:
irq = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
if (irq < 0)
-   dev_err(dev, "failed to get MSI interrupts\n");
+   dev_err(dev, "Failed to get MSI interrupts\n");
test->num_irqs = irq;
break;
case IRQ_TYPE_MSIX:
@@ -567,7 +567,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
err = devm_request_irq(dev, pdev->irq, pci_endpoint_test_irqhandler,
   IRQF_SHARED, DRV_MODULE_NAME, test);
if (err) {
-   dev_err(dev, "failed to request IRQ %d\n", pdev->irq);
+   dev_err(dev, "Failed to request IRQ %d\n", pdev->irq);
goto err_disable_msi;
}
 
@@ -585,7 +585,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM) {
base = pci_ioremap_bar(pdev, bar);
if (!base) {
-   dev_err(dev, "failed to read BAR%d\n", bar);
+   dev_err(dev, "Failed to read BAR%d\n", bar);
WARN_ON(bar == test_reg_bar);
}
test->bar[bar] = base;
@@ -605,7 +605,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
id = ida_simple_get(&pci_endpoint_test_ida, 0, 0, GFP_KERNEL);
if (id < 0) {
err = id;
-   dev_err(dev, "unable to get id\n");
+   dev_err(dev, "Unable to get id\n");
goto err_iounmap;
}
 
@@ -621,7 +621,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
 
err = misc_register(misc_device);
if (err) {
-   dev_err(dev, "failed to register device\n");
+   dev_err(dev, "Failed to register device\n");
goto err_kfree_name;
}
 
-- 
2.7.4




[RFC 08/10] PCI: endpoint: functions/pci-epf-test: Add MSI-X support

2018-04-10 Thread Gustavo Pimentel
Adds driver's MSI-X support.
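
The command word layout introduced by the defines in this patch can be
exercised with a small user-space sketch. The macro values are copied
from the diff below; encode_command() and decode_irq() are hypothetical
helper names used only for illustration, and BIT() is redefined locally
because the kernel macro is not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n)			(1u << (n))

#define COMMAND_RAISE_MSI_IRQ	BIT(1)
#define COMMAND_RAISE_MSIX_IRQ	BIT(2)
#define IRQ_TYPE_SHIFT		3
#define IRQ_TYPE_MASK		(0x3u << IRQ_TYPE_SHIFT)
#define IRQ_TYPE_LEGACY		0
#define IRQ_TYPE_MSI		1
#define IRQ_TYPE_MSIX		2
#define MSI_NUMBER_SHIFT	5
#define MSI_NUMBER_MASK		(0x3fu << MSI_NUMBER_SHIFT)
#define MSIX_NUMBER_MASK	(0xfffu << MSI_NUMBER_SHIFT)
#define COMMAND_READ		BIT(17)

/* Build a command word from an operation bit, an IRQ type and a vector. */
static uint32_t encode_command(uint32_t op, unsigned int irq_type,
			       unsigned int irq)
{
	uint32_t cmd = op | ((uint32_t)irq_type << IRQ_TYPE_SHIFT);

	if (irq_type == IRQ_TYPE_MSI)
		cmd |= ((uint32_t)irq << MSI_NUMBER_SHIFT) & MSI_NUMBER_MASK;
	else if (irq_type == IRQ_TYPE_MSIX)
		cmd |= ((uint32_t)irq << MSI_NUMBER_SHIFT) & MSIX_NUMBER_MASK;
	return cmd;
}

/* Extract type and vector; returns -1 for an unknown IRQ type. */
static int decode_irq(uint32_t cmd, unsigned int *irq_type, unsigned int *irq)
{
	assert(irq_type != NULL && irq != NULL);

	*irq_type = (cmd & IRQ_TYPE_MASK) >> IRQ_TYPE_SHIFT;
	switch (*irq_type) {
	case IRQ_TYPE_LEGACY:
		*irq = 0;
		return 0;
	case IRQ_TYPE_MSI:
		*irq = (cmd & MSI_NUMBER_MASK) >> MSI_NUMBER_SHIFT;
		return 0;
	case IRQ_TYPE_MSIX:
		*irq = (cmd & MSIX_NUMBER_MASK) >> MSI_NUMBER_SHIFT;
		return 0;
	default:
		return -1;
	}
}
```

The 12-bit vector field occupies bits 5 to 16, which is why
COMMAND_READ, COMMAND_WRITE and COMMAND_COPY move up to bits 17 to 19;
vector number 2048 sets bit 16 and does not collide with them.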

Signed-off-by: Gustavo Pimentel 
---
 drivers/pci/endpoint/functions/pci-epf-test.c | 87 +--
 1 file changed, 69 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c
index 63dca44..5997c6e 100644
--- a/drivers/pci/endpoint/functions/pci-epf-test.c
+++ b/drivers/pci/endpoint/functions/pci-epf-test.c
@@ -20,11 +20,18 @@
 
 #define COMMAND_RAISE_LEGACY_IRQ   BIT(0)
 #define COMMAND_RAISE_MSI_IRQ  BIT(1)
-#define MSI_NUMBER_SHIFT   2
+#define COMMAND_RAISE_MSIX_IRQ BIT(2)
+#define IRQ_TYPE_SHIFT 3
+#define MSI_NUMBER_SHIFT   5
+#define IRQ_TYPE_MASK  (0x3 << IRQ_TYPE_SHIFT)
+#define IRQ_TYPE_LEGACY0
+#define IRQ_TYPE_MSI   1
+#define IRQ_TYPE_MSIX  2
 #define MSI_NUMBER_MASK(0x3f << MSI_NUMBER_SHIFT)
-#define COMMAND_READ   BIT(8)
-#define COMMAND_WRITE  BIT(9)
-#define COMMAND_COPY   BIT(10)
+#define MSIX_NUMBER_MASK   (0xfff << MSI_NUMBER_SHIFT)
+#define COMMAND_READ   BIT(17)
+#define COMMAND_WRITE  BIT(18)
+#define COMMAND_COPY   BIT(19)
 
 #define STATUS_READ_SUCCESSBIT(0)
 #define STATUS_READ_FAIL   BIT(1)
@@ -244,31 +251,44 @@ static int pci_epf_test_write(struct pci_epf_test *epf_test)
return ret;
 }
 
-static void pci_epf_test_raise_irq(struct pci_epf_test *epf_test, u8 irq)
+static void pci_epf_test_raise_irq(struct pci_epf_test *epf_test, u8 irq_type,
+  u16 irq)
 {
-   u8 msi_count;
struct pci_epf *epf = epf_test->epf;
+   struct device *dev = &epf->dev;
struct pci_epc *epc = epf->epc;
enum pci_barno test_reg_bar = epf_test->test_reg_bar;
struct pci_epf_test_reg *reg = epf_test->reg[test_reg_bar];
 
reg->status |= STATUS_IRQ_RAISED;
-   msi_count = pci_epc_get_msi(epc, epf->func_no);
-   if (irq > msi_count || msi_count <= 0)
-   pci_epc_raise_irq(epc, epf->func_no, PCI_EPC_IRQ_LEGACY, 0);
-   else
+
+   switch (irq_type) {
+   case IRQ_TYPE_LEGACY:
+   pci_epc_raise_irq(epc, epf->func_no, PCI_EPC_IRQ_LEGACY, irq);
+   break;
+   case IRQ_TYPE_MSI:
pci_epc_raise_irq(epc, epf->func_no, PCI_EPC_IRQ_MSI, irq);
+   break;
+   case IRQ_TYPE_MSIX:
+   pci_epc_raise_irq(epc, epf->func_no, PCI_EPC_IRQ_MSIX, irq);
+   break;
+   default:
+   dev_err(dev, "Failed to raise IRQ, unknown type\n");
+   break;
+   }
 }
 
 static void pci_epf_test_cmd_handler(struct work_struct *work)
 {
int ret;
-   u8 irq;
-   u8 msi_count;
+   u16 irq;
+   u8 irq_type;
+   u16 msi_count;
u32 command;
struct pci_epf_test *epf_test = container_of(work, struct pci_epf_test,
 cmd_handler.work);
struct pci_epf *epf = epf_test->epf;
+   struct device *dev = &epf->dev;
struct pci_epc *epc = epf->epc;
enum pci_barno test_reg_bar = epf_test->test_reg_bar;
struct pci_epf_test_reg *reg = epf_test->reg[test_reg_bar];
@@ -280,11 +300,25 @@ static void pci_epf_test_cmd_handler(struct work_struct *work)
reg->command = 0;
reg->status = 0;
 
-   irq = (command & MSI_NUMBER_MASK) >> MSI_NUMBER_SHIFT;
+   irq_type = (command & IRQ_TYPE_MASK) >> IRQ_TYPE_SHIFT;
+   switch (irq_type) {
+   case IRQ_TYPE_LEGACY:
+   irq = 0;
+   break;
+   case IRQ_TYPE_MSI:
+   irq = (command & MSI_NUMBER_MASK) >> MSI_NUMBER_SHIFT;
+   break;
+   case IRQ_TYPE_MSIX:
+   irq = (command & MSIX_NUMBER_MASK) >> MSI_NUMBER_SHIFT;
+   break;
+   default:
+   dev_err(dev, "Failed to detect IRQ type\n");
+   goto reset_handler;
+   }
 
if (command & COMMAND_RAISE_LEGACY_IRQ) {
reg->status = STATUS_IRQ_RAISED;
-   pci_epc_raise_irq(epc, epf->func_no, PCI_EPC_IRQ_LEGACY, 0);
+   pci_epc_raise_irq(epc, epf->func_no, PCI_EPC_IRQ_LEGACY, irq);
goto reset_handler;
}
 
@@ -294,7 +328,7 @@ static void pci_epf_test_cmd_handler(struct work_struct *work)
reg->status |= STATUS_WRITE_FAIL;
else
reg->status |= STATUS_WRITE_SUCCESS;
-   pci_epf_test_raise_irq(epf_test, irq);
+   pci_epf_test_raise_irq(epf_test, irq_type, irq);
goto reset_handler;
}
 
@@ -304,7 +338,7 @@ static void pci_epf_test_cmd_handler(struct work_struct *work)

[RFC 09/10] PCI: endpoint: functions/pci-epf-test: Replace lower into upper case characters

2018-04-10 Thread Gustavo Pimentel
Replaces lower case with upper case initial characters in comments and debug printks.

Signed-off-by: Gustavo Pimentel 
---
 drivers/pci/endpoint/functions/pci-epf-test.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c
index 5997c6e..e3d4af0 100644
--- a/drivers/pci/endpoint/functions/pci-epf-test.c
+++ b/drivers/pci/endpoint/functions/pci-epf-test.c
@@ -94,7 +94,7 @@ static int pci_epf_test_copy(struct pci_epf_test *epf_test)
 
src_addr = pci_epc_mem_alloc_addr(epc, &src_phys_addr, reg->size);
if (!src_addr) {
-   dev_err(dev, "failed to allocate source address\n");
+   dev_err(dev, "Failed to allocate source address\n");
reg->status = STATUS_SRC_ADDR_INVALID;
ret = -ENOMEM;
goto err;
@@ -103,14 +103,14 @@ static int pci_epf_test_copy(struct pci_epf_test *epf_test)
ret = pci_epc_map_addr(epc, epf->func_no, src_phys_addr, reg->src_addr,
   reg->size);
if (ret) {
-   dev_err(dev, "failed to map source address\n");
+   dev_err(dev, "Failed to map source address\n");
reg->status = STATUS_SRC_ADDR_INVALID;
goto err_src_addr;
}
 
dst_addr = pci_epc_mem_alloc_addr(epc, &dst_phys_addr, reg->size);
if (!dst_addr) {
-   dev_err(dev, "failed to allocate destination address\n");
+   dev_err(dev, "Failed to allocate destination address\n");
reg->status = STATUS_DST_ADDR_INVALID;
ret = -ENOMEM;
goto err_src_map_addr;
@@ -119,7 +119,7 @@ static int pci_epf_test_copy(struct pci_epf_test *epf_test)
ret = pci_epc_map_addr(epc, epf->func_no, dst_phys_addr, reg->dst_addr,
   reg->size);
if (ret) {
-   dev_err(dev, "failed to map destination address\n");
+   dev_err(dev, "Failed to map destination address\n");
reg->status = STATUS_DST_ADDR_INVALID;
goto err_dst_addr;
}
@@ -156,7 +156,7 @@ static int pci_epf_test_read(struct pci_epf_test *epf_test)
 
src_addr = pci_epc_mem_alloc_addr(epc, &phys_addr, reg->size);
if (!src_addr) {
-   dev_err(dev, "failed to allocate address\n");
+   dev_err(dev, "Failed to allocate address\n");
reg->status = STATUS_SRC_ADDR_INVALID;
ret = -ENOMEM;
goto err;
@@ -165,7 +165,7 @@ static int pci_epf_test_read(struct pci_epf_test *epf_test)
ret = pci_epc_map_addr(epc, epf->func_no, phys_addr, reg->src_addr,
   reg->size);
if (ret) {
-   dev_err(dev, "failed to map address\n");
+   dev_err(dev, "Failed to map address\n");
reg->status = STATUS_SRC_ADDR_INVALID;
goto err_addr;
}
@@ -208,7 +208,7 @@ static int pci_epf_test_write(struct pci_epf_test *epf_test)
 
dst_addr = pci_epc_mem_alloc_addr(epc, &phys_addr, reg->size);
if (!dst_addr) {
-   dev_err(dev, "failed to allocate address\n");
+   dev_err(dev, "Failed to allocate address\n");
reg->status = STATUS_DST_ADDR_INVALID;
ret = -ENOMEM;
goto err;
@@ -217,7 +217,7 @@ static int pci_epf_test_write(struct pci_epf_test *epf_test)
ret = pci_epc_map_addr(epc, epf->func_no, phys_addr, reg->dst_addr,
   reg->size);
if (ret) {
-   dev_err(dev, "failed to map address\n");
+   dev_err(dev, "Failed to map address\n");
reg->status = STATUS_DST_ADDR_INVALID;
goto err_addr;
}
@@ -422,7 +422,7 @@ static int pci_epf_test_set_bar(struct pci_epf *epf)
ret = pci_epc_set_bar(epc, epf->func_no, epf_bar);
if (ret) {
pci_epf_free_space(epf, epf_test->reg[bar], bar);
-   dev_err(dev, "failed to set BAR%d\n", bar);
+   dev_err(dev, "Failed to set BAR%d\n", bar);
if (bar == test_reg_bar)
return ret;
}
@@ -449,7 +449,7 @@ static int pci_epf_test_alloc_space(struct pci_epf *epf)
base = pci_epf_alloc_space(epf, sizeof(struct pci_epf_test_reg),
   test_reg_bar);
if (!base) {
-   dev_err(dev, "failed to allocated register space\n");
+   dev_err(dev, "Failed to allocated register space\n");
return -ENOMEM;
}
epf_test->reg[test_reg_bar] = base;
@@ -459,7 +459,7 @@ static int pci_epf_test_alloc_space(struct pci_epf *epf)
continue;
base = pci_epf_alloc_space(epf, 

[RFC 06/10] misc: pci_endpoint_test: Add MSI-X support

2018-04-10 Thread Gustavo Pimentel
Adds the MSI-X support and updates driver documentation accordingly.

Replaces the no_msi module parameter with irq_type so that the interrupt
type can be selected.

Signed-off-by: Gustavo Pimentel 
---
 Documentation/misc-devices/pci-endpoint-test.txt |   3 +
 drivers/misc/pci_endpoint_test.c | 102 +--
 2 files changed, 79 insertions(+), 26 deletions(-)

diff --git a/Documentation/misc-devices/pci-endpoint-test.txt b/Documentation/misc-devices/pci-endpoint-test.txt
index 4ebc359..fdfa0f6 100644
--- a/Documentation/misc-devices/pci-endpoint-test.txt
+++ b/Documentation/misc-devices/pci-endpoint-test.txt
@@ -10,6 +10,7 @@ The PCI driver for the test device performs the following tests
*) verifying addresses programmed in BAR
*) raise legacy IRQ
*) raise MSI IRQ
+   *) raise MSI-X IRQ
*) read data
*) write data
*) copy data
@@ -25,6 +26,8 @@ ioctl
  PCITEST_LEGACY_IRQ: Tests legacy IRQ
  PCITEST_MSI: Tests message signalled interrupts. The MSI number
  to be tested should be passed as argument.
+ PCITEST_MSIX: Tests message signalled interrupts. The MSI-X number
+ to be tested should be passed as argument.
  PCITEST_WRITE: Perform write tests. The size of the buffer should be passed
as argument.
  PCITEST_READ: Perform read tests. The size of the buffer should be passed
diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index 37db0fc..a7d9354 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -42,11 +42,16 @@
 #define PCI_ENDPOINT_TEST_COMMAND  0x4
 #define COMMAND_RAISE_LEGACY_IRQ   BIT(0)
 #define COMMAND_RAISE_MSI_IRQ  BIT(1)
-#define MSI_NUMBER_SHIFT   2
-/* 6 bits for MSI number */
-#define COMMAND_READBIT(8)
-#define COMMAND_WRITE   BIT(9)
-#define COMMAND_COPYBIT(10)
+#define COMMAND_RAISE_MSIX_IRQ BIT(2)
+#define IRQ_TYPE_SHIFT 3
+#define IRQ_TYPE_LEGACY0
+#define IRQ_TYPE_MSI   1
+#define IRQ_TYPE_MSIX  2
+#define MSI_NUMBER_SHIFT   5
+/* 12 bits for MSI number */
+#define COMMAND_READBIT(17)
+#define COMMAND_WRITE   BIT(18)
+#define COMMAND_COPYBIT(19)
 
 #define PCI_ENDPOINT_TEST_STATUS   0x8
 #define STATUS_READ_SUCCESS BIT(0)
@@ -73,9 +78,9 @@ static DEFINE_IDA(pci_endpoint_test_ida);
 #define to_endpoint_test(priv) container_of((priv), struct pci_endpoint_test, \
miscdev)
 
-static bool no_msi;
-module_param(no_msi, bool, 0444);
-MODULE_PARM_DESC(no_msi, "Disable MSI interrupt in pci_endpoint_test");
+static int irq_type = IRQ_TYPE_MSIX;
+module_param(irq_type, int, 0444);
+MODULE_PARM_DESC(irq_type, "IRQ mode selection in pci_endpoint_test (0 - Legacy, 1 - MSI, 2 - MSI-X)");
 
 enum pci_barno {
BAR_0,
@@ -103,7 +108,7 @@ struct pci_endpoint_test {
 struct pci_endpoint_test_data {
enum pci_barno test_reg_bar;
size_t alignment;
-   bool no_msi;
+   int irq_type;
 };
 
 static inline u32 pci_endpoint_test_readl(struct pci_endpoint_test *test,
@@ -177,10 +182,10 @@ static bool pci_endpoint_test_bar(struct pci_endpoint_test *test,
 
 static bool pci_endpoint_test_legacy_irq(struct pci_endpoint_test *test)
 {
-   u32 val;
+   u32 val = COMMAND_RAISE_LEGACY_IRQ;
 
-   pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND,
-COMMAND_RAISE_LEGACY_IRQ);
+   val |= (IRQ_TYPE_LEGACY << IRQ_TYPE_SHIFT);
+   pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND, val);
val = wait_for_completion_timeout(&test->irq_raised,
  msecs_to_jiffies(1000));
if (!val)
@@ -192,12 +197,12 @@ static bool pci_endpoint_test_legacy_irq(struct pci_endpoint_test *test)
 static bool pci_endpoint_test_msi_irq(struct pci_endpoint_test *test,
  u8 msi_num)
 {
-   u32 val;
+   u32 val = COMMAND_RAISE_MSI_IRQ;
struct pci_dev *pdev = test->pdev;
 
-   pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND,
-msi_num << MSI_NUMBER_SHIFT |
-COMMAND_RAISE_MSI_IRQ);
+   val |= (msi_num << MSI_NUMBER_SHIFT);
+   val |= (IRQ_TYPE_MSI << IRQ_TYPE_SHIFT);
+   pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND, val);
val = wait_for_completion_timeout(&test->irq_raised,
  msecs_to_jiffies(1000));
if (!val)
@@ -209,6 +214,26 @@ static bool pci_endpoint_test_msi_irq(struct pci_endpoint_test *test,
return false;
 }
 
+static bool pci_endpoint_test_msix_irq(struct pci_endpoint_test *test,
+ 

[RFC 10/10] tools: PCI: Add MSI-X support

2018-04-10 Thread Gustavo Pimentel
Adds MSI-X support to the pcitest tool and modifies the pcitest.sh script
to accommodate this new type of interrupt test.
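
The diff below validates the new -x argument with atoi() followed by a
range check. As a possible alternative, a stricter parser could reject
trailing garbage as well. The sketch below is only a suggestion under
assumptions: parse_vector() is a hypothetical helper, not part of
pcitest, built on the standard strtol().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal vector number in the range [1, max].
 * Returns the number, or -1 if arg is empty, not fully numeric,
 * or out of range.
 */
static int parse_vector(const char *arg, long max)
{
	char *end;
	long val;

	assert(arg != NULL && max >= 1 && max <= INT_MAX);

	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno || end == arg || *end != '\0')
		return -1;
	if (val < 1 || val > max)
		return -1;
	return (int)val;
}
```

With this helper, the MSI case would call parse_vector(optarg, 32) and
the MSI-X case parse_vector(optarg, 2048), jumping to the usage text on
a negative result.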

Signed-off-by: Gustavo Pimentel 
---
 include/uapi/linux/pcitest.h |  1 +
 tools/pci/pcitest.c  | 18 +-
 tools/pci/pcitest.sh | 25 +
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pcitest.h b/include/uapi/linux/pcitest.h
index 953cf03..d746fb1 100644
--- a/include/uapi/linux/pcitest.h
+++ b/include/uapi/linux/pcitest.h
@@ -16,5 +16,6 @@
 #define PCITEST_WRITE  _IOW('P', 0x4, unsigned long)
 #define PCITEST_READ   _IOW('P', 0x5, unsigned long)
 #define PCITEST_COPY   _IOW('P', 0x6, unsigned long)
+#define PCITEST_MSIX   _IOW('P', 0x7, int)
 
 #endif /* __UAPI_LINUX_PCITEST_H */
diff --git a/tools/pci/pcitest.c b/tools/pci/pcitest.c
index 9074b47..9d145a3 100644
--- a/tools/pci/pcitest.c
+++ b/tools/pci/pcitest.c
@@ -37,6 +37,7 @@ struct pci_test {
charbarnum;
boollegacyirq;
unsigned intmsinum;
+   unsigned intmsixnum;
boolread;
boolwrite;
boolcopy;
@@ -83,6 +84,15 @@ static int run_test(struct pci_test *test)
fprintf(stdout, "%s\n", result[ret]);
}
 
+   if (test->msixnum > 0 && test->msixnum <= 2048) {
+   ret = ioctl(fd, PCITEST_MSIX, test->msixnum);
+   fprintf(stdout, "MSI-X%d:\t\t", test->msixnum);
+   if (ret < 0)
+   fprintf(stdout, "TEST FAILED\n");
+   else
+   fprintf(stdout, "%s\n", result[ret]);
+   }
+
if (test->write) {
ret = ioctl(fd, PCITEST_WRITE, test->size);
fprintf(stdout, "WRITE (%7ld bytes):\t\t", test->size);
@@ -133,7 +143,7 @@ int main(int argc, char **argv)
/* set default endpoint device */
test->device = "/dev/pci-endpoint-test.0";
 
-   while ((c = getopt(argc, argv, "D:b:m:lrwcs:")) != EOF)
+   while ((c = getopt(argc, argv, "D:b:m:x:lrwcs:")) != EOF)
switch (c) {
case 'D':
test->device = optarg;
@@ -151,6 +161,11 @@ int main(int argc, char **argv)
if (test->msinum < 1 || test->msinum > 32)
goto usage;
continue;
+   case 'x':
+   test->msixnum = atoi(optarg);
+   if (test->msixnum < 1 || test->msixnum > 2048)
+   goto usage;
+   continue;
case 'r':
test->read = true;
continue;
@@ -173,6 +188,7 @@ int main(int argc, char **argv)
"\t-D  PCI endpoint test device {default: /dev/pci-endpoint-test.0}\n"
"\t-b  BAR test (bar number between 0..5)\n"
"\t-m  MSI test (msi number between 1..32)\n"
+   "\t-x MSI-X test (msix number between 1..2048)\n"
"\t-l   Legacy IRQ test\n"
"\t-r   Read buffer test\n"
"\t-w   Write buffer test\n"
diff --git a/tools/pci/pcitest.sh b/tools/pci/pcitest.sh
index 77e8c85..86709a2 100644
--- a/tools/pci/pcitest.sh
+++ b/tools/pci/pcitest.sh
@@ -4,6 +4,8 @@
 echo "BAR tests"
 echo
 
+modprobe pci_endpoint_test
+sleep 2
 bar=0
 
 while [ $bar -lt 6 ]
@@ -16,7 +18,14 @@ echo
 echo "Interrupt tests"
 echo
 
+rmmod pci_endpoint_test
+sleep 2
+modprobe pci_endpoint_test irq_type=0
 pcitest -l
+
+rmmod pci_endpoint_test
+sleep 2
+modprobe pci_endpoint_test irq_type=1
 msi=1
 
 while [ $msi -lt 33 ]
@@ -26,9 +35,25 @@ do
 done
 echo
 
+rmmod pci_endpoint_test
+sleep 2
+modprobe pci_endpoint_test irq_type=2
+msix=1
+
+while [ $msix -lt 2049 ]
+do
+pcitest -x $msix
+msix=`expr $msix + 1`
+done
+echo
+
 echo "Read Tests"
 echo
 
+rmmod pci_endpoint_test
+sleep 2
+modprobe pci_endpoint_test irq_type=1
+
 pcitest -r -s 1
 pcitest -r -s 1024
 pcitest -r -s 1025
-- 
2.7.4
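
Editorial note on the option parsing above: the patch reads the -x argument with atoi(), which silently accepts trailing garbage (for example "12abc" parses as 12). A stricter variant could look like the sketch below. It is only an illustration under that assumption; parse_msix_num() is a hypothetical helper and is not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not in the patch): parse an MSI-X vector number
 * in the range 1..2048. Returns the number, or -1 on any parse error.
 */
static int parse_msix_num(const char *arg)
{
	char *end;
	long val;

	assert(arg != NULL);

	errno = 0;
	val = strtol(arg, &end, 10);
	/* Reject overflow, empty input and trailing non-digit characters. */
	if (errno || end == arg || *end != '\0')
		return -1;
	/* Same bounds the patch checks after atoi(). */
	if (val < 1 || val > 2048)
		return -1;
	return (int)val;
}
```

In the getopt() loop, the 'x' case would then jump to the usage label whenever the helper returns -1.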


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 02/10] PCI: cadence: Update cdns_pcie_ep_raise_irq function signature

2018-04-10 Thread Gustavo Pimentel
Change the cdns_pcie_ep_raise_irq function signature: the type of the
interrupt_num parameter goes from u8 to u16 to accommodate the MSI-X
maximum of 2048 interrupts.

Signed-off-by: Gustavo Pimentel 
---
 drivers/pci/cadence/pcie-cadence-ep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/cadence/pcie-cadence-ep.c b/drivers/pci/cadence/pcie-cadence-ep.c
index 3d8283e..6d6322c 100644
--- a/drivers/pci/cadence/pcie-cadence-ep.c
+++ b/drivers/pci/cadence/pcie-cadence-ep.c
@@ -363,7 +363,7 @@ static int cdns_pcie_ep_send_msi_irq(struct cdns_pcie_ep *ep, u8 fn,
 }
 
 static int cdns_pcie_ep_raise_irq(struct pci_epc *epc, u8 fn,
- enum pci_epc_irq_type type, u8 interrupt_num)
+ enum pci_epc_irq_type type, u16 interrupt_num)
 {
struct cdns_pcie_ep *ep = epc_get_drvdata(epc);
 
-- 
2.7.4
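
Editorial note on the type change above: a minimal sketch, not taken from the patch, of why an 8-bit parameter cannot carry MSI-X interrupt numbers up to 2048. The helper names are hypothetical, and the userspace types uint8_t and uint16_t stand in for the kernel's u8 and u16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the old u8 parameter: keeps the low 8 bits. */
static unsigned int as_u8(unsigned int interrupt_num)
{
	return (uint8_t)interrupt_num;
}

/* Hypothetical stand-in for the new u16 parameter: keeps the low 16 bits. */
static unsigned int as_u16(unsigned int interrupt_num)
{
	return (uint16_t)interrupt_num;
}

/* Documents the values a caller would observe. */
static inline void truncation_self_check(void)
{
	assert(as_u8(2048) == 0);	/* 2048 mod 256 */
	assert(as_u8(300) == 44);	/* 300 mod 256 */
	assert(as_u16(2048) == 2048);	/* fits in 16 bits */
}
```

With the 8-bit type, any interrupt number above 255 would wrap silently, which is presumably what the signature change is meant to avoid.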




[RFC 04/10] PCI: dwc: MSI callbacks handler rework

2018-04-10 Thread Gustavo Pimentel
Add validation to the pci_epc_set_msi function that the requested number
of interrupts does not exceed the maximum of 32.

Remove the duplicate defines from pcie-designware.h and use the defines
available in include/uapi/linux/pci_regs.h instead.

Signed-off-by: Gustavo Pimentel 
---
 drivers/pci/dwc/pcie-designware-ep.c | 46 +++-
 drivers/pci/dwc/pcie-designware.h| 11 -
 drivers/pci/endpoint/pci-epc-core.c  |  3 ++-
 3 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c
index 874d4c2..e352786 100644
--- a/drivers/pci/dwc/pcie-designware-ep.c
+++ b/drivers/pci/dwc/pcie-designware-ep.c
@@ -251,29 +251,38 @@ static int dw_pcie_ep_map_addr(struct pci_epc *epc, u8 func_no,
 
 static int dw_pcie_ep_get_msi(struct pci_epc *epc, u8 func_no)
 {
-   int val;
struct dw_pcie_ep *ep = epc_get_drvdata(epc);
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   u32 val, reg;
+
+   if (ep->cap_addr.msi_addr == 0)
+   return 0;
 
-   val = dw_pcie_readw_dbi(pci, MSI_MESSAGE_CONTROL);
-   if (!(val & MSI_CAP_MSI_EN_MASK))
+   reg = ep->cap_addr.msi_addr + PCI_MSI_FLAGS;
+   val = dw_pcie_readw_dbi(pci, reg);
+   if (!(val & PCI_MSI_FLAGS_ENABLE))
return -EINVAL;
 
-   val = (val & MSI_CAP_MME_MASK) >> MSI_CAP_MME_SHIFT;
+   val = (val & PCI_MSI_FLAGS_QSIZE) >> 4;
+
return val;
 }
 
-static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 func_no, u8 encode_int)
+static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 func_no, u8 interrupts)
 {
-   int val;
struct dw_pcie_ep *ep = epc_get_drvdata(epc);
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   u32 val, reg;
+
+   if (ep->cap_addr.msi_addr == 0)
+   return 0;
 
-   val = dw_pcie_readw_dbi(pci, MSI_MESSAGE_CONTROL);
-   val &= ~MSI_CAP_MMC_MASK;
-   val |= (encode_int << MSI_CAP_MMC_SHIFT) & MSI_CAP_MMC_MASK;
+   reg = ep->cap_addr.msi_addr + PCI_MSI_FLAGS;
+   val = dw_pcie_readw_dbi(pci, reg);
+   val &= ~PCI_MSI_FLAGS_QMASK;
+   val |= (interrupts << 1) & PCI_MSI_FLAGS_QMASK;
dw_pcie_dbi_ro_wr_en(pci);
-   dw_pcie_writew_dbi(pci, MSI_MESSAGE_CONTROL, val);
+   dw_pcie_writew_dbi(pci, reg, val);
dw_pcie_dbi_ro_wr_dis(pci);
 
return 0;
@@ -372,21 +381,26 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
struct pci_epc *epc = ep->epc;
u16 msg_ctrl, msg_data;
-   u32 msg_addr_lower, msg_addr_upper;
+   u32 msg_addr_lower, msg_addr_upper, reg;
u64 msg_addr;
bool has_upper;
int ret;
 
/* Raise MSI per the PCI Local Bus Specification Revision 3.0, 6.8.1. */
-   msg_ctrl = dw_pcie_readw_dbi(pci, MSI_MESSAGE_CONTROL);
+   reg = ep->cap_addr.msi_addr + PCI_MSI_FLAGS;
+   msg_ctrl = dw_pcie_readw_dbi(pci, reg);
has_upper = !!(msg_ctrl & PCI_MSI_FLAGS_64BIT);
-   msg_addr_lower = dw_pcie_readl_dbi(pci, MSI_MESSAGE_ADDR_L32);
+   reg = ep->cap_addr.msi_addr + PCI_MSI_ADDRESS_LO;
+   msg_addr_lower = dw_pcie_readl_dbi(pci, reg);
if (has_upper) {
-   msg_addr_upper = dw_pcie_readl_dbi(pci, MSI_MESSAGE_ADDR_U32);
-   msg_data = dw_pcie_readw_dbi(pci, MSI_MESSAGE_DATA_64);
+   reg = ep->cap_addr.msi_addr + PCI_MSI_ADDRESS_HI;
+   msg_addr_upper = dw_pcie_readl_dbi(pci, reg);
+   reg = ep->cap_addr.msi_addr + PCI_MSI_DATA_64;
+   msg_data = dw_pcie_readw_dbi(pci, reg);
} else {
msg_addr_upper = 0;
-   msg_data = dw_pcie_readw_dbi(pci, MSI_MESSAGE_DATA_32);
+   reg = ep->cap_addr.msi_addr + PCI_MSI_DATA_32;
+   msg_data = dw_pcie_readw_dbi(pci, reg);
}
msg_addr = ((u64) msg_addr_upper) << 32 | msg_addr_lower;
ret = dw_pcie_ep_map_addr(epc, func_no, ep->msi_mem_phys, msg_addr,
diff --git a/drivers/pci/dwc/pcie-designware.h b/drivers/pci/dwc/pcie-designware.h
index 456fd94..2acf18b0 100644
--- a/drivers/pci/dwc/pcie-designware.h
+++ b/drivers/pci/dwc/pcie-designware.h
@@ -96,17 +96,6 @@
 #define PCIE_GET_ATU_INB_UNR_REG_OFFSET(region) \
((0x3 << 20) | ((region) << 9) | (0x1 << 8))
 
-#define MSI_MESSAGE_CONTROL    0x52
-#define MSI_CAP_MMC_SHIFT  1
-#define MSI_CAP_MMC_MASK   (7 << MSI_CAP_MMC_SHIFT)
-#define MSI_CAP_MME_SHIFT  4
-#define MSI_CAP_MSI_EN_MASK    0x1
-#define MSI_CAP_MME_MASK   (7 << MSI_CAP_MME_SHIFT)
-#define MSI_MESSAGE_ADDR_L32   0x54
-#define MSI_MESSAGE_ADDR_U32   0x58
-#define MSI_MESSAGE_DATA_32    0x58
-#define MSI_MESSAGE_DATA_64    0x5C
-
 #define MAX_MSI_IRQS   256
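
Editorial note on the MSI rework above: the patch replaces the driver-local MSI_CAP_* masks with the generic PCI_MSI_FLAGS_* ones. The sketch below shows the same bit manipulation in isolation. The three mask values are the ones found in include/uapi/linux/pci_regs.h; the helper names are hypothetical, the sketch returns -1 where the driver returns -EINVAL, and it assumes the value passed in is already log2-encoded (0 for 1 vector up to 5 for 32 vectors).

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_MSI_FLAGS_ENABLE	0x0001	/* MSI feature enabled */
#define PCI_MSI_FLAGS_QMASK	0x000e	/* Maximum queue size available */
#define PCI_MSI_FLAGS_QSIZE	0x0070	/* Message queue size configured */

/*
 * Hypothetical helper: store an already log2-encoded vector count in the
 * Multiple Message Capable field (bits 3:1) of the Message Control word.
 */
static uint16_t msi_set_capable(uint16_t flags, uint8_t encoded)
{
	assert(encoded <= 7);	/* the field is 3 bits wide */

	flags = (uint16_t)(flags & ~PCI_MSI_FLAGS_QMASK);
	flags = (uint16_t)(flags | ((encoded << 1) & PCI_MSI_FLAGS_QMASK));
	return flags;
}

/*
 * Hypothetical helper: read the Multiple Message Enable field (bits 6:4),
 * or return -1 when MSI is not enabled.
 */
static int msi_get_enabled(uint16_t flags)
{
	if (!(flags & PCI_MSI_FLAGS_ENABLE))
		return -1;
	return (flags & PCI_MSI_FLAGS_QSIZE) >> 4;
}
```

Because the masks come from the shared header, the shifts by 1 and by 4 are the only numbers left for a driver to keep in sync with the field positions.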
 

Re: [PATCH] gpiolib: add hogs support for machine code

2018-04-10 Thread kbuild test robot
Hi Bartosz,

I love your patch! Yet something to improve:

[auto build test ERROR on gpio/for-next]
[also build test ERROR on v4.16 next-20180410]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Bartosz-Golaszewski/gpiolib-add-hogs-support-for-machine-code/20180410-232047
base:   https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git for-next
config: i386-randconfig-a0-201814 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from drivers//mfd/sm501.c:23:0:
>> include/linux/gpio/machine.h:56:19: error: field 'dflags' has incomplete type
 enum gpiod_flags dflags;
  ^

vim +/dflags +56 include/linux/gpio/machine.h

41  
42  /**
43   * struct gpiod_hog - GPIO line hog table
44   * @chip_label: name of the chip the GPIO belongs to
45   * @chip_hwnum: hardware number (i.e. relative to the chip) of the GPIO
46   * @line_name: consumer name for the hogged line
47   * @lflags: mask of GPIO lookup flags
48   * @dflags: GPIO flags used to specify the direction and value
49   */
50  struct gpiod_hog {
51  struct list_head list;
52  const char *chip_label;
53  u16 chip_hwnum;
54  const char *line_name;
55  enum gpio_lookup_flags lflags;
  > 56  enum gpiod_flags dflags;
57  };
58  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)

2018-04-10 Thread Yonghong Song



On 4/10/18 7:41 AM, Quentin Monnet wrote:

Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helpers from Lawrence:
- bpf_setsockopt()
- bpf_getsockopt()
- bpf_sock_ops_cb_flags_set()

Helpers from Yonghong:
- bpf_perf_event_read_value()
- bpf_perf_prog_read_value()

Helper from Josef:
- bpf_override_return()

Helper from Andrey:
- bpf_bind()

Cc: Lawrence Brakmo 
Cc: Yonghong Song 
Cc: Josef Bacik 
Cc: Andrey Ignatov 
Signed-off-by: Quentin Monnet 
---
  include/uapi/linux/bpf.h | 184 +++
  1 file changed, 184 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 15d9ccafebbe..7343af4196c8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1208,6 +1208,28 @@ union bpf_attr {
   *Return
   *0
   *
+ * int bpf_setsockopt(struct bpf_sock_ops_kern *bpf_socket, int level, int optname, char *optval, int optlen)
+ * Description
+ * Emulate a call to **setsockopt()** on the socket associated to
+ * *bpf_socket*, which must be a full socket. The *level* at
+ * which the option resides and the name *optname* of the option
+ * must be specified, see **setsockopt(2)** for more information.
+ * The option value of length *optlen* is pointed by *optval*.
+ *
+ * This helper actually implements a subset of **setsockopt()**.
+ * It supports the following *level*\ s:
+ *
+ * * **SOL_SOCKET**, which supports the following *optname*\ s:
+ *   **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
+ *   **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**.
+ * * **IPPROTO_TCP**, which supports the following *optname*\ s:
+ *   **TCP_CONGESTION**, **TCP_BPF_IW**,
+ *   **TCP_BPF_SNDCWND_CLAMP**.
+ * * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
+ * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
   * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 
flags)
   *Description
   *Grow or shrink the room for data in the packet associated to
@@ -1255,6 +1277,168 @@ union bpf_attr {
   *performed again.
   *Return
   *0 on success, or a negative error in case of failure.
+ *
+ * int bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct bpf_perf_event_value *buf, u32 buf_size)
+ * Description
+ * Read the value of a perf event counter, and store it into *buf*
+ * of size *buf_size*. This helper relies on a *map* of type
+ * **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of the perf
+ * event counter is selected at the creation of the *map*. The


The nature of the perf event counter is selected when *map* is updated 
with perf_event fd's.



+ * *map* is an array whose size is the number of available CPU
+ * cores, and each cell contains a value relative to one core. The


It is confusing to mix core/cpu here. Maybe just use perf_event 
convention, always using cpu?



+ * value to retrieve is indicated by *flags*, that contains the
+ * index of the core to look up, masked with
+ * **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * **BPF_F_CURRENT_CPU** to indicate that the value for the
+ * current CPU core should be retrieved.
+ *
+ * This helper behaves in a way close to
+ * **bpf_perf_event_read**\ () helper, save that instead of
+ * just returning the value observed, it fills the *buf*
+ * structure. This allows for additional data to be retrieved: in
+ * particular, the enabled and running times (in *buf*\
+ * **->enabled** and *buf*\ **->running**, respectively) are
+ * copied.
+ *
+ * These values are interesting, because hardware PMU (Performance
+ * Monitoring Unit) counters are limited resources. When there are
+ * more PMU based perf events opened than available counters,
+ * kernel will multiplex these events so each event gets certain
+ * percentage (but not all) of the PMU time. In case that
+ * multiplexing happens, the number of samples or counter value
+ *

Re: [PATCH v2 2/2] mm: remove odd HAVE_PTE_SPECIAL

2018-04-10 Thread Laurent Dufour


On 10/04/2018 17:58, Robin Murphy wrote:
> On 10/04/18 16:25, Laurent Dufour wrote:
>> Remove the additional define HAVE_PTE_SPECIAL and rely directly on
>> CONFIG_ARCH_HAS_PTE_SPECIAL.
>>
>> There is no functional change introduced by this patch
>>
>> Signed-off-by: Laurent Dufour 
>> ---
>>   mm/memory.c | 23 ++-
>>   1 file changed, 10 insertions(+), 13 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 96910c625daa..53b6344a90d2 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -817,19 +817,13 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>>    * PFNMAP mappings in order to support COWable mappings.
>>    *
>>    */
>> -#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
>> -# define HAVE_PTE_SPECIAL 1
>> -#else
>> -# define HAVE_PTE_SPECIAL 0
>> -#endif
>>   struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long 
>> addr,
>>    pte_t pte, bool with_public_device)
>>   {
>>   unsigned long pfn = pte_pfn(pte);
>>   -    if (HAVE_PTE_SPECIAL) {
>> -    if (likely(!pte_special(pte)))
>> -    goto check_pfn;
>> +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
> 
> Nit: Couldn't you use IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) within the
> existing code structure to avoid having to add these #ifdefs?

I agree, that would be better. I didn't think about this option.
Thanks for reporting this.
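
Editorial note on the IS_ENABLED() suggestion: the sketch below is a simplified userspace imitation of the kernel macro, modeled on the preprocessor trick used in include/linux/kconfig.h. It only handles options defined as 1, the macro helper names and CONFIG_DEMO_* options are hypothetical, and classify_pte() merely mirrors the branch structure discussed in the thread.

```c
#include <assert.h>

/*
 * If the option is defined as 1, ARG_PLACEHOLDER_1 expands to "0," and
 * shifts a literal 1 into the second argument position. Otherwise the
 * pasted token is junk and the second argument is 0.
 */
#define ARG_PLACEHOLDER_1			0,
#define TAKE_SECOND_ARG(ignored, val, ...)	val
#define IS_DEFINED_2(arg1_or_junk)	TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_1(val)		IS_DEFINED_2(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_0(x)			IS_DEFINED_1(x)
#define IS_ENABLED(option)		IS_DEFINED_0(option)

/* Hypothetical options: one enabled, the other left undefined. */
#define CONFIG_DEMO_ON 1

/*
 * Both branches are always parsed and type-checked by the compiler; the
 * dead one can then be discarded because the condition is a constant.
 */
static int classify_pte(int pte_is_special)
{
	if (IS_ENABLED(CONFIG_DEMO_ON)) {
		if (!pte_is_special)
			return 1;	/* normal PTE: go check the pfn */
		return 2;		/* special PTE handling */
	}
	return 3;			/* path used without special PTEs */
}

/* Documents the values the macro evaluates to. */
static inline void is_enabled_self_check(void)
{
	assert(IS_ENABLED(CONFIG_DEMO_ON) == 1);
	assert(IS_ENABLED(CONFIG_DEMO_OFF) == 0);
}
```

The practical difference from #ifdef is that code in the disabled branch still has to compile, so it does not bit-rot unnoticed.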



Re: [PATCH v2 1/2] mm: introduce ARCH_HAS_PTE_SPECIAL

2018-04-10 Thread Laurent Dufour
On 10/04/2018 18:09, Matthew Wilcox wrote:
> On Tue, Apr 10, 2018 at 05:25:50PM +0200, Laurent Dufour wrote:
>>  arch/powerpc/include/asm/pte-common.h  | 3 ---
>>  arch/riscv/Kconfig | 1 +
>>  arch/s390/Kconfig  | 1 +
> 
> You forgot to delete __HAVE_ARCH_PTE_SPECIAL from
> arch/riscv/include/asm/pgtable-bits.h

Damned !
Thanks for catching it.



Re: [PATCH v2 1/2] mm: introduce ARCH_HAS_PTE_SPECIAL

2018-04-10 Thread Matthew Wilcox
On Tue, Apr 10, 2018 at 05:25:50PM +0200, Laurent Dufour wrote:
>  arch/powerpc/include/asm/pte-common.h  | 3 ---
>  arch/riscv/Kconfig | 1 +
>  arch/s390/Kconfig  | 1 +

You forgot to delete __HAVE_ARCH_PTE_SPECIAL from
arch/riscv/include/asm/pgtable-bits.h


Re: [PATCH v2 2/2] mm: remove odd HAVE_PTE_SPECIAL

2018-04-10 Thread Robin Murphy

On 10/04/18 16:25, Laurent Dufour wrote:

Remove the additional define HAVE_PTE_SPECIAL and rely directly on
CONFIG_ARCH_HAS_PTE_SPECIAL.

There is no functional change introduced by this patch

Signed-off-by: Laurent Dufour 
---
  mm/memory.c | 23 ++-
  1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 96910c625daa..53b6344a90d2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -817,19 +817,13 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
   * PFNMAP mappings in order to support COWable mappings.
   *
   */
-#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
-# define HAVE_PTE_SPECIAL 1
-#else
-# define HAVE_PTE_SPECIAL 0
-#endif
  struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 pte_t pte, bool with_public_device)
  {
unsigned long pfn = pte_pfn(pte);
  
-	if (HAVE_PTE_SPECIAL) {

-   if (likely(!pte_special(pte)))
-   goto check_pfn;
+#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL


Nit: Couldn't you use IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) within the 
existing code structure to avoid having to add these #ifdefs?


Robin.


+   if (unlikely(pte_special(pte))) {
if (vma->vm_ops && vma->vm_ops->find_special_page)
return vma->vm_ops->find_special_page(vma, addr);
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
@@ -862,7 +856,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, 
unsigned long addr,
return NULL;
}
  
-	/* !HAVE_PTE_SPECIAL case follows: */

+#else  /* CONFIG_ARCH_HAS_PTE_SPECIAL */
  
  	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {

if (vma->vm_flags & VM_MIXEDMAP) {
@@ -881,7 +875,8 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, 
unsigned long addr,
  
  	if (is_zero_pfn(pfn))

return NULL;
-check_pfn:
+#endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */
+
if (unlikely(pfn > highest_memmap_pfn)) {
print_bad_pte(vma, addr, pte, NULL);
return NULL;
@@ -891,7 +886,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 * NOTE! We still have PageReserved() pages in the page tables.
 * eg. VDSO mappings can cause them to exist.
 */
-out:
+out: __maybe_unused
return pfn_to_page(pfn);
  }
  
@@ -904,7 +899,7 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
/*
 * There is no pmd_special() but there may be special pmds, e.g.
 * in a direct-access (dax) mapping, so let's just replicate the
-* !HAVE_PTE_SPECIAL case from vm_normal_page() here.
+* !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here.
 */
if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
if (vma->vm_flags & VM_MIXEDMAP) {
@@ -1926,6 +1921,7 @@ static int __vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
  
  	track_pfn_insert(vma, &pgprot, pfn);
  
+#ifndef CONFIG_ARCH_HAS_PTE_SPECIAL
/*
 * If we don't have pte special, then we have to use the pfn_valid()
 * based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must*
@@ -1933,7 +1929,7 @@ static int __vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 * than insert_pfn).  If a zero_pfn were inserted into a VM_MIXEDMAP
 * without pte special, it would there be refcounted as a normal page.
 */
-   if (!HAVE_PTE_SPECIAL && !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) {
+   if (!pfn_t_devmap(pfn) && pfn_t_valid(pfn)) {
struct page *page;
  
  		/*

@@ -1944,6 +1940,7 @@ static int __vm_insert_mixed(struct vm_area_struct *vma, 
unsigned long addr,
page = pfn_to_page(pfn_t_to_pfn(pfn));
return insert_page(vma, addr, page, pgprot);
}
+#endif
return insert_pfn(vma, addr, pfn, pgprot, mkwrite);
  }
  




[PATCH v2 1/2] mm: introduce ARCH_HAS_PTE_SPECIAL

2018-04-10 Thread Laurent Dufour
Currently, PTE special support is turned on in per-architecture header
files. Most of the time, it is defined in arch/*/include/asm/pgtable.h,
sometimes depending on some other per-architecture static definition.

This patch introduces a new configuration variable to manage this directly
in the Kconfig files. It will later replace __HAVE_ARCH_PTE_SPECIAL.

Here are notes for some architectures where the definition of
__HAVE_ARCH_PTE_SPECIAL is not obvious:

arm
__HAVE_ARCH_PTE_SPECIAL is currently defined in
arch/arm/include/asm/pgtable-3level.h, which is included by
arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.

powerpc
__HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
 - arch/powerpc/include/asm/book3s/64/pgtable.h
 - arch/powerpc/include/asm/pte-common.h
The first one is included if (PPC_BOOK3S & PPC64) while the second is
included in all the other cases.
So select ARCH_HAS_PTE_SPECIAL all the time.

sparc:
__HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
defined(__arch64__) which are defined through the compiler in
sparc/Makefile if !SPARC32 which I assume to be if SPARC64.
So select ARCH_HAS_PTE_SPECIAL if SPARC64

There is no functional change introduced by this patch.

Suggested-by: Jerome Glisse 
Reviewed-by: Jerome Glisse 
Signed-off-by: Laurent Dufour 
---
 Documentation/features/vm/pte_special/arch-support.txt | 2 +-
 arch/arc/Kconfig   | 1 +
 arch/arc/include/asm/pgtable.h | 2 --
 arch/arm/Kconfig   | 1 +
 arch/arm/include/asm/pgtable-3level.h  | 1 -
 arch/arm64/Kconfig | 1 +
 arch/arm64/include/asm/pgtable.h   | 2 --
 arch/powerpc/Kconfig   | 1 +
 arch/powerpc/include/asm/book3s/64/pgtable.h   | 3 ---
 arch/powerpc/include/asm/pte-common.h  | 3 ---
 arch/riscv/Kconfig | 1 +
 arch/s390/Kconfig  | 1 +
 arch/s390/include/asm/pgtable.h| 1 -
 arch/sh/Kconfig| 1 +
 arch/sh/include/asm/pgtable.h  | 2 --
 arch/sparc/Kconfig | 1 +
 arch/sparc/include/asm/pgtable_64.h| 3 ---
 arch/x86/Kconfig   | 1 +
 arch/x86/include/asm/pgtable_types.h   | 1 -
 include/linux/pfn_t.h  | 4 ++--
 mm/Kconfig | 3 +++
 mm/gup.c   | 4 ++--
 mm/memory.c| 2 +-
 23 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/Documentation/features/vm/pte_special/arch-support.txt b/Documentation/features/vm/pte_special/arch-support.txt
index 055004f467d2..cd05924ea875 100644
--- a/Documentation/features/vm/pte_special/arch-support.txt
+++ b/Documentation/features/vm/pte_special/arch-support.txt
@@ -1,6 +1,6 @@
 #
 # Feature name:  pte_special
-# Kconfig:   __HAVE_ARCH_PTE_SPECIAL
+# Kconfig:   ARCH_HAS_PTE_SPECIAL
 # description:   arch supports the pte_special()/pte_mkspecial() VM APIs
 #
 ---
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index d76bf4a83740..8516e2b0239a 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -44,6 +44,7 @@ config ARC
select HAVE_GENERIC_DMA_COHERENT
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_LZMA
+   select ARCH_HAS_PTE_SPECIAL
 
 config MIGHT_HAVE_PCI
bool
diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
index 08fe33830d4b..8ec5599a0957 100644
--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -320,8 +320,6 @@ PTE_BIT_FUNC(mkexec,|= (_PAGE_EXECUTE));
 PTE_BIT_FUNC(mkspecial,|= (_PAGE_SPECIAL));
 PTE_BIT_FUNC(mkhuge,   |= (_PAGE_HW_SZ));
 
-#define __HAVE_ARCH_PTE_SPECIAL
-
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a7f8e7f4b88f..c088c851b235 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -8,6 +8,7 @@ config ARM
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
+   select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_PHYS_TO_DMA
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 2a4836087358..6d50a11d7793 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ 

[PATCH v2 2/2] mm: remove odd HAVE_PTE_SPECIAL

2018-04-10 Thread Laurent Dufour
Remove the additional define HAVE_PTE_SPECIAL and rely directly on
CONFIG_ARCH_HAS_PTE_SPECIAL.

There is no functional change introduced by this patch

Signed-off-by: Laurent Dufour 
---
 mm/memory.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 96910c625daa..53b6344a90d2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -817,19 +817,13 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  * PFNMAP mappings in order to support COWable mappings.
  *
  */
-#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
-# define HAVE_PTE_SPECIAL 1
-#else
-# define HAVE_PTE_SPECIAL 0
-#endif
 struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 pte_t pte, bool with_public_device)
 {
unsigned long pfn = pte_pfn(pte);
 
-   if (HAVE_PTE_SPECIAL) {
-   if (likely(!pte_special(pte)))
-   goto check_pfn;
+#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
+   if (unlikely(pte_special(pte))) {
if (vma->vm_ops && vma->vm_ops->find_special_page)
return vma->vm_ops->find_special_page(vma, addr);
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
@@ -862,7 +856,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
return NULL;
}
 
-   /* !HAVE_PTE_SPECIAL case follows: */
+#else  /* CONFIG_ARCH_HAS_PTE_SPECIAL */
 
if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
if (vma->vm_flags & VM_MIXEDMAP) {
@@ -881,7 +875,8 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 
if (is_zero_pfn(pfn))
return NULL;
-check_pfn:
+#endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */
+
if (unlikely(pfn > highest_memmap_pfn)) {
print_bad_pte(vma, addr, pte, NULL);
return NULL;
@@ -891,7 +886,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 * NOTE! We still have PageReserved() pages in the page tables.
 * eg. VDSO mappings can cause them to exist.
 */
-out:
+out: __maybe_unused
return pfn_to_page(pfn);
 }
 
@@ -904,7 +899,7 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
/*
 * There is no pmd_special() but there may be special pmds, e.g.
 * in a direct-access (dax) mapping, so let's just replicate the
-* !HAVE_PTE_SPECIAL case from vm_normal_page() here.
+* !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here.
 */
if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
if (vma->vm_flags & VM_MIXEDMAP) {
@@ -1926,6 +1921,7 @@ static int __vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 
track_pfn_insert(vma, &pgprot, pfn);
 
+#ifndef CONFIG_ARCH_HAS_PTE_SPECIAL
/*
 * If we don't have pte special, then we have to use the pfn_valid()
 * based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must*
@@ -1933,7 +1929,7 @@ static int __vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 * than insert_pfn).  If a zero_pfn were inserted into a VM_MIXEDMAP
 * without pte special, it would there be refcounted as a normal page.
 */
-   if (!HAVE_PTE_SPECIAL && !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) {
+   if (!pfn_t_devmap(pfn) && pfn_t_valid(pfn)) {
struct page *page;
 
/*
@@ -1944,6 +1940,7 @@ static int __vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
page = pfn_to_page(pfn_t_to_pfn(pfn));
return insert_page(vma, addr, page, pgprot);
}
+#endif
return insert_pfn(vma, addr, pfn, pgprot, mkwrite);
 }
 
-- 
2.7.4



[PATCH v2 0/2] move __HAVE_ARCH_PTE_SPECIAL in Kconfig

2018-04-10 Thread Laurent Dufour
The per-architecture __HAVE_ARCH_PTE_SPECIAL is defined statically in the
per-architecture header files. This prevents other configuration options
from depending on it.

The first patch of this series replaces __HAVE_ARCH_PTE_SPECIAL with
CONFIG_ARCH_HAS_PTE_SPECIAL, defined in the Kconfig files, selecting it
automatically for the architectures that were already setting it in a
header file.

The second patch removes the odd define HAVE_PTE_SPECIAL, which is a
duplicate of CONFIG_ARCH_HAS_PTE_SPECIAL.

There is no functional change introduced by this series.

Laurent Dufour (2):
  mm: introduce ARCH_HAS_PTE_SPECIAL
  mm: remove odd HAVE_PTE_SPECIAL

 .../features/vm/pte_special/arch-support.txt   |  2 +-
 arch/arc/Kconfig   |  1 +
 arch/arc/include/asm/pgtable.h |  2 --
 arch/arm/Kconfig   |  1 +
 arch/arm/include/asm/pgtable-3level.h  |  1 -
 arch/arm64/Kconfig |  1 +
 arch/arm64/include/asm/pgtable.h   |  2 --
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/include/asm/book3s/64/pgtable.h   |  3 ---
 arch/powerpc/include/asm/pte-common.h  |  3 ---
 arch/riscv/Kconfig |  1 +
 arch/s390/Kconfig  |  1 +
 arch/s390/include/asm/pgtable.h|  1 -
 arch/sh/Kconfig|  1 +
 arch/sh/include/asm/pgtable.h  |  2 --
 arch/sparc/Kconfig |  1 +
 arch/sparc/include/asm/pgtable_64.h|  3 ---
 arch/x86/Kconfig   |  1 +
 arch/x86/include/asm/pgtable_types.h   |  1 -
 include/linux/pfn_t.h  |  4 ++--
 mm/Kconfig |  3 +++
 mm/gup.c   |  4 ++--
 mm/memory.c| 23 ++
 23 files changed, 27 insertions(+), 36 deletions(-)

-- 
2.7.4



Re: [PATCH 2/3] mm: replace __HAVE_ARCH_PTE_SPECIAL

2018-04-10 Thread Laurent Dufour
On 09/04/2018 22:08, David Rientjes wrote:
> On Mon, 9 Apr 2018, Christoph Hellwig wrote:
> 
>>> -#ifdef __HAVE_ARCH_PTE_SPECIAL
>>> +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
>>>  # define HAVE_PTE_SPECIAL 1
>>>  #else
>>>  # define HAVE_PTE_SPECIAL 0
>>
>> I'd say kill this odd indirection and just use the
>> CONFIG_ARCH_HAS_PTE_SPECIAL symbol directly.
>>
>>
> 
> Agree, and I think it would be easier to audit/review if patches 1 and 3 
> were folded together to see the relationship between the newly added 
> selects and what #define's it is replacing.  Otherwise, looks good!
>

Ok I will fold the 3 patches and introduce a new one removing HAVE_PTE_SPECIAL.

Thanks,
Laurent.



[RFC bpf-next v2 4/8] bpf: add documentation for eBPF helpers (23-32)

2018-04-10 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Daniel:

- bpf_get_prandom_u32()
- bpf_get_smp_processor_id()
- bpf_get_cgroup_classid()
- bpf_get_route_realm()
- bpf_skb_load_bytes()
- bpf_csum_diff()
- bpf_skb_get_tunnel_opt()
- bpf_skb_set_tunnel_opt()
- bpf_skb_change_proto()
- bpf_skb_change_type()

Cc: Daniel Borkmann 
Signed-off-by: Quentin Monnet 
---
 include/uapi/linux/bpf.h | 125 +++
 1 file changed, 125 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f3ea8824efbc..d147d9dd6a83 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -473,6 +473,14 @@ union bpf_attr {
  * The number of bytes written to the buffer, or a negative error
  * in case of failure.
  *
+ * u32 bpf_get_prandom_u32(void)
+ * Return
+ * A random 32-bit unsigned value.
+ *
+ * u32 bpf_get_smp_processor_id(void)
+ * Return
+ * The SMP (Symmetric multiprocessing) processor id.
+ *
  * int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
  * Description
  * Store *len* bytes from address *from* into the packet
@@ -604,6 +612,13 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ * u32 bpf_get_cgroup_classid(struct sk_buff *skb)
+ * Description
+ * Retrieve the classid for the current task, i.e. for the
+ * net_cls (network classifier) cgroup to which *skb* belongs.
+ * Return
+ * The classid, or 0 for the default unconfigured classid.
+ *
  * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
  * Description
  * Push a *vlan_tci* (VLAN tag control information) of protocol
@@ -703,6 +718,14 @@ union bpf_attr {
  * are **TC_ACT_REDIRECT** on success or **TC_ACT_SHOT** on
  * error.
  *
+ * u32 bpf_get_route_realm(struct sk_buff *skb)
+ * Description
+ * Retrieve the realm of the route, that is to say the
+ * **tclassid** field of the destination for the *skb*.
+ * Return
+ * The realm of the route for the packet associated to *skb*, or 0
+ * if none was found.
+ *
  * int bpf_perf_event_output(struct pt_reg *ctx, struct bpf_map *map, u64 
flags, void *data, u64 size)
  * Description
  * Write perf raw sample into a perf event held by *map* of type
@@ -779,6 +802,21 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ * int bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 
len)
+ * Description
+ * This helper was provided as an easy way to load data from a
+ * packet. It can be used to load *len* bytes from *offset* from
+ * the packet associated to *skb*, into the buffer pointed by
+ * *to*.
+ *
+ * Since Linux 4.7, this helper is deprecated in favor of
+ * "direct packet access", enabling packet data to be manipulated
+ * with *skb*\ **->data** and *skb*\ **->data_end** pointing
+ * respectively to the first byte of packet data and to the byte
+ * after the last byte of packet data.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
  * int bpf_get_stackid(struct pt_reg *ctx, struct bpf_map *map, u64 flags)
  * Description
  * Walk a user or a kernel stack and return its id. To achieve
@@ -814,6 +852,93 @@ union bpf_attr {
  * The positive or null stack id on success, or a negative error
  * in case of failure.
  *
+ * s64 bpf_csum_diff(__be32 *from, u32 from_size, __be32 *to, u32 to_size, 
__wsum seed)
+ * Description
+ * Compute a checksum difference, from the raw buffer pointed by
+ * *from*, of length *from_size* (that must be a multiple of 4),
+ * towards the raw buffer pointed by *to*, of size *to_size*
+ * (same remark). An optional *seed* can be added to the value.
+ *
+ * This is flexible enough to be used in several ways:
+ *
+ * * With *from_size* == 0, *to_size* > 0 and *seed* set to
+ *   checksum, it can be used when pushing new data.
+ * * With *from_size* > 0, *to_size* == 0 and *seed* set to
+ *   checksum, it can be used when 

[RFC bpf-next v2 3/8] bpf: add documentation for eBPF helpers (12-22)

2018-04-10 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Alexei:

- bpf_get_current_pid_tgid()
- bpf_get_current_uid_gid()
- bpf_get_current_comm()
- bpf_skb_vlan_push()
- bpf_skb_vlan_pop()
- bpf_skb_get_tunnel_key()
- bpf_skb_set_tunnel_key()
- bpf_redirect()
- bpf_perf_event_output()
- bpf_get_stackid()
- bpf_get_current_task()
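
As an aside for readers, the packed return value documented below for bpf_get_current_pid_tgid() (tgid in the upper 32 bits, pid in the lower 32 bits) can be split as in this user-space sketch. The function names are hypothetical and only the bit layout comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 64-bit value the same way the description states:
 * tgid << 32 | pid. Hypothetical helper, for illustration only. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return ((uint64_t)tgid << 32) | pid;
}

/* Upper 32 bits: thread group id (what user space calls the PID). */
static uint32_t tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: kernel pid (what user space calls the thread id). */
static uint32_t pid_of(uint64_t pid_tgid)
{
	return (uint32_t)pid_tgid;
}

/* Small usage example: a round trip through the packed form. */
static void pid_tgid_example(void)
{
	uint64_t v = pack_pid_tgid(1234, 1240);

	assert(tgid_of(v) == 1234);
	assert(pid_of(v) == 1240);
}
```

The same shift-and-truncate pattern applies to bpf_get_current_uid_gid(), with the GID in the upper half and the UID in the lower half.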

Cc: Alexei Starovoitov 
Signed-off-by: Quentin Monnet 
---
 include/uapi/linux/bpf.h | 237 +++
 1 file changed, 237 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2bc653a3a20f..f3ea8824efbc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -580,6 +580,243 @@ union bpf_attr {
  * performed again.
  * Return
  * 0 on success, or a negative error in case of failure.
+ *
+ * u64 bpf_get_current_pid_tgid(void)
+ * Return
+ * A 64-bit integer containing the current tgid and pid, and
+ * created as such:
+ * *current_task*\ **->tgid << 32 \|**
+ * *current_task*\ **->pid**.
+ *
+ * u64 bpf_get_current_uid_gid(void)
+ * Return
+ * A 64-bit integer containing the current GID and UID, and
+ * created as such: *current_gid* **<< 32 \|** *current_uid*.
+ *
+ * int bpf_get_current_comm(char *buf, u32 size_of_buf)
+ * Description
+ * Copy the **comm** attribute of the current task into *buf* of
+ * *size_of_buf*. The **comm** attribute contains the name of
+ * the executable (excluding the path) for the current task. The
+ * *size_of_buf* must be strictly positive. On success, the
+ * helper makes sure that the *buf* is NUL-terminated. On failure,
+ * it is filled with zeroes.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
+ * Description
+ * Push a *vlan_tci* (VLAN tag control information) of protocol
+ * *vlan_proto* to the packet associated to *skb*, then update
+ * the checksum. Note that if *vlan_proto* is different from
+ * **ETH_P_8021Q** and **ETH_P_8021AD**, it is considered to
+ * be **ETH_P_8021Q**.
+ *
+ * A call to this helper is susceptible to change data from the
+ * packet. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_vlan_pop(struct sk_buff *skb)
+ * Description
+ * Pop a VLAN header from the packet associated to *skb*.
+ *
+ * A call to this helper is susceptible to change data from the
+ * packet. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key *key, 
u32 size, u64 flags)
+ * Description
+ * Get tunnel metadata. This helper takes a pointer *key* to an
+ * empty **struct bpf_tunnel_key** of *size*, that will be
+ * filled with tunnel metadata for the packet associated to *skb*.
+ * The *flags* can be set to **BPF_F_TUNINFO_IPV6**, which
+ * indicates that the tunnel is based on IPv6 protocol instead of
+ * IPv4.
+ *
+ * This is typically used on the receive path to perform a lookup
+ * or a packet redirection based on the value of *key*:
+ *
+ * ::
+ *
+ * struct bpf_tunnel_key key = {};
+ * bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+ *  lookup or redirect based on key ...
+ *
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_set_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key *key, 
u32 size, u64 flags)
+ * Description
+ * Populate tunnel metadata for packet associated to *skb*. The
+ * tunnel metadata is set to the contents of *key*, of *size*. The
+ * *flags* can be set to a combination of the following values:
+ *
+ * 

[RFC bpf-next v2 0/8] bpf: document eBPF helpers and add a script to generate man page

2018-04-10 Thread Quentin Monnet
eBPF helper functions can be called from within eBPF programs to perform
a variety of tasks that would be otherwise hard or impossible to do with
eBPF itself. There is a growing number of such helper functions in the
kernel, but documentation is scarce. The main user space header file
does contain a short commented description of most helpers, but it is
somewhat outdated and incomplete. It is more of a "cheat sheet" than
real documentation accessible to new eBPF developers.

This commit attempts to improve the situation by replacing the existing
overview for the helpers with a more developed description. Furthermore,
a Python script is added to generate a manual page for eBPF helpers. The
workflow is the following, and requires the rst2man utility:

$ ./scripts/bpf_helpers_doc.py \
--filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
$ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
$ man /tmp/bpf-helpers.7

The objective is to keep all documentation related to the helpers in a
single place, and to be able to generate from here a manual page that
could be packaged in the man-pages repository and shipped with most
distributions.

Additionally, parsing the prototypes of the helper functions could
hopefully be reused, with a different Printer object, to generate
header files needed in some eBPF-related projects.

Regarding the description of each helper, it comprises several items:

- The function prototype.
- A description of the function and of its arguments (except for a
  couple of cases, when there are no arguments and the return value
  makes the function usage really obvious).
- A description of return values (if not void).

Additional items such as the list of compatible eBPF program and map
types for each helper, Linux kernel version that introduced the helper,
GPL-only restriction, and commit hash could be added in the future, but
it was decided on the mailing list to leave them aside for now.

For several helpers, descriptions are inspired (at times, nearly copied)
from the commit logs introducing them in the kernel. Many thanks to
their respective authors! They were completed as much as possible, the
objective being to have something easily accessible even for people just
starting with eBPF. There is probably a bit more work to do in this
direction for some helpers.

Some RST formatting is used in the descriptions (not in function
prototypes, to keep them readable, but the Python script provided in
order to generate the RST for the manual page does add formatting to
prototypes, to produce something pretty) to get "bold" and "italics" in
manual pages. Hopefully, the descriptions in bpf.h file remain
perfectly readable. Note that the few trailing white spaces are
intentional, removing them would break paragraphs for rst2man.

The descriptions should ideally be updated each time someone adds a new
helper, or updates the behaviour (new socket option supported, ...) or
the interface (new flags available, ...) of existing ones.

The second RFC for this set splits the documentation into several patches.
Ideally all helper descriptions should be reviewed by the respective
authors of the functions they describe. Please do not hesitate to suggest
improvements to make descriptions more complete or accessible.

v2:
- Remove "For" (compatible program and map types), "Since" (minimal
  Linux kernel version required), "GPL only" sections and commit hashes
  for the helpers.
- Add comment on top of the description list to explain how this
  documentation is supposed to be processed.
- Update Python script accordingly (remove the same sections, and remove
  paragraphs on program types and GPL restrictions from man page
  header).
- Split series into several patches.

Cc: linux-doc@vger.kernel.org
Cc: linux-...@vger.kernel.org
Signed-off-by: Quentin Monnet 

Quentin Monnet (8):
  bpf: add script and prepare bpf.h for new helpers documentation
  bpf: add documentation for eBPF helpers (01-11)
  bpf: add documentation for eBPF helpers (12-22)
  bpf: add documentation for eBPF helpers (23-32)
  bpf: add documentation for eBPF helpers (33-41)
  bpf: add documentation for eBPF helpers (42-50)
  bpf: add documentation for eBPF helpers (51-57)
  bpf: add documentation for eBPF helpers (58-64)

 include/uapi/linux/bpf.h   | 1580 +---
 scripts/bpf_helpers_doc.py |  414 
 2 files changed, 1616 insertions(+), 378 deletions(-)
 create mode 100755 scripts/bpf_helpers_doc.py

-- 
2.14.1



[RFC bpf-next v2 2/8] bpf: add documentation for eBPF helpers (01-11)

2018-04-10 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Alexei:

- bpf_map_lookup_elem()
- bpf_map_update_elem()
- bpf_map_delete_elem()
- bpf_probe_read()
- bpf_ktime_get_ns()
- bpf_trace_printk()
- bpf_skb_store_bytes()
- bpf_l3_csum_replace()
- bpf_l4_csum_replace()
- bpf_tail_call()
- bpf_clone_redirect()
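
The three *flags* values described below for bpf_map_update_elem() can be illustrated with a single-entry user-space model. This is a sketch, not the kernel implementation: struct slot and slot_update() are hypothetical, and the -EEXIST, -ENOENT and -EINVAL return values are assumptions chosen to match the usual errno conventions. The numeric flag values are those defined in include/uapi/linux/bpf.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in include/uapi/linux/bpf.h. */
#define BPF_ANY		0	/* create new element or update existing */
#define BPF_NOEXIST	1	/* create new element only if it did not exist */
#define BPF_EXIST	2	/* only update an existing element */

/* Hypothetical one-entry "map": either empty or holding one value. */
struct slot {
	bool used;
	uint32_t value;
};

/* Apply the existence condition selected by flags, then store value. */
static int slot_update(struct slot *s, uint32_t value, uint64_t flags)
{
	if (flags > BPF_EXIST)
		return -EINVAL;		/* unknown flag (assumed error) */
	if (flags == BPF_NOEXIST && s->used)
		return -EEXIST;		/* entry must not exist */
	if (flags == BPF_EXIST && !s->used)
		return -ENOENT;		/* entry must already exist */

	s->used = true;
	s->value = value;
	return 0;
}

/* Small usage example: create once, then update in place. */
static void slot_example(void)
{
	struct slot s = { 0 };

	assert(slot_update(&s, 1, BPF_NOEXIST) == 0);
	assert(slot_update(&s, 2, BPF_EXIST) == 0);
	assert(s.value == 2);
}
```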

Cc: Alexei Starovoitov 
Signed-off-by: Quentin Monnet 
---
 include/uapi/linux/bpf.h | 199 +++
 1 file changed, 199 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 45f77f01e672..2bc653a3a20f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -381,6 +381,205 @@ union bpf_attr {
  * intentional, removing them would break paragraphs for rst2man.
  *
  * Start of BPF helper function descriptions:
+ *
+ * void *bpf_map_lookup_elem(struct bpf_map *map, void *key)
+ * Description
+ * Perform a lookup in *map* for an entry associated to *key*.
+ * Return
+ * Map value associated to *key*, or **NULL** if no entry was
+ * found.
+ *
+ * int bpf_map_update_elem(struct bpf_map *map, void *key, void *value, u64 
flags)
+ * Description
+ * Add or update the value of the entry associated to *key* in
+ * *map* with *value*. *flags* is one of:
+ *
+ * **BPF_NOEXIST**
+ * The entry for *key* must not exist in the map.
+ * **BPF_EXIST**
+ * The entry for *key* must already exist in the map.
+ * **BPF_ANY**
+ * No condition on the existence of the entry for *key*.
+ *
+ * These flags are only useful for maps of type
+ * **BPF_MAP_TYPE_HASH**. For all other map types, **BPF_ANY**
+ * should be used.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_map_delete_elem(struct bpf_map *map, void *key)
+ * Description
+ * Delete entry with *key* from *map*.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read(void *dst, u32 size, const void *src)
+ * Description
+ * For tracing programs, safely attempt to read *size* bytes from
+ * address *src* and store the data in *dst*.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * u64 bpf_ktime_get_ns(void)
+ * Description
+ * Return the time elapsed since system boot, in nanoseconds.
+ * Return
+ * Current *ktime*.
+ *
+ * int bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
+ * Description
+ * This helper is a "printk()-like" facility for debugging. It
+ * prints a message defined by format *fmt* (of size *fmt_size*)
+ * to file *\/sys/kernel/debug/tracing/trace* from DebugFS, if
+ * available. It can take up to three additional **u64**
+ * arguments (as for any eBPF helper, the total number of arguments is
+ * limited to five). Each time the helper is called, it appends a
+ * line that looks like the following:
+ *
+ * ::
+ *
+ * telnet-470   [001] .N.. 419421.045894: 0x0001: BPF 
command: 2
+ *
+ * In the above:
+ *
+ * * ``telnet`` is the name of the current task.
+ * * ``470`` is the PID of the current task.
+ * * ``001`` is the CPU number on which the task is
+ *   running.
+ * * In ``.N..``, each character refers to a set of
+ *   options (whether irqs are enabled, scheduling
+ *   options, whether hard/softirqs are running, level of
+ *   preempt_disabled respectively). **N** means that
+ *   **TIF_NEED_RESCHED** and **PREEMPT_NEED_RESCHED**
+ *   are set.
+ * * ``419421.045894`` is a timestamp.
+ * * ``0x0001`` is a fake value used by BPF for the
+ *   instruction pointer register.
+ * * ``BPF command: 2`` is the message formatted with
+ *   *fmt*.
+ *
+ * The conversion specifiers supported by *fmt* are similar, but
+ * more limited than for printk(). They are **%d**, **%i**,
+ * **%u**, **%x**, **%ld**, **%li**, **%lu**, **%lx**, 

[RFC bpf-next v2 8/8] bpf: add documentation for eBPF helpers (58-64)

2018-04-10 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by John:

- bpf_redirect_map()
- bpf_sk_redirect_map()
- bpf_sock_map_update()
- bpf_msg_redirect_map()
- bpf_msg_apply_bytes()
- bpf_msg_cork_bytes()
- bpf_msg_pull_data()

Cc: John Fastabend 
Signed-off-by: Quentin Monnet 
---
 include/uapi/linux/bpf.h | 140 +++
 1 file changed, 140 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7343af4196c8..db090ad03626 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1250,6 +1250,51 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ * int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
+ * Description
+ * Redirect the packet to the endpoint referenced by *map* at
+ * index *key*. Depending on its type, this *map* can contain
+ * references to net devices (for forwarding packets through other
+ * ports), or to CPUs (for redirecting XDP frames to another CPU;
+ * but this is not fully implemented as of this writing).
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ * Return
+ * **XDP_REDIRECT** on success, or **XDP_ABORTED** on error.
+ *
+ * int bpf_sk_redirect_map(struct bpf_map *map, u32 key, u64 flags)
+ * Description
+ * Redirect the packet to the socket referenced by *map* (of type
+ * **BPF_MAP_TYPE_SOCKMAP**) at index *key*. The only flag
+ * supported for now is **BPF_F_INGRESS**, which indicates the
+ * packet is to be redirected to the ingress side of the socket
+ * instead of (by default) egress.
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ * Return
+ * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_sock_map_update(struct bpf_sock_ops_kern *skops, struct bpf_map 
*map, void *key, u64 flags)
+ * Description
+ * Add an entry to, or update a *map* referencing sockets. The
+ * *skops* is used as a new value for the entry associated to
+ * *key*. *flags* is one of:
+ *
+ * **BPF_NOEXIST**
+ * The entry for *key* must not exist in the map.
+ * **BPF_EXIST**
+ * The entry for *key* must already exist in the map.
+ * **BPF_ANY**
+ * No condition on the existence of the entry for *key*.
+ *
+ * If the *map* has eBPF programs (parser and verdict), those will
+ * be inherited by the socket being added. If the socket is
+ * already attached to eBPF programs, this results in an error.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
  * int bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
  * Description
  * Adjust the address pointed by *xdp_md*\ **->data_meta** by
@@ -1417,6 +1462,101 @@ union bpf_attr {
  * be set is returned (which comes down to 0 if all bits were set
  * as required).
  *
+ * int bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map, u32 
key, u64 flags)
+ * Description
+ * This helper is used in programs implementing policies at the
+ * socket level. If the message *msg* is allowed to pass (i.e. if
+ * the verdict eBPF program returns **SK_PASS**), redirect it to
+ * the socket referenced by *map* (of type
+ * **BPF_MAP_TYPE_SOCKMAP**) at index *key*. The only flag
+ * supported for now is **BPF_F_INGRESS**, which indicates the
+ * packet is to be redirected to the ingress side of the socket
+ * instead of (by default) egress.
+ * Return
+ * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
+ * Description
+ * For socket policies, apply the verdict of the eBPF program to
+ * the next *bytes* (number of bytes) of message *msg*.
+ *
+ * For example, this helper can be used in the following cases:
+ *
+ * * A single **sendmsg**\ () or **sendfile**\ () system call
+ *   contains multiple logical messages that the eBPF program is
+ *   

[RFC bpf-next v2 1/8] bpf: add script and prepare bpf.h for new helpers documentation

2018-04-10 Thread Quentin Monnet
Remove previous "overview" of eBPF helpers from user bpf.h header.
Replace it by a comment explaining how to process the new documentation
(to come in following patches) with a Python script to produce RST, then
man page documentation.

Also add the aforementioned Python script under scripts/. It is used to
process include/uapi/linux/bpf.h and to extract helper descriptions, to
turn it into a RST document that can further be processed with rst2man
to produce a man page. The script takes one "--filename "
option. If the script is launched from scripts/ in the kernel root
directory, it should be able to find the location of the header to
parse, and "--filename " is then optional. If it cannot
find the file, then the option becomes mandatory. RST-formatted
documentation is printed to standard output.

Typical workflow for producing the final man page would be:

$ ./scripts/bpf_helpers_doc.py \
--filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
$ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
$ man /tmp/bpf-helpers.7

Note that the tool kernel-doc cannot be used to document eBPF helpers,
whose signatures are not available directly in the header files
(pre-processor directives are used to produce them at the beginning of
the compilation process).

Signed-off-by: Quentin Monnet 
---
 include/uapi/linux/bpf.h   | 406 ++--
 scripts/bpf_helpers_doc.py | 414 +
 2 files changed, 430 insertions(+), 390 deletions(-)
 create mode 100755 scripts/bpf_helpers_doc.py

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c5ec89732a8d..45f77f01e672 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -365,396 +365,22 @@ union bpf_attr {
} raw_tracepoint;
 } __attribute__((aligned(8)));
 
-/* BPF helper function descriptions:
- *
- * void *bpf_map_lookup_elem(, )
- * Return: Map value or NULL
- *
- * int bpf_map_update_elem(, , , flags)
- * Return: 0 on success or negative error
- *
- * int bpf_map_delete_elem(, )
- * Return: 0 on success or negative error
- *
- * int bpf_probe_read(void *dst, int size, void *src)
- * Return: 0 on success or negative error
- *
- * u64 bpf_ktime_get_ns(void)
- * Return: current ktime
- *
- * int bpf_trace_printk(const char *fmt, int fmt_size, ...)
- * Return: length of buffer written or negative error
- *
- * u32 bpf_prandom_u32(void)
- * Return: random value
- *
- * u32 bpf_raw_smp_processor_id(void)
- * Return: SMP processor ID
- *
- * int bpf_skb_store_bytes(skb, offset, from, len, flags)
- * store bytes into packet
- * @skb: pointer to skb
- * @offset: offset within packet from skb->mac_header
- * @from: pointer where to copy bytes from
- * @len: number of bytes to store into packet
- * @flags: bit 0 - if true, recompute skb->csum
- * other bits - reserved
- * Return: 0 on success or negative error
- *
- * int bpf_l3_csum_replace(skb, offset, from, to, flags)
- * recompute IP checksum
- * @skb: pointer to skb
- * @offset: offset within packet where IP checksum is located
- * @from: old value of header field
- * @to: new value of header field
- * @flags: bits 0-3 - size of header field
- * other bits - reserved
- * Return: 0 on success or negative error
- *
- * int bpf_l4_csum_replace(skb, offset, from, to, flags)
- * recompute TCP/UDP checksum
- * @skb: pointer to skb
- * @offset: offset within packet where TCP/UDP checksum is located
- * @from: old value of header field
- * @to: new value of header field
- * @flags: bits 0-3 - size of header field
- * bit 4 - is pseudo header
- * other bits - reserved
- * Return: 0 on success or negative error
- *
- * int bpf_tail_call(ctx, prog_array_map, index)
- * jump into another BPF program
- * @ctx: context pointer passed to next program
- * @prog_array_map: pointer to map which type is BPF_MAP_TYPE_PROG_ARRAY
- * @index: 32-bit index inside array that selects specific program to run
- * Return: 0 on success or negative error
- *
- * int bpf_clone_redirect(skb, ifindex, flags)
- * redirect to another netdev
- * @skb: pointer to skb
- * @ifindex: ifindex of the net device
- * @flags: bit 0 - if set, redirect to ingress instead of egress
- * other bits - reserved
- * Return: 0 on success or negative error
- *
- * u64 bpf_get_current_pid_tgid(void)
- * Return: current->tgid << 32 | current->pid
- *
- * u64 bpf_get_current_uid_gid(void)
- * Return: current_gid << 32 | current_uid
- *
- * int bpf_get_current_comm(char *buf, int size_of_buf)
- * stores current->comm into buf
- * Return: 0 on success or negative error
- *
- * u32 bpf_get_cgroup_classid(skb)
- * retrieve a proc's classid
- * @skb: pointer to skb
- * Return: 

[RFC bpf-next v2 6/8] bpf: add documentation for eBPF helpers (42-50)

2018-04-10 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helper from Kaixu:
- bpf_perf_event_read()

Helpers from Martin:
- bpf_skb_under_cgroup()
- bpf_xdp_adjust_head()

Helpers from Sargun:
- bpf_probe_write_user()
- bpf_current_task_under_cgroup()

Helper from Thomas:
- bpf_skb_change_head()

Helper from Gianluca:
- bpf_probe_read_str()

Helpers from Chenbo:
- bpf_get_socket_cookie()
- bpf_get_socket_uid()

Cc: Kaixu Xia 
Cc: Martin KaFai Lau 
Cc: Sargun Dhillon 
Cc: Thomas Graf 
Cc: Gianluca Borello 
Cc: Chenbo Feng 
Signed-off-by: Quentin Monnet 
---
 include/uapi/linux/bpf.h | 158 +++
 1 file changed, 158 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index af429ec79f50..15d9ccafebbe 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -701,6 +701,25 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ * u64 bpf_perf_event_read(struct bpf_map *map, u64 flags)
+ * Description
+ * Read the value of a perf event counter. This helper relies on a
+ * *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The
+ * nature of the perf event counter is selected at the creation of
+ * the *map*. The *map* is an array whose size is the number of
+ * available CPU cores, and each cell contains a value relative to
+ * one core. The value to retrieve is indicated by *flags*, that
+ * contains the index of the core to look up, masked with
+ * **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * **BPF_F_CURRENT_CPU** to indicate that the value for the
+ * current CPU core should be retrieved.
+ *
+ * Note that before Linux 4.13, only hardware perf events can be
+ * retrieved.
+ * Return
+ * The value of the perf event counter read from the map, or a
+ * negative error code in case of failure.
+ *
  * int bpf_redirect(u32 ifindex, u64 flags)
  * Description
  * Redirect the packet to another net device of index *ifindex*.
@@ -939,6 +958,17 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ * int bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32 
index)
+ * Description
+ * Check whether *skb* is a descendant of the cgroup2 held by
+ * *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*.
+ * Return
+ * The return value depends on the result of the test, and can be:
+ *
+ * * 0, if the *skb* failed the cgroup2 descendant test.
+ * * 1, if the *skb* passed the cgroup2 descendant test.
+ * * A negative error code, if an error occurred.
+ *
  * u32 bpf_get_hash_recalc(struct sk_buff *skb)
  * Description
  * Retrieve the hash of the packet, *skb*\ **->hash**. If it is
@@ -959,6 +989,37 @@ union bpf_attr {
  * Return
  * A pointer to the current task struct.
  *
+ * int bpf_probe_write_user(void *dst, const void *src, u32 len)
+ * Description
+ * Attempt in a safe way to write *len* bytes from the buffer
+ * *src* to *dst* in memory. It only works for threads that are in
+ * user context.
+ *
+ * This helper should not be used to implement any kind of
+ * security mechanism because of TOC-TOU attacks, but rather to
+ * debug, divert, and manipulate execution of semi-cooperative
+ * processes.
+ *
+ * Keep in mind that this feature is meant for experiments, and it
+ * has a risk of crashing the system and running programs.
+ * Therefore, when an eBPF program using this helper is attached,
+ * a warning including PID and process name is printed to kernel
+ * logs.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_current_task_under_cgroup(struct bpf_map *map, u32 index)
+ * Description
+ * Check whether the probe is being run in the context of a given
+ * subset of the cgroup2 hierarchy. The cgroup2 to test is held by
+ * *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*.
+ * Return
+ * The return value depends 

[RFC bpf-next v2 5/8] bpf: add documentation for eBPF helpers (33-41)

2018-04-10 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Daniel:

- bpf_get_hash_recalc()
- bpf_skb_change_tail()
- bpf_skb_pull_data()
- bpf_csum_update()
- bpf_set_hash_invalid()
- bpf_get_numa_node_id()
- bpf_set_hash()
- bpf_skb_adjust_room()
- bpf_xdp_adjust_meta()

Cc: Daniel Borkmann 
Signed-off-by: Quentin Monnet 
---
 include/uapi/linux/bpf.h | 155 +++
 1 file changed, 155 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d147d9dd6a83..af429ec79f50 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -939,9 +939,164 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ * u32 bpf_get_hash_recalc(struct sk_buff *skb)
+ * Description
+ * Retrieve the hash of the packet, *skb*\ **->hash**. If it is
+ * not set, in particular if the hash was cleared due to mangling,
+ * recompute this hash. Later accesses to the hash can be done
+ * directly with *skb*\ **->hash**.
+ *
+ * Calling **bpf_set_hash_invalid**\ (), changing a packet
+ * prototype with **bpf_skb_change_proto**\ (), or calling
+ * **bpf_skb_store_bytes**\ () with the
+ * **BPF_F_INVALIDATE_HASH** are actions susceptible to clear
+ * the hash and to trigger a new computation for the next call to
+ * **bpf_get_hash_recalc**\ ().
+ * Return
+ * The 32-bit hash.
+ *
  * u64 bpf_get_current_task(void)
  * Return
  * A pointer to the current task struct.
+ *
+ * int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
+ * Description
+ * Resize (trim or grow) the packet associated to *skb* to the
+ * new *len*. The *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * The basic idea is that the helper performs the needed work to
+ * change the size of the packet, then the eBPF program rewrites
+ * the rest via helpers like **bpf_skb_store_bytes**\ (),
+ * **bpf_l3_csum_replace**\ (), **bpf_l4_csum_replace**\ ()
+ * and others. This helper is a slow path utility intended for
+ * replies with control messages. And because it is targeted for
+ * slow path, the helper itself can afford to be slow: it
+ * implicitly linearizes, unclones and drops offloads from the
+ * *skb*.
+ *
+ * A call to this helper is susceptible to change data from the
+ * packet. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_pull_data(struct sk_buff *skb, u32 len)
+ * Description
+ * Pull in non-linear data in case the *skb* is non-linear and not
+ * all of *len* are part of the linear section. Make *len* bytes
+ * from *skb* readable and writable. If a zero value is passed for
+ * *len*, then the whole length of the *skb* is pulled.
+ *
+ * This helper is only needed for reading and writing with direct
+ * packet access.
+ *
+ * For direct packet access, when testing that offsets to access
+ * are within packet boundaries (test on *skb*\ **->data_end**)
+ * fails, programs just bail out, or, in the direct read case, use
+ * **bpf_skb_load_bytes()** as an alternative to overcome this
+ * limitation. If such data sits in non-linear parts, it is
+ * possible to pull them in once with the new helper, retest and
+ * eventually access them.
+ *
+ * At the same time, this also makes sure the skb is uncloned,
+ * which is a necessary condition for direct write. As this needs
+ * to be an invariant for the write part only, the verifier
+ * detects writes and adds a prologue that is calling
+ * **bpf_skb_pull_data()** to effectively unclone the skb from the
+ * very beginning in case it is indeed cloned.
+ *
+ * A call to this helper is susceptible to change data from the
+ * packet. Therefore, at load time, all checks on pointers
+ * previously done by 

[RFC bpf-next v2 7/8] bpf: add documentation for eBPF helpers (51-57)

2018-04-10 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helpers from Lawrence:
- bpf_setsockopt()
- bpf_getsockopt()
- bpf_sock_ops_cb_flags_set()

Helpers from Yonghong:
- bpf_perf_event_read_value()
- bpf_perf_prog_read_value()

Helper from Josef:
- bpf_override_return()

Helper from Andrey:
- bpf_bind()

Cc: Lawrence Brakmo 
Cc: Yonghong Song 
Cc: Josef Bacik 
Cc: Andrey Ignatov 
Signed-off-by: Quentin Monnet 
---
 include/uapi/linux/bpf.h | 184 +++
 1 file changed, 184 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 15d9ccafebbe..7343af4196c8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1208,6 +1208,28 @@ union bpf_attr {
  * Return
  * 0
  *
+ * int bpf_setsockopt(struct bpf_sock_ops_kern *bpf_socket, int level, int optname, char *optval, int optlen)
+ * Description
+ * Emulate a call to **setsockopt()** on the socket associated to
+ * *bpf_socket*, which must be a full socket. The *level* at
+ * which the option resides and the name *optname* of the option
+ * must be specified, see **setsockopt(2)** for more information.
+ * The option value of length *optlen* is pointed by *optval*.
+ *
+ * This helper actually implements a subset of **setsockopt()**.
+ * It supports the following *level*\ s:
+ *
+ * * **SOL_SOCKET**, which supports the following *optname*\ s:
+ *   **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
+ *   **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**.
+ * * **IPPROTO_TCP**, which supports the following *optname*\ s:
+ *   **TCP_CONGESTION**, **TCP_BPF_IW**,
+ *   **TCP_BPF_SNDCWND_CLAMP**.
+ * * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
+ * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
  * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 flags)
  * Description
  * Grow or shrink the room for data in the packet associated to
@@ -1255,6 +1277,168 @@ union bpf_attr {
  * performed again.
  * Return
  * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct bpf_perf_event_value *buf, u32 buf_size)
+ * Description
+ * Read the value of a perf event counter, and store it into *buf*
+ * of size *buf_size*. This helper relies on a *map* of type
+ * **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of the perf
+ * event counter is selected at the creation of the *map*. The
+ * *map* is an array whose size is the number of available CPU
+ * cores, and each cell contains a value relative to one core. The
+ * value to retrieve is indicated by *flags*, that contains the
+ * index of the core to look up, masked with
+ * **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * **BPF_F_CURRENT_CPU** to indicate that the value for the
+ * current CPU core should be retrieved.
+ *
+ * This helper behaves in a way close to
+ * **bpf_perf_event_read**\ () helper, save that instead of
+ * just returning the value observed, it fills the *buf*
+ * structure. This allows for additional data to be retrieved: in
+ * particular, the enabled and running times (in *buf*\
+ * **->enabled** and *buf*\ **->running**, respectively) are
+ * copied.
+ *
+ * These values are interesting, because hardware PMU (Performance
+ * Monitoring Unit) counters are limited resources. When there are
+ * more PMU based perf events opened than available counters,
+ * kernel will multiplex these events so each event gets certain
+ * percentage (but not all) of the PMU time. In case that
+ * multiplexing happens, the number of samples or counter value
+ * will not reflect the case compared to when no multiplexing
+ * occurs. This makes comparison between different runs difficult.
+ * Typically, the counter value should be normalized before
+ * comparing to 

Re: [PATCH v11 0/4] set VSESR_EL2 by user space and support NOTIFY_SEI notification

2018-04-10 Thread James Morse
Hi Dongjiu Geng,

On 09/04/18 22:36, Dongjiu Geng wrote:
> 1. Detect whether KVM can set set guest SError syndrome
> 2. Support to Set VSESR_EL2 and inject SError by user space.
> 3. Support live migration to keep SError pending state and VSESR_EL2 value.
> 4. ACPI 6.1 adds support for NOTIFY_SEI as a GHES notification mechanism, so support this
>    notification in software, KVM or kernel ARCH code call handle_guest_sei() to let ACP driver
>    to handle this notification.

Please don't post code during the merge window. Will this apply to v4.17-rc1? We
can't know until it's tagged.


This series is doing two separate things, please split it into two series.

But on the ACPI front: I don't see how any OS can support your NOTIFY_SEI when
firmware is ignoring the normal world's PSTATE.A.

The latest lobe of that discussion was on the list here:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html


As it is, we would need to spot SError being delivered while SError is masked,
spray nasty messages about firmware being horrifically buggy, then panic(). For
a corrected error, this looks bad, but it's preferable to letting firmware
silently overwrite the exception registers, causing Linux to spin through the
vectors 'eret' with all exceptions masked.
I still think it's best to wait for firmware that does the right thing.


Thanks,

James
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v11 2/4] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-04-10 Thread James Morse
Hi Dongjiu Geng,

On 09/04/18 22:36, Dongjiu Geng wrote:
> This new IOCTL exports user-invisible states related to SError.
> Together with appropriate user space changes, it can inject
> SError with specified syndrome to guest by setup kvm_vcpu_events
> value.

> Also it can support live migration.

Could you explain what user-space is expected to do for this?
(this is also relevant for snapshot-ing/suspending VMs)

It's probably worth noting that this solves an existing problem: KVM may make an
SError pending, but user-space has no way to discover/migrate this.


> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 8a3d708..45719b4 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -819,11 +819,13 @@ struct kvm_clock_data {
>  
>  Capability: KVM_CAP_VCPU_EVENTS
>  Extended by: KVM_CAP_INTR_SHADOW
> -Architectures: x86
> +Architectures: x86, arm, arm64
>  Type: vm ioctl
>  Parameters: struct kvm_vcpu_event (out)
>  Returns: 0 on success, -1 on error
>  
> +X86:
> +
>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>  states of the vcpu.
>  
> @@ -865,15 +867,31 @@ Only two fields are defined in the flags field:
>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>smi contains a valid state.
>  
> +ARM, ARM64:
> +
> +Gets currently pending SError exceptions as well as related states of the vcpu.
> +
> +struct kvm_vcpu_events {
> + struct {
> + __u8 serror_pending;
> + __u8 serror_has_esr;
> + /* Align it to 4 bytes */
> + __u8 pad[2];
> + __u64 serror_esr;
> + } exception;
> +};
> +

I'm not convinced we should change this struct from the layout/size x86 has. It's
confusing for the documentation: is this API call really the same on all
architectures?

What if we want to add some future interrupt, NMI or related state? We've found
ourselves needing to add this API, it seems odd to remove its other uses on x86.
We can't put them back in the future.

Having a different layout would force user-space to ifdef/duplicate any code
that accesses this between architectures.



The compiler will want that __u64 to be naturally aligned to 8-bytes, so your
4-byte padding still causes some secret compiler-padding to be inserted.
Different versions of the compiler may put it in different places.
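
To make the padding concern concrete, here is a rough user-space sketch. It is
not part of the posted patch: the struct names and the pad[6] variant are
assumptions made up for illustration, and the exact offsets depend on the ABI.
It measures how many bytes the compiler inserts between pad[] and the 64-bit
member in the layout as posted, and shows that padding explicitly up to the
8-byte boundary leaves nothing for the compiler to decide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as posted: four bytes of small fields, then a 64-bit member. */
struct events_as_posted {
	struct {
		uint8_t serror_pending;
		uint8_t serror_has_esr;
		uint8_t pad[2];
		uint64_t serror_esr;
	} exception;
};

/* Hypothetical alternative: pad explicitly up to the 8-byte boundary. */
struct events_explicit_pad {
	struct {
		uint8_t serror_pending;
		uint8_t serror_has_esr;
		uint8_t pad[6];
		uint64_t serror_esr;
	} exception;
};

/* With explicit padding the offset is 8 regardless of u64 alignment. */
static_assert(offsetof(struct events_explicit_pad, exception.serror_esr) == 8,
	      "explicitly padded layout must not depend on the ABI");

/* Bytes the compiler silently inserted between pad[] and serror_esr. */
static size_t hidden_padding_as_posted(void)
{
	return offsetof(struct events_as_posted, exception.serror_esr) -
	       (offsetof(struct events_as_posted, exception.pad) + 2);
}

static size_t hidden_padding_explicit(void)
{
	return offsetof(struct events_explicit_pad, exception.serror_esr) -
	       (offsetof(struct events_explicit_pad, exception.pad) + 6);
}
```

On an ABI that aligns 64-bit integers to 8 bytes, hidden_padding_as_posted()
is non-zero, while hidden_padding_explicit() is always zero.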


>  4.32 KVM_SET_VCPU_EVENTS
>  
>  Capability: KVM_CAP_VCPU_EVENTS
>  Extended by: KVM_CAP_INTR_SHADOW
> -Architectures: x86
> +Architectures: x86, arm, arm64
>  Type: vm ioctl
>  Parameters: struct kvm_vcpu_event (in)
>  Returns: 0 on success, -1 on error
>  
> +X86:
> +
>  Set pending exceptions, interrupts, and NMIs as well as related states of the
>  vcpu.
>  
> @@ -894,6 +912,12 @@ shall be written into the VCPU.
>  
>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>  
> +ARM, ARM64:
> +
> +Set pending SError exceptions as well as related states of the vcpu.
> +
> +See KVM_GET_VCPU_EVENTS for the data structure.
> +
>  
>  4.33 KVM_GET_DEBUGREGS
>  


> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 9abbf30..855cc9a 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -39,6 +39,7 @@
>  #define __KVM_HAVE_GUEST_DEBUG
>  #define __KVM_HAVE_IRQ_LINE
>  #define __KVM_HAVE_READONLY_MEM
> +#define __KVM_HAVE_VCPU_EVENTS
>  
>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>  
> @@ -153,6 +154,17 @@ struct kvm_sync_regs {
>  struct kvm_arch_memory_slot {
>  };
>  
> +/* for KVM_GET/SET_VCPU_EVENTS */
> +struct kvm_vcpu_events {
> + struct {
> + __u8 serror_pending;
> + __u8 serror_has_esr;

> + /* Align it to 4 bytes */
> + __u8 pad[2];

(padding noted above)


> + __u64 serror_esr;
> + } exception;
> +};
> +
>  /* If you need to interpret the index values, here is the key: */
>  #define KVM_REG_ARM_COPROC_MASK  0x0FFF
>  #define KVM_REG_ARM_COPROC_SHIFT 16


> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 5c7f657..42e1222 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -277,6 +277,37 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>   return -EINVAL;
>  }
>  
> +int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
> + struct kvm_vcpu_events *events)
> +{
> + events->exception.serror_pending = (vcpu_get_hcr(vcpu) & HCR_VSE);
> + events->exception.serror_has_esr =
> + cpus_have_const_cap(ARM64_HAS_RAS_EXTN) &&
> + (!!vcpu_get_vsesr(vcpu));

> + events->exception.serror_esr = vcpu_get_vsesr(vcpu);

This will return a stale ESR even if nothing is pending. On systems without the
RAS extensions it will return 'ESR_ELx_ISV' if kvm_inject_vabt() has ever been
called for this 

Re: [PATCH v11 1/4] arm64: KVM: export the capability to set guest SError syndrome

2018-04-10 Thread James Morse
Hi Dongjiu Geng,

On 09/04/18 22:36, Dongjiu Geng wrote:
> Before user space injects a SError, it needs to know whether it can
> specify the guest Exception Syndrome, so KVM should tell user space
> whether it has such capability.

(you could improve the commit message by briefly explaining how/why user-space
would want to do this. As this is patch 1, you don't have the context of the
previous patch to say that some systems can provide an ESR with virtual-SError)


> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index fc3ae95..8a3d708 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4415,3 +4415,14 @@ Parameters: none
>  This capability indicates if the flic device will be able to get/set the
>  AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
>  to discover this without having to create a flic device.
> +
> +8.14 KVM_CAP_ARM_SET_SERROR_ESR
> +
> +Architectures: arm, arm64
> +
> +This capability indicates that userspace can specify syndrome value reported to

(Nit: 'the syndrome value')

> +guest OS when guest takes a virtual SError interrupt exception.

(Nit: 'the guest')

> +If KVM has this capability, userspace can only specify the ISS field for the ESR
> +syndrome, can not specify the EC field which is not under control by KVM.

(Nit: 'it can not specify...')

> +If this virtual SError is taken to EL1 using AArch64, this value will be reported
> +into ISS filed of ESR_EL1.

(Nit: 'in the ISS field')


> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 3256b92..38c8a64 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
>   case KVM_CAP_ARM_PMU_V3:
>   r = kvm_arm_support_pmu_v3();
>   break;
> + case KVM_CAP_ARM_INJECT_SERROR_ESR:
> + r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
> + break;
>   case KVM_CAP_SET_GUEST_DEBUG:
>   case KVM_CAP_VCPU_ATTRIBUTES:
>   r = 1;

'dev_ioctl' feels a bit weird, but we already have cpu_has_32bit_el1() in here.


> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 8fb90a0..3587b33 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -934,6 +934,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_S390_AIS_MIGRATION 150
>  #define KVM_CAP_PPC_GET_CPU_CHAR 151
>  #define KVM_CAP_S390_BPB 152
> +#define KVM_CAP_ARM_INJECT_SERROR_ESR 153
>  
>  #ifdef KVM_CAP_IRQ_ROUTING

(patches 1 and 2 should probably be swapped around, as on its own this does nothing).

Reviewed-by: James Morse 


Thanks,

James


Re: [PATCH V2 3/9] dt-bindings: Tegra186 tachometer device tree bindings

2018-04-10 Thread Guenter Roeck

On 04/10/2018 06:30 AM, Rob Herring wrote:
> On Mon, Apr 9, 2018 at 9:37 AM, Mikko Perttunen  wrote:
>> On 04/09/2018 04:21 PM, Rob Herring wrote:
>>> On Mon, Apr 9, 2018 at 12:38 AM, Mikko Perttunen  wrote:
>>>> Rob,
>>>
>>> Please don't top post to lists.
>>>
>>>> this binding is for a specific IP block (for measuring/aggregating input
>>>> pulses) on the Tegra186 SoC, so I don't think it fits into any generic
>>>> binding.
>>>
>>> What is it hooked up to to measure? You only mention "fan" five times
>>> in the doc.
>>
>> In practice, fans.
>>
>>> You have #pwm-cells too, so this block has PWM output as well? If not,
>>> then where's the PWM for the fan control because there is no point in
>>> having fan tach without some control mechanism.
>>
>> It doesn't provide a PWM output. The (Linux) PWM framework provides
>> functionality in both directions - control and capture. But if the device
>> tree #pwm-cells/pwms properties are only for control, we may need to
>> introduce a new #capture-pwm-cells/capture-pwms or similar.
>
> Yes, perhaps. But there is no point in having
> #capture-pwm-cells/capture-pwms if you aren't describing the
> connection between the fan and the fan controller.
>
>> The idea is that the generic fan node can then specify two pwms, one for
>> control and one for capture, to enable e.g. closed-loop control (I'm not
>> personally familiar with the usecase for this but I could imagine something
>> like that). The control PWM can be something completely different, maybe not
>> a PWM in the first place (e.g. some fixed voltage).
>
> Yes. As you can have different types of fans (3-wire, 4-wire, etc.)
> they would have different compatibles and differing properties
> associated with them.
>
>>> There's only so many ways to control fans and types of fans, so yes,
>>> the interface of control and feedback lines between a fan and its
>>> controller should absolutely be generic.
>>
>> I'm not quite getting what you mean by this. Clearly we need a custom
>> compatibility string for the tachometer as it's a different hardware block
>> with different programming than others.
>
> Yes, of course. It's the interface between fan controllers and fans
> that I'm concerned about.
>
>> Or are you complaining about the
>> nvidia,pulse-per-rev/capture-window-len properties?
>
> Well, those sound like properties of a fan (at least the first one),
> so they belong in a fan node.
>
> The aspeed fan controller is probably the closest thing we have to a
> fan binding. Look at that if you haven't already.

FWIW, this is a fan speed (tachometer) counter which is modeled as pwm input.
This, in my opinion, and as stated before, is conceptually wrong. The pwm
subsystem should not (need to) know anything about fans, much less about
specifics such as the number of pulses per revolution.
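
As a rough illustration of the split being argued for here (a sketch only; the
function and parameter names are hypothetical and not taken from any kernel
driver): the capture hardware can report a pulse period without knowing what
is attached, and the pulses-per-revolution figure only enters at the point
where that period is interpreted as a fan speed.

```c
#include <stdint.h>

/*
 * Convert a captured tachometer pulse period into revolutions per minute.
 * pulse_period_ns would come from the capture block; pulses_per_rev is a
 * property of the attached fan. Both names are made up for this sketch.
 */
static uint64_t fan_rpm(uint64_t pulse_period_ns, unsigned int pulses_per_rev)
{
	const uint64_t ns_per_minute = 60ULL * 1000000000ULL;

	/* No pulses seen, or the fan type was not described: report 0. */
	if (pulse_period_ns == 0 || pulses_per_rev == 0)
		return 0;

	/* One revolution takes pulse_period_ns * pulses_per_rev nanoseconds. */
	return ns_per_minute / (pulse_period_ns * pulses_per_rev);
}
```

With a 15 ms pulse period, a fan that emits two pulses per revolution works
out to 2000 RPM, and the same period from a four-pulse fan to 1000 RPM, which
is why the value cannot be fixed on the capture side.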

Guenter


Re: [PATCH V2 3/9] dt-bindings: Tegra186 tachometer device tree bindings

2018-04-10 Thread Rob Herring
On Mon, Apr 9, 2018 at 9:37 AM, Mikko Perttunen  wrote:
>
>
> On 04/09/2018 04:21 PM, Rob Herring wrote:
>>
>> On Mon, Apr 9, 2018 at 12:38 AM, Mikko Perttunen  wrote:
>>>
>>> Rob,
>>
>>
>> Please don't top post to lists.
>>
>>> this binding is for a specific IP block (for measuring/aggregating input
>>> pulses) on the Tegra186 SoC, so I don't think it fits into any generic
>>> binding.
>>
>>
>> What is it hooked up to to measure? You only mention "fan" five times
>> in the doc.
>
>
> In practice, fans.
>
>>
>> You have #pwm-cells too, so this block has PWM output as well? If not,
>> then where's the PWM for the fan control because there is no point in
>> having fan tach without some control mechanism.
>
>
> It doesn't provide a PWM output. The (Linux) PWM framework provides
> functionality in both directions - control and capture. But if the device
> tree #pwm-cells/pwms properties are only for control, we may need to
> introduce a new #capture-pwm-cells/capture-pwms or similar.

Yes, perhaps. But there is no point in having
#capture-pwm-cells/capture-pwms if you aren't describing the
connection between the fan and the fan controller.

> The idea is that the generic fan node can then specify two pwms, one for
> control and one for capture, to enable e.g. closed-loop control (I'm not
> personally familiar with the usecase for this but I could imagine something
> like that). The control PWM can be something completely different, maybe not
> a PWM in the first place (e.g. some fixed voltage).

Yes. As you can have different types of fans (3-wire, 4-wire, etc.)
they would have different compatibles and differing properties
associated with them.

>> There's only so many ways to control fans and types of fans, so yes,
>> the interface of control and feedback lines between a fan and its
>> controller should absolutely be generic.
>
>
> I'm not quite getting what you mean by this. Clearly we need a custom
> compatibility string for the tachometer as it's a different hardware block
> with different programming than others.

Yes, of course. It's the interface between fan controllers and fans
that I'm concerned about.

> Or are you complaining about the
> nvidia,pulse-per-rev/capture-window-len properties?

Well, those sound like properties of a fan (at least the first one),
so they belong in a fan node.

The aspeed fan controller is probably the closest thing we have to a
fan binding. Look at that if you haven't already.

Rob


[PATCH] gpiolib: add hogs support for machine code

2018-04-10 Thread Bartosz Golaszewski
Board files constitute a significant part of the users of the legacy
GPIO framework. In many cases they only export a line and set its
desired value. We could use GPIO hogs for that like we do for DT and
ACPI but there's no support for that in machine code.

This patch proposes to extend the machine.h API with support for
registering hog tables in board files.

Signed-off-by: Bartosz Golaszewski 
---
 Documentation/driver-api/gpio/board.rst | 16 ++
 drivers/gpio/gpiolib.c  | 67 +
 include/linux/gpio/machine.h| 31 
 3 files changed, 114 insertions(+)

diff --git a/Documentation/driver-api/gpio/board.rst b/Documentation/driver-api/gpio/board.rst
index 25d62b2e9fd0..2c112553df84 100644
--- a/Documentation/driver-api/gpio/board.rst
+++ b/Documentation/driver-api/gpio/board.rst
@@ -177,3 +177,19 @@ mapping and is thus transparent to GPIO consumers.
 
 A set of functions such as gpiod_set_value() is available to work with
 the new descriptor-oriented interface.
+
+Boards using platform data can also hog GPIO lines by defining GPIO hog tables.
+
+.. code-block:: c
+
+struct gpiod_hog gpio_hog_table[] = {
+GPIO_HOG("gpio.0", 10, "foo", GPIO_ACTIVE_LOW, GPIOD_OUT_HIGH),
+{ }
+};
+
+And the table can be added to the board code as follows::
+
+gpiod_add_hogs(gpio_hog_table);
+
+The line will be hogged as soon as the gpiochip is created or - in case the
+chip was created earlier - when the hog table is registered.
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 43aeb07343ec..547adc149b62 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -71,6 +71,9 @@ static DEFINE_MUTEX(gpio_lookup_lock);
 static LIST_HEAD(gpio_lookup_list);
 LIST_HEAD(gpio_devices);
 
+static DEFINE_MUTEX(gpio_machine_hogs_mutex);
+static LIST_HEAD(gpio_machine_hogs);
+
 static void gpiochip_free_hogs(struct gpio_chip *chip);
 static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
struct lock_class_key *lock_key,
@@ -1171,6 +1174,41 @@ static int gpiochip_setup_dev(struct gpio_device *gdev)
return status;
 }
 
+static void gpiochip_machine_hog(struct gpio_chip *chip, struct gpiod_hog *hog)
+{
+   struct gpio_desc *desc;
+   int rv;
+
+   desc = gpiochip_get_desc(chip, hog->chip_hwnum);
+   if (IS_ERR(desc)) {
+   pr_err("%s: unable to get GPIO desc: %ld\n",
+  __func__, PTR_ERR(desc));
+   return;
+   }
+
+   if (desc->flags & FLAG_IS_HOGGED)
+   return;
+
+   rv = gpiod_hog(desc, hog->line_name, hog->lflags, hog->dflags);
+   if (rv)
+   pr_err("%s: unable to hog GPIO line (%s:%u): %d\n",
+  __func__, chip->label, hog->chip_hwnum, rv);
+}
+
+static void machine_gpiochip_add(struct gpio_chip *chip)
+{
+   struct gpiod_hog *hog;
+
+   mutex_lock(&gpio_machine_hogs_mutex);
+
+   list_for_each_entry(hog, &gpio_machine_hogs, list) {
+   if (!strcmp(chip->label, hog->chip_label))
+   gpiochip_machine_hog(chip, hog);
+   }
+
+   mutex_unlock(&gpio_machine_hogs_mutex);
+}
+
 static void gpiochip_setup_devs(void)
 {
struct gpio_device *gdev;
@@ -1326,6 +1364,8 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data,
 
acpi_gpiochip_add(chip);
 
+   machine_gpiochip_add(chip);
+
/*
 * By first adding the chardev, and then adding the device,
 * we get a device node entry in sysfs under
@@ -3462,6 +3502,33 @@ void gpiod_remove_lookup_table(struct gpiod_lookup_table *table)
 }
 EXPORT_SYMBOL_GPL(gpiod_remove_lookup_table);
 
+/**
+ * gpiod_add_hogs() - register a set of GPIO hogs from machine code
+ * @hogs: table of gpio hog entries with a zeroed sentinel at the end
+ */
+void gpiod_add_hogs(struct gpiod_hog *hogs)
+{
+   struct gpio_chip *chip;
+   struct gpiod_hog *hog;
+
+   mutex_lock(&gpio_machine_hogs_mutex);
+
+   for (hog = &hogs[0]; hog->chip_label; hog++) {
+   list_add_tail(&hog->list, &gpio_machine_hogs);
+
+   /*
+* The chip may have been registered earlier, so check if it
+* exists and, if so, try to hog the line now.
+*/
+   chip = find_chip_by_name(hog->chip_label);
+   if (chip)
+   gpiochip_machine_hog(chip, hog);
+   }
+
+   mutex_unlock(&gpio_machine_hogs_mutex);
+}
+EXPORT_SYMBOL_GPL(gpiod_add_hogs);
+
 static struct gpiod_lookup_table *gpiod_find_lookup_table(struct device *dev)
 {
const char *dev_id = dev ? dev_name(dev) : NULL;
diff --git a/include/linux/gpio/machine.h b/include/linux/gpio/machine.h
index b2f2dc638463..517957d6b168 100644
--- a/include/linux/gpio/machine.h
+++ b/include/linux/gpio/machine.h
@@ -39,6 +39,23 @@ struct gpiod_lookup_table {
struct gpiod_lookup 

Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter

2018-04-10 Thread Oleg Nesterov
Hi Ravi,

On 04/10, Ravi Bangoria wrote:
>
> > and what if __mmu_notifier_register() fails simply because signal_pending() == T?
> > see mm_take_all_locks().
> >
> > at first glance this all look suspicious and sub-optimal,
>
> Yes. I should have added checks for failure cases.
> Will fix them in v3.

And what can you do if it fails? Nothing except report the problem. But
signal_pending() is not the unlikely or error condition, it should not
cause the tracing errors.

Plus mm_take_all_locks() is very heavy... BTW, uprobe_mmap_callback() is
called unconditionally. Whatever it does, can we at least move it after
the no_uprobe_events() check? Can't we also check MMF_HAS_UPROBES?

Either way, I do not feel that mmu_notifier is the right tool... Did you
consider the uprobe_clear_state() hook we already have?

Oleg.



Re: [PATCH 00/32] docs/vm: convert to ReST format

2018-04-10 Thread Mike Rapoport
Jon, Andrew,

How do you suggest we continue with this?

On Sun, Apr 01, 2018 at 09:38:58AM +0300, Mike Rapoport wrote:
> (added akpm)
> 
> On Thu, Mar 29, 2018 at 03:46:07PM -0600, Jonathan Corbet wrote:
> > On Wed, 21 Mar 2018 21:22:16 +0200
> > Mike Rapoport  wrote:
> > 
> > > These patches convert files in Documentation/vm to ReST format, add an
> > > initial index and link it to the top level documentation.
> > > 
> > > There are no contents changes in the documentation, except few spelling
> > > fixes. The relatively large diffstat stems from the indentation and
> > > paragraph wrapping changes.
> > > 
> > > I've tried to keep the formatting as consistent as possible, but I could
> > > miss some places that needed markup and add some markup where it was not
> > > necessary.
> > 
> > So I've been pondering on these for a bit.  It looks like a reasonable and
> > straightforward RST conversion, no real complaints there.  But I do have a
> > couple of concerns...
> > 
> > One is that, as we move documentation into RST, I'm really trying to
> > organize it a bit so that it is better tuned to the various audiences we
> > have.  For example, ksm.txt is going to be of interest to sysadmin types,
> > who might want to tune it.  mmu_notifier.txt is of interest to ...
> > somebody, but probably nobody who is thinking in user space.  And so on.
> > 
> > So I would really like to see this material split up and put into the
> > appropriate places in the RST hierarchy - admin-guide for administrative
> > stuff, core-api for kernel development topics, etc.  That, of course,
> > could be done separately from the RST conversion, but I suspect I know
> > what will (or will not) happen if we agree to defer that for now :)
> 
> Well, I was actually planning on doing that ;-)
> 
> My thinking was to start with mechanical RST conversion and then to start
> working on the contents and ordering of the documentation. Some of the
> existing files, e.g. ksm.txt, can be moved as is into the appropriate
> places, others, like transhuge.txt should be at least split into admin/user
> and developer guides.
> 
> Another problem with many of the existing mm docs is that they are rather
> developer notes and it wouldn't be really straight forward to assign them
> to a particular topic.
> 
> I believe that keeping the mm docs together will give better visibility of
> what (little) mm documentation we have and will make the updates easier.
> The documents that fit well into a certain topic could be linked there. For
> instance:
> 
> -
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index 5bb9161..8f6c6e6 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -63,6 +63,7 @@ configure specific aspects of kernel behavior to your liking.
> pm/index
> thunderbolt
> LSM/index
> +   vm/index
> 
>  .. only::  subproject and html
> 
> diff --git a/Documentation/admin-guide/vm/index.rst b/Documentation/admin-guide/vm/index.rst
> new file mode 100644
> index 0000000..d86f1c8
> --- /dev/null
> +++ b/Documentation/admin-guide/vm/index.rst
> @@ -0,0 +1,5 @@
> +==============================================
> +Knobs and Buttons for Memory Management Tuning
> +==============================================
> +
> +* :ref:`ksm `
> -
> 
> > The other is the inevitable merge conflicts that changing that many doc
> > files will create.  Sending the patches through Andrew could minimize
> > that, I guess, or at least make it his problem.  Alternatively, we could
> > try to do it as an end-of-merge-window sort of thing.  I can try to manage
> > that, but an ack or two from the mm crowd would be nice to have.
> 
> I can rebase on top of Andrew's tree if that would help to minimize the
> merge conflicts.
> 
> > Thanks,
> > 
> > jon
> > 
> 
> -- 
> Sincerely yours,
> Mike.
> 

-- 
Sincerely yours,
Mike.



Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page

2018-04-10 Thread Quentin Monnet
2018-04-09 18:47 UTC-0700 ~ Alexei Starovoitov

> On Mon, Apr 09, 2018 at 02:25:26PM +0100, Quentin Monnet wrote:
>>
>> Anyway, I am fine with keeping just signatures, descriptions and return
>> values for now. I will submit a new version with only those items.
> 
> Thank you.
> 
> Could you also split it into few patches?
>  include/uapi/linux/bpf.h   | 2237 
> 
>  scripts/bpf_helpers_doc.py |  568 +++
>  2 files changed, 2429 insertions(+), 376 deletions(-)
> 
> replying back and forth on a single patch of such size will be tedious
> for others to follow.
> May be document ~10 helpers at a time ? Total of ~7 patches and extra
> patch for .py ?
> 

Sure, I'll do that. And I'll try to group helpers in a patch by author;
that should also help with reviewing the descriptions.

Quentin


Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter

2018-04-10 Thread Ravi Bangoria
Hi Oleg,

On 04/09/2018 06:59 PM, Oleg Nesterov wrote:
> On 04/04, Ravi Bangoria wrote:
>> +static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
>> +{
>> +struct mmu_notifier *mn;
>> +struct sdt_mm_list *sml = kzalloc(sizeof(*sml), GFP_KERNEL);
>> +
>> +if (!sml)
>> +return;
>> +sml->mm = mm;
>> +list_add(&(sml->list), &(tu->sml.list));
>> +
>> +/* Register mmu_notifier for this mm. */
>> +mn = kzalloc(sizeof(*mn), GFP_KERNEL);
>> +if (!mn)
>> +return;
>> +
>> +mn->ops = _mmu_notifier_ops;
>> +__mmu_notifier_register(mn, mm);
>> +}
> and what if __mmu_notifier_register() fails simply because signal_pending() == T?
> see mm_take_all_locks().
>
> at first glance this all look suspicious and sub-optimal,

Yes. I should have added checks for failure cases.
Will fix them in v3.

Thanks for the review,
Ravi

>  but let me repeat that
> I didn't read this version yet.
>
> Oleg.
>
