Re: [PATCH v2 1/2] mtd: rawnand: Provide helper for polling GPIO R/B pin

2018-10-12 Thread Boris Brezillon
Hi Janusz,

On Fri, 12 Oct 2018 22:41:00 +0200
Janusz Krzysztofik  wrote:

> Each controller driver with access to NAND R/B pin over GPIO would have 
> to reimplement the polling loop otherwise.
> 
> Signed-off-by: Janusz Krzysztofik 
> ---
> Changelog:
> v2:
> New patch - v1 consisted of only one patch (the followning one)
> 
> 
>  drivers/mtd/nand/raw/nand_base.c | 38 ++
>  include/linux/mtd/rawnand.h  | 10 ++
>  2 files changed, 48 insertions(+)
> 
> diff --git a/drivers/mtd/nand/raw/nand_base.c 
> b/drivers/mtd/nand/raw/nand_base.c
> index 05bd0779fe9b..ff1ac4a3c647 100644
> --- a/drivers/mtd/nand/raw/nand_base.c
> +++ b/drivers/mtd/nand/raw/nand_base.c
> @@ -45,6 +45,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_GPIOLIB
> +#include 
> +#endif

The ifdef is not needed here, linux/gpio/consumer.h already has dummy
wrappers when CONFIG_GPIOLIB is not enabled.

>  
>  #include "internals.h"
>  
> @@ -531,6 +534,41 @@ int nand_soft_waitrdy(struct nand_chip *chip, unsigned 
> long timeout_ms)
>  };
>  EXPORT_SYMBOL_GPL(nand_soft_waitrdy);
>  
> +#ifdef CONFIG_GPIOLIB
> +/**
> + * nand_gpio_waitrdy - Poll R/B GPIO pin until ready
> + * @chip: NAND chip structure
> + * @gpiod: GPIO descriptor of R/B pin
> + * @timeout_ms: Timeout in ms
> + *
> + * Poll the R/B GPIO pin until it becomes ready. If that does not happen
> + * whitin the specified timeout, -ETIMEDOUT is returned.
> + *
> + * This helper is intended to be used when the controller has access to the
> + * NAND R/B pin over GPIO.
> + *
> + * Be aware that calling this helper from an ->exec_op() implementation means
> + * ->exec_op() must be re-entrant.

This is not true for this function: it does not call nand_exec_op().

> + *
> + * Return 0 if the R/B pin indicates chip is ready, a negative error 
> otherwise.
> + */
> +int nand_gpio_waitrdy(struct nand_chip *chip, struct gpio_desc *gpiod,
> +   unsigned long timeout_ms)
> +{
> + /* Wait until command is processed or timeout occurs */
> + timeout_ms = jiffies + msecs_to_jiffies(timeout_ms);
> + do {
> + if (gpiod_get_value_cansleep(gpiod))
> + return 0;
> +
> + cond_resched();
> + } while (time_before(jiffies, timeout_ms));

> +
> + return gpiod_get_value_cansleep(gpiod) ? 0 : -ETIMEDOUT;
> +};
> +EXPORT_SYMBOL_GPL(nand_gpio_waitrdy);
> +#endif

Hm, I don't see any other helpers defined in #ifdef blocks though most
of them are optionals and are most of the time not used by drivers.
Let's keep things consistent (at the expense of embedding unused code
in nand.o) and remove the #ifdef here. If someone starts complaining
about the size of the rawnand core, we'll consider doing that.

> +
>  /**
>   * panic_nand_get_device - [GENERIC] Get chip for selected access
>   * @chip: the nand chip descriptor
> diff --git a/include/linux/mtd/rawnand.h b/include/linux/mtd/rawnand.h
> index e10b126e148f..09f0ed1345b1 100644
> --- a/include/linux/mtd/rawnand.h
> +++ b/include/linux/mtd/rawnand.h
> @@ -1346,4 +1346,14 @@ void nand_release(struct nand_chip *chip);
>   */
>  int nand_soft_waitrdy(struct nand_chip *chip, unsigned long timeout_ms);
>  
> +#ifdef CONFIG_GPIOLIB
> +struct gpio_desc;
> +/*
> + * External helper for controller drivers that have to implement the WAITRDY
> + * instruction and do have GPIO pin to check it.
> + */

You can drop this comment, this is already explained in the kerneldoc
header above the function def.

> +int nand_gpio_waitrdy(struct nand_chip *chip, struct gpio_desc *gpiod,
> +   unsigned long timeout_ms);
> +#endif
> +
>  #endif /* __LINUX_MTD_RAWNAND_H */

Regards,

Boris


Re: [PATCH v2 1/2] mtd: rawnand: Provide helper for polling GPIO R/B pin

2018-10-12 Thread Boris Brezillon
Hi Janusz,

On Fri, 12 Oct 2018 22:41:00 +0200
Janusz Krzysztofik  wrote:

> Each controller driver with access to NAND R/B pin over GPIO would have 
> to reimplement the polling loop otherwise.
> 
> Signed-off-by: Janusz Krzysztofik 
> ---
> Changelog:
> v2:
> New patch - v1 consisted of only one patch (the followning one)
> 
> 
>  drivers/mtd/nand/raw/nand_base.c | 38 ++
>  include/linux/mtd/rawnand.h  | 10 ++
>  2 files changed, 48 insertions(+)
> 
> diff --git a/drivers/mtd/nand/raw/nand_base.c 
> b/drivers/mtd/nand/raw/nand_base.c
> index 05bd0779fe9b..ff1ac4a3c647 100644
> --- a/drivers/mtd/nand/raw/nand_base.c
> +++ b/drivers/mtd/nand/raw/nand_base.c
> @@ -45,6 +45,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_GPIOLIB
> +#include 
> +#endif

The ifdef is not needed here, linux/gpio/consumer.h already has dummy
wrappers when CONFIG_GPIOLIB is not enabled.

>  
>  #include "internals.h"
>  
> @@ -531,6 +534,41 @@ int nand_soft_waitrdy(struct nand_chip *chip, unsigned 
> long timeout_ms)
>  };
>  EXPORT_SYMBOL_GPL(nand_soft_waitrdy);
>  
> +#ifdef CONFIG_GPIOLIB
> +/**
> + * nand_gpio_waitrdy - Poll R/B GPIO pin until ready
> + * @chip: NAND chip structure
> + * @gpiod: GPIO descriptor of R/B pin
> + * @timeout_ms: Timeout in ms
> + *
> + * Poll the R/B GPIO pin until it becomes ready. If that does not happen
> + * whitin the specified timeout, -ETIMEDOUT is returned.
> + *
> + * This helper is intended to be used when the controller has access to the
> + * NAND R/B pin over GPIO.
> + *
> + * Be aware that calling this helper from an ->exec_op() implementation means
> + * ->exec_op() must be re-entrant.

This is not true for this function: it does not call nand_exec_op().

> + *
> + * Return 0 if the R/B pin indicates chip is ready, a negative error 
> otherwise.
> + */
> +int nand_gpio_waitrdy(struct nand_chip *chip, struct gpio_desc *gpiod,
> +   unsigned long timeout_ms)
> +{
> + /* Wait until command is processed or timeout occurs */
> + timeout_ms = jiffies + msecs_to_jiffies(timeout_ms);
> + do {
> + if (gpiod_get_value_cansleep(gpiod))
> + return 0;
> +
> + cond_resched();
> + } while (time_before(jiffies, timeout_ms));

> +
> + return gpiod_get_value_cansleep(gpiod) ? 0 : -ETIMEDOUT;
> +};
> +EXPORT_SYMBOL_GPL(nand_gpio_waitrdy);
> +#endif

Hm, I don't see any other helpers defined in #ifdef blocks though most
of them are optionals and are most of the time not used by drivers.
Let's keep things consistent (at the expense of embedding unused code
in nand.o) and remove the #ifdef here. If someone starts complaining
about the size of the rawnand core, we'll consider doing that.

> +
>  /**
>   * panic_nand_get_device - [GENERIC] Get chip for selected access
>   * @chip: the nand chip descriptor
> diff --git a/include/linux/mtd/rawnand.h b/include/linux/mtd/rawnand.h
> index e10b126e148f..09f0ed1345b1 100644
> --- a/include/linux/mtd/rawnand.h
> +++ b/include/linux/mtd/rawnand.h
> @@ -1346,4 +1346,14 @@ void nand_release(struct nand_chip *chip);
>   */
>  int nand_soft_waitrdy(struct nand_chip *chip, unsigned long timeout_ms);
>  
> +#ifdef CONFIG_GPIOLIB
> +struct gpio_desc;
> +/*
> + * External helper for controller drivers that have to implement the WAITRDY
> + * instruction and do have GPIO pin to check it.
> + */

You can drop this comment, this is already explained in the kerneldoc
header above the function def.

> +int nand_gpio_waitrdy(struct nand_chip *chip, struct gpio_desc *gpiod,
> +   unsigned long timeout_ms);
> +#endif
> +
>  #endif /* __LINUX_MTD_RAWNAND_H */

Regards,

Boris


Re: [PATCH 09/50] amiserial: switch to ->[sg]et_serial()

2018-10-12 Thread Al Viro
On Thu, Oct 11, 2018 at 07:58:28PM +0200, Geert Uytterhoeven wrote:

> drivers/tty/amiserial.c:1076:3: error: 'retval' undeclared (first use
> in this function)
> 
> http://kisskb.ellerman.id.au/kisskb/buildresult/13544535/
> http://kisskb.ellerman.id.au/kisskb/buildresult/13544413/

Fixed and folded.


Re: [PATCH 09/50] amiserial: switch to ->[sg]et_serial()

2018-10-12 Thread Al Viro
On Thu, Oct 11, 2018 at 07:58:28PM +0200, Geert Uytterhoeven wrote:

> drivers/tty/amiserial.c:1076:3: error: 'retval' undeclared (first use
> in this function)
> 
> http://kisskb.ellerman.id.au/kisskb/buildresult/13544535/
> http://kisskb.ellerman.id.au/kisskb/buildresult/13544413/

Fixed and folded.


[PATCH v2] MAINTAINERS: Clarify UIO vs UIOVEC maintainer

2018-10-12 Thread Dan Williams
The UIO file mask in MAINTAINERS was incorrectly directing UIOVEC
(include/linux/uio.h) patches to Greg.

Tag Al as the UIOVEC maintainer as Ingo and others have explicitly
required his ack before taking architecture patches that touch
lib/iov_iter.c.

Cc: Al Viro 
Reported-by: Greg Kroah-Hartman 
Signed-off-by: Dan Williams 
---
Changes in v2:
* Rename UACCESS to UIOVEC

 MAINTAINERS |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d870cb57c887..c1841d58d7ae 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15344,13 +15344,19 @@ F:arch/x86/um/
 F: fs/hostfs/
 F: fs/hppfs/
 
+USERSPACE COPYIN/COPYOUT (UIOVEC)
+M: Alexander Viro 
+S: Maintained
+F: lib/iov_iter.c
+F: include/linux/uio.h
+
 USERSPACE I/O (UIO)
 M: Greg Kroah-Hartman 
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
 F: Documentation/driver-api/uio-howto.rst
 F: drivers/uio/
-F: include/linux/uio*.h
+F: include/linux/uio_driver.h
 
 UTIL-LINUX PACKAGE
 M: Karel Zak 



[PATCH v2] MAINTAINERS: Clarify UIO vs UIOVEC maintainer

2018-10-12 Thread Dan Williams
The UIO file mask in MAINTAINERS was incorrectly directing UIOVEC
(include/linux/uio.h) patches to Greg.

Tag Al as the UIOVEC maintainer as Ingo and others have explicitly
required his ack before taking architecture patches that touch
lib/iov_iter.c.

Cc: Al Viro 
Reported-by: Greg Kroah-Hartman 
Signed-off-by: Dan Williams 
---
Changes in v2:
* Rename UACCESS to UIOVEC

 MAINTAINERS |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d870cb57c887..c1841d58d7ae 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15344,13 +15344,19 @@ F:arch/x86/um/
 F: fs/hostfs/
 F: fs/hppfs/
 
+USERSPACE COPYIN/COPYOUT (UIOVEC)
+M: Alexander Viro 
+S: Maintained
+F: lib/iov_iter.c
+F: include/linux/uio.h
+
 USERSPACE I/O (UIO)
 M: Greg Kroah-Hartman 
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
 F: Documentation/driver-api/uio-howto.rst
 F: drivers/uio/
-F: include/linux/uio*.h
+F: include/linux/uio_driver.h
 
 UTIL-LINUX PACKAGE
 M: Karel Zak 



[PATCH v4] printk: Add line-buffered printk() API.

2018-10-12 Thread Tetsuo Handa
Sometimes we want to print a whole line without being disturbed by
concurrent printk() from interrupts and/or other threads, for printk()
which does not end with '\n' can be disturbed.

Mixed printk() output makes it hard to interpret. Assuming that we will go
to a direction that we allow prefixing context identifier to each line of
printk() output (so that we can group multiple lines into one block when
parsing), this patch introduces API for line-buffered printk() output
(so that we can make sure that printk() ends with '\n').

Since functions introduced by this patch are merely wrapping
printk()/vprintk() calls in order to minimize possibility of using
"struct cont", it is safe to replace printk()/vprintk() with this API.

Details:

  A structure named "struct printk_buffer" is introduced for buffering
  up to LOG_LINE_MAX bytes of printk() output which did not end with '\n'.

  get_printk_buffer() tries to assign a "struct printk_buffer" from
  statically preallocated array. get_printk_buffer() returns NULL if
  all "struct printk_buffer" are in use, but the caller does not need to
  check for NULL.

  put_printk_buffer() flushes and releases the "struct printk_buffer".
  put_printk_buffer() must match corresponding get_printk_buffer() as with
  rcu_read_unlock() must match corresponding rcu_read_lock().

  Three functions buffered_vprintk(), buffered_printk() and
  flush_printk_buffer() are provided for using "struct printk_buffer".
  These are like vfprintf(), fprintf(), fflush() except that these receive
  "struct printk_buffer *" for the first argument.

  buffered_vprintk() and buffered_printk() behave like vprintk() and
  printk() respectively if "struct printk_buffer *" argument is NULL.
  flush_printk_buffer() and put_printk_buffer() become no-op if
  "struct printk_buffer *" argument is NULL. Therefore, the caller of
  get_printk_buffer() does not need to check for NULL.

How to configure this API:

  For those who want to save memory footprint, this API is enabled only
  if CONFIG_PRINTK_LINE_BUFFERED option is selected.

  For those who want to tune the number of statically preallocated
  buffers, CONFIG_PRINTK_NUM_LINE_BUFFERS option is available. The default
  value is 16. Since "struct printk_buffer" makes difference only when
  there are multiple threads concurrently calling printk() which does not
  end with '\n', and this API will fallback to normal printk() when all
  CONFIG_PRINTK_NUM_LINE_BUFFERS buffers are in use, you won't need to
  specify a large number.

  But somebody might forget to call put_printk_buffer(). For those who
  want to know why all CONFIG_PRINTK_NUM_LINE_BUFFERS buffers are in use,
  CONFIG_PRINTK_REPORT_OUT_OF_LINE_BUFFERS option is available.
  This option reports when/where get_printk_buffer() was called and
  put_printk_buffer() is not yet called, up to once per a minute.

How to use this API:

  (1) Call get_printk_buffer() and acquire "struct printk_buffer *".

  (2) Rewrite printk() calls in the following way. The "ptr" is
  "struct printk_buffer *" obtained in step (1).

  printk(fmt, ...) => buffered_printk(ptr, fmt, ...)
  vprintk(fmt, args)   => buffered_vprintk(ptr, fmt, args)
  pr_emerg(fmt, ...)   => bpr_emerg(ptr, fmt, ...)
  pr_alert(fmt, ...)   => bpr_alert(ptr, fmt, ...)
  pr_crit(fmt, ...)=> bpr_crit(ptr, fmt, ...)
  pr_err(fmt, ...) => bpr_err(ptr, fmt, ...)
  pr_warning(fmt, ...) => bpr_warning(ptr, fmt, ...)
  pr_warn(fmt, ...)=> bpr_warn(ptr, fmt, ...)
  pr_notice(fmt, ...)  => bpr_notice(ptr, fmt, ...)
  pr_info(fmt, ...)=> bpr_info(ptr, fmt, ...)
  pr_cont(fmt, ...)=> bpr_cont(ptr, fmt, ...)

  (3) Release "struct printk_buffer" by calling put_printk_buffer().

Note that since "struct printk_buffer" buffers only up to one line, there
is no need to rewrite if it is known that the "struct printk_buffer" is
empty and printk() ends with '\n'.

  Good example:

printk("Hello ");=>  buf = get_printk_buffer();
pr_cont("world.\n"); buffered_printk(buf, "Hello ");
 buffered_printk(buf, "world.\n");
 put_printk_buffer(buf);

  Pointless example:

printk("Hello\n");   =>  buf = get_printk_buffer();
printk("World.\n");  buffered_printk(buf, "Hello\n");
 buffered_printk(buf, "World.\n");
 put_printk_buffer(buf);

Note that bpr_devel() and bpr_debug() are not defined. This is
because pr_devel()/pr_debug() should not be followed by pr_cont()
because pr_devel()/pr_debug() are conditionally enabled; output from
pr_devel()/pr_debug() should always end with '\n'.

Signed-off-by: Tetsuo Handa 
---
 include/linux/printk.h |  41 +
 init/Kconfig   |  31 +++
 kernel/printk/printk.c | 239 +
 3 files changed, 311 insertions(+)

diff --git a/include/linux/printk.h b/include/linux/printk.h

[PATCH v4] printk: Add line-buffered printk() API.

2018-10-12 Thread Tetsuo Handa
Sometimes we want to print a whole line without being disturbed by
concurrent printk() from interrupts and/or other threads, for printk()
which does not end with '\n' can be disturbed.

Mixed printk() output makes it hard to interpret. Assuming that we will go
to a direction that we allow prefixing context identifier to each line of
printk() output (so that we can group multiple lines into one block when
parsing), this patch introduces API for line-buffered printk() output
(so that we can make sure that printk() ends with '\n').

Since functions introduced by this patch are merely wrapping
printk()/vprintk() calls in order to minimize possibility of using
"struct cont", it is safe to replace printk()/vprintk() with this API.

Details:

  A structure named "struct printk_buffer" is introduced for buffering
  up to LOG_LINE_MAX bytes of printk() output which did not end with '\n'.

  get_printk_buffer() tries to assign a "struct printk_buffer" from
  statically preallocated array. get_printk_buffer() returns NULL if
  all "struct printk_buffer" are in use, but the caller does not need to
  check for NULL.

  put_printk_buffer() flushes and releases the "struct printk_buffer".
  put_printk_buffer() must match corresponding get_printk_buffer() as with
  rcu_read_unlock() must match corresponding rcu_read_lock().

  Three functions buffered_vprintk(), buffered_printk() and
  flush_printk_buffer() are provided for using "struct printk_buffer".
  These are like vfprintf(), fprintf(), fflush() except that these receive
  "struct printk_buffer *" for the first argument.

  buffered_vprintk() and buffered_printk() behave like vprintk() and
  printk() respectively if "struct printk_buffer *" argument is NULL.
  flush_printk_buffer() and put_printk_buffer() become no-op if
  "struct printk_buffer *" argument is NULL. Therefore, the caller of
  get_printk_buffer() does not need to check for NULL.

How to configure this API:

  For those who want to save memory footprint, this API is enabled only
  if CONFIG_PRINTK_LINE_BUFFERED option is selected.

  For those who want to tune the number of statically preallocated
  buffers, CONFIG_PRINTK_NUM_LINE_BUFFERS option is available. The default
  value is 16. Since "struct printk_buffer" makes difference only when
  there are multiple threads concurrently calling printk() which does not
  end with '\n', and this API will fallback to normal printk() when all
  CONFIG_PRINTK_NUM_LINE_BUFFERS buffers are in use, you won't need to
  specify a large number.

  But somebody might forget to call put_printk_buffer(). For those who
  want to know why all CONFIG_PRINTK_NUM_LINE_BUFFERS buffers are in use,
  CONFIG_PRINTK_REPORT_OUT_OF_LINE_BUFFERS option is available.
  This option reports when/where get_printk_buffer() was called and
  put_printk_buffer() is not yet called, up to once per a minute.

How to use this API:

  (1) Call get_printk_buffer() and acquire "struct printk_buffer *".

  (2) Rewrite printk() calls in the following way. The "ptr" is
  "struct printk_buffer *" obtained in step (1).

  printk(fmt, ...) => buffered_printk(ptr, fmt, ...)
  vprintk(fmt, args)   => buffered_vprintk(ptr, fmt, args)
  pr_emerg(fmt, ...)   => bpr_emerg(ptr, fmt, ...)
  pr_alert(fmt, ...)   => bpr_alert(ptr, fmt, ...)
  pr_crit(fmt, ...)=> bpr_crit(ptr, fmt, ...)
  pr_err(fmt, ...) => bpr_err(ptr, fmt, ...)
  pr_warning(fmt, ...) => bpr_warning(ptr, fmt, ...)
  pr_warn(fmt, ...)=> bpr_warn(ptr, fmt, ...)
  pr_notice(fmt, ...)  => bpr_notice(ptr, fmt, ...)
  pr_info(fmt, ...)=> bpr_info(ptr, fmt, ...)
  pr_cont(fmt, ...)=> bpr_cont(ptr, fmt, ...)

  (3) Release "struct printk_buffer" by calling put_printk_buffer().

Note that since "struct printk_buffer" buffers only up to one line, there
is no need to rewrite if it is known that the "struct printk_buffer" is
empty and printk() ends with '\n'.

  Good example:

printk("Hello ");=>  buf = get_printk_buffer();
pr_cont("world.\n"); buffered_printk(buf, "Hello ");
 buffered_printk(buf, "world.\n");
 put_printk_buffer(buf);

  Pointless example:

printk("Hello\n");   =>  buf = get_printk_buffer();
printk("World.\n");  buffered_printk(buf, "Hello\n");
 buffered_printk(buf, "World.\n");
 put_printk_buffer(buf);

Note that bpr_devel() and bpr_debug() are not defined. This is
because pr_devel()/pr_debug() should not be followed by pr_cont()
because pr_devel()/pr_debug() are conditionally enabled; output from
pr_devel()/pr_debug() should always end with '\n'.

Signed-off-by: Tetsuo Handa 
---
 include/linux/printk.h |  41 +
 init/Kconfig   |  31 +++
 kernel/printk/printk.c | 239 +
 3 files changed, 311 insertions(+)

diff --git a/include/linux/printk.h b/include/linux/printk.h

Re: [resend PATCH] MAINTAINERS: Clarify UIO vs UACCESS maintainer

2018-10-12 Thread Al Viro
On Mon, Oct 08, 2018 at 11:50:09AM -0700, Dan Williams wrote:
> The UIO file mask in MAINTAINERS was incorrectly directing UACCESS
> (include/linux/uio.h) patches to Greg.
> 
> Tag Al as the UACCESS maintainer as Ingo and others have explicitly
> required his ack before taking architecture patches that touch
> lib/iov_iter.c.
> 
> Cc: Al Viro 
> Reported-by: Greg Kroah-Hartman 
> Acked-by: Greg Kroah-Hartman 
> Signed-off-by: Dan Williams 
> ---
> I got a bounce last time I tried to send this, hopefully the situation
> has improved now.
> 
> Al, let me know if you want this entry. Alternatively we can just do the
> UIO file mask fixup by itself.

Sure, no problem, except that UACCESS refers to different things.
I'd probably call it UIOVEC, if we want a name that does refer to the
same somewhere (*BSD).  If anything, it's generalization of copyin/copyout
for non-userland destinations/sources; uaccess answer to (subset of) the
same problem would be set_fs()/get_fs() and the fewer we have left of that,
the better...


Re: [resend PATCH] MAINTAINERS: Clarify UIO vs UACCESS maintainer

2018-10-12 Thread Al Viro
On Mon, Oct 08, 2018 at 11:50:09AM -0700, Dan Williams wrote:
> The UIO file mask in MAINTAINERS was incorrectly directing UACCESS
> (include/linux/uio.h) patches to Greg.
> 
> Tag Al as the UACCESS maintainer as Ingo and others have explicitly
> required his ack before taking architecture patches that touch
> lib/iov_iter.c.
> 
> Cc: Al Viro 
> Reported-by: Greg Kroah-Hartman 
> Acked-by: Greg Kroah-Hartman 
> Signed-off-by: Dan Williams 
> ---
> I got a bounce last time I tried to send this, hopefully the situation
> has improved now.
> 
> Al, let me know if you want this entry. Alternatively we can just do the
> UIO file mask fixup by itself.

Sure, no problem, except that UACCESS refers to different things.
I'd probably call it UIOVEC, if we want a name that does refer to the
same somewhere (*BSD).  If anything, it's generalization of copyin/copyout
for non-userland destinations/sources; uaccess answer to (subset of) the
same problem would be set_fs()/get_fs() and the fewer we have left of that,
the better...


Re: [PATCH 4/6] mm: introduce page->dma_pinned_flags, _count

2018-10-12 Thread Dave Chinner
On Thu, Oct 11, 2018 at 11:00:12PM -0700, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> Add two struct page fields that, combined, are unioned with
> struct page->lru. There is no change in the size of
> struct page. These new fields are for type safety and clarity.
> 
> Also add page flag accessors to test, set and clear the new
> page->dma_pinned_flags field.
> 
> The page->dma_pinned_count field will be used in upcoming
> patches
> 
> Signed-off-by: John Hubbard 
> ---
>  include/linux/mm_types.h   | 22 +-
>  include/linux/page-flags.h | 47 ++
>  2 files changed, 63 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5ed8f6292a53..017ab82e36ca 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -78,12 +78,22 @@ struct page {
>*/
>   union {
>   struct {/* Page cache and anonymous pages */
> - /**
> -  * @lru: Pageout list, eg. active_list protected by
> -  * zone_lru_lock.  Sometimes used as a generic list
> -  * by the page owner.
> -  */
> - struct list_head lru;
> + union {
> + /**
> +  * @lru: Pageout list, eg. active_list protected
> +  * by zone_lru_lock.  Sometimes used as a
> +  * generic list by the page owner.
> +  */
> + struct list_head lru;
> + /* Used by get_user_pages*(). Pages may not be
> +  * on an LRU while these dma_pinned_* fields
> +  * are in use.
> +  */
> + struct {
> + unsigned long dma_pinned_flags;
> + atomic_t  dma_pinned_count;
> + };
> + };

Isn't this broken for mapped file-backed pages? i.e. they may be
passed as the user buffer to read/write direct IO and so the pages
passed to gup will be on the active/inactive LRUs. hence I can't see
how you can have dual use of the LRU list head like this

What am I missing here?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 4/6] mm: introduce page->dma_pinned_flags, _count

2018-10-12 Thread Dave Chinner
On Thu, Oct 11, 2018 at 11:00:12PM -0700, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> Add two struct page fields that, combined, are unioned with
> struct page->lru. There is no change in the size of
> struct page. These new fields are for type safety and clarity.
> 
> Also add page flag accessors to test, set and clear the new
> page->dma_pinned_flags field.
> 
> The page->dma_pinned_count field will be used in upcoming
> patches
> 
> Signed-off-by: John Hubbard 
> ---
>  include/linux/mm_types.h   | 22 +-
>  include/linux/page-flags.h | 47 ++
>  2 files changed, 63 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5ed8f6292a53..017ab82e36ca 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -78,12 +78,22 @@ struct page {
>*/
>   union {
>   struct {/* Page cache and anonymous pages */
> - /**
> -  * @lru: Pageout list, eg. active_list protected by
> -  * zone_lru_lock.  Sometimes used as a generic list
> -  * by the page owner.
> -  */
> - struct list_head lru;
> + union {
> + /**
> +  * @lru: Pageout list, eg. active_list protected
> +  * by zone_lru_lock.  Sometimes used as a
> +  * generic list by the page owner.
> +  */
> + struct list_head lru;
> + /* Used by get_user_pages*(). Pages may not be
> +  * on an LRU while these dma_pinned_* fields
> +  * are in use.
> +  */
> + struct {
> + unsigned long dma_pinned_flags;
> + atomic_t  dma_pinned_count;
> + };
> + };

Isn't this broken for mapped file-backed pages? i.e. they may be
passed as the user buffer to read/write direct IO and so the pages
passed to gup will be on the active/inactive LRUs. hence I can't see
how you can have dual use of the LRU list head like this

What am I missing here?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH -next] gpio: fix SNPS_CREG kconfig dependency warning

2018-10-12 Thread Randy Dunlap
From: Randy Dunlap 

Fix kconfig warning for GPIO_SNPS_CREG:

WARNING: unmet direct dependencies detected for OF_GPIO
  Depends on [n]: GPIOLIB [=y] && OF [=n] && HAS_IOMEM [=y]
  Selected by [y]:
  - GPIO_SNPS_CREG [=y] && GPIOLIB [=y] && HAS_IOMEM [=y] && (ARC || 
COMPILE_TEST [=y])

Drivers in drivers/gpio/Kconfig depend on OF_GPIO, not select it.
This prevents attempting to build when OF is not enabled.

Signed-off-by: Randy Dunlap 
Cc: Linus Walleij 
Cc: linux-g...@vger.kernel.org
Cc: Eugeniy Paltsev 
---
Found in mmotm but comes from/applies to linux-next.

 drivers/gpio/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next-20181012.orig/drivers/gpio/Kconfig
+++ linux-next-20181012/drivers/gpio/Kconfig
@@ -439,7 +439,7 @@ config GPIO_SIOX
 config GPIO_SNPS_CREG
bool "Synopsys GPIO via CREG (Control REGisters) driver"
depends on ARC || COMPILE_TEST
-   select OF_GPIO
+   depends on OF_GPIO
help
  This driver supports GPIOs via CREG on various Synopsys SoCs.
  This is a single-register MMIO GPIO driver for complex cases




[PATCH -next] gpio: fix SNPS_CREG kconfig dependency warning

2018-10-12 Thread Randy Dunlap
From: Randy Dunlap 

Fix kconfig warning for GPIO_SNPS_CREG:

WARNING: unmet direct dependencies detected for OF_GPIO
  Depends on [n]: GPIOLIB [=y] && OF [=n] && HAS_IOMEM [=y]
  Selected by [y]:
  - GPIO_SNPS_CREG [=y] && GPIOLIB [=y] && HAS_IOMEM [=y] && (ARC || 
COMPILE_TEST [=y])

Drivers in drivers/gpio/Kconfig depend on OF_GPIO, not select it.
This prevents attempting to build when OF is not enabled.

Signed-off-by: Randy Dunlap 
Cc: Linus Walleij 
Cc: linux-g...@vger.kernel.org
Cc: Eugeniy Paltsev 
---
Found in mmotm but comes from/applies to linux-next.

 drivers/gpio/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next-20181012.orig/drivers/gpio/Kconfig
+++ linux-next-20181012/drivers/gpio/Kconfig
@@ -439,7 +439,7 @@ config GPIO_SIOX
 config GPIO_SNPS_CREG
bool "Synopsys GPIO via CREG (Control REGisters) driver"
depends on ARC || COMPILE_TEST
-   select OF_GPIO
+   depends on OF_GPIO
help
  This driver supports GPIOs via CREG on various Synopsys SoCs.
  This is a single-register MMIO GPIO driver for complex cases




Re: [PATCH v4 1/2] arm64: dts: allwinner: new board - Emlid Neutis N5

2018-10-12 Thread Chen-Yu Tsai
On Sat, Oct 13, 2018 at 4:23 AM Maxime Ripard  wrote:
>
> On Fri, Oct 12, 2018 at 04:39:14PM +0300, Aleksandr Aleksandrov wrote:
> > Hi Andreas,
> >
> > Thanks for your feedback!
> >
> > > > + *
> > > > + * Copyright (C) 2018 Aleksandr Aleksandrov 
> > > > 
> > > > + */
> > > > +
> > > > +/dts-v1/;
> > > > +
> > > > +#include "sun50i-h5-emlid-neutis-n5.dtsi"
> > > > +
> > > > +/ {
> > > > + model = "Emlid Neutis N5 Developer board";
> > > > + compatible = "emlid,neutis-n5-devboard",
> > > > +  "emlid,neutis-n5",
> > >
> > > You are lacking bindings definitions for these new identifiers. The
> > > vendor prefix should be patch 1/3, the SoM/board compatibles 2/3 and
> > > this .dts[i] patch 3/3, so that only vendor prefixes and compatibles
> > > that are defined and don't result in checkpatch.pl warnings get used.
> >
> > Patch 2/3:
> >
> > commit 46dcb8632b36644cb20e6b35ede12ff0088a60eb
> > Author: Aleksandr Aleksandrov 
> > Date:   Fri Oct 12 16:22:28 2018 +0300
> >
> > dt-bindings: arm: sunxi: emlid,neutis-n5(-devboard)
> >
> > sunxi: add new compatibles for Emlid Neutis Dev board and SoM module
> >
> > Signed-off-by: Aleksandr Aleksandrov 
> >
> > diff --git a/Documentation/devicetree/bindings/arm/sunxi.txt 
> > b/Documentation/devicetree/bindings/arm/sunxi.txt
> > index e4beec3..a907e52 100644
> > --- a/Documentation/devicetree/bindings/arm/sunxi.txt
> > +++ b/Documentation/devicetree/bindings/arm/sunxi.txt
> > @@ -19,3 +19,5 @@ using one of the following compatible strings:
> >allwinner,sun9i-a80
> >allwinner,sun50i-a64
> >nextthing,gr8
> > +  emlid,neutis-n5
> > +  emlid,neutis-n5-devboard
> >
> > Is this right place for the neutis compatibles?
>
> No, those are for SoCs compatible. I'm not sure we ever created a
> board compatible files.

We never did. We don't even have all vendor prefixes in, only the ones
that seem more popular, and actually have an identifiable website.

ChenYu


Re: [PATCH v4 1/2] arm64: dts: allwinner: new board - Emlid Neutis N5

2018-10-12 Thread Chen-Yu Tsai
On Sat, Oct 13, 2018 at 4:23 AM Maxime Ripard  wrote:
>
> On Fri, Oct 12, 2018 at 04:39:14PM +0300, Aleksandr Aleksandrov wrote:
> > Hi Andreas,
> >
> > Thanks for your feedback!
> >
> > > > + *
> > > > + * Copyright (C) 2018 Aleksandr Aleksandrov 
> > > > 
> > > > + */
> > > > +
> > > > +/dts-v1/;
> > > > +
> > > > +#include "sun50i-h5-emlid-neutis-n5.dtsi"
> > > > +
> > > > +/ {
> > > > + model = "Emlid Neutis N5 Developer board";
> > > > + compatible = "emlid,neutis-n5-devboard",
> > > > +  "emlid,neutis-n5",
> > >
> > > You are lacking bindings definitions for these new identifiers. The
> > > vendor prefix should be patch 1/3, the SoM/board compatibles 2/3 and
> > > this .dts[i] patch 3/3, so that only vendor prefixes and compatibles
> > > that are defined and don't result in checkpatch.pl warnings get used.
> >
> > Patch 2/3:
> >
> > commit 46dcb8632b36644cb20e6b35ede12ff0088a60eb
> > Author: Aleksandr Aleksandrov 
> > Date:   Fri Oct 12 16:22:28 2018 +0300
> >
> > dt-bindings: arm: sunxi: emlid,neutis-n5(-devboard)
> >
> > sunxi: add new compatibles for Emlid Neutis Dev board and SoM module
> >
> > Signed-off-by: Aleksandr Aleksandrov 
> >
> > diff --git a/Documentation/devicetree/bindings/arm/sunxi.txt 
> > b/Documentation/devicetree/bindings/arm/sunxi.txt
> > index e4beec3..a907e52 100644
> > --- a/Documentation/devicetree/bindings/arm/sunxi.txt
> > +++ b/Documentation/devicetree/bindings/arm/sunxi.txt
> > @@ -19,3 +19,5 @@ using one of the following compatible strings:
> >allwinner,sun9i-a80
> >allwinner,sun50i-a64
> >nextthing,gr8
> > +  emlid,neutis-n5
> > +  emlid,neutis-n5-devboard
> >
> > Is this right place for the neutis compatibles?
>
> No, those are for SoCs compatible. I'm not sure we ever created a
> board compatible files.

We never did. We don't even have all vendor prefixes in, only the ones
that seem more popular, and actually have an identifiable website.

ChenYu


RE: [PATCH v1] KVM/x86/vPMU: Guest PMI Optimization

2018-10-12 Thread Wang, Wei W
On Saturday, October 13, 2018 12:31 AM, Andi Kleen wrote:
> > 4. Results
> > - Without this optimization, the guest pmi handling time is
> >   ~450 ns, and the max sampling rate is reduced to 250.
> > - With this optimization, the guest pmi handling time is ~9000 ns
> >   (i.e. 1 / 500 of the non-optimization case), and the max sampling
> >   rate remains at the original 10.
> 
> Impressive performance improvement!
> 
> It's not clear to me why you're special casing PMIs here. The optimization
> should work generically, right?


Yes, seems doable. I plan to try some lazy approach for the perf event 
allocation. 

> Is that guaranteed to be always called on the right CPU that will run the 
> vcpu?
> 
> AFAIK there's an ioctl to set MSRs in the guest from qemu, I'm pretty sure it
> won't handle that.
 

Thanks, will consider that case.

Best,
Wei


RE: [PATCH v1] KVM/x86/vPMU: Guest PMI Optimization

2018-10-12 Thread Wang, Wei W
On Saturday, October 13, 2018 12:31 AM, Andi Kleen wrote:
> > 4. Results
> > - Without this optimization, the guest pmi handling time is
> >   ~450 ns, and the max sampling rate is reduced to 250.
> > - With this optimization, the guest pmi handling time is ~9000 ns
> >   (i.e. 1 / 500 of the non-optimization case), and the max sampling
> >   rate remains at the original 10.
> 
> Impressive performance improvement!
> 
> It's not clear to me why you're special casing PMIs here. The optimization
> should work generically, right?


Yes, seems doable. I plan to try some lazy approach for the perf event 
allocation. 

> Is that guaranteed to be always called on the right CPU that will run the 
> vcpu?
> 
> AFAIK there's an ioctl to set MSRs in the guest from qemu, I'm pretty sure it
> won't handle that.
 

Thanks, will consider that case.

Best,
Wei


[PATCH] selftests/ftrace: Add color to the PASS / FAIL results

2018-10-12 Thread Steven Rostedt
From: Steven Rostedt (VMware) 

Now that ftracetest has over 80 tests, it is difficult to simply scroll
up the console window to find the failed tests when it reports just two
tests have failed. In order to make this stand out better, have the
color of the word "PASS" be green, "FAIL" and "XFAIL" be red, and all
other results be blue. This helps tremendously in quickly spotting the
failed tests by just scrolling up the console window.

Signed-off-by: Steven Rostedt (VMware) 
---
diff --git a/tools/testing/selftests/ftrace/ftracetest 
b/tools/testing/selftests/ftrace/ftracetest
index 5c71d58..4946b2e 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -153,7 +153,7 @@ else
 fi
 
 prlog() { # messages
-  [ -z "$LOG_FILE" ] && echo "$@" || echo "$@" | tee -a $LOG_FILE
+  [ -z "$LOG_FILE" ] && echo -e "$@" || echo -e "$@" | tee -a $LOG_FILE
 }
 catlog() { #file
   [ -z "$LOG_FILE" ] && cat $1 || cat $1 | tee -a $LOG_FILE
@@ -195,37 +195,37 @@ test_on_instance() { # testfile
 eval_result() { # sigval
   case $1 in
 $PASS)
-  prlog "  [PASS]"
+  prlog "  [\e[32mPASS\e[30m]"
   PASSED_CASES="$PASSED_CASES $CASENO"
   return 0
 ;;
 $FAIL)
-  prlog "  [FAIL]"
+  prlog "  [\e[31mFAIL\e[30m]"
   FAILED_CASES="$FAILED_CASES $CASENO"
   return 1 # this is a bug.
 ;;
 $UNRESOLVED)
-  prlog "  [UNRESOLVED]"
+  prlog "  [\e[34mUNRESOLVED\e[30m]"
   UNRESOLVED_CASES="$UNRESOLVED_CASES $CASENO"
   return 1 # this is a kind of bug.. something happened.
 ;;
 $UNTESTED)
-  prlog "  [UNTESTED]"
+  prlog "  [\e[34mUNTESTED\e[30m]"
   UNTESTED_CASES="$UNTESTED_CASES $CASENO"
   return 0
 ;;
 $UNSUPPORTED)
-  prlog "  [UNSUPPORTED]"
+  prlog "  [\e[34mUNSUPPORTED\e[30m]"
   UNSUPPORTED_CASES="$UNSUPPORTED_CASES $CASENO"
   return $UNSUPPORTED_RESULT # depends on use case
 ;;
 $XFAIL)
-  prlog "  [XFAIL]"
+  prlog "  [\e[31mXFAIL\e[30m]"
   XFAILED_CASES="$XFAILED_CASES $CASENO"
   return 0
 ;;
 *)
-  prlog "  [UNDEFINED]"
+  prlog "  [\e[34mUNDEFINED\e[30m]"
   UNDEFINED_CASES="$UNDEFINED_CASES $CASENO"
   return 1 # this must be a test bug
 ;;


[PATCH] selftests/ftrace: Add color to the PASS / FAIL results

2018-10-12 Thread Steven Rostedt
From: Steven Rostedt (VMware) 

Now that ftracetest has over 80 tests, it is difficult to simply scroll
up the console window to find the failed tests when it reports just two
tests have failed. In order to make this stand out better, have the
color of the word "PASS" be green, "FAIL" and "XFAIL" be red, and all
other results be blue. This helps tremendously in quickly spotting the
failed tests by just scrolling up the console window.

Signed-off-by: Steven Rostedt (VMware) 
---
diff --git a/tools/testing/selftests/ftrace/ftracetest 
b/tools/testing/selftests/ftrace/ftracetest
index 5c71d58..4946b2e 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -153,7 +153,7 @@ else
 fi
 
 prlog() { # messages
-  [ -z "$LOG_FILE" ] && echo "$@" || echo "$@" | tee -a $LOG_FILE
+  [ -z "$LOG_FILE" ] && echo -e "$@" || echo -e "$@" | tee -a $LOG_FILE
 }
 catlog() { #file
   [ -z "$LOG_FILE" ] && cat $1 || cat $1 | tee -a $LOG_FILE
@@ -195,37 +195,37 @@ test_on_instance() { # testfile
 eval_result() { # sigval
   case $1 in
 $PASS)
-  prlog "  [PASS]"
+  prlog "  [\e[32mPASS\e[30m]"
   PASSED_CASES="$PASSED_CASES $CASENO"
   return 0
 ;;
 $FAIL)
-  prlog "  [FAIL]"
+  prlog "  [\e[31mFAIL\e[30m]"
   FAILED_CASES="$FAILED_CASES $CASENO"
   return 1 # this is a bug.
 ;;
 $UNRESOLVED)
-  prlog "  [UNRESOLVED]"
+  prlog "  [\e[34mUNRESOLVED\e[30m]"
   UNRESOLVED_CASES="$UNRESOLVED_CASES $CASENO"
   return 1 # this is a kind of bug.. something happened.
 ;;
 $UNTESTED)
-  prlog "  [UNTESTED]"
+  prlog "  [\e[34mUNTESTED\e[30m]"
   UNTESTED_CASES="$UNTESTED_CASES $CASENO"
   return 0
 ;;
 $UNSUPPORTED)
-  prlog "  [UNSUPPORTED]"
+  prlog "  [\e[34mUNSUPPORTED\e[30m]"
   UNSUPPORTED_CASES="$UNSUPPORTED_CASES $CASENO"
   return $UNSUPPORTED_RESULT # depends on use case
 ;;
 $XFAIL)
-  prlog "  [XFAIL]"
+  prlog "  [\e[31mXFAIL\e[30m]"
   XFAILED_CASES="$XFAILED_CASES $CASENO"
   return 0
 ;;
 *)
-  prlog "  [UNDEFINED]"
+  prlog "  [\e[34mUNDEFINED\e[30m]"
   UNDEFINED_CASES="$UNDEFINED_CASES $CASENO"
   return 1 # this must be a test bug
 ;;


mmotm 2018-10-12-19-18 uploaded

2018-10-12 Thread akpm
The mm-of-the-moment snapshot 2018-10-12-19-18 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (4.x
or 4.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE--mm-dd-hh-mm-ss.  Both contain the string -mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A git tree which contains the memory management portion of this tree is
maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
by Michal Hocko.  It contains the patches which are between the
"#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series
file, http://www.ozlabs.org/~akpm/mmotm/series.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/

To develop on top of mmotm git:

  $ git remote add mmotm 
git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
  $ git remote update mmotm
  $ git checkout -b topic mmotm/master
  
  $ git send-email mmotm/master.. [...]

To rebase a branch with older patches to a new mmotm release:

  $ git remote update mmotm
  $ git rebase --onto mmotm/master  topic




The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/

and use of this tree is similar to
http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/, described above.


This mmotm tree contains the following patches against 4.19-rc7:
(patches marked "*" will be included in linux-next)

  origin.patch
* ocfs2-fix-a-gcc-compiled-warning.patch
* mm-dont-clobber-partially-overlapping-vma-with-map_fixed_noreplace.patch
* mm-thp-fix-call-to-mmu_notifier-in-set_pmd_migration_entry-v2.patch
* fs-fat-add-cond_resched-to-fat_count_free_clusters.patch
* mm-thp-always-specify-disabled-vmas-as-nh-in-smaps.patch
* mm-thp-relax-__gfp_thisnode-for-madv_hugepage-mappings.patch
* arm-arch-arm-include-asm-pageh-needs-personalityh.patch
* linkageh-align-weak-symbols.patch
* arm64-lib-use-c-string-functions-with-kasan-enabled.patch
* lib-test_kasan-add-tests-for-several-string-memory-api-functions.patch
* scripts-tags-add-declare_hashtable.patch
* ocfs2-dlm-remove-unnecessary-parentheses.patch
* ocfs2-remove-unused-pointer-eb.patch
* ocfs2-fix-unneeded-null-check.patch
* fs-ocfs2-dlm-fix-a-sleep-in-atomic-context-bug-in-dlm_print_one_mle.patch
* ocfs2-remove-set-but-not-used-variable-rb.patch
* ocfs2-get-rid-of-ocfs2_is_o2cb_active-function.patch
* ocfs2-without-quota-support-try-to-avoid-calling-quota-recovery.patch
* ocfs2-dont-use-iocb-when-eiocbqueued-returns.patch
* ocfs2-fix-a-misuse-a-of-brelse-after-failing-ocfs2_check_dir_entry.patch
* ocfs2-dont-put-and-assigning-null-to-bh-allocated-outside.patch
* ocfs2-dlmglue-clean-up-timestamp-handling.patch
* fix-dead-lock-caused-by-ocfs2_defrag_extent.patch
* ocfs2-fix-dead-lock-caused-by-ocfs2_defrag_extent.patch
* fix-clusters-leak-in-ocfs2_defrag_extent.patch
* fix-clusters-leak-in-ocfs2_defrag_extent-fix.patch
* 
block-restore-proc-partitions-to-not-display-non-partitionable-removable-devices.patch
* vfs-allow-dedupe-of-user-owned-read-only-files.patch
* vfs-dedupe-should-return-eperm-if-permission-is-not-granted.patch
* fs-iomap-change-return-type-to-vm_fault_t.patch
* xtensa-use-generic-vgah.patch
  mm.patch
* mm-slubc-switch-to-bitmap_zalloc.patch
* mm-dont-warn-about-large-allocations-for-slab.patch
* slub-extend-slub-debug-to-handle-multiple-slabs.patch
* mm-rework-memcg-kernel-stack-accounting.patch
* mm-drain-memcg-stocks-on-css-offlining.patch
* mm-dont-miss-the-last-page-because-of-round-off-error.patch
* mm-dont-miss-the-last-page-because-of-round-off-error-fix.patch
* mmpage_alloc-pf_wq_worker-threads-must-sleep-at-should_reclaim_retry.patch
* mmpage_alloc-pf_wq_worker-threads-must-sleep-at-should_reclaim_retry-fix.patch
* mm-mmu_notifier-be-explicit-about-range-invalition-non-blocking-mode.patch
* 
revert-mm-mmu_notifier-annotate-mmu-notifiers-with-blockable-invalidate-callbacks.patch
* 

mmotm 2018-10-12-19-18 uploaded

2018-10-12 Thread akpm
The mm-of-the-moment snapshot 2018-10-12-19-18 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (4.x
or 4.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE--mm-dd-hh-mm-ss.  Both contain the string -mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A git tree which contains the memory management portion of this tree is
maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
by Michal Hocko.  It contains the patches which are between the
"#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series
file, http://www.ozlabs.org/~akpm/mmotm/series.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/

To develop on top of mmotm git:

  $ git remote add mmotm 
git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
  $ git remote update mmotm
  $ git checkout -b topic mmotm/master
  
  $ git send-email mmotm/master.. [...]

To rebase a branch with older patches to a new mmotm release:

  $ git remote update mmotm
  $ git rebase --onto mmotm/master  topic




The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/

and use of this tree is similar to
http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/, described above.


This mmotm tree contains the following patches against 4.19-rc7:
(patches marked "*" will be included in linux-next)

  origin.patch
* ocfs2-fix-a-gcc-compiled-warning.patch
* mm-dont-clobber-partially-overlapping-vma-with-map_fixed_noreplace.patch
* mm-thp-fix-call-to-mmu_notifier-in-set_pmd_migration_entry-v2.patch
* fs-fat-add-cond_resched-to-fat_count_free_clusters.patch
* mm-thp-always-specify-disabled-vmas-as-nh-in-smaps.patch
* mm-thp-relax-__gfp_thisnode-for-madv_hugepage-mappings.patch
* arm-arch-arm-include-asm-pageh-needs-personalityh.patch
* linkageh-align-weak-symbols.patch
* arm64-lib-use-c-string-functions-with-kasan-enabled.patch
* lib-test_kasan-add-tests-for-several-string-memory-api-functions.patch
* scripts-tags-add-declare_hashtable.patch
* ocfs2-dlm-remove-unnecessary-parentheses.patch
* ocfs2-remove-unused-pointer-eb.patch
* ocfs2-fix-unneeded-null-check.patch
* fs-ocfs2-dlm-fix-a-sleep-in-atomic-context-bug-in-dlm_print_one_mle.patch
* ocfs2-remove-set-but-not-used-variable-rb.patch
* ocfs2-get-rid-of-ocfs2_is_o2cb_active-function.patch
* ocfs2-without-quota-support-try-to-avoid-calling-quota-recovery.patch
* ocfs2-dont-use-iocb-when-eiocbqueued-returns.patch
* ocfs2-fix-a-misuse-a-of-brelse-after-failing-ocfs2_check_dir_entry.patch
* ocfs2-dont-put-and-assigning-null-to-bh-allocated-outside.patch
* ocfs2-dlmglue-clean-up-timestamp-handling.patch
* fix-dead-lock-caused-by-ocfs2_defrag_extent.patch
* ocfs2-fix-dead-lock-caused-by-ocfs2_defrag_extent.patch
* fix-clusters-leak-in-ocfs2_defrag_extent.patch
* fix-clusters-leak-in-ocfs2_defrag_extent-fix.patch
* 
block-restore-proc-partitions-to-not-display-non-partitionable-removable-devices.patch
* vfs-allow-dedupe-of-user-owned-read-only-files.patch
* vfs-dedupe-should-return-eperm-if-permission-is-not-granted.patch
* fs-iomap-change-return-type-to-vm_fault_t.patch
* xtensa-use-generic-vgah.patch
  mm.patch
* mm-slubc-switch-to-bitmap_zalloc.patch
* mm-dont-warn-about-large-allocations-for-slab.patch
* slub-extend-slub-debug-to-handle-multiple-slabs.patch
* mm-rework-memcg-kernel-stack-accounting.patch
* mm-drain-memcg-stocks-on-css-offlining.patch
* mm-dont-miss-the-last-page-because-of-round-off-error.patch
* mm-dont-miss-the-last-page-because-of-round-off-error-fix.patch
* mmpage_alloc-pf_wq_worker-threads-must-sleep-at-should_reclaim_retry.patch
* mmpage_alloc-pf_wq_worker-threads-must-sleep-at-should_reclaim_retry-fix.patch
* mm-mmu_notifier-be-explicit-about-range-invalition-non-blocking-mode.patch
* 
revert-mm-mmu_notifier-annotate-mmu-notifiers-with-blockable-invalidate-callbacks.patch
* 

Re: [PATCH for 4.19] tracepoint: Fix: out-of-bound tracepoint array iteration

2018-10-12 Thread kbuild test robot
Hi Mathieu,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.19-rc7 next-20181012]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Mathieu-Desnoyers/tracepoint-Fix-out-of-bound-tracepoint-array-iteration/20181013-073410
config: arm-allyesconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   In file included from drivers/media/media-devnode.c:33:0:
>> include/linux/module.h:433:2: error: unknown type name 'tracepoint_ptr_t'
 tracepoint_ptr_t *tracepoints_ptrs;
 ^~~~

vim +/tracepoint_ptr_t +433 include/linux/module.h

   430  
   431  #ifdef CONFIG_TRACEPOINTS
   432  unsigned int num_tracepoints;
 > 433  tracepoint_ptr_t *tracepoints_ptrs;
   434  #endif
   435  #ifdef HAVE_JUMP_LABEL
   436  struct jump_entry *jump_entries;
   437  unsigned int num_jump_entries;
   438  #endif
   439  #ifdef CONFIG_TRACING
   440  unsigned int num_trace_bprintk_fmt;
   441  const char **trace_bprintk_fmt_start;
   442  #endif
   443  #ifdef CONFIG_EVENT_TRACING
   444  struct trace_event_call **trace_events;
   445  unsigned int num_trace_events;
   446  struct trace_eval_map **trace_evals;
   447  unsigned int num_trace_evals;
   448  #endif
   449  #ifdef CONFIG_FTRACE_MCOUNT_RECORD
   450  unsigned int num_ftrace_callsites;
   451  unsigned long *ftrace_callsites;
   452  #endif
   453  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH for 4.19] tracepoint: Fix: out-of-bound tracepoint array iteration

2018-10-12 Thread kbuild test robot
Hi Mathieu,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.19-rc7 next-20181012]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Mathieu-Desnoyers/tracepoint-Fix-out-of-bound-tracepoint-array-iteration/20181013-073410
config: arm-allyesconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   In file included from drivers/media/media-devnode.c:33:0:
>> include/linux/module.h:433:2: error: unknown type name 'tracepoint_ptr_t'
 tracepoint_ptr_t *tracepoints_ptrs;
 ^~~~

vim +/tracepoint_ptr_t +433 include/linux/module.h

   430  
   431  #ifdef CONFIG_TRACEPOINTS
   432  unsigned int num_tracepoints;
 > 433  tracepoint_ptr_t *tracepoints_ptrs;
   434  #endif
   435  #ifdef HAVE_JUMP_LABEL
   436  struct jump_entry *jump_entries;
   437  unsigned int num_jump_entries;
   438  #endif
   439  #ifdef CONFIG_TRACING
   440  unsigned int num_trace_bprintk_fmt;
   441  const char **trace_bprintk_fmt_start;
   442  #endif
   443  #ifdef CONFIG_EVENT_TRACING
   444  struct trace_event_call **trace_events;
   445  unsigned int num_trace_events;
   446  struct trace_eval_map **trace_evals;
   447  unsigned int num_trace_evals;
   448  #endif
   449  #ifdef CONFIG_FTRACE_MCOUNT_RECORD
   450  unsigned int num_ftrace_callsites;
   451  unsigned long *ftrace_callsites;
   452  #endif
   453  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] MAINTAINERS: Replace Vince Bridgers as Altera TSE maintainer

2018-10-12 Thread Alan Tull
On Fri, Oct 12, 2018 at 8:12 PM Bridgers, Vince
 wrote:
>
>
>
> -Original Message-
> From: Thor Thayer 
> Sent: Friday, October 12, 2018 3:49 PM
> To: Greg KH 
> Cc: geert+rene...@glider.be; jho...@kernel.org; at...@kernel.org; 
> bhelg...@google.com; james.hart...@sondrel.com; linux-kernel@vger.kernel.org; 
> Bridgers, Vince 
> Subject: Re: [PATCH] MAINTAINERS: Replace Vince Bridgers as Altera TSE 
> maintainer
>
> On 10/12/2018 11:57 AM, Greg KH wrote:
> > On Fri, Oct 12, 2018 at 11:50:52AM -0500, thor.tha...@linux.intel.com wrote:
> >> From: Thor Thayer 
> >>
> >> Vince has moved to a different role. Replace him as Altera TSE
> >> maintainer.
> >>
> >> Signed-off-by: Thor Thayer 
> >
> > Would be nice if Vince can ack this...
> >
> + Vince (at a new email address)
>
> Thanks Thor,
>
> Acked-by: Vince Bridgers 

Acked-by:Alan Tull 

Alan


Re: [PATCH] MAINTAINERS: Replace Vince Bridgers as Altera TSE maintainer

2018-10-12 Thread Alan Tull
On Fri, Oct 12, 2018 at 8:12 PM Bridgers, Vince
 wrote:
>
>
>
> -Original Message-
> From: Thor Thayer 
> Sent: Friday, October 12, 2018 3:49 PM
> To: Greg KH 
> Cc: geert+rene...@glider.be; jho...@kernel.org; at...@kernel.org; 
> bhelg...@google.com; james.hart...@sondrel.com; linux-kernel@vger.kernel.org; 
> Bridgers, Vince 
> Subject: Re: [PATCH] MAINTAINERS: Replace Vince Bridgers as Altera TSE 
> maintainer
>
> On 10/12/2018 11:57 AM, Greg KH wrote:
> > On Fri, Oct 12, 2018 at 11:50:52AM -0500, thor.tha...@linux.intel.com wrote:
> >> From: Thor Thayer 
> >>
> >> Vince has moved to a different role. Replace him as Altera TSE
> >> maintainer.
> >>
> >> Signed-off-by: Thor Thayer 
> >
> > Would be nice if Vince can ack this...
> >
> + Vince (at a new email address)
>
> Thanks Thor,
>
> Acked-by: Vince Bridgers 

Acked-by:Alan Tull 

Alan


Re: [PATCH] docs: Fix typos in histogram.rst

2018-10-12 Thread Steven Rostedt
On Sat, 13 Oct 2018 10:30:57 +0900
Masanari Iida  wrote:

> This patch fixes some spelling typos.
> 
> Signed-off-by: Masanari Iida 
> ---
>  Documentation/trace/histogram.rst | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/trace/histogram.rst 
> b/Documentation/trace/histogram.rst
> index 5ac724baea7d..7dda76503127 100644
> --- a/Documentation/trace/histogram.rst
> +++ b/Documentation/trace/histogram.rst
> @@ -1765,7 +1765,7 @@ For example, here's how a latency can be calculated::
># echo 'hist:keys=pid,prio:ts0=common_timestamp ...' >> event1/trigger
># echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ...' >> 
> event2/trigger
>  
> -In the first line above, the event's timetamp is saved into the
> +In the first line above, the event's timestamp is saved into the
>  variable ts0.  In the next line, ts0 is subtracted from the second
>  event's timestamp to produce the latency, which is then assigned into
>  yet another variable, 'wakeup_lat'.  The hist trigger below in turn
> @@ -1811,7 +1811,7 @@ the command that defined it with a '!'::
>  /sys/kernel/debug/tracing/synthetic_events
>  
>  At this point, there isn't yet an actual 'wakeup_latency' event
> -instantiated in the event subsytem - for this to happen, a 'hist
> +instantiated in the event subsystem - for this to happen, a 'hist

The first two appear to show that Tom has a faulty 's' key ;-)

Acked-by: Steven Rostedt (VMware) 

-- Steve



>  trigger action' needs to be instantiated and bound to actual fields
>  and variables defined on other events (see Section 2.2.3 below on
>  how that is done using hist trigger 'onmatch' action). Once that is
> @@ -1837,7 +1837,7 @@ output can be displayed by reading the event's 'hist' 
> file.
>  A hist trigger 'action' is a function that's executed whenever a
>  histogram entry is added or updated.
>  
> -The default 'action' if no special function is explicity specified is
> +The default 'action' if no special function is explicitly specified is
>  as it always has been, to simply update the set of values associated
>  with an entry.  Some applications, however, may want to perform
>  additional actions at that point, such as generate another event, or



Re: [PATCH] docs: Fix typos in histogram.rst

2018-10-12 Thread Steven Rostedt
On Sat, 13 Oct 2018 10:30:57 +0900
Masanari Iida  wrote:

> This patch fixes some spelling typos.
> 
> Signed-off-by: Masanari Iida 
> ---
>  Documentation/trace/histogram.rst | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/trace/histogram.rst 
> b/Documentation/trace/histogram.rst
> index 5ac724baea7d..7dda76503127 100644
> --- a/Documentation/trace/histogram.rst
> +++ b/Documentation/trace/histogram.rst
> @@ -1765,7 +1765,7 @@ For example, here's how a latency can be calculated::
># echo 'hist:keys=pid,prio:ts0=common_timestamp ...' >> event1/trigger
># echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ...' >> 
> event2/trigger
>  
> -In the first line above, the event's timetamp is saved into the
> +In the first line above, the event's timestamp is saved into the
>  variable ts0.  In the next line, ts0 is subtracted from the second
>  event's timestamp to produce the latency, which is then assigned into
>  yet another variable, 'wakeup_lat'.  The hist trigger below in turn
> @@ -1811,7 +1811,7 @@ the command that defined it with a '!'::
>  /sys/kernel/debug/tracing/synthetic_events
>  
>  At this point, there isn't yet an actual 'wakeup_latency' event
> -instantiated in the event subsytem - for this to happen, a 'hist
> +instantiated in the event subsystem - for this to happen, a 'hist

The first two appear to show that Tom has a faulty 's' key ;-)

Acked-by: Steven Rostedt (VMware) 

-- Steve



>  trigger action' needs to be instantiated and bound to actual fields
>  and variables defined on other events (see Section 2.2.3 below on
>  how that is done using hist trigger 'onmatch' action). Once that is
> @@ -1837,7 +1837,7 @@ output can be displayed by reading the event's 'hist' 
> file.
>  A hist trigger 'action' is a function that's executed whenever a
>  histogram entry is added or updated.
>  
> -The default 'action' if no special function is explicity specified is
> +The default 'action' if no special function is explicitly specified is
>  as it always has been, to simply update the set of values associated
>  with an entry.  Some applications, however, may want to perform
>  additional actions at that point, such as generate another event, or



ubifs: WARNINGs

2018-10-12 Thread Randy Dunlap
Hi,

Linux 4.19-rc7, x86_64 laptop.

I don't have an ubifs filesystem.  When I just modprobe ubifs and then
rmmod ubifs, I get 2 WARNINGs from these 2 lines:

static void __exit ubifs_exit(void)
{
WARN_ON(list_empty(_infos));
WARN_ON(atomic_long_read(_clean_zn_cnt) == 0);

Is this normal/expected?



[  977.091170] calling  init_mtd+0x0/0x100 [mtd] @ 3187
[  977.092365] initcall init_mtd+0x0/0x100 [mtd] returned 0 after 251 usecs
[  977.146490] calling  ubi_init+0x0/0x249 [ubi] @ 3187
[  977.148351] initcall ubi_init+0x0/0x249 [ubi] returned 0 after 896 usecs
[  977.260504] calling  ubifs_init+0x0/0xae [ubifs] @ 3187
[  977.261976] initcall ubifs_init+0x0/0xae [ubifs] returned 0 after 516 usecs
[  987.695589] WARNING: CPU: 2 PID: 3217 at ../fs/ubifs/super.c:2340 
ubifs_exit+0xf/0x488 [ubifs]
[  987.697142] Modules linked in: ubifs(-) ubi mtd fuse xt_tcpudp ip6t_rpfilter 
ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set 
nfnetlink af_packet ebtable_nat ebtable_broute bridge stp llc ip6table_nat 
nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat 
nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle 
iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables 
iptable_filter ip_tables x_tables bpfilter snd_hda_codec_hdmi 
snd_hda_codec_realtek btrfs snd_hda_codec_generic xor coretemp zstd_compress 
hwmon i915 raid6_pq uvcvideo intel_rapl x86_pkg_temp_thermal intel_powerclamp 
libcrc32c zstd_decompress xxhash videobuf2_vmalloc kvm_intel videobuf2_memops 
crct10dif_pclmul videobuf2_v4l2 videobuf2_common hid_generic
[  987.709985]  videodev arc4 usbmouse crc32_pclmul usbhid iwldvm media 
crc32c_intel hid ghash_clmulni_intel pcbc mac80211 msr aesni_intel aes_x86_64 
crypto_simd cryptd glue_helper iwlwifi kvmgt vfio_mdev mdev vfio_iommu_type1 
vfio intel_cstate snd_hda_intel snd_hda_codec gpio_ich sdhci_pci iTCO_wdt kvm 
iTCO_vendor_support snd_hda_core cqhci cfg80211 intel_uncore intel_rapl_perf 
snd_hwdep sdhci snd_pcm irqbypass joydev mei_me mmc_core mei toshiba_acpi evdev 
sparse_keymap e1000e mousedev snd_timer input_leds mac_hid sr_mod snd 
uio_pdrv_genirq lpc_ich uio rfkill wmi serio_raw led_class industrialio cdrom 
thermal video soundcore button pcc_cpufreq ac pcspkr toshiba_haps battery sg 
dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
[  987.721976] CPU: 2 PID: 3217 Comm: rmmod Not tainted 4.19.0-rc7mod #2
[  987.723150] Hardware name: TOSHIBA PORTEGE R835/Portable PC, BIOS Version 
4.10   01/08/2013
[  987.724583] RIP: 0010:ubifs_exit+0xf/0x488 [ubifs]
[  987.725377] Code: c3 8b 97 24 0a 00 00 6a 00 45 31 c9 41 b8 10 00 00 00 31 
c9 e8 3a 32 ff ff 5a c3 48 8b 05 b1 e6 01 00 48 3d 30 62 fb c0 75 02 <0f> 0b 48 
8b 05 f0 3b 02 00 48 85 c0 75 02 0f 0b e8 44 b9 ff ff e8
[  987.728689] RSP: 0018:a3a18286fed0 EFLAGS: 00010246
[  987.729608] RAX: c0fb6230 RBX: c0fbb400 RCX: 0001
[  987.730907] RDX:  RSI: 0001 RDI: 0246
[  987.732211] RBP:  R08: 0001 R09: 0001
[  987.733512] R10:  R11:  R12: 0800
[  987.734807] R13:  R14:  R15: 
[  987.736103] FS:  7fd2ded67b80() GS:88664ae0() 
knlGS:
[  987.737527] CS:  0010 DS:  ES:  CR0: 80050033
[  987.738571] CR2: 55b4d9cd4ed8 CR3: 0001352b4004 CR4: 000606e0
[  987.739869] Call Trace:
[  987.740292]  __x64_sys_delete_module+0x157/0x240
[  987.741086]  ? task_work_run+0x6e/0xb0
[  987.741757]  do_syscall_64+0x60/0x180
[  987.742426]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  987.743346] RIP: 0033:0x7fd2de453f87
[  987.744013] Code: 73 01 c3 48 8b 0d 11 af 2b 00 f7 d8 64 89 01 48 83 c8 ff 
c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d e1 ae 2b 00 f7 d8 64 89 01 48
[  987.747326] RSP: 002b:7ffe66c1a6d8 EFLAGS: 0206 ORIG_RAX: 
00b0
[  987.748624] RAX: ffda RBX: 7ffe66c1a738 RCX: 7fd2de453f87
[  987.749919] RDX: 000a RSI: 0800 RDI: 55b4d9cca7e8
[  987.751215] RBP: 55b4d9cca780 R08: 7ffe66c19651 R09: 
[  987.752512] R10: 7fd2de4c31e0 R11: 0206 R12: 7ffe66c1a900
[  987.753808] R13: 7ffe66c1c256 R14: 55b4d9cca260 R15: 55b4d9cca780
[  987.755111] ---[ end trace 512f0dc2470230e0 ]---
[  987.755997] WARNING: CPU: 2 PID: 3217 at ../fs/ubifs/super.c:2341 
ubifs_exit+0x1d/0x488 [ubifs]
[  987.756383] systemd-journald[142]: Compressed data object 804 -> 577 using 
LZ4
[  987.757553] Modules linked in: ubifs(-) ubi mtd fuse xt_tcpudp ip6t_rpfilter 
ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set 
nfnetlink af_packet ebtable_nat ebtable_broute bridge stp llc ip6table_nat 
nf_nat_ipv6 

ubifs: WARNINGs

2018-10-12 Thread Randy Dunlap
Hi,

Linux 4.19-rc7, x86_64 laptop.

I don't have an ubifs filesystem.  When I just modprobe ubifs and then
rmmod ubifs, I get 2 WARNINGs from these 2 lines:

static void __exit ubifs_exit(void)
{
WARN_ON(list_empty(_infos));
WARN_ON(atomic_long_read(_clean_zn_cnt) == 0);

Is this normal/expected?



[  977.091170] calling  init_mtd+0x0/0x100 [mtd] @ 3187
[  977.092365] initcall init_mtd+0x0/0x100 [mtd] returned 0 after 251 usecs
[  977.146490] calling  ubi_init+0x0/0x249 [ubi] @ 3187
[  977.148351] initcall ubi_init+0x0/0x249 [ubi] returned 0 after 896 usecs
[  977.260504] calling  ubifs_init+0x0/0xae [ubifs] @ 3187
[  977.261976] initcall ubifs_init+0x0/0xae [ubifs] returned 0 after 516 usecs
[  987.695589] WARNING: CPU: 2 PID: 3217 at ../fs/ubifs/super.c:2340 
ubifs_exit+0xf/0x488 [ubifs]
[  987.697142] Modules linked in: ubifs(-) ubi mtd fuse xt_tcpudp ip6t_rpfilter 
ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set 
nfnetlink af_packet ebtable_nat ebtable_broute bridge stp llc ip6table_nat 
nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat 
nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle 
iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables 
iptable_filter ip_tables x_tables bpfilter snd_hda_codec_hdmi 
snd_hda_codec_realtek btrfs snd_hda_codec_generic xor coretemp zstd_compress 
hwmon i915 raid6_pq uvcvideo intel_rapl x86_pkg_temp_thermal intel_powerclamp 
libcrc32c zstd_decompress xxhash videobuf2_vmalloc kvm_intel videobuf2_memops 
crct10dif_pclmul videobuf2_v4l2 videobuf2_common hid_generic
[  987.709985]  videodev arc4 usbmouse crc32_pclmul usbhid iwldvm media 
crc32c_intel hid ghash_clmulni_intel pcbc mac80211 msr aesni_intel aes_x86_64 
crypto_simd cryptd glue_helper iwlwifi kvmgt vfio_mdev mdev vfio_iommu_type1 
vfio intel_cstate snd_hda_intel snd_hda_codec gpio_ich sdhci_pci iTCO_wdt kvm 
iTCO_vendor_support snd_hda_core cqhci cfg80211 intel_uncore intel_rapl_perf 
snd_hwdep sdhci snd_pcm irqbypass joydev mei_me mmc_core mei toshiba_acpi evdev 
sparse_keymap e1000e mousedev snd_timer input_leds mac_hid sr_mod snd 
uio_pdrv_genirq lpc_ich uio rfkill wmi serio_raw led_class industrialio cdrom 
thermal video soundcore button pcc_cpufreq ac pcspkr toshiba_haps battery sg 
dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
[  987.721976] CPU: 2 PID: 3217 Comm: rmmod Not tainted 4.19.0-rc7mod #2
[  987.723150] Hardware name: TOSHIBA PORTEGE R835/Portable PC, BIOS Version 
4.10   01/08/2013
[  987.724583] RIP: 0010:ubifs_exit+0xf/0x488 [ubifs]
[  987.725377] Code: c3 8b 97 24 0a 00 00 6a 00 45 31 c9 41 b8 10 00 00 00 31 
c9 e8 3a 32 ff ff 5a c3 48 8b 05 b1 e6 01 00 48 3d 30 62 fb c0 75 02 <0f> 0b 48 
8b 05 f0 3b 02 00 48 85 c0 75 02 0f 0b e8 44 b9 ff ff e8
[  987.728689] RSP: 0018:a3a18286fed0 EFLAGS: 00010246
[  987.729608] RAX: c0fb6230 RBX: c0fbb400 RCX: 0001
[  987.730907] RDX:  RSI: 0001 RDI: 0246
[  987.732211] RBP:  R08: 0001 R09: 0001
[  987.733512] R10:  R11:  R12: 0800
[  987.734807] R13:  R14:  R15: 
[  987.736103] FS:  7fd2ded67b80() GS:88664ae0() 
knlGS:
[  987.737527] CS:  0010 DS:  ES:  CR0: 80050033
[  987.738571] CR2: 55b4d9cd4ed8 CR3: 0001352b4004 CR4: 000606e0
[  987.739869] Call Trace:
[  987.740292]  __x64_sys_delete_module+0x157/0x240
[  987.741086]  ? task_work_run+0x6e/0xb0
[  987.741757]  do_syscall_64+0x60/0x180
[  987.742426]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  987.743346] RIP: 0033:0x7fd2de453f87
[  987.744013] Code: 73 01 c3 48 8b 0d 11 af 2b 00 f7 d8 64 89 01 48 83 c8 ff 
c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d e1 ae 2b 00 f7 d8 64 89 01 48
[  987.747326] RSP: 002b:7ffe66c1a6d8 EFLAGS: 0206 ORIG_RAX: 
00b0
[  987.748624] RAX: ffda RBX: 7ffe66c1a738 RCX: 7fd2de453f87
[  987.749919] RDX: 000a RSI: 0800 RDI: 55b4d9cca7e8
[  987.751215] RBP: 55b4d9cca780 R08: 7ffe66c19651 R09: 
[  987.752512] R10: 7fd2de4c31e0 R11: 0206 R12: 7ffe66c1a900
[  987.753808] R13: 7ffe66c1c256 R14: 55b4d9cca260 R15: 55b4d9cca780
[  987.755111] ---[ end trace 512f0dc2470230e0 ]---
[  987.755997] WARNING: CPU: 2 PID: 3217 at ../fs/ubifs/super.c:2341 
ubifs_exit+0x1d/0x488 [ubifs]
[  987.756383] systemd-journald[142]: Compressed data object 804 -> 577 using 
LZ4
[  987.757553] Modules linked in: ubifs(-) ubi mtd fuse xt_tcpudp ip6t_rpfilter 
ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set 
nfnetlink af_packet ebtable_nat ebtable_broute bridge stp llc ip6table_nat 
nf_nat_ipv6 

Re: [PATCH] docs: Fix typos in histogram.rst

2018-10-12 Thread Randy Dunlap
On 10/12/18 6:30 PM, Masanari Iida wrote:
> This patch fixes some spelling typos.
> 
> Signed-off-by: Masanari Iida 
> ---
>  Documentation/trace/histogram.rst | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Acked-by: Randy Dunlap 

thanks.
-- 
~Randy


Re: [PATCH] docs: Fix typos in histogram.rst

2018-10-12 Thread Randy Dunlap
On 10/12/18 6:30 PM, Masanari Iida wrote:
> This patch fixes some spelling typos.
> 
> Signed-off-by: Masanari Iida 
> ---
>  Documentation/trace/histogram.rst | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Acked-by: Randy Dunlap 

thanks.
-- 
~Randy


[PATCH] docs: Fix typos in histogram.rst

2018-10-12 Thread Masanari Iida
This patch fixes some spelling typos.

Signed-off-by: Masanari Iida 
---
 Documentation/trace/histogram.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/trace/histogram.rst 
b/Documentation/trace/histogram.rst
index 5ac724baea7d..7dda76503127 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1765,7 +1765,7 @@ For example, here's how a latency can be calculated::
   # echo 'hist:keys=pid,prio:ts0=common_timestamp ...' >> event1/trigger
   # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ...' >> 
event2/trigger
 
-In the first line above, the event's timetamp is saved into the
+In the first line above, the event's timestamp is saved into the
 variable ts0.  In the next line, ts0 is subtracted from the second
 event's timestamp to produce the latency, which is then assigned into
 yet another variable, 'wakeup_lat'.  The hist trigger below in turn
@@ -1811,7 +1811,7 @@ the command that defined it with a '!'::
 /sys/kernel/debug/tracing/synthetic_events
 
 At this point, there isn't yet an actual 'wakeup_latency' event
-instantiated in the event subsytem - for this to happen, a 'hist
+instantiated in the event subsystem - for this to happen, a 'hist
 trigger action' needs to be instantiated and bound to actual fields
 and variables defined on other events (see Section 2.2.3 below on
 how that is done using hist trigger 'onmatch' action). Once that is
@@ -1837,7 +1837,7 @@ output can be displayed by reading the event's 'hist' 
file.
 A hist trigger 'action' is a function that's executed whenever a
 histogram entry is added or updated.
 
-The default 'action' if no special function is explicity specified is
+The default 'action' if no special function is explicitly specified is
 as it always has been, to simply update the set of values associated
 with an entry.  Some applications, however, may want to perform
 additional actions at that point, such as generate another event, or
-- 
2.19.1.328.g5a0cc8aca797



[PATCH] docs: Fix typos in histogram.rst

2018-10-12 Thread Masanari Iida
This patch fixes some spelling typos.

Signed-off-by: Masanari Iida 
---
 Documentation/trace/histogram.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/trace/histogram.rst 
b/Documentation/trace/histogram.rst
index 5ac724baea7d..7dda76503127 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1765,7 +1765,7 @@ For example, here's how a latency can be calculated::
   # echo 'hist:keys=pid,prio:ts0=common_timestamp ...' >> event1/trigger
   # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ...' >> 
event2/trigger
 
-In the first line above, the event's timetamp is saved into the
+In the first line above, the event's timestamp is saved into the
 variable ts0.  In the next line, ts0 is subtracted from the second
 event's timestamp to produce the latency, which is then assigned into
 yet another variable, 'wakeup_lat'.  The hist trigger below in turn
@@ -1811,7 +1811,7 @@ the command that defined it with a '!'::
 /sys/kernel/debug/tracing/synthetic_events
 
 At this point, there isn't yet an actual 'wakeup_latency' event
-instantiated in the event subsytem - for this to happen, a 'hist
+instantiated in the event subsystem - for this to happen, a 'hist
 trigger action' needs to be instantiated and bound to actual fields
 and variables defined on other events (see Section 2.2.3 below on
 how that is done using hist trigger 'onmatch' action). Once that is
@@ -1837,7 +1837,7 @@ output can be displayed by reading the event's 'hist' 
file.
 A hist trigger 'action' is a function that's executed whenever a
 histogram entry is added or updated.
 
-The default 'action' if no special function is explicity specified is
+The default 'action' if no special function is explicitly specified is
 as it always has been, to simply update the set of values associated
 with an entry.  Some applications, however, may want to perform
 additional actions at that point, such as generate another event, or
-- 
2.19.1.328.g5a0cc8aca797



[PATCH] x86/boot: Add -Wno-pointer-sign to KBUILD_CFLAGS

2018-10-12 Thread Nathan Chancellor
When compiling the kernel with Clang, this warning appears even though
it is disabled for the whole kernel because this folder has its own set
of KBUILD_CFLAGS. It was disabled before the beginning of git history.

In file included from arch/x86/boot/compressed/kaslr.c:29:
In file included from arch/x86/boot/compressed/misc.h:21:
In file included from ./include/linux/elf.h:5:
In file included from ./arch/x86/include/asm/elf.h:77:
In file included from ./arch/x86/include/asm/vdso.h:11:
In file included from ./include/linux/mm_types.h:9:
In file included from ./include/linux/spinlock.h:88:
In file included from ./arch/x86/include/asm/spinlock.h:43:
In file included from ./arch/x86/include/asm/qrwlock.h:6:
./include/asm-generic/qrwlock.h:101:53: warning: passing 'u32 *' (aka
'unsigned int *') to parameter of type 'int *' converts between pointers
to integer types with different sign [-Wpointer-sign]
if (likely(atomic_try_cmpxchg_acquire(>cnts, , _QW_LOCKED)))
   ^
./include/linux/compiler.h:76:40: note: expanded from macro 'likely'
# define likely(x)  __builtin_expect(!!(x), 1)
^
./include/asm-generic/atomic-instrumented.h:69:66: note: passing
argument to parameter 'old' here
static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 ^

Signed-off-by: Nathan Chancellor 
---

There has been some discussion around whether KBUILD_CFLAGS should be
overwritten in the folder (few examples linked below). In the meantime,
there is precedent for taking top level disabled warnings and adding
them here, thus this patch.

* 
https://lore.kernel.org/lkml/CAKwvOdmXcztP542kADhyJYN2=Fk3qyXif_MCs=kqpge8qtt...@mail.gmail.com/
* https://lore.kernel.org/lkml/56442061-7f55-878d-5b26-7cdd14e90...@zytor.com/

 arch/x86/boot/compressed/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 28764dacf018..466f66c8a7f8 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -37,6 +37,7 @@ KBUILD_CFLAGS += $(call cc-option,-ffreestanding)
 KBUILD_CFLAGS += $(call cc-option,-fno-stack-protector)
 KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
 KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
+KBUILD_CFLAGS += -Wno-pointer-sign
 
 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n
-- 
2.19.1



[PATCH 1/6] zorro_esp: Limit DMA transfers to 65535 bytes

2018-10-12 Thread Finn Thain
The core driver, esp_scsi, does not use the ESP_CONFIG2_FENAB bit, so
the chip's Transfer Counter register is only 16 bits wide (not 24).
A larger transfer cannot work and will theoretically result in a failed
command and a "DMA length is zero" error.

Fixes: 3109e5ae0311
Signed-off-by: Finn Thain 
Cc: Michael Schmitz 
---
 drivers/scsi/zorro_esp.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/scsi/zorro_esp.c b/drivers/scsi/zorro_esp.c
index bb70882e6b56..be79127db594 100644
--- a/drivers/scsi/zorro_esp.c
+++ b/drivers/scsi/zorro_esp.c
@@ -245,7 +245,7 @@ static int fastlane_esp_irq_pending(struct esp *esp)
 static u32 zorro_esp_dma_length_limit(struct esp *esp, u32 dma_addr,
u32 dma_len)
 {
-   return dma_len > 0xFF ? 0xFF : dma_len;
+   return dma_len > 0x ? 0x : dma_len;
 }
 
 static void zorro_esp_reset_dma(struct esp *esp)
@@ -484,7 +484,6 @@ static void zorro_esp_send_blz1230_dma_cmd(struct esp *esp, 
u32 addr,
scsi_esp_cmd(esp, ESP_CMD_DMA);
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
scsi_esp_cmd(esp, cmd);
 }
@@ -529,7 +528,6 @@ static void zorro_esp_send_blz1230II_dma_cmd(struct esp 
*esp, u32 addr,
scsi_esp_cmd(esp, ESP_CMD_DMA);
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
scsi_esp_cmd(esp, cmd);
 }
@@ -574,7 +572,6 @@ static void zorro_esp_send_blz2060_dma_cmd(struct esp *esp, 
u32 addr,
scsi_esp_cmd(esp, ESP_CMD_DMA);
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
scsi_esp_cmd(esp, cmd);
 }
@@ -599,7 +596,6 @@ static void zorro_esp_send_cyber_dma_cmd(struct esp *esp, 
u32 addr,
 
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
if (write) {
/* DMA receive */
@@ -649,7 +645,6 @@ static void zorro_esp_send_cyberII_dma_cmd(struct esp *esp, 
u32 addr,
 
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
if (write) {
/* DMA receive */
@@ -691,7 +686,6 @@ static void zorro_esp_send_fastlane_dma_cmd(struct esp 
*esp, u32 addr,
 
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
if (write) {
/* DMA receive */
-- 
2.16.4



[PATCH 2/6] esp_scsi: Track residual for PIO transfers

2018-10-12 Thread Finn Thain
If a target disconnects during a PIO data transfer the command may
fail when the target reconnects:

scsi host1: DMA length is zero!
scsi host1: cur adr[0438] len[]

The scsi bus is then reset. This happens because the residual reached
zero before the transfer was completed.

The usual residual calculation relies on the Transfer Count registers
which works for DMA transfers but not for PIO transfers. Fix the problem
by storing the PIO transfer residual and using that to correctly
calculate bytes_sent.

Fixes: 6fe07aaffbf0
Tested-by: Stan Johnson 
Signed-off-by: Finn Thain 
---
 drivers/scsi/esp_scsi.c | 1 +
 drivers/scsi/esp_scsi.h | 2 ++
 drivers/scsi/mac_esp.c  | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/drivers/scsi/esp_scsi.c b/drivers/scsi/esp_scsi.c
index c3fc34b9964d..9e5d3f7d29ae 100644
--- a/drivers/scsi/esp_scsi.c
+++ b/drivers/scsi/esp_scsi.c
@@ -1338,6 +1338,7 @@ static int esp_data_bytes_sent(struct esp *esp, struct 
esp_cmd_entry *ent,
 
bytes_sent = esp->data_dma_len;
bytes_sent -= ecount;
+   bytes_sent -= esp->send_cmd_residual;
 
/*
 * The am53c974 has a DMA 'pecularity'. The doc states:
diff --git a/drivers/scsi/esp_scsi.h b/drivers/scsi/esp_scsi.h
index 8163dca2071b..db4b6ea94caa 100644
--- a/drivers/scsi/esp_scsi.h
+++ b/drivers/scsi/esp_scsi.h
@@ -540,6 +540,8 @@ struct esp {
 
void*dma;
int dmarev;
+
+   int send_cmd_residual;
 };
 
 /* A front-end driver for the ESP chip should do the following in
diff --git a/drivers/scsi/mac_esp.c b/drivers/scsi/mac_esp.c
index eb551f3cc471..71879f2207e0 100644
--- a/drivers/scsi/mac_esp.c
+++ b/drivers/scsi/mac_esp.c
@@ -427,6 +427,8 @@ static void mac_esp_send_pio_cmd(struct esp *esp, u32 addr, 
u32 esp_count,
scsi_esp_cmd(esp, ESP_CMD_TI);
}
}
+
+   esp->send_cmd_residual = esp_count;
 }
 
 static int mac_esp_irq_pending(struct esp *esp)
-- 
2.16.4



RE: [PATCH] MAINTAINERS: Replace Vince Bridgers as Altera TSE maintainer

2018-10-12 Thread Bridgers, Vince


-Original Message-
From: Thor Thayer  
Sent: Friday, October 12, 2018 3:49 PM
To: Greg KH 
Cc: geert+rene...@glider.be; jho...@kernel.org; at...@kernel.org; 
bhelg...@google.com; james.hart...@sondrel.com; linux-kernel@vger.kernel.org; 
Bridgers, Vince 
Subject: Re: [PATCH] MAINTAINERS: Replace Vince Bridgers as Altera TSE 
maintainer

On 10/12/2018 11:57 AM, Greg KH wrote:
> On Fri, Oct 12, 2018 at 11:50:52AM -0500, thor.tha...@linux.intel.com wrote:
>> From: Thor Thayer 
>>
>> Vince has moved to a different role. Replace him as Altera TSE 
>> maintainer.
>>
>> Signed-off-by: Thor Thayer 
> 
> Would be nice if Vince can ack this...
> 
+ Vince (at a new email address)

Thanks Thor, 

Acked-by: Vince Bridgers  


[PATCH 5/6] esp_scsi: De-duplicate PIO routines

2018-10-12 Thread Finn Thain
As a temporary measure, the code to implement PIO transfers was
duplicated in zorro_esp and mac_esp. Now that this code has stabilized,
move it into the core driver to eliminate the duplication.

This replaces the inline assembler with more portable writesb() calls.
Optimizing the m68k writesb() implementation is a separate patch.

Tested-by: Stan Johnson 
Signed-off-by: Finn Thain 
---
 drivers/scsi/esp_scsi.c  | 126 +
 drivers/scsi/esp_scsi.h  |   5 +
 drivers/scsi/mac_esp.c   | 173 ++-
 drivers/scsi/zorro_esp.c | 232 +++
 4 files changed, 171 insertions(+), 365 deletions(-)

diff --git a/drivers/scsi/esp_scsi.c b/drivers/scsi/esp_scsi.c
index 6ccaf818357e..646701fc22a4 100644
--- a/drivers/scsi/esp_scsi.c
+++ b/drivers/scsi/esp_scsi.c
@@ -2776,3 +2776,129 @@ MODULE_PARM_DESC(esp_debug,
 
 module_init(esp_init);
 module_exit(esp_exit);
+
+#if IS_ENABLED(CONFIG_SCSI_MAC_ESP) || IS_ENABLED(CONFIG_SCSI_ZORRO_ESP)
+static inline unsigned int esp_wait_for_fifo(struct esp *esp)
+{
+   int i = 50;
+
+   do {
+   unsigned int fbytes = esp_read8(ESP_FFLAGS) & ESP_FF_FBYTES;
+
+   if (fbytes)
+   return fbytes;
+
+   udelay(2);
+   } while (--i);
+
+   pr_err("FIFO is empty (sreg %02x)\n", esp_read8(ESP_STATUS));
+   return 0;
+}
+
+static inline int esp_wait_for_intr(struct esp *esp)
+{
+   int i = 50;
+
+   do {
+   esp->sreg = esp_read8(ESP_STATUS);
+   if (esp->sreg & ESP_STAT_INTR)
+   return 0;
+
+   udelay(2);
+   } while (--i);
+
+   pr_err("IRQ timeout (sreg %02x)\n", esp->sreg);
+   return 1;
+}
+
+#define ESP_FIFO_SIZE 16
+
+void esp_send_pio_cmd(struct esp *esp, u32 addr, u32 esp_count,
+ u32 dma_count, int write, u8 cmd)
+{
+   u8 phase = esp->sreg & ESP_STAT_PMASK;
+
+   cmd &= ~ESP_CMD_DMA;
+   esp->send_cmd_error = 0;
+
+   if (write) {
+   u8 *dst = (u8 *)addr;
+   u8 mask = ~(phase == ESP_MIP ? ESP_INTR_FDONE : ESP_INTR_BSERV);
+
+   scsi_esp_cmd(esp, cmd);
+
+   while (1) {
+   if (!esp_wait_for_fifo(esp))
+   break;
+
+   *dst++ = esp_read8(ESP_FDATA);
+   --esp_count;
+
+   if (!esp_count)
+   break;
+
+   if (esp_wait_for_intr(esp)) {
+   esp->send_cmd_error = 1;
+   break;
+   }
+
+   if ((esp->sreg & ESP_STAT_PMASK) != phase)
+   break;
+
+   esp->ireg = esp_read8(ESP_INTRPT);
+   if (esp->ireg & mask) {
+   esp->send_cmd_error = 1;
+   break;
+   }
+
+   if (phase == ESP_MIP)
+   scsi_esp_cmd(esp, ESP_CMD_MOK);
+
+   scsi_esp_cmd(esp, ESP_CMD_TI);
+   }
+   } else {
+   unsigned int n = ESP_FIFO_SIZE;
+   u8 *src = (u8 *)addr;
+
+   scsi_esp_cmd(esp, ESP_CMD_FLUSH);
+
+   if (n > esp_count)
+   n = esp_count;
+   writesb(esp->fifo_reg, src, n);
+   src += n;
+   esp_count -= n;
+
+   scsi_esp_cmd(esp, cmd);
+
+   while (esp_count) {
+   if (esp_wait_for_intr(esp)) {
+   esp->send_cmd_error = 1;
+   break;
+   }
+
+   if ((esp->sreg & ESP_STAT_PMASK) != phase)
+   break;
+
+   esp->ireg = esp_read8(ESP_INTRPT);
+   if (esp->ireg & ~ESP_INTR_BSERV) {
+   esp->send_cmd_error = 1;
+   break;
+   }
+
+   n = ESP_FIFO_SIZE -
+   (esp_read8(ESP_FFLAGS) & ESP_FF_FBYTES);
+
+   if (n > esp_count)
+   n = esp_count;
+   writesb(esp->fifo_reg, src, n);
+   src += n;
+   esp_count -= n;
+
+   scsi_esp_cmd(esp, ESP_CMD_TI);
+   }
+   }
+
+   esp->send_cmd_residual = esp_count;
+}
+EXPORT_SYMBOL(esp_send_pio_cmd);
+#endif
diff --git a/drivers/scsi/esp_scsi.h b/drivers/scsi/esp_scsi.h
index d0c032803749..2590e5eda595 100644
--- a/drivers/scsi/esp_scsi.h
+++ b/drivers/scsi/esp_scsi.h
@@ -431,6 +431,7 @@ struct esp_driver_ops {
 struct esp {
void __iomem*regs;
void __iomem*dma_regs;
+  

[PATCH] x86/boot: Add -Wno-pointer-sign to KBUILD_CFLAGS

2018-10-12 Thread Nathan Chancellor
When compiling the kernel with Clang, this warning appears even though
it is disabled for the whole kernel because this folder has its own set
of KBUILD_CFLAGS. It was disabled before the beginning of git history.

In file included from arch/x86/boot/compressed/kaslr.c:29:
In file included from arch/x86/boot/compressed/misc.h:21:
In file included from ./include/linux/elf.h:5:
In file included from ./arch/x86/include/asm/elf.h:77:
In file included from ./arch/x86/include/asm/vdso.h:11:
In file included from ./include/linux/mm_types.h:9:
In file included from ./include/linux/spinlock.h:88:
In file included from ./arch/x86/include/asm/spinlock.h:43:
In file included from ./arch/x86/include/asm/qrwlock.h:6:
./include/asm-generic/qrwlock.h:101:53: warning: passing 'u32 *' (aka
'unsigned int *') to parameter of type 'int *' converts between pointers
to integer types with different sign [-Wpointer-sign]
if (likely(atomic_try_cmpxchg_acquire(>cnts, , _QW_LOCKED)))
   ^
./include/linux/compiler.h:76:40: note: expanded from macro 'likely'
# define likely(x)  __builtin_expect(!!(x), 1)
^
./include/asm-generic/atomic-instrumented.h:69:66: note: passing
argument to parameter 'old' here
static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 ^

Signed-off-by: Nathan Chancellor 
---

There has been some discussion around whether KBUILD_CFLAGS should be
overwritten in the folder (few examples linked below). In the meantime,
there is precedent for taking top level disabled warnings and adding
them here, thus this patch.

* 
https://lore.kernel.org/lkml/CAKwvOdmXcztP542kADhyJYN2=Fk3qyXif_MCs=kqpge8qtt...@mail.gmail.com/
* https://lore.kernel.org/lkml/56442061-7f55-878d-5b26-7cdd14e90...@zytor.com/

 arch/x86/boot/compressed/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 28764dacf018..466f66c8a7f8 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -37,6 +37,7 @@ KBUILD_CFLAGS += $(call cc-option,-ffreestanding)
 KBUILD_CFLAGS += $(call cc-option,-fno-stack-protector)
 KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
 KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
+KBUILD_CFLAGS += -Wno-pointer-sign
 
 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n
-- 
2.19.1



[PATCH 1/6] zorro_esp: Limit DMA transfers to 65535 bytes

2018-10-12 Thread Finn Thain
The core driver, esp_scsi, does not use the ESP_CONFIG2_FENAB bit, so
the chip's Transfer Counter register is only 16 bits wide (not 24).
A larger transfer cannot work and will theoretically result in a failed
command and a "DMA length is zero" error.

Fixes: 3109e5ae0311
Signed-off-by: Finn Thain 
Cc: Michael Schmitz 
---
 drivers/scsi/zorro_esp.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/scsi/zorro_esp.c b/drivers/scsi/zorro_esp.c
index bb70882e6b56..be79127db594 100644
--- a/drivers/scsi/zorro_esp.c
+++ b/drivers/scsi/zorro_esp.c
@@ -245,7 +245,7 @@ static int fastlane_esp_irq_pending(struct esp *esp)
 static u32 zorro_esp_dma_length_limit(struct esp *esp, u32 dma_addr,
u32 dma_len)
 {
-   return dma_len > 0xFF ? 0xFF : dma_len;
+   return dma_len > 0x ? 0x : dma_len;
 }
 
 static void zorro_esp_reset_dma(struct esp *esp)
@@ -484,7 +484,6 @@ static void zorro_esp_send_blz1230_dma_cmd(struct esp *esp, 
u32 addr,
scsi_esp_cmd(esp, ESP_CMD_DMA);
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
scsi_esp_cmd(esp, cmd);
 }
@@ -529,7 +528,6 @@ static void zorro_esp_send_blz1230II_dma_cmd(struct esp 
*esp, u32 addr,
scsi_esp_cmd(esp, ESP_CMD_DMA);
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
scsi_esp_cmd(esp, cmd);
 }
@@ -574,7 +572,6 @@ static void zorro_esp_send_blz2060_dma_cmd(struct esp *esp, 
u32 addr,
scsi_esp_cmd(esp, ESP_CMD_DMA);
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
scsi_esp_cmd(esp, cmd);
 }
@@ -599,7 +596,6 @@ static void zorro_esp_send_cyber_dma_cmd(struct esp *esp, 
u32 addr,
 
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
if (write) {
/* DMA receive */
@@ -649,7 +645,6 @@ static void zorro_esp_send_cyberII_dma_cmd(struct esp *esp, 
u32 addr,
 
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
if (write) {
/* DMA receive */
@@ -691,7 +686,6 @@ static void zorro_esp_send_fastlane_dma_cmd(struct esp 
*esp, u32 addr,
 
zorro_esp_write8(esp, (esp_count >> 0) & 0xff, ESP_TCLOW);
zorro_esp_write8(esp, (esp_count >> 8) & 0xff, ESP_TCMED);
-   zorro_esp_write8(esp, (esp_count >> 16) & 0xff, ESP_TCHI);
 
if (write) {
/* DMA receive */
-- 
2.16.4



[PATCH 2/6] esp_scsi: Track residual for PIO transfers

2018-10-12 Thread Finn Thain
If a target disconnects during a PIO data transfer the command may
fail when the target reconnects:

scsi host1: DMA length is zero!
scsi host1: cur adr[0438] len[]

The scsi bus is then reset. This happens because the residual reached
zero before the transfer was completed.

The usual residual calculation relies on the Transfer Count registers
which works for DMA transfers but not for PIO transfers. Fix the problem
by storing the PIO transfer residual and using that to correctly
calculate bytes_sent.

Fixes: 6fe07aaffbf0
Tested-by: Stan Johnson 
Signed-off-by: Finn Thain 
---
 drivers/scsi/esp_scsi.c | 1 +
 drivers/scsi/esp_scsi.h | 2 ++
 drivers/scsi/mac_esp.c  | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/drivers/scsi/esp_scsi.c b/drivers/scsi/esp_scsi.c
index c3fc34b9964d..9e5d3f7d29ae 100644
--- a/drivers/scsi/esp_scsi.c
+++ b/drivers/scsi/esp_scsi.c
@@ -1338,6 +1338,7 @@ static int esp_data_bytes_sent(struct esp *esp, struct 
esp_cmd_entry *ent,
 
bytes_sent = esp->data_dma_len;
bytes_sent -= ecount;
+   bytes_sent -= esp->send_cmd_residual;
 
/*
 * The am53c974 has a DMA 'pecularity'. The doc states:
diff --git a/drivers/scsi/esp_scsi.h b/drivers/scsi/esp_scsi.h
index 8163dca2071b..db4b6ea94caa 100644
--- a/drivers/scsi/esp_scsi.h
+++ b/drivers/scsi/esp_scsi.h
@@ -540,6 +540,8 @@ struct esp {
 
void*dma;
int dmarev;
+
+   int send_cmd_residual;
 };
 
 /* A front-end driver for the ESP chip should do the following in
diff --git a/drivers/scsi/mac_esp.c b/drivers/scsi/mac_esp.c
index eb551f3cc471..71879f2207e0 100644
--- a/drivers/scsi/mac_esp.c
+++ b/drivers/scsi/mac_esp.c
@@ -427,6 +427,8 @@ static void mac_esp_send_pio_cmd(struct esp *esp, u32 addr, 
u32 esp_count,
scsi_esp_cmd(esp, ESP_CMD_TI);
}
}
+
+   esp->send_cmd_residual = esp_count;
 }
 
 static int mac_esp_irq_pending(struct esp *esp)
-- 
2.16.4



RE: [PATCH] MAINTAINERS: Replace Vince Bridgers as Altera TSE maintainer

2018-10-12 Thread Bridgers, Vince


-Original Message-
From: Thor Thayer  
Sent: Friday, October 12, 2018 3:49 PM
To: Greg KH 
Cc: geert+rene...@glider.be; jho...@kernel.org; at...@kernel.org; 
bhelg...@google.com; james.hart...@sondrel.com; linux-kernel@vger.kernel.org; 
Bridgers, Vince 
Subject: Re: [PATCH] MAINTAINERS: Replace Vince Bridgers as Altera TSE 
maintainer

On 10/12/2018 11:57 AM, Greg KH wrote:
> On Fri, Oct 12, 2018 at 11:50:52AM -0500, thor.tha...@linux.intel.com wrote:
>> From: Thor Thayer 
>>
>> Vince has moved to a different role. Replace him as Altera TSE 
>> maintainer.
>>
>> Signed-off-by: Thor Thayer 
> 
> Would be nice if Vince can ack this...
> 
+ Vince (at a new email address)

Thanks Thor, 

Acked-by: Vince Bridgers  


[PATCH 5/6] esp_scsi: De-duplicate PIO routines

2018-10-12 Thread Finn Thain
As a temporary measure, the code to implement PIO transfers was
duplicated in zorro_esp and mac_esp. Now that this code has stabilized,
move it into the core driver to eliminate the duplication.

This replaces the inline assembler with more portable writesb() calls.
Optimizing the m68k writesb() implementation is a separate patch.

Tested-by: Stan Johnson 
Signed-off-by: Finn Thain 
---
 drivers/scsi/esp_scsi.c  | 126 +
 drivers/scsi/esp_scsi.h  |   5 +
 drivers/scsi/mac_esp.c   | 173 ++-
 drivers/scsi/zorro_esp.c | 232 +++
 4 files changed, 171 insertions(+), 365 deletions(-)

diff --git a/drivers/scsi/esp_scsi.c b/drivers/scsi/esp_scsi.c
index 6ccaf818357e..646701fc22a4 100644
--- a/drivers/scsi/esp_scsi.c
+++ b/drivers/scsi/esp_scsi.c
@@ -2776,3 +2776,129 @@ MODULE_PARM_DESC(esp_debug,
 
 module_init(esp_init);
 module_exit(esp_exit);
+
+#if IS_ENABLED(CONFIG_SCSI_MAC_ESP) || IS_ENABLED(CONFIG_SCSI_ZORRO_ESP)
+static inline unsigned int esp_wait_for_fifo(struct esp *esp)
+{
+   int i = 50;
+
+   do {
+   unsigned int fbytes = esp_read8(ESP_FFLAGS) & ESP_FF_FBYTES;
+
+   if (fbytes)
+   return fbytes;
+
+   udelay(2);
+   } while (--i);
+
+   pr_err("FIFO is empty (sreg %02x)\n", esp_read8(ESP_STATUS));
+   return 0;
+}
+
+static inline int esp_wait_for_intr(struct esp *esp)
+{
+   int i = 50;
+
+   do {
+   esp->sreg = esp_read8(ESP_STATUS);
+   if (esp->sreg & ESP_STAT_INTR)
+   return 0;
+
+   udelay(2);
+   } while (--i);
+
+   pr_err("IRQ timeout (sreg %02x)\n", esp->sreg);
+   return 1;
+}
+
+#define ESP_FIFO_SIZE 16
+
+void esp_send_pio_cmd(struct esp *esp, u32 addr, u32 esp_count,
+ u32 dma_count, int write, u8 cmd)
+{
+   u8 phase = esp->sreg & ESP_STAT_PMASK;
+
+   cmd &= ~ESP_CMD_DMA;
+   esp->send_cmd_error = 0;
+
+   if (write) {
+   u8 *dst = (u8 *)addr;
+   u8 mask = ~(phase == ESP_MIP ? ESP_INTR_FDONE : ESP_INTR_BSERV);
+
+   scsi_esp_cmd(esp, cmd);
+
+   while (1) {
+   if (!esp_wait_for_fifo(esp))
+   break;
+
+   *dst++ = esp_read8(ESP_FDATA);
+   --esp_count;
+
+   if (!esp_count)
+   break;
+
+   if (esp_wait_for_intr(esp)) {
+   esp->send_cmd_error = 1;
+   break;
+   }
+
+   if ((esp->sreg & ESP_STAT_PMASK) != phase)
+   break;
+
+   esp->ireg = esp_read8(ESP_INTRPT);
+   if (esp->ireg & mask) {
+   esp->send_cmd_error = 1;
+   break;
+   }
+
+   if (phase == ESP_MIP)
+   scsi_esp_cmd(esp, ESP_CMD_MOK);
+
+   scsi_esp_cmd(esp, ESP_CMD_TI);
+   }
+   } else {
+   unsigned int n = ESP_FIFO_SIZE;
+   u8 *src = (u8 *)addr;
+
+   scsi_esp_cmd(esp, ESP_CMD_FLUSH);
+
+   if (n > esp_count)
+   n = esp_count;
+   writesb(esp->fifo_reg, src, n);
+   src += n;
+   esp_count -= n;
+
+   scsi_esp_cmd(esp, cmd);
+
+   while (esp_count) {
+   if (esp_wait_for_intr(esp)) {
+   esp->send_cmd_error = 1;
+   break;
+   }
+
+   if ((esp->sreg & ESP_STAT_PMASK) != phase)
+   break;
+
+   esp->ireg = esp_read8(ESP_INTRPT);
+   if (esp->ireg & ~ESP_INTR_BSERV) {
+   esp->send_cmd_error = 1;
+   break;
+   }
+
+   n = ESP_FIFO_SIZE -
+   (esp_read8(ESP_FFLAGS) & ESP_FF_FBYTES);
+
+   if (n > esp_count)
+   n = esp_count;
+   writesb(esp->fifo_reg, src, n);
+   src += n;
+   esp_count -= n;
+
+   scsi_esp_cmd(esp, ESP_CMD_TI);
+   }
+   }
+
+   esp->send_cmd_residual = esp_count;
+}
+EXPORT_SYMBOL(esp_send_pio_cmd);
+#endif
diff --git a/drivers/scsi/esp_scsi.h b/drivers/scsi/esp_scsi.h
index d0c032803749..2590e5eda595 100644
--- a/drivers/scsi/esp_scsi.h
+++ b/drivers/scsi/esp_scsi.h
@@ -431,6 +431,7 @@ struct esp_driver_ops {
 struct esp {
void __iomem*regs;
void __iomem*dma_regs;
+  

[PATCH] m68k: Unroll raw_outsb() loop

2018-10-12 Thread Finn Thain
Unroll the raw_outsb() loop using the optimized assembler code from
raw_outsw(). That code is copied and pasted, with movew changed to moveb.

This improves the performance of sequential write transfers using mac_esp
in PIO mode by 5% or 10%. (The DMA controller on the 840av/660av models is
still unsupported so PIO transfers are used.)

Tested-by: Stan Johnson 
Signed-off-by: Finn Thain 
---
In a separate patch series, mac_esp adopts writesb() in place of inline
assembler, making that code smaller and more reusable.
---
 arch/m68k/include/asm/raw_io.h | 39 +++
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/arch/m68k/include/asm/raw_io.h b/arch/m68k/include/asm/raw_io.h
index 85761255dde5..8a6dc6e5a279 100644
--- a/arch/m68k/include/asm/raw_io.h
+++ b/arch/m68k/include/asm/raw_io.h
@@ -107,12 +107,43 @@ static inline void raw_insb(volatile u8 __iomem *port, u8 
*buf, unsigned int len
 }
 
 static inline void raw_outsb(volatile u8 __iomem *port, const u8 *buf,
-unsigned int len)
+unsigned int nr)
 {
-   unsigned int i;
+   unsigned int tmp;
 
-for (i = 0; i < len; i++)
-   out_8(port, *buf++);
+   if (nr & 15) {
+   tmp = (nr & 15) - 1;
+   asm volatile (
+   "1: moveb %0@+,%2@; dbra %1,1b"
+   : "=a" (buf), "=d" (tmp)
+   : "a" (port), "0" (buf),
+ "1" (tmp));
+   }
+   if (nr >> 4) {
+   tmp = (nr >> 4) - 1;
+   asm volatile (
+   "1: "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "dbra %1,1b"
+   : "=a" (buf), "=d" (tmp)
+   : "a" (port), "0" (buf),
+ "1" (tmp));
+   }
 }
 
 static inline void raw_insw(volatile u16 __iomem *port, u16 *buf, unsigned int 
nr)
-- 
2.16.4



[PATCH] m68k: Unroll raw_outsb() loop

2018-10-12 Thread Finn Thain
Unroll the raw_outsb() loop using the optimized assembler code from
raw_outsw(). That code is copied and pasted, with movew changed to moveb.

This improves the performance of sequential write transfers using mac_esp
in PIO mode by 5% or 10%. (The DMA controller on the 840av/660av models is
still unsupported so PIO transfers are used.)

Tested-by: Stan Johnson 
Signed-off-by: Finn Thain 
---
In a separate patch series, mac_esp adopts writesb() in place of inline
assembler, making that code smaller and more reusable.
---
 arch/m68k/include/asm/raw_io.h | 39 +++
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/arch/m68k/include/asm/raw_io.h b/arch/m68k/include/asm/raw_io.h
index 85761255dde5..8a6dc6e5a279 100644
--- a/arch/m68k/include/asm/raw_io.h
+++ b/arch/m68k/include/asm/raw_io.h
@@ -107,12 +107,43 @@ static inline void raw_insb(volatile u8 __iomem *port, u8 
*buf, unsigned int len
 }
 
 static inline void raw_outsb(volatile u8 __iomem *port, const u8 *buf,
-unsigned int len)
+unsigned int nr)
 {
-   unsigned int i;
+   unsigned int tmp;
 
-for (i = 0; i < len; i++)
-   out_8(port, *buf++);
+   if (nr & 15) {
+   tmp = (nr & 15) - 1;
+   asm volatile (
+   "1: moveb %0@+,%2@; dbra %1,1b"
+   : "=a" (buf), "=d" (tmp)
+   : "a" (port), "0" (buf),
+ "1" (tmp));
+   }
+   if (nr >> 4) {
+   tmp = (nr >> 4) - 1;
+   asm volatile (
+   "1: "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "moveb %0@+,%2@; "
+   "dbra %1,1b"
+   : "=a" (buf), "=d" (tmp)
+   : "a" (port), "0" (buf),
+ "1" (tmp));
+   }
 }
 
 static inline void raw_insw(volatile u16 __iomem *port, u16 *buf, unsigned int 
nr)
-- 
2.16.4



[PATCH] x86/time: Correct the attribute on jiffies' definition

2018-10-12 Thread Nathan Chancellor
Clang warns that the declaration of jiffies in include/linux/jiffies.h
doesn't match the definition in arch/x86/time/kernel.c:

arch/x86/kernel/time.c:29:42: warning: section does not match previous 
declaration [-Wsection]
__visible volatile unsigned long jiffies __cacheline_aligned = INITIAL_JIFFIES;
 ^
./include/linux/cache.h:49:4: note: expanded from macro '__cacheline_aligned'
 __section__(".data..cacheline_aligned")))
 ^
./include/linux/jiffies.h:81:31: note: previous attribute is here
extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data 
jiffies;
  ^
./arch/x86/include/asm/cache.h:20:2: note: expanded from macro 
'__cacheline_aligned_in_smp'
__page_aligned_data
^
./include/linux/linkage.h:39:29: note: expanded from macro '__page_aligned_data'
#define __page_aligned_data __section(.data..page_aligned) 
__aligned(PAGE_SIZE)
^
./include/linux/compiler_attributes.h:233:56: note: expanded from macro 
'__section'
#define __section(S)__attribute__((__section__(#S)))
   ^
1 warning generated.

The declaration was changed in commit 7c30f352c852 ("jiffies.h: declare
jiffies and jiffies_64 with cacheline_aligned_in_smp") but wasn't
updated here. Make them match so Clang no longer warns.

Link: https://github.com/ClangBuiltLinux/linux/issues/98
Signed-off-by: Nathan Chancellor 
---
 arch/x86/kernel/time.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index b23f5420b26a..0e14f6c0d35e 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -26,7 +26,7 @@
 #include 
 
 #ifdef CONFIG_X86_64
-__visible volatile unsigned long jiffies __cacheline_aligned = INITIAL_JIFFIES;
+__visible volatile unsigned long jiffies __cacheline_aligned_in_smp = 
INITIAL_JIFFIES;
 #endif
 
 unsigned long profile_pc(struct pt_regs *regs)
-- 
2.19.1



[PATCH] x86/time: Correct the attribute on jiffies' definition

2018-10-12 Thread Nathan Chancellor
Clang warns that the declaration of jiffies in include/linux/jiffies.h
doesn't match the definition in arch/x86/time/kernel.c:

arch/x86/kernel/time.c:29:42: warning: section does not match previous 
declaration [-Wsection]
__visible volatile unsigned long jiffies __cacheline_aligned = INITIAL_JIFFIES;
 ^
./include/linux/cache.h:49:4: note: expanded from macro '__cacheline_aligned'
 __section__(".data..cacheline_aligned")))
 ^
./include/linux/jiffies.h:81:31: note: previous attribute is here
extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data 
jiffies;
  ^
./arch/x86/include/asm/cache.h:20:2: note: expanded from macro 
'__cacheline_aligned_in_smp'
__page_aligned_data
^
./include/linux/linkage.h:39:29: note: expanded from macro '__page_aligned_data'
#define __page_aligned_data __section(.data..page_aligned) 
__aligned(PAGE_SIZE)
^
./include/linux/compiler_attributes.h:233:56: note: expanded from macro 
'__section'
#define __section(S)__attribute__((__section__(#S)))
   ^
1 warning generated.

The declaration was changed in commit 7c30f352c852 ("jiffies.h: declare
jiffies and jiffies_64 with cacheline_aligned_in_smp") but wasn't
updated here. Make them match so Clang no longer warns.

Link: https://github.com/ClangBuiltLinux/linux/issues/98
Signed-off-by: Nathan Chancellor 
---
 arch/x86/kernel/time.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index b23f5420b26a..0e14f6c0d35e 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -26,7 +26,7 @@
 #include 
 
 #ifdef CONFIG_X86_64
-__visible volatile unsigned long jiffies __cacheline_aligned = INITIAL_JIFFIES;
+__visible volatile unsigned long jiffies __cacheline_aligned_in_smp = 
INITIAL_JIFFIES;
 #endif
 
 unsigned long profile_pc(struct pt_regs *regs)
-- 
2.19.1



[PATCH] PCI: make pci_size() return real size

2018-10-12 Thread changbin . du
From: Du Changbin 

Currently, the pci_size() function actually return 'size-1'.
Make it return real size to avoid confusing.

Signed-off-by: Du Changbin 
---
 drivers/pci/probe.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 201f9e5ff55c..8ff2b1413865 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -121,13 +121,13 @@ static u64 pci_size(u64 base, u64 maxbase, u64 mask)
 * Get the lowest of them to find the decode size, and from that
 * the extent.
 */
-   size = (size & ~(size-1)) - 1;
+   size = size & ~(size-1);
 
/*
 * base == maxbase can be valid only if the BAR has already been
 * programmed with all 1s.
 */
-   if (base == maxbase && ((base | size) & mask) != mask)
+   if (base == maxbase && ((base | (size - 1)) & mask) != mask)
return 0;
 
return size;
@@ -278,7 +278,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type 
type,
/* Above 32-bit boundary; try to reallocate */
res->flags |= IORESOURCE_UNSET;
res->start = 0;
-   res->end = sz64;
+   res->end = sz64 - 1;
pci_info(dev, "reg 0x%x: can't handle BAR above 4GB 
(bus address %#010llx)\n",
 pos, (unsigned long long)l64);
goto out;
@@ -286,7 +286,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type 
type,
}
 
region.start = l64;
-   region.end = l64 + sz64;
+   region.end = l64 + sz64 - 1;
 
pcibios_bus_to_resource(dev->bus, res, );
pcibios_resource_to_bus(dev->bus, _region, res);
-- 
2.17.1



[PATCH] PCI: make pci_size() return real size

2018-10-12 Thread changbin . du
From: Du Changbin 

Currently, the pci_size() function actually return 'size-1'.
Make it return real size to avoid confusing.

Signed-off-by: Du Changbin 
---
 drivers/pci/probe.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 201f9e5ff55c..8ff2b1413865 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -121,13 +121,13 @@ static u64 pci_size(u64 base, u64 maxbase, u64 mask)
 * Get the lowest of them to find the decode size, and from that
 * the extent.
 */
-   size = (size & ~(size-1)) - 1;
+   size = size & ~(size-1);
 
/*
 * base == maxbase can be valid only if the BAR has already been
 * programmed with all 1s.
 */
-   if (base == maxbase && ((base | size) & mask) != mask)
+   if (base == maxbase && ((base | (size - 1)) & mask) != mask)
return 0;
 
return size;
@@ -278,7 +278,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type 
type,
/* Above 32-bit boundary; try to reallocate */
res->flags |= IORESOURCE_UNSET;
res->start = 0;
-   res->end = sz64;
+   res->end = sz64 - 1;
pci_info(dev, "reg 0x%x: can't handle BAR above 4GB 
(bus address %#010llx)\n",
 pos, (unsigned long long)l64);
goto out;
@@ -286,7 +286,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type 
type,
}
 
region.start = l64;
-   region.end = l64 + sz64;
+   region.end = l64 + sz64 - 1;
 
pcibios_bus_to_resource(dev->bus, res, );
pcibios_resource_to_bus(dev->bus, _region, res);
-- 
2.17.1



[PATCH] kernel/signal: Signal-based pre-coredump notification

2018-10-12 Thread Enke Chen
For simplicity and consistency, this patch provides an implementation
for signal-based fault notification prior to the coredump of a child
process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
be used by an application to express its interest and to specify the
signal (SIGCHLD or SIGUSR1 or SIGUSR2) for such a notification. A new
signal code (si_code), CLD_PREDUMP, is also defined for SIGCHLD.

Background:

As the coredump of a process may take time, in certain time-sensitive
applications it is necessary for a parent process (e.g., a process
manager) to be notified of a child's imminent death before the coredump
so that the parent process can act sooner, such as re-spawning an
application process, or initiating a control-plane fail-over.

Currently there are two ways for a parent process to be notified of a
child process's state change. One is to use the POSIX signal, and
another is to use the kernel connector module. The specific events and
actions are summarized as follows:

Process EventPOSIX SignalConnector-based
--
ptrace_attach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_STOPPED

ptrace_detach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_CONTINUED

pre_coredump/N/A proc_coredump_connector()
get_signal()

post_coredump/   do_notify_parent()  proc_exit_connector()
do_exit()SIGCHLD / exit_signal
--

As shown in the table, the signal-based pre-coredump notification is not
currently available. In some cases using a connector-based notification
can be quite complicated (e.g., when a process manager is written in shell
scripts and thus is subject to certain inherent limitations), and a
signal-based notification would be simpler and better suited.

Signed-off-by: Enke Chen 
---
 arch/x86/kernel/signal_compat.c|  2 +-
 include/linux/sched.h  |  4 ++
 include/linux/signal.h |  5 +++
 include/uapi/asm-generic/siginfo.h |  3 +-
 include/uapi/linux/prctl.h |  4 ++
 kernel/fork.c  |  1 +
 kernel/signal.c| 51 +
 kernel/sys.c   | 77 ++
 8 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf05..a3deba8 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -30,7 +30,7 @@ static inline void signal_compat_build_tests(void)
BUILD_BUG_ON(NSIGSEGV != 7);
BUILD_BUG_ON(NSIGBUS  != 5);
BUILD_BUG_ON(NSIGTRAP != 5);
-   BUILD_BUG_ON(NSIGCHLD != 6);
+   BUILD_BUG_ON(NSIGCHLD != 7);
BUILD_BUG_ON(NSIGSYS  != 1);
 
/* This is part of the ABI and can never change in size: */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 09026ea..cfb9645 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -696,6 +696,10 @@ struct task_struct {
int exit_signal;
/* The signal sent when the parent dies: */
int pdeath_signal;
+
+   /* The signal sent prior to a child's coredump: */
+   int predump_signal;
+
/* JOBCTL_*, siglock protected: */
unsigned long   jobctl;
 
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 706a499..7cb976d 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -256,6 +256,11 @@ static inline int valid_signal(unsigned long sig)
return sig <= _NSIG ? 1 : 0;
 }
 
+static inline int valid_predump_signal(int sig)
+{
+   return (sig == SIGCHLD) || (sig == SIGUSR1) || (sig == SIGUSR2);
+}
+
 struct timespec;
 struct pt_regs;
 enum pid_type;
diff --git a/include/uapi/asm-generic/siginfo.h 
b/include/uapi/asm-generic/siginfo.h
index cb3d6c2..1a47cef 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -267,7 +267,8 @@ struct {\
 #define CLD_TRAPPED4   /* traced child has trapped */
 #define CLD_STOPPED5   /* child has stopped */
 #define CLD_CONTINUED  6   /* stopped child has continued */
-#define NSIGCHLD   6
+#define CLD_PREDUMP7   /* child is about to dump core */
+#define NSIGCHLD   7
 
 /*
  * SIGPOLL (or any other signal without signal specific si_codes) si_codes
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0..79f0a8a 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -219,4 +219,8 @@ struct prctl_mm_map {
 # define PR_SPEC_DISABLE   (1UL << 2)
 # define PR_SPEC_FORCE_DISABLE (1UL << 3)
 
+/* Whether to receive signal 

[PATCH] kernel/signal: Signal-based pre-coredump notification

2018-10-12 Thread Enke Chen
For simplicity and consistency, this patch provides an implementation
for signal-based fault notification prior to the coredump of a child
process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
be used by an application to express its interest and to specify the
signal (SIGCHLD or SIGUSR1 or SIGUSR2) for such a notification. A new
signal code (si_code), CLD_PREDUMP, is also defined for SIGCHLD.

Background:

As the coredump of a process may take time, in certain time-sensitive
applications it is necessary for a parent process (e.g., a process
manager) to be notified of a child's imminent death before the coredump
so that the parent process can act sooner, such as re-spawning an
application process, or initiating a control-plane fail-over.

Currently there are two ways for a parent process to be notified of a
child process's state change. One is to use the POSIX signal, and
another is to use the kernel connector module. The specific events and
actions are summarized as follows:

Process EventPOSIX SignalConnector-based
--
ptrace_attach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_STOPPED

ptrace_detach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_CONTINUED

pre_coredump/N/A proc_coredump_connector()
get_signal()

post_coredump/   do_notify_parent()  proc_exit_connector()
do_exit()SIGCHLD / exit_signal
--

As shown in the table, the signal-based pre-coredump notification is not
currently available. In some cases using a connector-based notification
can be quite complicated (e.g., when a process manager is written in shell
scripts and thus is subject to certain inherent limitations), and a
signal-based notification would be simpler and better suited.

Signed-off-by: Enke Chen 
---
 arch/x86/kernel/signal_compat.c|  2 +-
 include/linux/sched.h  |  4 ++
 include/linux/signal.h |  5 +++
 include/uapi/asm-generic/siginfo.h |  3 +-
 include/uapi/linux/prctl.h |  4 ++
 kernel/fork.c  |  1 +
 kernel/signal.c| 51 +
 kernel/sys.c   | 77 ++
 8 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf05..a3deba8 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -30,7 +30,7 @@ static inline void signal_compat_build_tests(void)
BUILD_BUG_ON(NSIGSEGV != 7);
BUILD_BUG_ON(NSIGBUS  != 5);
BUILD_BUG_ON(NSIGTRAP != 5);
-   BUILD_BUG_ON(NSIGCHLD != 6);
+   BUILD_BUG_ON(NSIGCHLD != 7);
BUILD_BUG_ON(NSIGSYS  != 1);
 
/* This is part of the ABI and can never change in size: */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 09026ea..cfb9645 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -696,6 +696,10 @@ struct task_struct {
int exit_signal;
/* The signal sent when the parent dies: */
int pdeath_signal;
+
+   /* The signal sent prior to a child's coredump: */
+   int predump_signal;
+
/* JOBCTL_*, siglock protected: */
unsigned long   jobctl;
 
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 706a499..7cb976d 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -256,6 +256,11 @@ static inline int valid_signal(unsigned long sig)
return sig <= _NSIG ? 1 : 0;
 }
 
+static inline int valid_predump_signal(int sig)
+{
+   return (sig == SIGCHLD) || (sig == SIGUSR1) || (sig == SIGUSR2);
+}
+
 struct timespec;
 struct pt_regs;
 enum pid_type;
diff --git a/include/uapi/asm-generic/siginfo.h 
b/include/uapi/asm-generic/siginfo.h
index cb3d6c2..1a47cef 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -267,7 +267,8 @@ struct {\
 #define CLD_TRAPPED4   /* traced child has trapped */
 #define CLD_STOPPED5   /* child has stopped */
 #define CLD_CONTINUED  6   /* stopped child has continued */
-#define NSIGCHLD   6
+#define CLD_PREDUMP7   /* child is about to dump core */
+#define NSIGCHLD   7
 
 /*
  * SIGPOLL (or any other signal without signal specific si_codes) si_codes
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0..79f0a8a 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -219,4 +219,8 @@ struct prctl_mm_map {
 # define PR_SPEC_DISABLE   (1UL << 2)
 # define PR_SPEC_FORCE_DISABLE (1UL << 3)
 
+/* Whether to receive signal 

Re: [PATCH 6/6] mm: track gup pages with page->dma_pinned_* fields

2018-10-12 Thread John Hubbard
On 10/12/18 4:07 AM, Balbir Singh wrote:
> On Thu, Oct 11, 2018 at 11:00:14PM -0700, john.hubb...@gmail.com wrote:
>> From: John Hubbard 
[...]
>> +static int pin_page_for_dma(struct page *page)
>> +{
>> +int ret = 0;
>> +struct zone *zone;
>> +
>> +page = compound_head(page);
>> +zone = page_zone(page);
>> +
>> +spin_lock(zone_gup_lock(zone));
>> +
>> +if (PageDmaPinned(page)) {
>> +/* Page was not on an LRU list, because it was DMA-pinned. */
>> +VM_BUG_ON_PAGE(PageLRU(page), page);
>> +
>> +atomic_inc(>dma_pinned_count);
>> +goto unlock_out;
>> +}
>> +
>> +/*
>> + * Note that page->dma_pinned_flags is unioned with page->lru.
>> + * Therefore, the rules are: checking if any of the
>> + * PAGE_DMA_PINNED_FLAGS bits are set may be done while page->lru
>> + * is in use. However, setting those flags requires that
>> + * the page is both locked, and also, removed from the LRU.
>> + */
>> +ret = isolate_lru_page(page);
>> +
> 
> isolate_lru_page() can be expensive and in terms of the overall locking order
> sounds like zone_gup_lock is higher in the hierarchy than the locks taken
> inside isolate_lru_page()
 
As for the expensive part, that is a concern. But I do think we need some lock
here. The hierarchy shouldn't be a problem, given that this is a new lock. But
I'm not sure how to make this better. In any case, I think it should work--note 
that
the zone_lru_lock, within isolate_lru_page(), is of similar use, and is held
for a similar duration, so...maybe not really a problem?


>> +if (ret == 0) {
>> +/* Avoid problems later, when freeing the page: */
>> +ClearPageActive(page);
>> +ClearPageUnevictable(page);
>> +
>> +/* counteract isolate_lru_page's effects: */
>> +put_page(page);
> 
> Can the page get reaped here? What's the expected page count?

Nope. The page_count is at least one, because get_user_pages() incremented it.

 
>> +
>> +atomic_set(>dma_pinned_count, 1);
>> +SetPageDmaPinned(page);
>> +}
>> +
>> +unlock_out:
>> +spin_unlock(zone_gup_lock(zone));
>> +
>> +return ret;
>> +}
>> +
>>  static struct page *no_page_table(struct vm_area_struct *vma,
>>  unsigned int flags)
>>  {
>> @@ -659,7 +704,7 @@ static long __get_user_pages(struct task_struct *tsk, 
>> struct mm_struct *mm,
>>  unsigned int gup_flags, struct page **pages,
>>  struct vm_area_struct **vmas, int *nonblocking)
>>  {
>> -long i = 0;
>> +long i = 0, j;
>>  int err = 0;
>>  unsigned int page_mask;
>>  struct vm_area_struct *vma = NULL;
>> @@ -764,6 +809,10 @@ static long __get_user_pages(struct task_struct *tsk, 
>> struct mm_struct *mm,
>>  } while (nr_pages);
>>  
>>  out:
>> +if (pages)
>> +for (j = 0; j < i; j++)
>> +pin_page_for_dma(pages[j]);
>> +
> 
> Why does get_user_pages() unconditionally pin_page_for_dma?

That's the grand plan here: get_user_pages() now means "unconditionally pin the 
page for dma".
If you didn't want that, then either release it quickly (many callers do), or 
use a different
way of pinning or acquiring the page.

> 
>>  return i ? i : err;
>>  }
>>  
>> @@ -1841,7 +1890,7 @@ int get_user_pages_fast(unsigned long start, int 
>> nr_pages, int write,
>>  struct page **pages)
>>  {
>>  unsigned long addr, len, end;
>> -int nr = 0, ret = 0;
>> +int nr = 0, ret = 0, i;
>>  
>>  start &= PAGE_MASK;
>>  addr = start;
>> @@ -1862,6 +1911,9 @@ int get_user_pages_fast(unsigned long start, int 
>> nr_pages, int write,
>>  ret = nr;
>>  }
>>  
>> +for (i = 0; i < nr; i++)
>> +pin_page_for_dma(pages[i]);
> 
> Why does get_user_pages_fast() unconditionally pin_page_for_dma?

All of the get_user_pages*() variants need to follow the same rules, so the 
same 
explanation as above, applies here also.

>> +
>>  if (nr < nr_pages) {
>>  /* Try to get the remaining pages with get_user_pages */
>>  start += nr << PAGE_SHIFT;
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index e79cb59552d9..af9719756081 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2335,6 +2335,11 @@ static void lock_page_lru(struct page *page, int 
>> *isolated)
>>  if (PageLRU(page)) {
>>  struct lruvec *lruvec;
>>  
>> +/* LRU and PageDmaPinned are mutually exclusive: they use the
>> + * same fields in struct page, but for different purposes.
>> + */
> 
> Comment style needs fixing
 
oops, thanks for spotting those, will fix.


-- 
thanks,
John Hubbard
NVIDIA



Re: [PATCH 6/6] mm: track gup pages with page->dma_pinned_* fields

2018-10-12 Thread John Hubbard
On 10/12/18 4:07 AM, Balbir Singh wrote:
> On Thu, Oct 11, 2018 at 11:00:14PM -0700, john.hubb...@gmail.com wrote:
>> From: John Hubbard 
[...]
>> +static int pin_page_for_dma(struct page *page)
>> +{
>> +int ret = 0;
>> +struct zone *zone;
>> +
>> +page = compound_head(page);
>> +zone = page_zone(page);
>> +
>> +spin_lock(zone_gup_lock(zone));
>> +
>> +if (PageDmaPinned(page)) {
>> +/* Page was not on an LRU list, because it was DMA-pinned. */
>> +VM_BUG_ON_PAGE(PageLRU(page), page);
>> +
>> +atomic_inc(>dma_pinned_count);
>> +goto unlock_out;
>> +}
>> +
>> +/*
>> + * Note that page->dma_pinned_flags is unioned with page->lru.
>> + * Therefore, the rules are: checking if any of the
>> + * PAGE_DMA_PINNED_FLAGS bits are set may be done while page->lru
>> + * is in use. However, setting those flags requires that
>> + * the page is both locked, and also, removed from the LRU.
>> + */
>> +ret = isolate_lru_page(page);
>> +
> 
> isolate_lru_page() can be expensive and in terms of the overall locking order
> sounds like zone_gup_lock is higher in the hierarchy than the locks taken
> inside isolate_lru_page()
 
As for the expensive part, that is a concern. But I do think we need some lock
here. The hierarchy shouldn't be a problem, given that this is a new lock. But
I'm not sure how to make this better. In any case, I think it should work--note 
that
the zone_lru_lock, within isolate_lru_page(), is of similar use, and is held
for a similar duration, so...maybe not really a problem?


>> +if (ret == 0) {
>> +/* Avoid problems later, when freeing the page: */
>> +ClearPageActive(page);
>> +ClearPageUnevictable(page);
>> +
>> +/* counteract isolate_lru_page's effects: */
>> +put_page(page);
> 
> Can the page get reaped here? What's the expected page count?

Nope. The page_count is at least one, because get_user_pages() incremented it.

 
>> +
>> +atomic_set(>dma_pinned_count, 1);
>> +SetPageDmaPinned(page);
>> +}
>> +
>> +unlock_out:
>> +spin_unlock(zone_gup_lock(zone));
>> +
>> +return ret;
>> +}
>> +
>>  static struct page *no_page_table(struct vm_area_struct *vma,
>>  unsigned int flags)
>>  {
>> @@ -659,7 +704,7 @@ static long __get_user_pages(struct task_struct *tsk, 
>> struct mm_struct *mm,
>>  unsigned int gup_flags, struct page **pages,
>>  struct vm_area_struct **vmas, int *nonblocking)
>>  {
>> -long i = 0;
>> +long i = 0, j;
>>  int err = 0;
>>  unsigned int page_mask;
>>  struct vm_area_struct *vma = NULL;
>> @@ -764,6 +809,10 @@ static long __get_user_pages(struct task_struct *tsk, 
>> struct mm_struct *mm,
>>  } while (nr_pages);
>>  
>>  out:
>> +if (pages)
>> +for (j = 0; j < i; j++)
>> +pin_page_for_dma(pages[j]);
>> +
> 
> Why does get_user_pages() unconditionally pin_page_for_dma?

That's the grand plan here: get_user_pages() now means "unconditionally pin the 
page for dma".
If you didn't want that, then either release it quickly (many callers do), or 
use a different
way of pinning or acquiring the page.

> 
>>  return i ? i : err;
>>  }
>>  
>> @@ -1841,7 +1890,7 @@ int get_user_pages_fast(unsigned long start, int 
>> nr_pages, int write,
>>  struct page **pages)
>>  {
>>  unsigned long addr, len, end;
>> -int nr = 0, ret = 0;
>> +int nr = 0, ret = 0, i;
>>  
>>  start &= PAGE_MASK;
>>  addr = start;
>> @@ -1862,6 +1911,9 @@ int get_user_pages_fast(unsigned long start, int 
>> nr_pages, int write,
>>  ret = nr;
>>  }
>>  
>> +for (i = 0; i < nr; i++)
>> +pin_page_for_dma(pages[i]);
> 
> Why does get_user_pages_fast() unconditionally pin_page_for_dma?

All of the get_user_pages*() variants need to follow the same rules, so the 
same 
explanation as above, applies here also.

>> +
>>  if (nr < nr_pages) {
>>  /* Try to get the remaining pages with get_user_pages */
>>  start += nr << PAGE_SHIFT;
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index e79cb59552d9..af9719756081 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2335,6 +2335,11 @@ static void lock_page_lru(struct page *page, int 
>> *isolated)
>>  if (PageLRU(page)) {
>>  struct lruvec *lruvec;
>>  
>> +/* LRU and PageDmaPinned are mutually exclusive: they use the
>> + * same fields in struct page, but for different purposes.
>> + */
> 
> Comment style needs fixing
 
oops, thanks for spotting those, will fix.


-- 
thanks,
John Hubbard
NVIDIA



Re: [PATCH 4/6] mm: introduce page->dma_pinned_flags, _count

2018-10-12 Thread John Hubbard
On 10/12/18 3:56 AM, Balbir Singh wrote:
> On Thu, Oct 11, 2018 at 11:00:12PM -0700, john.hubb...@gmail.com wrote:
>> From: John Hubbard 
[...]
>> + * Because page->dma_pinned_flags is unioned with page->lru, any page that
>> + * uses these flags must NOT be on an LRU. That's partly enforced by
>> + * ClearPageDmaPinned, which gives the page back to LRU.
>> + *
>> + * PageDmaPinned also corresponds to PageTail (the 0th bit in the first 
>> union
>> + * of struct page), and this flag is checked without knowing whether it is a
>> + * tail page or a PageDmaPinned page. Therefore, start the flags at bit 1 
>> (0x2),
>> + * rather than bit 0.
>> + */
>> +#define PAGE_DMA_PINNED 0x2
>> +#define PAGE_DMA_PINNED_FLAGS   (PAGE_DMA_PINNED)
>> +
> 
> This is really subtle, additional changes to compound_head will need to 
> coordinate
> with these flags? Also doesn't this bit need to be unique across all structs 
> in
> the union? I guess that is guaranteed by the fact that page == 
> compound_head(page)
> as per your assertion, but I've forgotten why that is true. Could you please
> add some commentary on that
> 

Yes, agreed. I've rewritten and augmented that comment block, plus removed the 
PAGE_DMA_PINNED_FLAGS (there are no more bits available, so it's just 
misleading 
to even have it). So now it looks like this:

/*
 * Because page->dma_pinned_flags is unioned with page->lru, any page that
 * uses these flags must NOT be on an LRU. That's partly enforced by
 * ClearPageDmaPinned, which gives the page back to LRU.
 *
 * PageDmaPinned is checked without knowing whether it is a tail page or a
 * PageDmaPinned page. For that reason, PageDmaPinned avoids PageTail (the 0th
 * bit in the first union of struct page), and instead uses bit 1 (0x2),
 * rather than bit 0.
 *
 * PageDmaPinned can only be used if no other systems are using the same bit
 * across the first struct page union. In this regard, it is similar to
 * PageTail, and in fact, because of PageTail's constraint that bit 0 be left
 * alone, bit 1 is also left alone so far: other union elements (ignoring tail
 * pages) put pointers there, and pointer alignment leaves the lower two bits
 * available.
 *
 * So, constraints include:
 *
 * -- Only use PageDmaPinned on non-tail pages.
 * -- Remove the page from any LRU list first.
 */
#define PAGE_DMA_PINNED 0x2

/*
 * Because these flags are read outside of a lock, ensure visibility between
 * different threads, by using READ|WRITE_ONCE.
 */
static __always_inline int PageDmaPinned(struct page *page)
{
VM_BUG_ON(page != compound_head(page));
return (READ_ONCE(page->dma_pinned_flags) & PAGE_DMA_PINNED) != 0;
}

[...]
>> +static __always_inline void SetPageDmaPinned(struct page *page)
>> +{
>> +VM_BUG_ON(page != compound_head(page));
> 
> VM_BUG_ON(!list_empty(>lru))


There is only one place where we set this flag, and that is when (in patch 6/6)
transitioning from a page that might (or might not) have been
on an LRU. In that case, the calling code has already corrupted page->lru, by
writing to page->dma_pinned_count, which is unions with page->lru:

atomic_set(>dma_pinned_count, 1);
SetPageDmaPinned(page);

...so it would be inappropriate to call a list function, such as 
list_empty(), on that field.  Let's just leave it as-is.


> 
>> +WRITE_ONCE(page->dma_pinned_flags, PAGE_DMA_PINNED);
>> +}
>> +
>> +static __always_inline void ClearPageDmaPinned(struct page *page)
>> +{
>> +VM_BUG_ON(page != compound_head(page));
>> +VM_BUG_ON_PAGE(!PageDmaPinnedFlags(page), page);
>> +
>> +/* This does a WRITE_ONCE to the lru.next, which is also the
>> + * page->dma_pinned_flags field. So in addition to restoring page->lru,
>> + * this provides visibility to other threads.
>> + */
>> +INIT_LIST_HEAD(>lru);
> 
> This assumes certain things about list_head, why not use the correct
> initialization bits.
> 

Yes, OK, changed to:

static __always_inline void ClearPageDmaPinned(struct page *page)
{
VM_BUG_ON(page != compound_head(page));
VM_BUG_ON_PAGE(!PageDmaPinned(page), page);

/* Provide visibility to other threads: */
WRITE_ONCE(page->dma_pinned_flags, 0);

/*
 * Safety precaution: restore the list head, before possibly returning
 * the page to other subsystems.
 */
INIT_LIST_HEAD(>lru);
}



-- 
thanks,
John Hubbard
NVIDIA


Re: [PATCH 4/6] mm: introduce page->dma_pinned_flags, _count

2018-10-12 Thread John Hubbard
On 10/12/18 3:56 AM, Balbir Singh wrote:
> On Thu, Oct 11, 2018 at 11:00:12PM -0700, john.hubb...@gmail.com wrote:
>> From: John Hubbard 
[...]
>> + * Because page->dma_pinned_flags is unioned with page->lru, any page that
>> + * uses these flags must NOT be on an LRU. That's partly enforced by
>> + * ClearPageDmaPinned, which gives the page back to LRU.
>> + *
>> + * PageDmaPinned also corresponds to PageTail (the 0th bit in the first 
>> union
>> + * of struct page), and this flag is checked without knowing whether it is a
>> + * tail page or a PageDmaPinned page. Therefore, start the flags at bit 1 
>> (0x2),
>> + * rather than bit 0.
>> + */
>> +#define PAGE_DMA_PINNED 0x2
>> +#define PAGE_DMA_PINNED_FLAGS   (PAGE_DMA_PINNED)
>> +
> 
> This is really subtle, additional changes to compound_head will need to 
> coordinate
> with these flags? Also doesn't this bit need to be unique across all structs 
> in
> the union? I guess that is guaranteed by the fact that page == 
> compound_head(page)
> as per your assertion, but I've forgotten why that is true. Could you please
> add some commentary on that
> 

Yes, agreed. I've rewritten and augmented that comment block, plus removed the 
PAGE_DMA_PINNED_FLAGS (there are no more bits available, so it's just 
misleading 
to even have it). So now it looks like this:

/*
 * Because page->dma_pinned_flags is unioned with page->lru, any page that
 * uses these flags must NOT be on an LRU. That's partly enforced by
 * ClearPageDmaPinned, which gives the page back to LRU.
 *
 * PageDmaPinned is checked without knowing whether it is a tail page or a
 * PageDmaPinned page. For that reason, PageDmaPinned avoids PageTail (the 0th
 * bit in the first union of struct page), and instead uses bit 1 (0x2),
 * rather than bit 0.
 *
 * PageDmaPinned can only be used if no other systems are using the same bit
 * across the first struct page union. In this regard, it is similar to
 * PageTail, and in fact, because of PageTail's constraint that bit 0 be left
 * alone, bit 1 is also left alone so far: other union elements (ignoring tail
 * pages) put pointers there, and pointer alignment leaves the lower two bits
 * available.
 *
 * So, constraints include:
 *
 * -- Only use PageDmaPinned on non-tail pages.
 * -- Remove the page from any LRU list first.
 */
#define PAGE_DMA_PINNED 0x2

/*
 * Because these flags are read outside of a lock, ensure visibility between
 * different threads, by using READ|WRITE_ONCE.
 */
static __always_inline int PageDmaPinned(struct page *page)
{
VM_BUG_ON(page != compound_head(page));
return (READ_ONCE(page->dma_pinned_flags) & PAGE_DMA_PINNED) != 0;
}

[...]
>> +static __always_inline void SetPageDmaPinned(struct page *page)
>> +{
>> +VM_BUG_ON(page != compound_head(page));
> 
> VM_BUG_ON(!list_empty(>lru))


There is only one place where we set this flag, and that is when (in patch 6/6)
transitioning from a page that might (or might not) have been
on an LRU. In that case, the calling code has already corrupted page->lru, by
writing to page->dma_pinned_count, which is unions with page->lru:

atomic_set(>dma_pinned_count, 1);
SetPageDmaPinned(page);

...so it would be inappropriate to call a list function, such as 
list_empty(), on that field.  Let's just leave it as-is.


> 
>> +WRITE_ONCE(page->dma_pinned_flags, PAGE_DMA_PINNED);
>> +}
>> +
>> +static __always_inline void ClearPageDmaPinned(struct page *page)
>> +{
>> +VM_BUG_ON(page != compound_head(page));
>> +VM_BUG_ON_PAGE(!PageDmaPinnedFlags(page), page);
>> +
>> +/* This does a WRITE_ONCE to the lru.next, which is also the
>> + * page->dma_pinned_flags field. So in addition to restoring page->lru,
>> + * this provides visibility to other threads.
>> + */
>> +INIT_LIST_HEAD(>lru);
> 
> This assumes certain things about list_head, why not use the correct
> initialization bits.
> 

Yes, OK, changed to:

static __always_inline void ClearPageDmaPinned(struct page *page)
{
VM_BUG_ON(page != compound_head(page));
VM_BUG_ON_PAGE(!PageDmaPinned(page), page);

/* Provide visibility to other threads: */
WRITE_ONCE(page->dma_pinned_flags, 0);

/*
 * Safety precaution: restore the list head, before possibly returning
 * the page to other subsystems.
 */
INIT_LIST_HEAD(>lru);
}



-- 
thanks,
John Hubbard
NVIDIA


Re: [PATCH v2 1/7] modules: Create rlimit for module space

2018-10-12 Thread Jann Horn
On Sat, Oct 13, 2018 at 2:04 AM Edgecombe, Rick P
 wrote:
> On Fri, 2018-10-12 at 19:22 +0200, Jann Horn wrote:
> > On Fri, Oct 12, 2018 at 7:04 PM Edgecombe, Rick P
> >  wrote:
> > > On Fri, 2018-10-12 at 02:35 +0200, Jann Horn wrote:
> > > > Why all the rbtree stuff instead of stashing a pointer in struct
> > > > vmap_area, or something like that?
> > >
> > > Since the tracking was not for all vmalloc usage, the intention was to not
> > > bloat
> > > the structure for other usages likes stacks. I thought usually there
> > > wouldn't be
> > > nearly as much module space allocations as there would be kernel stacks, 
> > > but
> > > I
> > > didn't do any actual measurements on the tradeoffs.
> >
> > I imagine that one extra pointer in there - pointing to your struct
> > mod_alloc_user - would probably not be terrible. 8 bytes more per
> > kernel stack shouldn't be so bad?
>
> I looked into this and it starts to look a little messy. The nommu.c version 
> of
> vmalloc doesn't use or expose access to vmap_area or vm_struct. So it starts 
> to
> look like a bunch of IFDEFs to remove the rlimit in the nommu case or making a
> stand in that maintains pretend vm struct's in nommu.c. I had actually
> previously tried to at least pull the allocations size from vmalloc structs, 
> but it broke on nommu.
>
> Thought I would check back and see. How important do you think this is?

I don't think it's important - I just thought that it would be nice to
avoid the extra complexity if it is easily avoidable.


Re: [PATCH v2 1/7] modules: Create rlimit for module space

2018-10-12 Thread Jann Horn
On Sat, Oct 13, 2018 at 2:04 AM Edgecombe, Rick P
 wrote:
> On Fri, 2018-10-12 at 19:22 +0200, Jann Horn wrote:
> > On Fri, Oct 12, 2018 at 7:04 PM Edgecombe, Rick P
> >  wrote:
> > > On Fri, 2018-10-12 at 02:35 +0200, Jann Horn wrote:
> > > > Why all the rbtree stuff instead of stashing a pointer in struct
> > > > vmap_area, or something like that?
> > >
> > > Since the tracking was not for all vmalloc usage, the intention was to not
> > > bloat
> > > the structure for other usages likes stacks. I thought usually there
> > > wouldn't be
> > > nearly as much module space allocations as there would be kernel stacks, 
> > > but
> > > I
> > > didn't do any actual measurements on the tradeoffs.
> >
> > I imagine that one extra pointer in there - pointing to your struct
> > mod_alloc_user - would probably not be terrible. 8 bytes more per
> > kernel stack shouldn't be so bad?
>
> I looked into this and it starts to look a little messy. The nommu.c version 
> of
> vmalloc doesn't use or expose access to vmap_area or vm_struct. So it starts 
> to
> look like a bunch of IFDEFs to remove the rlimit in the nommu case or making a
> stand in that maintains pretend vm struct's in nommu.c. I had actually
> previously tried to at least pull the allocations size from vmalloc structs, 
> but it broke on nommu.
>
> Thought I would check back and see. How important do you think this is?

I don't think it's important - I just thought that it would be nice to
avoid the extra complexity if it is easily avoidable.


Re: [LKP] 601d5abfea [ 13.686356] BUG: unable to handle kernel paging request at 34ca027e

2018-10-12 Thread Eric W. Biederman
kernel test robot  writes:

> Greetings,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
> siginfo-next
>

This is an odd one.  This is the first time I recall the lkp test robot
emailing me and telling me that I have fixed an issue.  I don't mind I
am just a little surprised.

Recently it was discovered that on my branch passing a large negative signal
number to rt_sigqueueinfo could cause an oops.  The underlying issues
were fixed by:

b2a2ab527d6d ("signal: Guard against negative signal numbers in 
copy_siginfo_from_user")
a36700589b85 ("signal: Guard against negative signal numbers in 
copy_siginfo_from_user32")

It appears that this issue was found in linux-next before these fixes
were applied.  And then the top of my siginfo-next branch where these
fixes exist was tested.

I have not problem with that sequence of events it is just a little
surprising.

If I have not read this test report correctly please let me know.

Eric

> commit 601d5abfeaf244b86bb68c1e05c6e0d57be2f6b0
> Author: Eric W. Biederman 
> AuthorDate: Fri Oct 5 09:02:48 2018 +0200
> Commit: Eric W. Biederman 
> CommitDate: Mon Oct 8 09:35:26 2018 +0200
>
> signal: In sigqueueinfo prefer sig not si_signo
> 
> Andrei Vagin  reported:
> 
> > Accoding to the man page, the user should not set si_signo, it has to 
> be set
> > by kernel.
> >
> > $ man 2 rt_sigqueueinfo
> >
> > The uinfo argument specifies the data to accompany  the  signal.   
> This
> >argument  is  a  pointer to a structure of type siginfo_t, 
> described in
> >sigaction(2) (and defined  by  including  ).   The  
> caller
> >should set the following fields in this structure:
> >
> >si_code
> >   This  must  be  one of the SI_* codes in the Linux kernel 
> source
> >   file include/asm-generic/siginfo.h, with  the  
> restriction  that
> >   the  code  must  be  negative (i.e., cannot be SI_USER, 
> which is
> >   used by the kernel to indicate a signal  sent  by  
> kill(2))  and
> >   cannot  (since  Linux  2.6.39) be SI_TKILL (which is used 
> by the
> >   kernel to indicate a signal sent using tgkill(2)).
> >
> >si_pid This should be set to a process ID, typically the process 
> ID  of
> >   the sender.
> >
> >si_uid This  should  be set to a user ID, typically the real 
> user ID of
> >   the sender.
> >
> >si_value
> >   This field contains the user data to accompany the 
> signal.   For
> >   more information, see the description of the last (union 
> sigval)
> >   argument of sigqueue(3).
> >
> >Internally, the kernel sets the si_signo field to the  value  
> specified
> >in  sig,  so that the receiver of the signal can also obtain the 
> signal
> >number via that field.
> >
> > On Tue, Sep 25, 2018 at 07:19:02PM +0200, Eric W. Biederman wrote:
> >>
> >> If there is some application that calls sigqueueinfo directly that has
> >> a problem with this added sanity check we can revisit this when we see
> >> what kind of crazy that application is doing.
> >
> >
> > I already know two "applications" ;)
> >
> > 
> https://github.com/torvalds/linux/blob/master/tools/testing/selftests/ptrace/peeksiginfo.c
> > 
> https://github.com/checkpoint-restore/criu/blob/master/test/zdtm/static/sigpending.c
> >
> > Disclaimer: I'm the author of both of them.
> 
> Looking at the kernel code the historical behavior has alwasy been to 
> prefer
> the signal number passed in by the kernel.
> 
> So sigh.  Implmenet __copy_siginfo_from_user and 
> __copy_siginfo_from_user32 to
> take that signal number and prefer it.  The user of ptrace will still
> use copy_siginfo_from_user and copy_siginfo_from_user32 as they do not and
> never have had a signal number there.
> 
> Luckily this change has never made it farther than linux-next.
> 
> Fixes: e75dc036c445 ("signal: Fail sigqueueinfo if si_signo != sig")
> Reported-by: Andrei Vagin 
> Tested-by: Andrei Vagin 
> Signed-off-by: "Eric W. Biederman" 
>
> 4ce5f9c9e7  signal: Use a smaller struct siginfo in the kernel
> 601d5abfea  signal: In sigqueueinfo prefer sig not si_signo
> a36700589b  signal: Guard against negative signal numbers in 
> copy_siginfo_from_user32
> 771b65e89c  Add linux-next specific files for 20181011
> +--++++---+
> |  | 4ce5f9c9e7 | 601d5abfea | 
> a36700589b | next-20181011 |
> 

Re: [LKP] 601d5abfea [ 13.686356] BUG: unable to handle kernel paging request at 34ca027e

2018-10-12 Thread Eric W. Biederman
kernel test robot  writes:

> Greetings,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
> siginfo-next
>

This is an odd one.  This is the first time I recall the lkp test robot
emailing me and telling me that I have fixed an issue.  I don't mind I
am just a little surprised.

Recently it was discovered that on my branch passing a large negative signal
number to rt_sigqueueinfo could cause an oops.  The underlying issues
were fixed by:

b2a2ab527d6d ("signal: Guard against negative signal numbers in 
copy_siginfo_from_user")
a36700589b85 ("signal: Guard against negative signal numbers in 
copy_siginfo_from_user32")

It appears that this issue was found in linux-next before these fixes
were applied.  And then the top of my siginfo-next branch where these
fixes exist was tested.

I have not problem with that sequence of events it is just a little
surprising.

If I have not read this test report correctly please let me know.

Eric

> commit 601d5abfeaf244b86bb68c1e05c6e0d57be2f6b0
> Author: Eric W. Biederman 
> AuthorDate: Fri Oct 5 09:02:48 2018 +0200
> Commit: Eric W. Biederman 
> CommitDate: Mon Oct 8 09:35:26 2018 +0200
>
> signal: In sigqueueinfo prefer sig not si_signo
> 
> Andrei Vagin  reported:
> 
> > Accoding to the man page, the user should not set si_signo, it has to 
> be set
> > by kernel.
> >
> > $ man 2 rt_sigqueueinfo
> >
> > The uinfo argument specifies the data to accompany  the  signal.   
> This
> >argument  is  a  pointer to a structure of type siginfo_t, 
> described in
> >sigaction(2) (and defined  by  including  ).   The  
> caller
> >should set the following fields in this structure:
> >
> >si_code
> >   This  must  be  one of the SI_* codes in the Linux kernel 
> source
> >   file include/asm-generic/siginfo.h, with  the  
> restriction  that
> >   the  code  must  be  negative (i.e., cannot be SI_USER, 
> which is
> >   used by the kernel to indicate a signal  sent  by  
> kill(2))  and
> >   cannot  (since  Linux  2.6.39) be SI_TKILL (which is used 
> by the
> >   kernel to indicate a signal sent using tgkill(2)).
> >
> >si_pid This should be set to a process ID, typically the process 
> ID  of
> >   the sender.
> >
> >si_uid This  should  be set to a user ID, typically the real 
> user ID of
> >   the sender.
> >
> >si_value
> >   This field contains the user data to accompany the 
> signal.   For
> >   more information, see the description of the last (union 
> sigval)
> >   argument of sigqueue(3).
> >
> >Internally, the kernel sets the si_signo field to the  value  
> specified
> >in  sig,  so that the receiver of the signal can also obtain the 
> signal
> >number via that field.
> >
> > On Tue, Sep 25, 2018 at 07:19:02PM +0200, Eric W. Biederman wrote:
> >>
> >> If there is some application that calls sigqueueinfo directly that has
> >> a problem with this added sanity check we can revisit this when we see
> >> what kind of crazy that application is doing.
> >
> >
> > I already know two "applications" ;)
> >
> > 
> https://github.com/torvalds/linux/blob/master/tools/testing/selftests/ptrace/peeksiginfo.c
> > 
> https://github.com/checkpoint-restore/criu/blob/master/test/zdtm/static/sigpending.c
> >
> > Disclaimer: I'm the author of both of them.
> 
> Looking at the kernel code the historical behavior has alwasy been to 
> prefer
> the signal number passed in by the kernel.
> 
> So sigh.  Implmenet __copy_siginfo_from_user and 
> __copy_siginfo_from_user32 to
> take that signal number and prefer it.  The user of ptrace will still
> use copy_siginfo_from_user and copy_siginfo_from_user32 as they do not and
> never have had a signal number there.
> 
> Luckily this change has never made it farther than linux-next.
> 
> Fixes: e75dc036c445 ("signal: Fail sigqueueinfo if si_signo != sig")
> Reported-by: Andrei Vagin 
> Tested-by: Andrei Vagin 
> Signed-off-by: "Eric W. Biederman" 
>
> 4ce5f9c9e7  signal: Use a smaller struct siginfo in the kernel
> 601d5abfea  signal: In sigqueueinfo prefer sig not si_signo
> a36700589b  signal: Guard against negative signal numbers in 
> copy_siginfo_from_user32
> 771b65e89c  Add linux-next specific files for 20181011
> +--++++---+
> |  | 4ce5f9c9e7 | 601d5abfea | 
> a36700589b | next-20181011 |
> 

Re: [PATCH v2 1/7] modules: Create rlimit for module space

2018-10-12 Thread Edgecombe, Rick P
On Fri, 2018-10-12 at 19:22 +0200, Jann Horn wrote:
> On Fri, Oct 12, 2018 at 7:04 PM Edgecombe, Rick P
>  wrote:
> > On Fri, 2018-10-12 at 02:35 +0200, Jann Horn wrote:
> > > Why all the rbtree stuff instead of stashing a pointer in struct
> > > vmap_area, or something like that?
> > 
> > Since the tracking was not for all vmalloc usage, the intention was to not
> > bloat
> > the structure for other usages likes stacks. I thought usually there
> > wouldn't be
> > nearly as much module space allocations as there would be kernel stacks, but
> > I
> > didn't do any actual measurements on the tradeoffs.
> 
> I imagine that one extra pointer in there - pointing to your struct
> mod_alloc_user - would probably not be terrible. 8 bytes more per
> kernel stack shouldn't be so bad?

I looked into this and it starts to look a little messy. The nommu.c version of
vmalloc doesn't use or expose access to vmap_area or vm_struct. So it starts to
look like a bunch of IFDEFs to remove the rlimit in the nommu case or making a
stand in that maintains pretend vm struct's in nommu.c. I had actually
previously tried to at least pull the allocations size from vmalloc structs, 
but it broke on nommu.

Thought I would check back and see. How important do you think this is?




Re: [PATCH v2 1/7] modules: Create rlimit for module space

2018-10-12 Thread Edgecombe, Rick P
On Fri, 2018-10-12 at 19:22 +0200, Jann Horn wrote:
> On Fri, Oct 12, 2018 at 7:04 PM Edgecombe, Rick P
>  wrote:
> > On Fri, 2018-10-12 at 02:35 +0200, Jann Horn wrote:
> > > Why all the rbtree stuff instead of stashing a pointer in struct
> > > vmap_area, or something like that?
> > 
> > Since the tracking was not for all vmalloc usage, the intention was to not
> > bloat
> > the structure for other usages likes stacks. I thought usually there
> > wouldn't be
> > nearly as much module space allocations as there would be kernel stacks, but
> > I
> > didn't do any actual measurements on the tradeoffs.
> 
> I imagine that one extra pointer in there - pointing to your struct
> mod_alloc_user - would probably not be terrible. 8 bytes more per
> kernel stack shouldn't be so bad?

I looked into this and it starts to look a little messy. The nommu.c version of
vmalloc doesn't use or expose access to vmap_area or vm_struct. So it starts to
look like a bunch of IFDEFs to remove the rlimit in the nommu case or making a
stand in that maintains pretend vm struct's in nommu.c. I had actually
previously tried to at least pull the allocations size from vmalloc structs, 
but it broke on nommu.

Thought I would check back and see. How important do you think this is?




Re: [PATCH] mm/kasan: make quarantine_lock a raw_spinlock_t

2018-10-12 Thread Andrew Morton
On Wed, 10 Oct 2018 23:49:45 +0200 Sebastian Andrzej Siewior 
 wrote:

> On 2018-10-10 11:57:41 [+0200], Dmitry Vyukov wrote:
> > Yes. Clark's patch looks good to me. Probably would be useful to add a
> > comment as to why raw spinlock is used (otherwise somebody may
> > refactor it back later).
> 
> If you really insist, I could add something but this didn't happen so
> far. git's changelog should provide enough information why to why it was
> changed.

Requiring code readers to look up changelogs in git is rather user-hostile.
There are several reasons for using raw_*, so an explanatory comment at
each site is called for.

However it would be smarter to stop "using raw_* for several reasons". 
Instead, create a differently named variant for each such reason.  ie, do

/*
 * Nice comment goes here.  It explains all the possible reasons why -rt
 * might use a raw_spin_lock when a spin_lock could otherwise be used.
 */
#define raw_spin_lock_for_rtraw_spinlock

Then use raw_spin_lock_for_rt() at all such sites.


Re: [PATCH] mm/kasan: make quarantine_lock a raw_spinlock_t

2018-10-12 Thread Andrew Morton
On Wed, 10 Oct 2018 23:49:45 +0200 Sebastian Andrzej Siewior 
 wrote:

> On 2018-10-10 11:57:41 [+0200], Dmitry Vyukov wrote:
> > Yes. Clark's patch looks good to me. Probably would be useful to add a
> > comment as to why raw spinlock is used (otherwise somebody may
> > refactor it back later).
> 
> If you really insist, I could add something but this didn't happen so
> far. git's changelog should provide enough information why to why it was
> changed.

Requiring code readers to look up changelogs in git is rather user-hostile.
There are several reasons for using raw_*, so an explanatory comment at
each site is called for.

However it would be smarter to stop "using raw_* for several reasons". 
Instead, create a differently named variant for each such reason.  ie, do

/*
 * Nice comment goes here.  It explains all the possible reasons why -rt
 * might use a raw_spin_lock when a spin_lock could otherwise be used.
 */
#define raw_spin_lock_for_rtraw_spinlock

Then use raw_spin_lock_for_rt() at all such sites.


Re: [patch] mm, slab: avoid high-order slab pages when it does not reduce waste

2018-10-12 Thread David Rientjes
On Fri, 12 Oct 2018, Andrew Morton wrote:

> > The slab allocator has a heuristic that checks whether the internal
> > fragmentation is satisfactory and, if not, increases cachep->gfporder to
> > try to improve this.
> > 
> > If the amount of waste is the same at higher cachep->gfporder values,
> > there is no significant benefit to allocating higher order memory.  There
> > will be fewer calls to the page allocator, but each call will require
> > zone->lock and finding the page of best fit from the per-zone free areas.
> > 
> > Instead, it is better to allocate order-0 memory if possible so that pages
> > can be returned from the per-cpu pagesets (pcp).
> > 
> > There are two reasons to prefer this over allocating high order memory:
> > 
> >  - allocating from the pcp lists does not require a per-zone lock, and
> > 
> >  - this reduces stranding of MIGRATE_UNMOVABLE pageblocks on pcp lists
> >that increases slab fragmentation across a zone.
> 
> Confused.  Higher-order slab pages never go through the pcp lists, do
> they?

Nope.

> I'd have thought that by tending to increase the amount of
> order-0 pages which are used by slab, such stranding would be
> *increased*?
> 

These cpus have MIGRATE_UNMOVABLE pages on their pcp list.  But because 
they are order-1 instead of order-0, we take zone->lock and find the 
smallest possible page in the zone's free area that is of sufficient size.  
On low on memory situations, there are no pages of MIGRATE_UNMOVABLE 
migratetype at any order in the free area.  This calls 
__rmqueue_fallback() that steals pageblocks, MIGRATE_RECLAIMABLE and then 
MIGRATE_MOVABLE, and as MIGRATE_UNMOVABLE.

We rely on the pcp batch count to backfill MIGRATE_UNMOVABLE pages onto 
the pcp list so we don't need to take zone->lock, and as a result of these 
allocations being order-0 rather than order-1 we can then allocate from 
these pages when such slab caches are expanded rather than stranding them.

We noticed this when the amount of memory wasted for TCPv6 was the same 
for both order-0 and order-1 allocations (order-1 waste was two times the 
order-0 waste).  We had hundreds of cpus with pages on their 
MIGRATE_UNMOVABLE pcp list, but while allocating order-1 memory it would 
prefer to happily steal other pageblocks before calling reclaim and 
draining pcp lists.

> > We are particularly interested in the second point to eliminate cases
> > where all other pages on a pageblock are movable (or free) and fallback to
> > pageblocks of other migratetypes from the per-zone free areas causes
> > high-order slab memory to be allocated from them rather than from free
> > MIGRATE_UNMOVABLE pages on the pcp.
> > 
> >  mm/slab.c | 15 +++
> 
> Do slub and slob also suffer from this effect?
> 

SLOB should not, SLUB will typically increase the order to improve 
performance of the cpu cache; there's a drawback to changing out the cpu 
cache that SLAB does not have.  In the case that this patch is addressing, 
there is no greater memory utilization from the allocted slab pages.


Re: [patch] mm, slab: avoid high-order slab pages when it does not reduce waste

2018-10-12 Thread David Rientjes
On Fri, 12 Oct 2018, Andrew Morton wrote:

> > The slab allocator has a heuristic that checks whether the internal
> > fragmentation is satisfactory and, if not, increases cachep->gfporder to
> > try to improve this.
> > 
> > If the amount of waste is the same at higher cachep->gfporder values,
> > there is no significant benefit to allocating higher order memory.  There
> > will be fewer calls to the page allocator, but each call will require
> > zone->lock and finding the page of best fit from the per-zone free areas.
> > 
> > Instead, it is better to allocate order-0 memory if possible so that pages
> > can be returned from the per-cpu pagesets (pcp).
> > 
> > There are two reasons to prefer this over allocating high order memory:
> > 
> >  - allocating from the pcp lists does not require a per-zone lock, and
> > 
> >  - this reduces stranding of MIGRATE_UNMOVABLE pageblocks on pcp lists
> >that increases slab fragmentation across a zone.
> 
> Confused.  Higher-order slab pages never go through the pcp lists, do
> they?

Nope.

> I'd have thought that by tending to increase the amount of
> order-0 pages which are used by slab, such stranding would be
> *increased*?
> 

These cpus have MIGRATE_UNMOVABLE pages on their pcp list.  But because 
they are order-1 instead of order-0, we take zone->lock and find the 
smallest possible page in the zone's free area that is of sufficient size.  
On low on memory situations, there are no pages of MIGRATE_UNMOVABLE 
migratetype at any order in the free area.  This calls 
__rmqueue_fallback() that steals pageblocks, MIGRATE_RECLAIMABLE and then 
MIGRATE_MOVABLE, and as MIGRATE_UNMOVABLE.

We rely on the pcp batch count to backfill MIGRATE_UNMOVABLE pages onto 
the pcp list so we don't need to take zone->lock, and as a result of these 
allocations being order-0 rather than order-1 we can then allocate from 
these pages when such slab caches are expanded rather than stranding them.

We noticed this when the amount of memory wasted for TCPv6 was the same 
for both order-0 and order-1 allocations (order-1 waste was two times the 
order-0 waste).  We had hundreds of cpus with pages on their 
MIGRATE_UNMOVABLE pcp list, but while allocating order-1 memory it would 
prefer to happily steal other pageblocks before calling reclaim and 
draining pcp lists.

> > We are particularly interested in the second point to eliminate cases
> > where all other pages on a pageblock are movable (or free) and fallback to
> > pageblocks of other migratetypes from the per-zone free areas causes
> > high-order slab memory to be allocated from them rather than from free
> > MIGRATE_UNMOVABLE pages on the pcp.
> > 
> >  mm/slab.c | 15 +++
> 
> Do slub and slob also suffer from this effect?
> 

SLOB should not, SLUB will typically increase the order to improve 
performance of the cpu cache; there's a drawback to changing out the cpu 
cache that SLAB does not have.  In the case that this patch is addressing, 
there is no greater memory utilization from the allocted slab pages.


Re: turbostat-17.06.23 floating point exception

2018-10-12 Thread Len Brown
> Why would the cpu topology report 0 cpus?  I added a debug entry to
> cpu_usage_stat and /proc/stat showed it as an extra column.  Then
> fscanf parsing in for_all_cpus() failed, causing the SIGFPE.
>
> This is not an issue. Thanks.

Yes, it is true that turbostat doesn't check for systems with 0 cpus.
I'm curious how you provoked the kernel to claim that.  If it is
something others might do, we can have check for it and gracefully
exit.

thanks,
-Len




-- 
Len Brown, Intel Open Source Technology Center


Re: turbostat-17.06.23 floating point exception

2018-10-12 Thread Len Brown
> Why would the cpu topology report 0 cpus?  I added a debug entry to
> cpu_usage_stat and /proc/stat showed it as an extra column.  Then
> fscanf parsing in for_all_cpus() failed, causing the SIGFPE.
>
> This is not an issue. Thanks.

Yes, it is true that turbostat doesn't check for systems with 0 cpus.
I'm curious how you provoked the kernel to claim that.  If it is
something others might do, we can have check for it and gracefully
exit.

thanks,
-Len




-- 
Len Brown, Intel Open Source Technology Center


Re: [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules

2018-10-12 Thread Edgecombe, Rick P
On Fri, 2018-10-12 at 22:01 +, Edgecombe, Rick P wrote:
> On Fri, 2018-10-12 at 16:32 +0200, Jessica Yu wrote:
> > +++ Dave Hansen [11/10/18 16:47 -0700]:
> > > On 10/11/2018 04:31 PM, Rick Edgecombe wrote:
> > > > +   if (check_inc_mod_rlimit(size))
> > > > +   return NULL;
> > > > +
> > > > p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> > > > module_alloc_base + MODULES_VSIZE,
> > > > gfp_mask, PAGE_KERNEL_EXEC, 0,
> > > > @@ -65,6 +68,8 @@ void *module_alloc(unsigned long size)
> > > > return NULL;
> > > > }
> > > > 
> > > > +   update_mod_rlimit(p, size);
> > > 
> > > Is there a reason we couldn't just rename all of the existing per-arch
> > > module_alloc() calls to be called, say, "arch_module_alloc()", then put
> > > this new rlimit code in a generic helper that does:
> > > 
> > > 
> > >   if (check_inc_mod_rlimit(size))
> > >   return NULL;
> > > 
> > >   p = arch_module_alloc(...);
> > > 
> > >   ...
> > > 
> > >   update_mod_rlimit(p, size);
> > > 
> > 
> > I second this suggestion. Just make module_{alloc,memfree} generic,
> > non-weak functions that call the rlimit helpers in addition to the
> > arch-specific arch_module_{alloc,memfree} functions.
> > 
> > Jessica
> 
> Ok, thanks. I am going to try another version of this with just a system wide
> BPF JIT limit based on the problems Jann brought up. I think it would be nice
> to
> have a module space limit, but as far as I know the only way today un-
> privlidged 
> users could fill the space is from BPF JIT. Unless you see another purpose
> long
> term?

Err, nevermind. Going to try to include both limits. I'll incorporate this
refactor into the next version.


Re: [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules

2018-10-12 Thread Edgecombe, Rick P
On Fri, 2018-10-12 at 22:01 +, Edgecombe, Rick P wrote:
> On Fri, 2018-10-12 at 16:32 +0200, Jessica Yu wrote:
> > +++ Dave Hansen [11/10/18 16:47 -0700]:
> > > On 10/11/2018 04:31 PM, Rick Edgecombe wrote:
> > > > +   if (check_inc_mod_rlimit(size))
> > > > +   return NULL;
> > > > +
> > > > p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> > > > module_alloc_base + MODULES_VSIZE,
> > > > gfp_mask, PAGE_KERNEL_EXEC, 0,
> > > > @@ -65,6 +68,8 @@ void *module_alloc(unsigned long size)
> > > > return NULL;
> > > > }
> > > > 
> > > > +   update_mod_rlimit(p, size);
> > > 
> > > Is there a reason we couldn't just rename all of the existing per-arch
> > > module_alloc() calls to be called, say, "arch_module_alloc()", then put
> > > this new rlimit code in a generic helper that does:
> > > 
> > > 
> > >   if (check_inc_mod_rlimit(size))
> > >   return NULL;
> > > 
> > >   p = arch_module_alloc(...);
> > > 
> > >   ...
> > > 
> > >   update_mod_rlimit(p, size);
> > > 
> > 
> > I second this suggestion. Just make module_{alloc,memfree} generic,
> > non-weak functions that call the rlimit helpers in addition to the
> > arch-specific arch_module_{alloc,memfree} functions.
> > 
> > Jessica
> 
> Ok, thanks. I am going to try another version of this with just a system wide
> BPF JIT limit based on the problems Jann brought up. I think it would be nice
> to
> have a module space limit, but as far as I know the only way today un-
> privlidged 
> users could fill the space is from BPF JIT. Unless you see another purpose
> long
> term?

Err, nevermind. Going to try to include both limits. I'll incorporate this
refactor into the next version.


[PATCH] x86/intel_rdt: Prevent pseudo-locking from using stale pointers

2018-10-12 Thread Reinette Chatre
From: Jithu Joseph 

When the last CPU in an rdt_domain goes offline, its rdt_domain struct
gets freed. Current pseudo-locking code is unaware of this scenario
and tries to dereference the freed structure in a few places.

Add checks to prevent pseudo-locking code from doing this.

While further work is needed to seamlessly restore resource groups
(not just pseudo-locking) to their configuration when the domain is
brought back online, the immediate issue of invalid pointers is
addressed here.

Fixes: f4e80d67a5274 ("x86/intel_rdt: Resctrl files reflect pseudo-locked 
information")
Fixes: 443810fe61605 ("x86/intel_rdt: Create debugfs files for pseudo-locking 
testing")
Fixes: 746e08590b864 ("x86/intel_rdt: Create character device exposing 
pseudo-locked region")
Fixes: 33dc3e410a0d9 ("x86/intel_rdt: Make CPU information accessible for 
pseudo-locked regions")
Signed-off-by: Jithu Joseph 
Signed-off-by: Reinette Chatre 
---
 arch/x86/kernel/cpu/intel_rdt.c |  7 
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 12 +--
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 10 ++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c| 38 +++--
 4 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 1214f3f7ec6d..44272b7107ad 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -608,6 +608,13 @@ static void domain_remove_cpu(int cpu, struct rdt_resource 
*r)
cancel_delayed_work(>cqm_limbo);
}
 
+   /*
+* rdt_domain "d" is going to be freed below, so clear
+* its pointer from pseudo_lock_region struct.
+*/
+   if (d->plr)
+   d->plr->d = NULL;
+
kfree(d->ctrl_val);
kfree(d->mbps_val);
bitmap_free(d->rmid_busy_llc);
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c 
b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 0f53049719cd..27937458c231 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -404,8 +404,16 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
for_each_alloc_enabled_rdt_resource(r)
seq_printf(s, "%s:uninitialized\n", r->name);
} else if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
-   seq_printf(s, "%s:%d=%x\n", rdtgrp->plr->r->name,
-  rdtgrp->plr->d->id, rdtgrp->plr->cbm);
+   if (!rdtgrp->plr->d) {
+   rdt_last_cmd_clear();
+   rdt_last_cmd_puts("Cache domain offline\n");
+   ret = -ENODEV;
+   } else {
+   seq_printf(s, "%s:%d=%x\n",
+  rdtgrp->plr->r->name,
+  rdtgrp->plr->d->id,
+  rdtgrp->plr->cbm);
+   }
} else {
closid = rdtgrp->closid;
for_each_alloc_enabled_rdt_resource(r) {
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c 
b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 9d54b191964b..9c8b602bd583 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -1174,6 +1174,11 @@ static int pseudo_lock_measure_cycles(struct rdtgroup 
*rdtgrp, int sel)
goto out;
}
 
+   if (!plr->d) {
+   ret = -ENODEV;
+   goto out;
+   }
+
plr->thread_done = 0;
cpu = cpumask_first(>d->cpu_mask);
if (!cpu_online(cpu)) {
@@ -1494,6 +1499,11 @@ static int pseudo_lock_dev_mmap(struct file *filp, 
struct vm_area_struct *vma)
 
plr = rdtgrp->plr;
 
+   if (!plr->d) {
+   mutex_unlock(_mutex);
+   return -ENODEV;
+   }
+
/*
 * Task is required to run with affinity to the cpus associated
 * with the pseudo-locked region. If this is not the case the task
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c 
b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 5f55f8848da6..3a3ed7cf3773 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -268,17 +268,27 @@ static int rdtgroup_cpus_show(struct kernfs_open_file *of,
  struct seq_file *s, void *v)
 {
struct rdtgroup *rdtgrp;
+   struct cpumask *mask;
int ret = 0;
 
rdtgrp = rdtgroup_kn_lock_live(of->kn);
 
if (rdtgrp) {
-   if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
-   seq_printf(s, is_cpu_list(of) ? "%*pbl\n" : "%*pb\n",
-  

[PATCH] x86/intel_rdt: Prevent pseudo-locking from using stale pointers

2018-10-12 Thread Reinette Chatre
From: Jithu Joseph 

When the last CPU in an rdt_domain goes offline, its rdt_domain struct
gets freed. Current pseudo-locking code is unaware of this scenario
and tries to dereference the freed structure in a few places.

Add checks to prevent pseudo-locking code from doing this.

While further work is needed to seamlessly restore resource groups
(not just pseudo-locking) to their configuration when the domain is
brought back online, the immediate issue of invalid pointers is
addressed here.

Fixes: f4e80d67a5274 ("x86/intel_rdt: Resctrl files reflect pseudo-locked 
information")
Fixes: 443810fe61605 ("x86/intel_rdt: Create debugfs files for pseudo-locking 
testing")
Fixes: 746e08590b864 ("x86/intel_rdt: Create character device exposing 
pseudo-locked region")
Fixes: 33dc3e410a0d9 ("x86/intel_rdt: Make CPU information accessible for 
pseudo-locked regions")
Signed-off-by: Jithu Joseph 
Signed-off-by: Reinette Chatre 
---
 arch/x86/kernel/cpu/intel_rdt.c |  7 
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 12 +--
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 10 ++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c| 38 +++--
 4 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 1214f3f7ec6d..44272b7107ad 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -608,6 +608,13 @@ static void domain_remove_cpu(int cpu, struct rdt_resource 
*r)
cancel_delayed_work(>cqm_limbo);
}
 
+   /*
+* rdt_domain "d" is going to be freed below, so clear
+* its pointer from pseudo_lock_region struct.
+*/
+   if (d->plr)
+   d->plr->d = NULL;
+
kfree(d->ctrl_val);
kfree(d->mbps_val);
bitmap_free(d->rmid_busy_llc);
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c 
b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 0f53049719cd..27937458c231 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -404,8 +404,16 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
for_each_alloc_enabled_rdt_resource(r)
seq_printf(s, "%s:uninitialized\n", r->name);
} else if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
-   seq_printf(s, "%s:%d=%x\n", rdtgrp->plr->r->name,
-  rdtgrp->plr->d->id, rdtgrp->plr->cbm);
+   if (!rdtgrp->plr->d) {
+   rdt_last_cmd_clear();
+   rdt_last_cmd_puts("Cache domain offline\n");
+   ret = -ENODEV;
+   } else {
+   seq_printf(s, "%s:%d=%x\n",
+  rdtgrp->plr->r->name,
+  rdtgrp->plr->d->id,
+  rdtgrp->plr->cbm);
+   }
} else {
closid = rdtgrp->closid;
for_each_alloc_enabled_rdt_resource(r) {
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c 
b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 9d54b191964b..9c8b602bd583 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -1174,6 +1174,11 @@ static int pseudo_lock_measure_cycles(struct rdtgroup 
*rdtgrp, int sel)
goto out;
}
 
+   if (!plr->d) {
+   ret = -ENODEV;
+   goto out;
+   }
+
plr->thread_done = 0;
cpu = cpumask_first(>d->cpu_mask);
if (!cpu_online(cpu)) {
@@ -1494,6 +1499,11 @@ static int pseudo_lock_dev_mmap(struct file *filp, 
struct vm_area_struct *vma)
 
plr = rdtgrp->plr;
 
+   if (!plr->d) {
+   mutex_unlock(_mutex);
+   return -ENODEV;
+   }
+
/*
 * Task is required to run with affinity to the cpus associated
 * with the pseudo-locked region. If this is not the case the task
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c 
b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 5f55f8848da6..3a3ed7cf3773 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -268,17 +268,27 @@ static int rdtgroup_cpus_show(struct kernfs_open_file *of,
  struct seq_file *s, void *v)
 {
struct rdtgroup *rdtgrp;
+   struct cpumask *mask;
int ret = 0;
 
rdtgrp = rdtgroup_kn_lock_live(of->kn);
 
if (rdtgrp) {
-   if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
-   seq_printf(s, is_cpu_list(of) ? "%*pbl\n" : "%*pb\n",
-  

Re: [PATCH 1/6] mm: get_user_pages: consolidate error handling

2018-10-12 Thread John Hubbard

On 10/11/18 11:30 PM, Balbir Singh wrote:

On Thu, Oct 11, 2018 at 11:00:09PM -0700, john.hubb...@gmail.com wrote:

From: John Hubbard 

An upcoming patch requires a way to operate on each page that
any of the get_user_pages_*() variants returns.

In preparation for that, consolidate the error handling for
__get_user_pages(). This provides a single location (the "out:" label)
for operating on the collected set of pages that are about to be returned.

As long every use of the "ret" variable is being edited, rename
"ret" --> "err", so that its name matches its true role.
This also gets rid of two shadowed variable declarations, as a
tiny beneficial a side effect.

Reviewed-by: Jan Kara 
Reviewed-by: Andrew Morton 
Signed-off-by: John Hubbard 
---


Looks good, might not be needed but
Reviewed-by: Balbir Singh 



Thanks for the review, very good to have another set of eyes on
this one.

--
thanks,
John Hubbard
NVIDIA


Re: [PATCH 1/6] mm: get_user_pages: consolidate error handling

2018-10-12 Thread John Hubbard

On 10/11/18 11:30 PM, Balbir Singh wrote:

On Thu, Oct 11, 2018 at 11:00:09PM -0700, john.hubb...@gmail.com wrote:

From: John Hubbard 

An upcoming patch requires a way to operate on each page that
any of the get_user_pages_*() variants returns.

In preparation for that, consolidate the error handling for
__get_user_pages(). This provides a single location (the "out:" label)
for operating on the collected set of pages that are about to be returned.

As long every use of the "ret" variable is being edited, rename
"ret" --> "err", so that its name matches its true role.
This also gets rid of two shadowed variable declarations, as a
tiny beneficial a side effect.

Reviewed-by: Jan Kara 
Reviewed-by: Andrew Morton 
Signed-off-by: John Hubbard 
---


Looks good, might not be needed but
Reviewed-by: Balbir Singh 



Thanks for the review, very good to have another set of eyes on
this one.

--
thanks,
John Hubbard
NVIDIA


[PATCH] staging: rtl8188eu: fix spelling mistake "EINPROGESS" -> "EINPROGRESS"

2018-10-12 Thread Colin King
From: Colin Ian King 

Trivial fix to spelling mistake in RT_TRACE message text.

Signed-off-by: Colin Ian King 
---
 drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c 
b/drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c
index 43f031d644fd..d6a499692e96 100644
--- a/drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c
+++ b/drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c
@@ -606,7 +606,7 @@ static void usb_write_port_complete(struct urb *purb, 
struct pt_regs *regs)
if ((purb->status == -EPIPE) || (purb->status == -EPROTO)) {
sreset_set_wifi_error_status(padapter, 
USB_WRITE_PORT_FAIL);
} else if (purb->status == -EINPROGRESS) {
-   RT_TRACE(_module_hci_ops_os_c_, _drv_err_, 
("usb_write_port_complete: EINPROGESS\n"));
+   RT_TRACE(_module_hci_ops_os_c_, _drv_err_, 
("usb_write_port_complete: EINPROGRESS\n"));
goto check_completion;
} else if (purb->status == -ENOENT) {
DBG_88E("%s: -ENOENT\n", __func__);
-- 
2.17.1



[PATCH] staging: rtl8188eu: fix spelling mistake "EINPROGESS" -> "EINPROGRESS"

2018-10-12 Thread Colin King
From: Colin Ian King 

Trivial fix to spelling mistake in RT_TRACE message text.

Signed-off-by: Colin Ian King 
---
 drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c 
b/drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c
index 43f031d644fd..d6a499692e96 100644
--- a/drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c
+++ b/drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c
@@ -606,7 +606,7 @@ static void usb_write_port_complete(struct urb *purb, 
struct pt_regs *regs)
if ((purb->status == -EPIPE) || (purb->status == -EPROTO)) {
sreset_set_wifi_error_status(padapter, 
USB_WRITE_PORT_FAIL);
} else if (purb->status == -EINPROGRESS) {
-   RT_TRACE(_module_hci_ops_os_c_, _drv_err_, 
("usb_write_port_complete: EINPROGESS\n"));
+   RT_TRACE(_module_hci_ops_os_c_, _drv_err_, 
("usb_write_port_complete: EINPROGRESS\n"));
goto check_completion;
} else if (purb->status == -ENOENT) {
DBG_88E("%s: -ENOENT\n", __func__);
-- 
2.17.1



Re: [PATCH 2/6] mm: introduce put_user_page*(), placeholder versions

2018-10-12 Thread John Hubbard

On 10/12/18 12:35 AM, Balbir Singh wrote:

On Thu, Oct 11, 2018 at 11:00:10PM -0700, john.hubb...@gmail.com wrote:

From: John Hubbard 

[...]>> +/*

+ * put_user_pages_dirty() - for each page in the @pages array, make
+ * that page (or its head page, if a compound page) dirty, if it was
+ * previously listed as clean. Then, release the page using
+ * put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * set_page_dirty(), which does not lock the page, is used here.
+ * Therefore, it is the caller's responsibility to ensure that this is
+ * safe. If not, then put_user_pages_dirty_lock() should be called instead.
+ *
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ */
+void put_user_pages_dirty(struct page **pages, unsigned long npages)
+{
+   unsigned long index;
+
+   for (index = 0; index < npages; index++) {

Do we need any checks on npages, npages <= (PUD_SHIFT - PAGE_SHIFT)?



Hi Balbir,

Thanks for combing through this series.

I'd go with "probably not", because the only check that can be done is
what you showed above: "did someone crazy pass in more pages than are
possible for this system?". I don't think checking for that helps here,
as that will show up loud and clear, in other ways.

The release_pages() implementation made the same judgment call to not 
check npages, which also influenced me.


A VM_BUG_ON could be added but I'd prefer not to, as it seems to have 
not enough benefit to be worth it.




+   struct page *page = compound_head(pages[index]);
+
+   if (!PageDirty(page))
+   set_page_dirty(page);
+
+   put_user_page(page);
+   }
+}
+EXPORT_SYMBOL(put_user_pages_dirty);
+
+/*
+ * put_user_pages_dirty_lock() - for each page in the @pages array, make
+ * that page (or its head page, if a compound page) dirty, if it was
+ * previously listed as clean. Then, release the page using
+ * put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * This is just like put_user_pages_dirty(), except that it invokes
+ * set_page_dirty_lock(), instead of set_page_dirty().
+ *
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ */
+void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
+{
+   unsigned long index;
+
+   for (index = 0; index < npages; index++) {
+   struct page *page = compound_head(pages[index]);
+
+   if (!PageDirty(page))
+   set_page_dirty_lock(page);
+
+   put_user_page(page);
+   }
+}
+EXPORT_SYMBOL(put_user_pages_dirty_lock);
+


This can be collapsed w.r.t put_user_pages_dirty, a function pointer indirection
for the locked vs unlocked case, not sure how that affects function 
optimization.



OK, good point. I initially wanted to avoid the overhead of a function 
pointer, but considering that there are lots of other operations 
happening, I think you're right: best to just get rid of the code 
duplication. If later on we find that changing it back actually helps 
any benchmarks, that can always be done.


See below for how I'm planning on fixing it, and it is a nice little 
cleanup, so thanks for pointing that out.



+/*
+ * put_user_pages() - for each page in the @pages array, release the page
+ * using put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * This is just like put_user_pages_dirty(), except that it invokes
+ * set_page_dirty_lock(), instead of set_page_dirty().


The comment is incorrect.


Yes, it sure is! Jan spotted it before, and I fixed it once, then 
rebased off of the version right *before* the fix, so now I have to 
delete that sentence again.  It's hard to kill! :)





+ *
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ */
+void put_user_pages(struct page **pages, unsigned long npages)
+{
+   unsigned long index;
+
+   for (index = 0; index < npages; index++)
+   put_user_page(pages[index]);
+}


Ditto in terms of code duplication



Here, I think you'll find that the end result, is sufficiently 
de-duplicated, after applying the function pointer above. Here's what it 
looks like without the comment blocks, below. I don't want to go any 
further than this, because the only thing left is the "for" loops, and 
macro-izing such a trivial thing is not really a win:



typedef int (*set_dirty_func)(struct page *page);

static void __put_user_pages_dirty(struct page **pages,
   unsigned long npages,
   set_dirty_func sdf)
{
unsigned long index;

for (index = 0; index < npages; index++) {
struct page *page = compound_head(pages[index]);

if (!PageDirty(page))
   

Re: [PATCH 2/6] mm: introduce put_user_page*(), placeholder versions

2018-10-12 Thread John Hubbard

On 10/12/18 12:35 AM, Balbir Singh wrote:

On Thu, Oct 11, 2018 at 11:00:10PM -0700, john.hubb...@gmail.com wrote:

From: John Hubbard 

[...]>> +/*

+ * put_user_pages_dirty() - for each page in the @pages array, make
+ * that page (or its head page, if a compound page) dirty, if it was
+ * previously listed as clean. Then, release the page using
+ * put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * set_page_dirty(), which does not lock the page, is used here.
+ * Therefore, it is the caller's responsibility to ensure that this is
+ * safe. If not, then put_user_pages_dirty_lock() should be called instead.
+ *
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ */
+void put_user_pages_dirty(struct page **pages, unsigned long npages)
+{
+   unsigned long index;
+
+   for (index = 0; index < npages; index++) {

Do we need any checks on npages, npages <= (PUD_SHIFT - PAGE_SHIFT)?



Hi Balbir,

Thanks for combing through this series.

I'd go with "probably not", because the only check that can be done is
what you showed above: "did someone crazy pass in more pages than are
possible for this system?". I don't think checking for that helps here,
as that will show up loud and clear, in other ways.

The release_pages() implementation made the same judgment call to not 
check npages, which also influenced me.


A VM_BUG_ON could be added but I'd prefer not to, as it seems to have 
not enough benefit to be worth it.




+   struct page *page = compound_head(pages[index]);
+
+   if (!PageDirty(page))
+   set_page_dirty(page);
+
+   put_user_page(page);
+   }
+}
+EXPORT_SYMBOL(put_user_pages_dirty);
+
+/*
+ * put_user_pages_dirty_lock() - for each page in the @pages array, make
+ * that page (or its head page, if a compound page) dirty, if it was
+ * previously listed as clean. Then, release the page using
+ * put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * This is just like put_user_pages_dirty(), except that it invokes
+ * set_page_dirty_lock(), instead of set_page_dirty().
+ *
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ */
+void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
+{
+   unsigned long index;
+
+   for (index = 0; index < npages; index++) {
+   struct page *page = compound_head(pages[index]);
+
+   if (!PageDirty(page))
+   set_page_dirty_lock(page);
+
+   put_user_page(page);
+   }
+}
+EXPORT_SYMBOL(put_user_pages_dirty_lock);
+


This can be collapsed w.r.t put_user_pages_dirty, a function pointer indirection
for the locked vs unlocked case, not sure how that affects function 
optimization.



OK, good point. I initially wanted to avoid the overhead of a function 
pointer, but considering that there are lots of other operations 
happening, I think you're right: best to just get rid of the code 
duplication. If later on we find that changing it back actually helps 
any benchmarks, that can always be done.


See below for how I'm planning on fixing it, and it is a nice little 
cleanup, so thanks for pointing that out.



+/*
+ * put_user_pages() - for each page in the @pages array, release the page
+ * using put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * This is just like put_user_pages_dirty(), except that it invokes
+ * set_page_dirty_lock(), instead of set_page_dirty().


The comment is incorrect.


Yes, it sure is! Jan spotted it before, and I fixed it once, then 
rebased off of the version right *before* the fix, so now I have to 
delete that sentence again.  It's hard to kill! :)





+ *
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ */
+void put_user_pages(struct page **pages, unsigned long npages)
+{
+   unsigned long index;
+
+   for (index = 0; index < npages; index++)
+   put_user_page(pages[index]);
+}


Ditto in terms of code duplication



Here, I think you'll find that the end result, is sufficiently 
de-duplicated, after applying the function pointer above. Here's what it 
looks like without the comment blocks, below. I don't want to go any 
further than this, because the only thing left is the "for" loops, and 
macro-izing such a trivial thing is not really a win:



typedef int (*set_dirty_func)(struct page *page);

static void __put_user_pages_dirty(struct page **pages,
   unsigned long npages,
   set_dirty_func sdf)
{
unsigned long index;

for (index = 0; index < npages; index++) {
struct page *page = compound_head(pages[index]);

if (!PageDirty(page))
   

Re: turbostat-17.06.23 floating point exception

2018-10-12 Thread Solio Sarabia
On Fri, Oct 12, 2018 at 11:26:30AM -0700, Solio Sarabia wrote:
> Hi --
> 
> turbostat 17.06.23 is throwing an exception on a custom linux-4.16.12
> kernel, on Xeon E5-2699 v4 Broadwell EP, 2S, 22C/S, 44C total, HT off,
> VTx off.
> 
> Initially the system had 4.4.0-137. Then I built and installed
> linux-4.16.12-default.  turbostat works fine for these two versions.
> After building linux-4.16.12 for a second time, the older kernel is
> renamed and now `ls -l /boot/` (I'm using version without .old suffix):
> 
>   vmlinuz-4.16.12-default+
>   vmlinuz-4.16.12-default+.old
> 
> grep -i 'turbostat' /var/log/kern.log
> 
> kernel: [  159.140836] capability: warning: `turbostat' uses 32-bit
>   capabilities (legacy support in use)
> kernel: [  164.149264] traps: turbostat[1801] trap divide error
>   ip:407625 sp:7ffe4b0df000 error:0 in turbostat[40+17000]
> 
> (gdb)
> cpu22: MSR_PKGC3_IRTL: 0x (NOTvalid, 0 ns)
> cpu22: MSR_PKGC6_IRTL: 0x (NOTvalid, 0 ns)
> cpu22: MSR_PKGC7_IRTL: 0x (NOTvalid, 0 ns)
> 
> Program received signal SIGFPE, Arithmetic exception.
> 0x00407625 in compute_average (t=0x61a3b0, c=0x61a3d0, p=0x61a480) at 
> turbostat.c:1378
> 1378average.threads.tsc /= topo.num_cpus;
> 
Why would the cpu topology report 0 cpus?  I added a debug entry to
cpu_usage_stat and /proc/stat showed it as an extra column.  Then
fscanf parsing in for_all_cpus() failed, causing the SIGFPE.

This is not an issue. Thanks.

> Let me know if you need more details.
> 
> Thanks,
> -SS


RE: [PATCH net-next, v2] hv_netvsc: fix vf serial matching with pci slot info

2018-10-12 Thread Haiyang Zhang



> -Original Message-
> From: Stephen Hemminger 
> Sent: Friday, October 12, 2018 6:21 PM
> To: Haiyang Zhang 
> Cc: Haiyang Zhang ; da...@davemloft.net;
> net...@vger.kernel.org; o...@aepfle.de; linux-kernel@vger.kernel.org;
> de...@linuxdriverproject.org; vkuznets 
> Subject: Re: [PATCH net-next, v2] hv_netvsc: fix vf serial matching with pci 
> slot
> info
> 
> On Fri, 12 Oct 2018 20:55:15 +
> Haiyang Zhang  wrote:
> 
> Thanks for fixing this.
> 
> 
> > +   if (kstrtou32(kobject_name(>slot->kobj), 10, )) {
> > +   netdev_notice(vf_netdev, "Invalid vf serial:%s\n",
> > + pdev->slot->kobj.name);
> > +   return NULL;
> > +   }
> 
> Shouldn't this use kobject_name() in the message as well.
> 
> Looking at the pci.h code there is already an API to get name from slot (it 
> uses
> kobject_name()). So please use that one.

Sure, I will look for that api. Thanks.



Re: turbostat-17.06.23 floating point exception

2018-10-12 Thread Solio Sarabia
On Fri, Oct 12, 2018 at 11:26:30AM -0700, Solio Sarabia wrote:
> Hi --
> 
> turbostat 17.06.23 is throwing an exception on a custom linux-4.16.12
> kernel, on Xeon E5-2699 v4 Broadwell EP, 2S, 22C/S, 44C total, HT off,
> VTx off.
> 
> Initially the system had 4.4.0-137. Then I built and installed
> linux-4.16.12-default.  turbostat works fine for these two versions.
> After building linux-4.16.12 for a second time, the older kernel is
> renamed and now `ls -l /boot/` (I'm using version without .old suffix):
> 
>   vmlinuz-4.16.12-default+
>   vmlinuz-4.16.12-default+.old
> 
> grep -i 'turbostat' /var/log/kern.log
> 
> kernel: [  159.140836] capability: warning: `turbostat' uses 32-bit
>   capabilities (legacy support in use)
> kernel: [  164.149264] traps: turbostat[1801] trap divide error
>   ip:407625 sp:7ffe4b0df000 error:0 in turbostat[40+17000]
> 
> (gdb)
> cpu22: MSR_PKGC3_IRTL: 0x (NOTvalid, 0 ns)
> cpu22: MSR_PKGC6_IRTL: 0x (NOTvalid, 0 ns)
> cpu22: MSR_PKGC7_IRTL: 0x (NOTvalid, 0 ns)
> 
> Program received signal SIGFPE, Arithmetic exception.
> 0x00407625 in compute_average (t=0x61a3b0, c=0x61a3d0, p=0x61a480) at 
> turbostat.c:1378
> 1378average.threads.tsc /= topo.num_cpus;
> 
Why would the cpu topology report 0 cpus?  I added a debug entry to
cpu_usage_stat and /proc/stat showed it as an extra column.  Then
fscanf parsing in for_all_cpus() failed, causing the SIGFPE.

This is not an issue. Thanks.

> Let me know if you need more details.
> 
> Thanks,
> -SS


RE: [PATCH net-next, v2] hv_netvsc: fix vf serial matching with pci slot info

2018-10-12 Thread Haiyang Zhang



> -Original Message-
> From: Stephen Hemminger 
> Sent: Friday, October 12, 2018 6:21 PM
> To: Haiyang Zhang 
> Cc: Haiyang Zhang ; da...@davemloft.net;
> net...@vger.kernel.org; o...@aepfle.de; linux-kernel@vger.kernel.org;
> de...@linuxdriverproject.org; vkuznets 
> Subject: Re: [PATCH net-next, v2] hv_netvsc: fix vf serial matching with pci 
> slot
> info
> 
> On Fri, 12 Oct 2018 20:55:15 +
> Haiyang Zhang  wrote:
> 
> Thanks for fixing this.
> 
> 
> > +   if (kstrtou32(kobject_name(>slot->kobj), 10, )) {
> > +   netdev_notice(vf_netdev, "Invalid vf serial:%s\n",
> > + pdev->slot->kobj.name);
> > +   return NULL;
> > +   }
> 
> Shouldn't this use kobject_name() in the message as well.
> 
> Looking at the pci.h code there is already an API to get name from slot (it 
> uses
> kobject_name()). So please use that one.

Sure, I will look for that api. Thanks.



Re: [patch] mm, slab: avoid high-order slab pages when it does not reduce waste

2018-10-12 Thread Andrew Morton
On Fri, 12 Oct 2018 14:24:57 -0700 (PDT) David Rientjes  
wrote:

> The slab allocator has a heuristic that checks whether the internal
> fragmentation is satisfactory and, if not, increases cachep->gfporder to
> try to improve this.
> 
> If the amount of waste is the same at higher cachep->gfporder values,
> there is no significant benefit to allocating higher order memory.  There
> will be fewer calls to the page allocator, but each call will require
> zone->lock and finding the page of best fit from the per-zone free areas.
> 
> Instead, it is better to allocate order-0 memory if possible so that pages
> can be returned from the per-cpu pagesets (pcp).
> 
> There are two reasons to prefer this over allocating high order memory:
> 
>  - allocating from the pcp lists does not require a per-zone lock, and
> 
>  - this reduces stranding of MIGRATE_UNMOVABLE pageblocks on pcp lists
>that increases slab fragmentation across a zone.

Confused.  Higher-order slab pages never go through the pcp lists, do
they?  I'd have thought that by tending to increase the amount of
order-0 pages which are used by slab, such stranding would be
*increased*?

> We are particularly interested in the second point to eliminate cases
> where all other pages on a pageblock are movable (or free) and fallback to
> pageblocks of other migratetypes from the per-zone free areas causes
> high-order slab memory to be allocated from them rather than from free
> MIGRATE_UNMOVABLE pages on the pcp.
> 
>  mm/slab.c | 15 +++

Do slub and slob also suffer from this effect?



Re: [patch] mm, slab: avoid high-order slab pages when it does not reduce waste

2018-10-12 Thread Andrew Morton
On Fri, 12 Oct 2018 14:24:57 -0700 (PDT) David Rientjes  
wrote:

> The slab allocator has a heuristic that checks whether the internal
> fragmentation is satisfactory and, if not, increases cachep->gfporder to
> try to improve this.
> 
> If the amount of waste is the same at higher cachep->gfporder values,
> there is no significant benefit to allocating higher order memory.  There
> will be fewer calls to the page allocator, but each call will require
> zone->lock and finding the page of best fit from the per-zone free areas.
> 
> Instead, it is better to allocate order-0 memory if possible so that pages
> can be returned from the per-cpu pagesets (pcp).
> 
> There are two reasons to prefer this over allocating high order memory:
> 
>  - allocating from the pcp lists does not require a per-zone lock, and
> 
>  - this reduces stranding of MIGRATE_UNMOVABLE pageblocks on pcp lists
>that increases slab fragmentation across a zone.

Confused.  Higher-order slab pages never go through the pcp lists, do
they?  I'd have thought that by tending to increase the amount of
order-0 pages which are used by slab, such stranding would be
*increased*?

> We are particularly interested in the second point to eliminate cases
> where all other pages on a pageblock are movable (or free) and fallback to
> pageblocks of other migratetypes from the per-zone free areas causes
> high-order slab memory to be allocated from them rather than from free
> MIGRATE_UNMOVABLE pages on the pcp.
> 
>  mm/slab.c | 15 +++

Do slub and slob also suffer from this effect?



Re: [PATCH v2 1/5] mfd: lochnagar: Add initial binding documentation

2018-10-12 Thread Rob Herring
On Mon, Oct 08, 2018 at 02:25:38PM +0100, Charles Keepax wrote:
> Lochnagar is an evaluation and development board for Cirrus
> Logic Smart CODEC and Amp devices. It allows the connection of
> most Cirrus Logic devices on mini-cards, as well as allowing
> connection of various application processor systems to provide a
> full evaluation platform. This driver supports the board
> controller chip on the Lochnagar board.
> 
> Signed-off-by: Charles Keepax 
> ---
> 
> Changes since v1:
>  - Move the binding includes into this patch.
>  - Move patch to the start of the patch chain.
>  - Update commit message a little
> 
> Thanks,
> Charles
> 
>  .../devicetree/bindings/mfd/cirrus,lochnagar.txt   | 230 
> +
>  include/dt-bindings/clk/lochnagar.h|  33 +++
>  include/dt-bindings/pinctrl/lochnagar.h| 132 
>  3 files changed, 395 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/mfd/cirrus,lochnagar.txt
>  create mode 100644 include/dt-bindings/clk/lochnagar.h
>  create mode 100644 include/dt-bindings/pinctrl/lochnagar.h

A few nits, otherwise:

Acked-by: Rob Herring 

> 
> diff --git a/Documentation/devicetree/bindings/mfd/cirrus,lochnagar.txt 
> b/Documentation/devicetree/bindings/mfd/cirrus,lochnagar.txt
> new file mode 100644
> index 0..3382a14c19fe4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/mfd/cirrus,lochnagar.txt
> @@ -0,0 +1,230 @@
> +Cirrus Logic Lochnagar Audio Development Board
> +
> +Lochnagar is an evaluation and development board for Cirrus Logic
> +Smart CODEC and Amp devices. It allows the connection of most Cirrus
> +Logic devices on mini-cards, as well as allowing connection of
> +various application processor systems to provide a full evaluation
> +platform.  Audio system topology, clocking and power can all be
> +controlled through the Lochnagar, allowing the device under test
> +to be used in a variety of possible use cases.
> +
> +Also see these documents for generic binding information:
> +  [1] GPIO : ../gpio/gpio.txt
> +  [2] Pinctrl: ../pinctrl/pinctrl-bindings.txt
> +  [3] Clock : ../clock/clock-bindings.txt
> +  [4] Regulator: ../regulator/regulator.txt
> +
> +And these for relevant defines:
> +  [5] include/dt-bindings/pinctrl/lochnagar.h
> +  [6] include/dt-bindings/clock/lochnagar.h
> +
> +Required properties:
> +
> +  - compatible : One of the following strings:
> + "cirrus,lochnagar1"
> + "cirrus,lochnagar2"
> +
> +  - reg : I2C slave address
> +
> +  - gpio-controller : Indicates this device is a GPIO controller.
> +  - #gpio-cells : Must be 2. The first cell is the pin number, see
> +[5] for available pins and the second cell is used to specify
> +optional parameters, see [1].
> +  - gpio-ranges : Range of pins managed by the GPIO controller, see
> +[1]. Both the GPIO and Pinctrl base should be set to zero and the
> +count to the appropriate of the LOCHNAGARx_PIN_NUM_GPIOS define,
> +see [5].
> +
> +  - reset-gpios : Reset line to the Lochnagar, see [1].
> +
> +  - #clock-cells : Must be 1. The first cell indicates the clock
> +number, see [6] for available clocks and [3].
> +
> +  - pinctrl-names : A pinctrl state named "default" must be defined.
> +  - pinctrl-0 : A phandle to the default pinctrl state.
> +
> +  - SYSVDD-supply: Primary power supply for the Lochnagar.
> +
> +Optional properties:
> +
> +  - present-gpios : Host present line, indicating the presence of a
> +host system, see [1]. This can be omitted if the present line is
> +tied in hardware.
> +
> +  - clocks : Must contain an entry for each clock in clock-names.
> +  - clock-names : May contain entries for each of the following
> +clocks:
> + - ln-cdc-clkout : Output clock from CODEC card.
> + - ln-dsp-clkout : Output clock from DSP card.
> + - ln-gf-mclk1,ln-gf-mclk2,ln-gf-mclk3,ln-gf-mclk4 : Optional
> +   input audio clocks from host system.
> + - ln-psia1-mclk, ln-psia2-mclk : Optional input audio clocks from
> +   external connector.
> + - ln-spdif-clkout : Optional input audio clock from SPDIF.
> + - ln-adat-clkout : Optional input audio clock from ADAT.
> +
> +  - assigned-clocks : A list of Lochnagar clocks to be reparented, see
> +[6] for available clocks.
> +  - assigned-clock-parents : Parents to be assigned to the clocks
> +listed in "assigned-clocks".
> +
> +  - MICBIAS1-supply, MICBIAS2-supply: Regulator supplies for the
> +MICxVDD outputs, supplying the digital microphones, normally
> +supplied from the attached CODEC.
> +
> +Required sub-nodes:
> +
> +The pin configurations are defined as a child of the pinctrl states
> +node, see [2]. Each sub-node can have the following properties:
> +  - groups : A list of groups to select (either this or "pins" must be
> +specified), available groups:
> +  codec-aif1, codec-aif2, codec-aif3, dsp-aif1, dsp-aif2, psia1,
> +  

Re: [PATCH v13 3/8] clk: Use devm_ in the register fixed factor clock

2018-10-12 Thread Stephen Boyd
Quoting Ricardo Salveti (2018-09-14 11:53:02)
> On Thu, Jun 14, 2018 at 6:55 PM  wrote:
> >
> > From: Ilia Lin 
> >
> > Use devm_clk_hw_register instead of clk_hw_register
> > to simplify the usage of this API. This way drivers that call
> > the clk_hw_register_fixed_factor won't need to maintain
> > a data structure for further cleanup.
> >
> > Signed-off-by: Ilia Lin 
> > Tested-by: Amit Kucheria 
> > ---
> >  drivers/clk/clk-fixed-factor.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/clk/clk-fixed-factor.c b/drivers/clk/clk-fixed-factor.c
> > index a5d402de5584..8e39bda8e596 100644
> > --- a/drivers/clk/clk-fixed-factor.c
> > +++ b/drivers/clk/clk-fixed-factor.c
> > @@ -94,7 +94,7 @@ struct clk_hw *clk_hw_register_fixed_factor(struct device 
> > *dev,
> > init.num_parents = 1;
> >
> > hw = >hw;
> > -   ret = clk_hw_register(dev, hw);
> > +   ret = devm_clk_hw_register(dev, hw);
> 
> Not sure what is the current state of this patch-set, but this change
> breaks drivers calling clk_hw_register_fixed_factor with a NULL dev
> (e.g. imx_clk_fixed_factor), as devm_clk_hw_register needs a valid dev
> for devres_add to work.
> 

Yep. Probably better to just have a driver register the clk_hw structure
itself with the clk framework vs. trying to get it right here in the
generic type registration function.



Re: [PATCH v2 1/5] mfd: lochnagar: Add initial binding documentation

2018-10-12 Thread Rob Herring
On Mon, Oct 08, 2018 at 02:25:38PM +0100, Charles Keepax wrote:
> Lochnagar is an evaluation and development board for Cirrus
> Logic Smart CODEC and Amp devices. It allows the connection of
> most Cirrus Logic devices on mini-cards, as well as allowing
> connection of various application processor systems to provide a
> full evaluation platform. This driver supports the board
> controller chip on the Lochnagar board.
> 
> Signed-off-by: Charles Keepax 
> ---
> 
> Changes since v1:
>  - Move the binding includes into this patch.
>  - Move patch to the start of the patch chain.
>  - Update commit message a little
> 
> Thanks,
> Charles
> 
>  .../devicetree/bindings/mfd/cirrus,lochnagar.txt   | 230 
> +
>  include/dt-bindings/clk/lochnagar.h|  33 +++
>  include/dt-bindings/pinctrl/lochnagar.h| 132 
>  3 files changed, 395 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/mfd/cirrus,lochnagar.txt
>  create mode 100644 include/dt-bindings/clk/lochnagar.h
>  create mode 100644 include/dt-bindings/pinctrl/lochnagar.h

A few nits, otherwise:

Acked-by: Rob Herring 

> 
> diff --git a/Documentation/devicetree/bindings/mfd/cirrus,lochnagar.txt 
> b/Documentation/devicetree/bindings/mfd/cirrus,lochnagar.txt
> new file mode 100644
> index 0..3382a14c19fe4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/mfd/cirrus,lochnagar.txt
> @@ -0,0 +1,230 @@
> +Cirrus Logic Lochnagar Audio Development Board
> +
> +Lochnagar is an evaluation and development board for Cirrus Logic
> +Smart CODEC and Amp devices. It allows the connection of most Cirrus
> +Logic devices on mini-cards, as well as allowing connection of
> +various application processor systems to provide a full evaluation
> +platform.  Audio system topology, clocking and power can all be
> +controlled through the Lochnagar, allowing the device under test
> +to be used in a variety of possible use cases.
> +
> +Also see these documents for generic binding information:
> +  [1] GPIO : ../gpio/gpio.txt
> +  [2] Pinctrl: ../pinctrl/pinctrl-bindings.txt
> +  [3] Clock : ../clock/clock-bindings.txt
> +  [4] Regulator: ../regulator/regulator.txt
> +
> +And these for relevant defines:
> +  [5] include/dt-bindings/pinctrl/lochnagar.h
> +  [6] include/dt-bindings/clock/lochnagar.h
> +
> +Required properties:
> +
> +  - compatible : One of the following strings:
> + "cirrus,lochnagar1"
> + "cirrus,lochnagar2"
> +
> +  - reg : I2C slave address
> +
> +  - gpio-controller : Indicates this device is a GPIO controller.
> +  - #gpio-cells : Must be 2. The first cell is the pin number, see
> +[5] for available pins and the second cell is used to specify
> +optional parameters, see [1].
> +  - gpio-ranges : Range of pins managed by the GPIO controller, see
> +[1]. Both the GPIO and Pinctrl base should be set to zero and the
> +count to the appropriate of the LOCHNAGARx_PIN_NUM_GPIOS define,
> +see [5].
> +
> +  - reset-gpios : Reset line to the Lochnagar, see [1].
> +
> +  - #clock-cells : Must be 1. The first cell indicates the clock
> +number, see [6] for available clocks and [3].
> +
> +  - pinctrl-names : A pinctrl state named "default" must be defined.
> +  - pinctrl-0 : A phandle to the default pinctrl state.
> +
> +  - SYSVDD-supply: Primary power supply for the Lochnagar.
> +
> +Optional properties:
> +
> +  - present-gpios : Host present line, indicating the presence of a
> +host system, see [1]. This can be omitted if the present line is
> +tied in hardware.
> +
> +  - clocks : Must contain an entry for each clock in clock-names.
> +  - clock-names : May contain entries for each of the following
> +clocks:
> + - ln-cdc-clkout : Output clock from CODEC card.
> + - ln-dsp-clkout : Output clock from DSP card.
> + - ln-gf-mclk1,ln-gf-mclk2,ln-gf-mclk3,ln-gf-mclk4 : Optional
> +   input audio clocks from host system.
> + - ln-psia1-mclk, ln-psia2-mclk : Optional input audio clocks from
> +   external connector.
> + - ln-spdif-clkout : Optional input audio clock from SPDIF.
> + - ln-adat-clkout : Optional input audio clock from ADAT.
> +
> +  - assigned-clocks : A list of Lochnagar clocks to be reparented, see
> +[6] for available clocks.
> +  - assigned-clock-parents : Parents to be assigned to the clocks
> +listed in "assigned-clocks".
> +
> +  - MICBIAS1-supply, MICBIAS2-supply: Regulator supplies for the
> +MICxVDD outputs, supplying the digital microphones, normally
> +supplied from the attached CODEC.
> +
> +Required sub-nodes:
> +
> +The pin configurations are defined as a child of the pinctrl states
> +node, see [2]. Each sub-node can have the following properties:
> +  - groups : A list of groups to select (either this or "pins" must be
> +specified), available groups:
> +  codec-aif1, codec-aif2, codec-aif3, dsp-aif1, dsp-aif2, psia1,
> +  

Re: [PATCH v13 3/8] clk: Use devm_ in the register fixed factor clock

2018-10-12 Thread Stephen Boyd
Quoting Ricardo Salveti (2018-09-14 11:53:02)
> On Thu, Jun 14, 2018 at 6:55 PM  wrote:
> >
> > From: Ilia Lin 
> >
> > Use devm_clk_hw_register instead of clk_hw_register
> > to simplify the usage of this API. This way drivers that call
> > the clk_hw_register_fixed_factor won't need to maintain
> > a data structure for further cleanup.
> >
> > Signed-off-by: Ilia Lin 
> > Tested-by: Amit Kucheria 
> > ---
> >  drivers/clk/clk-fixed-factor.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/clk/clk-fixed-factor.c b/drivers/clk/clk-fixed-factor.c
> > index a5d402de5584..8e39bda8e596 100644
> > --- a/drivers/clk/clk-fixed-factor.c
> > +++ b/drivers/clk/clk-fixed-factor.c
> > @@ -94,7 +94,7 @@ struct clk_hw *clk_hw_register_fixed_factor(struct device 
> > *dev,
> > init.num_parents = 1;
> >
> > hw = >hw;
> > -   ret = clk_hw_register(dev, hw);
> > +   ret = devm_clk_hw_register(dev, hw);
> 
> Not sure what is the current state of this patch-set, but this change
> breaks drivers calling clk_hw_register_fixed_factor with a NULL dev
> (e.g. imx_clk_fixed_factor), as devm_clk_hw_register needs a valid dev
> for devres_add to work.
> 

Yep. Probably better to just have a driver register the clk_hw structure
itself with the clk framework vs. trying to get it right here in the
generic type registration function.



Re: [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules

2018-10-12 Thread Edgecombe, Rick P
On Fri, 2018-10-12 at 16:32 +0200, Jessica Yu wrote:
> +++ Dave Hansen [11/10/18 16:47 -0700]:
> > On 10/11/2018 04:31 PM, Rick Edgecombe wrote:
> > > + if (check_inc_mod_rlimit(size))
> > > + return NULL;
> > > +
> > >   p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> > >   module_alloc_base + MODULES_VSIZE,
> > >   gfp_mask, PAGE_KERNEL_EXEC, 0,
> > > @@ -65,6 +68,8 @@ void *module_alloc(unsigned long size)
> > >   return NULL;
> > >   }
> > > 
> > > + update_mod_rlimit(p, size);
> > 
> > Is there a reason we couldn't just rename all of the existing per-arch
> > module_alloc() calls to be called, say, "arch_module_alloc()", then put
> > this new rlimit code in a generic helper that does:
> > 
> > 
> > if (check_inc_mod_rlimit(size))
> > return NULL;
> > 
> > p = arch_module_alloc(...);
> > 
> > ...
> > 
> > update_mod_rlimit(p, size);
> > 
> 
> I second this suggestion. Just make module_{alloc,memfree} generic,
> non-weak functions that call the rlimit helpers in addition to the
> arch-specific arch_module_{alloc,memfree} functions.
> 
> Jessica

Ok, thanks. I am going to try another version of this with just a system wide
BPF JIT limit based on the problems Jann brought up. I think it would be nice to
have a module space limit, but as far as I know the only way today 
un-privlidged 
users could fill the space is from BPF JIT. Unless you see another purpose long
term?

Rick


Re: [PATCH v2 4/7] arm64/modules: Add rlimit checking for arm64 modules

2018-10-12 Thread Edgecombe, Rick P
On Fri, 2018-10-12 at 16:32 +0200, Jessica Yu wrote:
> +++ Dave Hansen [11/10/18 16:47 -0700]:
> > On 10/11/2018 04:31 PM, Rick Edgecombe wrote:
> > > + if (check_inc_mod_rlimit(size))
> > > + return NULL;
> > > +
> > >   p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> > >   module_alloc_base + MODULES_VSIZE,
> > >   gfp_mask, PAGE_KERNEL_EXEC, 0,
> > > @@ -65,6 +68,8 @@ void *module_alloc(unsigned long size)
> > >   return NULL;
> > >   }
> > > 
> > > + update_mod_rlimit(p, size);
> > 
> > Is there a reason we couldn't just rename all of the existing per-arch
> > module_alloc() calls to be called, say, "arch_module_alloc()", then put
> > this new rlimit code in a generic helper that does:
> > 
> > 
> > if (check_inc_mod_rlimit(size))
> > return NULL;
> > 
> > p = arch_module_alloc(...);
> > 
> > ...
> > 
> > update_mod_rlimit(p, size);
> > 
> 
> I second this suggestion. Just make module_{alloc,memfree} generic,
> non-weak functions that call the rlimit helpers in addition to the
> arch-specific arch_module_{alloc,memfree} functions.
> 
> Jessica

Ok, thanks. I am going to try another version of this with just a system wide
BPF JIT limit based on the problems Jann brought up. I think it would be nice to
have a module space limit, but as far as I know the only way today 
un-privlidged 
users could fill the space is from BPF JIT. Unless you see another purpose long
term?

Rick


[PATCH-tip] locking/lockdep: Remove duplicated lock_class_ops percpu array

2018-10-12 Thread Waiman Long
Remove the duplicated lock_class_ops percpu array that is not used
anywhere.

Fixes: 8ca2b56cd7da ("locking/lockdep: Make class->ops a percpu counter
and move it under CONFIG_DEBUG_LOCKDEP=y")

Signed-off-by: Waiman Long 
---
 kernel/locking/lockdep.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5faeb2e..be76f47 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -435,7 +435,6 @@ static int save_trace(struct stack_trace *trace)
  * Various lockdep statistics:
  */
 DEFINE_PER_CPU(struct lockdep_stats, lockdep_stats);
-DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
 #endif
 
 /*
-- 
1.8.3.1



[PATCH-tip] locking/lockdep: Remove duplicated lock_class_ops percpu array

2018-10-12 Thread Waiman Long
Remove the duplicated lock_class_ops percpu array that is not used
anywhere.

Fixes: 8ca2b56cd7da ("locking/lockdep: Make class->ops a percpu counter
and move it under CONFIG_DEBUG_LOCKDEP=y")

Signed-off-by: Waiman Long 
---
 kernel/locking/lockdep.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5faeb2e..be76f47 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -435,7 +435,6 @@ static int save_trace(struct stack_trace *trace)
  * Various lockdep statistics:
  */
 DEFINE_PER_CPU(struct lockdep_stats, lockdep_stats);
-DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
 #endif
 
 /*
-- 
1.8.3.1



[PATCH] dt-bindings: ufs: Fix the compatible string definition

2018-10-12 Thread Douglas Anderson
If you look at the bindings for the UFS Host Controller it says:

- compatible: must contain "jedec,ufs-1.1" or "jedec,ufs-2.0", may
  also list one or more of the following:
 "qcom,msm8994-ufshc"
 "qcom,msm8996-ufshc"
 "qcom,ufshc"

My reading of that is that it's fine to just have either of these:
1. "qcom,msm8996-ufshc", "jedec,ufs-2.0"
2. "qcom,ufshc", "jedec,ufs-2.0"

As far as I can tell neither of the above is actually a good idea.

For #1 it turns out that the driver currently only keys off the
compatible string "qcom,ufshc" so it won't actually probe.

For #2 the driver won't probe but it's not a good idea to keep the SoC
name out of the compatible string.

Let's update the compatible string to make it really explicit.  We'll
include a nod to the existing driver and the old binding and say that
we should always include the "qcom,ufshc" string in addition to the
SoC compatible string.

While we're at it we'll also include another example SoC known to have
UFS: sdm845.

Fixes: 47555a5c8a11 ("scsi: ufs: make the UFS variant a platform device")
Signed-off-by: Douglas Anderson 
---

 .../devicetree/bindings/ufs/ufshcd-pltfrm.txt   | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/ufs/ufshcd-pltfrm.txt 
b/Documentation/devicetree/bindings/ufs/ufshcd-pltfrm.txt
index 2df00524bd21..69a06a1b732e 100644
--- a/Documentation/devicetree/bindings/ufs/ufshcd-pltfrm.txt
+++ b/Documentation/devicetree/bindings/ufs/ufshcd-pltfrm.txt
@@ -4,11 +4,14 @@ UFSHC nodes are defined to describe on-chip UFS host 
controllers.
 Each UFS controller instance should have its own node.
 
 Required properties:
-- compatible   : must contain "jedec,ufs-1.1" or "jedec,ufs-2.0", may
- also list one or more of the following:
- "qcom,msm8994-ufshc"
- "qcom,msm8996-ufshc"
- "qcom,ufshc"
+- compatible   : must contain "jedec,ufs-1.1" or "jedec,ufs-2.0"
+
+ For Qualcomm SoCs must contain, as below, an
+ SoC-specific compatible along with "qcom,ufshc" and
+ the appropriate jedec string:
+   "qcom,msm8994-ufshc", "qcom,ufshc", "jedec,ufs-2.0"
+   "qcom,msm8996-ufshc", "qcom,ufshc", "jedec,ufs-2.0"
+   "qcom,sdm845-ufshc", "qcom,ufshc", "jedec,ufs-2.0"
 - interrupts: 
 - reg   : 
 
-- 
2.19.0.605.g01d371f741-goog



[PATCH] dt-bindings: ufs: Fix the compatible string definition

2018-10-12 Thread Douglas Anderson
If you look at the bindings for the UFS Host Controller it says:

- compatible: must contain "jedec,ufs-1.1" or "jedec,ufs-2.0", may
  also list one or more of the following:
 "qcom,msm8994-ufshc"
 "qcom,msm8996-ufshc"
 "qcom,ufshc"

My reading of that is that it's fine to just have either of these:
1. "qcom,msm8996-ufshc", "jedec,ufs-2.0"
2. "qcom,ufshc", "jedec,ufs-2.0"

As far as I can tell neither of the above is actually a good idea.

For #1 it turns out that the driver currently only keys off the
compatible string "qcom,ufshc" so it won't actually probe.

For #2 the driver won't probe but it's not a good idea to keep the SoC
name out of the compatible string.

Let's update the compatible string to make it really explicit.  We'll
include a nod to the existing driver and the old binding and say that
we should always include the "qcom,ufshc" string in addition to the
SoC compatible string.

While we're at it we'll also include another example SoC known to have
UFS: sdm845.

Fixes: 47555a5c8a11 ("scsi: ufs: make the UFS variant a platform device")
Signed-off-by: Douglas Anderson 
---

 .../devicetree/bindings/ufs/ufshcd-pltfrm.txt   | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/ufs/ufshcd-pltfrm.txt 
b/Documentation/devicetree/bindings/ufs/ufshcd-pltfrm.txt
index 2df00524bd21..69a06a1b732e 100644
--- a/Documentation/devicetree/bindings/ufs/ufshcd-pltfrm.txt
+++ b/Documentation/devicetree/bindings/ufs/ufshcd-pltfrm.txt
@@ -4,11 +4,14 @@ UFSHC nodes are defined to describe on-chip UFS host 
controllers.
 Each UFS controller instance should have its own node.
 
 Required properties:
-- compatible   : must contain "jedec,ufs-1.1" or "jedec,ufs-2.0", may
- also list one or more of the following:
- "qcom,msm8994-ufshc"
- "qcom,msm8996-ufshc"
- "qcom,ufshc"
+- compatible   : must contain "jedec,ufs-1.1" or "jedec,ufs-2.0"
+
+ For Qualcomm SoCs must contain, as below, an
+ SoC-specific compatible along with "qcom,ufshc" and
+ the appropriate jedec string:
+   "qcom,msm8994-ufshc", "qcom,ufshc", "jedec,ufs-2.0"
+   "qcom,msm8996-ufshc", "qcom,ufshc", "jedec,ufs-2.0"
+   "qcom,sdm845-ufshc", "qcom,ufshc", "jedec,ufs-2.0"
 - interrupts: 
 - reg   : 
 
-- 
2.19.0.605.g01d371f741-goog



[PATCH] dt-bindings: phy-qcom-qmp: Fix several mistakes from prior commits

2018-10-12 Thread Douglas Anderson
Digging through the "phy-qcom-qmp" showed me many inconsistencies
between the bindings and the reality of the driver.  Let's fix them
all.

* In commit 2d66eab18375 ("dt-bindings: phy: qmp: Add support for QMP
  phy in IPQ8074") we probably should have explicitly listed that
  there are no clocks for this PHY and also added the reset names in
  alphabetical order.  You can see that there are no clocks in the
  driver where "clk_list" is NULL.

* In commit 8587b220f05e ("dt-bindings: phy-qcom-qmp: Update bindings
  for QMP V3 USB PHY") we probably should have listed the resets for
  this new PHY and also removed the "(Optional)" marking for the "cfg"
  reset since PHYs that need "cfg" really do need it.  It's just that
  not all PHYs need it.

* In commit 7f0802074120 ("dt-bindings: phy-qcom-qmp: Update bindings
  for sdm845") we forgot to update one instance of the string
  "qcom,qmp-v3-usb3-phy" to be "qcom,sdm845-qmp-usb3-phy".  Let's fix
  that.  We should also have added "qcom,sdm845-qmp-usb3-uni-phy" to
  the clock-names and reset-names lists.

* In commit 99c7c7364b71 ("dt-bindings: phy-qcom-qmp: Add UFS phy
  compatible string for sdm845") we should have added the set of
  clocks and resets for "qcom,sdm845-qmp-ufs-phy".  These were taken
  from the driver.

* Cleanup the wording for what properties child nodes have to make it
  more obvious which types of PHYs need clocks and resets.  This was
  sorta implicit in the "-names" description but I found myself
  confused.

* As per the code not all "pcie qmp phys" have resets.  Specifically
  note that the "has_lane_rst" property in the driver is false for
  "ipq8074-qmp-pcie-phy".  Thus make it clear exactly which PHYs need
  child nodes with resets.

Signed-off-by: Douglas Anderson 
---

 .../devicetree/bindings/phy/qcom-qmp-phy.txt  | 31 ++-
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt 
b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
index adf20b2bdf71..fbc198d5dd39 100644
--- a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
+++ b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
@@ -40,24 +40,36 @@ Required properties:
"ref" for 19.2 MHz ref clk,
"com_aux" for phy common block aux clock,
"ref_aux" for phy reference aux clock,
+
+   For "qcom,ipq8074-qmp-pcie-phy": no clocks are listed.
For "qcom,msm8996-qmp-pcie-phy" must contain:
"aux", "cfg_ahb", "ref".
For "qcom,msm8996-qmp-usb3-phy" must contain:
"aux", "cfg_ahb", "ref".
-   For "qcom,qmp-v3-usb3-phy" must contain:
+   For "qcom,sdm845-qmp-usb3-phy" must contain:
+   "aux", "cfg_ahb", "ref", "com_aux".
+   For "qcom,sdm845-qmp-usb3-uni-phy" must contain:
"aux", "cfg_ahb", "ref", "com_aux".
+   For "qcom,sdm845-qmp-ufs-phy" must contain:
+   "ref", "ref_aux".
 
  - resets: a list of phandles and reset controller specifier pairs,
   one for each entry in reset-names.
  - reset-names: "phy" for reset of phy block,
"common" for phy common block reset,
-   "cfg" for phy's ahb cfg block reset (Optional).
+   "cfg" for phy's ahb cfg block reset.
+
+   For "qcom,ipq8074-qmp-pcie-phy" must contain:
+   "phy", "common".
For "qcom,msm8996-qmp-pcie-phy" must contain:
-"phy", "common", "cfg".
+   "phy", "common", "cfg".
For "qcom,msm8996-qmp-usb3-phy" must contain
-"phy", "common".
-   For "qcom,ipq8074-qmp-pcie-phy" must contain:
-"phy", "common".
+   "phy", "common".
+   For "qcom,sdm845-qmp-usb3-phy" must contain:
+   "phy", "common".
+   For "qcom,sdm845-qmp-usb3-uni-phy" must contain:
+   "phy", "common".
+   For "qcom,sdm845-qmp-ufs-phy": no resets are listed.
 
  - vdda-phy-supply: Phandle to a regulator supply to PHY core block.
  - vdda-pll-supply: Phandle to 1.8V regulator supply to PHY refclk pll block.
@@ -79,9 +91,10 @@ Required properties for child node:
 
  - #phy-cells: must be 0
 
+Required properties child node of pcie and usb3 qmp phys:
  - clocks: a list of phandles and clock-specifier pairs,
   one for each entry in clock-names.
- - clock-names: Must contain following for pcie and usb qmp phys:
+ - clock-names: Must contain following:
 "pipe" for pipe clock specific to each lane.
  - clock-output-names: Name of the PHY clock that will be the parent for
   the above pipe clock.
@@ -91,9 +104,11 @@ Required properties for child node:
(or)
  "pcie20_phy1_pipe_clk"
 

[PATCH] dt-bindings: phy-qcom-qmp: Fix several mistakes from prior commits

2018-10-12 Thread Douglas Anderson
Digging through the "phy-qcom-qmp" showed me many inconsistencies
between the bindings and the reality of the driver.  Let's fix them
all.

* In commit 2d66eab18375 ("dt-bindings: phy: qmp: Add support for QMP
  phy in IPQ8074") we probably should have explicitly listed that
  there are no clocks for this PHY and also added the reset names in
  alphabetical order.  You can see that there are no clocks in the
  driver where "clk_list" is NULL.

* In commit 8587b220f05e ("dt-bindings: phy-qcom-qmp: Update bindings
  for QMP V3 USB PHY") we probably should have listed the resets for
  this new PHY and also removed the "(Optional)" marking for the "cfg"
  reset since PHYs that need "cfg" really do need it.  It's just that
  not all PHYs need it.

* In commit 7f0802074120 ("dt-bindings: phy-qcom-qmp: Update bindings
  for sdm845") we forgot to update one instance of the string
  "qcom,qmp-v3-usb3-phy" to be "qcom,sdm845-qmp-usb3-phy".  Let's fix
  that.  We should also have added "qcom,sdm845-qmp-usb3-uni-phy" to
  the clock-names and reset-names lists.

* In commit 99c7c7364b71 ("dt-bindings: phy-qcom-qmp: Add UFS phy
  compatible string for sdm845") we should have added the set of
  clocks and resets for "qcom,sdm845-qmp-ufs-phy".  These were taken
  from the driver.

* Cleanup the wording for what properties child nodes have to make it
  more obvious which types of PHYs need clocks and resets.  This was
  sorta implicit in the "-names" description but I found myself
  confused.

* As per the code not all "pcie qmp phys" have resets.  Specifically
  note that the "has_lane_rst" property in the driver is false for
  "ipq8074-qmp-pcie-phy".  Thus make it clear exactly which PHYs need
  child nodes with resets.

Signed-off-by: Douglas Anderson 
---

 .../devicetree/bindings/phy/qcom-qmp-phy.txt  | 31 ++-
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt 
b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
index adf20b2bdf71..fbc198d5dd39 100644
--- a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
+++ b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
@@ -40,24 +40,36 @@ Required properties:
"ref" for 19.2 MHz ref clk,
"com_aux" for phy common block aux clock,
"ref_aux" for phy reference aux clock,
+
+   For "qcom,ipq8074-qmp-pcie-phy": no clocks are listed.
For "qcom,msm8996-qmp-pcie-phy" must contain:
"aux", "cfg_ahb", "ref".
For "qcom,msm8996-qmp-usb3-phy" must contain:
"aux", "cfg_ahb", "ref".
-   For "qcom,qmp-v3-usb3-phy" must contain:
+   For "qcom,sdm845-qmp-usb3-phy" must contain:
+   "aux", "cfg_ahb", "ref", "com_aux".
+   For "qcom,sdm845-qmp-usb3-uni-phy" must contain:
"aux", "cfg_ahb", "ref", "com_aux".
+   For "qcom,sdm845-qmp-ufs-phy" must contain:
+   "ref", "ref_aux".
 
  - resets: a list of phandles and reset controller specifier pairs,
   one for each entry in reset-names.
  - reset-names: "phy" for reset of phy block,
"common" for phy common block reset,
-   "cfg" for phy's ahb cfg block reset (Optional).
+   "cfg" for phy's ahb cfg block reset.
+
+   For "qcom,ipq8074-qmp-pcie-phy" must contain:
+   "phy", "common".
For "qcom,msm8996-qmp-pcie-phy" must contain:
-"phy", "common", "cfg".
+   "phy", "common", "cfg".
For "qcom,msm8996-qmp-usb3-phy" must contain
-"phy", "common".
-   For "qcom,ipq8074-qmp-pcie-phy" must contain:
-"phy", "common".
+   "phy", "common".
+   For "qcom,sdm845-qmp-usb3-phy" must contain:
+   "phy", "common".
+   For "qcom,sdm845-qmp-usb3-uni-phy" must contain:
+   "phy", "common".
+   For "qcom,sdm845-qmp-ufs-phy": no resets are listed.
 
  - vdda-phy-supply: Phandle to a regulator supply to PHY core block.
  - vdda-pll-supply: Phandle to 1.8V regulator supply to PHY refclk pll block.
@@ -79,9 +91,10 @@ Required properties for child node:
 
  - #phy-cells: must be 0
 
+Required properties child node of pcie and usb3 qmp phys:
  - clocks: a list of phandles and clock-specifier pairs,
   one for each entry in clock-names.
- - clock-names: Must contain following for pcie and usb qmp phys:
+ - clock-names: Must contain following:
 "pipe" for pipe clock specific to each lane.
  - clock-output-names: Name of the PHY clock that will be the parent for
   the above pipe clock.
@@ -91,9 +104,11 @@ Required properties for child node:
(or)
  "pcie20_phy1_pipe_clk"
 

[BUG] perf stat: hangs with -p and process completes

2018-10-12 Thread Stephane Eranian
Hi,

I am running into a perf stat issue with the -p option which allows you
to attach to a running process. If that process happens to terminate
while under monitoring
perf hangs in there and never terminates. The proper behavior would be to stop.
I can see the issue in that the attached process is not a child, so
wait() would not work.

To reproduce:
  $ sleep 10 &
  $ perf stat -p $!

doing the same with perf record works, so there is a solution to this problem.


[BUG] perf stat: hangs with -p and process completes

2018-10-12 Thread Stephane Eranian
Hi,

I am running into a perf stat issue with the -p option which allows you
to attach to a running process. If that process happens to terminate
while under monitoring
perf hangs in there and never terminates. The proper behavior would be to stop.
I can see the issue in that the attached process is not a child, so
wait() would not work.

To reproduce:
  $ sleep 10 &
  $ perf stat -p $!

doing the same with perf record works, so there is a solution to this problem.


[patch] mm, slab: avoid high-order slab pages when it does not reduce waste

2018-10-12 Thread David Rientjes
The slab allocator has a heuristic that checks whether the internal
fragmentation is satisfactory and, if not, increases cachep->gfporder to
try to improve this.

If the amount of waste is the same at higher cachep->gfporder values,
there is no significant benefit to allocating higher order memory.  There
will be fewer calls to the page allocator, but each call will require
zone->lock and finding the page of best fit from the per-zone free areas.

Instead, it is better to allocate order-0 memory if possible so that pages
can be returned from the per-cpu pagesets (pcp).

There are two reasons to prefer this over allocating high order memory:

 - allocating from the pcp lists does not require a per-zone lock, and

 - this reduces stranding of MIGRATE_UNMOVABLE pageblocks on pcp lists
   that increases slab fragmentation across a zone.

We are particularly interested in the second point to eliminate cases
where all other pages on a pageblock are movable (or free) and fallback to
pageblocks of other migratetypes from the per-zone free areas causes
high-order slab memory to be allocated from them rather than from free
MIGRATE_UNMOVABLE pages on the pcp.

Signed-off-by: David Rientjes 
---
 mm/slab.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1748,6 +1748,7 @@ static size_t calculate_slab_order(struct kmem_cache 
*cachep,
for (gfporder = 0; gfporder <= KMALLOC_MAX_ORDER; gfporder++) {
unsigned int num;
size_t remainder;
+   int order;
 
num = cache_estimate(gfporder, size, flags, );
if (!num)
@@ -1803,6 +1804,20 @@ static size_t calculate_slab_order(struct kmem_cache 
*cachep,
 */
if (left_over * 8 <= (PAGE_SIZE << gfporder))
break;
+
+   /*
+* If a higher gfporder would not reduce internal fragmentation,
+* no need to continue.  The preference is to keep gfporder as
+* small as possible so slab allocations can be served from
+* MIGRATE_UNMOVABLE pcp lists to avoid stranding.
+*/
+   for (order = gfporder + 1; order <= slab_max_order; order++) {
+   cache_estimate(order, size, flags, );
+   if (remainder < left_over)
+   break;
+   }
+   if (order > slab_max_order)
+   break;
}
return left_over;
 }


[patch] mm, slab: avoid high-order slab pages when it does not reduce waste

2018-10-12 Thread David Rientjes
The slab allocator has a heuristic that checks whether the internal
fragmentation is satisfactory and, if not, increases cachep->gfporder to
try to improve this.

If the amount of waste is the same at higher cachep->gfporder values,
there is no significant benefit to allocating higher order memory.  There
will be fewer calls to the page allocator, but each call will require
zone->lock and finding the page of best fit from the per-zone free areas.

Instead, it is better to allocate order-0 memory if possible so that pages
can be returned from the per-cpu pagesets (pcp).

There are two reasons to prefer this over allocating high order memory:

 - allocating from the pcp lists does not require a per-zone lock, and

 - this reduces stranding of MIGRATE_UNMOVABLE pageblocks on pcp lists
   that increases slab fragmentation across a zone.

We are particularly interested in the second point to eliminate cases
where all other pages on a pageblock are movable (or free) and fallback to
pageblocks of other migratetypes from the per-zone free areas causes
high-order slab memory to be allocated from them rather than from free
MIGRATE_UNMOVABLE pages on the pcp.

Signed-off-by: David Rientjes 
---
 mm/slab.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1748,6 +1748,7 @@ static size_t calculate_slab_order(struct kmem_cache 
*cachep,
for (gfporder = 0; gfporder <= KMALLOC_MAX_ORDER; gfporder++) {
unsigned int num;
size_t remainder;
+   int order;
 
num = cache_estimate(gfporder, size, flags, );
if (!num)
@@ -1803,6 +1804,20 @@ static size_t calculate_slab_order(struct kmem_cache 
*cachep,
 */
if (left_over * 8 <= (PAGE_SIZE << gfporder))
break;
+
+   /*
+* If a higher gfporder would not reduce internal fragmentation,
+* no need to continue.  The preference is to keep gfporder as
+* small as possible so slab allocations can be served from
+* MIGRATE_UNMOVABLE pcp lists to avoid stranding.
+*/
+   for (order = gfporder + 1; order <= slab_max_order; order++) {
+   cache_estimate(order, size, flags, );
+   if (remainder < left_over)
+   break;
+   }
+   if (order > slab_max_order)
+   break;
}
return left_over;
 }


  1   2   3   4   5   6   7   8   9   10   >