date:20180413

[PATCH 3/3] USB: musb: dsps: propagate device-tree node

2018-04-13 Thread Johan Hovold

To be able to use DSPS-based controllers with device-tree descriptions
of the USB topology, we need to associate the glue device's device-tree
node with the child controller device.

Note that this can also be used to eventually let USB core manage
generic phys.

Also note that the other glue drivers will require similar changes to be
able to describe their buses in DT.

Signed-off-by: Johan Hovold 
---
 drivers/usb/musb/musb_dsps.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/musb/musb_dsps.c b/drivers/usb/musb/musb_dsps.c
index 6a60bc0490c5..23dba59045a7 100644
--- a/drivers/usb/musb/musb_dsps.c
+++ b/drivers/usb/musb/musb_dsps.c
@@ -786,6 +786,7 @@ static int dsps_create_musb_pdev(struct dsps_glue *glue,
 	musb->dev.parent		= dev;
 	musb->dev.dma_mask		= &musb_dmamask;
 	musb->dev.coherent_dma_mask	= musb_dmamask;
+	device_set_of_node_from_dev(&musb->dev, &parent->dev);
 
 	glue->musb			= musb;
 
-- 
2.17.0

[PATCH 2/3] USB: musb: host: prevent core phy initialisation

2018-04-13 Thread Johan Hovold

Set the new HCD flag which prevents USB core from trying to manage our
phys.

This is needed to be able to associate the controller platform device
with the glue device device-tree node on the BBB which uses legacy USB
phys. Otherwise, the generic phy lookup in usb_phy_roothub_init() and
thus HCD registration fails repeatedly with -EPROBE_DEFER (see commit
178a0bce05cb ("usb: core: hcd: integrate the PHY wrapper into the HCD
core")).

Note that a related phy-lookup issue was recently worked around in the
phy core by commit b7563e2796f8 ("phy: work around 'phys' references to
usb-nop-xceiv devices"). Something similar may now be needed for other
USB phys, and in particular if we eventually want to let USB core manage
musb generic phys.

Cc: Arnd Bergmann 
Cc: Martin Blumenstingl 
Signed-off-by: Johan Hovold 
---
 drivers/usb/musb/musb_host.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
index 3a8451a15f7f..4fa372c845e1 100644
--- a/drivers/usb/musb/musb_host.c
+++ b/drivers/usb/musb/musb_host.c
@@ -2754,6 +2754,7 @@ int musb_host_setup(struct musb *musb, int power_budget)
 	hcd->self.otg_port = 1;
 	musb->xceiv->otg->host = &hcd->self;
 	hcd->power_budget = 2 * (power_budget ? : 250);
+	hcd->skip_phy_initialization = 1;
 
 	ret = usb_add_hcd(hcd, 0, 0);
 	if (ret < 0)
-- 
2.17.0

Re: [PATCH v6 01/11] dt-bindings: firmware: Add bindings for ZynqMP firmware

2018-04-13 Thread Rob Herring

On Tue, Apr 10, 2018 at 12:38:37PM -0700, Jolly Shah wrote:
> From: Rajan Vaja 
> 
> Add documentation to describe Xilinx ZynqMP firmware driver
> bindings. Firmware driver provides an interface to firmware
> APIs. Interface APIs can be used by any driver to communicate
> to PMUFW (Platform Management Unit).
> 
> Signed-off-by: Rajan Vaja 
> Signed-off-by: Jolly Shah 
> ---
>  .../firmware/xilinx/xlnx,zynqmp-firmware.txt   | 29 ++
>  1 file changed, 29 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/firmware/xilinx/xlnx,zynqmp-firmware.txt

Please add acks/reviewed-by's when posting new versions.

Rob

[PATCH] drm/vmwgfx: Fix scatterlist unmapping

2018-04-13 Thread Robin Murphy

dma_unmap_sg() should be called with the same number of entries
originally passed to dma_map_sg(), not the number it returned, which may
be fewer. Admittedly this driver probably never runs on non-coherent
architectures where getting that wrong could lead to data loss, but it's
always good to be correct, and it's trivially easy to fix by just
restoring the SG table state before the call instead of afterwards.

Signed-off-by: Robin Murphy 
---

Found by inspection while poking around TTM users.

 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index 2fd091f9..971223d39469 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -369,9 +369,9 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt *vmw_tt)
 {
 	struct device *dev = vmw_tt->dev_priv->dev->dev;
 
+	vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
 	dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents,
 		DMA_BIDIRECTIONAL);
-	vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
 }
 
 /**
-- 
2.16.1.dirty
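
The ordering issue described in the commit message can be sketched in plain userspace C. This is only a toy model, not the driver code: the toy_* names are made up for illustration, and the mock mapping function simply pretends that an IOMMU merged entries in pairs so that fewer entries come back than were passed in.

```c
#include <stddef.h>

/* Toy stand-in for struct sg_table; only the two counters matter here. */
struct toy_sg_table {
    unsigned int nents;      /* entries after mapping (may be fewer) */
    unsigned int orig_nents; /* entries originally passed to the map call */
};

/* Remembers the count handed to the most recent mock unmap call. */
static unsigned int toy_unmapped_count;

/* Mock of a map call: pretend adjacent entries were merged in pairs. */
static unsigned int toy_dma_map_sg(unsigned int nents)
{
    return (nents + 1) / 2;
}

/* Mock of an unmap call: just record how many entries it was given. */
static void toy_dma_unmap_sg(unsigned int nents)
{
    toy_unmapped_count = nents;
}

static void toy_map(struct toy_sg_table *sgt)
{
    sgt->nents = toy_dma_map_sg(sgt->orig_nents);
}

/* Old ordering: unmap with the (possibly smaller) mapped count. */
static void toy_unmap_buggy(struct toy_sg_table *sgt)
{
    toy_dma_unmap_sg(sgt->nents);
    sgt->nents = sgt->orig_nents;
}

/* New ordering: restore the original count first, then unmap. */
static void toy_unmap_fixed(struct toy_sg_table *sgt)
{
    sgt->nents = sgt->orig_nents;
    toy_dma_unmap_sg(sgt->nents);
}

static unsigned int toy_last_unmap_count(void)
{
    return toy_unmapped_count;
}
```

With eight original entries, the toy mapping leaves four; the old ordering then unmaps four, while the new ordering unmaps all eight.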

[PATCH] cifs: smb2ops: Fix NULL check in smb2_query_symlink

2018-04-13 Thread Gustavo A. R. Silva

The current code null checks variable err_buf, which is always null
when it is checked, hence utf16_path is free'd and the function
returns -ENOENT every time it is called, making it impossible for the
execution path to reach the following code:

err_buf = err_iov.iov_base;

Fix this by null checking err_iov.iov_base instead of err_buf. Also,
notice that err_buf no longer needs to be initialized to NULL.

Addresses-Coverity-ID: 1467876 ("Logically dead code")
Fixes: 2d636199e400 ("cifs: Change SMB2_open to return an iov for the error parameter")
Signed-off-by: Gustavo A. R. Silva 
---
 fs/cifs/smb2ops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index b4ae932..38ebf3f 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -1452,7 +1452,7 @@ smb2_query_symlink(const unsigned int xid, struct cifs_tcon *tcon,
 	struct cifs_open_parms oparms;
 	struct cifs_fid fid;
 	struct kvec err_iov = {NULL, 0};
-	struct smb2_err_rsp *err_buf = NULL;
+	struct smb2_err_rsp *err_buf;
 	struct smb2_symlink_err_rsp *symlink;
 	unsigned int sub_len;
 	unsigned int sub_offset;
@@ -1476,7 +1476,7 @@ smb2_query_symlink(const unsigned int xid, struct cifs_tcon *tcon,
 
 	rc = SMB2_open(xid, &oparms, utf16_path, &oplock, NULL, &err_iov);
 
-	if (!rc || !err_buf) {
+	if (!rc || !err_iov.iov_base) {
 		kfree(utf16_path);
 		return -ENOENT;
 	}
-- 
2.7.4
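
The pattern behind the dead code can be reproduced in a small userspace sketch. Everything named toy_* below is hypothetical and only mimics the shape of the function: a failing "open" that fills in an error iov, followed by a check on either the not-yet-assigned pointer (buggy) or the iov itself (fixed).

```c
#include <stddef.h>
#include <errno.h>

/* Toy stand-in for struct kvec. */
struct toy_kvec {
    void *iov_base;
    size_t iov_len;
};

/* Mock open: fails, but hands back an error response in err_iov. */
static int toy_open(struct toy_kvec *err_iov)
{
    static char response[] = "symlink target";

    err_iov->iov_base = response;
    err_iov->iov_len = sizeof(response);
    return -1; /* non-zero: the open itself did not succeed */
}

/* Buggy shape: err_buf is still NULL when it is tested. */
static int toy_query_buggy(const char **out)
{
    struct toy_kvec err_iov = { NULL, 0 };
    const char *err_buf = NULL;
    int rc = toy_open(&err_iov);

    if (!rc || !err_buf)
        return -ENOENT; /* always taken */

    err_buf = err_iov.iov_base; /* never reached */
    *out = err_buf;
    return 0;
}

/* Fixed shape: test the buffer that the open call actually filled in. */
static int toy_query_fixed(const char **out)
{
    struct toy_kvec err_iov = { NULL, 0 };
    const char *err_buf;
    int rc = toy_open(&err_iov);

    if (!rc || !err_iov.iov_base)
        return -ENOENT;

    err_buf = err_iov.iov_base;
    *out = err_buf;
    return 0;
}
```

In this sketch the buggy variant returns -ENOENT unconditionally, whereas the fixed variant reaches the assignment whenever the mock open returns an error response.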

Re: Build error for samples/bpf/ due to commit d0266046ad54 ("x86: Remove FAST_FEATURE_TESTS")

2018-04-13 Thread Alexei Starovoitov

On Fri, Apr 13, 2018 at 03:22:37PM +0200, Jesper Dangaard Brouer wrote:
> Hi Peter,
> 
> Your commit d0266046ad54 ("x86: Remove FAST_FEATURE_TESTS") broke build
> for several samples/bpf programs. I'm unsure what the best way forward
> is to unbreak these...
> 
> The issue is that these samples are built with LLVM/clang (which
> doesn't like 'asm goto' constructs).  And they end up including
> arch/x86/include/asm/cpufeature.h via a long include path, see build
> examples below (through different path to include/linux/thread_info.h).
> 
> Maybe Alexei or Daniel have an idea how to work around this?
> As tools/testing/selftests/bpf/ does not seem to fail!?

Right. All of bpf tracing and samples/bpf/ broke.
Here is the proposed fix that we're asking Peter to apply and send to Linus asap.
https://lkml.org/lkml/2018/4/10/825

> Build error#1:
> --
> clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include 
> -I./arch/x86/include -I./arch/x86/include/generated  -I./include 
> -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi 
> -I./include/uapi -I./include/generated/uapi -include 
> ./include/linux/kconfig.h  -Isamples/bpf \
> -I./tools/testing/selftests/bpf/ \
> -D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
> -D__TARGET_ARCH_x86 -Wno-compare-distinct-pointer-types \
> -Wno-gnu-variable-sized-type-not-at-end \
> -Wno-address-of-packed-member -Wno-tautological-compare \
> -Wno-unknown-warning-option  \
> -O2 -emit-llvm -c samples/bpf/sockex2_kern.c -o -| llc -march=bpf 
> -filetype=obj -o samples/bpf/sockex2_kern.o
> In file included from samples/bpf/sockex2_kern.c:3:
> In file included from ./include/uapi/linux/in.h:24:
> In file included from ./include/linux/socket.h:8:
> In file included from ./include/linux/uio.h:13:
> In file included from ./include/linux/thread_info.h:38:
> In file included from ./arch/x86/include/asm/thread_info.h:53:
> ./arch/x86/include/asm/cpufeature.h:150:2: error: 'asm goto' constructs are 
> not supported yet
> asm_volatile_goto("1: jmp 6f\n"
> ^
> ./include/linux/compiler-gcc.h:290:42: note: expanded from macro 
> 'asm_volatile_goto'
> #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)

Re: [PATCH v7 11/26] of: base: Add of_get_cpu_state_node() to get idle states for a CPU node

2018-04-13 Thread Rob Herring

On Thu, Apr 12, 2018 at 6:14 AM, Ulf Hansson  wrote:
> The CPU's idle state nodes are currently parsed at the common cpuidle DT
> library, but also when initializing back-end data for the arch specific CPU
> operations, as in the PSCI driver case.
>
> To avoid open-coding, let's introduce of_get_cpu_state_node(), which takes
> the device node for the CPU and the index to the requested idle state node,
> as in-parameters. In case a corresponding idle state node is found, it
> returns the node with the refcount incremented for it, else it returns
> NULL.
>
> Moreover, for ARM, there are two generic methods, to describe the CPU's
> idle states, either via the flattened description through the
> "cpu-idle-states" binding [1] or via the hierarchical layout, using the
> "power-domains" and the "domain-idle-states" bindings [2]. Hence, let's
> take both options into account.
>
> [1]
> Documentation/devicetree/bindings/arm/idle-states.txt
> [2]
> Documentation/devicetree/bindings/arm/psci.txt
>
> Cc: Rob Herring 
> Cc: devicet...@vger.kernel.org
> Cc: Lina Iyer 
> Suggested-by: Sudeep Holla 
> Co-developed-by: Lina Iyer 
> Signed-off-by: Ulf Hansson 
> ---
>  drivers/of/base.c  | 35 +++
>  include/linux/of.h |  8 
>  2 files changed, 43 insertions(+)

Some reason you didn't add my Reviewed-by from v6?

Rob

Re: [PATCH] mmap.2: MAP_FIXED is okay if the address range has been reserved

2018-04-13 Thread Jann Horn

On Fri, Apr 13, 2018 at 8:49 AM, Michal Hocko  wrote:
> On Fri 13-04-18 08:43:27, Michael Kerrisk wrote:
> [...]
>> So, you mean remove this entire paragraph:
>>
>>   For cases in which the specified memory region has not been
>>   reserved using an existing mapping,  newer  kernels  (Linux
>>   4.17  and later) provide an option MAP_FIXED_NOREPLACE that
>>   should be used instead; older kernels require the caller to
>>   use addr as a hint (without MAP_FIXED) and take appropriate
>>   action if the kernel places the new mapping at a  different
>>   address.
>>
>> It seems like some version of the first half of the paragraph is worth
>> keeping, though, so as to point the reader in the direction of a remedy.
>> How about replacing that text with the following:
>>
>>   Since  Linux 4.17, the MAP_FIXED_NOREPLACE flag can be used
>>   in a multithreaded program to avoid  the  hazard  described
>>   above.
>
> Yes, that sounds reasonable to me.

But that kind of sounds as if you can't avoid it before Linux 4.17,
when actually, you just have to call mmap() with the address as hint,
and if mmap() returns a different address, munmap() it and go on your
normal error path.
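
A minimal sketch of that pre-4.17 approach, assuming an anonymous private mapping is wanted; the helper name map_at_exact() is made up for illustration:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Ask for an anonymous mapping at addr, passing addr only as a hint
 * (no MAP_FIXED). If the kernel places the mapping somewhere else,
 * unmap it again and report failure, so nothing is ever replaced.
 * Returns the mapping on success, NULL otherwise.
 */
static void *map_at_exact(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;

    if (p != addr) {
        /* Kernel chose a different address: undo and take the error path. */
        munmap(p, len);
        return NULL;
    }

    return p;
}
```

The caller treats a NULL return the same way it would treat any other mapping failure; an existing mapping at addr is never clobbered, because MAP_FIXED is not used.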

Re: [PATCH] iommu: amd: hide unused iommu_table_lock

2018-04-13 Thread Sebastian Andrzej Siewior

On 2018-04-04 12:56:59 [+0200], Arnd Bergmann wrote:
> The newly introduced lock is only used when CONFIG_IRQ_REMAP is enabled:
> 
> drivers/iommu/amd_iommu.c:86:24: error: 'iommu_table_lock' defined but not used [-Werror=unused-variable]
>  static DEFINE_SPINLOCK(iommu_table_lock);
> 
> This moves the definition next to the user, within the #ifdef protected
> section of the file.
> 
> Fixes: ea6166f4b83e ("iommu/amd: Split irq_lookup_table out of the amd_iommu_devtable_lock")
> Signed-off-by: Arnd Bergmann 
Acked-by: Sebastian Andrzej Siewior 

Thank you Arnd.

Sebastian

Re: [PATCH] tools build: Use -Xpreprocessor instead of -Wp and leave pathnames intact

2018-04-13 Thread Dave Martin

On Fri, Apr 13, 2018 at 02:53:10PM +0100, Will Deacon wrote:
> Build.include invokes the pre-processor via GCC in order to generate a
> dependency list for the input file. Since these options are passed using
> '-Wp,-M...,$(depfile)' it is important that $(depfile) does not contain
> any commas, so these are substituted with underscores. This substitution
> will break the build if the directory name of the output directory happens
> to include a comma, e.g. when using "aiaiai" for bisection testing:
> 
>   | cc1: fatal error: x86/tools/objtool/fixdep.o: No such file or directory
>   | compilation terminated.
>   | cat: /tmp/aiaiai-test-patchset.qroS/before/obj.defconfig_x86/tools/objtool/.fixdep.o.d: No such file or directory
>   | make[5]: *** [tools/objtool/fixdep.o] Error 1
> 
> We can address this by using -Xpreprocessor instead of -Wp, which allows
> us to pass down an unmodified pathname.
> 
> Cc: Jiri Olsa 
> Cc: Dave Martin 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Ingo Molnar 
> Signed-off-by: Will Deacon 
> ---
> 
> As an aside, the way we currently pass the depfile to -MD appears to be
> in direct contradiction with the preprocessor documentation, although it
> does work with the cc1 implementation.

Hmmm, I try cc1 --help, and it gives

  ...

  -I <dir>
  -M
  -MD
  -MF <file>
  -MG
  -MM

  ...

so it looks like even cc1 shouldn't really be parsing a depfile name
argument after -MD.

The only way to get -MD <file> parsed in the undocumented way seems to
be with gcc -Wp,-MD,... or direct invocation of cc1.  The cpp
frontend, and the gcc frontend itself seem to follow the documentation
and don't parse <file> as the depfile name here:

[...]

> diff --git a/tools/build/Build.include b/tools/build/Build.include

We should probably address this everywhere when we've figured out what
to do.

> index 418871d02ebf..e1914f8e2328 100644
> --- a/tools/build/Build.include
> +++ b/tools/build/Build.include
> @@ -22,9 +22,7 @@ dot-target = $(dir $@).$(notdir $@)
>  basetarget = $(basename $(notdir $@))
>  
>  ###
> -# The temporary file to save gcc -MD generated dependencies must not
> -# contain a comma
> -depfile = $(subst $(comma),_,$(dot-target).d)
> +depfile = $(dot-target).d
>  
>  ###
>  # Check if both arguments has same arguments. Result is empty string if 
> equal.
> @@ -89,12 +87,12 @@ if_changed = $(if $(strip $(any-prereq) $(arg-check)),
>\
>  # - per target C flags
>  # - per object C flags
>  # - BUILD_STR macro to allow '-D"$(variable)"' constructs
> -c_flags_1 = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CFLAGS) -D"BUILD_STR(s)=\#s" 
> $(CFLAGS_$(basetarget).o) $(CFLAGS_$(obj))
> +c_flags_1 = -Xpreprocessor -MD -Xpreprocessor $(depfile) -Xpreprocessor -MT 
> -Xpreprocessor $@ $(CFLAGS) -D"BUILD_STR(s)=\#s" $(CFLAGS_$(basetarget).o) 
> $(CFLAGS_$(obj))
>  c_flags_2 = $(filter-out $(CFLAGS_REMOVE_$(basetarget).o), $(c_flags_1))
>  c_flags   = $(filter-out $(CFLAGS_REMOVE_$(obj)), $(c_flags_2))
> -cxx_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CXXFLAGS) -D"BUILD_STR(s)=\#s" 
> $(CXXFLAGS_$(basetarget).o) $(CXXFLAGS_$(obj))
> +cxx_flags = -Xpreprocessor -MD -Xpreprocessor $(depfile) -Xpreprocessor -MT 
> -Xpreprocessor $@ $(CXXFLAGS) -D"BUILD_STR(s)=\#s" 
> $(CXXFLAGS_$(basetarget).o) $(CXXFLAGS_$(obj))
>  
>  ###
>  ## HOSTCC C flags
>  
> -host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CHOSTFLAGS) 
> -D"BUILD_STR(s)=\#s" $(CHOSTFLAGS_$(basetarget).o) $(CHOSTFLAGS_$(obj))
> +host_c_flags = -Xpreprocessor -MD -Xpreprocessor $(depfile) -Xpreprocessor 
> -MT -Xpreprocessor $@ $(CHOSTFLAGS) -D"BUILD_STR(s)=\#s" 
> $(CHOSTFLAGS_$(basetarget).o) $(CHOSTFLAGS_$(obj))

Any idea why we use -Wp here other than as a bug compatibility hack?
Both gcc and clang support the depfile options directly.  It's possible that
gcc didn't support them, or didn't support -MF, sometime in the distant
past.  This use in the kernel makefiles predates git.

I'm wondering whether we should actually switch to using -M -MF, or -MD
-MF (strictly without -Wp or -Xpreprocessor) rather than relying on
a combination of undocumented interactions between -Wp and cc1, and
cc1 violating its own documentation.

Cheers
---Dave
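
For reference, the comma problem that started this thread can be modelled in a few lines of C. This is only an illustration of the documented rule that a -Wp, option is split into separate preprocessor arguments at each comma; it is not how gcc implements it, the function name wp_arg_count() is made up, and the example paths are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count how many preprocessor arguments a "-Wp,..." option turns into
 * when it is split at every comma. Returns 0 if opt is not a -Wp, option.
 */
static size_t wp_arg_count(const char *opt)
{
    const char *p;
    size_t n;

    if (strncmp(opt, "-Wp,", 4) != 0)
        return 0;

    n = 1;
    for (p = opt + 4; *p != '\0'; p++) {
        if (*p == ',')
            n++;
    }

    return n;
}
```

Under this model, "-Wp,-MD,out/.fixdep.o.d" yields the intended two arguments, while a depfile path containing a comma, such as "-Wp,-MD,obj.defconfig,x86/.fixdep.o.d", yields three, with the path cut in half.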

Re: [PATCH 0/2] drm: Make it compilable without CONFIG_HDMI and CONFIG_I2C

2018-04-13 Thread Daniel Vetter

On Fri, Apr 13, 2018 at 4:46 PM, Thomas Huth  wrote:
> On 13.04.2018 16:32, Daniel Vetter wrote:
>> On Fri, Apr 13, 2018 at 11:40 AM, Thomas Huth  wrote:
>>> By enabling the DRM code for virtio-gpu on S390, you currently also get
>>> all the code that is enabled by CONFIG_HDMI and CONFIG_I2C automatically.
>>> This is quite ugly, since on S390, there is no HDMI and no I2C. Thus it
>>> would be great if the DRM code could also be compiled without CONFIG_HDMI
>>> and CONFIG_I2C. These two patches now refactor the DRM code a little bit
>>> so that we can compile it also without CONFIG_HDMI and CONFIG_I2C.
>>>
>>> Thomas Huth (2):
>>>   drivers/gpu/drm: Move CONFIG_HDMI-dependent code to a separate file
>>>   drivers/gpu/drm: Make the DRM code compilable without CONFIG_I2C
>>
>> What's the benefit? Why does I2C/HDMI hurt you?
>
> Why should I be forced to compile-in subsystems that do not make any
> sense on this architecture? It's just completely weird to see CONFIG_I2C
> enabled on s390x.

"Looks weird" is not really a good engineering criterion, especially in
graphics :-)

For context: In DRM almost nothing is optional, and it greatly
simplifies life and coding. We don't have epic amounts of #ifdef
battles to make trivial code changes compile, except in all the places
where external stuff is optional (like backlight).

So making something optional will have a pretty clear cost on the drm
subsystem, and it doesn't make sense to pay that cost to "look less
weird". To get this merged we need some clear benefits, which will
balance out the inevitable cost of having to maintain this forever
(and most likely getting yelled at by Linus for making some rando
compile config no longer work).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Re: [PATCH] netfilter: fix CONFIG_NF_REJECT_IPV6=m link error

2018-04-13 Thread Arnd Bergmann

On Fri, Apr 13, 2018 at 3:15 PM, Pablo Neira Ayuso  wrote:
> On Mon, Apr 09, 2018 at 04:43:40PM +0200, Arnd Bergmann wrote:
>> On Mon, Apr 9, 2018 at 4:37 PM, Pablo Neira Ayuso wrote:
>> > Hi Arnd,
>> >
>> > On Mon, Apr 09, 2018 at 12:53:12PM +0200, Arnd Bergmann wrote:
>> >> We get a new link error with CONFIG_NFT_REJECT_INET=y and CONFIG_NF_REJECT_IPV6=m
>> >
>> > I think we can update NFT_REJECT_INET so it depends on NFT_REJECT_IPV4
>> > and NFT_REJECT_IPV6. This doesn't allow here CONFIG_NFT_REJECT_INET=y
>> > and CONFIG_NF_REJECT_IPV6=m.
>> >
>> > I mean, just like we do with NFT_FIB_INET.
>>
>> That can only work if NFT_REJECT_INET can be made a 'tristate' symbol
>> again, so that code gets built as a loadable module if
>> CONFIG_NF_REJECT_IPV6=m.
>>
>> > BTW, I think this problem is not related to the recent patch,
>> > but something older that kbuild robot has triggered more easily for
>> > some reason?
>>
>> 02c7b25e5f54 is the one that turned NF_TABLES_INET into a 'bool'
>> symbol. NFT_REJECT depends on NF_TABLES_INET, so it used to be
>> restricted to a loadable module with IPV6=m, but can now be
>> built-in, which causes that link error.
>
> Still one more spin on this, I would like to see if we have a way to
> fix this by simplifying things a bit.
>
> Would the one I'm attaching work?

One disadvantage is that it makes the vmlinux bigger since
NF_REJECT_IPV{4,6} can no longer be a module at all now.

I suspect you also still get a link error with IPV6=m, this time because
the nf_reject_ipv6.o file fails to link against the ipv6 code, e.g.
ipv6_skip_exthdr() and icmpv6_send() appear to be unreachable here.
I haven't tried that though, so I might be missing something.

Arnd

Re: [RFC PATCH 16/35] ovl: readd lsattr/chattr support

2018-04-13 Thread Amir Goldstein

On Thu, Apr 12, 2018 at 6:08 PM, Miklos Szeredi  wrote:
> Implement FS_IOC_GETFLAGS and FS_IOC_SETFLAGS.
>
> Needs vfs_ioctl() exported to modules.
>
> Signed-off-by: Miklos Szeredi 
> ---
>  fs/internal.h   |  1 -
>  fs/ioctl.c  |  1 +
>  fs/overlayfs/file.c | 59 
> +
>  include/linux/fs.h  |  2 ++
>  4 files changed, 62 insertions(+), 1 deletion(-)
>
> diff --git a/fs/internal.h b/fs/internal.h
> index 3319bf39e339..d5108d9c6a2f 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -176,7 +176,6 @@ extern const struct dentry_operations 
> ns_dentry_operations;
>   */
>  extern int do_vfs_ioctl(struct file *file, unsigned int fd, unsigned int cmd,
> unsigned long arg);
> -extern long vfs_ioctl(struct file *file, unsigned int cmd, unsigned long 
> arg);
>
>  /*
>   * iomap support:
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 5ace7efb0d04..696f4c46a868 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -49,6 +49,7 @@ long vfs_ioctl(struct file *filp, unsigned int cmd, 
> unsigned long arg)
>   out:
> return error;
>  }
> +EXPORT_SYMBOL(vfs_ioctl);
>
>  static int ioctl_fibmap(struct file *filp, int __user *p)
>  {
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 05e3e2f80b89..cc004ff1b05b 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -8,6 +8,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include "overlayfs.h"
> @@ -291,6 +292,63 @@ long ovl_fallocate(struct file *file, int mode, loff_t 
> offset, loff_t len)
> return ret;
>  }
>
> +static long ovl_real_ioctl(struct file *file, unsigned int cmd,
> +  unsigned long arg)
> +{
> +   struct fd real;
> +   const struct cred *old_cred;
> +   long ret;
> +
> +   ret = ovl_real_file(file, &real);
> +   if (ret)
> +   return ret;
> +
> +   old_cred = ovl_override_creds(file_inode(file)->i_sb);
> +   ret = vfs_ioctl(real.file, cmd, arg);
> +   revert_creds(old_cred);
> +
> +   fdput(real);
> +
> +   return ret;
> +}
> +
> +long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> +{
> +   long ret;
> +   struct inode *inode = file_inode(file);
> +
> +   switch (cmd) {
> +   case FS_IOC_GETFLAGS:
> +   ret = ovl_real_ioctl(file, cmd, arg);
> +   break;
> +
> +   case FS_IOC_SETFLAGS:
> +   if (!inode_owner_or_capable(inode))
> +   return -EACCES;
> +
> +   ret = mnt_want_write_file(file);
> +   if (ret)
> +   return ret;
> +
> +   ret = ovl_copy_up(file_dentry(file));
> +   if (!ret) {
> +   ret = ovl_real_ioctl(file, cmd, arg);
> +
> +   inode_lock(inode);
> +   ovl_copyflags(ovl_inode_real(inode), inode);
> +   inode_unlock(inode);
> +   }
> +
> +   mnt_drop_write_file(file);
> +   break;
> +
> +   default:
> +   ret = -ENOTTY;

I am wondering out loud.
This is a change of behavior: fs-specific ioctls can no longer be executed
on an overlay file. Arguably a good change of behavior, but still one
that applications may have come to depend on.

Would it be better to make this change opt-in via a more generic
config/mount option, for example "consistent_fd" instead of
"copy_up_shared", so that we can choose whether or not to
pass unknown ioctls through to the real file?

I know we removed the want_write_file() protection from VFS, but
still, pass-through of ioctls was the legacy behavior. Thoughts?
I don't mind waiting to see if someone shouts.

Thanks,
Amir.

Re: [PATCH 2/2] iio: afe: unit-converter: add support for adi,lt6106

2018-04-13 Thread Andrew F. Davis

On 04/12/2018 05:31 PM, Peter Rosin wrote:
> On 2018-04-12 17:35, Andrew F. Davis wrote:
>> On 04/12/2018 09:29 AM, Peter Rosin wrote:
>>> On 2018-04-11 18:13, Andrew F. Davis wrote:
>>>> On 04/11/2018 10:51 AM, Lars-Peter Clausen wrote:
>>>>> On 04/11/2018 05:43 PM, Andrew F. Davis wrote:
>>>>>> On 04/11/2018 09:15 AM, Peter Rosin wrote:
>>>>>>> This is a current sense amplifier from Analog Devices.
>>>>>>>
>>>>>>> Signed-off-by: Peter Rosin 
>>>>>>> ---
>>>>>>>  drivers/iio/afe/Kconfig  |  3 +-
>>>>>>>  drivers/iio/afe/iio-unit-converter.c | 54 
>>>>>>>  2 files changed, 56 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/iio/afe/Kconfig b/drivers/iio/afe/Kconfig
>>>>>>> index 642ce4eb12a6..0e10fe8f459a 100644
>>>>>>> --- a/drivers/iio/afe/Kconfig
>>>>>>> +++ b/drivers/iio/afe/Kconfig
>>>>>>> @@ -10,7 +10,8 @@ config IIO_UNIT_CONVERTER
>>>>>>> depends on OF || COMPILE_TEST
>>>>>>> help
>>>>>>>   Say yes here to build support for the IIO unit converter
>>>>>>> - that handles voltage dividers and current sense shunts.
>>>>>>> + that handles voltage dividers, current sense shunts and
>>>>>>> + the LT6106 Current Sense Amplifier from Analog Devices.
>>>>>>
>>>>>> Could work better to split these out into separate drivers. Maybe a
>>>>>> iio-shunt-resistor.c that does just voltage->current with the
>>>>>> appropriate scaling. Then make a separate lt6106.c.
>>>>>
>>>>> I don't think we need a separate driver here. There are tons of circuits
>>>>> that all work the same way and all require the same properties. If we'd add
>>>>> a driver for each of them we'd get buried in boilerplate code.
>>>>>
>>>>
>>>> Fair enough, then it should at least be renamed to something generic
>>>> like current-sense-amplifier, as you said lots of circuits do this, not
>>>> just lt6106s. We will then have support for:
>>>>
>>>> current-sense-amplifier
>>>> current-sense-shunt
>>>> voltage-divider
>>>
>>> For the compatible "current-sense-amplifier", I would advocate the
>>> properties...
>>>
>>>  sense-resistor-micro-ohms
>>>  sense-gain
>>>
>>> (or something close to that)
>>>
>>> ...and not input-resistor-ohms and output-resistor-ohms which are way
>>> more particular to the LT6106.
>>>
>>> But as I said in the cover letter, I didn't go with sense-gain since I
>>> thought I would end up with requests for non-integer gains. There is
>>> yet to be a comment on the non-integer gain problem, and before there
>>> is a path forward for that case, I'm reluctant.
>>>
>>
>> Why not similar to what you had before with the resistor:
>>
>> sense-gain-multiplier
>> sense-gain-divider
>>
>> if either are missing assume they are 1.
> 
> Hmm, how about sense-gain for the normal integer case, and then divide
> by sense-attenuation if needed? I.e. exactly the same functionality as
> you describe, just different names.
> 

I like these names, but I think gain/attenuation sound very analog and I
would be tempted to assume they are floating point numbers or the units
are logarithmic (dB).

To prevent any more needless bike-shedding on my part I'd like to say
either yours, mine, or Lars-Peter's suggestion all work for me.
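
Whichever names win, the arithmetic is the same. A rough sketch of the multiplier/divider scheme with the "missing means 1" rule, assuming the two values have already been read from the device tree and that 0 is used to mean "property absent" (the struct and function names here are made up, not taken from the driver):

```c
#include <stdint.h>

/* Hypothetical holder for the two optional properties; 0 means absent. */
struct sense_gain {
    uint32_t multiplier;
    uint32_t divider;
};

/*
 * Scale a raw reading by multiplier/divider using integer math only.
 * A missing (zero) multiplier or divider is treated as 1.
 */
static int64_t sense_gain_apply(const struct sense_gain *g, int64_t raw)
{
    uint32_t mul = g->multiplier ? g->multiplier : 1;
    uint32_t div = g->divider ? g->divider : 1;

    return raw * (int64_t)mul / (int64_t)div;
}
```

A non-integer gain such as 1.5 would then be described as multiplier 3 and divider 2, without any floating point in the binding.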

>>>> compatibles in this driver called "unit-converter" which is still a
>>>> misnomer IMHO.
>>>
>>> I don't remember you having presented your preference, and I think
>>> that goes against the established bike-shedding protocol?
>>>
>>
>> True, how about "current-sense-from-voltage" ?
> 
> Doesn't cover "voltage-divider" (and we don't need separate drivers
> doing the exact same calculations, that's a maintenance nightmare).
> 


The driver name doesn't have to cover every use, just more than the
other name.


> Cheers,
> Peter
>

Re: [PATCH 0/2] drm: Make it compilable without CONFIG_HDMI and CONFIG_I2C

2018-04-13 Thread Thomas Huth

On 13.04.2018 16:32, Daniel Vetter wrote:
> On Fri, Apr 13, 2018 at 11:40 AM, Thomas Huth  wrote:
>> By enabling the DRM code for virtio-gpu on S390, you currently also get
>> all the code that is enabled by CONFIG_HDMI and CONFIG_I2C automatically.
>> This is quite ugly, since on S390, there is no HDMI and no I2C. Thus it
>> would be great if the DRM code could also be compiled without CONFIG_HDMI
>> and CONFIG_I2C. These two patches now refactor the DRM code a little bit
>> so that we can compile it also without CONFIG_HDMI and CONFIG_I2C.
>>
>> Thomas Huth (2):
>>   drivers/gpu/drm: Move CONFIG_HDMI-dependent code to a separate file
>>   drivers/gpu/drm: Make the DRM code compilable without CONFIG_I2C
> 
> What's the benefit? Why does I2C/HDMI hurt you?

Why should I be forced to compile-in subsystems that do not make any
sense on this architecture? It's just completely weird to see CONFIG_I2C
enabled on s390x.

 Thomas

Re: [PATCH 2/6] tracing: Add trace event error log

2018-04-13 Thread Steven Rostedt

On Fri, 13 Apr 2018 09:24:34 -0500
Tom Zanussi  wrote:

> Yeah, I agree - I'd rather get it right than get it in now.  I thought
> this made sense, and was based on input from Masami, which I may have
> misinterpreted, but I'll wait for some more ideas about the best way to
> do this.

Too bad we are not closer to November, as this would actually be a good
Plumbers topic. Maybe it's not that important and we should wait until
then. I'd like to get some brain storming ideas out before we decide on
anything, and this is something I believe is better done face to face
than over email.

-- Steve

Re: [PATCH] ath10k: search all IEs for variant before falling back

2018-04-13 Thread Kalle Valo

Kalle Valo  writes:

> Thomas Hebb  writes:
>
>> commit f2593cb1b291 ("ath10k: Search SMBIOS for OEM board file
>> extension") added a feature to ath10k that allows Board Data File
>> (BDF) conflicts between multiple devices that use the same device IDs
>> but have different calibration requirements to be resolved by allowing
>> a "variant" string to be stored in SMBIOS [and later device tree, added
>> by commit d06f26c5c8a4 ("ath10k: search DT for qcom,ath10k-calibration-
>> variant")] that gets appended to the ID stored in board-2.bin.
>>
>> This original patch had a regression, however. Namely that devices with
>> a variant present in SMBIOS that didn't need custom BDFs could no longer
>> find the default BDF, which has no variant appended. The patch was
>> reverted and re-applied with a fix for this issue in commit 1657b8f84ed9
>> ("search SMBIOS for OEM board file extension").
>>
>> But the fix to fall back to a default BDF introduced another issue: the
>> driver currently parses IEs in board-2.bin one by one, and for each one
>> it first checks to see if it matches the ID with the variant appended.
>> If it doesn't, it checks to see if it matches the "fallback" ID with no
>> variant. If a matching BDF is found at any point during this search, the
>> search is terminated and that BDF is used. The issue is that it's very
>> possible (and is currently the case for board-2.bin files present in the
>> ath10k-firmware repository) for the default BDF to occur in an earlier
>> IE than the variant-specific BDF. In this case, the current code will
>> happily choose the default BDF even though a better-matching BDF is
>> present later in the file.
>>
>> This patch fixes the issue by first searching the entire file for the ID
>> with variant, and searching for the fallback ID only if that search
>> fails. It also includes some code cleanup in the area, as
>> ath10k_core_fetch_board_data_api_n() no longer does its own string
>> mangling to remove the variant from an ID, instead leaving that job to a
>> new flag passed to ath10k_core_create_board_name().
>>
>> I've tested this patch on a QCA4019 and verified that the driver behaves
>> correctly for 1) both fallback and variant BDFs present, 2) only fallback
>> BDF present, and 3) no matching BDFs present.
>>
>> Fixes: 1657b8f84ed9 ("ath10k: search SMBIOS for OEM board file extension")
>> Signed-off-by: Thomas Hebb 
>
> BTW, you forgot to CC linux-wireless so I don't see this in patchwork.
>
> https://wireless.wiki.kernel.org/en/users/drivers/ath10k/submittingpatches

I submitted v2 so that I see it in patchwork:

https://patchwork.kernel.org/patch/10340241/

-- 
Kalle Valo
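
The search order described in the commit message can be sketched outside the driver as follows. This is a toy model only: the toy_* types and functions are hypothetical, board entries are reduced to a name plus an integer tag, and the ID strings used with it are invented rather than real board-2.bin names.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for one board file entry: an ID string and its data. */
struct toy_board_ie {
    const char *name;
    int data_id;
};

/* Scan the whole table for an exact name match. */
static const struct toy_board_ie *
toy_find(const struct toy_board_ie *ies, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(ies[i].name, name) == 0)
            return &ies[i];
    }

    return NULL;
}

/*
 * Two-pass lookup: search every entry for the ID with the variant
 * appended, and only if that fails search again for the fallback ID.
 * The position of the default entry in the table no longer matters.
 */
static const struct toy_board_ie *
toy_pick_board(const struct toy_board_ie *ies, size_t n,
               const char *with_variant, const char *fallback)
{
    const struct toy_board_ie *ie = toy_find(ies, n, with_variant);

    return ie ? ie : toy_find(ies, n, fallback);
}
```

With a table that lists the default entry before the variant-specific one, this ordering still returns the variant entry, which is the case the single-pass search got wrong.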

Re: [PATCH 3/3] dcache: account external names as indirectly reclaimable memory

2018-04-13 Thread Johannes Weiner

On Fri, Apr 13, 2018 at 04:28:21PM +0200, Michal Hocko wrote:
> On Fri 13-04-18 16:20:00, Vlastimil Babka wrote:
> > We would need kmalloc-reclaimable-X variants. It could be worth it,
> > especially if we find more similar usages. I suspect they would be more
> > useful than the existing dma-kmalloc-X :)
> 
> I am still not sure why __GFP_RECLAIMABLE cannot be made work as
> expected and account slab pages as SLAB_RECLAIMABLE

Can you outline how this would work without separate caches?

Re: Some minor fixes for perf user tools

2018-04-13 Thread Arnaldo Carvalho de Melo

Em Fri, Apr 13, 2018 at 03:13:09PM +0200, Jiri Olsa escreveu:
> On Fri, Apr 06, 2018 at 01:38:08PM -0700, Andi Kleen wrote:
> > This patchkit fixes some random minor issues in the perf user tools
> 
> Acked-by: Jiri Olsa 

Thanks, applied.

- Arnaldo

Re: [PATCH 0/2] drm: Make it compilable without CONFIG_HDMI and CONFIG_I2C

2018-04-13 Thread Daniel Vetter

On Fri, Apr 13, 2018 at 11:40 AM, Thomas Huth  wrote:
> By enabling the DRM code for virtio-gpu on S390, you currently also get
> all the code that is enabled by CONFIG_HDMI and CONFIG_I2C automatically.
> This is quite ugly, since on S390, there is no HDMI and no I2C. Thus it
> would be great if the DRM code could also be compiled without CONFIG_HDMI
> and CONFIG_I2C. These two patches now refactor the DRM code a little bit
> so that we can compile it also without CONFIG_HDMI and CONFIG_I2C.
>
> Thomas Huth (2):
>   drivers/gpu/drm: Move CONFIG_HDMI-dependent code to a separate file
>   drivers/gpu/drm: Make the DRM code compilable without CONFIG_I2C

What's the benefit? Why does I2C/HDMI hurt you?

Note that you still can't compile out DP code, and the DRM legacy
code, and that's much bigger ...
-Daniel

>
>  drivers/gpu/drm/Kconfig |   6 +-
>  drivers/gpu/drm/Makefile|  17 ++--
>  drivers/gpu/drm/drm_crtc_internal.h |   2 +
>  drivers/gpu/drm/drm_edid.c  | 173 ++
>  drivers/gpu/drm/drm_hdmi.c  | 182 
> 
>  5 files changed, 206 insertions(+), 174 deletions(-)
>  create mode 100644 drivers/gpu/drm/drm_hdmi.c
>
> --
> 1.8.3.1
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Re: [PATCH 3/3] dcache: account external names as indirectly reclaimable memory

2018-04-13 Thread Michal Hocko

On Fri 13-04-18 16:20:00, Vlastimil Babka wrote:
> On 04/13/2018 03:59 PM, Michal Hocko wrote:
> > On Fri 13-04-18 22:35:19, Minchan Kim wrote:
> >> On Mon, Mar 05, 2018 at 01:37:43PM +, Roman Gushchin wrote:
> > [...]
> >>> @@ -1614,9 +1623,11 @@ struct dentry *__d_alloc(struct super_block *sb, 
> >>> const struct qstr *name)
> >>>   name = &slash_name;
> >>>   dname = dentry->d_iname;
> >>>   } else if (name->len > DNAME_INLINE_LEN-1) {
> >>> - size_t size = offsetof(struct external_name, name[1]);
> >>> - struct external_name *p = kmalloc(size + name->len,
> >>> -   GFP_KERNEL_ACCOUNT);
> >>> + struct external_name *p;
> >>> +
> >>> + reclaimable = offsetof(struct external_name, name[1]) +
> >>> + name->len;
> >>> + p = kmalloc(reclaimable, GFP_KERNEL_ACCOUNT);
> >>
> >> Can't we use kmem_cache_alloc with own cache created with 
> >> SLAB_RECLAIM_ACCOUNT
> >> if they are reclaimable? 
> > 
> > No, because names have different sizes and so we would basically have to
> > duplicate many caches.
> 
> We would need kmalloc-reclaimable-X variants. It could be worth it,
> especially if we find more similar usages. I suspect they would be more
> useful than the existing dma-kmalloc-X :)

I am still not sure why __GFP_RECLAIMABLE cannot be made work as
expected and account slab pages as SLAB_RECLAIMABLE
-- 
Michal Hocko
SUSE Labs

Re: [PATCH 2/6] tracing: Add trace event error log

2018-04-13 Thread Tom Zanussi

Hi Steve,

On Fri, 2018-04-13 at 09:45 -0400, Steven Rostedt wrote:
> On Thu, 12 Apr 2018 18:52:13 -0500
> Tom Zanussi  wrote:
> 
> > Hi Steve,
> > 
> > On Thu, 2018-04-12 at 18:20 -0400, Steven Rostedt wrote:
> > > On Thu, 12 Apr 2018 10:13:17 -0500
> > > Tom Zanussi  wrote:
> > >   
> > > > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > > > index 6fb46a0..f2dc7e6 100644
> > > > --- a/kernel/trace/trace.h
> > > > +++ b/kernel/trace/trace.h
> > > > @@ -1765,6 +1765,9 @@ extern ssize_t trace_parse_run_command(struct 
> > > > file *file,
> > > > const char __user *buffer, size_t count, loff_t *ppos,
> > > > int (*createfn)(int, char**));
> > > >  
> > > > +extern void event_log_err(const char *loc, const char *cmd, const char 
> > > > *fmt,
> > > > + ...);
> > > > +
> > > >  /*
> > > >   * Normal trace_printk() and friends allocates special buffers
> > > >   * to do the manipulation, as well as saves the print formats
> > > > diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> > > > index 05c7172..fd02e22 100644
> > > > --- a/kernel/trace/trace_events.c
> > > > +++ b/kernel/trace/trace_events.c
> > > > @@ -1668,6 +1668,164 @@ static void ignore_task_cpu(void *data)
> > > > return ret;
> > > >  }
> > > >  
> > > > +#define EVENT_LOG_ERRS_MAX (PAGE_SIZE / sizeof(struct 
> > > > event_log_err))  
> > >   
> > > > +#define EVENT_ERR_LOG_MASK (EVENT_LOG_ERRS_MAX - 1)  
> > > 
> > > BTW, the above only works if EVENT_LOG_ERRS_MAX is a power of two,
> > > which it's not guaranteed to be.
> > >   
> > 
> > My assumption was that we'd only ever need a page or two for the
> > error_log and so it would always be a power of two, since the size of
> > the struct event_log_err is 512.
> 
> Assumptions are not what we want to rely on. There should be something
> like:
> 
>   BUILD_BUG_ON(EVENT_LOG_ERRS_MAX & EVENT_ERR_LOG_MASK);
> 
> Which would guarantee that your assumption is correct; otherwise the
> kernel won't build.
> 

OK.
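
The power-of-two requirement discussed above can be sketched in user space. This is only an illustration, not the patch's code: the DEMO_* names and the 4096/512 sizes are assumptions, and C11 static_assert stands in for the kernel's BUILD_BUG_ON().

```c
#include <assert.h>

/* Hypothetical user-space stand-ins; both sizes are assumptions. */
#define DEMO_PAGE_SIZE   4096u
#define DEMO_ENTRY_SIZE  512u
#define DEMO_ERRS_MAX    (DEMO_PAGE_SIZE / DEMO_ENTRY_SIZE)   /* 8 slots */
#define DEMO_ERRS_MASK   (DEMO_ERRS_MAX - 1)

/* n & (n - 1) is zero only when n is a power of two (or zero). */
static_assert(DEMO_ERRS_MAX != 0 && (DEMO_ERRS_MAX & DEMO_ERRS_MASK) == 0,
              "DEMO_ERRS_MAX must be a power of two");

/* Advance a slot index, wrapping with the mask instead of a modulo. */
static unsigned int demo_next_slot(unsigned int idx)
{
    return (idx + 1) & DEMO_ERRS_MASK;
}

/* Run-time form of the same check, e.g. to see why 6 slots would break. */
static int demo_is_pow2(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

If the entry size ever changed so that the slot count became, say, 6, the mask would be 5 and slots 2 and 3 could never be selected by the wrap; a compile-time check turns that silent bug into a build failure.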

> 
> > 
> > Anyway, I should probably have put comments about all this in the code,
> > and I will, but the way it works kind of assumes a very small number of
> > errors - it's replacing a simple 'last error' facility for the hist
> > triggers and making it a common facility for other things that have
> > similar needs like Masami's kprobe_events errors.  For those purposes, I
> > assumed it would suffice to simply be able to show the last 8 or some
> > similar small number of errors and constantly recycle the slots.
> 
> The errors are still in the files that have the errors right? Perhaps
> just have a file that lists the files that contain errors. That way if
> something goes wrong, you can examine that file and then look at the
> file that contains the error?
> 

No, that's part of the motivation for this change - currently there is
just one 'last error', the output tacked onto whichever event's
hist file you read (normally this would be the one you just got the
error for, but doesn't have to be) - there isn't a last error per event.
Masami of course found this unintuitive, which it is, I agree, and
wanted a single file (error_log) to look into for the last error.  In
addition, it should have a logging interface that any trace event
command could use, such as kprobe_events.

> And I'm not sure it being in the events directory is the best place
> either, especially, if you plan to have it handle kprobe_events because
> that's not in the events directory.
> 

Yeah, I put it there because it's associated with trace events - putting
it in tracing/ would imply that it's meant for ftrace in general (which
maybe it should be but this isn't).  Actually I'm not sure kprobe_events
shouldn't be in tracing/events too..

> > 
> > Basically it just splits the page into 16 strings, 2 per error, one for
> > the actual error text, the other for the command the user entered.  The
> > struct event_log_err just overlays a struct on top of 2 strings just to
> > make it easier to manage.
> > 
> > Anyway, because it is such a small number, and we start with a zeroed
> > page, whenever we print the error log, we print all 16 strings even if
> > we only have one error (2 strings).  The rest are NULL and print
> > nothing.  We start with the tail, which could also be thought of as the
> > 'oldest' or the 'first' error in the buffer and just cycle through them
> > all.  Hope that clears up some of the other questions you had about how
> > a non-full log gets printed, etc...
> 
> OK, I was thinking a NULL entry would return NULL, but we are
> returning a pointer to NULL. That's where I missed it.
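
A rough user-space model of the log described above, assuming 8 recycled slots of two strings each in a zeroed buffer. All demo_* names and the 64-byte string length are hypothetical; the patch itself uses MAX_FILTER_STR_VAL and its own helpers. The sketch skips empty slots explicitly, which gives the same visible result as printing empty strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_STR_LEN   64u   /* assumption; the patch uses MAX_FILTER_STR_VAL */
#define DEMO_ERRS_MAX  8u    /* must stay a power of two */
#define DEMO_ERRS_MASK (DEMO_ERRS_MAX - 1)

struct demo_log_err {
    char err[DEMO_STR_LEN];  /* error text */
    char cmd[DEMO_STR_LEN];  /* command the user entered */
};

struct demo_log {
    struct demo_log_err slots[DEMO_ERRS_MAX];
    unsigned int tail;       /* oldest slot, the next one to be overwritten */
};

/* Start from an all-zero buffer so that unused slots hold empty strings. */
static void demo_log_init(struct demo_log *log)
{
    memset(log, 0, sizeof(*log));
}

/* Record one error, recycling the oldest slot once the log is full. */
static void demo_log_add(struct demo_log *log, const char *err, const char *cmd)
{
    struct demo_log_err *e = &log->slots[log->tail];

    snprintf(e->err, sizeof(e->err), "%s", err);
    snprintf(e->cmd, sizeof(e->cmd), "%s", cmd);
    log->tail = (log->tail + 1) & DEMO_ERRS_MASK;
}

/*
 * Walk every slot starting at the tail (oldest first) and append the
 * non-empty ones to out. Returns the number of entries written.
 */
static unsigned int demo_log_dump(const struct demo_log *log,
                                  char *out, size_t outlen)
{
    unsigned int i, n = 0;
    size_t used = 0;

    assert(outlen > 0);
    out[0] = '\0';
    for (i = 0; i < DEMO_ERRS_MAX; i++) {
        const struct demo_log_err *e =
            &log->slots[(log->tail + i) & DEMO_ERRS_MASK];
        int w;

        if (e->err[0] == '\0')
            continue;        /* zeroed slot: nothing to show */
        w = snprintf(out + used, outlen - used,
                     "%s\n  Command: %s\n", e->err, e->cmd);
        if (w < 0 || (size_t)w >= outlen - used)
            break;           /* output buffer is full */
        used += (size_t)w;
        n++;
    }
    return n;
}
```

Because the walk always covers all slots, a log holding a single error and a log that has wrapped many times go through the same code path.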
>  
> > 
> > > > +
> > > > +struct event_log_err {
> > > > +   char    err[MAX_FILTER_STR_VAL];
> > > > +   char    cmd[MAX_FILTER_STR_VAL];
> > > > +};  
> > > 
> > > I like the event_log_err idea, but the above can be shrunk to:
> > > 
> > > struct err_info {
> > >   u

Re: [PATCH] KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap support

2018-04-13 Thread Vitaly Kuznetsov

Paolo Bonzini  writes:

> On 12/04/2018 17:25, Vitaly Kuznetsov wrote:
>> @@ -5335,6 +5353,9 @@ static void __always_inline 
>> vmx_disable_intercept_for_msr(unsigned long *msr_bit
>>  if (!cpu_has_vmx_msr_bitmap())
>>  return;
>>  
>> +if (static_branch_unlikely(&enable_emsr_bitmap))
>> +evmcs_touch_msr_bitmap();
>> +
>>  /*
>>   * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
>>   * have the write-low and read-high bitmap offsets the wrong way round.
>> @@ -5370,6 +5391,9 @@ static void __always_inline 
>> vmx_enable_intercept_for_msr(unsigned long *msr_bitm
>>  if (!cpu_has_vmx_msr_bitmap())
>>  return;
>>  
>> +if (static_branch_unlikely(&enable_emsr_bitmap))
>> +evmcs_touch_msr_bitmap();
>
> I'm not sure about the "unlikely".  Can you just check current_evmcs
> instead (dropping the static key completely)?

current_evmcs is just a cast:

 (struct hv_enlightened_vmcs *)this_cpu_read(current_vmcs)

so it is always not NULL here :-) We need to check enable_evmcs static
key first. Getting rid of the newly added enable_emsr_bitmap is, of
course, possible.

(Actually, we only call vmx_{dis,en}able_intercept_for_msr in the very
beginning of vCPUs life so this is not a hotpath and likeliness doesn't
really matter).

Will do v2 without the static key, thanks!

>
> The function, also, is small enough that inlining should be beneficial.
>
> Paolo

-- 
  Vitaly

Re: [PATCH] kbuild: rpm-pkg: use kernel-install as a fallback for new-kernel-pkg

2018-04-13 Thread Masahiro Yamada

2018-04-12 3:15 GMT+09:00 Javier Martinez Canillas :
> The new-kernel-pkg script is only present when grubby is installed, but it
> may not always be the case. So if the script isn't present, attempt to use
> the kernel-install script as a fallback instead.
>
> Signed-off-by: Javier Martinez Canillas 
>
> ---
>

Applied to linux-kbuild.  Thanks!


>  scripts/package/mkspec | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/scripts/package/mkspec b/scripts/package/mkspec
> index 61427c6f2209..e05646dc24dc 100755
> --- a/scripts/package/mkspec
> +++ b/scripts/package/mkspec
> @@ -118,6 +118,8 @@ $S$Mln -sf /usr/src/kernels/$KERNELRELEASE source
> %preun
> if [ -x /sbin/new-kernel-pkg ]; then
> new-kernel-pkg --remove $KERNELRELEASE --rminitrd 
> --initrdfile=/boot/initramfs-$KERNELRELEASE.img
> +   elif [ -x /usr/bin/kernel-install ]; then
> +   kernel-install remove $KERNELRELEASE
> fi
>
> %postun
> --
> 2.14.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards
Masahiro Yamada

Re: [RFC PATCH 04/35] ovl: copy up times

2018-04-13 Thread Vivek Goyal

On Thu, Apr 12, 2018 at 05:07:55PM +0200, Miklos Szeredi wrote:
> Copy up mtime and ctime to overlay inode after times in real object are
> modified.  Be careful not to dirty cachelines when not necessary.
> 
> This is in preparation for moving overlay functionality out of the VFS.
> 
> This patch shouldn't have any observable effect.

So there are a bunch of operations which will change the inode ctime. I had
missed this in my metadata-only copy-up patch series, and that would have broken
atime updates in some cases.

Vivek
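
The "do not dirty cachelines when not necessary" point from the quoted commit message can be sketched as a compare-before-store. This is a user-space illustration only; struct demo_time, struct demo_inode and demo_copytimes() are hypothetical stand-ins for the kernel's timespec, inode and ovl_copytimes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's timespec and inode. */
struct demo_time {
    long long sec;
    long nsec;
};

struct demo_inode {
    struct demo_time i_mtime;
    struct demo_time i_ctime;
};

static bool demo_time_equal(const struct demo_time *a,
                            const struct demo_time *b)
{
    return a->sec == b->sec && a->nsec == b->nsec;
}

/*
 * Copy times from the upper inode only when they differ.
 * Returns true if a store happened, false if memory was left untouched.
 */
static bool demo_copytimes(struct demo_inode *overlay,
                           const struct demo_inode *upper)
{
    assert(overlay != NULL && upper != NULL);

    if (demo_time_equal(&overlay->i_mtime, &upper->i_mtime) &&
        demo_time_equal(&overlay->i_ctime, &upper->i_ctime))
        return false;   /* read-only path: the cache line stays clean */

    overlay->i_mtime = upper->i_mtime;
    overlay->i_ctime = upper->i_ctime;
    return true;
}
```

The read-only comparison is cheap, while an unconditional store would mark the cache line modified even when the values are already identical.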

> 
> Signed-off-by: Miklos Szeredi 
> ---
>  fs/overlayfs/dir.c   |  5 +
>  fs/overlayfs/inode.c |  1 +
>  fs/overlayfs/overlayfs.h |  7 +++
>  fs/overlayfs/util.c  | 19 +++
>  4 files changed, 32 insertions(+)
> 
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 839709c7803a..cd0fa2363723 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -507,6 +507,7 @@ static int ovl_create_or_link(struct dentry *dentry, 
> struct inode *inode,
>   else
>   err = ovl_create_over_whiteout(dentry, inode, attr,
>   hardlink);
> + ovl_copytimes_with_parent(dentry);
>   }
>  out_revert_creds:
>   revert_creds(old_cred);
> @@ -768,6 +769,7 @@ static int ovl_do_remove(struct dentry *dentry, bool 
> is_dir)
>   drop_nlink(dentry->d_inode);
>   }
>   ovl_nlink_end(dentry, locked);
> + ovl_copytimes_with_parent(dentry);
>  out_drop_write:
>   ovl_drop_write(dentry);
>  out:
> @@ -1079,6 +1081,9 @@ static int ovl_rename(struct inode *olddir, struct 
> dentry *old,
>   ovl_dentry_version_inc(new->d_parent, ovl_type_origin(old) ||
>  (d_inode(new) && ovl_type_origin(new)));
>  
> + ovl_copytimes_with_parent(old);
> + ovl_copytimes_with_parent(new);
> +
>  out_dput:
>   dput(newdentry);
>  out_dput_old:
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 6e3815fb006b..33635106c5f7 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -303,6 +303,7 @@ int ovl_xattr_set(struct dentry *dentry, struct inode 
> *inode, const char *name,
>   err = vfs_removexattr(realdentry, name);
>   }
>   revert_creds(old_cred);
> + ovl_copytimes(d_inode(dentry));
>  
>  out_drop_write:
>   ovl_drop_write(dentry);
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index e0b7de799f6b..eef720ef0f07 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -258,6 +258,13 @@ bool ovl_need_index(struct dentry *dentry);
>  int ovl_nlink_start(struct dentry *dentry, bool *locked);
>  void ovl_nlink_end(struct dentry *dentry, bool locked);
>  int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir);
> +void ovl_copytimes(struct inode *inode);
> +
> +static inline void ovl_copytimes_with_parent(struct dentry *dentry)
> +{
> + ovl_copytimes(d_inode(dentry));
> + ovl_copytimes(d_inode(dentry->d_parent));
> +}
>  
>  static inline bool ovl_is_impuredir(struct dentry *dentry)
>  {
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 6f1078028c66..11e62e70733a 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -675,3 +675,22 @@ int ovl_lock_rename_workdir(struct dentry *workdir, 
> struct dentry *upperdir)
>   pr_err("overlayfs: failed to lock workdir+upperdir\n");
>   return -EIO;
>  }
> +
> +void ovl_copytimes(struct inode *inode)
> +{
> + struct inode *upperinode;
> +
> + if (!inode)
> + return;
> +
> + upperinode = ovl_inode_upper(inode);
> +
> + if (!upperinode)
> + return;
> +
> + if ((!timespec_equal(&inode->i_mtime, &upperinode->i_mtime) ||
> +  !timespec_equal(&inode->i_ctime, &upperinode->i_ctime))) {
> + inode->i_mtime = upperinode->i_mtime;
> + inode->i_ctime = upperinode->i_ctime;
> + }
> +}
> -- 
> 2.14.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] dcache: account external names as indirectly reclaimable memory

2018-04-13 Thread Vlastimil Babka

On 04/13/2018 03:59 PM, Michal Hocko wrote:
> On Fri 13-04-18 22:35:19, Minchan Kim wrote:
>> On Mon, Mar 05, 2018 at 01:37:43PM +, Roman Gushchin wrote:
> [...]
>>> @@ -1614,9 +1623,11 @@ struct dentry *__d_alloc(struct super_block *sb, 
>>> const struct qstr *name)
>>> name = &slash_name;
>>> dname = dentry->d_iname;
>>> } else if (name->len > DNAME_INLINE_LEN-1) {
>>> -   size_t size = offsetof(struct external_name, name[1]);
>>> -   struct external_name *p = kmalloc(size + name->len,
>>> - GFP_KERNEL_ACCOUNT);
>>> +   struct external_name *p;
>>> +
>>> +   reclaimable = offsetof(struct external_name, name[1]) +
>>> +   name->len;
>>> +   p = kmalloc(reclaimable, GFP_KERNEL_ACCOUNT);
>>
>> Can't we use kmem_cache_alloc with own cache created with 
>> SLAB_RECLAIM_ACCOUNT
>> if they are reclaimable? 
> 
> No, because names have different sizes and so we would basically have to
> duplicate many caches.

We would need kmalloc-reclaimable-X variants. It could be worth it,
especially if we find more similar usages. I suspect they would be more
useful than the existing dma-kmalloc-X :)

Maybe create both (dma and reclaimable) on demand?

[PATCH] sparc: fix compat siginfo ABI regression

2018-04-13 Thread Dmitry V. Levin

Starting with commit v4.14-rc1~60^2^2~1, a SIGFPE signal sent via kill
results in wrong values in si_pid and si_uid fields of compat siginfo_t.

This happens due to FPE_FIXME being defined to 0 for sparc, and at the
same time siginfo_layout() introduced by the same commit returns
SIL_FAULT for SIGFPE if si_code == SI_USER and FPE_FIXME is defined to 0.

Fix this regression by removing FPE_FIXME macro and changing all its users
to assign FPE_FLTUNK to si_code instead of FPE_FIXME.
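
The collision described above can be modelled in a few lines. This is a deliberately simplified sketch and not the kernel's siginfo_layout(): every demo_* and DEMO_* name is hypothetical, and the constant values are assumptions chosen only to show that a code equal to 0 cannot be told apart from a user-sent signal.

```c
#include <assert.h>

enum demo_layout { DEMO_SIL_KILL, DEMO_SIL_FAULT };

/* Assumed values for illustration only. */
#define DEMO_SIGFPE      8
#define DEMO_SI_USER     0
#define DEMO_FPE_FIXME   0    /* same value as DEMO_SI_USER: the collision */
#define DEMO_FPE_FLTUNK  14   /* a distinct, non-zero SIGFPE code */

/* Model of the regressed behaviour: SIGFPE with code 0 is treated as a fault. */
static enum demo_layout demo_layout_old(int sig, int si_code)
{
    assert(sig > 0);
    if (sig == DEMO_SIGFPE && si_code == DEMO_FPE_FIXME)
        return DEMO_SIL_FAULT;
    return si_code > DEMO_SI_USER ? DEMO_SIL_FAULT : DEMO_SIL_KILL;
}

/* Model of the fixed behaviour: code 0 always means "sent by a user". */
static enum demo_layout demo_layout_new(int sig, int si_code)
{
    assert(sig > 0);
    return si_code > DEMO_SI_USER ? DEMO_SIL_FAULT : DEMO_SIL_KILL;
}
```

In the old model, a SIGFPE sent via kill (code 0) is laid out as a fault, so the sender's pid and uid are not the fields that get copied; with a distinct non-zero code for the trap path, the two cases no longer overlap.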

Note that FPE_FLTUNK is a new macro introduced by commit
266da65e9156d93e1126e185259a4aae68188d0e.

Tested with commit v4.16-11958-g16e205cf42da.

This bug was found by the strace test suite.

Link: https://github.com/strace/strace/issues/21
Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic")
Thanks-to: Anatoly Pugachev 
Signed-off-by: Dmitry V. Levin 
---
 arch/sparc/include/uapi/asm/siginfo.h | 7 ---
 arch/sparc/kernel/traps_32.c  | 2 +-
 arch/sparc/kernel/traps_64.c  | 2 +-
 3 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/sparc/include/uapi/asm/siginfo.h 
b/arch/sparc/include/uapi/asm/siginfo.h
index 896ce44..e704955 100644
--- a/arch/sparc/include/uapi/asm/siginfo.h
+++ b/arch/sparc/include/uapi/asm/siginfo.h
@@ -18,13 +18,6 @@
 #define SI_NOINFO  32767   /* no information in siginfo_t */
 
 /*
- * SIGFPE si_codes
- */
-#ifdef __KERNEL__
-#define FPE_FIXME  0   /* Broken dup of SI_USER */
-#endif /* __KERNEL__ */
-
-/*
  * SIGEMT si_codes
  */
 #define EMT_TAGOVF 1   /* tag overflow */
diff --git a/arch/sparc/kernel/traps_32.c b/arch/sparc/kernel/traps_32.c
index b1ed763..33cd35b 100644
--- a/arch/sparc/kernel/traps_32.c
+++ b/arch/sparc/kernel/traps_32.c
@@ -307,7 +307,7 @@ void do_fpe_trap(struct pt_regs *regs, unsigned long pc, 
unsigned long npc,
info.si_errno = 0;
info.si_addr = (void __user *)pc;
info.si_trapno = 0;
-   info.si_code = FPE_FIXME;
+   info.si_code = FPE_FLTUNK;
if ((fsr & 0x1c000) == (1 << 14)) {
if (fsr & 0x10)
info.si_code = FPE_FLTINV;
diff --git a/arch/sparc/kernel/traps_64.c b/arch/sparc/kernel/traps_64.c
index 462a21a..e81072a 100644
--- a/arch/sparc/kernel/traps_64.c
+++ b/arch/sparc/kernel/traps_64.c
@@ -2372,7 +2372,7 @@ static void do_fpe_common(struct pt_regs *regs)
info.si_errno = 0;
info.si_addr = (void __user *)regs->tpc;
info.si_trapno = 0;
-   info.si_code = FPE_FIXME;
+   info.si_code = FPE_FLTUNK;
if ((fsr & 0x1c000) == (1 << 14)) {
if (fsr & 0x10)
info.si_code = FPE_FLTINV;
-- 
ldv

Re: [PATCH 1/2] tracing/events: block: track and print if unplug was explicit or schedule

2018-04-13 Thread Steven Rostedt

On Fri, 13 Apr 2018 15:07:17 +0200
Steffen Maier  wrote:

> Just like blktrace distinguishes explicit and schedule by means of
> BLK_TA_UNPLUG_IO and BLK_TA_UNPLUG_TIMER, actually make use of the
> existing argument "explicit" to distinguish the two cases in the one
> common tracepoint block_unplug.
> 
> Complements v2.6.39 commit 49cac01e1fa7 ("block: make unplug timer trace
> event correspond to the schedule() unplug") and commit d9c978331790
> ("block: remove block_unplug_timer() trace point").
> 
> Signed-off-by: Steffen Maier 
> ---
>  include/trace/events/block.h | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/block.h b/include/trace/events/block.h
> index 81b43f5bdf23..a13613d27cee 100644
> --- a/include/trace/events/block.h
> +++ b/include/trace/events/block.h
> @@ -470,6 +470,11 @@ TRACE_EVENT(block_plug,
>   TP_printk("[%s]", __entry->comm)
>  );
>  
> +#define show_block_unplug_explicit(val)  \
> + __print_symbolic(val,   \
> +  {false, "schedule"},   \
> +  {true,  "explicit"})

That's new. I haven't seen "true"/"false" values used for
print_symbolic before. But could you please use 1 and 0 instead, because
perf and trace-cmd won't be able to parse that. I could update
libtraceevent to handle it, but really, the first parameter is supposed
to be numeric.

-- Steve
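
A minimal user-space sketch of a value-to-name lookup with numeric keys, in the spirit of the 0/1 request above. It is not the kernel's __print_symbolic() and not libtraceevent code: the demo_* names are hypothetical, and returning "unknown" for unmatched values is a simplification of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* One table row: a numeric value and the name printed for it. */
struct demo_sym {
    unsigned long value;
    const char *name;
};

/* Numeric keys (0 and 1) rather than the identifiers false and true. */
static const struct demo_sym demo_unplug_syms[] = {
    { 0, "schedule" },
    { 1, "explicit" },
};

/* Linear search is fine for tables this small. */
static const char *demo_print_symbolic(unsigned long val,
                                       const struct demo_sym *tbl, size_t n)
{
    size_t i;

    assert(tbl != NULL);
    for (i = 0; i < n; i++)
        if (tbl[i].value == val)
            return tbl[i].name;
    return "unknown";   /* simplification for this sketch */
}
```

A parser that reads the table back from a textual format description only has to understand integer literals when the keys are written as 0 and 1.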

> +
>  DECLARE_EVENT_CLASS(block_unplug,
>  
>   TP_PROTO(struct request_queue *q, unsigned int depth, bool explicit),
> @@ -478,15 +483,18 @@ DECLARE_EVENT_CLASS(block_unplug,
>  
>   TP_STRUCT__entry(
>   __field( int,   nr_rq   )
> + __field( bool,  explicit)
>   __array( char,  comm,   TASK_COMM_LEN   )
>   ),
>  
>   TP_fast_assign(
>   __entry->nr_rq = depth;
> + __entry->explicit = explicit;
>   memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
>   ),
>  
> - TP_printk("[%s] %d", __entry->comm, __entry->nr_rq)
> + TP_printk("[%s] %d %s", __entry->comm, __entry->nr_rq,
> +   show_block_unplug_explicit(__entry->explicit))
>  );
>  
>  /**

Re: [PATCH] printk: Ratelimit messages printed by console drivers

2018-04-13 Thread Steven Rostedt

On Fri, 13 Apr 2018 14:47:04 +0200
Petr Mladek  wrote:

> The interval is set to one hour. It is a rather arbitrarily selected time.
> It is supposed to be a compromise between never printing these messages,
> not locking up the machine, not filling the entire buffer too quickly,
> and getting information if something changes over time.

I think an hour is incredibly long. We only allow 100 lines per hour for
printks happening inside another printk?

I think 5 minutes (at most) would probably be plenty. One minute may be
good enough.

-- Steve

Re: [PATCH v2 2/3] microblaze: remove redundant early_printk support

2018-04-13 Thread Rob Herring

On Tue, Apr 10, 2018 at 8:44 AM, Michal Simek  wrote:
> Hi Rob,
>
> On 28.3.2018 04:06, Rob Herring wrote:
>> With earlycon support now enabled, the arch specific early_printk support
>> can be removed.
>
> earlycon is not the full replacement of early_printk support as is
> designed right now.
> Definitely current early_printk is pretty old and contains code
> duplication but it starts much earlier than earlycon.

Yes, essentially it's after MMU enabling rather than before. But it is
still before any h/w specific setup (dependent on the DT) which is
where one would typically fail to boot. Generally, I've found before
DT unflattening to be early enough. What can go wrong at this early
stage? Memory is flaky or you've passed in bad memory ranges or image
locations. An earlier console may or may not help there and those
problems are easier to debug in the bootloader.

So it is a question of what you want to maintain.

>> Signed-off-by: Rob Herring 
>> Cc: Michal Simek 
>> ---
>> v2:
>> - Fix booting. The setup_memory call needed to be before the
>>   parse_early_param call.
>
> What's the reason for calling setup_memory before parse_early_param?
> Is there any dependency?

Yes, either fixmap or ioremap (in your case) has to be functional when
earlycon is setup which happens via parse_early_param.

Rob

[GIT PULL] arm64: Late updates for 4.17

2018-04-13 Thread Will Deacon

Hi Linus,

As I mentioned in the previous pull request, we had some nasty conflicts
with the KVM tree that resulted in us dropping some spectre-related work
shortly before the merge window opened. Now that the KVM tree has been
merged, we've put together an updated version of the patches based on
your merge commit (details in the tag). I appreciate this isn't ideal,
so if you'd rather just see this stuff at -rc1 please let me know and we
can do that instead.

There are also a couple of patches here adding some unused assembler
macros which will be needed by some 4.18 crypto code and we'd like to
head that dependency off early.

Thanks,

Will

--->8

The following changes since commit d8312a3f61024352f1c7cb967571fd53631b0d6c:

  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm 
(2018-04-09 11:42:31 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 
tags/arm64-upstream

for you to fetch changes up to 24534b3511828c66215fdf1533d77a7bf2e1fdb2:

  arm64: assembler: add macros to conditionally yield the NEON under PREEMPT 
(2018-04-11 18:50:34 +0100)


Additional arm64 updates for 4.17

A few late updates to address some issues arising from conflicts with
other trees:

- Removal of Qualcomm-specific Spectre-v2 mitigation in favour of the
  generic SMCCC-based firmware call

- Fix EL2 hardening capability checking, which was bodged to reduce
  conflicts with the KVM tree

- Add some currently unused assembler macros for managing SIMD registers
  which will be used by some crypto code in the next merge window


Ard Biesheuvel (2):
  arm64: assembler: add utility macros to push/pop stack frames
  arm64: assembler: add macros to conditionally yield the NEON under PREEMPT

Marc Zyngier (3):
  arm64: capabilities: Rework EL2 vector hardening entry
  arm64: Get rid of __smccc_workaround_1_hvc_*
  arm64: Move the content of bpi.S to hyp-entry.S

Shanker Donthineni (1):
  arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening

 arch/arm64/include/asm/assembler.h | 136 +
 arch/arm64/include/asm/cpucaps.h   |  13 ++--
 arch/arm64/include/asm/kvm_asm.h   |   2 -
 arch/arm64/kernel/Makefile |   2 -
 arch/arm64/kernel/asm-offsets.c|   3 +
 arch/arm64/kernel/bpi.S| 102 
 arch/arm64/kernel/cpu_errata.c |  97 ++
 arch/arm64/kvm/hyp/entry.S |  12 
 arch/arm64/kvm/hyp/hyp-entry.S |  64 -
 arch/arm64/kvm/hyp/switch.c|  10 ---
 10 files changed, 242 insertions(+), 199 deletions(-)
 delete mode 100644 arch/arm64/kernel/bpi.S

Re: [PATCH ipmi/kcs_bmc v1] ipmi: kcs_bmc: optimize the data buffers allocation

2018-04-13 Thread Wang, Haiyue




On 2018-04-13 21:50, Corey Minyard wrote:

On 04/07/2018 02:54 AM, Wang, Haiyue wrote:

Hi Corey,

Since IPMI 2.0 defines only a minimum, no maximum:

KCS/SMIC Input : Required: 40 bytes IPMI Message, minimum
KCS/SMIC Output : Required: 38 bytes IPMI Message, minimum

Yes, though there are practical maximums that are much smaller than 1000 bytes.






We can enlarge the block size to avoid waste, and make our driver
support the worst-case message size. I also think this patch makes the
checking simpler (from 3 checks to 1) and the code cleaner; this is the
biggest reason I want the change. The TLB point is just memory-management
theory from a book; I have no data to support an access improvement. :)


I would argue that the way it is now expresses the intent of the code better
than one allocation split into three parts.  Expressing your intent is more
important than the number of checks and a minuscule performance
improvement.  For me it makes the code easier to understand.  If you had
a tool that checked for out-of-bounds memory access, then a single allocation
might not find an overrun between the parts.  Smaller allocations tend
to result in less memory fragmentation.


When I wrote the commit, I felt that the message was not so professional,
and the reason sounded weak. Driver development is complex work that
needs to take many things into account, not just one. Thanks for your patience.

My preference is to leave it as it is.  However, it's not that important, and
if you really want this patch, I can include it.


So leave it as it is, abandon this patch. :-)

BTW, there is another patch about KCS BMC chip support:
https://lkml.org/lkml/2018/3/22/284
I look forward to your review; I've tried my best to make it better.


Thanks,

-corey



BR,

Haiyue


On 2018-04-07 10:37, Wang, Haiyue wrote:



On 2018-04-07 05:47, Corey Minyard wrote:

On 03/15/2018 07:20 AM, Haiyue Wang wrote:
Allocate a contiguous memory block for the three KCS data buffers, with
the related index assignment.


I'm finally getting to this.

Is there a reason you want to do this?  In general, it's better to not try to
outsmart your base system.  Depending on the memory allocator, in this
case, you might actually use more memory.  You probably won't use any
less.

I got this idea from another code review, but that patch allocates 30 or
more memory blocks of the same size, so reducing the number of
devm_kmalloc calls is better there.

KCS only has 3, so maybe the key point is memory waste.

In the original case, you allocate three 1000 byte buffers, resulting in
three 1024 byte slab allocations.

In the changed case, you will allocate a 3000 byte buffer, resulting in
a single 4096 byte slab allocation, wasting 1024 more bytes of memory.
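
The arithmetic above can be checked with a small model. It assumes an allocator that rounds every request up to the next power-of-two size class, which is a simplification (real slab allocators may have additional classes); the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the next power-of-two size class (simplified model). */
static size_t demo_size_class(size_t request)
{
    size_t cls = 8;   /* assumed smallest class */

    assert(request > 0);
    while (cls < request)
        cls <<= 1;
    return cls;
}

/* Bytes lost to rounding when making count allocations of request bytes. */
static size_t demo_waste(size_t request, size_t count)
{
    return count * (demo_size_class(request) - request);
}
```

Under this model, three 1000 byte requests occupy 3 * 1024 = 3072 bytes and lose 72 bytes to rounding, while one 3000 byte request occupies 4096 bytes and loses 1096 bytes; the difference is the 1024 bytes mentioned above.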


As KCS copies memory between the in/out/kbuffer buffers, would putting them
in the same page be better, e.g. sharing the same TLB entry? (Well, I just
got this from a book; I have no real experience with memory access
performance. I was also told that one can trade space for time. :-))

Just my stupid thinking. I'm OK to drop this patch if it doesn't help with
performance or anything else.

BR.
Haiyue


-corey


Signed-off-by: Haiyue Wang 
---
  drivers/char/ipmi/kcs_bmc.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/char/ipmi/kcs_bmc.c 
b/drivers/char/ipmi/kcs_bmc.c

index fbfc05e..dc19c0d 100644
--- a/drivers/char/ipmi/kcs_bmc.c
+++ b/drivers/char/ipmi/kcs_bmc.c
@@ -435,6 +435,7 @@ static const struct file_operations 
kcs_bmc_fops = {
  struct kcs_bmc *kcs_bmc_alloc(struct device *dev, int 
sizeof_priv, u32 channel)

  {
  struct kcs_bmc *kcs_bmc;
+    void *buf;
    kcs_bmc = devm_kzalloc(dev, sizeof(*kcs_bmc) + 
sizeof_priv, GFP_KERNEL);

  if (!kcs_bmc)
@@ -448,11 +449,12 @@ struct kcs_bmc *kcs_bmc_alloc(struct device 
*dev, int sizeof_priv, u32 channel)

  mutex_init(&kcs_bmc->mutex);
  init_waitqueue_head(&kcs_bmc->queue);
  -    kcs_bmc->data_in = devm_kmalloc(dev, KCS_MSG_BUFSIZ, 
GFP_KERNEL);
-    kcs_bmc->data_out = devm_kmalloc(dev, KCS_MSG_BUFSIZ, 
GFP_KERNEL);
-    kcs_bmc->kbuffer = devm_kmalloc(dev, KCS_MSG_BUFSIZ, 
GFP_KERNEL);
-    if (!kcs_bmc->data_in || !kcs_bmc->data_out || 
!kcs_bmc->kbuffer)

+    buf = devm_kmalloc_array(dev, 3, KCS_MSG_BUFSIZ, GFP_KERNEL);
+    if (!buf)
  return NULL;
+    kcs_bmc->data_in  = buf;
+    kcs_bmc->data_out = buf + KCS_MSG_BUFSIZ;
+    kcs_bmc->kbuffer  = buf + KCS_MSG_BUFSIZ * 2;
    kcs_bmc->miscdev.minor = MISC_DYNAMIC_MINOR;
  kcs_bmc->miscdev.name = dev_name(dev);

Re: [PATCH] ARM: omap2: Fix build when using split object directories

2018-04-13 Thread Dave Gerlach

Tony,
On 04/12/2018 04:08 AM, Masahiro Yamada wrote:
> 2018-04-12 17:21 GMT+09:00 Anders Roxell :
>> On 2018-04-11 16:15, Dave Gerlach wrote:
>>> The sleep33xx and sleep43xx files should not depend on a header file
>>> generated in drivers/memory. Remove this dependency and instead allow
>>> both drivers/memory and arch/arm/mach-omap2 to generate all macros
>>> needed in headers local to their own paths.
>>>
> >>> This fixes an issue where the build will fail when using O= to set a
>>> split object directory and arch/arm/mach-omap2 is built before
>>> drivers/memory with the following error:
>>>
>>> .../drivers/memory/emif-asm-offsets.c:1:0: fatal error: can't open 
>>> drivers/memory/emif-asm-offsets.s for writing: No such file or directory
>>> compilation terminated.
>>>
>>> Fixes: 41d9d44d7258 ("ARM: OMAP2+: pm33xx-core: Add platform code needed 
>>> for PM")
>>> Acked-by: Tony Lindgren 
>>> Reviewed-by: Masahiro Yamada 
>>> Signed-off-by: Dave Gerlach 
>>
>> Tested-by: Anders Roxell 
>>
>> Maybe we can remove drivers/memory/Makefile.asm-offsets and move those
>> changes into drivers/memory/Makefile ?
> 
> Agree!
> 

This is the version of this patch that we want to use. Will this go through you?

Regards,
Dave

> 
> 
>

Re: [PATCH v2] ARM: omap2: Fix build when using split object directories

2018-04-13 Thread Dave Gerlach

On 04/12/2018 10:24 PM, Masahiro Yamada wrote:
> 2018-04-13 11:58 GMT+09:00 Dave Gerlach :
>> The sleep33xx and sleep43xx files should not depend on a header file
>> generated in drivers/memory. Remove this dependency and instead allow
>> both drivers/memory and arch/arm/mach-omap2 to generate all macros
>> needed in headers local to their own paths.
>>
>> This fixes an issue where the build will fail when using O= to set a
>> split object directory and arch/arm/mach-omap2 is built before
>> drivers/memory with the following error:
>>
>> .../drivers/memory/emif-asm-offsets.c:1:0: fatal error: can't open 
>> drivers/memory/emif-asm-offsets.s for writing: No such file or directory
>> compilation terminated.
>>
>> Fixes: 41d9d44d7258 ("ARM: OMAP2+: pm33xx-core: Add platform code needed for 
>> PM")
>> Acked-by: Tony Lindgren 
>> Reviewed-by: Masahiro Yamada 
>> Tested-by: Anders Roxell 
>> Signed-off-by: Dave Gerlach 
>> ---
>> v1 -> v2:
>>  * Removed drivers/memory/Makefile.asm-offsets and consolidated into
>>drivers/memory/Makefile.
> 
> 
> 
> I did not mean like this.
> 
> I thought this clean-up would be done in a separate patch.
> 
> I think your previous patch is OK as-is.
> 

OK, sorry for the confusion. Let's forget this version then.

Regards,
Dave

> 
> 
> 
> 
>>  arch/arm/mach-omap2/Makefile |  6 +--
>>  arch/arm/mach-omap2/pm-asm-offsets.c |  3 ++
>>  arch/arm/mach-omap2/sleep33xx.S  |  1 -
>>  arch/arm/mach-omap2/sleep43xx.S  |  1 -
>>  drivers/memory/Makefile  |  8 +++-
>>  drivers/memory/Makefile.asm-offsets  |  5 ---
>>  drivers/memory/emif-asm-offsets.c| 72 +-
>>  include/linux/ti-emif-sram.h | 75 
>> 
>>  8 files changed, 86 insertions(+), 85 deletions(-)
>>  delete mode 100644 drivers/memory/Makefile.asm-offsets
>>
>> diff --git a/arch/arm/mach-omap2/Makefile b/arch/arm/mach-omap2/Makefile
>> index 4603c30fef73..0d9ce58bc464 100644
>> --- a/arch/arm/mach-omap2/Makefile
>> +++ b/arch/arm/mach-omap2/Makefile
>> @@ -243,8 +243,4 @@ arch/arm/mach-omap2/pm-asm-offsets.s: 
>> arch/arm/mach-omap2/pm-asm-offsets.c
>>  include/generated/ti-pm-asm-offsets.h: arch/arm/mach-omap2/pm-asm-offsets.s 
>> FORCE
>> $(call filechk,offsets,__TI_PM_ASM_OFFSETS_H__)
>>
>> -# For rule to generate ti-emif-asm-offsets.h dependency
>> -include drivers/memory/Makefile.asm-offsets
>> -
>> -arch/arm/mach-omap2/sleep33xx.o: include/generated/ti-pm-asm-offsets.h 
>> include/generated/ti-emif-asm-offsets.h
>> -arch/arm/mach-omap2/sleep43xx.o: include/generated/ti-pm-asm-offsets.h 
>> include/generated/ti-emif-asm-offsets.h
>> +$(obj)/sleep33xx.o $(obj)/sleep43xx.o: include/generated/ti-pm-asm-offsets.h
>> diff --git a/arch/arm/mach-omap2/pm-asm-offsets.c 
>> b/arch/arm/mach-omap2/pm-asm-offsets.c
>> index 6d4392da7c11..b9846b19e5e2 100644
>> --- a/arch/arm/mach-omap2/pm-asm-offsets.c
>> +++ b/arch/arm/mach-omap2/pm-asm-offsets.c
>> @@ -7,9 +7,12 @@
>>
>>  #include 
>>  #include 
>> +#include 
>>
>>  int main(void)
>>  {
>> +   ti_emif_asm_offsets();
>> +
>> DEFINE(AMX3_PM_WFI_FLAGS_OFFSET,
>>offsetof(struct am33xx_pm_sram_data, wfi_flags));
>> DEFINE(AMX3_PM_L2_AUX_CTRL_VAL_OFFSET,
>> diff --git a/arch/arm/mach-omap2/sleep33xx.S 
>> b/arch/arm/mach-omap2/sleep33xx.S
>> index 218d79930b04..322b3bb868b4 100644
>> --- a/arch/arm/mach-omap2/sleep33xx.S
>> +++ b/arch/arm/mach-omap2/sleep33xx.S
>> @@ -6,7 +6,6 @@
>>   * Dave Gerlach, Vaibhav Bedia
>>   */
>>
>> -#include 
>>  #include 
>>  #include 
>>  #include 
>> diff --git a/arch/arm/mach-omap2/sleep43xx.S 
>> b/arch/arm/mach-omap2/sleep43xx.S
>> index b24be624e8b9..8903814a6677 100644
>> --- a/arch/arm/mach-omap2/sleep43xx.S
>> +++ b/arch/arm/mach-omap2/sleep43xx.S
>> @@ -6,7 +6,6 @@
>>   * Dave Gerlach, Vaibhav Bedia
>>   */
>>
>> -#include 
>>  #include 
>>  #include 
>>  #include 
>> diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
>> index 66f55240830e..b3b95380346f 100644
>> --- a/drivers/memory/Makefile
>> +++ b/drivers/memory/Makefile
>> @@ -28,6 +28,10 @@ ti-emif-sram-objs:= ti-emif-pm.o 
>> ti-emif-sram-pm.o
>>
>>  AFLAGS_ti-emif-sram-pm.o   :=-Wa,-march=armv7-a
>>
>> -include drivers/memory/Makefile.asm-offsets
>> +drivers/memory/emif-asm-offsets.s: drivers/memory/emif-asm-offsets.c
>> +   $(call if_changed_dep,cc_s_c)
>>
>> -drivers/memory/ti-emif-sram-pm.o: include/generated/ti-emif-asm-offsets.h
>> +include/generated/ti-emif-asm-offsets.h: drivers/memory/emif-asm-offsets.s 
>> FORCE
>> +   $(call filechk,offsets,__TI_EMIF_ASM_OFFSETS_H__)
>> +
>> +$(obj)/ti-emif-sram-pm.o: include/generated/ti-emif-asm-offsets.h
>> diff --git a/drivers/memory/Makefile.asm-offsets 
>> b/drivers/memory/Makefile.asm-offsets
>> deleted file mode 100644
>> index 843ff60ccb5a..
>> --- a/drivers/memory/Makefile.asm-offsets
>> +++ /dev/null
>> @@ -1,5 +0,0 @@
>> -drivers/memor

Re: [PATCH v5 05/14] PCI: Add pcie_print_link_status() to log link speed and whether it's limited

2018-04-13 Thread Bjorn Helgaas

On Thu, Apr 12, 2018 at 09:32:49PM -0700, Jakub Kicinski wrote:
> On Fri, 30 Mar 2018 16:05:18 -0500, Bjorn Helgaas wrote:
> > +   if (bw_avail >= bw_cap)
> > +   pci_info(dev, "%d Mb/s available bandwidth (%s x%d link)\n",
> > +bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
> > +   else
> > +   pci_info(dev, "%d Mb/s available bandwidth, limited by %s x%d 
> > link at %s (capable of %d Mb/s with %s x%d link)\n",
> > +bw_avail, PCIE_SPEED2STR(speed), width,
> > +limiting_dev ? pci_name(limiting_dev) : "",
> > +bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
> 
> I was just looking at using this new function to print PCIe BW for a
> NIC, but I'm slightly worried that there is nothing in the message that
> says PCIe...  For a NIC some people may interpret the bandwidth as NIC
> bandwidth:
> 
> [   39.839989] nfp :04:00.0: Netronome Flow Processor NFP4000/NFP6000 
> PCIe Card Probe
> [   39.848943] nfp :04:00.0: 63.008 Gb/s available bandwidth (8 GT/s x8 
> link)
> [   39.857146] nfp :04:00.0: RESERVED BARs: 0.0: General/MSI-X SRAM, 0.1: 
> PCIe XPB/MSI-X PBA, 0.4: Explicit0, 0.5: Explicit1, fre4
> 
> It's not a 63Gbps NIC...  I'm sorry if this was discussed before and I
> didn't find it.  Would it make sense to add the "PCIe: " prefix to the
> message like bnx2x used to do?  Like:
> 
> nfp :04:00.0: PCIe: 63.008 Gb/s available bandwidth (8 GT/s x8 link)

I agree, that does look potentially confusing.  How about this:

  nfp :04:00.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)

I did have to look twice at this before I remembered that we're
printing Gb/s (not GB/s).  Most of the references I found on the web
use GB/s when talking about total PCIe bandwidth.

But either way I think it's definitely worth mentioning PCIe
explicitly.
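
For reference, the 63.008 Gb/s figure in the log above can be reproduced by
hand. The sketch below assumes the 128b/130b line encoding used by 8 GT/s
links and plain integer arithmetic; demo_link_mbps() and
demo_mbps_to_mbytes() are made-up helpers for illustration, not kernel
functions.

```c
/*
 * Hypothetical helper, not the kernel's implementation: usable link
 * bandwidth in Mb/s from the raw per-lane rate (in MT/s), the line
 * encoding ratio and the link width.
 */
static unsigned int demo_link_mbps(unsigned int raw_mtps,
				   unsigned int enc_num,
				   unsigned int enc_den,
				   unsigned int width)
{
	/* Integer division, so the per-lane value is rounded down. */
	unsigned int per_lane = raw_mtps * enc_num / enc_den;

	return per_lane * width;
}

/* Mb/s (bits) to MB/s (bytes): the Gb/s vs GB/s distinction in one line. */
static unsigned int demo_mbps_to_mbytes(unsigned int mbps)
{
	return mbps / 8;
}
```

With 8000 MT/s, 128/130 and x8 this gives 63008 Mb/s, i.e. roughly 7.9 GB/s,
which is why the number looks large next to a NIC's line rate.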

[PATCH 01/17] perf stat: Enable 1ms interval for printing event counters values

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Alexey Budankov 

Currently the print interval for performance counter values is limited
to a minimum of 10ms, so the tool cannot read the values at frequencies
higher than 100Hz.

This change allows perf stat -I to run at frequencies up to 1kHz and,
to some extent, puts perf stat -I on par with perf record sampling
profiling.

When running perf stat -I to monitor e.g. PCIe uncore counters while
profiling an I/O workload with perf record, e.g. for cpu-cycles and
context switches, it is then possible to observe a consolidated
CPU/OS/IO(Uncore) performance picture for that workload.

The tool overhead warning printed when the -v option is specified can be
missed due to screen scrolling when output goes to the console, so the
message is moved into the help text shown by perf stat -h.

Signed-off-by: Alexey Budankov 
Acked-by: Jiri Olsa 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/b842ad6a-d606-32e4-afe5-974071b51...@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/perf-stat.txt |  2 +-
 tools/perf/builtin-stat.c  | 14 ++
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index f15b306be183..e6c3b4e555c2 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -153,7 +153,7 @@ perf stat --repeat 10 --null --sync --pre 'make -s 
O=defconfig-build/clean' -- m
 
 -I msecs::
 --interval-print msecs::
-Print count deltas every N milliseconds (minimum: 10ms)
+Print count deltas every N milliseconds (minimum: 1ms)
 The overhead percentage could be high in some cases, for instance with small, 
sub 100ms intervals.  Use with caution.
example: 'perf stat -I 1000 -e cycles -a sleep 5'
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f5c454855908..147a27e8c937 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1943,7 +1943,8 @@ static const struct option stat_options[] = {
OPT_STRING(0, "post", &post_cmd, "command",
"command to run after to the measured command"),
OPT_UINTEGER('I', "interval-print", &stat_config.interval,
-   "print counts at regular interval in ms (>= 10)"),
+   "print counts at regular interval in ms "
+   "(overhead is possible for values <= 100ms)"),
OPT_INTEGER(0, "interval-count", &stat_config.times,
"print counts for fixed number of times"),
OPT_UINTEGER(0, "timeout", &stat_config.timeout,
@@ -2923,17 +2924,6 @@ int cmd_stat(int argc, const char **argv)
}
}
 
-   if (interval && interval < 100) {
-   if (interval < 10) {
-   pr_err("print interval must be >= 10ms\n");
-   parse_options_usage(stat_usage, stat_options, "I", 1);
-   goto out;
-   } else
-   pr_warning("print interval < 100ms. "
-  "The overhead percentage could be high in 
some cases. "
-  "Please proceed with caution.\n");
-   }
-
if (stat_config.times && interval)
interval_count = true;
else if (stat_config.times && !interval) {
-- 
2.14.3
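
As a quick check of the numbers in the changelog above, the relation between
the -I interval and the resulting print frequency can be sketched as follows.
demo_interval_to_hz() is a hypothetical helper, not part of perf.

```c
/*
 * Hypothetical helper: print frequency in Hz for a given
 * "perf stat -I" interval in milliseconds. An interval of 0
 * means interval printing is off, so report 0 Hz.
 */
static unsigned int demo_interval_to_hz(unsigned int interval_ms)
{
	return interval_ms ? 1000u / interval_ms : 0;
}
```

The old 10ms minimum corresponds to 100Hz and the new 1ms minimum to 1kHz.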

[PATCH 02/17] tools headers: Restore READ_ONCE() C++ compatibility

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Mark Rutland 

Our userspace <linux/compiler.h> defines READ_ONCE() in a way that clang
doesn't like, as we have an anonymous union in which neither field is
initialized.

WRITE_ONCE() is fine since it initializes the __val field. For
READ_ONCE() we can keep clang and GCC happy with a dummy initialization
of the __c field, so let's do that.

At the same time, let's split READ_ONCE() and WRITE_ONCE() over several
lines for legibility, as we do in the in-kernel <linux/compiler.h>.

Reported-by: Li Zhijian 
Reported-by: Sandipan Das 
Tested-by: Sandipan Das 
Signed-off-by: Mark Rutland 
Fixes: 6aa7de059173a986 ("locking/atomics: COCCINELLE/treewide: Convert trivial 
ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()")
Link: http://lkml.kernel.org/r/20180404163445.16492-1-mark.rutl...@arm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/include/linux/compiler.h | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h
index 04e32f965ad7..1827c2f973f9 100644
--- a/tools/include/linux/compiler.h
+++ b/tools/include/linux/compiler.h
@@ -151,11 +151,21 @@ static __always_inline void __write_once_size(volatile 
void *p, void *res, int s
  * required ordering.
  */
 
-#define READ_ONCE(x) \
-   ({ union { typeof(x) __val; char __c[1]; } __u; __read_once_size(&(x), 
__u.__c, sizeof(x)); __u.__val; })
-
-#define WRITE_ONCE(x, val) \
-   ({ union { typeof(x) __val; char __c[1]; } __u = { .__val = (val) }; 
__write_once_size(&(x), __u.__c, sizeof(x)); __u.__val; })
+#define READ_ONCE(x)   \
+({ \
+   union { typeof(x) __val; char __c[1]; } __u =   \
+   { .__c = { 0 } };   \
+   __read_once_size(&(x), __u.__c, sizeof(x)); \
+   __u.__val;  \
+})
+
+#define WRITE_ONCE(x, val) \
+({ \
+   union { typeof(x) __val; char __c[1]; } __u =   \
+   { .__val = (val) }; \
+   __write_once_size(&(x), __u.__c, sizeof(x));\
+   __u.__val;  \
+})
 
 
 #ifndef __fallthrough
-- 
2.14.3
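
The union trick used by the patch above can be tried outside the tree. This
is a minimal sketch assuming a compiler with GNU statement expressions and
__typeof__ (GCC or clang); demo_read_once_size() is a simplified stand-in
for __read_once_size(), and all DEMO_/demo_ names are hypothetical.

```c
/* Simplified stand-in for __read_once_size(): a plain byte copy. */
static inline void demo_read_once_size(const volatile void *p, void *res,
					int size)
{
	const volatile char *src = p;
	char *dst = res;
	int i;

	for (i = 0; i < size; i++)
		dst[i] = src[i];
}

/*
 * Same shape as the patched READ_ONCE(): the union is initialized
 * through its __c member, so no field is left uninitialized before
 * the copy fills it in.
 */
#define DEMO_READ_ONCE(x)						\
({									\
	union { __typeof__(x) __val; char __c[1]; } __u =		\
		{ .__c = { 0 } };					\
	demo_read_once_size(&(x), __u.__c, sizeof(x));			\
	__u.__val;							\
})

static int demo_shared = 42;

static int demo_read_shared(void)
{
	return DEMO_READ_ONCE(demo_shared);
}
```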

[PATCH] vfio-ccw: process ssch with interrupts disabled

2018-04-13 Thread Cornelia Huck

When we call ssch, an interrupt might already be pending once we
return from the START SUBCHANNEL instruction. Therefore we need to
make sure interrupts are disabled until after we're done with our
processing.

Note that the subchannel lock is the same as the ccwdevice lock that
is mentioned in the documentation for ccw_device_start() and friends.

Signed-off-by: Cornelia Huck 
---
 drivers/s390/cio/vfio_ccw_fsm.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index ff6963ad6e39..3c800642134e 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -20,12 +20,12 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
int ccode;
__u8 lpm;
unsigned long flags;
+   int ret;
 
sch = private->sch;
 
spin_lock_irqsave(sch->lock, flags);
private->state = VFIO_CCW_STATE_BUSY;
-   spin_unlock_irqrestore(sch->lock, flags);
 
orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
 
@@ -38,10 +38,12 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
 * Initialize device status information
 */
sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
-   return 0;
+   ret = 0;
+   break;
case 1: /* Status pending */
case 2: /* Busy */
-   return -EBUSY;
+   ret = -EBUSY;
+   break;
case 3: /* Device/path not operational */
{
lpm = orb->cmd.lpm;
@@ -51,13 +53,16 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
sch->lpm = 0;
 
if (cio_update_schib(sch))
-   return -ENODEV;
-
-   return sch->lpm ? -EACCES : -ENODEV;
+   ret = -ENODEV;
+   else
+   ret = sch->lpm ? -EACCES : -ENODEV;
+   break;
}
default:
-   return ccode;
+   ret = ccode;
}
+   spin_unlock_irqrestore(sch->lock, flags);
+   return ret;
 }
 
 static void fsm_notoper(struct vfio_ccw_private *private,
-- 
2.14.3
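
The restructuring in the diff above (every case sets ret and breaks, so the
unlock runs on a single exit path) can be sketched in isolation. The lock
here is only a counter used to show that acquire and release stay balanced;
demo_lock and demo_io_helper() are hypothetical and do not model
spin_lock_irqsave() or the real subchannel handling.

```c
#include <errno.h>

/* Hypothetical lock: just counts how often it is currently held. */
struct demo_lock {
	int held;
};

static void demo_lock_acquire(struct demo_lock *l) { l->held++; }
static void demo_lock_release(struct demo_lock *l) { l->held--; }

/*
 * Map a condition code to a return value while holding the lock
 * for the whole operation. No early returns: every path falls
 * through to the single release at the end.
 */
static int demo_io_helper(struct demo_lock *lock, int ccode, int path_ok)
{
	int ret;

	demo_lock_acquire(lock);

	switch (ccode) {
	case 0:
		ret = 0;
		break;
	case 1: /* Status pending */
	case 2: /* Busy */
		ret = -EBUSY;
		break;
	case 3: /* Device/path not operational */
		ret = path_ok ? -EACCES : -ENODEV;
		break;
	default:
		ret = ccode;
	}

	demo_lock_release(lock);
	return ret;
}
```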

[PATCH 03/17] perf tests: Run dwarf unwind test on arm32

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Kim Phillips 

Enable the unwind test on arm32:

  $ perf test unwind
  58: DWARF unwind  : Ok

Signed-off-by: Kim Phillips 
Cc: Alexander Shishkin 
Cc: Brian Robbins 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20180410191624.a3a468670dd4548c66d3d...@arm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/arch/arm/include/arch-tests.h | 12 
 tools/perf/arch/arm/tests/Build  |  2 ++
 tools/perf/arch/arm/tests/arch-tests.c   | 16 
 3 files changed, 30 insertions(+)
 create mode 100644 tools/perf/arch/arm/include/arch-tests.h
 create mode 100644 tools/perf/arch/arm/tests/arch-tests.c

diff --git a/tools/perf/arch/arm/include/arch-tests.h 
b/tools/perf/arch/arm/include/arch-tests.h
new file mode 100644
index ..90ec4c8cb880
--- /dev/null
+++ b/tools/perf/arch/arm/include/arch-tests.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ARCH_TESTS_H
+#define ARCH_TESTS_H
+
+#ifdef HAVE_DWARF_UNWIND_SUPPORT
+struct thread;
+struct perf_sample;
+#endif
+
+extern struct test arch_tests[];
+
+#endif
diff --git a/tools/perf/arch/arm/tests/Build b/tools/perf/arch/arm/tests/Build
index b30eff9bcc83..883c57ff0c08 100644
--- a/tools/perf/arch/arm/tests/Build
+++ b/tools/perf/arch/arm/tests/Build
@@ -1,2 +1,4 @@
 libperf-y += regs_load.o
 libperf-y += dwarf-unwind.o
+
+libperf-y += arch-tests.o
diff --git a/tools/perf/arch/arm/tests/arch-tests.c 
b/tools/perf/arch/arm/tests/arch-tests.c
new file mode 100644
index ..5b1543c98022
--- /dev/null
+++ b/tools/perf/arch/arm/tests/arch-tests.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include "tests/tests.h"
+#include "arch-tests.h"
+
+struct test arch_tests[] = {
+#ifdef HAVE_DWARF_UNWIND_SUPPORT
+   {
+   .desc = "DWARF unwind",
+   .func = test__dwarf_unwind,
+   },
+#endif
+   {
+   .func = NULL,
+   },
+};
-- 
2.14.3
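
The arch_tests[] array added above ends with an entry whose .func is NULL.
A minimal sketch of how such a sentinel-terminated table can be walked,
using hypothetical demo_ names rather than perf's struct test:

```c
#include <stddef.h>

/* Hypothetical, reduced version of a test table entry. */
struct demo_arch_test {
	const char *desc;
	int (*func)(void);
};

static int demo_unwind(void)
{
	return 0;
}

/* The last entry has .func == NULL and marks the end of the table. */
static struct demo_arch_test demo_arch_tests[] = {
	{
		.desc = "DWARF unwind",
		.func = demo_unwind,
	},
	{
		.func = NULL,
	},
};

/* Count entries up to, but not including, the sentinel. */
static int demo_count_tests(const struct demo_arch_test *tests)
{
	int n = 0;

	while (tests[n].func)
		n++;
	return n;
}
```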

[PATCH 06/17] perf jvmti: Give hints about package names needed to build

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

Give examples of the package names to install on Fedora and Debian to get
this built, to help the user a bit.

The part from 'e.g.' onwards is new:

  No openjdk development package found, please install JDK package, e.g. 
openjdk-8-jdk, java-1.8.0-openjdk-devel

Cc: Andi Kleen 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
Cc: William Cohen 
Link: https://lkml.kernel.org/n/tip-edbi4r2pvzn7no6ebxbtc...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile.config | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index c7abd83a8e19..6b307e97dc57 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -847,7 +847,7 @@ ifndef NO_JVMTI
   ifeq ($(feature-jvmti), 1)
 $(call detected_var,JDIR)
   else
-$(warning No openjdk development package found, please install JDK package)
+$(warning No openjdk development package found, please install JDK 
package, e.g. openjdk-8-jdk, java-1.8.0-openjdk-devel)
 NO_JVMTI := 1
   endif
 endif
-- 
2.14.3

[GIT PULL V2] Thermal management updates for v4.17-rc1

2018-04-13 Thread Zhang Rui

Hi, Linus,

Please pull from
  git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git next

to receive the latest Thermal Management updates for v4.17-rc1 with
top-most commit b907b408ca64482989cd95dacef804ce509a3673:

  Merge branches 'thermal-core' and 'thermal-soc' into next (2018-04-13 
14:11:53 +0800)

on top of commit 0c8efd610b58cb23cefdfa12015799079aef94ae:

  Linux 4.16-rc5 (2018-03-11 17:25:09 -0700)

Differences in V2:
- Dropped all patches from thermal-soc tree, including the exynos patch
that introduces the compiler warnings.

Specifics:

- Fix race condition in imx_thermal_probe(). (Mikhail Lappo)

- Add cooling device's statistics in sysfs. (Viresh Kumar)

thanks,
rui


Mikhail Lappo (1):
  thermal: imx: Fix race condition in imx_thermal_probe()

Viresh Kumar (1):
  thermal: Add cooling device's statistics in sysfs

Zhang Rui (1):
  Merge branches 'thermal-core' and 'thermal-soc' into next

 Documentation/thermal/sysfs-api.txt |  31 +
 drivers/thermal/Kconfig |   7 ++
 drivers/thermal/imx_thermal.c   |   6 +-
 drivers/thermal/thermal_core.c  |   3 +-
 drivers/thermal/thermal_core.h  |  10 ++
 drivers/thermal/thermal_helpers.c   |   5 +-
 drivers/thermal/thermal_sysfs.c | 225

 include/linux/thermal.h |   1 +
 8 files changed, 283 insertions(+), 5 deletions(-)

[PATCH 08/17] Revert "x86/asm: Allow again using asm.h when building for the 'bpf' clang target"

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

This reverts commit ca26cffa4e4aaeb09bb9e308f95c7835cb149248.

Newer clang versions accept the asm(_ASM_SP) construct, and the
bpf-script-test-kbuild.c script, used in one of the 'perf test LLVM'
subtests, no longer includes ptrace.h, which ended up including
arch/x86/include/asm/asm.h, so we can revert this patch.

Suggested-by: Yonghong Song 
Link: https://lkml.kernel.org/r/613f0a0d-c433-8f4d-dcc1-c9889deae...@fb.com
Acked-by: Yonghong Song 
Cc: Adrian Hunter 
Cc: Alexander Potapenko 
Cc: Alexei Starovoitov 
Cc: Andrey Ryabinin 
Cc: Andy Lutomirski 
Cc: Arnd Bergmann 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: Dmitriy Vyukov 
Cc: Jiri Olsa 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Matthias Kaehlcke 
Cc: Miguel Bernal Marin 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-nqozcv8loq40tkqpfw997...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 arch/x86/include/asm/asm.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 386a6900e206..219faaec51df 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -136,7 +136,6 @@
 #endif
 
 #ifndef __ASSEMBLY__
-#ifndef __BPF__
 /*
  * This output constraint should be used for any inline asm which has a "call"
  * instruction.  Otherwise the asm may be inserted before the frame pointer
@@ -146,6 +145,5 @@
 register unsigned long current_stack_pointer asm(_ASM_SP);
 #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
 #endif
-#endif
 
 #endif /* _ASM_X86_ASM_H */
-- 
2.14.3

Re: [RFC PATCH 24/35] Revert "ovl: fix relatime for directories"

2018-04-13 Thread Amir Goldstein

On Thu, Apr 12, 2018 at 6:08 PM, Miklos Szeredi  wrote:
> This reverts commit cd91304e7190b4c4802f8e413ab2214b233e0260.
>
> Overlayfs no longer relies on the vfs correct atime handling.
>
> Signed-off-by: Miklos Szeredi 
> ---
>  fs/inode.c | 21 -
>  fs/overlayfs/super.c   |  3 ---
>  include/linux/dcache.h |  3 ---
>  3 files changed, 4 insertions(+), 23 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index ef362364d396..163715de8cb2 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1570,24 +1570,11 @@ EXPORT_SYMBOL(bmap);
>  static void update_ovl_inode_times(struct dentry *dentry, struct inode 
> *inode,
>bool rcu)
>  {
> -   struct dentry *upperdentry;
> +   if (!rcu) {
> +   struct inode *realinode = d_real_inode(dentry);
>
> -   /*
> -* Nothing to do if in rcu or if non-overlayfs
> -*/
> -   if (rcu || likely(!(dentry->d_flags & DCACHE_OP_REAL)))
> -   return;
> -
> -   upperdentry = d_real(dentry, NULL, 0, D_REAL_UPPER);
> -
> -   /*
> -* If file is on lower then we can't update atime, so no worries about
> -* stale mtime/ctime.
> -*/
> -   if (upperdentry) {
> -   struct inode *realinode = d_inode(upperdentry);
> -
> -   if ((!timespec_equal(&inode->i_mtime, &realinode->i_mtime) ||
> +   if (unlikely(inode != realinode) &&
> +   (!timespec_equal(&inode->i_mtime, &realinode->i_mtime) ||
>  !timespec_equal(&inode->i_ctime, &realinode->i_ctime))) {
> inode->i_mtime = realinode->i_mtime;
> inode->i_ctime = realinode->i_ctime;
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index c3d8c7ea180f..006dc70d7425 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -107,9 +107,6 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
> if (inode && d_inode(dentry) == inode)
> return dentry;
>
> -   if (flags & D_REAL_UPPER)
> -   return ovl_dentry_upper(dentry);
> -
> if (!d_is_reg(dentry)) {
> if (!inode || inode == d_inode(dentry))
> return dentry;
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 82a99d366aec..4c7ab11c627a 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -565,9 +565,6 @@ static inline struct dentry *d_backing_dentry(struct 
> dentry *upper)
> return upper;
>  }
>
> -/* d_real() flags */
> -#define D_REAL_UPPER   0x2 /* return upper dentry or NULL if non-upper */
> -

Premature removal of constant. Still in use by may_write_real() at this point.

Thanks,
Amir.

[PATCH 14/17] perf record: Change warning for missing sysfs entry to debug

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Thomas Richter 

Using perf on 4.16.0 kernel on s390 shows this warning:

   failed: can't open node sysfs data

each time I run command perf record ... for example:

  [root@s35lp76 perf]# ./perf record -e rB -- sleep 1
  [ perf record: Woken up 1 times to write data ]
  failed: can't open node sysfs data
  [ perf record: Captured and wrote 0.001 MB perf.data (4 samples) ]
  [root@s35lp76 perf]#

It turns out commit e2091cedd51bf ("perf tools: Add MEM_TOPOLOGY feature
to perf data file") tries to open directory named /sys/devices/system/node/
which does not exist on s390.

This is the call stack:
 __cmd_record
 +---> perf_session__write_header
   +---> perf_header__adds_write
 +---> do_write_feat
   +---> write_mem_topology
 +---> build_mem_topology
   prints warning

The issue starts in do_write_feat() which unconditionally loops over all
features and now includes HEADER_MEM_TOPOLOGY and calls write_mem_topology().

Function record__init_features() at the beginning of __cmd_record() sets
all features and then turns off some of them.

Fix this by changing the warning to a level 2 debug output statement.

So it is only shown when debug level 2 or higher is set.

Signed-off-by: Thomas Richter 
Cc: Heiko Carstens 
Cc: Hendrik Brueckner 
Cc: Martin Schwidefsky 
Link: http://lkml.kernel.org/r/20180412133246.92801-1-tmri...@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/header.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 121df1683c36..a8bff2178fbc 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1320,7 +1320,8 @@ static int build_mem_topology(struct memory_node *nodes, 
u64 size, u64 *cntp)
 
dir = opendir(path);
if (!dir) {
-   pr_warning("failed: can't open node sysfs data\n");
+   pr_debug2("%s: could't read %s, does this arch have topology 
information?\n",
+ __func__, path);
return -1;
}
 
-- 
2.14.3
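
The effect of moving from a warning to a level 2 debug message can be
sketched as below: the message is only emitted once the verbosity reaches
the requested level, while the error return is unchanged. demo_verbose,
demo_debug_n() and demo_open_topology() are hypothetical names, not perf's
own helpers.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the tool's verbosity level. */
static int demo_verbose;

/* Print to stderr only when the verbosity is at least 'level'. */
static int demo_debug_n(int level, const char *fmt, ...)
{
	va_list ap;
	int ret;

	if (demo_verbose < level)
		return 0;

	va_start(ap, fmt);
	ret = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return ret;
}

/* Same shape as the failing path: report quietly, still return -1. */
static int demo_open_topology(const char *path, int exists)
{
	if (!exists) {
		demo_debug_n(2, "%s: couldn't read %s\n", __func__, path);
		return -1;
	}
	return 0;
}
```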

[PATCH 12/17] perf sched: Fix documentation for timehist

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Takuya Yamamoto 

Fix an incorrect option and its usage to match those shown by
"perf sched timehist -h", i.e. the default is really --call-graph, which is
equivalent to -g.

Signed-off-by: Takuya Yamamoto 
Cc: Peter Zijlstra 
Link: https://lkml.kernel.org/n/tip-8fzo0dlsi1mku5aqx8bre...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/perf-sched.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-sched.txt 
b/tools/perf/Documentation/perf-sched.txt
index bb33601a823b..63f938b887dd 100644
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@@ -104,8 +104,8 @@ OPTIONS for 'perf sched timehist'
 kallsyms pathname
 
 -g::
---no-call-graph::
-   Do not display call chains if present.
+--call-graph::
+   Display call chains if present (default on).
 
 --max-stack::
Maximum number of functions to display in backtrace, default 5.
-- 
2.14.3

[PATCH 13/17] perf tests: Disable breakpoint accounting test for powerpc

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Sandipan Das 

We disable this test as instruction breakpoints (HW_BREAKPOINT_X) are
not available for powerpc.

Before applying patch:

  21: Breakpoint accounting :
  --- start ---
  test child forked, pid 3635
  failed opening event 0
  failed opening event 0
  watchpoints count 1, breakpoints count 0, has_ioctl 1, share 0
  test child finished with -2
   end 
  Breakpoint accounting: Skip

After applying patch:

  21: Breakpoint accounting : Disabled

Signed-off-by: Sandipan Das 
Cc: Jiri Olsa 
Cc: Naveen N. Rao 
Cc: Ravi Bangoria 
Link: http://lkml.kernel.org/r/20180412162140.2992-1-sandi...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/tests/builtin-test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 625f5a6772af..cac8f8889bc3 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -118,6 +118,7 @@ static struct test generic_tests[] = {
{
.desc = "Breakpoint accounting",
.func = test__bp_accounting,
+   .is_supported = test__bp_signal_is_supported,
},
{
.desc = "Number of exit events of a simple workload",
-- 
2.14.3
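
The .is_supported hook set in the diff above lets a test be reported as
disabled instead of being run. A minimal sketch of that dispatch, with
hypothetical demo_ names and return codes that are not perf's:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for the sketch. */
enum { DEMO_TEST_OK = 0, DEMO_TEST_DISABLED = -3 };

struct demo_generic_test {
	const char *desc;
	int (*func)(void);
	bool (*is_supported)(void);	/* optional, NULL means always supported */
};

/* Run the test only if it has no hook or the hook says it is supported. */
static int demo_run(const struct demo_generic_test *t)
{
	if (t->is_supported && !t->is_supported())
		return DEMO_TEST_DISABLED;
	return t->func();
}

static int demo_pass(void)
{
	return DEMO_TEST_OK;
}

static bool demo_never_supported(void)
{
	return false;
}
```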

[PATCH 15/17] perf report: Fix switching to another perf.data file

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

In the TUI the 's' hotkey can be used to switch to another perf.data
file in the current directory, but that got broken by commit
b01141f4f59c ("perf annotate: Initialize the priv are in symbol__new()"),
which made this appear once another file was chosen:

┌─Fatal Error─────────────────────────────────────┐
│Annotation needs to be init before symbol__init()│
│                                                 │
│                                                 │
│Press any key...                                 │
└─────────────────────────────────────────────────┘

Fix it by just silently bailing out if symbol__annotation_init() was already
called, just like is done with symbol__init(), i.e. they are done just once at
session start, not when switching to a new perf.data file.

Cc: Adrian Hunter 
Cc: Andi Kleen 
Cc: David Ahern 
Cc: Jin Yao 
Cc: Jiri Olsa 
Cc: Martin Liška 
Cc: Namhyung Kim 
Cc: Ravi Bangoria 
Cc: Thomas Richter 
Cc: Wang Nan 
Fixes: b01141f4f59c ("perf annotate: Initialize the priv are in symbol__new()")
Link: https://lkml.kernel.org/n/tip-ogppdtpzfax7y1h6gjdv5...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/symbol.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 62b2dd2253eb..1466814ebada 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2091,16 +2091,14 @@ static bool symbol__read_kptr_restrict(void)
 
 int symbol__annotation_init(void)
 {
+   if (symbol_conf.init_annotation)
+   return 0;
+
if (symbol_conf.initialized) {
pr_err("Annotation needs to be init before symbol__init()\n");
return -1;
}
 
-   if (symbol_conf.init_annotation) {
-   pr_warning("Annotation being initialized multiple times\n");
-   return 0;
-   }
-
symbol_conf.priv_size += sizeof(struct annotation);
symbol_conf.init_annotation = true;
return 0;
-- 
2.14.3
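
The reordering in the diff above makes the init function idempotent: a
repeated call returns early before the "too late" check can fire. A minimal
sketch with a hypothetical demo_symbol_conf and an arbitrary size constant:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced version of the configuration state. */
struct demo_symbol_conf {
	bool initialized;	/* set once the main symbol init has run */
	bool init_annotation;	/* set once annotation init has run */
	size_t priv_size;
};

/* Arbitrary placeholder for sizeof(struct annotation). */
#define DEMO_ANNOTATION_SIZE 64

static int demo_annotation_init(struct demo_symbol_conf *conf)
{
	/* Already done: bail out silently, even after the main init. */
	if (conf->init_annotation)
		return 0;

	/* First call arriving too late is still an error. */
	if (conf->initialized)
		return -1;

	conf->priv_size += DEMO_ANNOTATION_SIZE;
	conf->init_annotation = true;
	return 0;
}
```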

[PATCH 16/17] perf annotate: Allow setting the offset level in .perfconfig

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

The default is 1 (jump_target):

  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
  _raw_spin_lock_irqsave() /proc/kcore
    0.26        nop
    4.61        push   %rbx
   19.33        pushfq
    7.97        pop    %rax
    0.32        nop
    0.06        mov    %rax,%rbx
   14.63        cli
    0.06        nop
                xor    %eax,%eax
                mov    $0x1,%edx
   49.94        lock   cmpxchg %edx,(%rdi)
    0.16        test   %eax,%eax
              ↓ jne    2b
    2.66        mov    %rbx,%rax
                pop    %rbx
              ← retq
          2b:   mov    %eax,%esi
              → callq  *b30eaed0
                mov    %rbx,%rax
                pop    %rbx
              ← retq
  #

But one can ask for showing offsets for call instructions by setting
this:

  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
  _raw_spin_lock_irqsave() /proc/kcore
    0.26        nop
    4.61        push   %rbx
   19.33        pushfq
    7.97        pop    %rax
    0.32        nop
    0.06        mov    %rax,%rbx
   14.63        cli
    0.06        nop
                xor    %eax,%eax
                mov    $0x1,%edx
   49.94        lock   cmpxchg %edx,(%rdi)
    0.16        test   %eax,%eax
              ↓ jne    2b
    2.66        mov    %rbx,%rax
                pop    %rbx
              ← retq
          2b:   mov    %eax,%esi
          2d: → callq  *b30eaed0
                mov    %rbx,%rax
                pop    %rbx
              ← retq
  #

Or using a big value to ask for all offsets to be shown:

  # cat ~/.perfconfig
  [annotate]
  offset_level = 100
  hide_src_code = true
  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
  _raw_spin_lock_irqsave() /proc/kcore
    0.26   0:   nop
    4.61   5:   push   %rbx
   19.33   6:   pushfq
    7.97   7:   pop    %rax
    0.32   8:   nop
    0.06   d:   mov    %rax,%rbx
   14.63  10:   cli
    0.06  11:   nop
          17:   xor    %eax,%eax
          19:   mov    $0x1,%edx
   49.94  1e:   lock   cmpxchg %edx,(%rdi)
    0.16  22:   test   %eax,%eax
          24: ↓ jne    2b
    2.66  26:   mov    %rbx,%rax
          29:   pop    %rbx
          2a: ← retq
          2b:   mov    %eax,%esi
          2d: → callq  *b30eaed0
          32:   mov    %rbx,%rax
          35:   pop    %rbx
          36: ← retq
  #

This also affects the TUI, i.e. the default 'perf annotate' and 'perf
top/report' -> A hotkey -> annotate interfaces, when slang-devel is present
in the build, i.e.:

  # perf version --build-options | grep slang
  libslang: [ on  ]  # HAVE_SLANG_SUPPORT
  #

Cc: Adrian Hunter 
Cc: Andi Kleen 
Cc: David Ahern 
Cc: Jin Yao 
Cc: Jiri Olsa 
Cc: Martin Liška 
Cc: Namhyung Kim 
Cc: Ravi Bangoria 
Cc: Thomas Richter 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-venm6x5zrt40eu8hxdsmq...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/perf-config.txt |  5 +
 tools/perf/util/annotate.c   | 15 ---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-config.txt 
b/tools/perf/Documentation/perf-config.txt
index 5b4fff3adc4b..32f4a898e3f2 100644
--- a/tools/perf/Documentation/perf-config.txt
+++ b/tools/perf/Documentation/perf-config.txt
@@ -334,6 +334,11 @@ annotate.*::
 
99.93 │  mov%eax,%eax
 
+   annotate.offset_level::
+   Default is '1', meaning just jump targets will have offsets 
show right beside
+   the instruction. When set to '2' 'call' instructions will also 
have its offsets
+   shown, 3 or higher will show offsets for all instructions.
+
 hist.*::
hist.percentage::
This option control the way to calculate overhead of filtered entries -
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 5edc565d86c4..536ee148bff8 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2649,10 +2649,11 @@ int __annotation__scnprintf_samples_period(struct annotation *notes,
  */
 static struct annotation_config {
const char *name;
-   bool *value;
+   void *value;
 } annotation__configs[] = {
ANNOTATION__CFG(hide_src_code),
ANNOTATION__CFG(jump_arrows),
+   ANNOTATION__CFG(offset_level),
ANNOTATION__CFG(show_linenr),
ANNOTATION__CFG(show_nr_jumps),
ANNOTATION__CFG(show_nr_samples),
@@ -2684,8 +2685,16 @@ static int annotation__config(const char *var, const char *value,
 
if (cfg == NULL)
pr_debug("%s variable unknown, ignoring...", var);
-   else
-   *cfg->valu

[PATCH 17/17] perf annotate: Handle variables in 'sub', 'or' and many other instructions

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

Just as is done for 'mov' and other instructions whose source or target
can be a variable resolved by objdump, to make them more compact:

-   orb    $0x4,0x224d71(%rip)        # 226ca4 <_rtld_global+0xca4>
+   orb    $0x4,_rtld_global+0xca4
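
As a rough illustration of the compaction shown above, the sketch below
pulls the resolved name out of objdump's trailing '#' comment. It is a
simplified stand-in, not the code in tools/perf: the helper name
operand_symbol() and its return convention are assumptions made for this
example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the perf implementation: copy the name found
 * between '<' and '>' in objdump's trailing '#' comment into 'out'.
 * Returns 0 on success, -1 if there is no usable "<symbol>" comment.
 */
static int operand_symbol(const char *operand, char *out, size_t outlen)
{
	const char *comment, *start, *end;
	size_t len;

	assert(operand != NULL && out != NULL);

	/* objdump appends "# <addr> <symbol+off>" to RIP-relative operands */
	comment = strchr(operand, '#');
	if (comment == NULL)
		return -1;

	start = strchr(comment, '<');
	if (start == NULL)
		return -1;
	start++;

	end = strchr(start, '>');
	if (end == NULL)
		return -1;

	len = (size_t)(end - start);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

Given the 'orb' operand from the example, the helper would hand back
"_rtld_global+0xca4", which is the text that replaces the raw
RIP-relative address in the compact view.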

Cc: Adrian Hunter 
Cc: Andi Kleen 
Cc: David Ahern 
Cc: Jin Yao 
Cc: Jiri Olsa 
Cc: Martin Liška 
Cc: Namhyung Kim 
Cc: Ravi Bangoria 
Cc: Thomas Richter 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-efex7746id4w4wa03nqxv...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/arch/x86/annotate/instructions.c | 67 -
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/x86/annotate/instructions.c b/tools/perf/arch/x86/annotate/instructions.c
index 5bd1ba8c0282..44f5aba78210 100644
--- a/tools/perf/arch/x86/annotate/instructions.c
+++ b/tools/perf/arch/x86/annotate/instructions.c
@@ -1,21 +1,43 @@
 // SPDX-License-Identifier: GPL-2.0
 static struct ins x86__instructions[] = {
+   { .name = "adc",.ops = &mov_ops,  },
+   { .name = "adcb",   .ops = &mov_ops,  },
+   { .name = "adcl",   .ops = &mov_ops,  },
{ .name = "add",.ops = &mov_ops,  },
{ .name = "addl",   .ops = &mov_ops,  },
{ .name = "addq",   .ops = &mov_ops,  },
+   { .name = "addsd",  .ops = &mov_ops,  },
{ .name = "addw",   .ops = &mov_ops,  },
{ .name = "and",.ops = &mov_ops,  },
+   { .name = "andb",   .ops = &mov_ops,  },
+   { .name = "andl",   .ops = &mov_ops,  },
+   { .name = "andpd",  .ops = &mov_ops,  },
+   { .name = "andps",  .ops = &mov_ops,  },
+   { .name = "andq",   .ops = &mov_ops,  },
+   { .name = "andw",   .ops = &mov_ops,  },
+   { .name = "bsr",.ops = &mov_ops,  },
+   { .name = "bt", .ops = &mov_ops,  },
+   { .name = "btr",.ops = &mov_ops,  },
{ .name = "bts",.ops = &mov_ops,  },
+   { .name = "btsq",   .ops = &mov_ops,  },
{ .name = "call",   .ops = &call_ops, },
{ .name = "callq",  .ops = &call_ops, },
+   { .name = "cmovbe", .ops = &mov_ops,  },
+   { .name = "cmove",  .ops = &mov_ops,  },
+   { .name = "cmovae", .ops = &mov_ops,  },
{ .name = "cmp",.ops = &mov_ops,  },
{ .name = "cmpb",   .ops = &mov_ops,  },
{ .name = "cmpl",   .ops = &mov_ops,  },
{ .name = "cmpq",   .ops = &mov_ops,  },
{ .name = "cmpw",   .ops = &mov_ops,  },
{ .name = "cmpxch", .ops = &mov_ops,  },
+   { .name = "cmpxchg",.ops = &mov_ops,  },
+   { .name = "cs", .ops = &mov_ops,  },
{ .name = "dec",.ops = &dec_ops,  },
{ .name = "decl",   .ops = &dec_ops,  },
+   { .name = "divsd",  .ops = &mov_ops,  },
+   { .name = "divss",  .ops = &mov_ops,  },
+   { .name = "gs", .ops = &mov_ops,  },
{ .name = "imul",   .ops = &mov_ops,  },
{ .name = "inc",.ops = &dec_ops,  },
{ .name = "incl",   .ops = &dec_ops,  },
@@ -57,25 +79,68 @@ static struct ins x86__instructions[] = {
{ .name = "lea",.ops = &mov_ops,  },
{ .name = "lock",   .ops = &lock_ops, },
{ .name = "mov",.ops = &mov_ops,  },
+   { .name = "movapd", .ops = &mov_ops,  },
+   { .name = "movaps", .ops = &mov_ops,  },
{ .name = "movb",   .ops = &mov_ops,  },
{ .name = "movdqa", .ops = &mov_ops,  },
+   { .name = "movdqu", .ops = &mov_ops,  },
{ .name = "movl",   .ops = &mov_ops,  },
{ .name = "movq",   .ops = &mov_ops,  },
+   { .name = "movsd",  .ops = &mov_ops,  },
{ .name = "movslq", .ops = &mov_ops,  },
+   { .name = "movss",  .ops = &mov_ops,  },
+   { .name = "movupd", .ops = &mov_ops,  },
+   { .name = "movups", .ops = &mov_ops,  },
+   { .name = "movw",   .ops = &mov_ops,  },
{ .name = "movzbl", .ops = &mov_ops,  },
{ .name = "movzwl", .ops = &mov_ops,  },
+   { .name = "mulsd",  .ops = &mov_ops,  },
+   { .name = "mulss",  .ops = &mov_ops,  },
{ .name = "nop",.ops = &nop_ops,  },
{ .name = "nopl",   .ops = &nop_ops,  },
{ .name = "nopw",   .ops = &nop_ops,  },
{ .name = "or", .ops = &mov_ops,  },
+   { .name = "orb",.ops = &mov_ops,  },
{ .name = "orl",.ops = &mov_ops,  },
+   { .name = "orps",   .ops = &mov_ops,  },
+   { .name = "orq",.ops = &mov_ops,  },
+   { .name = "pand",   .ops = &mov_ops,  },
+   { .name = "paddq",  .ops = &mov_ops,  },
+   { .name = "pcmpeqb",.ops = &mov_ops,  },
+   { .name = "por",.ops = &mov_ops,  },
+   { .n

[PATCH 11/17] perf version: Print status for syscall_table

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Jin Yao 

This patch doesn't print the "libaudit" line if HAVE_SYSCALL_TABLE_SUPPORT
is available, and adds a line for HAVE_SYSCALL_TABLE_SUPPORT.

For example,

$ ./perf -vv
perf version 4.13.rc5.gc2f8af9
 dwarf: [ on  ]  # HAVE_DWARF_SUPPORT
dwarf_getlocations: [ on  ]  # HAVE_DWARF_GETLOCATIONS_SUPPORT
 glibc: [ on  ]  # HAVE_GLIBC_SUPPORT
  gtk2: [ on  ]  # HAVE_GTK2_SUPPORT
 syscall_table: [ on  ]  # HAVE_SYSCALL_TABLE_SUPPORT
libbfd: [ on  ]  # HAVE_LIBBFD_SUPPORT
libelf: [ on  ]  # HAVE_LIBELF_SUPPORT
   libnuma: [ on  ]  # HAVE_LIBNUMA_SUPPORT
numa_num_possible_cpus: [ on  ]  # HAVE_LIBNUMA_SUPPORT
   libperl: [ on  ]  # HAVE_LIBPERL_SUPPORT
 libpython: [ on  ]  # HAVE_LIBPYTHON_SUPPORT
  libslang: [ on  ]  # HAVE_SLANG_SUPPORT
 libcrypto: [ on  ]  # HAVE_LIBCRYPTO_SUPPORT
 libunwind: [ on  ]  # HAVE_LIBUNWIND_SUPPORT
libdw-dwarf-unwind: [ on  ]  # HAVE_DWARF_SUPPORT
  zlib: [ on  ]  # HAVE_ZLIB_SUPPORT
  lzma: [ on  ]  # HAVE_LZMA_SUPPORT
 get_cpuid: [ on  ]  # HAVE_AUXTRACE_SUPPORT
   bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT

The line "syscall_table: [ on  ]  # HAVE_SYSCALL_TABLE_SUPPORT" is
newly added.

Signed-off-by: Jin Yao 
Suggested-by: Arnaldo Carvalho de Melo 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1523269609-28824-4-git-send-email-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-version.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 2abe3910d6b6..50df168be326 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -60,7 +60,10 @@ static void library_status(void)
STATUS(HAVE_DWARF_GETLOCATIONS_SUPPORT, dwarf_getlocations);
STATUS(HAVE_GLIBC_SUPPORT, glibc);
STATUS(HAVE_GTK2_SUPPORT, gtk2);
+#ifndef HAVE_SYSCALL_TABLE_SUPPORT
STATUS(HAVE_LIBAUDIT_SUPPORT, libaudit);
+#endif
+   STATUS(HAVE_SYSCALL_TABLE_SUPPORT, syscall_table);
STATUS(HAVE_LIBBFD_SUPPORT, libbfd);
STATUS(HAVE_LIBELF_SUPPORT, libelf);
STATUS(HAVE_LIBNUMA_SUPPORT, libnuma);
-- 
2.14.3

[PATCH 10/17] perf tools: Rename HAVE_SYSCALL_TABLE to HAVE_SYSCALL_TABLE_SUPPORT

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Jin Yao 

To be consistent with other HAVE_XXX_SUPPORT uses in Makefile.config,
this patch renames HAVE_SYSCALL_TABLE to HAVE_SYSCALL_TABLE_SUPPORT and
updates the C code accordingly.

Signed-off-by: Jin Yao 
Suggested-by: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1523269609-28824-3-git-send-email-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile.config  | 2 +-
 tools/perf/builtin-help.c   | 2 +-
 tools/perf/perf.c   | 4 ++--
 tools/perf/util/generate-cmdlist.sh | 2 +-
 tools/perf/util/syscalltbl.c| 6 +++---
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 6b307e97dc57..ae7dc46e8f8a 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -68,7 +68,7 @@ ifeq ($(NO_PERF_REGS),0)
 endif
 
 ifneq ($(NO_SYSCALL_TABLE),1)
-  CFLAGS += -DHAVE_SYSCALL_TABLE
+  CFLAGS += -DHAVE_SYSCALL_TABLE_SUPPORT
 endif
 
 # So far there's only x86 and arm libdw unwind support merged in perf.
diff --git a/tools/perf/builtin-help.c b/tools/perf/builtin-help.c
index 4aca13f23b9d..1c41b4eaf73c 100644
--- a/tools/perf/builtin-help.c
+++ b/tools/perf/builtin-help.c
@@ -439,7 +439,7 @@ int cmd_help(int argc, const char **argv)
 #ifdef HAVE_LIBELF_SUPPORT
"probe",
 #endif
-#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE)
+#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT)
"trace",
 #endif
NULL };
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 1659029d03fc..20a08cb32332 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -73,7 +73,7 @@ static struct cmd_struct commands[] = {
{ "lock",   cmd_lock,   0 },
{ "kvm",cmd_kvm,0 },
{ "test",   cmd_test,   0 },
-#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE)
+#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT)
{ "trace",  cmd_trace,  0 },
 #endif
{ "inject", cmd_inject, 0 },
@@ -491,7 +491,7 @@ int main(int argc, const char **argv)
argv[0] = cmd;
}
if (strstarts(cmd, "trace")) {
-#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE)
+#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT)
setup_path();
argv[0] = "trace";
return cmd_trace(argc, argv);
diff --git a/tools/perf/util/generate-cmdlist.sh b/tools/perf/util/generate-cmdlist.sh
index ff17920a5ebc..c3cef36d4176 100755
--- a/tools/perf/util/generate-cmdlist.sh
+++ b/tools/perf/util/generate-cmdlist.sh
@@ -38,7 +38,7 @@ do
 done
 echo "#endif /* HAVE_LIBELF_SUPPORT */"
 
-echo "#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE)"
+echo "#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT)"
 sed -n -e 's/^perf-\([^]*\)[   ].* audit*/\1/p' command-list.txt |
 sort |
 while read cmd
diff --git a/tools/perf/util/syscalltbl.c b/tools/perf/util/syscalltbl.c
index 895122d638dd..0ee7f568d60c 100644
--- a/tools/perf/util/syscalltbl.c
+++ b/tools/perf/util/syscalltbl.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 
-#ifdef HAVE_SYSCALL_TABLE
+#ifdef HAVE_SYSCALL_TABLE_SUPPORT
 #include 
 #include "string2.h"
 #include "util.h"
@@ -139,7 +139,7 @@ int syscalltbl__strglobmatch_first(struct syscalltbl *tbl, const char *syscall_g
return syscalltbl__strglobmatch_next(tbl, syscall_glob, idx);
 }
 
-#else /* HAVE_SYSCALL_TABLE */
+#else /* HAVE_SYSCALL_TABLE_SUPPORT */
 
 #include 
 
@@ -176,4 +176,4 @@ int syscalltbl__strglobmatch_first(struct syscalltbl *tbl, const char *syscall_g
 {
return syscalltbl__strglobmatch_next(tbl, syscall_glob, idx);
 }
-#endif /* HAVE_SYSCALL_TABLE */
+#endif /* HAVE_SYSCALL_TABLE_SUPPORT */
-- 
2.14.3

[PATCH 07/17] perf tests bpf: Remove unused ptrace.h include from LLVM test

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

The bpf-script-test-kbuild.c script, used in one of the LLVM subtests,
includes ptrace.h unnecessarily, and that ends up making it include a
header that uses asm(_ASM_SP), a feature that is not supported by clang
<= 4.0, breaking that 'perf test' entry.

This ended up leading to commit ca26cffa4e4a ("x86/asm: Allow again using
asm.h when building for the 'bpf' clang target"), adding an ifndef
__BPF__ to the arch/x86/include/asm/asm.h file.

Newer clang versions accept that asm(_ASM_SP) construct, so just remove
the ptrace.h include, which paves the way for reverting ca26cffa4e4a
("x86/asm: Allow again using asm.h when building for the 'bpf' clang
target").

Suggested-by: Yonghong Song 
Acked-by: Yonghong Song 
Link: https://lkml.kernel.org/r/613f0a0d-c433-8f4d-dcc1-c9889deae...@fb.com
Cc: Adrian Hunter 
Cc: Alexander Potapenko 
Cc: Alexei Starovoitov 
Cc: Andrey Ryabinin 
Cc: Andy Lutomirski 
Cc: Arnd Bergmann 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: Dmitriy Vyukov 
Cc: Jiri Olsa 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Matthias Kaehlcke 
Cc: Miguel Bernal Marin 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-clbcnzbakdp18ibme4wt4...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/tests/bpf-script-test-kbuild.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/perf/tests/bpf-script-test-kbuild.c b/tools/perf/tests/bpf-script-test-kbuild.c
index 3626924740d8..ff3ec8337f0a 100644
--- a/tools/perf/tests/bpf-script-test-kbuild.c
+++ b/tools/perf/tests/bpf-script-test-kbuild.c
@@ -9,7 +9,6 @@
 #define SEC(NAME) __attribute__((section(NAME), used))
 
 #include 
-#include 
 
 SEC("func=vfs_llseek")
 int bpf_func__vfs_llseek(void *ctx)
-- 
2.14.3

[PATCH 05/17] perf annotate browser: Allow showing offsets in more than just jump targets

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

Jesper wanted to see offsets at callq sites when doing some performance
investigation related to retpolines, so save him some time by providing
an 'O' hotkey that shows offsets from function start at call
instructions or at all instructions. Just keep pressing 'O' until the
offsets you need appear.

Example:

Starts with:

Samples: 64  of event 'cycles:ppp', 10 Hz, Event count (approx.): 318963
ixgbe_read_reg  /proc/kcore
Percent│↑ je 2a
   │   ┌──cmp$0x,%r13d
   │   ├──je d0
   │   │  mov$0x53e3,%edi
   │   │→ callq  __const_udelay
   │   │  sub$0x1,%r15d
   │   │↑ jne83
   │   │  mov0x8(%rbp),%rax
   │   │  testb  $0x20,0x1799(%rax)
   │   │↑ je 2a
   │   │  mov0x200(%rax),%rdi
   │   │  mov%r13d,%edx
   │   │  mov$0xc02595d8,%rsi
   │   │→ callq  netdev_warn
   │   │↑ jmpq   2a
   │d0:└─→mov0x8(%rbp),%rsi
   │  mov%rbp,%rdi
   │  mov%eax,0x4(%rsp)
   │→ callq  ixgbe_remove_adapter.isra.77
   │  mov0x4(%rsp),%eax
Press 'h' for help on key bindings


Press 'O':

Samples: 64  of event 'cycles:ppp', 10 Hz, Event count (approx.): 318963
ixgbe_read_reg  /proc/kcore
Percent│↑ je 2a
   │   ┌──cmp$0x,%r13d
   │   ├──je d0
   │   │  mov$0x53e3,%edi
   │99:│→ callq  __const_udelay
   │   │  sub$0x1,%r15d
   │   │↑ jne83
   │   │  mov0x8(%rbp),%rax
   │   │  testb  $0x20,0x1799(%rax)
   │   │↑ je 2a
   │   │  mov0x200(%rax),%rdi
   │   │  mov%r13d,%edx
   │   │  mov$0xc02595d8,%rsi
   │c6:│→ callq  netdev_warn
   │   │↑ jmpq   2a
   │d0:└─→mov0x8(%rbp),%rsi
   │  mov%rbp,%rdi
   │  mov%eax,0x4(%rsp)
   │db: → callq  ixgbe_remove_adapter.isra.77
   │  mov0x4(%rsp),%eax
Press 'h' for help on key bindings


Press 'O' again:

Samples: 64  of event 'cycles:ppp', 10 Hz, Event count (approx.): 318963
ixgbe_read_reg  /proc/kcore
Percent│8c: ↑ je 2a
   │8e:┌──cmp$0x,%r13d
   │92:├──je d0
   │94:│  mov$0x53e3,%edi
   │99:│→ callq  __const_udelay
   │9e:│  sub$0x1,%r15d
   │a2:│↑ jne83
   │a4:│  mov0x8(%rbp),%rax
   │a8:│  testb  $0x20,0x1799(%rax)
   │af:│↑ je 2a
   │b5:│  mov0x200(%rax),%rdi
   │bc:│  mov%r13d,%edx
   │bf:│  mov$0xc02595d8,%rsi
   │c6:│→ callq  netdev_warn
   │cb:│↑ jmpq   2a
   │d0:└─→mov0x8(%rbp),%rsi
   │d4:   mov%rbp,%rdi
   │d7:   mov%eax,0x4(%rsp)
   │db: → callq  ixgbe_remove_adapter.isra.77
   │e0:   mov0x4(%rsp),%eax
Press 'h' for help on key bindings


Press 'O' again and it will show just jump target offsets.
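
The wrap-around behaviour described above can be sketched in isolation.
The enum values mirror the ANNOTATION__* names used by this series; the
helper bump_offset_level() is a hypothetical name for this example, since
the browser does the increment inline in its key handler.

```c
#include <assert.h>

/* Same values as the enum this series adds to util/annotate.h */
enum {
	ANNOTATION__OFFSET_JUMP_TARGETS = 1,
	ANNOTATION__OFFSET_CALL,
	ANNOTATION__MAX_OFFSET_LEVEL,
};

#define ANNOTATION__MIN_OFFSET_LEVEL ANNOTATION__OFFSET_JUMP_TARGETS

/*
 * Hypothetical helper: return the level that one press of 'O' selects,
 * wrapping from "all instructions" back to "jump targets only".
 */
static unsigned char bump_offset_level(unsigned char level)
{
	assert(level >= ANNOTATION__MIN_OFFSET_LEVEL &&
	       level <= ANNOTATION__MAX_OFFSET_LEVEL);

	if (++level > ANNOTATION__MAX_OFFSET_LEVEL)
		level = ANNOTATION__MIN_OFFSET_LEVEL;

	return level;
}
```

Starting from the default, three presses walk through "jump targets +
calls", "all instructions" and back to "jump targets", matching the three
screens in the example.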

Suggested-by: Jesper Dangaard Brouer 
Cc: Adrian Hunter 
Cc: Alexei Starovoitov 
Cc: Andi Kleen 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: Jin Yao 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Martin Liška 
Cc: Namhyung Kim 
Cc: Ravi Bangoria 
Cc: Thomas Richter 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-upp6pfdetwlsx18ec2uf1...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/ui/browsers/annotate.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 12c099a87f8b..3781d74088a7 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -692,6 +692,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
"J Toggle showing number of jump sources on targets\n"
"n Search next string\n"
"o Toggle disassembler output/simplified view\n"
+   "O Bump offset level (jump targets -> +call -> all -> cycle thru)\n"
"s Toggle source code view\n"
"t Circulate percent, total period, samples view\n"
"/ Search string\n"
@@ -719,6 +720,10 @@ static int annotate_browser__run(struct annotate_browser *browser,
notes->options->use_offset = !notes->options->use_offset;
annotation__update_column_widths(notes);
continue;
+   case 'O':
+   if (++notes->options->offset_level > ANNOTATION__MAX_OFFSET_LEVEL)
+   notes->options->offset_level = ANNOTATION__MIN_OFFSET_LEVEL;
+   continue;
case 'j':
notes->options->jump_arrows = 
!notes->

[PATCH 04/17] perf annotate: Allow showing offsets in more than just jump targets

2018-04-13 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

Jesper wanted to see offsets at callq sites when doing some performance
investigation related to retpolines, so save him some time by providing
a 'struct annotation_options' to control where offsets should appear:
just on jump targets? That + call instructions? All?

This puts in place the logic to show the offsets. Now we need to wire
this up in the TUI browser (next patch) and in the 'perf annotate --stdio2'
interface, where we need a more general mechanism to set up the
'annotation_options' struct from the command line.

Suggested-by: Jesper Dangaard Brouer 
Cc: Adrian Hunter 
Cc: Alexei Starovoitov 
Cc: Andi Kleen 
Cc: Daniel Borkmann 
Cc: David Ahern 
Cc: Jin Yao 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Martin Liška 
Cc: Namhyung Kim 
Cc: Ravi Bangoria 
Cc: Thomas Richter 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-m3jc9c3swobye9tj08gnh...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/annotate.c | 11 +--
 tools/perf/util/annotate.h |  9 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index fbad8dfbb186..5edc565d86c4 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -46,6 +46,7 @@
 struct annotation_options annotation__default_options = {
.use_offset = true,
.jump_arrows= true,
+   .offset_level   = ANNOTATION__OFFSET_JUMP_TARGETS,
 };
 
 const char *disassembler_style;
@@ -2512,7 +2513,8 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
if (!notes->options->use_offset) {
printed = scnprintf(bf, sizeof(bf), "%" PRIx64 ": ", addr);
} else {
-   if (al->jump_sources) {
+   if (al->jump_sources &&
+   notes->options->offset_level >= ANNOTATION__OFFSET_JUMP_TARGETS) {
if (notes->options->show_nr_jumps) {
int prev;
printed = scnprintf(bf, sizeof(bf), "%*d ",
@@ -2523,9 +2525,14 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
obj__printf(obj, bf);
obj__set_color(obj, prev);
}
-
+print_addr:
printed = scnprintf(bf, sizeof(bf), "%*" PRIx64 ": ",
notes->widths.target, addr);
+   } else if (ins__is_call(&disasm_line(al)->ins) &&
+  notes->options->offset_level >= ANNOTATION__OFFSET_CALL) {
+   goto print_addr;
+   } else if (notes->options->offset_level == ANNOTATION__MAX_OFFSET_LEVEL) {
+   goto print_addr;
} else {
printed = scnprintf(bf, sizeof(bf), "%-*s  ",
notes->widths.addr, " ");
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index db8d09bea07e..f28a9e43421d 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -70,8 +70,17 @@ struct annotation_options {
 show_nr_jumps,
 show_nr_samples,
 show_total_period;
+   u8   offset_level;
 };
 
+enum {
+   ANNOTATION__OFFSET_JUMP_TARGETS = 1,
+   ANNOTATION__OFFSET_CALL,
+   ANNOTATION__MAX_OFFSET_LEVEL,
+};
+
+#define ANNOTATION__MIN_OFFSET_LEVEL ANNOTATION__OFFSET_JUMP_TARGETS
+
 extern struct annotation_options annotation__default_options;
 
 struct annotation;
-- 
2.14.3
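
For readers who want the decision in the hunk above without the
surrounding printing code, here is a minimal sketch of the same three-way
test. The function show_offset() and its boolean parameters are
assumptions made for this example; the patch itself tests
al->jump_sources and ins__is_call() inline and uses a goto.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the enum added to util/annotate.h by this patch */
enum {
	ANNOTATION__OFFSET_JUMP_TARGETS = 1,
	ANNOTATION__OFFSET_CALL,
	ANNOTATION__MAX_OFFSET_LEVEL,
};

/*
 * Hypothetical predicate: should this line get its offset printed?
 *   level 1: only lines that are jump targets
 *   level 2: jump targets and call instructions
 *   level 3: every instruction
 */
static bool show_offset(bool is_jump_target, bool is_call, int level)
{
	assert(level >= ANNOTATION__OFFSET_JUMP_TARGETS &&
	       level <= ANNOTATION__MAX_OFFSET_LEVEL);

	if (is_jump_target && level >= ANNOTATION__OFFSET_JUMP_TARGETS)
		return true;
	if (is_call && level >= ANNOTATION__OFFSET_CALL)
		return true;

	return level == ANNOTATION__MAX_OFFSET_LEVEL;
}
```

Because the default is ANNOTATION__OFFSET_JUMP_TARGETS, output is
unchanged unless the level is raised.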

Re: [PATCH 3/3] dcache: account external names as indirectly reclaimable memory

2018-04-13 Thread Michal Hocko

On Fri 13-04-18 22:35:19, Minchan Kim wrote:
> On Mon, Mar 05, 2018 at 01:37:43PM +, Roman Gushchin wrote:
[...]
> > @@ -1614,9 +1623,11 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
> > name = &slash_name;
> > dname = dentry->d_iname;
> > } else if (name->len > DNAME_INLINE_LEN-1) {
> > -   size_t size = offsetof(struct external_name, name[1]);
> > -   struct external_name *p = kmalloc(size + name->len,
> > - GFP_KERNEL_ACCOUNT);
> > +   struct external_name *p;
> > +
> > +   reclaimable = offsetof(struct external_name, name[1]) +
> > +   name->len;
> > +   p = kmalloc(reclaimable, GFP_KERNEL_ACCOUNT);
> 
> Can't we use kmem_cache_alloc with own cache created with SLAB_RECLAIM_ACCOUNT
> if they are reclaimable? 

No, because names have different sizes and so we would basically have to
duplicate many caches.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH 00/30] kconfig: move compiler capability tests to Kconfig

2018-04-13 Thread Masahiro Yamada

2018-04-13 21:21 GMT+09:00 Masahiro Yamada :
> 2018-04-13 14:52 GMT+09:00 Kees Cook :
>> On Thu, Apr 12, 2018 at 10:06 PM, Masahiro Yamada
>>  wrote:
>>> [Major Changes in V3]
>>
>> Awesome work! I don't see this pushed to your git tree? I'd like to
>> test it, but I'd rather "git fetch" instead of "git am" :)
>>
>> -Kees
>>
>
> I pushed this series to the following branch.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git
> kconfig-shell-v3
>

If this approach is successful, we will move more and more
compiler option tests to the Kconfig stage in the future.

People (including me) might be worried about
how slow Kconfig will become.

First, I compared the before/after on my PC.

Without this series,

masahiro@grover:~/workspace/linux-kbuild$ time make -s defconfig

real 0m0.175s
user 0m0.128s
sys 0m0.008s

With this series,
masahiro@grover:~/workspace/linux-kbuild$ time make -s defconfig

real 0m0.729s
user 0m0.400s
sys 0m0.056s

This is a noticeable difference.

Then, I looked into per-commit analysis.

Here is the result of the real time of 'time make -s defconfig'

[30/30] kbuild: test dead code/data elimination...    0m0.719s
[29/30] arm64: move GCC version check for...          0m0.711s
[28/30] gcc-plugins: allow to enable GCC_PLUGINS...   0m0.722s
[27/30] gcc-plugins: test plugin support in...        0m0.719s   [+0.31]
[26/30] gcc-plugins: move GCC version check...        0m0.410s
[25/30] kcov: test compiler capability in...          0m0.392s
[24/30] gcov: remove CONFIG_GCOV_FORMAT_AUTODETECT    0m0.400s
[23/30] kconfig: add CC_IS_CLANG and CLANG_VERSION    0m0.396s
[22/30] kconfig: add CC_IS_GCC and GCC_VERSION        0m0.392s
[21/30] stack-protector: test compiler capability...  0m0.381s   [+0.04]
[20/30] kconfig: add basic helper macros to...        0m0.343s
[19/30] kconfig: show compiler version text...        0m0.345s
[18/30] kconfig: test: test text expansion...         0m0.342s
[17/30] Documentation: kconfig: document...           0m0.344s
[16/30] kconfig: add 'info' and 'warning'...          0m0.347s
[15/30] kconfig: expand lefthand side of...           0m0.340s
[14/30] kconfig: support append assignment...         0m0.342s
[13/30] kconfig: support simply expanded...           0m0.341s
[12/30] kconfig: support variable and...              0m0.344s
[11/30] kconfig: begin PARAM state only...            0m0.342s
[10/30] kconfig: replace $(UNAME_RELEASE)...          0m0.347s
[09/30] kconfig: add 'shell' built-in function        0m0.344s
[08/30] kconfig: add built-in function support        0m0.350s
[07/30] kconfig: remove sym_expand_string_value()     0m0.344s
[06/30] kconfig: remove string expansion...           0m0.349s
[05/30] kconfig: remove string expansion...           0m0.342s
[04/30] kconfig: reference environment...             0m0.342s
[03/30] kbuild: remove CONFIG_CROSS_COMPILE...        0m0.347s
[02/30] kbuild: remove kbuild cache                   0m0.347s   [+0.17]
[01/30] gcc-plugins: fix build condition...           0m0.171s
[00/30] Merge tag 'drm-fixes-for-v4.17-rc1'...        0m0.176s

There are three big jump points.

The first one is [02/30]  (+0.17)
We are removing the build cache, so this is what we expect.

The second one is  [21/30]  (+0.04)
For x86, Kconfig runs scripts/gcc-x86_{32,64}-has-stack-protector.sh

The biggest one is [27/30]  (+0.31)
scripts/gcc-plugins.sh is probably a very costly script.
If we bump the minimum GCC version to 4.8,
the script will be much cleaner in the future.

I was also interested in the cost of
a single $(cc-option ...) invocation.

It is pretty easy to measure this.

For example, repeat $(cc-option -fstack-protector)
on 1000 lines, as follows.

config FOO
   bool
   default $(cc-option -fstack-protector)
   default $(cc-option -fstack-protector)
   default $(cc-option -fstack-protector)
   default $(cc-option -fstack-protector)
   default $(cc-option -fstack-protector)
   default $(cc-option -fstack-protector)
   default $(cc-option -fstack-protector)
   default $(cc-option -fstack-protector)
   default $(cc-option -fstack-protector)
   default $(cc-option -fstack-protector)
 ...  [ repeated for 1000 lines ]

On my core i7 PC, it took 7.2 msec
to run $(cc-option -fstack-protector) 1000 times.

We can make it much faster.

Currently we use
   $(CC) -Werror $(1) -c -x c /dev/null
to test the compiler flag.

Ulf Magnusson suggested using -S instead of -c
(https://patchwork.kernel.org/patch/10309297/)
With -S, the compiler stops after the compilation stage.
It took only 4.0 msec
to run $(cc-option -fstack-protector) 1000 times.

If I use -E (preprocessing stage only), it becomes even faster.
It took only 2.6 msec.

As for $(cc-option ...), probably this will not be a problem.

For some features, we need special shell scripts,
some of which can be more costly.

-- 
Best Regards
Masahiro Yamada

[GIT PULL] dmi fixes for v4.17

2018-04-13 Thread Jean Delvare

Hi Linus,

Please pull dmi subsystem updates/fixes for Linux v4.17 from:

git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging.git dmi-for-linus

 drivers/firmware/dmi_scan.c | 16 
 include/linux/mod_devicetable.h |  1 +
 2 files changed, 13 insertions(+), 4 deletions(-)

---

Alex Hung (1):
  firmware: dmi_scan: Add DMI_OEM_STRING support to dmi_matches

Jean Delvare (2):
  firmware: dmi_scan: Fix UUID length safety check
  firmware: dmi_scan: Use lowercase letters for UUID

Thanks,
-- 
Jean Delvare
SUSE L3 Support

[PATCH v3 1/2] dmaengine: stm32-mdma: align TLEN and buffer length on burst

2018-04-13 Thread Pierre-Yves MORDRET

Both buffer Transfer Length (TLEN if any) and transfer size have to be
aligned on burst size (burst beats*bus width).

Signed-off-by: Pierre-Yves MORDRET 
---
  Version history:
v1:
   * Initial
v2:
v3:
   * Get rid of while loop in favor of computed values
---
---
 drivers/dma/stm32-mdma.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/stm32-mdma.c b/drivers/dma/stm32-mdma.c
index daa1602..4c7634c 100644
--- a/drivers/dma/stm32-mdma.c
+++ b/drivers/dma/stm32-mdma.c
@@ -410,13 +410,10 @@ static enum dma_slave_buswidth stm32_mdma_get_max_width(dma_addr_t addr,
 static u32 stm32_mdma_get_best_burst(u32 buf_len, u32 tlen, u32 max_burst,
 enum dma_slave_buswidth width)
 {
-   u32 best_burst = max_burst;
-   u32 burst_len = best_burst * width;
+   u32 best_burst;
 
-   while ((burst_len > 0) && (tlen % burst_len)) {
-   best_burst = best_burst >> 1;
-   burst_len = best_burst * width;
-   }
+   best_burst = min((u32)1 << __ffs(tlen | buf_len),
+max_burst * width) / width;
 
return (best_burst > 0) ? best_burst : 1;
 }
-- 
2.7.4
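
To make the arithmetic in the new stm32_mdma_get_best_burst() easier to
follow, here is a userspace sketch of the same computation. It assumes
GCC or clang, using __builtin_ctz() as a stand-in for the kernel's
__ffs(); the zero-length guard and the name best_burst() are additions
for this example and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the computed best burst.
 * buf_len, tlen: lengths in bytes; max_burst: beats; width: bytes per beat.
 */
static uint32_t best_burst(uint32_t buf_len, uint32_t tlen,
			   uint32_t max_burst, uint32_t width)
{
	uint32_t bits = tlen | buf_len;
	uint32_t align, limit, burst;

	assert(width > 0);

	/* Assumption for this sketch: __ffs(0) is undefined, so bail out */
	if (bits == 0)
		return 1;

	/* Largest power of two that divides both tlen and buf_len */
	align = (uint32_t)1 << __builtin_ctz(bits);

	/* Cap at the maximum burst length in bytes, then convert to beats */
	limit = max_burst * width;
	burst = (align < limit ? align : limit) / width;

	return burst > 0 ? burst : 1;
}
```

For instance, with buf_len = 64, tlen = 32, max_burst = 16 and width = 4,
the common alignment is 32 bytes, which is below the 64-byte cap, so the
result is 8 beats. The lowest set bit of (tlen | buf_len) gives the
alignment both lengths share, which is what the removed while loop only
checked for tlen.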

Re: [PATCH] KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap support

2018-04-13 Thread Paolo Bonzini

On 12/04/2018 17:25, Vitaly Kuznetsov wrote:
> @@ -5335,6 +5353,9 @@ static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bit
>   if (!cpu_has_vmx_msr_bitmap())
>   return;
>  
> + if (static_branch_unlikely(&enable_emsr_bitmap))
> + evmcs_touch_msr_bitmap();
> +
>   /*
>* See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
>* have the write-low and read-high bitmap offsets the wrong way round.
> @@ -5370,6 +5391,9 @@ static void __always_inline vmx_enable_intercept_for_msr(unsigned long *msr_bitm
>   if (!cpu_has_vmx_msr_bitmap())
>   return;
>  
> + if (static_branch_unlikely(&enable_emsr_bitmap))
> + evmcs_touch_msr_bitmap();

I'm not sure about the "unlikely".  Can you just check current_evmcs
instead (dropping the static key completely)?

The function, also, is small enough that inlining should be beneficial.

Paolo

[PATCH v3 2/2] dmaengine: stm32-mdma: Fix incomplete Hw descriptors allocator

2018-04-13 Thread Pierre-Yves MORDRET

Only one Hw descriptor is allocated. Loop over the required number of
Hw descriptors so that each one is properly allocated.
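
The allocation pattern used here (a flexible array sized with offsetof(),
one allocation per node, and unwinding on partial failure) can be
sketched in userspace as follows. All names in the sketch are
hypothetical stand-ins: node_alloc()/node_free() replace
dma_pool_alloc()/dma_pool_free(), and allocs_left and live_nodes exist
only so the failure path can be exercised. Using offsetof() with a
runtime array index is accepted by GCC and clang; a strictly portable
variant would add count * sizeof(node[0]) to the header size instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct hwdesc { unsigned int regs[10]; };	/* stand-in for the Hw descriptor */

struct desc_node {
	struct hwdesc *hwdesc;
};

struct desc {
	unsigned int count;
	struct desc_node node[];		/* one entry per Hw descriptor */
};

static int allocs_left = -1;	/* test hook: < 0 means "never fail" */
static int live_nodes;		/* test hook: nodes currently allocated */

static struct hwdesc *node_alloc(void)
{
	struct hwdesc *p;

	if (allocs_left == 0)
		return NULL;
	if (allocs_left > 0)
		allocs_left--;

	p = calloc(1, sizeof(*p));
	if (p)
		live_nodes++;
	return p;
}

static void node_free(struct hwdesc *p)
{
	assert(p != NULL);
	live_nodes--;
	free(p);
}

static struct desc *desc_alloc(size_t count)
{
	struct desc *d;
	int i;

	/* Header plus 'count' trailing nodes in a single allocation */
	d = calloc(1, offsetof(struct desc, node[count]));
	if (!d)
		return NULL;

	for (i = 0; i < (int)count; i++) {
		d->node[i].hwdesc = node_alloc();
		if (!d->node[i].hwdesc)
			goto err;
	}

	d->count = (unsigned int)count;
	return d;

err:
	/* Release only the nodes that were successfully allocated */
	while (--i >= 0)
		node_free(d->node[i].hwdesc);
	free(d);
	return NULL;
}

static void desc_free(struct desc *d)
{
	unsigned int i;

	for (i = 0; i < d->count; i++)
		node_free(d->node[i].hwdesc);
	free(d);
}
```

Setting count only after every node has been allocated keeps the free
path simple: it never has to deal with a half-populated array, because
the error path has already unwound it.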

Signed-off-by: Pierre-Yves MORDRET 
---
  Version history:
v1:
   * Initial
v2:
   * Fix kbuild warning format: /0x%08x/%pad/
v3:
   * use of "offsetof" instead of explicit calculation
---
---
 drivers/dma/stm32-mdma.c | 89 ++--
 1 file changed, 55 insertions(+), 34 deletions(-)

diff --git a/drivers/dma/stm32-mdma.c b/drivers/dma/stm32-mdma.c
index 4c7634c..1ac775f 100644
--- a/drivers/dma/stm32-mdma.c
+++ b/drivers/dma/stm32-mdma.c
@@ -252,13 +252,17 @@ struct stm32_mdma_hwdesc {
u32 cmdr;
 } __aligned(64);
 
+struct stm32_mdma_desc_node {
+   struct stm32_mdma_hwdesc *hwdesc;
+   dma_addr_t hwdesc_phys;
+};
+
 struct stm32_mdma_desc {
struct virt_dma_desc vdesc;
u32 ccr;
-   struct stm32_mdma_hwdesc *hwdesc;
-   dma_addr_t hwdesc_phys;
bool cyclic;
u32 count;
+   struct stm32_mdma_desc_node node[];
 };
 
 struct stm32_mdma_chan {
@@ -344,30 +348,42 @@ static struct stm32_mdma_desc *stm32_mdma_alloc_desc(
struct stm32_mdma_chan *chan, u32 count)
 {
struct stm32_mdma_desc *desc;
+   int i;
 
-   desc = kzalloc(sizeof(*desc), GFP_NOWAIT);
+   desc = kzalloc(offsetof(typeof(*desc), node[count]), GFP_NOWAIT);
if (!desc)
return NULL;
 
-   desc->hwdesc = dma_pool_alloc(chan->desc_pool, GFP_NOWAIT,
- &desc->hwdesc_phys);
-   if (!desc->hwdesc) {
-   dev_err(chan2dev(chan), "Failed to allocate descriptor\n");
-   kfree(desc);
-   return NULL;
+   for (i = 0; i < count; i++) {
+   desc->node[i].hwdesc =
+   dma_pool_alloc(chan->desc_pool, GFP_NOWAIT,
+  &desc->node[i].hwdesc_phys);
+   if (!desc->node[i].hwdesc)
+   goto err;
}
 
desc->count = count;
 
return desc;
+
+err:
+   dev_err(chan2dev(chan), "Failed to allocate descriptor\n");
+   while (--i >= 0)
+   dma_pool_free(chan->desc_pool, desc->node[i].hwdesc,
+ desc->node[i].hwdesc_phys);
+   kfree(desc);
+   return NULL;
 }
 
 static void stm32_mdma_desc_free(struct virt_dma_desc *vdesc)
 {
struct stm32_mdma_desc *desc = to_stm32_mdma_desc(vdesc);
struct stm32_mdma_chan *chan = to_stm32_mdma_chan(vdesc->tx.chan);
+   int i;
 
-   dma_pool_free(chan->desc_pool, desc->hwdesc, desc->hwdesc_phys);
+   for (i = 0; i < desc->count; i++)
+   dma_pool_free(chan->desc_pool, desc->node[i].hwdesc,
+ desc->node[i].hwdesc_phys);
kfree(desc);
 }
 
@@ -666,18 +682,18 @@ static int stm32_mdma_set_xfer_param(struct stm32_mdma_chan *chan,
 }
 
 static void stm32_mdma_dump_hwdesc(struct stm32_mdma_chan *chan,
-  struct stm32_mdma_hwdesc *hwdesc)
+  struct stm32_mdma_desc_node *node)
 {
-   dev_dbg(chan2dev(chan), "hwdesc:  0x%p\n", hwdesc);
-   dev_dbg(chan2dev(chan), "CTCR:0x%08x\n", hwdesc->ctcr);
-   dev_dbg(chan2dev(chan), "CBNDTR:  0x%08x\n", hwdesc->cbndtr);
-   dev_dbg(chan2dev(chan), "CSAR:0x%08x\n", hwdesc->csar);
-   dev_dbg(chan2dev(chan), "CDAR:0x%08x\n", hwdesc->cdar);
-   dev_dbg(chan2dev(chan), "CBRUR:   0x%08x\n", hwdesc->cbrur);
-   dev_dbg(chan2dev(chan), "CLAR:0x%08x\n", hwdesc->clar);
-   dev_dbg(chan2dev(chan), "CTBR:0x%08x\n", hwdesc->ctbr);
-   dev_dbg(chan2dev(chan), "CMAR:0x%08x\n", hwdesc->cmar);
-   dev_dbg(chan2dev(chan), "CMDR:0x%08x\n\n", hwdesc->cmdr);
+   dev_dbg(chan2dev(chan), "hwdesc:  %pad\n", &node->hwdesc_phys);
+   dev_dbg(chan2dev(chan), "CTCR:0x%08x\n", node->hwdesc->ctcr);
+   dev_dbg(chan2dev(chan), "CBNDTR:  0x%08x\n", node->hwdesc->cbndtr);
+   dev_dbg(chan2dev(chan), "CSAR:0x%08x\n", node->hwdesc->csar);
+   dev_dbg(chan2dev(chan), "CDAR:0x%08x\n", node->hwdesc->cdar);
+   dev_dbg(chan2dev(chan), "CBRUR:   0x%08x\n", node->hwdesc->cbrur);
+   dev_dbg(chan2dev(chan), "CLAR:0x%08x\n", node->hwdesc->clar);
+   dev_dbg(chan2dev(chan), "CTBR:0x%08x\n", node->hwdesc->ctbr);
+   dev_dbg(chan2dev(chan), "CMAR:0x%08x\n", node->hwdesc->cmar);
+   dev_dbg(chan2dev(chan), "CMDR:0x%08x\n\n", node->hwdesc->cmdr);
 }
 
 static void stm32_mdma_setup_hwdesc(struct stm32_mdma_chan *chan,
@@ -691,7 +707,7 @@ static void stm32_mdma_setup_hwdesc(struct stm32_mdma_chan *chan,
struct stm32_mdma_hwdesc *hwdesc;
u32 next = count + 1;
 
-   hwdesc = &desc->hwdesc[count];
+   hwdesc = desc->node[count].hwdesc;
hwdesc->ctcr = ctcr;
hwdesc->cbndtr &= ~(STM32_MDMA_CBNDTR_BRC_MK |
STM32_MDMA_C

Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-04-13 Thread gengdongjiu

James,
   Thanks for this mail.

On 2018/4/13 0:14, James Morse wrote:
> Hi gengdongjiu,
> 
> On 12/04/18 06:00, gengdongjiu wrote:
>> 2018-02-16 1:55 GMT+08:00 James Morse :
>>> On 05/02/18 11:24, gengdongjiu wrote:
> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO, TGE}?

 Yes, it is.
>>>
>>> ... and yet ...
>>>
>>>
> What does your firmware do when it wants to emulate SError but its masked?
> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
> PSTATE.A  set.
>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
> emulated  SError should go to EL1. This effectively masks SError.)

 Currently we does not consider much about the mask status(SPSR).
>>>
>>> .. this is a problem.
>>>
>>> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
>>> interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't eret to
>>> EL2. This should never happen, SError is effectively masked if you are running
>>> at an EL higher than the one its routed to.
>>>
>>> More obviously: if the exception came from the EL that SError should be routed
>>> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only
> 
>> James, I  summarized the masking and routing rules for SError to
>> confirm with you for the firmware first solution,
> 
> You also said "Currently we does not consider much about the mask 
> status(SPSR)."
Yes, we currently do not take it into account much. After your clarification, we
want to modify the EL3 firmware to follow this rule.

> 
> 
>> 1. If the HCR_EL2.{AMO,TGE} is set,
> 
> If one or the other of these bits is set: (AMO==1 || TGE==1)
> 
>> which means the SError should route to EL2,
>> When system happens SError and trap to EL3,   If EL3 find
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
>> and find this SError come from EL2, it will not deliver an SError:
>> store the RAS error in the BERT and 'reboot'; but if
>> it find that this SError come from EL1 or EL0, it also need to deliver
>> an SError, right?
> 
> Yes.
> 
> 
>> 2. If the HCR_EL2.{AMO,TGE} is not set,
> 
> If neither of these bits is set: (AMO==0 && TGE == 0)
> 
>> which means the SError should route to EL1,
>> When system happens SError and trap to EL3, If EL3 find
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both not set,
> 
> (I'm reading this as all three of these bits are clear)
Sorry, that was a typo.
It should read: HCR_EL2.AMO and HCR_EL2.TGE are both clear, but SPSR_EL3.A is set.

> 
>> and find this SError come from EL1, it will not deliver an SError:
>> store the RAS error in the BERT and 'reboot'; 
> 
> No, (AMO==0 && TGE == 0) means SError is routed to EL1, this exception
> interrupted EL1 and the A bit was clear, so EL1 can take an SError.

Agree.

> 
> The two cases here are:
> AMO==0,TGE==0 means SError should be routed to EL1. If SPSR_EL3 says the
> exception interrupted EL1 and the A bit was set, you need to do the BERT trick.

> 
> If SPSR_EL3 says the exception interrupted EL2, you need to do the BERT trick
The "BERT trick" is storing the RAS error in the BERT and 'reboot', right?

> regardless of the A bit, as SError is implicitly masked by running at a higher
> exception level than it was routed to.


> 
> 
>>From your v11 reply:
>> 2. The exception came from the EL that SError should not be routed
>> to(according to hcr_EL2.{AMO, TGE}),even though the PSTATE.A was set,EL3
>> firmware still deliver SError
> 
> (this is re-iterating the two-cases above:)
> 'not be routed to' is one of two things: Route-to-EL2+interruted-EL1, or
> Route-to-EL1+interrupted-EL2.
> 
> Route-to-EL2+interrupted-EL1 is fine, regardless of SPSR_EL3.A the emulated
> SError can be delivered to EL2, as EL2 can't mask SError when executing at a
> lower EL.
Agree.

> 
> Route-to-EL1+interrupted-EL2 is the problem. SError is implicitly masked by
> running at a higher EL. Regardless of SPSR_EL3.A, the emulated SError can not be
> delivered.
"Can not be delivered" means storing the RAS error in the BERT and 'reboot', right?
In Table D1-15 in "D1.14.2 Asynchronous exception masking", this case is marked "C".
"C" means SError is not taken regardless of the value of the Process state interrupt mask.
In this case, would it be unsafe for the BIOS to reboot directly?
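
The routing and masking cases discussed so far in this thread can be summarised as a small decision function. This is only a sketch under stated assumptions: `serror_decide`, the `enum el` values and the action names are hypothetical and are not taken from any firmware source; EL0 is left out because the thread has not settled that case, and the ARM ARM table remains the authority.

```c
#include <assert.h>
#include <stdbool.h>

/* Exception levels discussed in the thread; EL0 is deliberately left out. */
enum el { EL1 = 1, EL2 = 2 };

enum serror_action {
	SERROR_DELIVER,		/* emulate the SError at the target EL */
	SERROR_BERT_REBOOT,	/* record the error in the BERT and reboot */
};

/* Hypothetical helper: what EL3 firmware does with a trapped SError. */
static enum serror_action serror_decide(bool amo, bool tge,
					enum el interrupted, bool pstate_a)
{
	/* HCR_EL2.AMO or HCR_EL2.TGE set routes SError to EL2, else to EL1. */
	enum el target = (amo || tge) ? EL2 : EL1;

	assert(interrupted == EL1 || interrupted == EL2);

	/* Running above the target EL implicitly masks SError. */
	if (interrupted > target)
		return SERROR_BERT_REBOOT;

	/* A lower EL cannot mask an SError that is routed to a higher EL. */
	if (interrupted < target)
		return SERROR_DELIVER;

	/* Same EL: the saved PSTATE.A bit (SPSR_EL3.A) decides. */
	return pstate_a ? SERROR_BERT_REBOOT : SERROR_DELIVER;
}
```

For example, with AMO and TGE both clear and the exception taken from EL2, the function returns `SERROR_BERT_REBOOT` whatever the A bit says, which is the Route-to-EL1+interrupted-EL2 case above.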


> KVM does this on the way out of a guest, if an SError occurs during this time
> the CPU will wait until execution returns to EL1 before delivering the SError.
> Your firmware has to do the same.
> 
> Table D1-15 in "D1.14.2 Asynchronous exception masking" has a table with all the
> combinations. The ARM-ARM is what we need to match with this behaviour.
> 
> 
>> but if it find that this SError come from EL0, it also need to deliver an
>> SError, right?
> 
> I thought interrupted-EL0 could always be delivered: but re-reading the
> ARM-ARM's "D1.14.2 Asynchronous exception masking", if asynchronous excepti

[PATCH v3 0/2] Append some fixes and improvements

2018-04-13 Thread Pierre-Yves MORDRET

Fix an issue with FIFO size and burst size.
Fix an incomplete allocator for hardware descriptors: memory was incorrectly
allocated.
---
  Version history:
v1:
   * Initial
v2:
   * Fix kbuild warning format: /0x%08x/%pad/
v3:
   * Get rid of while loop in favor of computed values
   * use of "offsetof" instead of explicit calculation
---

Pierre-Yves MORDRET (2):
  dmaengine: stm32-mdma: align TLEN and buffer length on burst
  dmaengine: stm32-mdma: Fix incomplete Hw descriptors allocator

 drivers/dma/stm32-mdma.c | 98 
 1 file changed, 58 insertions(+), 40 deletions(-)

-- 
2.7.4
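
As an illustration of the v3 change to "offsetof" sizing, here is a minimal user-space sketch of allocating a descriptor with a flexible array of nodes and unwinding on failure. The struct names, `calloc`/`free` in place of `kzalloc`/`dma_pool_alloc`, and the helper names are assumptions made for the example; this is not the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver types; only the shape matters. */
struct hw_desc { unsigned int ctcr, cbndtr; };

struct desc_node {
	struct hw_desc *hwdesc;
	unsigned long hwdesc_phys;
};

struct desc {
	unsigned int count;
	struct desc_node node[];	/* flexible array member */
};

/* Bytes needed for the header plus `count` nodes. */
static size_t desc_size(unsigned int count)
{
	assert(count <= (SIZE_MAX - offsetof(struct desc, node)) /
			sizeof(struct desc_node));
	return offsetof(struct desc, node) +
	       (size_t)count * sizeof(struct desc_node);
}

static struct desc *desc_alloc(unsigned int count)
{
	struct desc *d = calloc(1, desc_size(count));
	unsigned int i;

	if (!d)
		return NULL;

	for (i = 0; i < count; i++) {
		d->node[i].hwdesc = calloc(1, sizeof(struct hw_desc));
		if (!d->node[i].hwdesc)
			goto err;
	}
	d->count = count;
	return d;

err:
	/* Release only the nodes that were successfully allocated. */
	while (i-- > 0)
		free(d->node[i].hwdesc);
	free(d);
	return NULL;
}

static void desc_free(struct desc *d)
{
	unsigned int i;

	for (i = 0; i < d->count; i++)
		free(d->node[i].hwdesc);
	free(d);
}
```

The portable `offsetof(struct desc, node) + count * sizeof(...)` form is used here; with GCC or Clang it yields the same value as `offsetof(typeof(*desc), node[count])`.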

[PATCH] tools build: Use -Xpreprocessor instead of -Wp and leave pathnames intact

2018-04-13 Thread Will Deacon

Build.include invokes the pre-processor via GCC in order to generate a
dependency list for the input file. Since these options are passed using
'-Wp,-M...,$(depfile)' it is important that $(depfile) does not contain
any commas, so these are substituted with underscores. This substitution
will break the build if the directory name of the output directory happens
to include a comma, e.g. when using "aiaiai" for bisection testing:

  | cc1: fatal error: x86/tools/objtool/fixdep.o: No such file or directory
  | compilation terminated.
  | cat: /tmp/aiaiai-test-patchset.qroS/before/obj.defconfig_x86/tools/objtool/.fixdep.o.d: No such file or directory
  | make[5]: *** [tools/objtool/fixdep.o] Error 1

We can address this by using -Xpreprocessor instead of -Wp, which allows
us to pass down an unmodified pathname.

Cc: Jiri Olsa 
Cc: Dave Martin 
Cc: Arnaldo Carvalho de Melo 
Cc: Ingo Molnar 
Signed-off-by: Will Deacon 
---

As an aside, the way we currently pass the depfile to -MD appears to be
in direct contradiction with the preprocessor documentation, although it
does work with the cc1 implementation.

 tools/build/Build.include | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/tools/build/Build.include b/tools/build/Build.include
index 418871d02ebf..e1914f8e2328 100644
--- a/tools/build/Build.include
+++ b/tools/build/Build.include
@@ -22,9 +22,7 @@ dot-target = $(dir $@).$(notdir $@)
 basetarget = $(basename $(notdir $@))
 
 ###
-# The temporary file to save gcc -MD generated dependencies must not
-# contain a comma
-depfile = $(subst $(comma),_,$(dot-target).d)
+depfile = $(dot-target).d
 
 ###
 # Check if both arguments has same arguments. Result is empty string if equal.
@@ -89,12 +87,12 @@ if_changed = $(if $(strip $(any-prereq) $(arg-check)),   \
 # - per target C flags
 # - per object C flags
 # - BUILD_STR macro to allow '-D"$(variable)"' constructs
-c_flags_1 = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CFLAGS) -D"BUILD_STR(s)=\#s" $(CFLAGS_$(basetarget).o) $(CFLAGS_$(obj))
+c_flags_1 = -Xpreprocessor -MD -Xpreprocessor $(depfile) -Xpreprocessor -MT -Xpreprocessor $@ $(CFLAGS) -D"BUILD_STR(s)=\#s" $(CFLAGS_$(basetarget).o) $(CFLAGS_$(obj))
 c_flags_2 = $(filter-out $(CFLAGS_REMOVE_$(basetarget).o), $(c_flags_1))
 c_flags   = $(filter-out $(CFLAGS_REMOVE_$(obj)), $(c_flags_2))
-cxx_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CXXFLAGS) -D"BUILD_STR(s)=\#s" $(CXXFLAGS_$(basetarget).o) $(CXXFLAGS_$(obj))
+cxx_flags = -Xpreprocessor -MD -Xpreprocessor $(depfile) -Xpreprocessor -MT -Xpreprocessor $@ $(CXXFLAGS) -D"BUILD_STR(s)=\#s" $(CXXFLAGS_$(basetarget).o) $(CXXFLAGS_$(obj))
 
 ###
 ## HOSTCC C flags
 
-host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CHOSTFLAGS) -D"BUILD_STR(s)=\#s" $(CHOSTFLAGS_$(basetarget).o) $(CHOSTFLAGS_$(obj))
+host_c_flags = -Xpreprocessor -MD -Xpreprocessor $(depfile) -Xpreprocessor -MT -Xpreprocessor $@ $(CHOSTFLAGS) -D"BUILD_STR(s)=\#s" $(CHOSTFLAGS_$(basetarget).o) $(CHOSTFLAGS_$(obj))
-- 
2.1.4

Re: [PATCH ipmi/kcs_bmc v1] ipmi: kcs_bmc: optimize the data buffers allocation

2018-04-13 Thread Corey Minyard


On 04/07/2018 02:54 AM, Wang, Haiyue wrote:

Hi Corey,

Since IPMI 2.0 just defined minimum, no maximum:



KCS/SMIC Input : Required: 40 bytes IPMI Message, minimum

KCS/SMIC Output : Required: 38 bytes IPMI Message, minimum



Yes, though there are practical maximums that are much smaller than 1000 bytes.






We can enlarge the block size to avoid waste, and make our driver
support the worst-case message size. I also think this patch simplifies
the checking (from 3 checks to 1) and makes the code cleaner; this is the
biggest reason I want the change. The TLB point is just memory-management
theory from a book; I have no data to support an access improvement. :)


I would argue that the way it is now expresses the intent of the code better
than one allocation split into three parts.  Expressing your intent is more
important than the number of checks and a minuscule performance
improvement.  For me it makes the code easier to understand.  If you had
a tool that checked for out-of-bounds memory access, then a single allocation
might not find an overrun between the parts.  Smaller allocations tend
to result in less memory fragmentation.

My preference is to leave it as it is.  However, it's not that important, and
if you really want this patch, I can include it.

Thanks,

-corey



BR,

Haiyue


On 2018-04-07 10:37, Wang, Haiyue wrote:



On 2018-04-07 05:47, Corey Minyard wrote:

On 03/15/2018 07:20 AM, Haiyue Wang wrote:

Allocate a continuous memory block for the three KCS data buffers with
related index assignment.


I'm finally getting to this.

Is there a reason you want to do this?  In general, it's better to not try to
outsmart your base system.  Depending on the memory allocator, in this
case, you might actually use more memory.  You probably won't use any
less.

I got this idea from another code review, but that patch allocates more than 30
memory blocks of the same size, so reducing the number of devm_kmalloc calls is
better there.

KCS only has 3, so maybe the key point is the memory waste.

In the original case, you allocate three 1000 byte buffers, resulting in
three 1024 byte slab allocations.

In the changed case, you will allocate a 3000 byte buffer, resulting in
a single 4096 byte slab allocation, wasting 1024 more bytes of memory.
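
The arithmetic above can be checked with a short sketch. It assumes the allocator rounds every request up to the next power-of-two size class and ignores any per-allocation bookkeeping (for example devres headers); the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a request up to the next power-of-two size class (assumption). */
static size_t size_class(size_t n)
{
	size_t c = 1;

	assert(n <= SIZE_MAX / 2 + 1);	/* keeps the shift from overflowing */
	while (c < n)
		c <<= 1;
	return c;
}

/* Memory consumed by nbufs separate allocations of bufsize bytes. */
static size_t split_footprint(size_t bufsize, size_t nbufs)
{
	return nbufs * size_class(bufsize);
}

/* Memory consumed by one allocation that covers all nbufs buffers. */
static size_t merged_footprint(size_t bufsize, size_t nbufs)
{
	return size_class(bufsize * nbufs);
}
```

With `bufsize` 1000 and `nbufs` 3, the split layout costs 3 * 1024 bytes and the merged layout costs 4096 bytes, so the merged allocation is the larger one by 1024 bytes.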


As the kcs driver copies memory between the in/out/kbuffer buffers, would
putting them in the same page be better, e.g. covered by the same TLB entry?
(Well, I just got this from a book, with no real experience of memory access
performance. I was also told to trade space for time. :-))

Just my stupid thinking. I'm OK to drop this patch if it doesn't help with
performance, or something else.

BR.
Haiyue


-corey


Signed-off-by: Haiyue Wang 
---
  drivers/char/ipmi/kcs_bmc.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/char/ipmi/kcs_bmc.c b/drivers/char/ipmi/kcs_bmc.c
index fbfc05e..dc19c0d 100644
--- a/drivers/char/ipmi/kcs_bmc.c
+++ b/drivers/char/ipmi/kcs_bmc.c
@@ -435,6 +435,7 @@ static const struct file_operations kcs_bmc_fops = {
  struct kcs_bmc *kcs_bmc_alloc(struct device *dev, int sizeof_priv, u32 channel)
  {
  struct kcs_bmc *kcs_bmc;
+    void *buf;
    kcs_bmc = devm_kzalloc(dev, sizeof(*kcs_bmc) + sizeof_priv, 
GFP_KERNEL);

  if (!kcs_bmc)
@@ -448,11 +449,12 @@ struct kcs_bmc *kcs_bmc_alloc(struct device *dev, int sizeof_priv, u32 channel)
  mutex_init(&kcs_bmc->mutex);
  init_waitqueue_head(&kcs_bmc->queue);
  -    kcs_bmc->data_in = devm_kmalloc(dev, KCS_MSG_BUFSIZ, 
GFP_KERNEL);
-    kcs_bmc->data_out = devm_kmalloc(dev, KCS_MSG_BUFSIZ, 
GFP_KERNEL);

-    kcs_bmc->kbuffer = devm_kmalloc(dev, KCS_MSG_BUFSIZ, GFP_KERNEL);
-    if (!kcs_bmc->data_in || !kcs_bmc->data_out || !kcs_bmc->kbuffer)
+    buf = devm_kmalloc_array(dev, 3, KCS_MSG_BUFSIZ, GFP_KERNEL);
+    if (!buf)
  return NULL;
+    kcs_bmc->data_in  = buf;
+    kcs_bmc->data_out = buf + KCS_MSG_BUFSIZ;
+    kcs_bmc->kbuffer  = buf + KCS_MSG_BUFSIZ * 2;
    kcs_bmc->miscdev.minor = MISC_DYNAMIC_MINOR;
  kcs_bmc->miscdev.name = dev_name(dev);

[PATCH] Move handling of the MIDR Variant and Revision bits into the mapfile.csv file

2018-04-13 Thread William Cohen

The arm64 code identification code was filtering out the Variant and
Revision bits when it initially read the MIDR value.  It is better to
do the filtering of Variant and Revision bits in the regular
expressions in mapfile.csv.  If some performance events do not
function for particular versions of silicon, special case maps can be
added to mapfile.csv before the general case to handle them.

Signed-off-by: William Cohen 
---
 tools/perf/arch/arm64/util/header.c  |  7 ---
 tools/perf/pmu-events/arch/arm64/mapfile.csv | 12 +++-
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/tools/perf/arch/arm64/util/header.c b/tools/perf/arch/arm64/util/header.c
index 534cd2507d83..05d1439c2cff 100644
--- a/tools/perf/arch/arm64/util/header.c
+++ b/tools/perf/arch/arm64/util/header.c
@@ -5,9 +5,6 @@
 
 #define MIDR "/regs/identification/midr_el1"
 #define MIDR_SIZE 19
-#define MIDR_REVISION_MASK  0xf
-#define MIDR_VARIANT_SHIFT  20
-#define MIDR_VARIANT_MASK   (0xf << MIDR_VARIANT_SHIFT)
 
 char *get_cpuid_str(struct perf_pmu *pmu)
 {
@@ -44,11 +41,7 @@ char *get_cpuid_str(struct perf_pmu *pmu)
}
fclose(file);
 
-   /* Ignore/clear Variant[23:20] and
-* Revision[3:0] of MIDR
-*/
midr = strtoul(buf, NULL, 16);
-   midr &= (~(MIDR_VARIANT_MASK | MIDR_REVISION_MASK));
scnprintf(buf, MIDR_SIZE, "0x%016lx", midr);
/* got midr break loop */
break;
diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv
index f03e26ecb658..23372a335f97 100644
--- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
+++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
@@ -3,7 +3,9 @@
 #
 # where
 #  MIDR    Processor version
-#  Variant[23:20] and Revision [3:0] should be zero.
+#  Variant[23:20] and Revision [3:0] bits should be matched
+#  with regular expression hex digits ([[:xdigit:]])
+#  unless particular variants or revisions need special handling.
 #  Version could be used to track version of of JSON file
 #  but currently unused.
 #  JSON/file/pathname is the path to JSON file, relative
@@ -12,7 +14,7 @@
 #
 #
 #Family-model,Version,Filename,EventType
-0x410fd03[[:xdigit:]],v1,arm/cortex-a53,core
-0x420f5160,v1,cavium/thunderx2,core
-0x430f0af0,v1,cavium/thunderx2,core
-0x480fd010,v1,hisilicon/hip08,core
+0x41[[:xdigit:]]fd03[[:xdigit:]],v1,arm/cortex-a53,core
+0x42[[:xdigit:]]f516[[:xdigit:]],v1,cavium/thunderx2,core
+0x43[[:xdigit:]]f0af[[:xdigit:]],v1,cavium/thunderx2,core
+0x48[[:xdigit:]]fd01[[:xdigit:]],v1,hisilicon/hip08,core
-- 
2.14.3
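
To illustrate how a map entry with `[[:xdigit:]]` wildcards can stand in for the removed masking, here is a small sketch using POSIX extended regular expressions. It is not the perf implementation: `midr_to_cpuid` and `cpuid_matches` are hypothetical helpers, and the MIDR is printed as 8 hex digits purely so that it lines up with the entries shown in this patch.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Format a 32-bit MIDR value in the style of the map entries above. */
static void midr_to_cpuid(unsigned long midr, char *buf, size_t len)
{
	snprintf(buf, len, "0x%08lx", midr);
}

/* Return 1 if the whole cpuid string matches the ERE pattern, else 0. */
static int cpuid_matches(const char *pattern, const char *cpuid)
{
	regex_t re;
	regmatch_t m;
	int ok;

	assert(pattern && cpuid);

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return 0;

	/* Accept only a match that spans the entire string. */
	ok = regexec(&re, cpuid, 1, &m, 0) == 0 &&
	     m.rm_so == 0 && (size_t)m.rm_eo == strlen(cpuid);

	regfree(&re);
	return ok;
}
```

With this approach an entry such as `0x41[[:xdigit:]]fd03[[:xdigit:]]` accepts every Variant and Revision of that part, while a more specific entry placed earlier in the file could single out one revision.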

Re: [PATCH RFC 2/8] mm: introduce PG_offline

2018-04-13 Thread David Hildenbrand

On 13.04.2018 15:40, Michal Hocko wrote:
> On Fri 13-04-18 15:16:26, David Hildenbrand wrote:
>> online_pages()/offline_pages() theoretically allows us to work on
>> sub-section sizes. This is especially relevant in the context of
>> virtualization. It e.g. allows us to add/remove memory to Linux in a VM in
>> 4MB chunks.
> 
> Well, theoretically possible but this would require a lot of auditing
> because the hotplug and per section assumption is quite a spread one.

Indeed. But besides changing section sizes / size of memory blocks this
seems to be the only way to do it. (btw, I think Windows allows to add
1MB chunks - e.g. 1MB DIMMs)

But as these pages "belong to nobody" nobody (besides kdump) should dare
to access the content, although the section is online.

> 
>> While the whole section is marked as online/offline, we have to know
>> the state of each page. E.g. to not read memory that is not online
>> during kexec() or to properly mark a section as offline as soon as all
>> contained pages are offline.
> 
> But you cannot use a page flag for that, I am afraid. Page flags are
> extremely scarce resource. I haven't looked at the rest of the series
> but _if_ we have a bit spare which I am not really sure about then you
> should prove there are no other ways around this.

Open for suggestions. We could remember per segment/memory block which
parts are online/offline and use that to decide if a section can go offline.

However: kdump will also have to (easily) know which pages are offline,
so it can skip reading them. (see the other patch)

>  
>> Signed-off-by: David Hildenbrand 

-- 

Thanks,

David / dhildenb

Re: [PATCH 2/6] tracing: Add trace event error log

2018-04-13 Thread Steven Rostedt

On Thu, 12 Apr 2018 18:52:13 -0500
Tom Zanussi  wrote:

> Hi Steve,
> 
> On Thu, 2018-04-12 at 18:20 -0400, Steven Rostedt wrote:
> > On Thu, 12 Apr 2018 10:13:17 -0500
> > Tom Zanussi  wrote:
> >   
> > > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > > index 6fb46a0..f2dc7e6 100644
> > > --- a/kernel/trace/trace.h
> > > +++ b/kernel/trace/trace.h
> > > @@ -1765,6 +1765,9 @@ extern ssize_t trace_parse_run_command(struct file 
> > > *file,
> > >   const char __user *buffer, size_t count, loff_t *ppos,
> > >   int (*createfn)(int, char**));
> > >  
> > > +extern void event_log_err(const char *loc, const char *cmd, const char 
> > > *fmt,
> > > +   ...);
> > > +
> > >  /*
> > >   * Normal trace_printk() and friends allocates special buffers
> > >   * to do the manipulation, as well as saves the print formats
> > > diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> > > index 05c7172..fd02e22 100644
> > > --- a/kernel/trace/trace_events.c
> > > +++ b/kernel/trace/trace_events.c
> > > @@ -1668,6 +1668,164 @@ static void ignore_task_cpu(void *data)
> > >   return ret;
> > >  }
> > >  
> > > +#define EVENT_LOG_ERRS_MAX   (PAGE_SIZE / sizeof(struct event_log_err))
> >   
> > > +#define EVENT_ERR_LOG_MASK   (EVENT_LOG_ERRS_MAX - 1)  
> > 
> > BTW, the above only works if EVENT_LOG_ERRS_MAX is a power of two,
> > which it's not guaranteed to be.
> >   
> 
> My assumption was that we'd only ever need a page or two for the
> error_log and so would always would be a power of two, since the size of
> the struct event_log_err is 512.

Assumptions are not what we want to rely on. There should be something
like:

BUILD_BUG_ON(EVENT_LOG_ERRS_MAX & EVENT_ERR_LOG_MASK);

Which would guarantee that your assumption is correct; otherwise the
kernel won't build.
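
A user-space sketch of the same idea, using C11 `static_assert` in place of BUILD_BUG_ON. The `DEMO_` names, the 4096-byte page and the 256-byte strings are assumptions chosen to mirror the numbers mentioned in this thread; they are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE	4096u	/* assumed page size */
#define DEMO_STR_VAL	256u	/* assumed MAX_FILTER_STR_VAL */

struct demo_log_err {
	char err[DEMO_STR_VAL];
	char cmd[DEMO_STR_VAL];
};

#define DEMO_ERRS_MAX	(DEMO_PAGE_SIZE / sizeof(struct demo_log_err))
#define DEMO_ERR_MASK	(DEMO_ERRS_MAX - 1)

/* The build fails here unless DEMO_ERRS_MAX is a power of two. */
static_assert((DEMO_ERRS_MAX & DEMO_ERR_MASK) == 0,
	      "DEMO_ERRS_MAX must be a power of two for the mask to work");

/* Advance a ring index; only valid because of the assertion above. */
static size_t demo_next_slot(size_t idx)
{
	return (idx + 1) & DEMO_ERR_MASK;
}
```

If the struct ever grew to a size that does not divide the page into a power-of-two number of slots, the mask would silently skip or alias slots; the compile-time check turns that into a build error instead.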


> 
> Anyway, I should probably have put comments about all this in the code,
> and I will, but the way it works kind of assumes a very small number of
> errors - it's replacing a simple 'last error' facility for the hist
> triggers and making it a common facility for other things that have
> similar needs like Masami's kprobe_events errors.  For those purposes, I
> assumed it would suffice to simply be able to show that last 8 or some
> similar small number of errors and constantly recycle the slots.

The errors are still in the files that have the errors right? Perhaps
just have a file that lists the files that contain errors. That way if
something goes wrong, you can examine that file and then look at the
file that contains the error?

And I'm not sure it being in the events directory is the best place
either, especially, if you plan to have it handle kprobe_events because
that's not in the events directory.

> 
> Basically it just splits the page into 16 strings, 2 per error, one for
> the actual error text, the other for the command the user entered.  The
> struct event_log_err just overlays a struct on top of 2 strings just to
> make it easier to manage.
> 
> Anyway, because it is such a small number, and we start with a zeroed
> page, whenever we print the error log, we print all 16 strings even if
> we only have one error (2 strings).  The rest are NULL and print
> nothing.  We start with the tail, which could also be thought of as the
> 'oldest' or the 'first' error in the buffer and just cycle through them
> all.  Hope that clears up some of the other questions you had about how
> a non-full log gets printed, etc...

OK, I was thinking a NULL entry would return NULL, but we are
returning a pointer to NULL. That's where I missed it.
 
> 
> > > +
> > > +struct event_log_err {
> > > + char err[MAX_FILTER_STR_VAL];
> > > + char cmd[MAX_FILTER_STR_VAL];
> > > +};  
> > 
> > I like the event_log_err idea, but the above can be shrunk to:
> > 
> > struct err_info {
> > u8  type; /* I can only imagine 254 types */
> > u8  pos;  /* MAX_FILTER_STR_VAR = 256 */
> > };
> > 
> > struct event_log_err {
> > struct err_info info;
> > > char cmd[MAX_FILTER_STR_VAL];
> > };
> > 
> > There's no reason to put in a bunch of text that's going to be static
> > anyway. Have a lookup table like we do for filters.
> > 
> > +   log_err("Variable name not unique, need to use 
> > fully qualified name (%s) for variable: ", fqvar(system, event_name, 
> > var_name, true));
> >   
> 
> Hmm, most of the log_errs use printf strings that get expanded, so need
> a destination buffer, the event_log_err->err string, but I think I see
> what you're getting at - that we can get rid of the format strings
> altogether and make them static strings if we use the method of simply
> printing the static string and putting a caret where the error is as
> below.
> 
> > 
> > Instead of making the fqvar, find the location of the variable, and add:
> > 
> >  blah blah $var blah blah
> > ^
> >   Variable

[PATCH 4/6] Documentation for Pmalloc

2018-04-13 Thread Igor Stoppa

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa 
---
 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 107 +
 2 files changed, 108 insertions(+)
 create mode 100644 Documentation/core-api/pmalloc.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8f5de42d6571 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
genalloc
errseq
printk-formats
+   pmalloc
 
 Interfaces for kernel debugging
 ===
diff --git a/Documentation/core-api/pmalloc.rst b/Documentation/core-api/pmalloc.rst
new file mode 100644
index 000000000000..c14907485137
--- /dev/null
+++ b/Documentation/core-api/pmalloc.rst
@@ -0,0 +1,107 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _pmalloc:
+
+Protectable memory allocator
+============================
+
+Purpose
+-------
+
+The pmalloc library is meant to provide read-only status to data that,
+for some reason, could neither be declared as constant, nor could it take
+advantage of the qualifier __ro_after_init, but is write-once and
+read-only in spirit, at least as long as it doesn't get torn down.
+It protects data from both accidental and malicious overwrites.
+
+Example: A policy that is loaded from userspace.
+
+
+Concept
+-------
+
+The MMU available in the system can be used to write protect memory pages.
+Unfortunately this feature cannot be used as-it-is, to protect sensitive
+data, because this potentially read-only data is typically interleaved
+with other data, which must stay writeable.
+
+pmalloc introduces the concept of protectable memory pools.
+A pool contains a list of areas of virtually contiguous pages of
+memory. An area is the minimum amount of memory that pmalloc allows to
+protect, because the user might have allocated a memory range that
+crosses the boundary between pages.
+
+When an allocation is performed, if there is not enough memory already
+available in the pool, a new area of suitable size is grabbed.
+The size chosen is the largest between the roundup (to PAGE_SIZE) of
+the request from pmalloc and friends and the refill parameter specified
+when creating the pool.
+
+When a pool is created, it is possible to specify two parameters:
+- refill size: the minimum size of the memory area to allocate when needed
+- align_order: the default alignment to use when reserving memory
+
+To facilitate the conversion of existing code to pmalloc pools, several
+helper functions are provided, mirroring their k/vmalloc counterparts.
+However one is missing. There is no pfree() because the memory protected
+by a pool will be released exclusively when the pool is destroyed.
+
+
+
+Caveats
+-------
+
+- When a pool is protected, whatever memory would be still available in
+  the current vmap_area (from which allocations are performed) is
+  relinquished.
+
+- As already explained, freeing of memory is not supported. Pages will be
+  returned to the system upon destruction of the memory pool that they
+  belong to.
+
+- The address range available for vmalloc (and thus for pmalloc too) is
+  limited on 32-bit systems. However, it shouldn't be an issue, since not
+  much data is expected to be dynamically allocated and turned into
+  read-only.
+
+- Regarding SMP systems, the allocations are expected to happen mostly
+  during an initial transient, after which there should be no more need
+  to perform cross-processor synchronizations of page tables.
+  Loading of kernel modules is an exception to this, but it's not expected
+  to happen frequently enough to become a problem.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+#. create a pool
+
+   :c:func:`pmalloc_create_pool`
+
+#. issue one or more allocation requests to the pool
+
+   :c:func:`pmalloc`
+
+   or
+
+   :c:func:`pzalloc`
+
+#. initialize the memory obtained, with the desired values
+
+#. write-protect the memory so far allocated
+
+   :c:func:`pmalloc_protect_pool`
+
+#. iterate over the last 3 points as needed
+
+#. [optional] destroy the pool
+
+   :c:func:`pmalloc_destroy_pool`
+
+API
+---
+
+.. kernel-doc:: include/linux/pmalloc.h
+.. kernel-doc:: mm/pmalloc.c
-- 
2.14.1
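
The write-once-then-protect life cycle described in the documentation above can be illustrated in user space with `mmap` and `mprotect`. This is only an analogy under stated assumptions: it does not use or model the pmalloc API, and the `ro_region_*` names are invented for the example.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* One page-aligned region that is written first and then made read-only. */
struct ro_region {
	void *base;
	size_t len;
};

/* Step 1: obtain writable memory (loosely analogous to creating a pool). */
static int ro_region_create(struct ro_region *r)
{
	long page = sysconf(_SC_PAGESIZE);

	assert(r != NULL);
	if (page <= 0)
		return -1;

	r->len = (size_t)page;
	r->base = mmap(NULL, r->len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return r->base == MAP_FAILED ? -1 : 0;
}

/* Step 2, after initialising the data: drop write permission. */
static int ro_region_protect(struct ro_region *r)
{
	return mprotect(r->base, r->len, PROT_READ);
}

/* Step 3: release everything at once; there is no per-object free. */
static int ro_region_destroy(struct ro_region *r)
{
	return munmap(r->base, r->len);
}
```

A caller would create the region, copy its data in, call `ro_region_protect()`, and from then on only read; any later write to the region would fault.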

[PATCH 5/6] Pmalloc selftest

2018-04-13 Thread Igor Stoppa

Add basic self-test functionality for pmalloc.

The testing is introduced as early as possible, right after the main
dependency, genalloc, has passed successfully, so that it can help
diagnosing failures in pmalloc users.

Signed-off-by: Igor Stoppa 
---
 include/linux/test_pmalloc.h |  24 
 init/main.c  |   2 +
 mm/Kconfig   |  10 
 mm/Makefile  |   1 +
 mm/test_pmalloc.c| 137 +++
 5 files changed, 174 insertions(+)
 create mode 100644 include/linux/test_pmalloc.h
 create mode 100644 mm/test_pmalloc.c

diff --git a/include/linux/test_pmalloc.h b/include/linux/test_pmalloc.h
new file mode 100644
index 000000000000..c7e2e451c17c
--- /dev/null
+++ b/include/linux/test_pmalloc.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * test_pmalloc.h
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+
+#ifndef __LINUX_TEST_PMALLOC_H
+#define __LINUX_TEST_PMALLOC_H
+
+
+#ifdef CONFIG_TEST_PROTECTABLE_MEMORY
+
+void test_pmalloc(void);
+
+#else
+
+static inline void test_pmalloc(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index b795aa341a3a..27f8479c4578 100644
--- a/init/main.c
+++ b/init/main.c
@@ -91,6 +91,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -679,6 +680,7 @@ asmlinkage __visible void __init start_kernel(void)
 */
mem_encrypt_init();
 
+   test_pmalloc();
 #ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/mm/Kconfig b/mm/Kconfig
index d7ef40eaa4e8..f98b4c0aebce 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -758,3 +758,13 @@ config PROTECTABLE_MEMORY
 depends on MMU
 depends on ARCH_HAS_SET_MEMORY
 default y
+
+config TEST_PROTECTABLE_MEMORY
+   bool "Run self test for pmalloc memory allocator"
+depends on MMU
+   depends on ARCH_HAS_SET_MEMORY
+   select PROTECTABLE_MEMORY
+   default n
+   help
+ Tries to verify that pmalloc works correctly and that the memory
+ is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index 6a6668f99799..802cba37013b 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
+obj-$(CONFIG_TEST_PROTECTABLE_MEMORY) += test_pmalloc.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/test_pmalloc.c b/mm/test_pmalloc.c
new file mode 100644
index 000000000000..b0e091bf6329
--- /dev/null
+++ b/mm/test_pmalloc.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * test_pmalloc.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#define SIZE_1 (PAGE_SIZE * 3)
+#define SIZE_2 1000
+
+
+/* wrapper for is_pmalloc_object() with messages */
+static inline bool validate_alloc(bool expected, void *addr,
+ unsigned long size)
+{
+   bool test;
+
+   test = is_pmalloc_object(addr, size) > 0;
+   pr_notice("must be %s: %s",
+ expected ? "ok" : "no", test ? "ok" : "no");
+   return test == expected;
+}
+
+
+#define is_alloc_ok(variable, size)\
+   validate_alloc(true, variable, size)
+
+
+#define is_alloc_no(variable, size)\
+   validate_alloc(false, variable, size)
+
+/* tests the basic life-cycle of a pool */
+static bool create_and_destroy_pool(void)
+{
+   static struct pmalloc_pool *pool;
+
+   pr_notice("Testing pool creation and destruction capability");
+
+   pool = pmalloc_create_pool();
+   if (WARN(!pool, "Cannot allocate memory for pmalloc selftest."))
+   return false;
+   pmalloc_destroy_pool(pool);
+   return true;
+}
+
+
+/*  verifies that it's possible to allocate from the pool */
+static bool test_alloc(void)
+{
+   static struct pmalloc_pool *pool;
+   static void *p;
+
+   pr_notice("Testing allocation capability");
+   pool = pmalloc_create_pool();
+   if (WARN(!pool, "Unable to allocate memory for pmalloc selftest."))
+   return false;
+   p = pmalloc(pool,  SIZE_1 - 1);
+   pmalloc_protect_pool(pool);
+   pmalloc_destroy_pool(pool);
+   if (WARN(!p, "Failed to allocate memory from the pool"))
+   return false;
+   return true;
+}
+
+
+/* tests the identification of pmalloc ranges */
+static bool test_is_pmalloc_object(void)
+{
+   struct pmalloc_pool *pool;
+   void *pmalloc_p;
+   void *vmalloc_p;
+   bool retval = false;
+
+   pr_notice("Test correctness of is_pmalloc_object()");
+
+   vmalloc_p = vmalloc(SIZE_1);
+

Re: [virtio-dev] Re: [PATCH v2] virtio_balloon: export hugetlb page allocation counts

2018-04-13 Thread Michael S. Tsirkin

On Fri, Apr 13, 2018 at 03:01:11PM +0800, Jason Wang wrote:
> 
> 
> On 2018年04月12日 08:24, Jonathan Helman wrote:
> > 
> > 
> > On 04/10/2018 08:12 PM, Jason Wang wrote:
> > > 
> > > 
> > > On 2018年04月10日 05:11, Jonathan Helman wrote:
> > > > 
> > > > 
> > > > On 03/22/2018 07:38 PM, Jason Wang wrote:
> > > > > 
> > > > > 
> > > > > On 2018年03月22日 11:10, Michael S. Tsirkin wrote:
> > > > > > On Thu, Mar 22, 2018 at 09:52:18AM +0800, Jason Wang wrote:
> > > > > > > On 2018年03月20日 12:26, Jonathan Helman wrote:
> > > > > > > > > On Mar 19, 2018, at 7:31 PM, Jason
> > > > > > > > > Wang wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On 2018年03月20日 06:14, Jonathan Helman wrote:
> > > > > > > > > > Export the number of successful and failed hugetlb page
> > > > > > > > > > allocations via the virtio balloon driver. These 2 counts
> > > > > > > > > > come directly from the vm_events HTLB_BUDDY_PGALLOC and
> > > > > > > > > > HTLB_BUDDY_PGALLOC_FAIL.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Jonathan Helman
> > > > > > > > > Reviewed-by: Jason Wang
> > > > > > > > Thanks.
> > > > > > > > 
> > > > > > > > > > ---
> > > > > > > > > >    drivers/virtio/virtio_balloon.c | 6 ++
> > > > > > > > > >    include/uapi/linux/virtio_balloon.h | 4 +++-
> > > > > > > > > >    2 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > 
> > > > > > > > > > diff --git
> > > > > > > > > > a/drivers/virtio/virtio_balloon.c
> > > > > > > > > > b/drivers/virtio/virtio_balloon.c
> > > > > > > > > > index dfe5684..6b237e3 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_balloon.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_balloon.c
> > > > > > > > > > @@ -272,6 +272,12 @@ static unsigned int
> > > > > > > > > > update_balloon_stats(struct
> > > > > > > > > > virtio_balloon *vb)
> > > > > > > > > > pages_to_bytes(events[PSWPOUT]));
> > > > > > > > > >    update_stat(vb, idx++,
> > > > > > > > > > VIRTIO_BALLOON_S_MAJFLT,
> > > > > > > > > > events[PGMAJFAULT]);
> > > > > > > > > >    update_stat(vb, idx++,
> > > > > > > > > > VIRTIO_BALLOON_S_MINFLT,
> > > > > > > > > > events[PGFAULT]);
> > > > > > > > > > +#ifdef CONFIG_HUGETLB_PAGE
> > > > > > > > > > +    update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
> > > > > > > > > > +    events[HTLB_BUDDY_PGALLOC]);
> > > > > > > > > > +    update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGFAIL,
> > > > > > > > > > +    events[HTLB_BUDDY_PGALLOC_FAIL]);
> > > > > > > > > > +#endif
> > > > > > > > > >    #endif
> > > > > > > > > >    update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE,
> > > > > > > > > >    pages_to_bytes(i.freeram));
> > > > > > > > > > diff --git
> > > > > > > > > > a/include/uapi/linux/virtio_balloon.h
> > > > > > > > > > b/include/uapi/linux/virtio_balloon.h
> > > > > > > > > > index 4e8b830..40297a3 100644
> > > > > > > > > > --- a/include/uapi/linux/virtio_balloon.h
> > > > > > > > > > +++ b/include/uapi/linux/virtio_balloon.h
> > > > > > > > > > @@ -53,7 +53,9 @@ struct virtio_balloon_config {
> > > > > > > > > >    #define VIRTIO_BALLOON_S_MEMTOT   5  
> > > > > > > > > > /* Total amount of memory */
> > > > > > > > > >    #define VIRTIO_BALLOON_S_AVAIL    6  
> > > > > > > > > > /* Available memory as in /proc */
> > > > > > > > > >    #define VIRTIO_BALLOON_S_CACHES   7   /* Disk caches */
> > > > > > > > > > -#define VIRTIO_BALLOON_S_NR   8
> > > > > > > > > > +#define VIRTIO_BALLOON_S_HTLB_PGALLOC 
> > > > > > > > > > 8  /* Hugetlb page allocations */
> > > > > > > > > > +#define VIRTIO_BALLOON_S_HTLB_PGFAIL  
> > > > > > > > > > 9  /* Hugetlb page allocation failures
> > > > > > > > > > */
> > > > > > > > > > +#define VIRTIO_BALLOON_S_NR   10
> > > > > > > > > >  /*
> > > > > > > > > >     * Memory statistics structure.
> > > > > > > > > Not for this patch, but it looks to me that
> > > > > > > > > exporting such nr through uapi is fragile.
> > > > > > > > Sorry, can you explain what you mean here?
> > > > > > > > 
> > > > > > > > Jon
> > > > > > > Spec said "Within an output buffer submitted to the
> > > > > > > statsq, the device MUST
> > > > > > > ignore entries with tag values that it does not
> > > > > > > recognize". So exporting
> > > > > > > VIRTIO_BALLOON_S_NR seems useless and device
> > > > > > > implementation can not depend
> > > > > > > on such number in uapi.
> > > > > > > 
> > > > > > > Thanks
> > > > > > Suggestions? I don't like to break build for people ...
> > > > > > 
> > > > > 
> > > > > Didn't have a good idea. But maybe we should keep
> > > > > VIRTIO_BALLOON_S_NR unchanged, and add a comment here.
> > > > > 
> > > > > Thanks
> > > > 
> > > > I think Jason's comment is for a future patch. Didn't see this
> > > > patch get applied, so wondering if it could be.
> > > > 
> > > > Thanks,
> > > > Jon
> > > 
> > > Hi Jon:
> > > 
> > > Have you tested new driver with old qemu?
> > 
> > Yes, this testing scenario looks g

[PATCH] virtio_balloon: add array of stat names

2018-04-13 Thread Michael S. Tsirkin

Jason Wang points out that it's very hard for users to build an array of
stat names. The naive thing is to use VIRTIO_BALLOON_S_NR but that
breaks if we add more stats.

Let's add an array of reasonably readable names.

Fixes: 6c64fe7f2 ("virtio_balloon: export hugetlb page allocation counts")
Cc: Jason Wang 
Cc: Jonathan Helman ,
Signed-off-by: Michael S. Tsirkin 
---
 include/uapi/linux/virtio_balloon.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 9e02137..1477c17 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -64,6 +64,21 @@ struct virtio_balloon_config {
 #define VIRTIO_BALLOON_S_HTLB_PGFAIL   9  /* Hugetlb page allocation failures */
 #define VIRTIO_BALLOON_S_NR   10
 
+#define VIRTIO_BALLOON_S_NAMES_WITH_PREFIX(VIRTIO_BALLOON_S_NAMES_prefix) { \
+   VIRTIO_BALLOON_S_NAMES_prefix "swap-in", \
+   VIRTIO_BALLOON_S_NAMES_prefix "swap-out", \
+   VIRTIO_BALLOON_S_NAMES_prefix "major-faults", \
+   VIRTIO_BALLOON_S_NAMES_prefix "minor-faults", \
+   VIRTIO_BALLOON_S_NAMES_prefix "free-memory", \
+   VIRTIO_BALLOON_S_NAMES_prefix "total-memory", \
+   VIRTIO_BALLOON_S_NAMES_prefix "available-memory", \
+   VIRTIO_BALLOON_S_NAMES_prefix "disk-caches", \
+   VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-allocations", \
+   VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-failures" \
+}
+
+#define VIRTIO_BALLOON_S_NAMES VIRTIO_BALLOON_S_NAMES_WITH_PREFIX("")
+
 /*
  * Memory statistics structure.
  * Driver fills an array of these structures and passes to device.
-- 
MST
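
As a rough illustration of how a consumer might use such a name array,
here is a minimal user-space sketch. It carries a local copy of the macro
from the patch so that it builds without the patched header; the "stat-"
prefix and the helpers balloon_stat_name() and balloon_stat_tag() are
made up for this example and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Local copy of the macro proposed in the patch, so that this sketch
 * builds without the patched <linux/virtio_balloon.h>.
 */
#define VIRTIO_BALLOON_S_NAMES_WITH_PREFIX(VIRTIO_BALLOON_S_NAMES_prefix) { \
	VIRTIO_BALLOON_S_NAMES_prefix "swap-in", \
	VIRTIO_BALLOON_S_NAMES_prefix "swap-out", \
	VIRTIO_BALLOON_S_NAMES_prefix "major-faults", \
	VIRTIO_BALLOON_S_NAMES_prefix "minor-faults", \
	VIRTIO_BALLOON_S_NAMES_prefix "free-memory", \
	VIRTIO_BALLOON_S_NAMES_prefix "total-memory", \
	VIRTIO_BALLOON_S_NAMES_prefix "available-memory", \
	VIRTIO_BALLOON_S_NAMES_prefix "disk-caches", \
	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-allocations", \
	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-failures" \
}

/* Hypothetical consumer-side table; "stat-" is an arbitrary prefix. */
static const char *const balloon_stat_names[] =
	VIRTIO_BALLOON_S_NAMES_WITH_PREFIX("stat-");

/* Sized from the array itself, so it does not rely on VIRTIO_BALLOON_S_NR. */
#define BALLOON_STAT_COUNT \
	(sizeof(balloon_stat_names) / sizeof(balloon_stat_names[0]))

/* Returns the name for a tag, or NULL for a tag this build does not know. */
static const char *balloon_stat_name(unsigned int tag)
{
	return tag < BALLOON_STAT_COUNT ? balloon_stat_names[tag] : NULL;
}

/* Reverse lookup: returns the tag for a name, or -1 if there is no match. */
static int balloon_stat_tag(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < BALLOON_STAT_COUNT; i++)
		if (strcmp(balloon_stat_names[i], name) == 0)
			return (int)i;
	return -1;
}
```

Because the table is sized from the initializer, a consumer built against
an older header simply returns NULL for newer tags and can skip them.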

[PATCH 6/6] lkdtm: crash on overwriting protected pmalloc var

2018-04-13 Thread Igor Stoppa

Verify that pmalloc read-only protection is in place: trying to
overwrite a protected variable will crash the kernel.

Signed-off-by: Igor Stoppa 
---
 drivers/misc/lkdtm/core.c  |  3 +++
 drivers/misc/lkdtm/lkdtm.h |  1 +
 drivers/misc/lkdtm/perms.c | 25 +
 3 files changed, 29 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 2154d1bfd18b..c9fd42bda6ee 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(ACCESS_USERSPACE),
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
+#ifdef CONFIG_PROTECTABLE_MEMORY
+   CRASHTYPE(WRITE_RO_PMALLOC),
+#endif
CRASHTYPE(WRITE_KERN),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 9e513dcfd809..dcda3ae76ceb 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -38,6 +38,7 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
 void __init lkdtm_perms_init(void);
 void lkdtm_WRITE_RO(void);
 void lkdtm_WRITE_RO_AFTER_INIT(void);
+void lkdtm_WRITE_RO_PMALLOC(void);
 void lkdtm_WRITE_KERN(void);
 void lkdtm_EXEC_DATA(void);
 void lkdtm_EXEC_STACK(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 53b85c9d16b8..4660ff0bfa44 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Whether or not to fill the target memory area with do_nothing(). */
@@ -104,6 +105,30 @@ void lkdtm_WRITE_RO_AFTER_INIT(void)
*ptr ^= 0xabcd1234;
 }
 
+#ifdef CONFIG_PROTECTABLE_MEMORY
+void lkdtm_WRITE_RO_PMALLOC(void)
+{
+   struct pmalloc_pool *pool;
+   int *i;
+
+   pool = pmalloc_create_pool();
+   if (WARN(!pool, "Failed preparing pool for pmalloc test."))
+   return;
+
+   i = (int *)pmalloc(pool, sizeof(int));
+   if (WARN(!i, "Failed allocating memory for pmalloc test.")) {
+   pmalloc_destroy_pool(pool);
+   return;
+   }
+
+   *i = INT_MAX;
+   pmalloc_protect_pool(pool);
+
+   pr_info("attempting bad pmalloc write at %p\n", i);
+   *i = 0;
+}
+#endif
+
 void lkdtm_WRITE_KERN(void)
 {
size_t size;
-- 
2.14.1
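
For readers without a test kernel at hand, the effect this crash type
checks for can be approximated in user space. The sketch below is only an
analogy under stated assumptions: it uses mmap() and mprotect() instead of
pmalloc, it catches the fault with a signal handler rather than crashing,
and write_after_protect_faults() is a made-up name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <limits.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Jump back to the caller instead of letting the fault kill the process. */
static void on_fault(int sig)
{
	(void)sig;
	siglongjmp(fault_env, 1);
}

/*
 * Returns 1 if a write to a page faults once the page is read-only,
 * 0 if the write unexpectedly succeeds, -1 if the setup fails.
 */
static int write_after_protect_faults(void)
{
	struct sigaction sa, old_segv, old_bus;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	volatile int *p;
	int result;

	p = mmap(NULL, page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	*p = INT_MAX;			/* initialize while still writable */
	if (mprotect((void *)p, page, PROT_READ) != 0) {
		munmap((void *)p, page);
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_fault;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old_segv);
	sigaction(SIGBUS, &sa, &old_bus);

	if (sigsetjmp(fault_env, 1) == 0) {
		*p = 0;			/* the "bad write" */
		result = 0;
	} else {
		result = 1;		/* the handler brought us back */
	}

	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);

	/* A faulting write must have left the protected value intact. */
	assert(result == 0 || *p == INT_MAX);
	munmap((void *)p, page);
	return result;
}
```

The sequence mirrors the test above: allocate, initialize, protect, then
attempt the write. In the kernel test there is no handler to return to,
so the same write is expected to oops.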

[PATCH 3/6] Protectable Memory

2018-04-13 Thread Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section (that's how __ro_after_init works), but this does not sit very
well with dynamically allocated ones.

Dynamic allocation does not provide, currently, any means for grouping
variables in memory pages that would contain exclusively data suitable
for conversion to read only access mode.

The allocator here provided (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can instantiate a pool, and then direct any allocation request to
the pool handle it has received.

A pool is organized as a list of areas of virtually contiguous memory.
Whenever the protection functionality is invoked on a pool, all the
areas it contains that are not yet read-only are write-protected.

The process of growing and protecting the pool can be iterated at will.
Each iteration will prevent further allocation from the memory area
currently active, turn it into read-only mode and then proceed to
secure whatever other area might still be unprotected.

Write-protecting part of a pool before completing all the
allocations can be wasteful; however, it guarantees the minimum
window of vulnerability, since the data can be allocated, initialized
and protected in a single sweep.

There are pros and cons, depending on the allocation patterns, the size
of the areas being allocated, the time intervals between initialization
and protection.

Destroying a pool is the only way to claim back the associated memory.
It is up to its user to avoid any further references to the memory that
was allocated, once the destruction is invoked.

An example where it is desirable to destroy a pool and claim back its
memory is when unloading a kernel module.

A module can have as many pools as needed.

Since pmalloc memory is obtained from vmalloc, an attacker that has
gained access to the physical mapping, still has to identify where the
target of the attack (in virtually contiguous mapping) is located.

Compared to plain vmalloc, pmalloc does not generate as much TLB
thrashing, since it can host multiple allocations in the same page,
where present.
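
For illustration only, the pool life-cycle described above can be
approximated in user space with mmap() and mprotect(). This is an analogy,
not the code in mm/pmalloc.c: struct pool, struct area, POOL_ALIGN and the
pool_*() helpers are made-up names, and unlike pmalloc the sketch keeps
its metadata in ordinary writable heap memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define POOL_ALIGN 16	/* arbitrary allocation granularity for the sketch */

/* One virtually contiguous, page-aligned region owned by a pool. */
struct area {
	struct area *next;
	char *base;
	size_t size;		/* bytes mapped */
	size_t used;		/* bytes handed out so far */
	bool read_only;
};

/* Hypothetical user-space stand-in for a protectable pool. */
struct pool {
	struct area *areas;	/* most recently added area first */
};

static size_t round_up_to(size_t n, size_t to)
{
	return (n + to - 1) / to * to;
}

static struct pool *pool_create(void)
{
	return calloc(1, sizeof(struct pool));
}

/* Map a fresh writable area of at least min_size bytes and make it active. */
static struct area *pool_grow(struct pool *pool, size_t min_size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	struct area *a = calloc(1, sizeof(*a));
	void *base;

	if (!a)
		return NULL;
	a->size = round_up_to(min_size, page);
	base = mmap(NULL, a->size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED) {
		free(a);
		return NULL;
	}
	a->base = base;
	a->next = pool->areas;
	pool->areas = a;
	return a;
}

/* Bump-allocate from the active area; start a new one if it is full or R/O. */
static void *pool_alloc(struct pool *pool, size_t size)
{
	struct area *a;
	void *p;

	assert(pool != NULL);
	a = pool->areas;
	size = round_up_to(size ? size : 1, POOL_ALIGN);
	if (!a || a->read_only || a->size - a->used < size) {
		a = pool_grow(pool, size);
		if (!a)
			return NULL;
	}
	p = a->base + a->used;
	a->used += size;
	return p;
}

/* Write-protect every area that is not read-only yet. */
static int pool_protect(struct pool *pool)
{
	struct area *a;

	for (a = pool->areas; a; a = a->next) {
		if (a->read_only)
			continue;
		if (mprotect(a->base, a->size, PROT_READ) != 0)
			return -1;
		a->read_only = true;
	}
	return 0;
}

/* The only way to get the memory back: unmap all areas, free the metadata. */
static void pool_destroy(struct pool *pool)
{
	while (pool->areas) {
		struct area *a = pool->areas;

		pool->areas = a->next;
		munmap(a->base, a->size);
		free(a);
	}
	free(pool);
}
```

Allocating again after pool_protect() opens a new writable area instead of
reusing the protected one, which is where the trade-off between wasted
space and a short window of vulnerability shows up.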

Signed-off-by: Igor Stoppa 
---
 include/linux/pmalloc.h | 166 ++
 include/linux/vmalloc.h |   3 +
 mm/Kconfig  |   6 ++
 mm/Makefile |   1 +
 mm/pmalloc.c| 265 
 mm/usercopy.c   |  33 ++
 mm/vmalloc.c|   2 +-
 7 files changed, 475 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index ..1c24067eb167
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,166 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017-18 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa 
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include 
+#include 
+
+/*
+ * Library for dynamic allocation of pools of protectable memory.
+ * A pool is a singly linked list of vmap_area structures.
+ * Whenever a pool is protected, all the areas it contains at that point
+ * are write protected.
+ * More areas can be added and protected, in the same way.
+ * Memory in a pool cannot be individually unprotected, but the pool can
+ * be destroyed.
+ * Upon destruction of a certain pool, all the related memory is released,
+ * including its metadata.
+ *
+ * Pmalloc memory is intended to complement __ro_after_init.
+ * It can be used, for example, where there is a write-once variable, for
+ * which it is not possible to know the initialization value before init
+ * is completed (which is what __ro_after_init requires).
+ *
+ * It can be useful also where the amount of data to protect is not known
+ * at compile time and the memory can only be allocated dynamically.
+ *
+ * Finally, it can be useful also when it is desirable to control
+ * dynamically (for example through the command line) if something ought
+ * to be protected or not, without having to rebuild the kernel (like in
+ * the build used for a linux distro).
+ */
+
+
+#define PMALLOC_REFILL_DEFAULT (0)
+#define PMALLOC_ALIGN_DEFAULT ARCH_KMALLOC_MINALIGN
+
+struct pmalloc_pool *pmalloc_create_custom_pool(size_t refill,
+   unsigned short align_order);
+
+/**
+ * pmalloc_create_pool() - create a protectable memory pool
+ *
+ * Shorthand for pmalloc_create_custom_pool() with default argument:
+ * * refill is set to PMALLOC_REFILL_DEFAULT
+ * * align_order is set to PMALLOC_ALIGN_DEFAULT
+

Applied "ASoC: tfa9879: switch to SPDX license tag" to the asoc tree

2018-04-13 Thread Mark Brown

The patch

   ASoC: tfa9879: switch to SPDX license tag

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 55c19bd95f4dac2ee221272349900dda75a67ebb Mon Sep 17 00:00:00 2001
From: Peter Rosin 
Date: Fri, 13 Apr 2018 13:47:51 +0200
Subject: [PATCH] ASoC: tfa9879: switch to SPDX license tag

It's less overhead, clearer and generally neater.

Signed-off-by: Peter Rosin 
Signed-off-by: Mark Brown 
---
 sound/soc/codecs/tfa9879.c | 18 ++
 sound/soc/codecs/tfa9879.h |  7 +--
 2 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/sound/soc/codecs/tfa9879.c b/sound/soc/codecs/tfa9879.c
index 4ed020262a27..abc114a3ae2b 100644
--- a/sound/soc/codecs/tfa9879.c
+++ b/sound/soc/codecs/tfa9879.c
@@ -1,15 +1,9 @@
-/*
- * tfa9879.c  --  driver for NXP Semiconductors TFA9879
- *
- * Copyright (C) 2014 Axentia Technologies AB
- * Author: Peter Rosin 
- *
- *  This program is free software; you can redistribute  it and/or modify it
- *  under  the terms of  the GNU General  Public License as published by the
- *  Free Software Foundation;  either version 2 of the  License, or (at your
- *  option) any later version.
- *
- */
+// SPDX-License-Identifier: GPL-2.0+
+//
+// tfa9879.c  --  driver for NXP Semiconductors TFA9879
+//
+// Copyright (C) 2014 Axentia Technologies AB
+// Author: Peter Rosin 
 
 #include 
 #include 
diff --git a/sound/soc/codecs/tfa9879.h b/sound/soc/codecs/tfa9879.h
index 3408c90c4628..66c88d0396fe 100644
--- a/sound/soc/codecs/tfa9879.h
+++ b/sound/soc/codecs/tfa9879.h
@@ -1,14 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
 /*
  * tfa9879.h  --  driver for NXP Semiconductors TFA9879
  *
  * Copyright (C) 2014 Axentia Technologies AB
  * Author: Peter Rosin 
- *
- *  This program is free software; you can redistribute  it and/or modify it
- *  under  the terms of  the GNU General  Public License as published by the
- *  Free Software Foundation;  either version 2 of the  License, or (at your
- *  option) any later version.
- *
  */
 
 #ifndef _TFA9879_H
-- 
2.17.0

[RFC PATCH v22 0/6] mm: security: ro protection for dynamic data

2018-04-13 Thread Igor Stoppa

This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is protected, all the
memory that is currently part of it becomes R/O.

An R/O pool can be expanded (adding more protectable memory).
It can also be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.

This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.

However the data might need to be released, for example as part of module
unloading. The pool, therefore, can be destroyed.

An example is provided, in the form of self-testing.

Since it was advised to give an example of protecting real kernel data
[1], a well known vulnerability has been used to demo an effective use of
pmalloc.

[1] http://www.openwall.com/lists/kernel-hardening/2018/03/29/7

However it turned out to be almost a how-to for attacking the kernel, so
it was sent first to secur...@kernel.org, for obtaining clearance about
the publication.

Changes since v21:

[http://www.openwall.com/lists/kernel-hardening/2018/03/27/23]

* fixed type mismatch error in use of max(), detected by gcc 7.3
* converted internal types into size_t
* fixed leak of vmalloc memory in the self-test code

Igor Stoppa (6):
  struct page: add field for vm_struct
  vmalloc: rename llist field in vmap_area
  Protectable Memory
  Documentation for Pmalloc
  Pmalloc selftest
  lkdtm: crash on overwriting protected pmalloc var

 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 107 +++
 drivers/misc/lkdtm/core.c  |   3 +
 drivers/misc/lkdtm/lkdtm.h |   1 +
 drivers/misc/lkdtm/perms.c |  25 
 include/linux/mm_types.h   |   1 +
 include/linux/pmalloc.h| 166 +++
 include/linux/test_pmalloc.h   |  24 
 include/linux/vmalloc.h|   5 +-
 init/main.c|   2 +
 mm/Kconfig |  16 +++
 mm/Makefile|   2 +
 mm/pmalloc.c   | 265 +
 mm/test_pmalloc.c  | 137 +++
 mm/usercopy.c  |  33 +
 mm/vmalloc.c   |  10 +-
 16 files changed, 793 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/core-api/pmalloc.rst
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 include/linux/test_pmalloc.h
 create mode 100644 mm/pmalloc.c
 create mode 100644 mm/test_pmalloc.c

-- 
2.14.1

[PATCH 2/6] vmalloc: rename llist field in vmap_area

2018-04-13 Thread Igor Stoppa

The vmap_area structure has a field of type struct llist_node, named
purge_list, which is used when performing lazy purge of the area.

This field is left unused during the actual utilization of the
structure.

This patch renames the field to a more generic "area_list", to allow for
utilization outside of the purging phase.

Since the purging happens after the vmap_area is dismissed, its use is
mutually exclusive with any use performed while the area is allocated.

Signed-off-by: Igor Stoppa 
---
 include/linux/vmalloc.h | 2 +-
 mm/vmalloc.c| 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..2d07dfef3cfd 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -47,7 +47,7 @@ struct vmap_area {
unsigned long flags;
struct rb_node rb_node; /* address sorted rbtree */
struct list_head list;  /* address sorted list */
-   struct llist_node purge_list;/* "lazy purge" list */
+   struct llist_node area_list;/* generic list of areas */
struct vm_struct *vm;
struct rcu_head rcu_head;
 };
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 61a1ca22b0f6..1bb2233bb262 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -682,7 +682,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
lockdep_assert_held(&vmap_purge_lock);
 
valist = llist_del_all(&vmap_purge_list);
-   llist_for_each_entry(va, valist, purge_list) {
+   llist_for_each_entry(va, valist, area_list) {
if (va->va_start < start)
start = va->va_start;
if (va->va_end > end)
@@ -696,7 +696,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
flush_tlb_kernel_range(start, end);
 
spin_lock(&vmap_area_lock);
-   llist_for_each_entry_safe(va, n_va, valist, purge_list) {
+   llist_for_each_entry_safe(va, n_va, valist, area_list) {
int nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
 
__free_vmap_area(va);
@@ -743,7 +743,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
&vmap_lazy_nr);
 
/* After this point, we may free va at any time */
-   llist_add(&va->purge_list, &vmap_purge_list);
+   llist_add(&va->area_list, &vmap_purge_list);
 
if (unlikely(nr_lazy > lazy_max_pages()))
try_purge_vmap_area_lazy();
-- 
2.14.1
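
The reasoning behind the rename (one link field, two uses that never
overlap in time) can be sketched in plain C. Everything below is a
simplified stand-in: struct node replaces struct llist_node without its
lock-free properties, and struct mini_area and area_dismiss() are made-up
names, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal singly linked node, standing in for struct llist_node. */
struct node {
	struct node *next;
};

/* Hypothetical miniature of struct vmap_area. */
struct mini_area {
	unsigned long start;
	int live;		/* 1 while allocated, 0 once dismissed */
	struct node area_list;	/* live: owner's list; dismissed: purge list */
};

/* Recover the enclosing mini_area from a pointer to its link field. */
#define area_of(n) \
	((struct mini_area *)((char *)(n) - offsetof(struct mini_area, area_list)))

static void list_push(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

static struct node *list_pop(struct node **head)
{
	struct node *n = *head;

	if (n)
		*head = n->next;
	return n;
}

/* Move the first live area to the purge list, reusing the same link. */
static void area_dismiss(struct node **live_list, struct node **purge_list)
{
	struct node *n = list_pop(live_list);

	if (!n)
		return;
	assert(area_of(n)->live);	/* never on both lists at once */
	area_of(n)->live = 0;
	list_push(purge_list, n);
}
```

Since an area leaves its owner's list before it is queued for purging,
the single link never has to serve both lists at the same time.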

[PATCH 1/6] struct page: add field for vm_struct

2018-04-13 Thread Igor Stoppa

When a page is used for virtual memory, it is often necessary to obtain
a handle to the corresponding vm_struct, which refers to the virtually
contiguous area generated when invoking vmalloc.

The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area.

This will avoid more expensive searches, later on.

Signed-off-by: Igor Stoppa 
Reviewed-by: Jay Freyensee 
Reviewed-by: Matthew Wilcox 
---
 include/linux/mm_types.h | 1 +
 mm/vmalloc.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 21612347d311..c74e2aa9a48b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -86,6 +86,7 @@ struct page {
void *s_mem;/* slab first object */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
+   struct vm_struct *area;
};
 
/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ebff729cc956..61a1ca22b0f6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1536,6 +1536,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
struct page *page = area->pages[i];
 
BUG_ON(!page);
+   page->area = NULL;
__free_pages(page, 0);
}
 
@@ -1705,6 +1706,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
area->nr_pages = i;
goto fail;
}
+   page->area = area;
area->pages[i] = page;
if (gfpflags_allow_blocking(gfp_mask|highmem_mask))
cond_resched();
-- 
2.14.1
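
The back-pointer idea in this patch can be shown with a small stand-alone
model. The types and helpers below (struct mini_page, struct mini_area,
area_attach(), area_release(), page_to_area()) are invented for the
sketch and only mimic the two hunks above: set the pointer when a page
joins an area, clear it when the page is released.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PAGES 8	/* arbitrary capacity for the sketch */

struct mini_area;

/* Hypothetical miniature of struct page: only the back-pointer. */
struct mini_page {
	struct mini_area *area;
};

/* Hypothetical miniature of struct vm_struct. */
struct mini_area {
	struct mini_page *pages[NR_PAGES];
	unsigned int nr_pages;
};

/* Mirrors the allocation hunk: record the owner when a page is added. */
static void area_attach(struct mini_area *area, struct mini_page *page)
{
	assert(area->nr_pages < NR_PAGES);
	page->area = area;
	area->pages[area->nr_pages++] = page;
}

/* Mirrors the __vunmap() hunk: clear the owner before releasing a page. */
static void area_release(struct mini_area *area)
{
	unsigned int i;

	for (i = 0; i < area->nr_pages; i++) {
		area->pages[i]->area = NULL;
		area->pages[i] = NULL;
	}
	area->nr_pages = 0;
}

/* Constant-time lookup from a page to its area, with no search. */
static struct mini_area *page_to_area(const struct mini_page *page)
{
	return page->area;
}
```

With the pointer in place, going from a page to its area is a single
dereference rather than a walk over the set of areas.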

Re: [PATCH RFC 5/8] mm: only mark section offline when all pages are offline

2018-04-13 Thread David Hildenbrand

On 13.04.2018 15:32, David Hildenbrand wrote:
> If any page is still online, the section should stay online.
> 
> Signed-off-by: David Hildenbrand 
> ---

This is a duplicate, please ignore.

(get_maintainers.sh and my mail server had a little clinch, so I had to
send half of the series out manually -_- )

-- 

Thanks,

David / dhildenb

Re: [PATCH v3 0/2] ASoC: max9860/tfa9879: switch to SPDX license tag

2018-04-13 Thread Mark Brown

On Fri, Apr 13, 2018 at 01:47:49PM +0200, Peter Rosin wrote:

> Peter Rosin (2):
>   ASoC: max9860: switch to SPDX license tag

This one didn't turn up yet - it's only just been sent though so it
might be stuck in a mail queue somewhere, I've applied patch 2 and I
expect I'll apply this one as soon as it appears.


Re: [PATCH RFC 2/8] mm: introduce PG_offline

2018-04-13 Thread Michal Hocko

On Fri 13-04-18 15:16:26, David Hildenbrand wrote:
> online_pages()/offline_pages() theoretically allows us to work on
> sub-section sizes. This is especially relevant in the context of
> virtualization. It e.g. allows us to add/remove memory to Linux in a VM in
> 4MB chunks.

Well, theoretically possible but this would require a lot of auditing
because the per-section assumption in hotplug is quite widespread.

> While the whole section is marked as online/offline, we have to know
> the state of each page. E.g. to not read memory that is not online
> during kexec() or to properly mark a section as offline as soon as all
> contained pages are offline.

But you cannot use a page flag for that, I am afraid. Page flags are
an extremely scarce resource. I haven't looked at the rest of the series
but _if_ we have a bit spare which I am not really sure about then you
should prove there are no other ways around this.

> Signed-off-by: David Hildenbrand 
-- 
Michal Hocko
SUSE Labs

Re: [PATCH 3/3] dcache: account external names as indirectly reclaimable memory

2018-04-13 Thread Minchan Kim

On Mon, Mar 05, 2018 at 01:37:43PM +, Roman Gushchin wrote:
> I was reported about suspicious growth of unreclaimable slabs
> on some machines. I've found that it happens on machines
> with low memory pressure, and these unreclaimable slabs
> are external names attached to dentries.
> 
> External names are allocated using generic kmalloc() function,
> so they are accounted as unreclaimable. But they are held
> by dentries, which are reclaimable, and they will be reclaimed
> under the memory pressure.
> 
> In particular, this breaks MemAvailable calculation, as it
> doesn't take unreclaimable slabs into account.
> This leads to a silly situation, when a machine is almost idle,
> has no memory pressure and therefore has a big dentry cache.
> And the resulting MemAvailable is too low to start a new workload.
> 
> To address the issue, the NR_INDIRECTLY_RECLAIMABLE_BYTES counter
> is used to track the amount of memory, consumed by external names.
> The counter is increased in the dentry allocation path, if an external
> name structure is allocated; and it's decreased in the dentry freeing
> path.
> 
> To reproduce the problem I've used the following Python script:
>   import os
> 
>   for iter in range (0, 1000):
>   try:
>   name = ("/some_long_name_%d" % iter) + "_" * 220
>   os.stat(name)
>   except Exception:
>   pass
> 
> Without this patch:
>   $ cat /proc/meminfo | grep MemAvailable
>   MemAvailable:7811688 kB
>   $ python indirect.py
>   $ cat /proc/meminfo | grep MemAvailable
>   MemAvailable:2753052 kB
> 
> With the patch:
>   $ cat /proc/meminfo | grep MemAvailable
>   MemAvailable:7809516 kB
>   $ python indirect.py
>   $ cat /proc/meminfo | grep MemAvailable
>   MemAvailable:7749144 kB
> 
> Signed-off-by: Roman Gushchin 
> Cc: Andrew Morton 
> Cc: Alexander Viro 
> Cc: Michal Hocko 
> Cc: Johannes Weiner 
> Cc: linux-fsde...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: kernel-t...@fb.com
> ---
>  fs/dcache.c | 29 -
>  1 file changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 5c7df1df81ff..a0312d73f575 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -273,8 +273,16 @@ static void __d_free(struct rcu_head *head)
>  static void __d_free_external(struct rcu_head *head)
>  {
>   struct dentry *dentry = container_of(head, struct dentry, d_u.d_rcu);
> - kfree(external_name(dentry));
> - kmem_cache_free(dentry_cache, dentry); 
> + struct external_name *name = external_name(dentry);
> + unsigned long bytes;
> +
> + bytes = dentry->d_name.len + offsetof(struct external_name, name[1]);
> + mod_node_page_state(page_pgdat(virt_to_page(name)),
> + NR_INDIRECTLY_RECLAIMABLE_BYTES,
> + -kmalloc_size(kmalloc_index(bytes)));
> +
> + kfree(name);
> + kmem_cache_free(dentry_cache, dentry);
>  }
>  
>  static inline int dname_external(const struct dentry *dentry)
> @@ -1598,6 +1606,7 @@ struct dentry *__d_alloc(struct super_block *sb, const 
> struct qstr *name)
>   struct dentry *dentry;
>   char *dname;
>   int err;
> + size_t reclaimable = 0;
>  
>   dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL);
>   if (!dentry)
> @@ -1614,9 +1623,11 @@ struct dentry *__d_alloc(struct super_block *sb, const 
> struct qstr *name)
>   name = &slash_name;
>   dname = dentry->d_iname;
>   } else if (name->len > DNAME_INLINE_LEN-1) {
> - size_t size = offsetof(struct external_name, name[1]);
> - struct external_name *p = kmalloc(size + name->len,
> -   GFP_KERNEL_ACCOUNT);
> + struct external_name *p;
> +
> + reclaimable = offsetof(struct external_name, name[1]) +
> + name->len;
> + p = kmalloc(reclaimable, GFP_KERNEL_ACCOUNT);

Can't we use kmem_cache_alloc() with a dedicated cache created with
SLAB_RECLAIM_ACCOUNT, if they are reclaimable?
That would help with the fragmentation problem, via __GFP_RECLAIMABLE for
page allocation, as well as with the accounting problem, IMHO.


>   if (!p) {
>   kmem_cache_free(dentry_cache, dentry); 
>   return NULL;
> @@ -1665,6 +1676,14 @@ struct dentry *__d_alloc(struct super_block *sb, const 
> struct qstr *name)
>   }
>   }
>  
> + if (unlikely(reclaimable)) {
> + pg_data_t *pgdat;
> +
> + pgdat = page_pgdat(virt_to_page(external_name(dentry)));
> + mod_node_page_state(pgdat, NR_INDIRECTLY_RECLAIMABLE_BYTES,
> + kmalloc_size(kmalloc_index(reclaimable)));
> + }
> +
>   this_cpu_inc(nr_dentry);
>  
>   return dentry;
> -- 
> 2.14.3
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more

Re: [RFC PATCH 11/35] ovl: readd read_iter

2018-04-13 Thread Amir Goldstein

On Thu, Apr 12, 2018 at 6:08 PM, Miklos Szeredi  wrote:
> Implement stacked reading.
>

I couldn't decipher the meaning of "readd" in the subject of this
and other file ops patches??

> Signed-off-by: Miklos Szeredi 
> ---
>  fs/overlayfs/file.c | 56 
> +
>  1 file changed, 56 insertions(+)
>
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 409b542ff30c..a19429c5965d 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "overlayfs.h"
>
>  static struct file *ovl_open_realfile(const struct file *file)
> @@ -129,8 +130,63 @@ static loff_t ovl_llseek(struct file *file, loff_t 
> offset, int whence)
> i_size_read(realinode));
>  }
>
> +static void ovl_file_accessed(struct file *file)
> +{
> +   struct inode *inode = file_inode(file);
> +
> +   if ((file->f_flags & O_NOATIME) || !ovl_inode_upper(inode))
> +   return;
> +
> +   ovl_copytimes(inode);
> +   touch_atime(&file->f_path);
> +}
> +
> +static rwf_t ovl_iocb_to_rwf(struct kiocb *iocb)
> +{
> +   int ifl = iocb->ki_flags;
> +   rwf_t flags = 0;
> +
> +   if (ifl & IOCB_NOWAIT)
> +   flags |= RWF_NOWAIT;
> +   if (ifl & IOCB_HIPRI)
> +   flags |= RWF_HIPRI;
> +   if (ifl & IOCB_DSYNC)
> +   flags |= RWF_DSYNC;
> +   if (ifl & IOCB_SYNC)
> +   flags |= RWF_SYNC;
> +
> +   return flags;
> +}
> +
> +static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> +{
> +   struct file *file = iocb->ki_filp;
> +   struct fd real;
> +   const struct cred *old_cred;
> +   ssize_t ret;
> +
> +   if (!iov_iter_count(iter))
> +   return 0;
> +
> +   ret = ovl_real_file(file, &real);
> +   if (ret)
> +   return ret;
> +
> +   old_cred = ovl_override_creds(file_inode(file)->i_sb);
> +   ret = vfs_iter_read(real.file, iter, &iocb->ki_pos,
> +   ovl_iocb_to_rwf(iocb));
> +   revert_creds(old_cred);
> +
> +   ovl_file_accessed(file);
> +
> +   fdput(real);

I find it confusing that the name of ovl_real_file() does not suggest that
it may take a reference, so this fdput() looks unbalanced.

All other ovl_XXX_{real,upper,lower} helpers do not take a reference.
Perhaps something along the lines of ovl_file_real_fdget().

Thanks,
Amir.

Re: [PATCH v9 00/24] Speculative page faults

2018-04-13 Thread Laurent Dufour

On 14/03/2018 14:11, Michal Hocko wrote:
> On Tue 13-03-18 18:59:30, Laurent Dufour wrote:
>> Changes since v8:
>>  - Don't check PMD when locking the pte when THP is disabled
>>Thanks to Daniel Jordan for reporting this.
>>  - Rebase on 4.16
> 
> Is this really worth reposting the whole pile? I mean this is at v9,
> each doing little changes. It is quite tiresome to barely get to a
> bookmarked version just to find out that there are 2 new versions out.
> 
> I am sorry to be grumpy and I can understand some frustration it doesn't
> move forward that easily but this is a _big_ change. We should start
> with a real high level review rather than doing small changes here and
> there and reach v20 quickly.

I know this would mean v10, but there have been a number of reviews from David
Rientjes and Jerome Glisse, and I had to make many changes to address them.
So I think it is time to push a v10.

If you have already started a review of this v9 series, please send me your
remarks so that I can compile them in this v10 asap.

Thanks,
Laurent.

[PATCH RFC 7/8] mm: allow to control onlining/offlining of memory by a driver

2018-04-13 Thread David Hildenbrand

Some devices (esp. paravirtualized) might want to control
- when to online/offline a memory block
- how to online memory (MOVABLE/NORMAL)
- in which granularity to online/offline memory

So let's add a new flag "driver_managed" and disallow changing the state
from user space. Device onlining/offlining will still work, but the memory
will not actually be onlined/offlined; that has to be handled by the
device driver that owns the memory.
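
To make the intended behaviour easier to follow, here is a minimal userspace sketch of the two gates described above. It is only a model under stated assumptions: the toy_* names and the struct are hypothetical, and the real checks live in store_mem_state() and memory_block_action() in the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct memory_block; only the fields we need. */
struct toy_memory_block {
	bool driver_managed;	/* driver handles online/offline */
	bool online;		/* simplified block state */
};

/*
 * Models the user-space (sysfs) path: state changes are refused with
 * -EINVAL when the block is driver managed.
 */
static int toy_user_set_state(struct toy_memory_block *mem, bool online)
{
	assert(mem != NULL);

	if (mem->driver_managed)
		return -EINVAL;

	mem->online = online;
	return 0;
}

/*
 * Models the device online/offline path: the call succeeds, but the
 * actual onlining/offlining is left to the owning driver.
 */
static int toy_block_action(struct toy_memory_block *mem, bool online)
{
	assert(mem != NULL);

	if (mem->driver_managed)
		return 0;

	mem->online = online;
	return 0;
}
```

With a driver-managed block, toy_user_set_state() fails and toy_block_action() returns 0 without touching the state; with a normal block both behave as before.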

Signed-off-by: David Hildenbrand 
---
 drivers/base/memory.c  | 22 ++
 drivers/xen/balloon.c  |  2 +-
 include/linux/memory.h |  1 +
 include/linux/memory_hotplug.h |  4 +++-
 mm/memory_hotplug.c| 34 --
 5 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bffe8616bd55..3b8616551561 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -231,27 +231,28 @@ static bool pages_correctly_probed(unsigned long 
start_pfn)
  * Must already be protected by mem_hotplug_begin().
  */
 static int
-memory_block_action(unsigned long phys_index, unsigned long action, int 
online_type)
+memory_block_action(struct memory_block *mem, unsigned long action)
 {
-   unsigned long start_pfn;
+   unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-   int ret;
+   int ret = 0;
 
-   start_pfn = section_nr_to_pfn(phys_index);
+   if (mem->driver_managed)
+   return 0;
 
switch (action) {
case MEM_ONLINE:
if (!pages_correctly_probed(start_pfn))
return -EBUSY;
 
-   ret = online_pages(start_pfn, nr_pages, online_type);
+   ret = online_pages(start_pfn, nr_pages, mem->online_type);
break;
case MEM_OFFLINE:
ret = offline_pages(start_pfn, nr_pages);
break;
default:
WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
-"%ld\n", __func__, phys_index, action, action);
+"%ld\n", __func__, mem->start_section_nr, action, action);
ret = -EINVAL;
}
 
@@ -269,8 +270,7 @@ static int memory_block_change_state(struct memory_block 
*mem,
if (to_state == MEM_OFFLINE)
mem->state = MEM_GOING_OFFLINE;
 
-   ret = memory_block_action(mem->start_section_nr, to_state,
-   mem->online_type);
+   ret = memory_block_action(mem, to_state);
 
mem->state = ret ? from_state_req : to_state;
 
@@ -350,6 +350,11 @@ store_mem_state(struct device *dev,
 */
mem_hotplug_begin();
 
+   if (mem->driver_managed) {
+   ret = -EINVAL;
+   goto out;
+   }
+
switch (online_type) {
case MMOP_ONLINE_KERNEL:
case MMOP_ONLINE_MOVABLE:
@@ -364,6 +369,7 @@ store_mem_state(struct device *dev,
ret = -EINVAL; /* should never happen */
}
 
+out:
mem_hotplug_done();
 err:
unlock_device_hotplug();
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 065f0b607373..89981d573c06 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -401,7 +401,7 @@ static enum bp_state reserve_additional_memory(void)
 * callers drop the mutex before trying again.
 */
mutex_unlock(&balloon_mutex);
-   rc = add_memory_resource(nid, resource, memhp_auto_online);
+   rc = add_memory_resource(nid, resource, memhp_auto_online, false);
mutex_lock(&balloon_mutex);
 
if (rc) {
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 9f8cd856ca1e..018c5e5ecde1 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -29,6 +29,7 @@ struct memory_block {
unsigned long state;/* serialized by the dev->lock */
int section_count;  /* serialized by mem_sysfs_mutex */
int online_type;/* for passing data to online routine */
+   bool driver_managed;/* driver handles online/offline */
int phys_device;/* to which fru does this belong? */
void *hw;   /* optional pointer to fw/hw data */
int (*phys_callback)(struct memory_block *);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e0e49b5b1ee1..46c6ceb1110d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -320,7 +320,9 @@ static inline void remove_memory(int nid, u64 start, u64 
size) {}
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
void *arg, int (*func)(struct memory_block *, void *));
 extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource, bool 
online);
+

[PATCH RFC 5/8] mm: only mark section offline when all pages are offline

2018-04-13 Thread David Hildenbrand

If any page is still online, the section should stay online.

Signed-off-by: David Hildenbrand 
---
 mm/page_alloc.c |  2 +-
 mm/sparse.c | 25 -
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e5dcfdb0908..ae9023da2ca2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8013,7 +8013,6 @@ __offline_isolated_pages(unsigned long start_pfn, 
unsigned long end_pfn)
break;
if (pfn == end_pfn)
return;
-   offline_mem_sections(pfn, end_pfn);
zone = page_zone(pfn_to_page(pfn));
spin_lock_irqsave(&zone->lock, flags);
pfn = start_pfn;
@@ -8051,6 +8050,7 @@ __offline_isolated_pages(unsigned long start_pfn, 
unsigned long end_pfn)
pfn += (1 << order);
}
spin_unlock_irqrestore(&zone->lock, flags);
+   offline_mem_sections(start_pfn, end_pfn);
 }
 #endif
 
diff --git a/mm/sparse.c b/mm/sparse.c
index 58cab483e81b..44978cb18fed 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -623,7 +623,27 @@ void online_mem_sections(unsigned long start_pfn, unsigned 
long end_pfn)
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-/* Mark all memory sections within the pfn range as online */
+static bool all_pages_in_section_offline(unsigned long section_nr)
+{
+   unsigned long pfn = section_nr_to_pfn(section_nr);
+   struct page *page;
+   int i;
+
+   for (i = 0; i < PAGES_PER_SECTION; i++, pfn++) {
+   if (!pfn_valid(pfn))
+   continue;
+
+   page = pfn_to_page(pfn);
+   if (!PageOffline(page))
+   return false;
+   }
+   return true;
+}
+
+/*
+ * Mark all memory sections within the pfn range as offline (if all pages
+ * of a memory section are already offline)
+ */
 void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 {
unsigned long pfn;
@@ -639,6 +659,9 @@ void offline_mem_sections(unsigned long start_pfn, unsigned 
long end_pfn)
if (WARN_ON(!valid_section_nr(section_nr)))
continue;
 
+   if (!all_pages_in_section_offline(section_nr))
+   continue;
+
ms = __nr_to_section(section_nr);
ms->section_mem_map &= ~SECTION_IS_ONLINE;
}
-- 
2.14.3

[PATCH RFC 8/8] mm: export more functions used to online/offline memory

2018-04-13 Thread David Hildenbrand

Kernel modules that want to control how/when memory is onlined/offlined
need these functions.

Signed-off-by: David Hildenbrand 
---
 mm/memory_hotplug.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ac14ea772792..3c374d308cf4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -979,6 +979,7 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages, int online_typ
memory_notify(MEM_CANCEL_ONLINE, &arg);
return ret;
 }
+EXPORT_SYMBOL(online_pages);
 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
 
 static void reset_node_present_pages(pg_data_t *pgdat)
@@ -1296,6 +1297,7 @@ bool is_mem_section_removable(unsigned long start_pfn, 
unsigned long nr_pages)
/* All pageblocks in the memory block are likely to be hot-removable */
return true;
 }
+EXPORT_SYMBOL(is_mem_section_removable);
 
 /*
  * Confirm all pages in a range [start, end) belong to the same zone.
@@ -1752,6 +1754,7 @@ int offline_pages(unsigned long start_pfn, unsigned long 
nr_pages)
 {
return __offline_pages(start_pfn, start_pfn + nr_pages);
 }
+EXPORT_SYMBOL(offline_pages);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /**
@@ -1802,6 +1805,7 @@ int walk_memory_range(unsigned long start_pfn, unsigned 
long end_pfn,
 
return 0;
 }
+EXPORT_SYMBOL(walk_memory_range);
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
-- 
2.14.3

[PATCH RFC 6/8] mm: offline_pages() is also limited by MAX_ORDER

2018-04-13 Thread David Hildenbrand

Page blocks might contain references to the next page block. So
a page block cannot be offlined independently. E.g. on x86: page block
size is 2MB, MAX_ORDER -1 (10) allows 4MB allocations.
-> Right now, __offline_isolated_pages() will mark pages in the following
page block as reserved.

Let's document offline_pages() while at it.
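
The arithmetic behind the x86 example can be sketched in plain userspace C. The constants below are assumptions taken from the numbers above (4 KiB pages, 2MB page blocks, MAX_ORDER of 11), and the toy_* helpers are hypothetical; they only mirror the alignment checks added in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86-64 values, matching the example in the text. */
#define TOY_PAGE_SHIFT		12	/* 4 KiB pages */
#define TOY_MAX_ORDER		11	/* largest buddy order is 10 */
#define TOY_PAGEBLOCK_NR_PAGES	512UL	/* 2 MiB page block */

/* Granule in pages: max(pageblock_nr_pages, 1 << (MAX_ORDER - 1)). */
static unsigned long toy_offline_granule_pages(void)
{
	unsigned long max_order_pages = 1UL << (TOY_MAX_ORDER - 1);

	return max_order_pages > TOY_PAGEBLOCK_NR_PAGES ?
	       max_order_pages : TOY_PAGEBLOCK_NR_PAGES;
}

/* Same granule expressed in bytes. */
static unsigned long toy_offline_granule_bytes(void)
{
	return toy_offline_granule_pages() << TOY_PAGE_SHIFT;
}

/* True if both ends of [start_pfn, start_pfn + nr_pages) are aligned. */
static bool toy_offline_range_aligned(unsigned long start_pfn,
				      unsigned long nr_pages)
{
	unsigned long granule = toy_offline_granule_pages();

	/* The mask check below is only valid for power-of-two granules. */
	assert((granule & (granule - 1)) == 0);

	return !(start_pfn & (granule - 1)) &&
	       !((start_pfn + nr_pages) & (granule - 1));
}
```

Under these assumptions the granule is 1024 pages (4MB), so a range that starts on a 2MB page block boundary but not on a 4MB boundary is rejected.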

Signed-off-by: David Hildenbrand 
---
 mm/memory_hotplug.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3a8d56476233..1d6054edc241 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1598,11 +1598,14 @@ static int __ref __offline_pages(unsigned long 
start_pfn,
struct zone *zone;
struct memory_notify arg;
 
-   /* at least, alignment against pageblock is necessary */
if (!IS_ALIGNED(start_pfn, pageblock_nr_pages))
return -EINVAL;
+   if (!IS_ALIGNED(start_pfn, (1 << (MAX_ORDER - 1))))
+   return -EINVAL;
if (!IS_ALIGNED(end_pfn, pageblock_nr_pages))
return -EINVAL;
+   if (!IS_ALIGNED(end_pfn, (1 << (MAX_ORDER - 1))))
+   return -EINVAL;
/* This makes hotplug much easier...and readable.
   we assume this for now. .*/
if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end))
@@ -1699,7 +1702,22 @@ static int __ref __offline_pages(unsigned long start_pfn,
return ret;
 }
 
-/* Must be protected by mem_hotplug_begin() or a device_lock */
+/**
+ * offline_pages - offline pages in a given range (that are currently online)
+ * @start_pfn: start pfn of the memory range
+ * @nr_pages: the number of pages
+ *
+ * This function tries to offline the given pages. The alignment/size that
+ * can be used is max(pageblock_nr_pages, 1 << (MAX_ORDER - 1)).
+ *
+ * Returns 0 if successful, -EBUSY if the pages cannot be offlined and
+ * -EINVAL if start_pfn/nr_pages is not properly aligned or not in a zone.
+ * -EINTR is returned if interrupted by a signal.
+ *
+ * Bad things will happen if pages are already offline.
+ *
+ * Must be protected by mem_hotplug_begin() or a device_lock
+ */
 int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
return __offline_pages(start_pfn, start_pfn + nr_pages);
-- 
2.14.3

[PATCH RFC 4/8] kdump: expose PG_offline

2018-04-13 Thread David Hildenbrand

This allows user space to skip pages that are offline when dumping. This is
especially relevant when dealing with pages that have been unplugged in
the context of virtualization, and their backing storage has already
been freed.

Signed-off-by: David Hildenbrand 
---
 kernel/crash_core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index a93590cdd9e1..d6f21b19aeb3 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -463,6 +463,9 @@ static int __init crash_save_vmcoreinfo_init(void)
 #ifdef CONFIG_HUGETLB_PAGE
VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
 #endif
+#ifdef CONFIG_MEMORY_HOTPLUG
+   VMCOREINFO_NUMBER(PG_offline);
+#endif
 
arch_crash_save_vmcoreinfo();
update_vmcoreinfo_note();
-- 
2.14.3

Re: [PATCH] x86/mm: vmemmap and vmalloc base addressess are usngined longs

2018-04-13 Thread Kirill A. Shutemov

On Thu, Apr 12, 2018 at 02:39:10PM +0200, Jiri Kosina wrote:
> From: Jiri Kosina 
> 
> Commits 9b46a051e4 ("x86/mm: Initialize vmemmap_base at boot-time") and 
> a7412546d8 ("x86/mm: Adjust vmalloc base and size at boot-time") lost the 
> type information for __VMALLOC_BASE_L4, __VMALLOC_BASE_L5, 
> __VMEMMAP_BASE_L4 and __VMEMMAP_BASE_L5 constants.
> 
> Let's declare them explicitly unsigned long again.

It is just cosmetics, right? I mean these literals are 'unsigned long'
anyway.
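
For what it's worth, the C rule behind this can be checked with a small C11 sketch. It assumes an LP64 target (64-bit long, as on x86-64 Linux); the constant is an illustrative value assumed to resemble the 4-level vmalloc base, and the TOY_* names are made up for the example.

```c
#include <assert.h>

/* Illustrative unsuffixed hex constant; assumed value, not from the patch. */
#define TOY_VMALLOC_BASE_L4	0xffffc90000000000

/* Evaluates to 1 if the expression has type unsigned long, else 0. */
#define TOY_IS_UNSIGNED_LONG(x)	_Generic((x), unsigned long: 1, default: 0)

/*
 * An unsuffixed hex constant takes the first type that can hold its
 * value from: int, unsigned int, long, unsigned long, long long,
 * unsigned long long. On LP64 the value above does not fit in long,
 * so it ends up as unsigned long without any suffix.
 */
static void toy_check_literal_types(void)
{
	/* Small constants are plain int, so the type depends on the value. */
	assert(!TOY_IS_UNSIGNED_LONG(0x1000));
	/* An explicit suffix fixes the type regardless of the value. */
	assert(TOY_IS_UNSIGNED_LONG(0x1000UL));
	/* Large constant: unsigned long on LP64 even without a suffix. */
	assert(TOY_IS_UNSIGNED_LONG(TOY_VMALLOC_BASE_L4));
}
```

So on LP64 the suffix does not change the type of such a large constant; it only makes the type explicit instead of value dependent.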

-- 
 Kirill A. Shutemov

Build error for samples/bpf/ due to commit d0266046ad54 ("x86: Remove FAST_FEATURE_TESTS")

2018-04-13 Thread Jesper Dangaard Brouer

Hi Peter,

Your commit d0266046ad54 ("x86: Remove FAST_FEATURE_TESTS") broke build
for several samples/bpf programs. I'm unsure what the best way forward
is to unbreak these...

The issue is that these samples are built with LLVM/clang (which
doesn't like 'asm goto' constructs).  And they end up including
arch/x86/include/asm/cpufeature.h via a long include path, see build
examples below (through different paths to include/linux/thread_info.h).

Maybe Alexei or Daniel have an idea how to work around this?
As tools/testing/selftests/bpf/ does not seem to fail!?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Build error#1:
--
clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include 
-I./arch/x86/include -I./arch/x86/include/generated  -I./include 
-I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi 
-I./include/generated/uapi -include ./include/linux/kconfig.h  -Isamples/bpf \
-I./tools/testing/selftests/bpf/ \
-D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
-D__TARGET_ARCH_x86 -Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \
-Wno-unknown-warning-option  \
-O2 -emit-llvm -c samples/bpf/sockex2_kern.c -o -| llc -march=bpf 
-filetype=obj -o samples/bpf/sockex2_kern.o
In file included from samples/bpf/sockex2_kern.c:3:
In file included from ./include/uapi/linux/in.h:24:
In file included from ./include/linux/socket.h:8:
In file included from ./include/linux/uio.h:13:
In file included from ./include/linux/thread_info.h:38:
In file included from ./arch/x86/include/asm/thread_info.h:53:
./arch/x86/include/asm/cpufeature.h:150:2: error: 'asm goto' constructs are not 
supported yet
asm_volatile_goto("1: jmp 6f\n"
^
./include/linux/compiler-gcc.h:290:42: note: expanded from macro 
'asm_volatile_goto'
#define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
 ^


Build error#2:
--
clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include 
-I./arch/x86/include -I./arch/x86/include/generated  -I./include 
-I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi 
-I./include/generated/uapi -include ./include/linux/kconfig.h  -Isamples/bpf \
-I./tools/testing/selftests/bpf/ \
-D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
-D__TARGET_ARCH_x86 -Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \
-Wno-unknown-warning-option  \
-O2 -emit-llvm -c samples/bpf/tracex1_kern.c -o -| llc -march=bpf 
-filetype=obj -o samples/bpf/tracex1_kern.o
In file included from samples/bpf/tracex1_kern.c:7:
In file included from ./include/linux/skbuff.h:19:
In file included from ./include/linux/time.h:6:
In file included from ./include/linux/seqlock.h:36:
In file included from ./include/linux/spinlock.h:51:
In file included from ./include/linux/preempt.h:81:
In file included from ./arch/x86/include/asm/preempt.h:7:
In file included from ./include/linux/thread_info.h:38:
In file included from ./arch/x86/include/asm/thread_info.h:53:
./arch/x86/include/asm/cpufeature.h:150:2: error: 'asm goto' constructs are not 
supported yet
asm_volatile_goto("1: jmp 6f\n"
^
./include/linux/compiler-gcc.h:290:42: note: expanded from macro 
'asm_volatile_goto'
#define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
 ^


Build error#3:
--
clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include 
-I./arch/x86/include -I./arch/x86/include/generated  -I./include 
-I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi 
-I./include/generated/uapi -include ./include/linux/kconfig.h  -Isamples/bpf \
-I./tools/testing/selftests/bpf/ \
-D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
-D__TARGET_ARCH_x86 -Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \
-Wno-unknown-warning-option  \
-O2 -emit-llvm -c samples/bpf/xdp1_kern.c -o -| llc -march=bpf 
-filetype=obj -o samples/bpf/xdp1_kern.o
In file included from samples/bpf/xdp1_kern.c:9:
In file included from ./include/linux/in.h:23:
In file included from ./include/uapi/linux/in.h:24:
In file included from ./include/linux/socket.h:8:
In file included from ./include/linux/uio.h:13:
In file included from ./include/linux/thread_info.h:38:
In file included from ./arch/x86/include/asm/thread_info.h:53:
./arch/x86/include/asm/cpufeature.h:150:2: error: 'asm goto' constructs are not 
supported yet
asm_volatile_goto("1: jmp 6f\n

Re: [PATCH v2 12/17] kvm: arm/arm64: Expose supported physical address limit for VM

2018-04-13 Thread Peter Maydell

On 27 March 2018 at 14:15, Suzuki K Poulose  wrote:
> Expose the maximum physical address size supported by the host
> for a VM. This could be later used by the userspace to choose the
> appropriate size for a given VM. The limit is determined as the
> minimum of actual CPU limit, the kernel limit (i.e, either 48 or 52)
> and the stage2 page table support limit (which is 40bits at the moment).
> For backward compatibility, we support a minimum of 40bits. The limit
> will be lifted as we add support for the stage2 to support the host
> kernel PA limit.
>
> This value may be different from what is exposed to the VM via
> CPU ID registers. The limit only applies to the stage2 page table.
>
> Cc: Christoffer Dall 
> Cc: Marc Zyngier 
> Cc: Peter Maydel 
> Signed-off-by: Suzuki K Poulose 
> ---
>  Documentation/virtual/kvm/api.txt | 14 ++
>  arch/arm/include/asm/kvm_mmu.h|  5 +
>  arch/arm64/include/asm/kvm_mmu.h  |  5 +
>  include/uapi/linux/kvm.h  |  6 ++
>  virt/kvm/arm/arm.c|  6 ++
>  5 files changed, 36 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 792fa87..55908a8 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3500,6 +3500,20 @@ Returns: 0 on success; -1 on error
>  This ioctl can be used to unregister the guest memory region registered
>  with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
>
> +4.113 KVM_ARM_GET_MAX_VM_PHYS_SHIFT
> +Capability: basic
> +Architectures: arm, arm64
> +Type: system ioctl
> +Parameters: none
> +Returns: log2(Maximum physical address space size) supported by the
> +hyperviosr.

typo: "hypervisor".

> +
> +This ioctl can be used to identify the maximum physical address space size
> +supported by the hypervisor.

Is that the physical address space on the host, or the physical
address space size we present to the guest?

> The returned value indicates the maximum size
> +of the address that can be resolved by the stage2 translation table on
> +arm/arm64. On arm64, the value is decided based on the host kernel
> +configuration and the system wide safe value of ID_AA64MMFR0_EL1:PARange.
> +This may not match the value exposed to the VM in CPU ID registers.

Isn't it likely to confuse the guest if we lie to it about the PA range it
sees? When would the two values differ?

Do we also need a 'set' operation, so userspace can create a VM
that has a 40 bit userspace on a CPU that supports more than that,
or does it just work?

What's the x86 API for KVM to tell userspace about physical address
range restrictions?

thanks
-- PMM
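
For reference, the limit computation described in the quoted commit message (the minimum of the CPU limit, the kernel limit and the stage2 limit, with a floor of 40 bits) can be sketched as below. This is a userspace illustration of my reading of that description, not the code from the patch; the function name, parameter names and TOY_MIN_PHYS_SHIFT are hypothetical.

```c
#include <assert.h>

/* Floor kept for backward compatibility, per the commit message. */
#define TOY_MIN_PHYS_SHIFT 40U

static unsigned int toy_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Returns log2 of the largest physical address space size a VM could use:
 * min(cpu, kernel, stage2), but never less than TOY_MIN_PHYS_SHIFT.
 */
static unsigned int toy_max_vm_phys_shift(unsigned int cpu_shift,
					  unsigned int kernel_shift,
					  unsigned int stage2_shift)
{
	unsigned int limit;

	/* A zero shift would mean "no address bits", which makes no sense. */
	assert(cpu_shift && kernel_shift && stage2_shift);

	limit = toy_min(cpu_shift, toy_min(kernel_shift, stage2_shift));

	return limit < TOY_MIN_PHYS_SHIFT ? TOY_MIN_PHYS_SHIFT : limit;
}
```

With the stage2 limit at 40 bits, as in the current series, the result is 40 whatever the CPU and kernel support; it only grows once stage2 can cover the host kernel PA limit.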

Re: [RFC tip/locking/lockdep v6 19/20] rcu: Equip sleepable RCU with lockdep dependency graph checks

2018-04-13 Thread Boqun Feng

On Thu, Apr 12, 2018 at 11:12:17AM +0200, Peter Zijlstra wrote:
> On Thu, Apr 12, 2018 at 10:12:33AM +0800, Boqun Feng wrote:
> > A trivial fix/hack would be adding local_irq_disable() and
> > local_irq_enable() around srcu_lock_sync() like:
> > 
> > static inline void srcu_lock_sync(struct lockdep_map *map)
> > {
> > local_irq_disable();
> > lock_map_acquire(map);
> > lock_map_release(map);
> > local_irq_enable();
> > }
> > 
> > However, it might be better if lockdep could provide some annotation
> > API for such an empty critical section to say the grab-and-drop is
> > atomic. Something like:
> > 
> > /*
> >  * Annotate a wait point for all previous critical section to
> >  * go out.
> >  * 
> >  * This won't make @map a irq unsafe lock, no matter it's called
> >  * w/ or w/o irq disabled.
> >  */
> > lock_wait_unlock(struct lockdep_map *map, ..)
> > 
> > And in this primitive, we do something similar to
> > lock_acquire()+lock_release(). This primitive could be used elsewhere,
> > as I believe we have several empty grab-and-drop critical sections for
> > lockdep annotations, e.g. in start_flush_work().
> > 
> > Thoughts?
> > 
> > This certainly requires a bit more work; in the meanwhile, I will add
> > another self testcase which has a srcu_read_lock() called in irq.
> 
> Yeah, I've never really bothered to clean those things up, but I don't
> see any reason to stop you from doing it ;-)
> 
> As to the initial pattern with disabling IRQs, I think I've seen code
> like that before, and in general performance isn't a top priority

Yeah, I saw we used that pattern in del_timer_sync()

> (within reason) when you're running lockdep kernels, so I've usually let
> it be.

Turns out it's not very hard to write a working version of
lock_wait_unlock() ;-) Just call __lock_acquire() and __lock_release()
back-to-back with the @hardirqoff for __lock_acquire() to be 1:

/*
 * lock_sync() - synchronize with all previous critical sections to finish.
 *
 * Simply an acquire+release annotation with hardirqoff set to true: because
 * no lock is actually held, this annotation alone is safe to be interrupted,
 * as if irqs are off.
 */
void lock_sync(struct lockdep_map *lock, unsigned subclass, int read,
	       int check, struct lockdep_map *nest_lock, unsigned long ip)
{
	unsigned long flags;

	if (unlikely(current->lockdep_recursion))
		return;

	raw_local_irq_save(flags);
	check_flags(flags);

	current->lockdep_recursion = 1;
	__lock_acquire(lock, subclass, 0, read, check, 1, nest_lock, ip, 0, 0);
	if (__lock_release(lock, 0, ip))
		check_chain_key(current);

	current->lockdep_recursion = 0;
	raw_local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(lock_sync);

I renamed it to lock_sync(), because most of the time we use this
annotation for a "sync point" with other critical sections. We can avoid
some overhead if we refactor __lock_acquire() and __lock_release() with
some helper functions, but I think this version is good enough for now, at
least better than disabling IRQs around lock_map_acquire() +
lock_map_release() ;-)

Thoughts?

Regards,
Boqun




[GIT PULL] s390 patches for the 4.17 merge window #2

2018-04-13 Thread Martin Schwidefsky

Hi Linus,

please pull from the 'for-linus' branch of

git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git for-linus

to receive the following updates:

Three notable larger changes next to the usual bug fixing:

* Update the email addresses in MAINTAINERS for the s390 folks to use
  the simpler linux.ibm.com domain instead of the old linux.vnet.ibm.com

* An update for the zcrypt device driver that removes some old and
  obsolete interfaces and add support for up to 256 crypto adapters

* A rework of the IPL aka boot code

Harald Freudenberger (6):
  s390/crypto: Adjust s390 aes and paes cipher priorities
  s390/zcrypt: remove unused functions and declarations
  s390/zcrypt: Make ap init functions static.
  s390/zcrypt: Remove deprecated ioctls.
  s390/zcrypt: Remove deprecated zcrypt proc interface.
  s390/zcrypt: Support up to 256 crypto adapters.

Heiko Carstens (2):
  s390/compat: fix setup_frame32
  MAINTAINERS: update s390 maintainers email addresses

Julian Wiedmann (3):
  s390/ccwgroup: require at least one ccw device
  s390/qdio: clear intparm during shutdown
  s390/qdio: lock device while installing IRQ handler

Martin Schwidefsky (1):
  s390: correct nospec auto detection init order

Vasily Gorbik (11):
  s390/ipl: ensure loadparm valid flag is set
  s390/ipl: unite diag308 and scsi boot ipl blocks
  s390/ipl: get rid of ipl_ssid and ipl_devno
  s390/ipl: move ipl_flags to ipl.c
  s390/ipl: rely on diag308 store to get ipl info
  s390/ipl: correct ipl parmblock valid checks
  s390/ipl: avoid adding scpdata to cmdline during ftp/dvd boot
  s390: assume diag308 set always works
  s390/ipl: remove non-existing functions declaration
  s390/ipl: correct kdump reipl block checksum calculation
  s390/ipl: remove reipl_method and dump_method

 MAINTAINERS   |  34 +--
 arch/s390/boot/compressed/misc.c  |  23 --
 arch/s390/crypto/aes_s390.c   |   8 +-
 arch/s390/crypto/paes_s390.c  |   8 +-
 arch/s390/include/asm/ap.h|   6 +-
 arch/s390/include/asm/cio.h   |  10 -
 arch/s390/include/asm/ipl.h   |  25 +-
 arch/s390/include/asm/nospec-branch.h |   1 +
 arch/s390/include/asm/reset.h |  20 --
 arch/s390/include/uapi/asm/zcrypt.h   | 163 ++--
 arch/s390/kernel/compat_signal.c  |   2 +-
 arch/s390/kernel/early.c  |  14 +-
 arch/s390/kernel/ipl.c| 376 +--
 arch/s390/kernel/machine_kexec.c  |   2 +-
 arch/s390/kernel/nospec-branch.c  |   8 +-
 arch/s390/kernel/reipl.S  |  87 ---
 arch/s390/kernel/relocate_kernel.S|  54 +---
 arch/s390/kernel/setup.c  |   3 +
 drivers/s390/cio/ccwgroup.c   |   5 +-
 drivers/s390/cio/cio.c| 257 ---
 drivers/s390/cio/ioasm.c  |  24 --
 drivers/s390/cio/ioasm.h  |   1 -
 drivers/s390/cio/qdio_main.c  |   4 +-
 drivers/s390/cio/qdio_setup.c |   2 +
 drivers/s390/crypto/ap_bus.c  |  32 +--
 drivers/s390/crypto/ap_bus.h  |   5 +-
 drivers/s390/crypto/ap_debug.h|   3 -
 drivers/s390/crypto/pkey_api.c|  41 +--
 drivers/s390/crypto/zcrypt_api.c  | 471 ++
 drivers/s390/crypto/zcrypt_api.h  |  26 +-
 30 files changed, 357 insertions(+), 1358 deletions(-)
 delete mode 100644 arch/s390/include/asm/reset.h

[PATCH RFC 2/8] mm: introduce PG_offline

2018-04-13 Thread David Hildenbrand

online_pages()/offline_pages() theoretically allows us to work on
sub-section sizes. This is especially relevant in the context of
virtualization. It e.g. allows us to add/remove memory to Linux in a VM in
4MB chunks.

While the whole section is marked as online/offline, we have to know
the state of each page. E.g. to not read memory that is not online
during kexec() or to properly mark a section as offline as soon as all
contained pages are offline.

Signed-off-by: David Hildenbrand 
---
 include/linux/page-flags.h | 10 ++
 include/trace/events/mmflags.h |  9 -
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e34a27727b9a..8ebc4bad7824 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -49,6 +49,9 @@
  * PG_hwpoison indicates that a page got corrupted in hardware and contains
  * data with incorrect ECC bits that triggered a machine check. Accessing is
  * not safe since it may cause another machine check. Don't touch!
+ *
+ * PG_offline indicates that a page is offline and the backing storage
+ * might already have been removed (virtualization). Don't touch!
  */
 
 /*
@@ -100,6 +103,9 @@ enum pageflags {
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
PG_young,
PG_idle,
+#endif
+#ifdef CONFIG_MEMORY_HOTPLUG
+   PG_offline, /* Page is offline. Don't touch */
 #endif
__NR_PAGEFLAGS,
 
@@ -381,6 +387,10 @@ TESTCLEARFLAG(Young, young, PF_ANY)
 PAGEFLAG(Idle, idle, PF_ANY)
 #endif
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+PAGEFLAG(Offline, offline, PF_ANY)
+#endif
+
 /*
  * On an anonymous page mapped into a user virtual memory area,
  * page->mapping points to its anon_vma, not to a struct address_space;
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a81cffb76d89..14c31209e34a 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -79,6 +79,12 @@
 #define IF_HAVE_PG_IDLE(flag,string)
 #endif
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+#define IF_HAVE_PG_OFFLINE(flag,string) ,{1UL << flag, string}
+#else
+#define IF_HAVE_PG_OFFLINE(flag,string)
+#endif
+
 #define __def_pageflag_names   \
{1UL << PG_locked,  "locked"},  \
{1UL << PG_waiters, "waiters"   },  \
@@ -104,7 +110,8 @@ IF_HAVE_PG_MLOCK(PG_mlocked,"mlocked"   
)   \
 IF_HAVE_PG_UNCACHED(PG_uncached,   "uncached"  )   \
 IF_HAVE_PG_HWPOISON(PG_hwpoison,   "hwpoison"  )   \
 IF_HAVE_PG_IDLE(PG_young,  "young" )   \
-IF_HAVE_PG_IDLE(PG_idle,   "idle"  )
+IF_HAVE_PG_IDLE(PG_idle,   "idle"  )   \
+IF_HAVE_PG_OFFLINE(PG_offline, "offline"   )
 
 #define show_page_flags(flags) \
(flags) ? __print_flags(flags, "|", \
-- 
2.14.3
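
For readers unfamiliar with PAGEFLAG(): below is a minimal user-space
sketch of the kind of accessors that PAGEFLAG(Offline, offline, PF_ANY)
provides (PageOffline(), SetPageOffline(), ClearPageOffline()). The names
page_model, PGM_* and PAGEFLAG_MODEL are made up for illustration. The
real macro in include/linux/page-flags.h uses atomic bit operations and
honours the page policy argument, both of which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the flags word is modelled. */
struct page_model {
	unsigned long flags;
};

/* Illustrative bit positions; the real enum pageflags has many more. */
enum pageflags_model {
	PGM_reserved,
	PGM_offline,
	NR_PGM_FLAGS,
};

/*
 * Simplified PAGEFLAG(): generates the test/set/clear helpers for one
 * flag. Not atomic, and there is no policy argument such as PF_ANY.
 */
#define PAGEFLAG_MODEL(uname, lname)					\
static inline bool Page##uname(const struct page_model *p)		\
{									\
	return (p->flags & (1UL << PGM_##lname)) != 0;			\
}									\
static inline void SetPage##uname(struct page_model *p)			\
{									\
	p->flags |= 1UL << PGM_##lname;					\
}									\
static inline void ClearPage##uname(struct page_model *p)		\
{									\
	p->flags &= ~(1UL << PGM_##lname);				\
}

PAGEFLAG_MODEL(Reserved, reserved)
PAGEFLAG_MODEL(Offline, offline)
```

In this model the two flags are independent bits, so a page can be
reserved without being offline. That independence is what allows callers
to tell the two states apart.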

[PATCH RFC 1/8] mm/memory_hotplug: Revert "mm/memory_hotplug: optimize memory hotplug"

2018-04-13 Thread David Hildenbrand

This optimization conflicts with the possibility of onlining sub-section
chunks. Revert it for now.

Signed-off-by: David Hildenbrand 
---
 drivers/base/node.c|  2 --
 include/linux/memory.h |  1 -
 mm/memory_hotplug.c| 27 +++
 mm/page_alloc.c| 28 ++--
 mm/sparse.c|  8 +---
 5 files changed, 38 insertions(+), 28 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 7a3a580821e0..92b00a7e6a02 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -407,8 +407,6 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid,
 
if (!mem_blk)
return -EFAULT;
-
-   mem_blk->nid = nid;
if (!node_online(nid))
return 0;
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31ca3e28b0eb..9f8cd856ca1e 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -33,7 +33,6 @@ struct memory_block {
void *hw;   /* optional pointer to fw/hw data */
int (*phys_callback)(struct memory_block *);
struct device dev;
-   int nid;/* NID for this memory block */
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f74826cdceea..d4474781c799 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -250,6 +250,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
struct vmem_altmap *altmap, bool want_memblock)
 {
int ret;
+   int i;
 
if (pfn_valid(phys_start_pfn))
return -EEXIST;
@@ -258,6 +259,23 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
if (ret < 0)
return ret;
 
+   /*
+* Make all the pages reserved so that nobody will stumble over half
+* initialized state.
+* FIXME: We also have to associate it with a node because page_to_nid
+* relies on having page with the proper node.
+*/
+   for (i = 0; i < PAGES_PER_SECTION; i++) {
+   unsigned long pfn = phys_start_pfn + i;
+   struct page *page;
+   if (!pfn_valid(pfn))
+   continue;
+
+   page = pfn_to_page(pfn);
+   set_page_node(page, nid);
+   SetPageReserved(page);
+   }
+
if (!want_memblock)
return 0;
 
@@ -891,15 +909,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
int nid;
int ret;
struct memory_notify arg;
-   struct memory_block *mem;
-
-   /*
-* We can't use pfn_to_nid() because nid might be stored in struct page
-* which is not yet initialized. Instead, we find nid from memory block.
-*/
-   mem = find_memory_block(__pfn_to_section(pfn));
-   nid = mem->nid;
 
+   nid = pfn_to_nid(pfn);
/* associate pfn range with the zone */
zone = move_pfn_range(online_type, nid, pfn, nr_pages);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 905db9d7962f..647c8c6dd4d1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1172,9 +1172,10 @@ static void free_one_page(struct zone *zone,
 }
 
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
-   unsigned long zone, int nid)
+   unsigned long zone, int nid, bool zero)
 {
-   mm_zero_struct_page(page);
+   if (zero)
+   mm_zero_struct_page(page);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
page_mapcount_reset(page);
@@ -1188,6 +1189,12 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 #endif
 }
 
+static void __meminit __init_single_pfn(unsigned long pfn, unsigned long zone,
+   int nid, bool zero)
+{
+   return __init_single_page(pfn_to_page(pfn), pfn, zone, nid, zero);
+}
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 static void __meminit init_reserved_page(unsigned long pfn)
 {
@@ -1206,7 +1213,7 @@ static void __meminit init_reserved_page(unsigned long pfn)
if (pfn >= zone->zone_start_pfn && pfn < zone_end_pfn(zone))
break;
}
-   __init_single_page(pfn_to_page(pfn), pfn, zid, nid);
+   __init_single_pfn(pfn, zid, nid, true);
 }
 #else
 static inline void init_reserved_page(unsigned long pfn)
@@ -1523,7 +1530,7 @@ static unsigned long  __init deferred_init_pages(int nid, int zid,
} else {
page++;
}
-   __init_single_page(page, pfn, zid, nid);
+   __init_single_page(page, pfn, zid, nid, true);
nr_pages++;
}
return (nr_pages);
@@ -5460,7 +5467,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, un

[PATCH RFC 3/8] mm: use PG_offline in online/offlining code

2018-04-13 Thread David Hildenbrand

Let's mark all offline pages with PG_offline. We'll continue to mark
them reserved.

Signed-off-by: David Hildenbrand 
---
 drivers/hv/hv_balloon.c |  2 +-
 mm/memory_hotplug.c | 10 ++
 mm/page_alloc.c |  5 -
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index b3e9f13f8bc3..04d98d9b6191 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -893,7 +893,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 * backed previously) online too.
 */
if (start_pfn > has->start_pfn &&
-   !PageReserved(pfn_to_page(start_pfn - 1)))
+   !PageOffline(pfn_to_page(start_pfn - 1)))
hv_bring_pgs_online(has, start_pfn, pgs_ol);
 
}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d4474781c799..3a8d56476233 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -260,8 +260,8 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
return ret;
 
/*
-* Make all the pages reserved so that nobody will stumble over half
-* initialized state.
+* Make all the pages offline and reserved so that nobody will stumble
+* over half initialized state.
 * FIXME: We also have to associate it with a node because page_to_nid
 * relies on having page with the proper node.
 */
@@ -274,6 +274,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
page = pfn_to_page(pfn);
set_page_node(page, nid);
SetPageReserved(page);
+   SetPageOffline(page);
}
 
if (!want_memblock)
@@ -669,6 +670,7 @@ EXPORT_SYMBOL_GPL(__online_page_increment_counters);
 
 void __online_page_free(struct page *page)
 {
+   ClearPageOffline(page);
__free_reserved_page(page);
 }
 EXPORT_SYMBOL_GPL(__online_page_free);
@@ -687,7 +689,7 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
unsigned long onlined_pages = *(unsigned long *)arg;
struct page *page;
 
-   if (PageReserved(pfn_to_page(start_pfn)))
+   if (PageOffline(pfn_to_page(start_pfn)))
for (i = 0; i < nr_pages; i++) {
page = pfn_to_page(start_pfn + i);
(*online_page_callback)(page);
@@ -1437,7 +1439,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 }
 
 /*
- * remove from free_area[] and mark all as Reserved.
+ * remove from free_area[] and mark all as Reserved and Offline.
  */
 static int
 offline_isolated_pages_cb(unsigned long start, unsigned long nr_pages,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 647c8c6dd4d1..2e5dcfdb0908 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8030,6 +8030,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
pfn++;
SetPageReserved(page);
+   SetPageOffline(page);
continue;
}
 
@@ -8043,8 +8044,10 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
list_del(&page->lru);
rmv_page_order(page);
zone->free_area[order].nr_free--;
-   for (i = 0; i < (1 << order); i++)
+   for (i = 0; i < (1 << order); i++) {
SetPageReserved((page+i));
+   SetPageOffline(page + i);
+   }
pfn += (1 << order);
}
spin_unlock_irqrestore(&zone->lock, flags);
-- 
2.14.3
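
The flag transitions this patch introduces can be summarized with a
small user-space model. This is only a sketch under simplifying
assumptions: page_model, FLAG_RESERVED, FLAG_OFFLINE and the model_*()
helpers are hypothetical names, and locking, zones and the buddy
allocator are not modelled at all. It shows that hot-added and offlined
pages carry both flags, that onlining clears both, and that the onlining
check keys on Offline rather than on Reserved.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_RESERVED	(1UL << 0)
#define FLAG_OFFLINE	(1UL << 1)

/* Hypothetical stand-in for struct page: only the flags word is modelled. */
struct page_model {
	unsigned long flags;
};

/* Like __add_section(): hot-added pages start out reserved and offline. */
static void model_add_section(struct page_model *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages[i].flags |= FLAG_RESERVED | FLAG_OFFLINE;
}

/*
 * Like online_pages_range(): only the first page of the range is checked,
 * and the check now tests Offline instead of Reserved. Each onlined page
 * loses both flags, mirroring ClearPageOffline() followed by
 * __free_reserved_page(). Returns the number of pages onlined.
 */
static size_t model_online_range(struct page_model *pages, size_t n)
{
	size_t onlined = 0;

	if (n == 0 || !(pages[0].flags & FLAG_OFFLINE))
		return 0;

	for (size_t i = 0; i < n; i++) {
		pages[i].flags &= ~(FLAG_OFFLINE | FLAG_RESERVED);
		onlined++;
	}
	return onlined;
}

/* Like __offline_isolated_pages(): offlined pages get both flags again. */
static void model_offline_range(struct page_model *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages[i].flags |= FLAG_RESERVED | FLAG_OFFLINE;
}
```

In this model a range whose first page is merely reserved (for example,
reserved for some unrelated reason) is not onlined, which is the case a
PageReserved() check cannot distinguish.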

Re: [PATCH] netfilter: fix CONFIG_NF_REJECT_IPV6=m link error

2018-04-13 Thread Pablo Neira Ayuso

On Mon, Apr 09, 2018 at 04:43:40PM +0200, Arnd Bergmann wrote:
> On Mon, Apr 9, 2018 at 4:37 PM, Pablo Neira Ayuso  wrote:
> > Hi Arnd,
> >
> > On Mon, Apr 09, 2018 at 12:53:12PM +0200, Arnd Bergmann wrote:
> >> We get a new link error with CONFIG_NFT_REJECT_INET=y and 
> >> CONFIG_NF_REJECT_IPV6=m
> >
> > I think we can update NFT_REJECT_INET so it depends on NFT_REJECT_IPV4
> > and NFT_REJECT_IPV6. This doesn't allow here CONFIG_NFT_REJECT_INET=y
> > and CONFIG_NF_REJECT_IPV6=m.
> >
> > I mean, just like we do with NFT_FIB_INET.
> 
> That can only work if NFT_REJECT_INET can be made a 'tristate' symbol
> again, so that code gets built as a loadable module if
> CONFIG_NF_REJECT_IPV6=m.
> 
> > BTW, I think this problem is not related to the recent patch,
> > but something older that kbuild robot has triggered more easily for
> > some reason?
> 
> 02c7b25e5f54 is the one that turned NF_TABLES_INET into a 'bool'
> symbol. NFT_REJECT depends on NF_TABLES_INET, so it used to be
> restricted to a loadable module with IPV6=m, but can now be
> built-in, which causes that link error.

One more spin on this: I would like to see whether we can fix this by
simplifying things a bit.

Would the patch I'm attaching work?

Thanks for your patience.

From af07bc7ff5d34ce54e7913233912c058e6699e3c Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso 
Date: Fri, 13 Apr 2018 10:48:40 +0200
Subject: [PATCH] netfilter: CONFIG_NF_REJECT_IPV{4,6} becomes bool toggle

Arnd reports that we get a new link error with CONFIG_NFT_REJECT_INET=y
and CONFIG_NF_REJECT_IPV6=m after larger parts of the nftables modules
are linked together:

net/netfilter/nft_reject_inet.o: In function `nft_reject_inet_eval':
nft_reject_inet.c:(.text+0x17c): undefined reference to `nf_send_unreach6'
nft_reject_inet.c:(.text+0x190): undefined reference to `nf_send_reset6'

The problem is that with NF_TABLES_INET set, we implicitly try to use
the ipv6 version as well for NFT_REJECT, but when CONFIG_IPV6 is set to
a loadable module, built-in code cannot link against it.

This patch fixes the problem by building in nf_reject_ipv{4,6}.c. The
IPv6 symbol dependencies for the IPv6 reject infrastructure are located
in exthdrs_core.c, ip6_checksum.c and ip6_icmp.c, which are also
built-in, so let's do the same to simplify this.

Fixes: 02c7b25e5f54 ("netfilter: nf_tables: build-in filter chain type")
Reported-by: Arnd Bergmann 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/netfilter/Kconfig | 3 +--
 net/ipv6/netfilter/Kconfig | 3 +--
 net/netfilter/Kconfig  | 2 ++
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 280048e1e395..3e4e0ae2a9a1 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -104,8 +104,7 @@ config NF_LOG_IPV4
 	select NF_LOG_COMMON
 
 config NF_REJECT_IPV4
-	tristate "IPv4 packet rejection"
-	default m if NETFILTER_ADVANCED=n
+	bool "IPv4 packet rejection"
 
 config NF_NAT_IPV4
 	tristate "IPv4 NAT"
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index ccbfa83e4bb0..1e5d040a60b8 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -87,8 +87,7 @@ config NF_DUP_IPV6
 	  packet to be rerouted to another destination.
 
 config NF_REJECT_IPV6
-	tristate "IPv6 packet rejection"
-	default m if NETFILTER_ADVANCED=n
+	bool "IPv6 packet rejection"
 
 config NF_LOG_IPV6
 	tristate "IPv6 packet logging"
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 4189f574f5ec..d7b3272fe821 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -609,6 +609,8 @@ config NFT_REJECT
 
 config NFT_REJECT_INET
 	depends on NF_TABLES_INET
+	select NF_REJECT_IPV4
+	select NF_REJECT_IPV6
 	default NFT_REJECT
 	tristate
 
-- 
2.11.0
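
To illustrate why turning NF_REJECT_IPV{4,6} into bool symbols and
selecting them avoids the link error, here is a rough user-space model
of the relevant Kconfig rules. It is a sketch, not the kconfig
implementation: the enum and the helpers apply_select() and link_ok()
are hypothetical, and only two rules are modelled. First, "select"
raises the selected symbol to at least the selector's value, and a bool
symbol can never be "m". Second, built-in code can only link against
built-in providers.

```c
#include <assert.h>
#include <stdbool.h>

/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N, TRI_M, TRI_Y };

/*
 * Model of "select": the selected symbol ends up at least as high as the
 * selector. A bool symbol cannot be "m", so "m" is promoted to "y".
 */
static enum tristate apply_select(enum tristate selector,
				  enum tristate selected,
				  bool selected_is_bool)
{
	enum tristate v = selected > selector ? selected : selector;

	if (selected_is_bool && v == TRI_M)
		v = TRI_Y;
	return v;
}

/*
 * Can code built as "user" reference symbols exported by "provider"?
 * Built-in code needs a built-in provider; modular code needs the
 * provider to be at least modular.
 */
static bool link_ok(enum tristate user, enum tristate provider)
{
	if (user == TRI_Y)
		return provider == TRI_Y;
	if (user == TRI_M)
		return provider != TRI_N;
	return true;
}
```

With these rules, the reported configuration (NFT_REJECT_INET=y,
NF_REJECT_IPV6=m) gives link_ok(TRI_Y, TRI_M) == false. After the patch,
NF_REJECT_IPV6 is a selected bool and evaluates to y whenever
NFT_REJECT_INET is m or y.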

Re: Some minor fixes for perf user tools

2018-04-13 Thread Jiri Olsa

On Fri, Apr 06, 2018 at 01:38:08PM -0700, Andi Kleen wrote:
> This patchkit fixes some random minor issues in the perf user tools

Acked-by: Jiri Olsa 

thanks,
jirka

[PATCH 0/2] tracing/events: block: bring more on a par with blktrace

2018-04-13 Thread Steffen Maier

I had the need to understand I/O request processing in detail.
But I also had the need to enrich block traces with other trace events
including my own dynamic kprobe events. So I preferred block trace events
over blktrace to get everything nicely sorted into one ftrace output.
However, I missed device filtering for (un)plug events and also
the difference between the two flavors of unplug.

The first two patches bring block trace events closer to their
counterpart in blktrace tooling.

The last patch is just an RFC. I still kept it in this patch set because
it is inspired by PATCH 2/2.

Steffen Maier (3):
  tracing/events: block: track and print if unplug was explicit or
schedule
  tracing/events: block: dev_t via driver core for plug and unplug
events
  tracing/events: block: also try to get dev_t via driver core for some
events

 include/trace/events/block.h | 33 -
 1 file changed, 28 insertions(+), 5 deletions(-)

-- 
2.13.5
