date:20170227

Re: [PATCH v4 03/11] dt-bindings: perf: hisi: Add Devicetree bindings for Hisilicon SoC PMU

2017-02-27 Thread Rob Herring

On Sun, Feb 19, 2017 at 01:50:21PM -0500, Anurup M wrote:
> 1) Device tree bindings for Hisilicon SoC PMU.
> 2) Add example for Hisilicon L3 cache and MN PMU.
> 3) Add child nodes of L3C and MN in djtag bindings example.
> 
> Signed-off-by: Anurup M 
> Signed-off-by: Shaokun Zhang 
> ---
>  .../devicetree/bindings/arm/hisilicon/djtag.txt|  25 +
>  .../devicetree/bindings/arm/hisilicon/pmu.txt  | 103 +
>  2 files changed, 128 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/pmu.txt

Acked-by: Rob Herring

Re: [PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-27 Thread David Daney


On 02/27/2017 10:49 AM, Jason Baron wrote:

The core jump_label code makes use of the 2 lower bits of the
static_key::[type|entries|next] field. Thus, ensure that the jump_entry
table is at least 4-byte aligned.


[...]

diff --git a/arch/mips/include/asm/jump_label.h b/arch/mips/include/asm/jump_label.h
index e77672539e8e..243791f3ae71 100644
--- a/arch/mips/include/asm/jump_label.h
+++ b/arch/mips/include/asm/jump_label.h
@@ -31,6 +31,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
asm_volatile_goto("1:\t" NOP_INSN "\n\t"
"nop\n\t"
".pushsection __jump_table,  \"aw\"\n\t"
+   ".balign 4\n\t"
WORD_INSN " 1b, %l[l_yes], %0\n\t"
".popsection\n\t"
: :  "i" (&((char *)key)[branch]) : : l_yes);
@@ -45,6 +46,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
asm_volatile_goto("1:\tj %l[l_yes]\n\t"
"nop\n\t"
".pushsection __jump_table,  \"aw\"\n\t"
+   ".balign 4\n\t"
WORD_INSN " 1b, %l[l_yes], %0\n\t"
".popsection\n\t"
: :  "i" (&((char *)key)[branch]) : : l_yes);



I will speak only for the MIPS part.

If the section is not already properly aligned, this change will add 
padding, which is probably not what we want.


Have you ever seen a problem with misalignment in the real world?

If so, I think a better approach might be to set properties on the 
__jump_table section to force the proper alignment, or do something in 
the linker script.


David Daney
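
For readers following the alignment argument, here is a minimal userspace sketch of why the low 2 bits of a pointer are only free for flags when the pointed-to table is at least 4-byte aligned. All names (demo_entry, demo_pack, and so on) are hypothetical and are not the kernel's jump_label code; the aligned attribute is a GCC/Clang extension and stands in for the "set properties on the section" idea rather than a per-entry .balign.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a jump table entry; NOT the kernel's struct jump_entry. */
struct demo_entry {
	uint32_t code;
	uint32_t target;
	uint32_t key;
} __attribute__((aligned(4)));	/* assumption: GCC/Clang attribute syntax */

/* The two low bits of a 4-byte-aligned address are always zero. */
#define DEMO_TYPE_MASK ((uintptr_t)3)

/* Pack a 2-bit type into the low bits of the entry pointer. */
static inline uintptr_t demo_pack(const struct demo_entry *e, unsigned type)
{
	uintptr_t p = (uintptr_t)e;

	/* With a misaligned table this would fire: the flag bits overlap the address. */
	assert((p & DEMO_TYPE_MASK) == 0);
	return p | ((uintptr_t)type & DEMO_TYPE_MASK);
}

/* Recover the entry pointer by masking the flag bits off. */
static inline const struct demo_entry *demo_entries(uintptr_t v)
{
	return (const struct demo_entry *)(v & ~DEMO_TYPE_MASK);
}

/* Recover the 2-bit type. */
static inline unsigned demo_type(uintptr_t v)
{
	return (unsigned)(v & DEMO_TYPE_MASK);
}
```

Packing &table[1] with type 2 and unpacking it again should give back the same pointer and the same type; if the table started on an odd address, the pointer recovered by demo_entries() would silently be wrong.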

Re: [PATCH V2] vhost: introduce O(1) vq metadata cache

2017-02-27 Thread Michael S. Tsirkin

On Wed, Feb 15, 2017 at 01:37:17PM +0800, Jason Wang wrote:
> 
> 
> On 2016-12-14 17:53, Jason Wang wrote:
> > When device IOTLB is enabled, all address translations are stored in
> > an interval tree. The O(lgN) search time can be slow for virtqueue
> > metadata (avail, used and descriptors), since it is accessed much
> > more often than other addresses. So this patch introduces an O(1) array
> > which points to the interval tree nodes that store the translations of
> > vq metadata. The array is updated during vq IOTLB prefetching and
> > reset during each invalidation and TLB update. Each time we want
> > to access vq metadata, this small array is queried before the interval
> > tree. This should be sufficient for static mappings but not for dynamic
> > mappings; we could do optimizations on top.
> > 
> > Tests were done with l2fwd in the guest (2M hugepages):
> > 
> >    | noiommu  | before        | after
> > tx | 1.32Mpps | 1.06Mpps(82%) | 1.30Mpps(98%)
> > rx | 2.33Mpps | 1.46Mpps(63%) | 2.29Mpps(98%)
> > 
> > We can almost reach the same performance as noiommu mode.
> > 
> > Signed-off-by: Jason Wang
> > ---
> > Changes from V1:
> > - silence 32-bit build warning
> 
> ping

Could you rebase, please?
I pushed my tree into linux-next.

-- 
MST
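
The caching idea described in the quoted commit message can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the vhost implementation: every demo_* name is hypothetical, a linear scan stands in for the interval tree, and the cache is filled on a miss rather than during IOTLB prefetching as the patch describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metadata kinds, mirroring "avail, used and descriptors". */
enum demo_meta { DEMO_DESC, DEMO_AVAIL, DEMO_USED, DEMO_META_NUM };

/* One translation: guest range [start, last] mapped at host_base. */
struct demo_map {
	uint64_t start, last, host_base;
};

struct demo_vq {
	const struct demo_map *maps;	/* stand-in for the interval tree */
	size_t nmaps;
	const struct demo_map *meta_cache[DEMO_META_NUM];	/* O(1) slots */
	unsigned slow_lookups;		/* counts how often the slow path runs */
};

/* Slow path: a linear scan here, an O(lgN) tree search in the real thing. */
static const struct demo_map *demo_slow_lookup(struct demo_vq *vq, uint64_t addr)
{
	vq->slow_lookups++;
	for (size_t i = 0; i < vq->nmaps; i++)
		if (addr >= vq->maps[i].start && addr <= vq->maps[i].last)
			return &vq->maps[i];
	return NULL;
}

/* Query the per-metadata slot first; fall back to the slow path on a miss. */
static const struct demo_map *demo_translate_meta(struct demo_vq *vq,
						  enum demo_meta type,
						  uint64_t addr)
{
	const struct demo_map *m = vq->meta_cache[type];

	if (m && addr >= m->start && addr <= m->last)
		return m;
	m = demo_slow_lookup(vq, addr);
	vq->meta_cache[type] = m;
	return m;
}

/* Any invalidation or TLB update drops all cached slots. */
static void demo_invalidate(struct demo_vq *vq)
{
	for (int i = 0; i < DEMO_META_NUM; i++)
		vq->meta_cache[i] = NULL;
}
```

With static mappings the slow path runs once per metadata kind and every later access hits the slot; with frequent invalidations each reset forces another slow lookup, which is the limitation the commit message acknowledges for dynamic mappings.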

Re: [PATCH] Add pidfs filesystem

2017-02-27 Thread Michael Kerrisk

[CC += linux-...@vger.kernel.org]

Hi Alexey

This is a change to the kernel-user-space API. Please CC
linux-...@vger.kernel.org on any future iterations of this patch.

Thanks,

Michael


On Sat, Feb 18, 2017 at 11:53 PM, Alexey Gladkov
 wrote:
> The pidfs filesystem contains a subset of the /proc file system which
> contains only information about the processes.
>
> Some container virtualization systems mount /proc inside
> the container. This is done in most cases to operate with information
> about the processes. Knowing that the /proc filesystem is not fully
> virtualized, they mount empty files or directories on top of dangerous
> places (for example /proc/kcore, /sys/firmware, etc.).
>
> The structure of this filesystem is dynamic and any module can create a
> new object which will not necessarily be virtualized. There are
> proprietary modules that aren't in the mainline whose work we can not
> verify.
>
> This opens up a potential threat to the system. The developers of the
> virtualization system can't predict all dangerous places in /proc by
> definition.
>
> A more effective solution would be to mount into the container only what
> is necessary and ignore the rest.
>
> Right now it is possible to pass any part of the /proc filesystem
> into the container using mount --bind, except the pids.
>
> This patch allows mounting only the part of /proc related to pids,
> without the remaining objects. Since this is an addon to /proc, flags
> applied to /proc have an effect on this pidfs filesystem.
>
> Why not implement it as another flag to /proc ?
>
> The /proc flags are stored in the pid_namespace and are global for the
> namespace. It means that if you add a flag to hide all except the pids,
> then it will act on all mounted instances of /proc.
>
> Originally the idea was that only pidfs would be mounted in the container
> and additional required files would be mounted on top using
> overlayfs. But I found out that /proc does not support overlayfs and
> does not allow mounting anything on top of or under it.
>
> My question is whether it's possible to add overlayfs support for /proc?
>
> Cc: Kirill A. Shutemov 
> Signed-off-by: Alexey Gladkov 
> ---
>  Documentation/filesystems/pidfs.txt | 16 
>  fs/proc/Kconfig |  8 
>  fs/proc/inode.c |  8 +++-
>  fs/proc/internal.h  |  2 +
>  fs/proc/root.c  | 76 ++---
>  fs/proc/self.c  |  6 +++
>  fs/proc/thread_self.c   |  6 +++
>  include/linux/pid_namespace.h   |  5 +++
>  8 files changed, 119 insertions(+), 8 deletions(-)
>  create mode 100644 Documentation/filesystems/pidfs.txt
>
> diff --git a/Documentation/filesystems/pidfs.txt b/Documentation/filesystems/pidfs.txt
> new file mode 100644
> index 000..ce958a5
> --- /dev/null
> +++ b/Documentation/filesystems/pidfs.txt
> @@ -0,0 +1,16 @@
> +The PIDFS Filesystem
> +
> +
> +The pidfs filesystem contains a subset of the /proc file system which contains
> +only information about the processes. The link self points to the process
> +reading the file system. All other special files and directories in /proc are
> +not available in this filesystem.
> +
> +The pidfs is not an independent filesystem, its implementation shares code
> +with /proc.
> +
> +All mount options applicable to /proc filesystem are also applicable
> +to pidfs filesystem. For example, access to the information in /proc/[pid]
> +directories can be restricted using hidepid option.
> +
> +To get more information about the processes read the proc.txt
> diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
> index 1ade120..fa568f6 100644
> --- a/fs/proc/Kconfig
> +++ b/fs/proc/Kconfig
> @@ -43,6 +43,14 @@ config PROC_VMCORE
>  help
>  Exports the dump image of crashed kernel in ELF format.
>
> +config PROC_PIDFS
> +   bool "pidfs file system support"
> +   depends on PROC_FS
> +   default n
> +   help
> + The pidfs filesystem contains a subset of the /proc file system
> + which contains only information only about the processes.
> +
>  config PROC_SYSCTL
> bool "Sysctl support (/proc/sys)" if EXPERT
> depends on PROC_FS
> diff --git a/fs/proc/inode.c b/fs/proc/inode.c
> index 783bc19..1be65b4 100644
> --- a/fs/proc/inode.c
> +++ b/fs/proc/inode.c
> @@ -474,12 +474,16 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
>  int proc_fill_super(struct super_block *s, void *data, int silent)
>  {
> struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
> +   struct proc_dir_entry *fs_root = &proc_root;
> struct inode *root_inode;
> int ret;
>
> if (!proc_parse_options(data, ns))
> return -EINVAL;
>
> +   if (IS_ENABLED(CONFIG_PROC_PIDFS) && s->s_type == &pidfs_fs_type)
> +   fs_root = &pidfs_root;
> +
> /

Re: [PATCH] PCI: dwc: Fix crashes seen due to missing assignments

2017-02-27 Thread Bjorn Helgaas

On Sat, Feb 25, 2017 at 02:08:12AM -0800, Guenter Roeck wrote:
> Fix the following crash, seen in dwc/pci-imx6.
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0070
> pgd = c0004000
> [0070] *pgd=
> Internal error: Oops: 805 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-09686-g9e31489 #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> task: cb85 task.stack: cb84e000
> PC is at imx6_pcie_probe+0x2f4/0x414
> ...
> 
> While at it, fix the same problem in various drivers instead of waiting
> for individual crash reports.
> 
> The change in the imx6 driver was tested with qemu. The changes in other
> drivers are based on code inspection and have been compile tested only.
> 
> Fixes: 442ec4c04d12 ("PCI: dwc: all: Split struct pcie_port into ...")
> Cc: Kishon Vijay Abraham I 
> Signed-off-by: Guenter Roeck 

Applied with Kishon's ack to for-linus for v4.11, thanks, Guenter!

> ---
>  drivers/pci/dwc/pci-exynos.c | 1 +
>  drivers/pci/dwc/pci-imx6.c   | 1 +
>  drivers/pci/dwc/pci-keystone.c   | 2 ++
>  drivers/pci/dwc/pci-layerscape.c | 2 ++
>  drivers/pci/dwc/pcie-armada8k.c  | 2 ++
>  drivers/pci/dwc/pcie-artpec6.c   | 2 ++
>  drivers/pci/dwc/pcie-hisi.c  | 2 ++
>  drivers/pci/dwc/pcie-qcom.c  | 2 ++
>  drivers/pci/dwc/pcie-spear13xx.c | 2 ++
>  9 files changed, 16 insertions(+)
> 
> diff --git a/drivers/pci/dwc/pci-exynos.c b/drivers/pci/dwc/pci-exynos.c
> index 001c91a945aa..993b650ef275 100644
> --- a/drivers/pci/dwc/pci-exynos.c
> +++ b/drivers/pci/dwc/pci-exynos.c
> @@ -668,6 +668,7 @@ static int __init exynos_pcie_probe(struct platform_device *pdev)
>   pci->dev = dev;
>   pci->ops = &dw_pcie_ops;
>  
> + ep->pci = pci;
>   ep->ops = (const struct exynos_pcie_ops *)
>   of_device_get_match_data(dev);
>  
> diff --git a/drivers/pci/dwc/pci-imx6.c b/drivers/pci/dwc/pci-imx6.c
> index 3ab6761db9e8..801e46cd266d 100644
> --- a/drivers/pci/dwc/pci-imx6.c
> +++ b/drivers/pci/dwc/pci-imx6.c
> @@ -605,6 +605,7 @@ static int __init imx6_pcie_probe(struct platform_device *pdev)
>   pci->dev = dev;
>   pci->ops = &dw_pcie_ops;
>  
> + imx6_pcie->pci = pci;
>   imx6_pcie->variant =
>   (enum imx6_pcie_variants)of_device_get_match_data(dev);
>  
> diff --git a/drivers/pci/dwc/pci-keystone.c b/drivers/pci/dwc/pci-keystone.c
> index 8dc66409182d..fcc9723bad6e 100644
> --- a/drivers/pci/dwc/pci-keystone.c
> +++ b/drivers/pci/dwc/pci-keystone.c
> @@ -401,6 +401,8 @@ static int __init ks_pcie_probe(struct platform_device *pdev)
>   pci->dev = dev;
>   pci->ops = &dw_pcie_ops;
>  
> + ks_pcie->pci = pci;
> +
>   /* initialize SerDes Phy if present */
>   phy = devm_phy_get(dev, "pcie-phy");
>   if (PTR_ERR_OR_ZERO(phy) == -EPROBE_DEFER)
> diff --git a/drivers/pci/dwc/pci-layerscape.c b/drivers/pci/dwc/pci-layerscape.c
> index 175c09e3a932..c32e392a0ae6 100644
> --- a/drivers/pci/dwc/pci-layerscape.c
> +++ b/drivers/pci/dwc/pci-layerscape.c
> @@ -280,6 +280,8 @@ static int __init ls_pcie_probe(struct platform_device *pdev)
>   pci->dev = dev;
>   pci->ops = pcie->drvdata->dw_pcie_ops;
>  
> + pcie->pci = pci;
> +
>   dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
>   pci->dbi_base = devm_ioremap_resource(dev, dbi_base);
>   if (IS_ERR(pci->dbi_base))
> diff --git a/drivers/pci/dwc/pcie-armada8k.c b/drivers/pci/dwc/pcie-armada8k.c
> index 66bac6fbfa9f..f110e3b24a26 100644
> --- a/drivers/pci/dwc/pcie-armada8k.c
> +++ b/drivers/pci/dwc/pcie-armada8k.c
> @@ -220,6 +220,8 @@ static int armada8k_pcie_probe(struct platform_device *pdev)
>   pci->dev = dev;
>   pci->ops = &dw_pcie_ops;
>  
> + pcie->pci = pci;
> +
>   pcie->clk = devm_clk_get(dev, NULL);
>   if (IS_ERR(pcie->clk))
>   return PTR_ERR(pcie->clk);
> diff --git a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c
> index 59ecc9e66436..fcd3ef845883 100644
> --- a/drivers/pci/dwc/pcie-artpec6.c
> +++ b/drivers/pci/dwc/pcie-artpec6.c
> @@ -253,6 +253,8 @@ static int artpec6_pcie_probe(struct platform_device *pdev)
>  
>   pci->dev = dev;
>  
> + artpec6_pcie->pci = pci;
> +
>   dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "dbi");
>   pci->dbi_base = devm_ioremap_resource(dev, dbi_base);
>   if (IS_ERR(pci->dbi_base))
> diff --git a/drivers/pci/dwc/pcie-hisi.c b/drivers/pci/dwc/pcie-hisi.c
> index e3e4fedd9f68..fd66a3199db7 100644
> --- a/drivers/pci/dwc/pcie-hisi.c
> +++ b/drivers/pci/dwc/pcie-hisi.c
> @@ -284,6 +284,8 @@ static int hisi_pcie_probe(struct platform_device *pdev)
>  
>   driver = dev->driver;
>  
> + hisi_pcie->pci = pci;
> +
>   hisi_pcie->soc_ops = of_device_get_match_data(dev);
>  
>   hisi_pcie->subctrl =
> diff --git a/drivers/pci/dwc/pcie-qcom.c b/drivers/pci/dwc/pcie-qcom.c
> index e36abe0d9d6f..

Re: [PATCH v4 02/11] dt-bindings: hisi: Add Hisilicon HiP05/06/07 Djtag dts bindings

2017-02-27 Thread Rob Herring

On Sun, Feb 19, 2017 at 01:50:13PM -0500, Anurup M wrote:
> From: Tan Xiaojun 
> 
> Add Hisilicon HiP05/06/07 Djtag dts bindings for CPU and IO Die
> 
> Signed-off-by: Tan Xiaojun 
> Signed-off-by: Anurup M 
> ---
>  .../devicetree/bindings/arm/hisilicon/djtag.txt| 51 ++
>  1 file changed, 51 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/djtag.txt

Acked-by: Rob Herring

Re: automatic IRQ affinity for virtio V3

2017-02-27 Thread Michael S. Tsirkin

On Mon, Feb 27, 2017 at 09:48:32AM +0100, Christoph Hellwig wrote:
> On Thu, Feb 09, 2017 at 06:01:57PM +0200, Michael S. Tsirkin wrote:
> > > Any chance to get this in for 4.11 after I got reviews from Jason
> > > for most of the patches?
> > 
> > Absolutely, I intend to merge it.
> 
> So, what is the plan for virtio this merge window?  No changes seem
> to have made it into linux-next before the merge window, never mind
> a week into it.

Sorry I've been a bit busy. Any chance you could rebase?
This conflicts with your patch removing vq info.

[PATCH] jump_label: align jump_entry table to at least 4-bytes

2017-02-27 Thread Jason Baron

The core jump_label code makes use of the 2 lower bits of the
static_key::[type|entries|next] field. Thus, ensure that the jump_entry
table is at least 4-byte aligned.

Reported-and-tested-by: Sachin Sant 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Anton Blanchard 
Cc: Rabin Vincent 
Cc: Russell King 
Cc: Ralf Baechle 
Cc: Chris Metcalf 
Cc: Zhigang Lu 
Cc: David Daney 
Signed-off-by: Jason Baron 
---
 arch/arm/include/asm/jump_label.h | 2 ++
 arch/mips/include/asm/jump_label.h| 2 ++
 arch/powerpc/include/asm/jump_label.h | 3 +++
 arch/tile/include/asm/jump_label.h| 2 ++
 4 files changed, 9 insertions(+)

diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h
index 34f7b6980d21..9c017bb04d1c 100644
--- a/arch/arm/include/asm/jump_label.h
+++ b/arch/arm/include/asm/jump_label.h
@@ -13,6 +13,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
asm_volatile_goto("1:\n\t"
 WASM(nop) "\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
+".balign 4\n\t"
 ".word 1b, %l[l_yes], %c0\n\t"
 ".popsection\n\t"
 : :  "i" (&((char *)key)[branch]) :  : l_yes);
@@ -27,6 +28,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
asm_volatile_goto("1:\n\t"
 WASM(b) " %l[l_yes]\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
+".balign 4\n\t"
 ".word 1b, %l[l_yes], %c0\n\t"
 ".popsection\n\t"
 : :  "i" (&((char *)key)[branch]) :  : l_yes);
diff --git a/arch/mips/include/asm/jump_label.h b/arch/mips/include/asm/jump_label.h
index e77672539e8e..243791f3ae71 100644
--- a/arch/mips/include/asm/jump_label.h
+++ b/arch/mips/include/asm/jump_label.h
@@ -31,6 +31,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
asm_volatile_goto("1:\t" NOP_INSN "\n\t"
"nop\n\t"
".pushsection __jump_table,  \"aw\"\n\t"
+   ".balign 4\n\t"
WORD_INSN " 1b, %l[l_yes], %0\n\t"
".popsection\n\t"
: :  "i" (&((char *)key)[branch]) : : l_yes);
@@ -45,6 +46,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
asm_volatile_goto("1:\tj %l[l_yes]\n\t"
"nop\n\t"
".pushsection __jump_table,  \"aw\"\n\t"
+   ".balign 4\n\t"
WORD_INSN " 1b, %l[l_yes], %0\n\t"
".popsection\n\t"
: :  "i" (&((char *)key)[branch]) : : l_yes);
diff --git a/arch/powerpc/include/asm/jump_label.h b/arch/powerpc/include/asm/jump_label.h
index 9a287e0ac8b1..bfe83496b590 100644
--- a/arch/powerpc/include/asm/jump_label.h
+++ b/arch/powerpc/include/asm/jump_label.h
@@ -24,6 +24,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
asm_volatile_goto("1:\n\t"
 "nop # arch_static_branch\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
+".balign 4\n\t"
 JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
 ".popsection \n\t"
 : :  "i" (&((char *)key)[branch]) : : l_yes);
@@ -38,6 +39,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
asm_volatile_goto("1:\n\t"
 "b %l[l_yes] # arch_static_branch_jump\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
+".balign 4\n\t"
 JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
 ".popsection \n\t"
 : :  "i" (&((char *)key)[branch]) : : l_yes);
@@ -63,6 +65,7 @@ struct jump_entry {
 #define ARCH_STATIC_BRANCH(LABEL, KEY) \
 1098:  nop;\
.pushsection __jump_table, "aw";\
+   .balign 4;  \
FTR_ENTRY_LONG 1098b, LABEL, KEY;   \
.popsection
 #endif
diff --git a/arch/tile/include/asm/jump_label.h b/arch/tile/include/asm/jump_label.h
index cde7573f397b..a964e6135ea3 100644
--- a/arch/tile/include/asm/jump_label.h
+++ b/arch/tile/include/asm/jump_label.h
@@ -25,6 +25,7 @@ static __always_inline bool arch_static_branch(struct static_key *key,
asm_volatile_goto("1:\n\t"
"nop" "\n\t"
".pushsection __jump_table,  \"aw\"\n\t"
+   ".balign 4\n\t"
".quad 1b, %l[l_yes], %0 + %1 \n\t"
".popsection\n\t"
: :  "i" (key), "i" (branch) : : l_yes);
@@ -39,6 +40,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
asm_volatile_goto("1:\n\t"
"j %l[l_yes]" "\n\t"
".pushsection __jump_table,  \"aw\"\n\t"
+   ".bali

Re: [PATCH] hlist_add_tail_rcu disable sparse warning

2017-02-27 Thread Michael S. Tsirkin

On Fri, Feb 10, 2017 at 07:39:49PM +0200, Michael S. Tsirkin wrote:
> sparse is unhappy about this code in hlist_add_tail_rcu:
> 
> struct hlist_node *i, *last = NULL;
> 
> for (i = hlist_first_rcu(h); i; i = hlist_next_rcu(i))
> last = i;
> 
> This is because hlist_first_rcu and hlist_next_rcu return
> __rcu pointers.
> 
> It's a false positive - it's a write side primitive and so
> does not need to be called in a read side critical section.
> 
> The following trivial patch disables the warning
> without changing the behaviour in any way.
> 
> Note: __hlist_for_each_rcu would also remove the warning but it would be
> confusing since it calls rcu_dereference and is designed to run in the rcu
> read side critical section.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---

ping

> changes since RFC
>   added commit log text to explain why we don't use __hlist_for_each_rcu
> 
>  include/linux/rculist.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 4f7a956..bf578e8 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -509,7 +509,7 @@ static inline void hlist_add_tail_rcu(struct hlist_node *n,
>  {
>   struct hlist_node *i, *last = NULL;
>  
> - for (i = hlist_first_rcu(h); i; i = hlist_next_rcu(i))
> + for (i = h->first; i; i = i->next)
>   last = i;
>  
>   if (last) {
> -- 
> MST

Re: [PATCH v2 0/7] Tegra210 clock bug fixes

2017-02-27 Thread Mikko Perttunen


Series,

Reviewed-by: Mikko Perttunen 
Tested-by: Mikko Perttunen 

On 02/23/2017 12:44 PM, Peter De Schrijver wrote:

A number of bug fixes for the Tegra210 clock implementation.

Changelog:

v2: add better description for 'remove non-existing pll_m_out1 clock'

Peter De Schrijver (7):
  clk: tegra: fix pll_a1 iddq register, add pll_a1
  clk: tegra: fix isp clock modelling
  clk: tegra: correct afi parent
  clk: tegra: remove non-existing pll_m_out1 clock
  clk: tegra: don't warn for PLL defaults unnecessarily
  clk: tegra: correct tegra210_pll_fixed_mdiv_cfg rate calculation
  clk: tegra: fix type for m field

 drivers/clk/tegra/clk-id.h   |  1 +
 drivers/clk/tegra/clk-tegra-periph.c | 13 +---
 drivers/clk/tegra/clk-tegra210.c | 35 
 drivers/clk/tegra/clk.h  |  2 +-
 include/dt-bindings/clock/tegra210-car.h |  4 ++--
 5 files changed, 36 insertions(+), 19 deletions(-)

Re: automatic IRQ affinity for virtio V3

2017-02-27 Thread Michael S. Tsirkin

On Mon, Feb 27, 2017 at 08:45:25PM +0200, Michael S. Tsirkin wrote:
> On Mon, Feb 27, 2017 at 09:48:32AM +0100, Christoph Hellwig wrote:
> > On Thu, Feb 09, 2017 at 06:01:57PM +0200, Michael S. Tsirkin wrote:
> > > > Any chance to get this in for 4.11 after I got reviews from Jason
> > > > for most of the patches?
> > > 
> > > Absolutely, I intend to merge it.
> > 
> > So, what is the plan for virtio this merge window?  No changes seem
> > to have made it into linux-next before the merge window, never mind
> > a week into it.
> 
> Sorry I've been a bit busy. Any chance you could rebase?
> This conflicts with your patch removing vq info.

Actually it doesn't, my bad. Applied now, thanks!

Re: [tpmdd-devel] [PATCH 1/2] tpm: Apply an adapterlimit for retransmission.

2017-02-27 Thread Enric Balletbo Serra

Bounce to Wolfram Sang

2017-02-22 15:01 GMT+01:00 Andrew Lunn :
> On Wed, Feb 22, 2017 at 12:16:08PM +0100, Enric Balletbo i Serra wrote:
>> Hi Andrew,
>>
>> Removing Bryan Freed from the loop, as it seems his email is no longer valid.
>> I already CC'ed Andrey, who is doing the TPM bit in the chromeos kernel.
>>
>> On 21/02/17 17:29, Andrew Lunn wrote:
>> > On Tue, Feb 21, 2017 at 03:44:59PM +0100, Enric Balletbo i Serra wrote:
>> >> From: Bryan Freed 
>> >>
>> >> When the I2C Infineon part is attached to an I2C adapter that imposes
>> >> a size limitation, large requests will fail -EINVAL.
>> >> Retry them with size backoff without re-issuing the 0x05 command
>> >> as this appears to occasionally put the TPM in a bad state.
>> >
>> > Hi Enric
>> >
>> > Rather than trying small and smaller transfers, would it not be better
>> > to get the i2c core to expose the quirk info about transfer limits?
>> >
>>
>> Sounds like a good idea to me, I guess the quirk info can be accessed with
>>
>>   tpm_dev.client->adapter->quirks->max_read_len
>>
>> so I think we don't need to touch the i2c core. I'll propose a second 
>> version of the patch.
>
> Hi Enric
>
> You should probably ask Wolfram Sang, the i2c
> subsystem maintainer. He may prefer adding an API call.
>
>   Andrew

Re: [PATCH 1/2] lightnvm: add generic ocssd detection

2017-02-27 Thread Keith Busch

On Mon, Feb 27, 2017 at 08:35:06PM +0200, Sagi Grimberg wrote:
> > On Sat, Feb 25, 2017 at 08:16:04PM +0100, Matias Bjørling wrote:
> > > On 02/25/2017 07:21 PM, Christoph Hellwig wrote:
> > > > No way in hell.  vs is vendor specific and we absolutely can't overload
> > > > it with any sort of meaning.  Get OCSSD support properly standardized and
> > > > add a class code for it.  Until then it's individual PCI IDs.
> > > > 
> > > 
> > > You are right, that is the right way to go, and we are working on it. In the
> > > meantime, there are a couple of reasons I want to do a pragmatic solution:
> > 
> > Reasonable reasons, but that's just not how standard interfaces work.
> > Either you standardize the behaviour and have a standardized trigger
> > for it, or it is vendor specific and needs to be keyed off a specific
> > vendor/device identification.
> 
> I agree, I don't see how we're allowed to use vs for that.

From personal experience, some OEMs will put whatever they want in the
VS region for their rebranded device, making it an unreliable place to
check for a capability.

Re: [PATCH] ASoC: fsl: Remove unneeded init of static variable

2017-02-27 Thread Nicolin Chen

On Sat, Feb 25, 2017 at 12:47:26PM +0200, Alin Grigorean wrote:
> This was reported by checkpatch.pl
> 
> Signed-off-by: Alin Grigorean 

Acked-by: Nicolin Chen 

> ---
>  sound/soc/fsl/imx-pcm-fiq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/sound/soc/fsl/imx-pcm-fiq.c b/sound/soc/fsl/imx-pcm-fiq.c
> index dac6688540dc..92410f7ca1fa 100644
> --- a/sound/soc/fsl/imx-pcm-fiq.c
> +++ b/sound/soc/fsl/imx-pcm-fiq.c
> @@ -282,7 +282,7 @@ static int imx_pcm_new(struct snd_soc_pcm_runtime *rtd)
>   return 0;
>  }
>  
> -static int ssi_irq = 0;
> +static int ssi_irq;
>  
>  static int imx_pcm_fiq_new(struct snd_soc_pcm_runtime *rtd)
>  {
> -- 
> 2.11.1
>

Re: [PATCH] hlist_add_tail_rcu disable sparse warning

2017-02-27 Thread Steven Rostedt

On Fri, 10 Feb 2017 19:39:49 +0200
"Michael S. Tsirkin"  wrote:

> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 4f7a956..bf578e8 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -509,7 +509,7 @@ static inline void hlist_add_tail_rcu(struct hlist_node *n,
>  {
>   struct hlist_node *i, *last = NULL;
>  
> - for (i = hlist_first_rcu(h); i; i = hlist_next_rcu(i))
> + for (i = h->first; i; i = i->next)

Looks good to me, but I would probably add a comment above the loop
about this being in a write side section, and rcu conversions are not
needed.

Reviewed-by: Steven Rostedt (VMware) 

-- Steve

>   last = i;
>  
>   if (last) {
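
A userspace sketch of the tail insertion discussed in this thread, including the kind of comment Steven asks for above the loop. The demo_* types are hypothetical stand-ins for struct hlist_node and struct hlist_head, and the plain stores that publish the new node would be rcu_assign_pointer() in kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's hlist types. */
struct demo_hnode {
	struct demo_hnode *next, **pprev;
};

struct demo_hhead {
	struct demo_hnode *first;
};

static void demo_hlist_add_tail(struct demo_hnode *n, struct demo_hhead *h)
{
	struct demo_hnode *i, *last = NULL;

	/*
	 * Write side: the caller is assumed to hold the update-side lock,
	 * so the list cannot change under us and plain pointer loads are
	 * enough. No read-side RCU accessors are needed to find the tail.
	 */
	for (i = h->first; i; i = i->next)
		last = i;

	n->next = NULL;
	if (last) {
		n->pprev = &last->next;
		last->next = n;		/* kernel: rcu_assign_pointer() */
	} else {
		n->pprev = &h->first;
		h->first = n;		/* kernel: rcu_assign_pointer() */
	}
}
```

The new node is fully initialised before it is linked in, which is the ordering that matters for concurrent readers; only the final publishing store needs RCU treatment.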

Re: __queue_work oops.

2017-02-27 Thread Tejun Heo

Hello,

On Mon, Feb 27, 2017 at 12:14:39PM -0500, Dave Jones wrote:
> Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-think+ #9 
> task: 88017f105440 task.stack: c9094000
> RIP: 0010:__queue_work+0x2d/0x700
> RSP: 0018:880507c03df8 EFLAGS: 00010046
> RAX: 0082 RBX: 0101 RCX: 0002
> RDX: 88047bf07c98 RSI:  RDI: 
> RBP: 880507c03e30 R08: 0001 R09: 8294bf68
> R10: 880507c03e58 R11:  R12: 88047bf07ce8
> R13:  R14:  R15: 88047bf07c98
> FS:  () GS:880507c0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 01c2 CR3: 04e11000 CR4: 001406e0
> Call Trace:
>  
>  ? work_on_cpu+0xb0/0xb0
>  delayed_work_timer_fn+0x1e/0x20
>  call_timer_fn+0xbd/0x480
...
> Code starting with the faulting instruction
> ===
>0: 41 f6 85 c2 01 00 00testb  $0x1,0x1c2(%r13)
>7: 01 
>8: 0f 85 22 04 00 00   jne0x430
>e: 49  rex.WB
>f: bc eb 83 b5 80  mov$0x80b583eb,%esp
>   14: 46  rex.RX
> 
> 3cf0 <__queue_work>:
> {
> 3cf0:   e8 00 00 00 00  callq  3cf5 <__queue_work+0x5>
> 3cf5:   55  push   %rbp
> 3cf6:   48 89 e5mov%rsp,%rbp
> 3cf9:   41 57   push   %r15
> 3cfb:   49 89 d7mov%rdx,%r15
> 3cfe:   41 56   push   %r14
> unsigned int req_cpu = cpu;
> 3d00:   41 89 femov%edi,%r14d
> {
> 3d03:   41 55   push   %r13
> 3d05:   49 89 f5mov%rsi,%r13
> 3d08:   41 54   push   %r12
> 3d0a:   53  push   %rbx
> 3d0b:   48 83 ec 10 sub$0x10,%rsp
> 3d0f:   89 7d d4mov%edi,-0x2c(%rbp)
> asm volatile("# __raw_save_flags\n\t"
> 3d12:   9c  pushfq 
> 3d13:   58  pop%rax
> WARN_ON_ONCE(!irqs_disabled());
> 3d14:   f6 c4 02test   $0x2,%ah
> 3d17:   0f 85 06 04 00 00   jne4123 <__queue_work+0x433>
> if (unlikely(wq->flags & __WQ_DRAINING) &&
> 3d1d:   41 f6 85 c2 01 00 00testb  $0x1,0x1c2(%r13)
> 
> 
> So we called __queue_work with a null wq.

So, that's somebody calling queue_delayed_work[_on]() with a NULL wq
and when the timeout expires the timer callback trying to queue
against NULL.  Hmm... the work function would be able to tell us who
queued it but it isn't part of the information dumped here (would be
0x18(%rdx)).

I'll add a sanity check on queue_delayed_work_on() so that we can
catch it synchronously when it happens.

Thanks.

-- 
tejun
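
The value of checking at queue time rather than when the timer fires can be shown with a small userspace mock. This is only a sketch of the idea: the demo_* names are hypothetical, and the actual check added to queue_delayed_work_on() may look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for a workqueue and a delayed work item. */
struct demo_wq {
	unsigned flags;
};

struct demo_delayed_work {
	struct demo_wq *wq;	/* remembered until the timer expires */
};

/*
 * Reject a NULL workqueue synchronously. The warning is then emitted in
 * the caller's context, where a backtrace identifies who queued the work,
 * instead of later in the timer callback, where that information is gone.
 */
static bool demo_queue_delayed_work(struct demo_wq *wq,
				    struct demo_delayed_work *dwork)
{
	if (wq == NULL) {
		fprintf(stderr, "demo: refusing to queue work on NULL wq\n");
		return false;
	}
	dwork->wq = wq;
	return true;
}
```

In the oops quoted above the fault happens inside the timer callback, so the stack shows only timer code; a check like this moves the failure to the offending call site.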

Re: mm: GPF in bdi_put

2017-02-27 Thread Al Viro

On Mon, Feb 27, 2017 at 06:11:11PM +0100, Dmitry Vyukov wrote:
> Hello,
> 
> The following program triggers GPF in bdi_put:
> https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt

What happens is
* attempt of, essentially, mount -t bdev ..., calls mount_pseudo()
and then promptly destroys the new instance it has created.
* the only inode created on that sucker (root directory, that
is) gets evicted.
* most of ->evict_inode() is harmless, until it gets to
if (bdev->bd_bdi != &noop_backing_dev_info)
bdi_put(bdev->bd_bdi);

added there by "block: Make blk_get_backing_dev_info() safe without open bdev".
Since ->bd_bdi hadn't been initialized for that sucker (the same patch has
placed initialization into bdget()), we step into shit of varying nastiness,
depending on phase of moon, etc.

Could somebody explain WTF do we have those two lines in bdev_evict_inode(),
anyway?  We set ->bd_bdi to something other than noop_backing_dev_info only
in __blkdev_get() when ->bd_openers goes from zero to positive, so why is
the matching bdi_put() not in __blkdev_put()?  Jan?

Re: [REGRESSION] pinctrl, of, unable to find hogs

2017-02-27 Thread Tony Lindgren

* Gary Bisson  [170227 08:42]:
> On Mon, Feb 27, 2017 at 05:27:47PM +0100, Gary Bisson wrote:
> > Mika, Tony, All,
> > 
> > On Mon, Feb 27, 2017 at 07:53:53AM -0800, Tony Lindgren wrote:
> > > * Mika Penttilä  [170226 21:46]:
> > > > 
> > > > With current linus git (pre 4.11), unable to find the pinctrl hogs :
> > > > 
> > > > 
> > > >  imx6q-pinctrl 20e.iomuxc: unable to find group for node hoggrp
> > > > 
> > > > 
> > > > Device is i.MX6 based.
> > > 
> > > Sorry to hear about that, maybe imx_pinctrl_probe_dt() should be
> > > called before devm_pinctrl_register_and_init()?
> > > 
> > > Things got moved around a bit with e566fc11ea76 ("pinctrl: imx: use
> > > generic pinctrl helpers for managing groups") it seems. But maybe that
> > > was done because we did not have commit 950b0d91dc10 ("pinctrl: core:
> > > Fix regression caused by delayed work for hogs") when the imx_pinctrl
> > > changes got merged.
> > 
> > Indeed the i.MX changes were made before your the rework.
> 
> s/the/hog/
> 
> > The reason imx_pinctrl_probe_dt got moved around is because
> > devm_pinctrl_register is the one that initializes the radix trees that
> > are needed when probing the dt.

OK

> > > Gary, are you able to reproduce this? Seems it should happen with
> > > any imx with hogs configured in the dts.
> > 
> > Yes I can reproduce the issue.

OK good to hear.

> > Not sure how to fix it though since we can't move the dt probing before
> > radix tree init.

Yup looks like we still have an issue with pinctrl driver functions
getting called before driver probe has completed.

How about we introduce something like:

int pinctrl_claim_hogs(struct pinctrl_dev *pctldev);

Then the drivers can call that at the end of the probe after
the pins have been parsed?

This should be safe as no other driver can claim the pins either
before the pins have been parsed :)

Regards,

Tony

Re: [PATCH 2/5] clk: tegra: define Tegra210 DMIC sync clocks

2017-02-27 Thread Mikko Perttunen


On 02/23/2017 02:39 PM, Peter De Schrijver wrote:

Tegra210 has 3 DMIC inputs which can be clocked from the recovered clock
of several other audio inputs (e.g. i2s0, i2s1, ...). To model this, we add
3 new clocks, similar to the audio* clocks, which handle the same function
for the i2s and spdif clocks.

Signed-off-by: Peter De Schrijver 
---
 drivers/clk/tegra/clk-id.h   |  8 ++-
 drivers/clk/tegra/clk-tegra-audio.c  | 85 +++-
 drivers/clk/tegra/clk-tegra210.c |  6 +++
 include/dt-bindings/clock/tegra210-car.h |  9 +++-
 4 files changed, 82 insertions(+), 26 deletions(-)

diff --git a/drivers/clk/tegra/clk-id.h b/drivers/clk/tegra/clk-id.h
index fc978b2..ab9b347 100644
--- a/drivers/clk/tegra/clk-id.h
+++ b/drivers/clk/tegra/clk-id.h
@@ -307,8 +307,14 @@ enum clk_id {
tegra_clk_xusb_ssp_src,
tegra_clk_sclk_mux,
tegra_clk_sor_safe,
-   tegra_clk_ispa,
tegra_clk_cec,
+   tegra_clk_ispa,
+   tegra_clk_dmic1_sync_clk,
+   tegra_clk_dmic2_sync_clk,
+   tegra_clk_dmic3_sync_clk,
+   tegra_clk_dmic1_sync_clk_mux,
+   tegra_clk_dmic2_sync_clk_mux,
+   tegra_clk_dmic3_sync_clk_mux,
tegra_clk_max,
 };

diff --git a/drivers/clk/tegra/clk-tegra-audio.c 
b/drivers/clk/tegra/clk-tegra-audio.c
index e2bfa9b..b4da6e0 100644
--- a/drivers/clk/tegra/clk-tegra-audio.c
+++ b/drivers/clk/tegra/clk-tegra-audio.c
@@ -31,6 +31,9 @@
 #define AUDIO_SYNC_CLK_I2S3 0x4ac
 #define AUDIO_SYNC_CLK_I2S4 0x4b0
 #define AUDIO_SYNC_CLK_SPDIF 0x4b4
+#define AUDIO_SYNC_CLK_DMIC1 0x560
+#define AUDIO_SYNC_CLK_DMIC2 0x564
+#define AUDIO_SYNC_CLK_DMIC3 0x6b8

 #define AUDIO_SYNC_DOUBLER 0x49c

@@ -91,8 +94,14 @@ struct tegra_audio2x_clk_initdata {

 static DEFINE_SPINLOCK(clk_doubler_lock);

-static const char *mux_audio_sync_clk[] = { "spdif_in_sync", "i2s0_sync",
-   "i2s1_sync", "i2s2_sync", "i2s3_sync", "i2s4_sync", "vimclk_sync",
+static const char * const mux_audio_sync_clk[] = { "spdif_in_sync",
+   "i2s0_sync", "i2s1_sync", "i2s2_sync", "i2s3_sync", "i2s4_sync",
+   "pll_a_out0", "vimclk_sync",
+};
+
+static const char * const mux_dmic_sync_clk[] = { "unused", "i2s0_sync",
+   "i2s1_sync", "i2s2_sync", "i2s3_sync", "i2s4_sync", "pll_a_out0",
+   "vimclk_sync",
 };


My GCC spews a bunch of warnings because these are "const char * const" 
and are passed to tegra_audio_sync_clk_init which takes "const char **". 
Similarly for mux_dmic[123] which end up in a struct 
tegra_periph_init_data which also has a "const char **" field; and 
finally aclk_parents has the same issue.
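
For reference, the qualifier mismatch can be reproduced outside the
kernel. In this sketch count_names() is a hypothetical helper, not a clk
function, and the table is a shortened copy of the one in the patch. It
only shows that declaring the parameter as "const char * const *" accepts
a fully const table without a cast.

```c
#include <assert.h>
#include <stddef.h>

/* Fully const table, like mux_dmic_sync_clk[] in the patch (shortened). */
static const char * const names[] = { "unused", "i2s0_sync", "i2s1_sync" };

/*
 * Hypothetical helper. With a "const char **" parameter, passing 'names'
 * discards a const qualifier and GCC warns. Adding the second const to
 * the parameter type avoids that, and a plain "const char *[]" table can
 * still be passed because that only adds a qualifier.
 */
static size_t count_names(const char * const *tbl, size_t n)
{
	size_t i, used = 0;

	assert(tbl != NULL);

	/* Count entries that are non-NULL and non-empty. */
	for (i = 0; i < n; i++)
		if (tbl[i] && tbl[i][0] != '\0')
			used++;
	return used;
}
```

Whether to change the callee prototypes or to drop the second const from
the tables is a separate choice; the sketch only shows the first option.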


Apart from that, the series:

Reviewed-by: Mikko Perttunen 
Tested-by: Mikko Perttunen 
(booted and verified clocks show up)



 static struct tegra_sync_source_initdata sync_source_clks[] __initdata = {
@@ -114,6 +123,12 @@ struct tegra_audio2x_clk_initdata {
AUDIO(spdif, AUDIO_SYNC_CLK_SPDIF),
 };

+static struct tegra_audio_clk_initdata dmic_clks[] = {
+   AUDIO(dmic1_sync_clk, AUDIO_SYNC_CLK_DMIC1),
+   AUDIO(dmic2_sync_clk, AUDIO_SYNC_CLK_DMIC2),
+   AUDIO(dmic3_sync_clk, AUDIO_SYNC_CLK_DMIC3),
+};
+
 static struct tegra_audio2x_clk_initdata audio2x_clks[] = {
AUDIO2X(audio0, 113, 24),
AUDIO2X(audio1, 114, 25),
@@ -123,6 +138,41 @@ struct tegra_audio2x_clk_initdata {
AUDIO2X(spdif, 118, 29),
 };

+static void __init tegra_audio_sync_clk_init(void __iomem *clk_base,
+ struct tegra_clk *tegra_clks,
+ struct tegra_audio_clk_initdata *sync,
+ int num_sync_clks,
+ const char **mux_names,
+ int num_mux_inputs)
+{
+   struct clk *clk;
+   struct clk **dt_clk;
+   struct tegra_audio_clk_initdata *data;
+   int i;
+
+   for (i = 0, data = sync; i < num_sync_clks; i++, data++) {
+   dt_clk = tegra_lookup_dt_id(data->mux_clk_id, tegra_clks);
+   if (!dt_clk)
+   continue;
+
+   clk = clk_register_mux(NULL, data->mux_name, mux_names,
+   num_mux_inputs,
+   CLK_SET_RATE_NO_REPARENT,
+   clk_base + data->offset, 0, 3, 0,
+   NULL);
+   *dt_clk = clk;
+
+   dt_clk = tegra_lookup_dt_id(data->gate_clk_id, tegra_clks);
+   if (!dt_clk)
+   continue;
+
+   clk = clk_register_gate(NULL, data->gate_name, data->mux_name,
+   0, clk_base + data->offset, 4,
+   CLK_GATE_SET_TO_DISABLE, NULL);
+   *dt_clk = clk;
+   }
+}
+
 void __init tegra_audio_clk_init(void __iomem *clk_base,
void __iomem *pmc_base, struct tegra_clk *tegra_clks,

Re: [RFC PATCH] mm, hotplug: get rid of auto_online_blocks

2017-02-27 Thread Michal Hocko

On Mon 27-02-17 11:28:52, Reza Arbab wrote:
> On Mon, Feb 27, 2017 at 10:28:17AM +0100, Michal Hocko wrote:
> >diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> >index 134a2f69c21a..a72f7f64ee26 100644
> >--- a/include/linux/memory_hotplug.h
> >+++ b/include/linux/memory_hotplug.h
> >@@ -100,8 +100,6 @@ extern void __online_page_free(struct page *page);
> >
> >extern int try_online_node(int nid);
> >
> >-extern bool memhp_auto_online;
> >-
> >#ifdef CONFIG_MEMORY_HOTREMOVE
> >extern bool is_pageblock_removable_nolock(struct page *page);
> >extern int arch_remove_memory(u64 start, u64 size);
> >@@ -272,7 +270,7 @@ static inline void remove_memory(int nid, u64 start, u64 
> >size) {}
> >
> >extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
> > void *arg, int (*func)(struct memory_block *, void *));
> >-extern int add_memory(int nid, u64 start, u64 size);
> >+extern int add_memory(int nid, u64 start, u64 size, bool online);
> >extern int add_memory_resource(int nid, struct resource *resource, bool 
> >online);
> >extern int zone_for_memory(int nid, u64 start, u64 size, int zone_default,
> > bool for_device);
> 
> It would be nice if instead of a 'bool online' argument, add_memory() and
> add_memory_resource() took an 'int online_type', ala online_pages().
> 
> That way we could specify offline, online, online+movable, etc.

Sure that would require more changes though and as such it is out of
scope of this patch. But you are right, this is a logical follow up
step.
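
A rough sketch of what such a follow-up interface could look like, in
userspace C. The enum and toy_add_memory() are hypothetical names used
for illustration only; they are not the existing kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical online types; the kernel's own constants may differ. */
enum toy_online_type {
	TOY_OFFLINE,         /* add the memory but leave it offline */
	TOY_ONLINE,          /* online into the default zone */
	TOY_ONLINE_MOVABLE,  /* online as movable */
};

/* Result of a toy add_memory() call, for inspection by the caller. */
struct toy_block {
	int online;
	int movable;
};

/* One int argument replaces 'bool online' and leaves room for more modes. */
static int toy_add_memory(struct toy_block *blk, int online_type)
{
	assert(blk != NULL);

	switch (online_type) {
	case TOY_OFFLINE:
		blk->online = 0;
		blk->movable = 0;
		return 0;
	case TOY_ONLINE:
		blk->online = 1;
		blk->movable = 0;
		return 0;
	case TOY_ONLINE_MOVABLE:
		blk->online = 1;
		blk->movable = 1;
		return 0;
	default:
		return -1; /* unknown type */
	}
}
```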
-- 
Michal Hocko
SUSE Labs

Re: [RFC PATCH v4 21/28] x86: Check for memory encryption on the APs

2017-02-27 Thread Borislav Petkov

On Thu, Feb 16, 2017 at 09:46:47AM -0600, Tom Lendacky wrote:
> Add support to check if memory encryption is active in the kernel and that
> it has been enabled on the AP. If memory encryption is active in the kernel
> but has not been enabled on the AP, then set the SYS_CFG MSR bit to enable
> memory encryption on that AP and allow the AP to continue start up.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/realmode.h  |   12 
>  arch/x86/realmode/init.c |4 
>  arch/x86/realmode/rm/trampoline_64.S |   17 +
>  3 files changed, 33 insertions(+)
> 
> diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
> index 230e190..4f7ef53 100644
> --- a/arch/x86/include/asm/realmode.h
> +++ b/arch/x86/include/asm/realmode.h
> @@ -1,6 +1,15 @@
>  #ifndef _ARCH_X86_REALMODE_H
>  #define _ARCH_X86_REALMODE_H
>  
> +/*
> + * Flag bit definitions for use with the flags field of the trampoline header
> + * int the CONFIG_X86_64 variant.

s/int/in/

> + */
> +#define TH_FLAGS_SME_ACTIVE_BIT  0
> +#define TH_FLAGS_SME_ACTIVE  BIT(TH_FLAGS_SME_ACTIVE_BIT)
> +
> +#ifndef __ASSEMBLY__
> +
>  #include 
>  #include 
>  
> @@ -38,6 +47,7 @@ struct trampoline_header {
>   u64 start;
>   u64 efer;
>   u32 cr4;
> + u32 flags;
>  #endif
>  };
>  
> @@ -69,4 +79,6 @@ static inline size_t real_mode_size_needed(void)
>  void set_real_mode_mem(phys_addr_t mem, size_t size);
>  void reserve_real_mode(void);
>  
> +#endif /* __ASSEMBLY__ */
> +
>  #endif /* _ARCH_X86_REALMODE_H */
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 21d7506..5010089 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -102,6 +102,10 @@ static void __init setup_real_mode(void)
>   trampoline_cr4_features = &trampoline_header->cr4;
>   *trampoline_cr4_features = mmu_cr4_features;
>  
> + trampoline_header->flags = 0;
> + if (sme_active())
> + trampoline_header->flags |= TH_FLAGS_SME_ACTIVE;
> +
>   trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
>   trampoline_pgd[0] = trampoline_pgd_entry.pgd;
>   trampoline_pgd[511] = init_level4_pgt[511].pgd;
> diff --git a/arch/x86/realmode/rm/trampoline_64.S 
> b/arch/x86/realmode/rm/trampoline_64.S
> index dac7b20..a88c3d1 100644
> --- a/arch/x86/realmode/rm/trampoline_64.S
> +++ b/arch/x86/realmode/rm/trampoline_64.S
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "realmode.h"
>  
>   .text
> @@ -92,6 +93,21 @@ ENTRY(startup_32)
>   movl%edx, %fs
>   movl%edx, %gs
>  
> + /* Check for memory encryption support */

Let's add some blurb here about this being a safety net in case BIOS
f*cks up. Which wouldn't be that far-fetched... :-)

> + bt  $TH_FLAGS_SME_ACTIVE_BIT, pa_tr_flags
> + jnc .Ldone
> + movl$MSR_K8_SYSCFG, %ecx
> + rdmsr
> + bts $MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
> + jc  .Ldone
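
For readers who do not speak trampoline assembly, the decision made by
the quoted hunk can be modeled in C. This is a sketch under assumptions:
the toy_* names are made up, the SYSCFG bit position is assumed for
illustration, and the write-back is taken from the commit message since
the quoted hunk is trimmed before that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TH_FLAGS_SME_ACTIVE_BIT 0   /* same value as in the patch */
#define TOY_SYSCFG_MEM_ENCRYPT_BIT  23  /* assumed bit position */

/*
 * Returns the low SYSCFG word the AP should end up with and reports via
 * *wrote whether a write-back was needed.
 */
static uint32_t toy_ap_check_sme(uint32_t tr_flags, uint32_t syscfg_lo,
				 bool *wrote)
{
	assert(wrote != NULL);
	*wrote = false;

	/* bt + jnc: SME is not active in the kernel, nothing to do. */
	if (!(tr_flags & (1u << TOY_TH_FLAGS_SME_ACTIVE_BIT)))
		return syscfg_lo;

	/* bts + jc: the bit was already set, nothing to do. */
	if (syscfg_lo & (1u << TOY_SYSCFG_MEM_ENCRYPT_BIT))
		return syscfg_lo;

	/* Otherwise set the bit and write the value back. */
	*wrote = true;
	return syscfg_lo | (1u << TOY_SYSCFG_MEM_ENCRYPT_BIT);
}
```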

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v2 1/6] arm64: dts: Add symlinks for cros-ec-keyboard and cros-ec-sbs

2017-02-27 Thread Heiko Stuebner

Hi Olof,

Am Dienstag, 21. Februar 2017, 15:47:31 CET schrieb Olof Johansson:
> On Thu, Feb 9, 2017 at 5:05 PM, Brian Norris  wrote:
> > From: Douglas Anderson 
> > 
> > We'd like to be able to use the cros-ec-keyboard.dtsi and
> > cros-ec-sbs.dtsi snippets for arm64 devices.  Currently those files live
> > in the arm/boot/dts directory.
> > 
> > Let's follow the convention set by commit 8ee57b8182c4 ("ARM64: dts:
> > vexpress: Use a symlink to vexpress-v2m-rs1.dtsi from arch=arm") and use
> > a symlink.  Note that in this case we put the files in a new
> > "include/common" directory since these snippets may need to be
> > referenced by dts files in many different subdirectories.
> 
> I'd rather have something like this:
> 
> https://marc.info/?m=147547436324674&w=2
> 
> Instead of having everybody move things over. I.e. make it easy to
> refer to the arm version from arm64 instead of creating a "common"
> layer inbetween.

just so it gets noticed, I've done and tested [0], which hopefully should
implement your suggestions above.

If that looks ok, how do you want that picked up? Should I just include
them in my regular rockchip branches or do you want to pick them into some
immutable branch, if other surprise-users turn up in time for 4.12?


Thanks
Heiko


[0] 
http://lists.infradead.org/pipermail/linux-rockchip/2017-February/014226.html

Re: net/ipv6: null-ptr-deref in ip6_route_del/lock_acquire

2017-02-27 Thread Cong Wang

On Mon, Feb 27, 2017 at 7:28 AM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570 (Feb 26).
>
> A reproducer and .config are attached.
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Modules linked in:
> CPU: 0 PID: 4045 Comm: a.out Not tainted 4.10.0+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88006b6bac00 task.stack: 88006a688000
> RIP: 0010:__lock_acquire+0xac4/0x3270 kernel/locking/lockdep.c:3224
> RSP: 0018:88006a68f250 EFLAGS: 00010006
> RAX: dc00 RBX: dc00 RCX: 
> RDX: 0006 RSI:  RDI: 11000d4d1ea4
> RBP: 88006a68f788 R08: 0001 R09: 
> R10: 0030 R11:  R12: 88006b6bac00
> R13:  R14: 86e64ec0 R15: 0001
> FS:  7fda492ff700() GS:88006ca0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 208c4000 CR3: 6a7e9000 CR4: 06f0
> Call Trace:
>  lock_acquire+0x241/0x580 kernel/locking/lockdep.c:3753
>  __raw_write_lock_bh ./include/linux/rwlock_api_smp.h:203
>  _raw_write_lock_bh+0x3a/0x50 kernel/locking/spinlock.c:319
>  __ip6_del_rt_siblings net/ipv6/route.c:2177
>  ip6_route_del+0x4dd/0xa70 net/ipv6/route.c:2257
>  ipv6_route_ioctl+0x62d/0x790 net/ipv6/route.c:2620
>  inet6_ioctl+0xef/0x1e0 net/ipv6/af_inet6.c:520
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x28f/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:43
>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:204

The attached patch fixes this crash, but I am not sure if it is the
best way to fix this bug yet...

Thanks.
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f54f426..3d1b260 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2216,12 +2216,13 @@ static int __ip6_del_rt_siblings(struct rt6_info *rt, 
struct fib6_config *cfg)
 
 static int ip6_route_del(struct fib6_config *cfg)
 {
+   struct net *net = cfg->fc_nlinfo.nl_net;
struct fib6_table *table;
struct fib6_node *fn;
struct rt6_info *rt;
int err = -ESRCH;
 
-   table = fib6_get_table(cfg->fc_nlinfo.nl_net, cfg->fc_table);
+   table = fib6_get_table(net, cfg->fc_table);
if (!table)
return err;
 
@@ -2247,6 +2248,8 @@ static int ip6_route_del(struct fib6_config *cfg)
continue;
if (cfg->fc_protocol && cfg->fc_protocol != 
rt->rt6i_protocol)
continue;
+   if (rt == net->ipv6.ip6_null_entry)
+   continue;
dst_hold(&rt->dst);
read_unlock_bh(&table->tb6_lock);

Re: [PATCH v2 1/2] Documentation: devicetree: Add i2c binding for mediatek MT2701 Soc Platform

2017-02-27 Thread Rob Herring

On Wed, Feb 15, 2017 at 10:32:55AM +0800, Jun Gao wrote:
> From: Jun Gao 
> 
> Add i2c DT binding to i2c-mt6577.txt for MT2701 and there is no need to 
> modify i2c driver.

Wrap your lines.

> 
> Change-Id: I892f866d755aa3865bffd1a80884cd41b6ecc1f1

Remove Gerrit Ids.

> Signed-off-by: Jun Gao 
> ---
>  .../devicetree/bindings/i2c/i2c-mt6577.txt |   11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/i2c/i2c-mt6577.txt 
> b/Documentation/devicetree/bindings/i2c/i2c-mt6577.txt
> index 0ce6fa3..ef22ecf 100644
> --- a/Documentation/devicetree/bindings/i2c/i2c-mt6577.txt
> +++ b/Documentation/devicetree/bindings/i2c/i2c-mt6577.txt
> @@ -4,11 +4,12 @@ The Mediatek's I2C controller is used to interface with I2C 
> devices.
>  
>  Required properties:
>- compatible: value should be either of the following.
> -  (a) "mediatek,mt6577-i2c", for i2c compatible with mt6577 i2c.
> -  (b) "mediatek,mt6589-i2c", for i2c compatible with mt6589 i2c.
> -  (c) "mediatek,mt8127-i2c", for i2c compatible with mt8127 i2c.
> -  (d) "mediatek,mt8135-i2c", for i2c compatible with mt8135 i2c.
> -  (e) "mediatek,mt8173-i2c", for i2c compatible with mt8173 i2c.
> + "mediatek,mt2701-i2c", for i2c compatible with mt2701 i2c.
> + "mediatek,mt6577-i2c", for i2c compatible with mt6577 i2c.
> + "mediatek,mt6589-i2c", for i2c compatible with mt6589 i2c.
> + "mediatek,mt8127-i2c", for i2c compatible with mt8127 i2c.
> + "mediatek,mt8135-i2c", for i2c compatible with mt8135 i2c.
> + "mediatek,mt8173-i2c", for i2c compatible with mt8173 i2c.

All the parts after the compatible strings are a bit redundant. Just 
drop them.

>- reg: physical base address of the controller and dma base, length of 
> memory
>  mapped region.
>- interrupts: interrupt number to the cpu.
> -- 
> 1.7.9.5
> 

Re: [PATCH] sched: Optimize pick_next_task for idle_sched_class too

2017-02-27 Thread Steven Rostedt


Sorry for the late reply. Just got back from traveling.

On Thu, 23 Feb 2017 18:54:38 +0100
Peter Zijlstra  wrote:

> On Thu, Feb 23, 2017 at 06:45:05PM +0100, Peter Zijlstra wrote:
> > Hurm.. maybe we should do what Steve initially suggested. The
> > alternative is link order trickery, and I'm not sure we want to do that.  
> 
> That is, given:
> 
> kernel/sched/Makefile: obj-y += idle_task.o fair.o rt.o deadline.o stop_task.o
> 
> results in:
> 
> readelf -s defconfig-build/vmlinux | awk '/sched_class/ {print $2 " " $8}' | 
> sort -n
> 602c93c0 idle_sched_class
> 602c9480 fair_sched_class
> 602c9580 rt_sched_class
> 602c96c0 dl_sched_class
> 602c97c0 stop_sched_class
> 
> we can do this, but yuck!
> 
> ---
>  kernel/sched/core.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8f972df76eb2..eebe6729ceb7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3285,10 +3285,16 @@ pick_next_task(struct rq *rq, struct task_struct 
> *prev, struct rq_flags *rf)
>   struct task_struct *p;
>  
>   /*
> -  * Optimization: we know that if all tasks are in
> -  * the fair class we can call that function directly:
> +  * Optimization: we know that if all tasks are in the fair class we can
> +  * call that function directly, but only if the @prev task wasn't of a
> +  * higher scheduling class, because otherwise those lose the
> +  * opportunity to pull in more work from other CPUs.
> +  *
> +  * Depends on link order in kernel/sched/Makefile.
>*/
> - if (likely(rq->nr_running == rq->cfs.h_nr_running)) {
> + if (likely(rq->nr_running == rq->cfs.h_nr_running &&
> +prev->sched_class <= &fair_sched_class)) {

If we go this route, I would suggest that we hardcode the classes in
vmlinux.lds.h.
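
To illustrate why a fixed layout helps: in C, relational comparison of
pointers is only well defined when both point into the same array, so a
hardcoded order is easiest to reason about as one table. A userspace
sketch with a made-up toy_sched_class type, lowest priority first as in
the readelf output quoted above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sched_class {
	const char *name;
};

/* One array, lowest priority first, mirroring the order shown above. */
static const struct toy_sched_class toy_classes[] = {
	{ "idle" }, { "fair" }, { "rt" }, { "dl" }, { "stop" },
};

#define TOY_NR_CLASSES (sizeof(toy_classes) / sizeof(toy_classes[0]))
#define TOY_IDLE (&toy_classes[0])
#define TOY_FAIR (&toy_classes[1])
#define TOY_RT   (&toy_classes[2])

/* Mirrors the 'prev->sched_class <= &fair_sched_class' test in the patch. */
static bool toy_prev_not_above_fair(const struct toy_sched_class *prev)
{
	/* The comparison is only meaningful inside the table. */
	assert(prev >= toy_classes && prev < toy_classes + TOY_NR_CLASSES);

	return prev <= TOY_FAIR;
}
```

In the kernel the same guarantee would have to come from the linker
script rather than from a C array; the sketch only shows the comparison.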

-- Steve

> +
>   p = fair_sched_class.pick_next_task(rq, prev, rf);
>   if (unlikely(p == RETRY_TASK))
>   goto again;

[PATCH 2/5] staging: rtl8192u: Remove unnecessary else after return

2017-02-27 Thread simran singhal

This patch fixes the checkpatch warning that else is not generally
useful after a break or return.

This was done using Coccinelle:

@@
expression e2;
statement s1;
@@
if(e2) { ... return ...; }
-else
 s1

Signed-off-by: simran singhal 
---
 drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c 
b/drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c
index 2453413..4d6c928 100644
--- a/drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c
+++ b/drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c
@@ -374,8 +374,7 @@ static int ieee80211_tkip_encrypt(struct sk_buff *skb, int 
hdr_len, void *priv)
 
if (!tcb_desc->bHwSec)
return ret;
-   else
-   return 0;
+   return 0;
 
 
 }
-- 
2.7.4
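
As a plain C illustration of the transformation in the patch above
(function names are made up, not taken from the driver), both forms
below return the same values. The second is the shape the semantic patch
produces.

```c
#include <assert.h>

/* Before: else after return, which checkpatch warns about. */
static int toy_before(int hw_sec, int ret)
{
	if (!hw_sec)
		return ret;
	else
		return 0;
}

/* After: the else is dropped; control flow is unchanged. */
static int toy_after(int hw_sec, int ret)
{
	if (!hw_sec)
		return ret;
	return 0;
}

/* Checks that both forms agree over a small range of inputs. */
static void toy_check_equivalent(void)
{
	int hw_sec, ret;

	for (hw_sec = 0; hw_sec <= 1; hw_sec++)
		for (ret = -2; ret <= 2; ret++)
			assert(toy_before(hw_sec, ret) == toy_after(hw_sec, ret));
}
```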

[PATCH 5/5] staging: gdm724x: Remove unnecessary else after return

2017-02-27 Thread simran singhal

This patch fixes the checkpatch warning that else is not generally
useful after a break or return.

This was done using Coccinelle:
@@
expression e2;
statement s1;
@@
if(e2) { ... return ...; }
-else
 s1

Signed-off-by: simran singhal 
---
 drivers/staging/gdm724x/gdm_endian.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/gdm724x/gdm_endian.c 
b/drivers/staging/gdm724x/gdm_endian.c
index d0b43e2..4ef728f 100644
--- a/drivers/staging/gdm724x/gdm_endian.c
+++ b/drivers/staging/gdm724x/gdm_endian.c
@@ -26,30 +26,26 @@ __dev16 gdm_cpu_to_dev16(struct gdm_endian *ed, u16 x)
 {
if (ed->dev_ed == ENDIANNESS_LITTLE)
return (__force __dev16)cpu_to_le16(x);
-   else
-   return (__force __dev16)cpu_to_be16(x);
+   return (__force __dev16)cpu_to_be16(x);
 }
 
 u16 gdm_dev16_to_cpu(struct gdm_endian *ed, __dev16 x)
 {
if (ed->dev_ed == ENDIANNESS_LITTLE)
return le16_to_cpu((__force __le16)x);
-   else
-   return be16_to_cpu((__force __be16)x);
+   return be16_to_cpu((__force __be16)x);
 }
 
 __dev32 gdm_cpu_to_dev32(struct gdm_endian *ed, u32 x)
 {
if (ed->dev_ed == ENDIANNESS_LITTLE)
return (__force __dev32)cpu_to_le32(x);
-   else
-   return (__force __dev32)cpu_to_be32(x);
+   return (__force __dev32)cpu_to_be32(x);
 }
 
 u32 gdm_dev32_to_cpu(struct gdm_endian *ed, __dev32 x)
 {
if (ed->dev_ed == ENDIANNESS_LITTLE)
return le32_to_cpu((__force __le32)x);
-   else
-   return be32_to_cpu((__force __be32)x);
+   return be32_to_cpu((__force __be32)x);
 }
-- 
2.7.4

[PATCH 3/5] staging: rtl8712: Remove unnecessary else after return

2017-02-27 Thread simran singhal

This patch fixes the checkpatch warning that else is not generally
useful after a break or return.

This was done using Coccinelle:
@@
expression e2;
statement s1;
@@
if(e2) { ... return ...; }
-else
 s1

Signed-off-by: simran singhal 
---
 drivers/staging/rtl8712/os_intfs.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/rtl8712/os_intfs.c 
b/drivers/staging/rtl8712/os_intfs.c
index 8836b31..3062167 100644
--- a/drivers/staging/rtl8712/os_intfs.c
+++ b/drivers/staging/rtl8712/os_intfs.c
@@ -411,8 +411,7 @@ static int netdev_open(struct net_device *pnetdev)
goto netdev_open_error;
if (!padapter->dvobjpriv.inirp_init)
goto netdev_open_error;
-   else
-   padapter->dvobjpriv.inirp_init(padapter);
+   padapter->dvobjpriv.inirp_init(padapter);
r8712_set_ps_mode(padapter, padapter->registrypriv.power_mgnt,
  padapter->registrypriv.smart_ps);
}
-- 
2.7.4

[PATCH 1/5] staging: lustre: Remove unnecessary else after return

2017-02-27 Thread simran singhal

This patch fixes the checkpatch warning that else is not generally
useful after a break or return.

@@
expression e2;
statement s1;
@@
if(e2) { ... return ...; }
-else
 s1

Signed-off-by: simran singhal 
---
 drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c | 3 +--
 drivers/staging/lustre/lustre/ldlm/ldlm_pool.c  | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c 
b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
index fbbd8a5..02d49b7 100644
--- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
+++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
@@ -1806,8 +1806,7 @@ ksocknal_close_matching_conns(struct lnet_process_id id, 
__u32 ipaddr)
 
if (!count)
return -ENOENT;
-   else
-   return 0;
+   return 0;
 }
 
 void
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c 
b/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
index cf3fc57..ac32c82 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
@@ -338,8 +338,7 @@ static int ldlm_cli_pool_shrink(struct ldlm_pool *pl,
 
if (nr == 0)
return (unused / 100) * sysctl_vfs_cache_pressure;
-   else
-   return ldlm_cancel_lru(ns, nr, LCF_ASYNC, LDLM_LRU_FLAG_SHRINK);
+   return ldlm_cancel_lru(ns, nr, LCF_ASYNC, LDLM_LRU_FLAG_SHRINK);
 }
 
 static const struct ldlm_pool_ops ldlm_cli_pool_ops = {
-- 
2.7.4

[PATCH 4/5] staging: sm750fb: Remove unnecessary else after return

2017-02-27 Thread simran singhal

This patch fixes the checkpatch warning that else is not generally
useful after a break or return.

This was done using Coccinelle:
@@
expression e2;
statement s1;
@@
if(e2) { ... return ...; }
-else
 s1

Signed-off-by: simran singhal 
---
 drivers/staging/sm750fb/ddk750_sii164.c | 6 ++
 drivers/staging/sm750fb/ddk750_swi2c.c  | 6 ++
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/sm750fb/ddk750_sii164.c 
b/drivers/staging/sm750fb/ddk750_sii164.c
index 259006a..6906598 100644
--- a/drivers/staging/sm750fb/ddk750_sii164.c
+++ b/drivers/staging/sm750fb/ddk750_sii164.c
@@ -368,8 +368,7 @@ unsigned char sii164IsConnected(void)
hotPlugValue = i2cReadReg(SII164_I2C_ADDRESS, SII164_DETECT) & 
SII164_DETECT_HOT_PLUG_STATUS_MASK;
if (hotPlugValue == SII164_DETECT_HOT_PLUG_STATUS_ON)
return 1;
-   else
-   return 0;
+   return 0;
 }
 
 /*
@@ -387,8 +386,7 @@ unsigned char sii164CheckInterrupt(void)
detectReg = i2cReadReg(SII164_I2C_ADDRESS, SII164_DETECT) & 
SII164_DETECT_MONITOR_STATE_MASK;
if (detectReg == SII164_DETECT_MONITOR_STATE_CHANGE)
return 1;
-   else
-   return 0;
+   return 0;
 }
 
 /*
diff --git a/drivers/staging/sm750fb/ddk750_swi2c.c 
b/drivers/staging/sm750fb/ddk750_swi2c.c
index a4ac07c..5997349 100644
--- a/drivers/staging/sm750fb/ddk750_swi2c.c
+++ b/drivers/staging/sm750fb/ddk750_swi2c.c
@@ -199,8 +199,7 @@ static unsigned char sw_i2c_read_sda(void)
gpio_data = peek32(sw_i2c_data_gpio_data_reg);
if (gpio_data & (1 << sw_i2c_data_gpio))
return 1;
-   else
-   return 0;
+   return 0;
 }
 
 /*
@@ -295,8 +294,7 @@ static long sw_i2c_write_byte(unsigned char data)
 
if (i < 0xff)
return 0;
-   else
-   return -1;
+   return -1;
 }
 
 /*
-- 
2.7.4

Re: [PATCH] ptrace: fix PTRACE_LISTEN race corrupting task->state

2017-02-27 Thread bsegall

Oleg Nesterov  writes:

> (add akpm, we usually route ptrace fixes via -mm tree)
>
> On 02/21, bseg...@google.com wrote:
>>
>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -184,10 +184,14 @@ static void ptrace_unfreeze_traced(struct task_struct 
>> *task)
>>  
>> WARN_ON(!task->ptrace || task->parent != current);
>>  
>> +   /*
>> +* Double check __TASK_TRACED under the lock to prevent corrupting 
>> state
>> +* in case of a ptrace_trap_notify wakeup
>> +*/
>> spin_lock_irq(&task->sighand->siglock);
>> if (__fatal_signal_pending(task))
>> wake_up_state(task, __TASK_TRACED);
>> -   else
>> +   else if (task->state == __TASK_TRACED)
>> task->state = TASK_TRACED;
>> spin_unlock_irq(&task->sighand->siglock);
>
> So yes, I think your patch is fine except the comment should explain that
> we need this because PTRACE_LISTEN makes ptrace_trap_notify() possible. And
> perhaps it would be better to do the 2nd check before fatal_signal_pending:
>
>   if (task->state == __TASK_TRACED) {
>   if (__fatal_signal_pending(task))
>   wake_up_state(task, __TASK_TRACED);
>   else
>   task->state = TASK_TRACED;
>   }
>
> just to make the logic more clear. wake_up_state(__TASK_TRACED) can
> never hurt if the task is killed, just it doesn't look strictly correct
> if the tracee was already woken. But this is minor.
>
>
>
> You know, I'd prefer another fix, see below.
>
> Why. ptrace_unfreeze_traced() assumes that - since ptrace_freeze_traced()
> checks PTRACE_LISTEN - nobody but us can wake the tracee up. So the
> __TASK_TRACED check at the start of ptrace_unfreeze_traced() means that
> the tracee is still frozen, it was not woken up by (say) PTRACE_CONT.
>
> IOW, currently we assume that only the caller of ptrace_freeze_traced()
> can do the __TASK_TRACED -> WHATEVER transition.
>
> However, as you pointed out, I forgot that JOBCTL_LISTENING set by LISTEN
> breaks this assumption, and imo it would be nice to fix this.
>
> What do you think? I won't insist too much if you prefer your simple
> change.

My knowledge of the ptrace state machine isn't the best, but this looks
valid to me and doesn't crash
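
For what it's worth, the state logic of the first suggestion quoted above
(check __TASK_TRACED first, then the fatal signal) can be modeled in
userspace. The enum values are made up and the siglock is left out, so
this only shows the decision, not the locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy states; names are illustrative, not the kernel's bit values. */
enum toy_state {
	TOY_RUNNING,  /* tracee was woken, e.g. by a trap notify */
	TOY_FROZEN,   /* models __TASK_TRACED: frozen by the tracer */
	TOY_TRACED,   /* models TASK_TRACED: killable again */
};

/* Only touch the state if the tracee is still frozen. */
static enum toy_state toy_unfreeze(enum toy_state state, bool fatal_pending)
{
	assert(state == TOY_RUNNING || state == TOY_FROZEN ||
	       state == TOY_TRACED);

	if (state != TOY_FROZEN)
		return state;        /* someone else already changed it */
	if (fatal_pending)
		return TOY_RUNNING;  /* models wake_up_state() */
	return TOY_TRACED;
}
```

The case that matters for the race is the first one: a tracee that is
already running must not be put back into a traced state.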

>
> Oleg.
>
> --- x/kernel/ptrace.c
> +++ x/kernel/ptrace.c
> @@ -174,6 +174,18 @@
>   return ret;
>  }
>  
> +static bool __ptrace_unfreeze_traced(struct task_struct *task)
> +{
> + bool killed = __fatal_signal_pending(task);
> +
> + if (killed)
> + wake_up_state(task, __TASK_TRACED);
> + else
> + task->state = TASK_TRACED;
> +
> + return !killed;
> +}
> +
>  static void ptrace_unfreeze_traced(struct task_struct *task)
>  {
>   if (task->state != __TASK_TRACED)
> @@ -182,10 +194,7 @@
>   WARN_ON(!task->ptrace || task->parent != current);
>  
>   spin_lock_irq(&task->sighand->siglock);
> - if (__fatal_signal_pending(task))
> - wake_up_state(task, __TASK_TRACED);
> - else
> - task->state = TASK_TRACED;
> + __ptrace_unfreeze_traced(task);
>   spin_unlock_irq(&task->sighand->siglock);
>  }
>  
> @@ -993,7 +1002,12 @@
>   break;
>  
>   si = child->last_siginfo;
> - if (likely(si && (si->si_code >> 8) == PTRACE_EVENT_STOP)) {
> + /*
> +  * Once we set JOBCTL_LISTENING we do not own child->state,
> +  * need to unfreeze first.
> +  */
> + if (__ptrace_unfreeze_traced(child) &&
> + likely(si && (si->si_code >> 8) == PTRACE_EVENT_STOP)) {
>   child->jobctl |= JOBCTL_LISTENING;
>   /*
>* If NOTIFY is set, it means event happened between

Re: tip.today - scheduler bam boom crash (cpu hotplug)

2017-02-27 Thread Thomas Gleixner

On Mon, 27 Feb 2017, Paolo Bonzini wrote:
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 2724dc82f992..3080b6877190 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1398,6 +1398,9 @@ void __init tsc_init(void)
> 
>   use_tsc_delay();
> 
> + if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
> + mark_tsc_unstable("not invariant");

Errm, no. 

That makes TSC unusable for systems which do not go into C/P states in
which the TSC stops. There is a world outside KVM.

Thanks,

tglx

Re: Schedule affinity_notify work while migrating IRQs during hot plug

2017-02-27 Thread Thomas Gleixner

On Mon, 27 Feb 2017, Sodagudi Prasad wrote:
> So I am thinking that adding the following schedule_work() would notify clients.

And break the world and some more.

> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 6b66959..5e4766b 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data, const
> struct cpumask *mask,
> case IRQ_SET_MASK_OK_DONE:
> cpumask_copy(desc->irq_common_data.affinity, mask);
> case IRQ_SET_MASK_OK_NOCOPY:
> +   schedule_work(&desc->affinity_notify->work);
> irq_set_thread_affinity(desc);
> ret = 0;

You cannot do that unconditionally and just slap that schedule_work() call
into the code. Aside of that schedule_work() would be invoked twice for all
calls which come via irq_set_affinity_locked() 
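
A userspace model of both objections, with made-up toy_* names. It
assumes, going by the pointer dereference in the quoted hunk, that a
descriptor may have no notifier registered at all. The sketch keeps the
low-level setter free of notifications and lets the locked wrapper
notify exactly once, and only when a notifier exists.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often notification work was queued. */
struct toy_notify {
	int scheduled;
};

/* Toy descriptor; notify is NULL when no client registered. */
struct toy_desc {
	struct toy_notify *notify;
};

static void toy_schedule(struct toy_notify *n)
{
	n->scheduled++;
}

/* Low-level setter: does not notify, so callers decide. */
static int toy_do_set_affinity(struct toy_desc *d)
{
	assert(d != NULL);
	return 0;
}

/* Locked wrapper: notify once, and only if a notifier is registered. */
static int toy_set_affinity_locked(struct toy_desc *d)
{
	int ret = toy_do_set_affinity(d);

	if (ret == 0 && d->notify)
		toy_schedule(d->notify);
	return ret;
}
```

This says nothing about how the hotplug migration path should notify
clients; it only shows why the call cannot simply be dropped into the
common setter.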

Thanks,

tglx

Re: [lkp-robot] [x86/mm/ptdump] 243b72aae2: WARNING:at_arch/x86/mm/dump_pagetables.c:#note_page

2017-02-27 Thread Masami Hiramatsu

On Mon, 27 Feb 2017 16:01:34 +0300
Andrey Ryabinin  wrote:

> On 02/27/2017 04:03 AM, kernel test robot wrote:
> > 
> > FYI, we noticed the following commit:
> > 
> > commit: 243b72aae28ca1032284028323bb81c9235b15c9 ("x86/mm/ptdump: Optimize 
> > check for W+X mappings for CONFIG_KASAN=y")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> ...
> > 
> > caused below changes (please refer to attached dmesg/kmsg for entire 
> > log/backtrace):
> > 
> > 
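> > +-----------------------------------------------------+------------+------------+
> > |                                                     | 5b1ad68f9b | 243b72aae2 |
> > +-----------------------------------------------------+------------+------------+
> > | boot_successes                                      | 0          | 0          |
> > | boot_failures                                       | 8          | 6          |
> > | BUG:KASAN:slab-out-of-bounds                        | 8          | 6          |
> > | WARNING:at_arch/x86/mm/dump_pagetables.c:#note_page | 0          | 6          |
> > +-----------------------------------------------------+------------+------------+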
> 
> Ok, I reproduced this, but it's definitely caused *not* by 243b72aae28.
> This WARN is also reproducible on the parent commit 5b1ad68f9b.
> The only difference here is that on parent one needs dozens of seconds to 
> reach this WARNING.
> It seems that this time difference somehow confused the robot.
> 
> As for the warning itself, it is caused by kprobes. The kprobes code uses
> module_alloc(), which creates these RWX mappings.
> I'm not sure how to fix this as AFAIK kprobes actually need RWX mapping.

Thanks! I also noticed this issue. I actually have a plan to make kprobes with 
R+X mappings (as ftrace does).
I'll queue this on my TODO list.



-- 
Masami Hiramatsu

Re: [PATCH V5 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list

2017-02-27 Thread Johannes Weiner

On Fri, Feb 24, 2017 at 01:31:46PM -0800, Shaohua Li wrote:
> madvise(MADV_FREE) marks pages as 'lazyfree'. They are still anonymous
> pages, but they can be freed without pageout. To distinguish them
> from normal anonymous pages, we clear their SwapBacked flag.
> 
> MADV_FREE pages can be freed without pageout, so they are pretty much
> like used-once file pages. We'd like to reclaim such pages once
> there is memory pressure. It might be unfair to always reclaim MADV_FREE
> pages before used-once file pages, but we definitely want to
> reclaim them before other anonymous and file pages.
> 
> To speed up reclaim of MADV_FREE pages, we put them on the
> LRU_INACTIVE_FILE list. The rationale is that the LRU_INACTIVE_FILE list
> is tiny nowadays and should be full of used-once file pages. Reclaiming
> MADV_FREE pages will not interfere much with anonymous and active
> file pages. The inactive file pages and MADV_FREE pages will be
> reclaimed according to their age, so we don't reclaim too many MADV_FREE
> pages either. Putting the MADV_FREE pages on the LRU_INACTIVE_FILE list
> also means we can reclaim the pages without swap support. This idea was
> suggested by Johannes.
> 
> This patch doesn't move MADV_FREE pages to the LRU_INACTIVE_FILE list yet,
> to avoid bisect failures; the next patch will do it.
> 
> The patch is based on Minchan's original patch.
> 
> Cc: Michal Hocko 
> Cc: Minchan Kim 
> Cc: Hugh Dickins 
> Cc: Rik van Riel 
> Cc: Mel Gorman 
> Cc: Andrew Morton 
> Suggested-by: Johannes Weiner 
> Signed-off-by: Shaohua Li 

Acked-by: Johannes Weiner

Re: [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME

2017-02-27 Thread Borislav Petkov

On Thu, Feb 16, 2017 at 09:46:19AM -0600, Tom Lendacky wrote:
> Add warnings to let the user know when bounce buffers are being used for
> DMA when SME is active.  Since the bounce buffers are not in encrypted
> memory, these notifications are to allow the user to determine some
> appropriate action - if necessary.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/mem_encrypt.h |   11 +++
>  include/linux/dma-mapping.h|   11 +++
>  include/linux/mem_encrypt.h|6 ++
>  lib/swiotlb.c  |3 +++
>  4 files changed, 31 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h 
> b/arch/x86/include/asm/mem_encrypt.h
> index 87e816f..5a17f1b 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -26,6 +26,11 @@ static inline bool sme_active(void)
>   return (sme_me_mask) ? true : false;
>  }
>  
> +static inline u64 sme_dma_mask(void)
> +{
> + return ((u64)sme_me_mask << 1) - 1;
> +}
> +
>  void __init sme_early_encrypt(resource_size_t paddr,
> unsigned long size);
>  void __init sme_early_decrypt(resource_size_t paddr,
> @@ -53,6 +58,12 @@ static inline bool sme_active(void)
>  {
>   return false;
>  }
> +
> +static inline u64 sme_dma_mask(void)
> +{
> + return 0ULL;
> +}
> +
>  #endif
>  
>  static inline void __init sme_early_encrypt(resource_size_t paddr,
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 10c5a17..130bef7 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /**
>   * List of possible attributes associated with a DMA mapping. The semantics
> @@ -557,6 +558,11 @@ static inline int dma_set_mask(struct device *dev, u64 
> mask)
>  
>   if (!dev->dma_mask || !dma_supported(dev, mask))
>   return -EIO;
> +
> + if (sme_active() && (mask < sme_dma_mask()))
> + dev_warn(dev,
> +  "SME is active, device will require DMA bounce 
> buffers\n");
> +

Yes, definitely _once() here.

It could be extended later to be per-device if the need arises.

Also, a bit above in this function, we test if (ops->set_dma_mask) so
device drivers which supply even an empty ->set_dma_mask will circumvent
this check.

It probably doesn't matter all that much right now because the only
driver I see defining this method is ethernet/intel/fm10k/fm10k_pf.c,
plus some other arches' functionality which is unrelated here.

But still...
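
To make the mask arithmetic and the _once() suggestion concrete, here is a
minimal user-space sketch. It is not the kernel code: the model_* names are
hypothetical, and placing the encryption bit at bit 47 is only an assumption
for illustration. The mask computation mirrors sme_dma_mask() from the quoted
hunk, and the warn-once helper only models the "print a single time"
behaviour that a _once() variant of the warning would presumably give.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sme_me_mask: one set bit marking the
 * encryption bit in a physical address (bit 47 is an assumption). */
static const uint64_t model_sme_me_mask = 1ULL << 47;

/* Same arithmetic as sme_dma_mask() in the quoted patch: every bit up
 * to and including the encryption bit is set. */
static uint64_t model_sme_dma_mask(uint64_t me_mask)
{
	return (me_mask << 1) - 1;
}

/* True when SME is modelled as active and the device's DMA mask cannot
 * reach the encryption bit, so DMA would need bounce buffers. */
static bool model_needs_bounce(uint64_t dev_mask, uint64_t me_mask)
{
	return me_mask != 0 && dev_mask < model_sme_dma_mask(me_mask);
}

/* Models a warn-once: returns true only on the first call, which is
 * the only time a real implementation would print. */
static bool model_warn_once(void)
{
	static bool warned;

	if (warned)
		return false;
	warned = true;
	return true;
}
```

With these assumptions, a device limited to a 32-bit DMA mask falls below the
computed mask and would be flagged, while a full 64-bit mask would not.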


-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH V5 4/6] mm: reclaim MADV_FREE pages

2017-02-27 Thread Johannes Weiner

On Fri, Feb 24, 2017 at 01:31:47PM -0800, Shaohua Li wrote:
> When memory pressure is high, we free MADV_FREE pages. If the pages are
> not dirty in pte, the pages could be freed immediately. Otherwise we
> can't reclaim them. We put the pages back to anonymous LRU list (by
> setting SwapBacked flag) and the pages will be reclaimed in normal
> swapout way.
> 
> We use normal page reclaim policy. Since MADV_FREE pages are put into
> inactive file list, such pages and inactive file pages are reclaimed
> according to their age. This is expected, because we don't want to
> reclaim too many MADV_FREE pages before used once pages.
> 
> Based on Minchan's original patch
> 
> Cc: Michal Hocko 
> Cc: Minchan Kim 
> Cc: Hugh Dickins 
> Cc: Johannes Weiner 
> Cc: Rik van Riel 
> Cc: Mel Gorman 
> Cc: Andrew Morton 
> Signed-off-by: Shaohua Li 

Acked-by: Johannes Weiner 

FWIW, I agree with Minchan that this could be folded into the previous
patch and would be a little neater. But I don't feel strongly in this
case since I didn't have any trouble reviewing the patches like this -
void mark_page_lazyfree(struct page *) is an easy API to remember.

Re: [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages

2017-02-27 Thread Johannes Weiner

On Mon, Feb 27, 2017 at 09:50:24AM +0100, Michal Hocko wrote:
> On Fri 24-02-17 11:51:05, Johannes Weiner wrote:
> [...]
> > >From 29fefdca148e28830e0934d4e6cceb95ed2ee36e Mon Sep 17 00:00:00 2001
> > From: Johannes Weiner 
> > Date: Fri, 24 Feb 2017 10:56:32 -0500
> > Subject: [PATCH] mm: vmscan: disable kswapd on unreclaimable nodes
> > 
> > Jia He reports a problem with kswapd spinning at 100% CPU when
> > requesting more hugepages than memory available in the system:
> > 
> > $ echo 4000 >/proc/sys/vm/nr_hugepages
> > 
> > top - 13:42:59 up  3:37,  1 user,  load average: 1.09, 1.03, 1.01
> > Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> > %Cpu(s):  0.0 us, 12.5 sy,  0.0 ni, 85.5 id,  2.0 wa,  0.0 hi,  0.0 si,  
> > 0.0 st
> > KiB Mem:  31371520 total, 30915136 used,   456384 free,  320 buffers
> > KiB Swap:  6284224 total,   115712 used,  6168512 free.48192 cached Mem
> > 
> >   PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+ COMMAND
> >76 root  20   0   0  0  0 R 100.0 0.000 217:17.29 kswapd3
> > 
> > At that time, there are no reclaimable pages left in the node, but as
> > kswapd fails to restore the high watermarks it refuses to go to sleep.
> > 
> > Kswapd needs to back away from nodes that fail to balance. Up until
> > 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
> > kswapd had such a mechanism. It considered zones whose theoretically
> > reclaimable pages it had reclaimed six times over as unreclaimable and
> > backed away from them. This guard was erroneously removed as the patch
> > changed the definition of a balanced node.
> > 
> > However, simply restoring this code wouldn't help in the case reported
> > here: there *are* no reclaimable pages that could be scanned until the
> > threshold is met. Kswapd would stay awake anyway.
> > 
> > Introduce a new and much simpler way of backing off. If kswapd runs
> > through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single
> > page, make it back off from the node. This is the same number of shots
> > direct reclaim takes before declaring OOM. Kswapd will go to sleep on
> > that node until a direct reclaimer manages to reclaim some pages, thus
> > proving the node reclaimable again.
> 
> Yes this looks, nice&simple. I would just be worried about [1] a bit.
> Maybe that is worth a separate patch though.
> 
> [1] http://lkml.kernel.org/r/20170223111609.hlncnvokhq3qu...@dhcp22.suse.cz

I think I'd prefer the simplicity of keeping this contained inside
vmscan.c, as an interaction between direct reclaimers and kswapd, as
well as leaving the wakeup tied to actually seeing reclaimable pages
rather than merely producing free pages (e.g. should we also add a
kick to a large munmap() for example?).

OOM kills come with such high latencies that I cannot imagine a
slightly quicker kswapd restart would matter in practice.
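
The back-off described in the quoted changelog can be modelled in a few
lines. This is only a user-space sketch under stated assumptions: the
model_* names and the per-node failure counter are hypothetical and are not
taken from the patch itself; only the limit of 16 retries and the "a direct
reclaimer proves the node reclaimable again" rule come from the text above.

```c
#include <stdbool.h>

/* Limit quoted in the changelog: MAX_RECLAIM_RETRIES (16). */
#define MODEL_MAX_RECLAIM_RETRIES 16

/* Hypothetical per-node state. */
struct model_node {
	int kswapd_failures;	/* consecutive kswapd runs reclaiming nothing */
};

/* Called when one kswapd balancing run has finished. */
static void model_kswapd_run_done(struct model_node *node,
				  unsigned long nr_reclaimed)
{
	if (nr_reclaimed)
		node->kswapd_failures = 0;
	else
		node->kswapd_failures++;
}

/* kswapd backs off from the node once the retry limit is reached. */
static bool model_kswapd_should_run(const struct model_node *node)
{
	return node->kswapd_failures < MODEL_MAX_RECLAIM_RETRIES;
}

/* A direct reclaimer that frees pages proves the node reclaimable
 * again; one that frees nothing leaves kswapd asleep. */
static void model_direct_reclaim_done(struct model_node *node,
				      unsigned long nr_reclaimed)
{
	if (nr_reclaimed)
		node->kswapd_failures = 0;
}
```

In this model the wakeup stays tied to actually reclaiming pages, not to
free pages merely appearing, which is the property argued for above.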

> > Reported-by: Jia He 
> > Signed-off-by: Johannes Weiner 
> 
> Acked-by: Michal Hocko 

Thanks!

> I would have just one more suggestion. Please move MAX_RECLAIM_RETRIES
> to mm/internal.h. This is MM internal thing and there is no need to make
> it visible.

Good point, I'll move it.

Re: [PATCH v2 1/2] Documentation: devicetree: Add i2c binding for mediatek MT2701 Soc Platform

2017-02-27 Thread Rob Herring

On Wed, Feb 15, 2017 at 10:32:55AM +0800, Jun Gao wrote:
> From: Jun Gao 

And for the subject, please use "dt-bindings: i2c: ..."

Re: tip.today - scheduler bam boom crash (cpu hotplug)

2017-02-27 Thread Paolo Bonzini



On 27/02/2017 17:36, Peter Zijlstra wrote:
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 4e95b2e0d95f..bc3bbb6a8ab0 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -555,10 +555,8 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>   if (c->x86_power & (1 << 8)) {
>   set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
>   set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
> - if (check_tsc_unstable())
> - clear_sched_clock_stable();
>   } else {
> - clear_sched_clock_stable();
> + mark_tsc_unstable("not invariant");
>   }
>  
>   /* Bit 12 of 8000_0007 edx is accumulated power mechanism. */
> diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
> index 2c234a6d94c4..0fdff183aa30 100644
> --- a/arch/x86/kernel/cpu/centaur.c
> +++ b/arch/x86/kernel/cpu/centaur.c
> @@ -105,7 +105,7 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
>   set_cpu_cap(c, X86_FEATURE_SYSENTER32);
>  #endif
>  
> - clear_sched_clock_stable();
> + mark_tsc_unstable("not invariant");
>  }
>  
>  static void init_centaur(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index f07005e6f461..2b7ff648ea25 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -86,7 +86,7 @@ static void default_init(struct cpuinfo_x86 *c)
>   strcpy(c->x86_model_id, "386");
>   }
>  #endif
> - clear_sched_clock_stable();
> + mark_tsc_unstable("not invariant");
>  }
>  
>  static const struct cpu_dev default_cpu = {
> @@ -1076,7 +1076,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>   if (this_cpu->c_init)
>   this_cpu->c_init(c);
>   else
> - clear_sched_clock_stable();
> + mark_tsc_unstable("not invariant");
>  
>   /* Disable the PN if appropriate */
>   squash_the_stupid_serial_number(c);
> diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
> index 47416f959a48..35057d67e864 100644
> --- a/arch/x86/kernel/cpu/cyrix.c
> +++ b/arch/x86/kernel/cpu/cyrix.c
> @@ -184,7 +184,7 @@ static void early_init_cyrix(struct cpuinfo_x86 *c)
>   set_cpu_cap(c, X86_FEATURE_CYRIX_ARR);
>   break;
>   }
> - clear_sched_clock_stable();
> + mark_tsc_unstable("not invariant");
>  }
>  
>  static void init_cyrix(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 017ecd3bb553..e0e192e43a4c 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -161,10 +161,8 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>   if (c->x86_power & (1 << 8)) {
>   set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
>   set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
> - if (check_tsc_unstable())
> - clear_sched_clock_stable();
>   } else {
> - clear_sched_clock_stable();
> + mark_tsc_unstable("not invariant");
>   }
>  
>   /* Penwell and Cloverview have the TSC which doesn't sleep on S3 */

Doh, these are called _before_ kvmclock_init.  But perhaps they can all
be replaced by something like this:

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2724dc82f992..3080b6877190 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1398,6 +1398,9 @@ void __init tsc_init(void)

use_tsc_delay();

+   if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
+   mark_tsc_unstable("not invariant");
+
if (unsynchronized_tsc())
mark_tsc_unstable("TSCs unsynchronized");

The rest seems nice.

Paolo

Re: [PATCH v2 6/7] tpm: expose spaces via a device link /dev/tpms

2017-02-27 Thread Jason Gunthorpe

On Sun, Feb 26, 2017 at 01:44:40PM +0200, Jarkko Sakkinen wrote:

> There's now tabrm-v3 branch. I had to tweak error handling in your
> device adding patch because of b4e9d7561a70. I hope I didn't break
> anything.

Just looked; the flow seems to work to me. It is just confusing that
tpm_add_char_device isn't undone by tpm_del_char_device.

Jason

Re: [PATCH 2/3] of: add vendor prefix for Sensirion company

2017-02-27 Thread Rob Herring

On Sun, Feb 19, 2017 at 8:59 AM, Tomasz Duszynski  wrote:
> This patch adds prefix for Sensirion company.
>
> Signed-off-by: Tomasz Duszynski 
> ---
>  Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
>  1 file changed, 1 insertion(+)

There's another patch already adding this that I'm applying, so drop this one.

Rob

Re: [PATCH v2 3/3] percpu: improve allocation success rate for non-GFP_KERNEL callers

2017-02-27 Thread Michal Hocko

On Mon 27-02-17 18:07:53, Michal Hocko wrote:
> On Mon 27-02-17 09:01:09, Tahsin Erdogan wrote:
> > On Mon, Feb 27, 2017 at 7:25 AM, Michal Hocko  wrote:
> > > /*
> > >  * No space left.  Create a new chunk.  We don't want multiple
> > >  * tasks to create chunks simultaneously.  Serialize and create 
> > > iff
> > >  * there's still no empty chunk after grabbing the mutex.
> > >  */
> > > if (is_atomic)
> > > goto fail;
> > >
> > > right before pcpu_populate_chunk so is this actually a problem?
> > 
> > Yes, this prevents adding more pcpu chunks and so cause "atomic" allocations
> > to fail more easily.
> 
> Then I fail to see what is the problem you are trying to fix.

To be more specific. Could you describe what more can we do in the
vmalloc layer for GFP_NOWAIT allocations? They certainly cannot sleep
and cannot perform the reclaim so you have to rely on the background
work.
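
The point about relying on background work can be illustrated with a small
user-space model. It is a sketch only: the model_* names, the pool structure
and the chunk size of 8 are hypothetical and do not correspond to the real
percpu allocator. It shows an atomic caller failing fast when no space is
left and a background worker adding capacity later.

```c
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary capacity added per new chunk in this model. */
#define MODEL_CHUNK_UNITS 8

/* Hypothetical allocator state. */
struct model_pool {
	size_t free_units;	/* units left in already populated chunks */
	bool refill_scheduled;	/* background worker asked to add capacity */
};

/* Returns true on success. An atomic caller may not sleep, so it never
 * creates a chunk itself: it fails and leaves the refill to the worker. */
static bool model_pcpu_alloc(struct model_pool *pool, bool is_atomic)
{
	if (pool->free_units == 0) {
		if (is_atomic) {
			pool->refill_scheduled = true;
			return false;
		}
		/* A sleeping caller can create a new chunk directly. */
		pool->free_units += MODEL_CHUNK_UNITS;
	}
	pool->free_units--;
	return true;
}

/* Background worker: runs in a context that may sleep and reclaim. */
static void model_background_refill(struct model_pool *pool)
{
	if (pool->refill_scheduled) {
		pool->free_units += MODEL_CHUNK_UNITS;
		pool->refill_scheduled = false;
	}
}
```

In this model the only lever for atomic callers is how early and how
generously the background worker refills the pool.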
-- 
Michal Hocko
SUSE Labs

Re: [PATCH v6 4/6] i2c: designware: introducing I2C_SLAVE definitions

2017-02-27 Thread Rob Herring

On Wed, Feb 15, 2017 at 12:22:13PM +, Luis Oliveira wrote:
> - Definitions were added to core library
> - A example was added to designware-core.txt Documentation that shows
>   how the slave can be setup using DTS
> 
> SLAVE related definitions were added to the core of the controller.
> 
> Signed-off-by: Luis Oliveira 
> Reviewed-by: Andy Shevchenko 
> ---
> V5->V6
> - Included an example of use in the device tree binding document
> 
>  .../devicetree/bindings/i2c/i2c-designware.txt | 16 +-
>  drivers/i2c/busses/i2c-designware-core.h   | 35 
> --
>  2 files changed, 48 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/i2c/i2c-designware.txt 
> b/Documentation/devicetree/bindings/i2c/i2c-designware.txt
> index fee26dc3e858..c6f1d9a63d18 100644
> --- a/Documentation/devicetree/bindings/i2c/i2c-designware.txt
> +++ b/Documentation/devicetree/bindings/i2c/i2c-designware.txt
> @@ -20,7 +20,7 @@ Optional properties :
>   - i2c-sda-falling-time-ns : should contain the SDA falling time in 
> nanoseconds.
> This value which is by default 300ns is used to compute the tHIGH period.
>  
> -Example :
> +Examples :
>  
>   i2c@f {
>   #address-cells = <1>;
> @@ -43,3 +43,17 @@ Example :
>   i2c-sda-falling-time-ns = <300>;
>   i2c-scl-falling-time-ns = <300>;
>   };
> +
> + i2c@0112 {

Drop leading 0. With that, for the binding:

Acked-by: Rob Herring 

> + #address-cells = <1>;
> + #size-cells = <0>;
> + reg = <0x2000 0x100>;
> + clock-frequency = <40>;
> + clocks = <&i2cclk>;
> + interrupts = <0>;
> +
> + eeprom@64 {
> + compatible = "linux,slave-24c02";
> + reg = <0x4064>;
> + };
> + };

Re: [tpmdd-devel] [PATCH v2 6/7] tpm: expose spaces via a device link /dev/tpms

2017-02-27 Thread Jason Gunthorpe

On Sat, Feb 25, 2017 at 12:04:49PM -0500, James Bottomley wrote:

> >  device cgroup blocks access to the cdevs of tpm0 but not to the
> > sysfs files.
> 
> What the device cgroup currently does for us and what it could do are
> two different things.  It seems if it exported
> __devcgroup_check_permission, we could use that as a check to gate the
> sysfs file access.

Makes sense, maybe we should be doing that.

Stefan, are you still interested in this? This seems like a fairly
simple solution to your problem?

> > I am talking about using a situation like kernel IMA or keyring in 
> > the container with a tpm that is not tpm0, eg a vtpm.
> 
> a vtpm appears as a tpm device so it can be controlled by the device
> cgroup ... I think I'm not seeing the issue.

When an in-kernel call opens the TPM it does not go through the cdev,
it does something like this:

extern int tpm_pcr_read(u32 chip_num, int pcr_idx, u8 *res_buf);

And hardwires 'chip_num' to TPM_ANY_NUM. Keyring does the same (see
trusted_instantiate)

Practically speaking this means in-kernel callers pretty much always
operate on tpm0.
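
A rough user-space model of why a wildcard chip number tends to land on
tpm0 follows. It is a sketch under assumptions: the model_* names, the
sentinel value and the "first present chip wins" lookup are illustrative
and are not copied from the TPM core.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed wildcard sentinel, standing in for TPM_ANY_NUM. */
#define MODEL_TPM_ANY_NUM 0xFFFF

/* Hypothetical chip registry entry. */
struct model_chip {
	uint32_t dev_num;	/* N in tpmN */
	int present;		/* non-zero if the chip is registered */
};

/* Returns the first present chip matching chip_num. With the wildcard
 * every present chip matches, so the lowest-numbered one is returned. */
static const struct model_chip *
model_chip_find(const struct model_chip *chips, size_t n, uint32_t chip_num)
{
	for (size_t i = 0; i < n; i++) {
		if (!chips[i].present)
			continue;
		if (chip_num == MODEL_TPM_ANY_NUM ||
		    chips[i].dev_num == chip_num)
			return &chips[i];
	}
	return NULL;
}
```

A caller inside a container that passes the wildcard has no way to say
"my vtpm" in this model, which is the gap being described.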

I think we need to change TPM_ANY_NUM to something more container
friendly, but I'm not sure what that should be.

> be done at all) it's usually better to start with use cases.  So
> instead of saying we need to virtualize the PCRs we should start with X
> container has this requirement for attestation of its Y state.  Often
> the best way simply is an extension of the multi user model for the
> resource ... in this case no-one's really come up with one for PCRs, so
> that might be the place to begin.

Broadly makes sense to me.

Maybe kernel keyring is a better example, it already has a multi-user
model.

Jason

Re: [PATCH v15 2/5] Documentation: bindings: add dt documentation for cdn DP controller

2017-02-27 Thread Enric Balletbo Serra

Hi all,

2016-09-10 4:15 GMT+02:00 Chris Zhong :
> This patch adds a binding that describes the cdn DP controller for
> rk3399.
>
> Signed-off-by: Chris Zhong 
> Acked-by: Rob Herring 
> Reviewed-by: Guenter Roeck 
>
> ---
>
> Changes in v15: None
> Changes in v14: None
> Changes in v13:
> - add dptx and apb reset
>
> Changes in v12: None
> Changes in v11:
> - refer dp phy
>
> Changes in v10:
> - add pclk_vio_grf clock
>
> Changes in v9:
> - modify the reference phy = <&tcphy0 0>, <&tcphy1 0>;
>
> Changes in v8: None
> Changes in v7: None
> Changes in v6:
> - add assigned-clocks and assigned-clock-rates
> - add power-domains
>
> Changes in v5: None
> Changes in v4:
> - add a reset node
> - support 2 phys
>
> Changes in v3:
> - add SoC specific compatible string
> - remove reg = <1>;
>
> Changes in v2: None
> Changes in v1:
> - add extcon node description
> - add #sound-dai-cells description
>
>  .../bindings/display/rockchip/cdn-dp-rockchip.txt  | 75 
> ++
>  1 file changed, 75 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/display/rockchip/cdn-dp-rockchip.txt
>
> diff --git 
> a/Documentation/devicetree/bindings/display/rockchip/cdn-dp-rockchip.txt 
> b/Documentation/devicetree/bindings/display/rockchip/cdn-dp-rockchip.txt
> new file mode 100644
> index 000..9bd2c13
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/display/rockchip/cdn-dp-rockchip.txt
> @@ -0,0 +1,75 @@
> +Rockchip RK3399 specific extensions to the cdn Display Port
> +
> +
> +Required properties:
> +- compatible: must be "rockchip,rk3399-cdn-dp"
> +
> +- reg: physical base address of the controller and length
> +
> +- clocks: from common clock binding: handle to dp clock.
> +
> +- clock-names: from common clock binding:
> +  Required elements: "core-clk" "pclk" "spdif" "grf"
> +
> +- resets : a list of phandle + reset specifier pairs
> +- reset-names : string reset name, must be:
> +   "spdif", "dptx", "apb".
> +- power-domains : power-domain property defined with a phandle
> + to respective power domain.
> +- assigned-clocks: main clock, should be <&cru SCLK_DP_CORE>
> +- assigned-clock-rates : the DP core clk frequency, shall be: 1
> +
> +- rockchip,grf: this soc should set GRF regs, so need get grf here.
> +
> +- ports: contain a port nodes with endpoint definitions as defined in
> +Documentation/devicetree/bindings/media/video-interfaces.txt.
> +contained 2 endpoints, connecting to the output of vop.
> +
> +- phys: from general PHY binding: the phandle for the PHY device.
> +
> +- extcon: extcon specifier for the Power Delivery
> +
> +- #sound-dai-cells = it must be 1 if your system is using 2 DAIs: I2S, SPDIF
> +
> +---
> +
> +Example:
> +   cdn_dp: dp@fec0 {
> +   compatible = "rockchip,rk3399-cdn-dp";
> +   reg = <0x0 0xfec0 0x0 0x10>;
> +   interrupts = ;
> +   clocks = <&cru SCLK_DP_CORE>, <&cru PCLK_DP_CTRL>,
> +<&cru SCLK_SPDIF_REC_DPTX>, <&cru PCLK_VIO_GRF>;
> +   clock-names = "core-clk", "pclk", "spdif", "grf";
> +   assigned-clocks = <&cru SCLK_DP_CORE>;
> +   assigned-clock-rates = <1>;
> +   power-domains = <&power RK3399_PD_HDCP>;
> +   phys = <&tcphy0_dp>, <&tcphy1_dp>;
> +   resets = <&cru SRST_DPTX_SPDIF_REC>, <&cru SRST_P_UPHY0_DPTX>,
> +<&cru SRST_P_UPHY0_APB>;
> +   reset-names = "spdif", "dptx", "apb";
> +   extcon = <&fusb0>, <&fusb1>;
> +   rockchip,grf = <&grf>;
> +   #address-cells = <1>;
> +   #size-cells = <0>;
> +   #sound-dai-cells = <1>;
> +
> +   ports {
> +   #address-cells = <1>;
> +   #size-cells = <0>;
> +
> +   dp_in: port {
> +   #address-cells = <1>;
> +   #size-cells = <0>;
> +   dp_in_vopb: endpoint@0 {
> +   reg = <0>;
> +   remote-endpoint = <&vopb_out_dp>;
> +   };
> +
> +   dp_in_vopl: endpoint@1 {
> +   reg = <1>;
> +   remote-endpoint = <&vopl_out_dp>;
> +   };
> +   };
> +   };
> +   };
> --
> 1.9.1
>

CC'ing Mark Yao

I saw that the cdn-dp driver was merged but the binding documentation
was not. Maybe we forgot to include this patch in the pull request?
Or is there another reason?

Thanks,
  Enric

Re: [PATCH v1 1/3] ASoC: zx-96p22: add documentation for zte's aud96p22 controller

2017-02-27 Thread Rob Herring

On Wed, Feb 15, 2017 at 06:55:08PM +0800, Baoyou Xie wrote:
> This patch adds dt-binding documentation for zte's aud96p22 controller.
> 
> Signed-off-by: Baoyou Xie 
> ---
>  .../devicetree/bindings/sound/zte,zx-96p22.txt | 24 
> ++
>  1 file changed, 24 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/sound/zte,zx-96p22.txt
> 
> diff --git a/Documentation/devicetree/bindings/sound/zte,zx-96p22.txt 
> b/Documentation/devicetree/bindings/sound/zte,zx-96p22.txt
> new file mode 100644
> index 000..4184566
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/sound/zte,zx-96p22.txt
> @@ -0,0 +1,24 @@
> +ZTE zx96p22 controller
> +
> +Required properties:
> + - compatible : Must be "zte,zx-aud96p22"
> + - #sound-dai-cells: Should be 0
> + - reg : Offset of I2C register for zx96p22
> +
> +Example:
> +
> + audio_i2c0: audio_i2c0@1486000 {

i2c@...

> + compatible = "zte,zx296718-i2c";
> + reg = <0x01486000 0x1000>;
> + interrupts = ;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + clocks = <&audiocrm AUDIO_I2C0_WCLK>;
> + clock-frequency = <160>;
> + status = "ok";

No need for status in examples.

With that,

Acked-by: Rob Herring 


> + inner_codec: aud96p22@22 {
> + compatible = "zte,zx-aud96p22";
> + #sound-dai-cells = <0>;
> + reg = <0x22>;
> + };
> + };
> -- 
> 2.7.4
>

Re: Schedule affinity_notify work while migrating IRQs during hot plug

2017-02-27 Thread Sodagudi Prasad


On 2017-02-21 12:59, Sodagudi Prasad wrote:

Hi Thomas,

Currently irq_set_affinity() is called to migrate irqs from
migrate_one_irq() during cpu hot plug, and clients which are interested
in the irq affinity change are not getting notified.

take_cpu_down() --> __cpu_disable() --> irq_migrate_all_off_this_cpu();

irq_set_affinity() is changing the IRQ affinity at chip level
but it is not notifying the affinity_notify work.



Hi Thomas and All,

I can see that in the 3.10 kernel the irq affinity notifiers are called
when a core is hot plugged. But in later kernel versions the API used to
change the affinity was changed from irq_set_affinity_locked() to
irq_do_set_affinity(), so the irq notifiers are no longer called when a
core is hot plugged.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/kernel/irq/manage.c?id=refs/tags/v3.10.105#n201


In the latest kernel the following path is executed to migrate IRQs:
irq_migrate_all_off_this_cpu() --> migrate_one_irq() -->
irq_do_set_affinity().


As mentioned above, irq_set_affinity_locked() notifies all client
drivers that have registered for IRQ affinity changes, but
irq_do_set_affinity() only changes the affinity at the irq chip level
and does not notify the client drivers. I am not sure whether this was
simply missed during the IRQ framework refactoring or was done
intentionally. Can you please check whether the following code change
makes sense?


So I am thinking that adding the following schedule_work() call would
notify the clients.


diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 6b66959..5e4766b 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data, const 
struct cpumask *mask,

case IRQ_SET_MASK_OK_DONE:
cpumask_copy(desc->irq_common_data.affinity, mask);
case IRQ_SET_MASK_OK_NOCOPY:
+   schedule_work(&desc->affinity_notify->work);
irq_set_thread_affinity(desc);
ret = 0;
}
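
One detail the hunk above leaves open is what happens when no notifier is
registered for the IRQ, in which case desc->affinity_notify would be NULL.
The following user-space sketch models a guarded variant. All model_* names
are hypothetical; the reference bump stands in for taking a kref on the
notifier before queueing its work, which is an assumption about how the
locked path behaves and not something confirmed in this thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for the affinity notifier object. */
struct model_notify {
	int refcount;		/* models the notifier's kref */
	int work_queued;	/* models how often the work was scheduled */
};

/* Hypothetical stand-in for the IRQ descriptor. */
struct model_irq_desc {
	struct model_notify *affinity_notify;	/* NULL if none registered */
};

/* Queue the notification only if a notifier exists, holding a
 * reference so the object outlives the queued work. */
static void model_notify_affinity_change(struct model_irq_desc *desc)
{
	struct model_notify *notify = desc->affinity_notify;

	if (!notify)
		return;
	notify->refcount++;	/* would be a kref_get() in the kernel */
	notify->work_queued++;	/* would be a schedule_work() call */
}
```

Whether scheduling work from the CPU hotplug path is acceptable at all is a
separate question that this sketch does not address.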



How about the change below, so that client drivers get notified about
irq affinity changes?
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data,
const struct cpumask *mask,
case IRQ_SET_MASK_OK_DONE:
cpumask_copy(desc->irq_common_data.affinity, mask);
case IRQ_SET_MASK_OK_NOCOPY:
+   schedule_work(&desc->affinity_notify->work);
irq_set_thread_affinity(desc);
ret = 0;

With this change, the IRQ affinity notification work gets executed and
the client drivers are notified.


-Thanks, Prasad
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project

mm: GPF in bdi_put

2017-02-27 Thread Dmitry Vyukov

Hello,

The following program triggers GPF in bdi_put:
https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt

general protection fault:  [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 2952 Comm: a.out Not tainted 4.10.0+ #229
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880063e72180 task.stack: 880064a78000
RIP: 0010:__read_once_size include/linux/compiler.h:247 [inline]
RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
RIP: 0010:refcount_sub_and_test include/linux/refcount.h:156 [inline]
RIP: 0010:refcount_dec_and_test include/linux/refcount.h:181 [inline]
RIP: 0010:kref_put include/linux/kref.h:71 [inline]
RIP: 0010:bdi_put+0x8b/0x1d0 mm/backing-dev.c:914
RSP: 0018:880064a7f0b0 EFLAGS: 00010202
RAX: 0007 RBX:  RCX: 
RDX: 880064a7f118 RSI: 0001 RDI: 
RBP: 880064a7f140 R08: 880065603280 R09: 0001
R10:  R11: 0001 R12: dc00
R13: 0038 R14: 11000c94fe17 R15: 880064a7f218
FS:  00eb5880() GS:88006d00() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 20914ffa CR3: 6bc37000 CR4: 001426f0
Call Trace:
 bdev_evict_inode+0x203/0x3a0 fs/block_dev.c:888
 evict+0x46e/0x980 fs/inode.c:553
 iput_final fs/inode.c:1515 [inline]
 iput+0x589/0xb20 fs/inode.c:1542
 dentry_unlink_inode+0x43b/0x600 fs/dcache.c:343
 __dentry_kill+0x34d/0x740 fs/dcache.c:538
 dentry_kill fs/dcache.c:579 [inline]
 dput.part.27+0x5ce/0x7c0 fs/dcache.c:791
 dput fs/dcache.c:753 [inline]
 do_one_tree+0x43/0x50 fs/dcache.c:1454
 shrink_dcache_for_umount+0xbb/0x2b0 fs/dcache.c:1468
 generic_shutdown_super+0xcd/0x4c0 fs/super.c:421
 kill_anon_super+0x3c/0x50 fs/super.c:988
 deactivate_locked_super+0x88/0xd0 fs/super.c:309
 deactivate_super+0x155/0x1b0 fs/super.c:340
 cleanup_mnt+0xb2/0x160 fs/namespace.c:1112
 __cleanup_mnt+0x16/0x20 fs/namespace.c:1119
 task_work_run+0x18a/0x260 kernel/task_work.c:116
 tracehook_notify_resume include/linux/tracehook.h:191 [inline]
 exit_to_usermode_loop+0x23b/0x2a0 arch/x86/entry/common.c:160
 prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
 syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
 entry_SYSCALL_64_fastpath+0xc0/0xc2
RIP: 0033:0x435e19
RSP: 002b:7ffc9d7f2748 EFLAGS: 0246 ORIG_RAX: 00a5
RAX: ffea RBX: 0100 RCX: 00435e19
RDX: 20063000 RSI: 20914ffa RDI: 20037000
RBP: 7ffc9d7f2fe0 R08: 20039000 R09: 
R10:  R11: 0246 R12: 
R13: 00402b70 R14: 00402c00 R15: 
Code: 04 f2 f2 f2 c7 40 08 f3 f3 f3 f3 e8 f0 ec de ff 48 8d 45 98 48
8b 95 70 ff ff ff 48 c1 e8 03 42 c6 04 20 04 4c 89 e8 48 c1 e8 03 <42>
0f b6 0c 20 4c 89 e8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f
RIP: __read_once_size include/linux/compiler.h:247 [inline] RSP:
880064a7f0b0
RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP: 880064a7f0b0
RIP: refcount_sub_and_test include/linux/refcount.h:156 [inline] RSP:
880064a7f0b0
RIP: refcount_dec_and_test include/linux/refcount.h:181 [inline] RSP:
880064a7f0b0
RIP: kref_put include/linux/kref.h:71 [inline] RSP: 880064a7f0b0
RIP: bdi_put+0x8b/0x1d0 mm/backing-dev.c:914 RSP: 880064a7f0b0
---[ end trace 8991b3d16ac9bf93 ]---

On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570.

Re: [PATCH 1/2] dt-bindings: qoriq-clock: Add coreclk

2017-02-27 Thread Rob Herring

On Wed, Feb 15, 2017 at 01:47:35PM +0800, yuantian.t...@nxp.com wrote:
> From: Tang Yuantian 
> 
> ls1012a has separate input root clocks for core PLLs versus the platform
> PLL, with the latter described as sysclk in the hw docs.
> Update the qoriq-clock binding to allow a second input clock, named
> "coreclk".  If present, this clock will be used for the core PLLs.
> 
> Signed-off-by: Scott Wood 
> Signed-off-by: Tang Yuantian 
> ---
>  Documentation/devicetree/bindings/clock/qoriq-clock.txt | 6 ++
>  1 file changed, 6 insertions(+)

The change looks fine, but sounds like Scott should remain the author 
(or agree he shouldn't be).

> 
> diff --git a/Documentation/devicetree/bindings/clock/qoriq-clock.txt 
> b/Documentation/devicetree/bindings/clock/qoriq-clock.txt
> index df9cb5a..97a9666 100644
> --- a/Documentation/devicetree/bindings/clock/qoriq-clock.txt
> +++ b/Documentation/devicetree/bindings/clock/qoriq-clock.txt
> @@ -55,6 +55,11 @@ Optional properties:
>  - clocks: If clock-frequency is not specified, sysclk may be provided
>   as an input clock.  Either clock-frequency or clocks must be
>   provided.
> + A second input clock, called "coreclk", may be provided if
> + core PLLs are based on a different input clock from the
> + platform PLL.
> +- clock-names: Required if a coreclk is present.  Valid names are
> + "sysclk" and "coreclk".
>  
>  2. Clock Provider
>  
> @@ -71,6 +76,7 @@ second cell is the clock index for the specified type.
>   2   hwaccel index (n in CLKCGnHWACSR)
>   3   fman0 for fm1, 1 for fm2
>   4   platform pll0=pll, 1=pll/2, 2=pll/3, 3=pll/4
> + 5   coreclk must be 0
>  
>  3. Example
>  
> -- 
> 2.1.0.27.g96db324
>

RE: [PATCH] Drivers: hv: util: on deinit, don't wait the release event, if we shouldn't

2017-02-27 Thread Stephen Hemminger

The patch looks good. I don't understand the exact semantics here and
therefore have a couple of naïve questions.

Is it possible that the device could be opened multiple times? Is
reference counting done above?
If the device was never opened, do you have to do all the cleanup steps
at all?

-Original Message-
From: Dexuan Cui 
Sent: Monday, February 27, 2017 3:26 AM
To: gre...@linuxfoundation.org; driverdev-de...@linuxdriverproject.org; KY 
Srinivasan ; Haiyang Zhang ; 
Stephen Hemminger 
Cc: Vitaly Kuznetsov ; linux-kernel@vger.kernel.org; Alex 
Ng (LIS) 
Subject: [PATCH] Drivers: hv: util: on deinit, don't wait the release event, if 
we shouldn't


If the daemon is NOT running at all, when we disable the util device from
Hyper-V Manager (or sometimes the host can rescind a util device and then
re-offer it), we'll hang in util_remove -> hv_kvp_deinit ->
wait_for_completion(&release_event), because this code path doesn't run:
hvt_op_release -> ... -> kvp_on_reset -> complete(&release_event).

Due to this, we can't even reboot the VM properly.

The patch tracks if the dev file is opened or not, and we only need to
wait if it's opened.
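
The rule described above (wait for the release event only if the device file
is open) can be modelled in user space as follows. This is a sketch with
hypothetical model_* names, not the driver code; the real deinit functions
block on a completion, whereas here the function only reports whether it
would have to block.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct hvutil_transport. */
struct model_transport {
	bool dev_opened;	/* set on open, cleared on release */
	bool destroyed;		/* set once the transport is torn down */
};

static void model_open(struct model_transport *t)
{
	t->dev_opened = true;
}

static void model_release(struct model_transport *t)
{
	t->dev_opened = false;
}

/* Returns true if deinit would have to wait for the release event.
 * The flag is sampled before teardown, as in the patch. */
static bool model_deinit(struct model_transport *t)
{
	bool wait = t->dev_opened;

	t->destroyed = true;
	return wait;
}
```

Note that a single bool does not count openers; whether that is sufficient
depends on whether the open handler ever admits more than one opener, which
this sketch does not answer.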

Fixes: 5a66fecbf6aa ("Drivers: hv: util: kvp: Fix a rescind processing issue")
Signed-off-by: Dexuan Cui 
Cc: Vitaly Kuznetsov 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
---
 drivers/hv/hv_fcopy.c   | 5 -
 drivers/hv/hv_kvp.c | 6 +-
 drivers/hv/hv_snapshot.c| 5 -
 drivers/hv/hv_utils_transport.c | 2 ++
 drivers/hv/hv_utils_transport.h | 1 +
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index 9aee601..545cf43 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -358,8 +358,11 @@ int hv_fcopy_init(struct hv_util_service *srv)
 
 void hv_fcopy_deinit(void)
 {
+   bool wait = hvt->dev_opened;
+
fcopy_transaction.state = HVUTIL_DEVICE_DYING;
cancel_delayed_work_sync(&fcopy_timeout_work);
hvutil_transport_destroy(hvt);
-   wait_for_completion(&release_event);
+   if (wait)
+   wait_for_completion(&release_event);
 }
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index de26371..15c7873 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -742,10 +742,14 @@ hv_kvp_init(struct hv_util_service *srv)
 
 void hv_kvp_deinit(void)
 {
+   bool wait = hvt->dev_opened;
+
kvp_transaction.state = HVUTIL_DEVICE_DYING;
cancel_delayed_work_sync(&kvp_host_handshake_work);
cancel_delayed_work_sync(&kvp_timeout_work);
cancel_work_sync(&kvp_sendkey_work);
hvutil_transport_destroy(hvt);
-   wait_for_completion(&release_event);
+
+   if (wait)
+   wait_for_completion(&release_event);
 }
diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
index bcc03f0..3847f19 100644
--- a/drivers/hv/hv_snapshot.c
+++ b/drivers/hv/hv_snapshot.c
@@ -396,9 +396,12 @@ hv_vss_init(struct hv_util_service *srv)
 
 void hv_vss_deinit(void)
 {
+   bool wait = hvt->dev_opened;
+
vss_transaction.state = HVUTIL_DEVICE_DYING;
cancel_delayed_work_sync(&vss_timeout_work);
cancel_work_sync(&vss_handle_request_work);
hvutil_transport_destroy(hvt);
-   wait_for_completion(&release_event);
+   if (wait)
+   wait_for_completion(&release_event);
 }
diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index c235a95..05e0648 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -153,6 +153,7 @@ static int hvt_op_open(struct inode *inode, struct file 
*file)
 
if (issue_reset)
hvt_reset(hvt);
+   hvt->dev_opened = (hvt->mode == HVUTIL_TRANSPORT_CHARDEV) && !ret;
 
mutex_unlock(&hvt->lock);
 
@@ -182,6 +183,7 @@ static int hvt_op_release(struct inode *inode, struct file 
*file)
 * connects back.
 */
hvt_reset(hvt);
+   hvt->dev_opened = false;
mutex_unlock(&hvt->lock);
 
if (mode_old == HVUTIL_TRANSPORT_DESTROY)
diff --git a/drivers/hv/hv_utils_transport.h b/drivers/hv/hv_utils_transport.h
index d98f522..9871283 100644
--- a/drivers/hv/hv_utils_transport.h
+++ b/drivers/hv/hv_utils_transport.h
@@ -32,6 +32,7 @@ struct hvutil_transport {
int mode;   /* hvutil_transport_mode */
struct file_operations fops;/* file operations */
struct miscdevice mdev; /* misc device */
+   bool   dev_opened;  /* Is the device opened? */
struct cb_id cn_id; /* CN_*_IDX/CN_*_VAL */
struct list_head list;  /* hvt_list */
int (*on_msg)(void *, int); /* callback on new user message */
-- 
2.7.4

__queue_work oops.

2017-02-27 Thread Dave Jones

Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-think+ #9 
task: 88017f105440 task.stack: c9094000
RIP: 0010:__queue_work+0x2d/0x700
RSP: 0018:880507c03df8 EFLAGS: 00010046
RAX: 0082 RBX: 0101 RCX: 0002
RDX: 88047bf07c98 RSI:  RDI: 
RBP: 880507c03e30 R08: 0001 R09: 8294bf68
R10: 880507c03e58 R11:  R12: 88047bf07ce8
R13:  R14:  R15: 88047bf07c98
FS:  () GS:880507c0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 01c2 CR3: 04e11000 CR4: 001406e0
Call Trace:
 
 ? work_on_cpu+0xb0/0xb0
 delayed_work_timer_fn+0x1e/0x20
 call_timer_fn+0xbd/0x480
 ? call_timer_fn+0x5/0x480
 ? work_on_cpu+0xb0/0xb0
 run_timer_softirq+0x1a1/0x700
 __do_softirq+0xbf/0x5b1
 irq_exit+0xb5/0xc0
 smp_apic_timer_interrupt+0x3d/0x50
 apic_timer_interrupt+0x97/0xa0
RIP: 0010:cpuidle_enter_state+0x12e/0x400
RSP: 0018:c9097e40 EFLAGS: 0202
[CONT START]  ORIG_RAX: ff10
RAX: 88017f105440 RBX: e8a03dc8 RCX: 0001
RDX: 20c49ba5e353f7cf RSI: 0001 RDI: 88017f105440
RBP: c9097e80 R08: cccd R09: 0018
R10:  R11:  R12: 0005
R13: 81eaf198 R14: 0005 R15: 81eaf180
 
 cpuidle_enter+0x17/0x20
 call_cpuidle+0x23/0x40
 do_idle+0xfb/0x200
 cpu_startup_entry+0x71/0x80
 start_secondary+0x16a/0x210
 start_cpu+0x14/0x14
Code: 44 00 00 55 48 89 e5 41 57 49 89 d7 41 56 41 89 fe 41 55 49 89 f5 41 54 
53 48 83 ec 10 89 7d d4 9c 58 f6 c4 02 0f 85 06 04 00 00 <41> f6 85 c2 01 00 00 
01 0f 85 22 04 00 00 49 bc eb 83 b5 80 46 

Code starting with the faulting instruction
===
   0:   41 f6 85 c2 01 00 00testb  $0x1,0x1c2(%r13)
   7:   01 
   8:   0f 85 22 04 00 00   jne0x430
   e:   49  rex.WB
   f:   bc eb 83 b5 80  mov$0x80b583eb,%esp
  14:   46  rex.RX



3cf0 <__queue_work>:
{
3cf0:   e8 00 00 00 00  callq  3cf5 <__queue_work+0x5>
3cf5:   55  push   %rbp
3cf6:   48 89 e5mov%rsp,%rbp
3cf9:   41 57   push   %r15
3cfb:   49 89 d7mov%rdx,%r15
3cfe:   41 56   push   %r14
unsigned int req_cpu = cpu;
3d00:   41 89 femov%edi,%r14d
{
3d03:   41 55   push   %r13
3d05:   49 89 f5mov%rsi,%r13
3d08:   41 54   push   %r12
3d0a:   53  push   %rbx
3d0b:   48 83 ec 10 sub$0x10,%rsp
3d0f:   89 7d d4mov%edi,-0x2c(%rbp)
asm volatile("# __raw_save_flags\n\t"
3d12:   9c  pushfq 
3d13:   58  pop%rax
WARN_ON_ONCE(!irqs_disabled());
3d14:   f6 c4 02test   $0x2,%ah
3d17:   0f 85 06 04 00 00   jne4123 <__queue_work+0x433>
if (unlikely(wq->flags & __WQ_DRAINING) &&
3d1d:   41 f6 85 c2 01 00 00testb  $0x1,0x1c2(%r13)


So we called __queue_work with a null wq.
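For anyone following the decode, the address arithmetic can be sketched in userspace as below. This is only a sketch under assumptions: a little-endian machine, wq->flags being a 32-bit word at offset 0x1c0 in this build, and __WQ_DRAINING being bit 16 of it. The offset is inferred from the testb operand above, not checked against the struct layout, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the disassembly: testb $0x1,0x1c2(%r13). */
#define FLAGS_OFFSET 0x1c0u /* hypothetical offset of wq->flags */
#define DRAINING_BIT 16u    /* assumed bit index of __WQ_DRAINING */

/* Byte address touched when testing one bit of a little-endian 32-bit word. */
static uintptr_t bit_test_address(uintptr_t base, unsigned word_off, unsigned bit)
{
	assert(bit < 32);
	return base + word_off + bit / 8;
}

/* Immediate mask a byte-wide test instruction would use for that bit. */
static unsigned bit_test_mask(unsigned bit)
{
	assert(bit < 32);
	return 1u << (bit % 8);
}
```

With a base of 0 (the null wq in %r13), bit_test_address(0, FLAGS_OFFSET, DRAINING_BIT) gives 0x1c2, which lines up with the CR2 value in the oops, and bit_test_mask(DRAINING_BIT) gives the $0x1 immediate.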

Thoughts ?

Dave

Re: mm: GPF in bdi_put

2017-02-27 Thread Dmitry Vyukov

On Mon, Feb 27, 2017 at 6:11 PM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program triggers GPF in bdi_put:
> https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt
>
> general protection fault:  [#1] SMP KASAN
> Modules linked in:
> CPU: 0 PID: 2952 Comm: a.out Not tainted 4.10.0+ #229
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 880063e72180 task.stack: 880064a78000
> RIP: 0010:__read_once_size include/linux/compiler.h:247 [inline]
> RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
> RIP: 0010:refcount_sub_and_test include/linux/refcount.h:156 [inline]
> RIP: 0010:refcount_dec_and_test include/linux/refcount.h:181 [inline]
> RIP: 0010:kref_put include/linux/kref.h:71 [inline]
> RIP: 0010:bdi_put+0x8b/0x1d0 mm/backing-dev.c:914
> RSP: 0018:880064a7f0b0 EFLAGS: 00010202
> RAX: 0007 RBX:  RCX: 
> RDX: 880064a7f118 RSI: 0001 RDI: 
> RBP: 880064a7f140 R08: 880065603280 R09: 0001
> R10:  R11: 0001 R12: dc00
> R13: 0038 R14: 11000c94fe17 R15: 880064a7f218
> FS:  00eb5880() GS:88006d00() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 20914ffa CR3: 6bc37000 CR4: 001426f0
> Call Trace:
>  bdev_evict_inode+0x203/0x3a0 fs/block_dev.c:888
>  evict+0x46e/0x980 fs/inode.c:553
>  iput_final fs/inode.c:1515 [inline]
>  iput+0x589/0xb20 fs/inode.c:1542
>  dentry_unlink_inode+0x43b/0x600 fs/dcache.c:343
>  __dentry_kill+0x34d/0x740 fs/dcache.c:538
>  dentry_kill fs/dcache.c:579 [inline]
>  dput.part.27+0x5ce/0x7c0 fs/dcache.c:791
>  dput fs/dcache.c:753 [inline]
>  do_one_tree+0x43/0x50 fs/dcache.c:1454
>  shrink_dcache_for_umount+0xbb/0x2b0 fs/dcache.c:1468
>  generic_shutdown_super+0xcd/0x4c0 fs/super.c:421
>  kill_anon_super+0x3c/0x50 fs/super.c:988
>  deactivate_locked_super+0x88/0xd0 fs/super.c:309
>  deactivate_super+0x155/0x1b0 fs/super.c:340
>  cleanup_mnt+0xb2/0x160 fs/namespace.c:1112
>  __cleanup_mnt+0x16/0x20 fs/namespace.c:1119
>  task_work_run+0x18a/0x260 kernel/task_work.c:116
>  tracehook_notify_resume include/linux/tracehook.h:191 [inline]
>  exit_to_usermode_loop+0x23b/0x2a0 arch/x86/entry/common.c:160
>  prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
>  syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
>  entry_SYSCALL_64_fastpath+0xc0/0xc2
> RIP: 0033:0x435e19
> RSP: 002b:7ffc9d7f2748 EFLAGS: 0246 ORIG_RAX: 00a5
> RAX: ffea RBX: 0100 RCX: 00435e19
> RDX: 20063000 RSI: 20914ffa RDI: 20037000
> RBP: 7ffc9d7f2fe0 R08: 20039000 R09: 
> R10:  R11: 0246 R12: 
> R13: 00402b70 R14: 00402c00 R15: 
> Code: 04 f2 f2 f2 c7 40 08 f3 f3 f3 f3 e8 f0 ec de ff 48 8d 45 98 48
> 8b 95 70 ff ff ff 48 c1 e8 03 42 c6 04 20 04 4c 89 e8 48 c1 e8 03 <42>
> 0f b6 0c 20 4c 89 e8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f
> RIP: __read_once_size include/linux/compiler.h:247 [inline] RSP:
> 880064a7f0b0
> RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP: 
> 880064a7f0b0
> RIP: refcount_sub_and_test include/linux/refcount.h:156 [inline] RSP:
> 880064a7f0b0
> RIP: refcount_dec_and_test include/linux/refcount.h:181 [inline] RSP:
> 880064a7f0b0
> RIP: kref_put include/linux/kref.h:71 [inline] RSP: 880064a7f0b0
> RIP: bdi_put+0x8b/0x1d0 mm/backing-dev.c:914 RSP: 880064a7f0b0
> ---[ end trace 8991b3d16ac9bf93 ]---
>
> On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570.


I also see the following WARNING. Do you think it's the same underlying bug?

[ cut here ]
WARNING: CPU: 1 PID: 24265 at mm/backing-dev.c:899
bdi_exit+0x13e/0x160 mm/backing-dev.c:899
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 24265 Comm: syz-executor3 Not tainted 4.10.0-next-20170227+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 panic+0x1fb/0x412 kernel/panic.c:179
 __warn+0x1c4/0x1e0 kernel/panic.c:540
 warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
 bdi_exit+0x13e/0x160 mm/backing-dev.c:899
 release_bdi+0x19/0x30 mm/backing-dev.c:908
 kref_put include/linux/kref.h:72 [inline]
 bdi_put+0x2a/0x40 mm/backing-dev.c:914
 bdev_evict_inode+0x203/0x3a0 fs/block_dev.c:888
 evict+0x46e/0x980 fs/inode.c:553
 iput_final fs/inode.c:1

Re: [PATCH v2 14/22] dt-bindings: PCI: dra7xx: Add dt bindings to enable legacy mode

2017-02-27 Thread Rob Herring

On Fri, Feb 17, 2017 at 03:20:34PM +0530, Kishon Vijay Abraham I wrote:
> Update device tree binding documentation of TI's dra7xx PCI
> controller to include property for enabling legacy mode.
> 
> Signed-off-by: Kishon Vijay Abraham I 
> ---
>  Documentation/devicetree/bindings/pci/ti-pci.txt |4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/pci/ti-pci.txt 
> b/Documentation/devicetree/bindings/pci/ti-pci.txt
> index 190828a..72ebe2b 100644
> --- a/Documentation/devicetree/bindings/pci/ti-pci.txt
> +++ b/Documentation/devicetree/bindings/pci/ti-pci.txt
> @@ -39,6 +39,10 @@ DEVICE MODE
>   - interrupts : one interrupt entries must be specified for main interrupt.
>   - num-ib-windows : number of inbound address translation windows
>   - num-ob-windows : number of outbound address translation windows
> + - syscon-legacy-mode: phandle to the syscon dt node. The 1st argument should
> +contain the register offset within syscon and the 2nd
> +argument should contain the bit field for setting the
> +legacy mode

Vendor prefix needed and what does "legacy mode" mean? Perhaps name this 
around what the mode is/does, not that it is legacy.

Rob

Re: [PATCH] sd: close hole in > 2T device rejection when !CONFIG_LBDAF

2017-02-27 Thread Steve Magnani


Hi Bart -

Thanks for taking the time to look this over.

On 02/27/2017 10:13 AM, Bart Van Assche wrote:

On Mon, 2017-02-27 at 09:22 -0600, Steven J. Magnani wrote:

@@ -2122,7 +2122,10 @@ static int read_capacity_16(struct scsi_
return -ENODEV;
}
  
-	if ((sizeof(sdkp->capacity) == 4) && (lba >= 0xULL)) {

+   /* Make sure logical_to_sectors() won't overflow */
+   lba_in_sectors = lba << (ilog2(sector_size) - 9);
+   if ((sizeof(sdkp->capacity) == 4) &&
+   ((lba >= 0xULL) || (lba_in_sectors >= 0xULL))) {
sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a "
"kernel compiled with support for large block "
"devices.\n");
@@ -2162,6 +2165,7 @@ static int read_capacity_10(struct scsi_
int the_result;
int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
sector_t lba;
+   unsigned long long lba_in_sectors;
unsigned sector_size;
  
  	do {

@@ -2208,7 +2212,10 @@ static int read_capacity_10(struct scsi_
return sector_size;
}
  
-	if ((sizeof(sdkp->capacity) == 4) && (lba == 0x)) {

+   /* Make sure logical_to_sectors() won't overflow */
+   lba_in_sectors = ((unsigned long long) lba) << (ilog2(sector_size) - 9);
+   if ((sizeof(sdkp->capacity) == 4) &&
+   (lba_in_sectors >= 0xULL)) {
sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a "
"kernel compiled with support for large block "
"devices.\n");

Why are the two checks slightly different? Could the same code be used for
both checks?

The checks are different because with READ CAPACITY(16) a _really_ huge 
device could report a max LBA so large that left-shifting it causes bits 
to drop off the end. That's not an issue with READ CAPACITY(10) because 
at most the 32-bit LBA reported by the device will become a 35-bit value 
(since the max supported block size is 4096 == (512 << 3)).
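To make the two cases concrete, here is a small userspace sketch of the same idea. It is not the code from the patch: the function name is made up, the 32-bit limit is written as UINT32_MAX, and the overflow test is done by comparing against a shifted maximum rather than by the two separate comparisons in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Convert a device LBA to 512-byte sectors and report whether the result
 * stays below the 32-bit limit. sector_shift stands for
 * ilog2(sector_size) - 9, i.e. 0..3 for 512..4096-byte logical blocks.
 */
static bool fits_in_32bit_sectors(uint64_t lba, unsigned sector_shift)
{
	assert(sector_shift <= 3);

	/* READ CAPACITY(16) case: the shift itself would drop high bits. */
	if (lba > (UINT64_MAX >> sector_shift))
		return false;

	/* READ CAPACITY(10) case: at most a 35-bit value, so no bits are lost. */
	return (lba << sector_shift) < UINT32_MAX;
}
```

A 64-bit LBA of 1ULL << 61 with 4096-byte blocks is the interesting input: a plain left shift by 3 wraps to 0 and would pass a naive comparison, while the check above rejects it.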



BTW, using the macro below would make the above checks less
verbose and easier to read:

/*
  * Test whether the result of a shift-left operation would be larger than
  * what fits in a variable with the type of @a.
  */
#define shift_left_overflows(a, b)  \
({  \
typeof(a) _minus_one = -1LL;\
typeof(a) _plus_one = 1;\
bool _a_is_signed = _minus_one < 0;  \
int _shift = sizeof(a) * 8 - ((b) + _a_is_signed);  \
_shift < 0 || ((a) & ~((_plus_one << _shift) - 1)) != 0;\
})

Bart.

Perhaps, but I am not a fan of putting braces in non-obvious places such
as within array subscripts (which I've encountered recently) or
conditional expressions, which is what this amounts to.


Regards,

 Steven J. Magnani   "I claim this network for MARS!
 www.digidescorp.com  Earthling, return my space modulator!"

 #include

Re: [REGRESSION] pinctrl, of, unable to find hogs

2017-02-27 Thread Gary Bisson

On Mon, Feb 27, 2017 at 05:27:47PM +0100, Gary Bisson wrote:
> Mika, Tony, All,
> 
> On Mon, Feb 27, 2017 at 07:53:53AM -0800, Tony Lindgren wrote:
> > * Mika Penttilä  [170226 21:46]:
> > > 
> > > With current linus git (pre 4.11), unable to find the pinctrl hogs :
> > > 
> > > 
> > >  imx6q-pinctrl 20e.iomuxc: unable to find group for node hoggrp
> > > 
> > > 
> > > Device is i.MX6 based.
> > 
> > Sorry to hear about that, maybe imx_pinctrl_probe_dt() should be
> > called before devm_pinctrl_register_and_init()?
> > 
> > Things got moved around a bit with e566fc11ea76 ("pinctrl: imx: use
> > generic pinctrl helpers for managing groups") it seems. But maybe that
> > was done because we did not have commit 950b0d91dc10 ("pinctrl: core:
> > Fix regression caused by delayed work for hogs") when the imx_pinctrl
> > changes got merged.
> 
> Indeed the i.MX changes were made before your the rework.

s/the/hog/

> The reason imx_pinctrl_probe_dt got moved around is because
> devm_pinctrl_register is the one that initializes the radix trees that
> are needed when probing the dt.
> 
> > Gary, are you able to reproduce this? Seems it should happen with
> > any imx with hogs configured in the dts.
> 
> Yes I can reproduce the issue.
> 
> Not sure how to fix it though since we can't move the dt probing before
> radix tree init.
> 
> Regards,
> Gary

Re: [PATCH V5 4/6] mm: reclaim MADV_FREE pages

2017-02-27 Thread Shaohua Li

On Mon, Feb 27, 2017 at 03:33:15PM +0900, Minchan Kim wrote:
> Hi Shaohua,
> 
> On Fri, Feb 24, 2017 at 01:31:47PM -0800, Shaohua Li wrote:
> > When memory pressure is high, we free MADV_FREE pages. If the pages are
> > not dirty in pte, the pages could be freed immediately. Otherwise we
> > > can't reclaim them. We put the pages back to anonymous LRU list (by
> > setting SwapBacked flag) and the pages will be reclaimed in normal
> > swapout way.
> > 
> > We use normal page reclaim policy. Since MADV_FREE pages are put into
> > inactive file list, such pages and inactive file pages are reclaimed
> > according to their age. This is expected, because we don't want to
> > reclaim too many MADV_FREE pages before used once pages.
> > 
> > Based on Minchan's original patch
> > 
> > Cc: Michal Hocko 
> > Cc: Minchan Kim 
> > Cc: Hugh Dickins 
> > Cc: Johannes Weiner 
> > Cc: Rik van Riel 
> > Cc: Mel Gorman 
> > Cc: Andrew Morton 
> > Signed-off-by: Shaohua Li 
> > ---
> >  include/linux/rmap.h |  2 +-
> >  mm/huge_memory.c |  2 ++
> >  mm/madvise.c |  1 +
> >  mm/rmap.c| 40 +---
> >  mm/vmscan.c  | 34 ++
> >  5 files changed, 43 insertions(+), 36 deletions(-)
> > 
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index 7a39414..fee10d7 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -298,6 +298,6 @@ static inline int page_mkclean(struct page *page)
> >  #define SWAP_AGAIN 1
> >  #define SWAP_FAIL  2
> >  #define SWAP_MLOCK 3
> > -#define SWAP_LZFREE4
> > +#define SWAP_DIRTY 4
> 
> I'm still not convinced why we should introduce SWAP_DIRTY in try_to_unmap.
> https://marc.info/?l=linux-mm&m=148797879123238&w=2
> 
> We already call SetPageMlocked in there, so why can't we call
> SetPageSwapBacked there too? It doesn't change the LRU type; it's just an
> indication that we found the page's status changed late.

I don't have a strong preference on this one. Personally I agree with Johannes,
handling failure in vmscan sounds better. But since the failure handling is
just one statement, this probably doesn't make too much difference. If you and
Johannes reach an agreement, I'll follow it.

Thanks,
Shaohua

Re: [PATCH v5] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs

2017-02-27 Thread Pan Xinhui




On 2017/2/23 22:13, Waiman Long wrote:

All the locking related cmpxchg's in the following functions are
replaced with the _acquire variants:
 - pv_queued_spin_steal_lock()
 - trylock_clear_pending()

This change should help performance on architectures that use LL/SC.
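As a rough userspace analogue of what the _acquire variant asks for, the sketch below uses C11 atomics rather than the kernel's cmpxchg_acquire(). The names and the LOCKED_VAL constant are stand-ins chosen for illustration, and the mapping to the kernel primitives is an assumption, not a statement about how any architecture implements them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LOCKED_VAL 1u /* stand-in for _Q_LOCKED_VAL */

/* Try to take the lock: 0 -> LOCKED_VAL, acquire ordering only on success. */
static bool try_steal_lock(atomic_uint *locked)
{
	unsigned expected = 0;

	return atomic_compare_exchange_strong_explicit(
		locked, &expected, LOCKED_VAL,
		memory_order_acquire,  /* success: critical section stays after the lock */
		memory_order_relaxed); /* failure: no ordering is requested */
}

/* Release the lock; the caller must currently hold it. */
static void unlock(atomic_uint *locked)
{
	assert(atomic_load_explicit(locked, memory_order_relaxed) == LOCKED_VAL);
	atomic_store_explicit(locked, 0, memory_order_release);
}
```

Asking only for acquire ordering leaves an LL/SC architecture free to omit the barrier a fully ordered compare-and-exchange would need ahead of the operation, which is where the gain is expected to come from.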

On a 2-core 16-thread Power8 system with pvqspinlock explicitly
enabled, the performance of a locking microbenchmark with and without
this patch on a 4.10-rc8 kernel with Xinhui's PPC qspinlock patch
were as follows:

  # of thread   w/o patch       with patch      % Change
  -----------   ---------       ----------      --------
       4        4053.3 Mop/s    4223.7 Mop/s    +4.2%
       8        3310.4 Mop/s    3406.0 Mop/s    +2.9%
      12        2576.4 Mop/s    2674.6 Mop/s    +3.8%

Signed-off-by: Waiman Long 
---

Works on my side :)

Reviewed-by: Pan Xinhui 


 v4->v5:
  - Correct some grammatical issues in comment.

 v3->v4:
  - Update the comment in pv_kick_node() to mention that the code
may not work in some archs.

 v2->v3:
  - Reduce scope by relaxing cmpxchg's in fast path only.

 v1->v2:
  - Add comments in changelog and code for the rationale of the change.

 kernel/locking/qspinlock_paravirt.h | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h 
b/kernel/locking/qspinlock_paravirt.h
index e6b2f7a..4614e39 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -72,7 +72,7 @@ static inline bool pv_queued_spin_steal_lock(struct qspinlock 
*lock)
struct __qspinlock *l = (void *)lock;

if (!(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
-   (cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
+   (cmpxchg_acquire(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
qstat_inc(qstat_pv_lock_stealing, true);
return true;
}
@@ -101,16 +101,16 @@ static __always_inline void clear_pending(struct 
qspinlock *lock)

 /*
  * The pending bit check in pv_queued_spin_steal_lock() isn't a memory
- * barrier. Therefore, an atomic cmpxchg() is used to acquire the lock
- * just to be sure that it will get it.
+ * barrier. Therefore, an atomic cmpxchg_acquire() is used to acquire the
+ * lock just to be sure that it will get it.
  */
 static __always_inline int trylock_clear_pending(struct qspinlock *lock)
 {
struct __qspinlock *l = (void *)lock;

return !READ_ONCE(l->locked) &&
-  (cmpxchg(&l->locked_pending, _Q_PENDING_VAL, _Q_LOCKED_VAL)
-   == _Q_PENDING_VAL);
+  (cmpxchg_acquire(&l->locked_pending, _Q_PENDING_VAL,
+   _Q_LOCKED_VAL) == _Q_PENDING_VAL);
 }
 #else /* _Q_PENDING_BITS == 8 */
 static __always_inline void set_pending(struct qspinlock *lock)
@@ -138,7 +138,7 @@ static __always_inline int trylock_clear_pending(struct 
qspinlock *lock)
 */
old = val;
new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
-   val = atomic_cmpxchg(&lock->val, old, new);
+   val = atomic_cmpxchg_acquire(&lock->val, old, new);

if (val == old)
return 1;
@@ -361,6 +361,13 @@ static void pv_kick_node(struct qspinlock *lock, struct 
mcs_spinlock *node)
 * observe its next->locked value and advance itself.
 *
 * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
+*
+* The write to next->locked in arch_mcs_spin_unlock_contended()
+* must be ordered before the read of pn->state in the cmpxchg()
+* below for the code to work correctly. However, this is not
+* guaranteed on all architectures when the cmpxchg() call fails.
+* Both x86 and PPC can provide that guarantee, but other
+* architectures not necessarily.
 */
if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
return;

Re: [PATCH v2 3/3] percpu: improve allocation success rate for non-GFP_KERNEL callers

2017-02-27 Thread Michal Hocko

On Mon 27-02-17 09:01:09, Tahsin Erdogan wrote:
> On Mon, Feb 27, 2017 at 7:25 AM, Michal Hocko  wrote:
> > /*
> >  * No space left.  Create a new chunk.  We don't want multiple
> >  * tasks to create chunks simultaneously.  Serialize and create iff
> >  * there's still no empty chunk after grabbing the mutex.
> >  */
> > if (is_atomic)
> > goto fail;
> >
> > right before pcpu_populate_chunk so is this actually a problem?
> 
> Yes, this prevents adding more pcpu chunks and so causes "atomic" allocations
> to fail more easily.

Then I fail to see what problem you are trying to fix.

-- 
Michal Hocko
SUSE Labs

Re: [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c

2017-02-27 Thread Thomas Gleixner

On Mon, 27 Feb 2017, Tony Lindgren wrote:
> * Ingo Molnar  [170227 07:44]:
> > Because it's not the requirement that hurts primarily, but the resulting
> > non-determinism and the sporadic crashes. Which can be solved by making the
> > race deterministic via the debug facility.
> > 
> > If the IRQ handler crashed the moment it was first written by the driver
> > author we'd never see these problems.
> 
> Just in case this is PM related.. Maybe the spurious interrupt is pending
> from earlier? This could be caused by glitches on the lines with runtime PM,
> or a pending interrupt during suspend/resume. In that case IRQ_DISABLE_UNLAZY
> might provide more clues if the problem goes away.

It's not PM related.  That's just silly hardware. At the moment when you
enable some magic bit in the control register, which is required to probe
the version, the fricking thing spits out a spurious interrupt despite the
interrupt enable bit in the same control register being still disabled. Of
course we cannot install an interrupt handler before having probed the
version and setup other stuff, except we add magic 'if (!initialized)'
crappola into the handler and lose the ability to install version dependent
handlers afterwards.
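For readers who have not seen that workaround, a userspace sketch of it follows. Everything in it is hypothetical (struct, field and function names included): one generic handler is installed before probing, ignores anything that arrives too early, and has to dispatch internally because a version-specific handler can no longer be installed later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_dev {
	bool initialized;                          /* set once probing has finished */
	int (*version_handler)(struct fake_dev *); /* picked after the version probe */
	unsigned spurious;                         /* interrupts seen before init */
};

/* Example version-specific handler; returns 1 for "handled". */
static int handle_v2(struct fake_dev *dev)
{
	(void)dev;
	return 1;
}

/* Returns 1 if handled, 0 if ignored, loosely like IRQ_HANDLED / IRQ_NONE. */
static int fake_irq_handler(struct fake_dev *dev)
{
	assert(dev != NULL);

	if (!dev->initialized) {
		/* The probe-time spurious interrupt lands here and is dropped. */
		dev->spurious++;
		return 0;
	}
	return dev->version_handler(dev);
}
```

The extra indirection on every interrupt is the price of registering the handler before the version is known.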

Wonderful crap that, isn't it?

Thanks,

tglx

Re: [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c

2017-02-27 Thread Thomas Gleixner

On Mon, 27 Feb 2017, Ingo Molnar wrote:
> * Thomas Gleixner  wrote:
> 
> > The pending interrupt issue happens, at least on my test boxen, mostly on
> > the 'legacy' interrupts (0 - 15). But even the IOAPIC interrupts >=16
> > happen occasionally.
> >
> > 
> >  - Spurious interrupts on IRQ7, which are triggered by IRQ 0 (PIT/HPET). On
> >one of the affected machines this stops when the interrupt system is
> >switched to interrupt remapping !?!?!?
> > 
> >  - Spurious interrupts on various interrupt lines, which are triggered by
> >IOAPIC interrupts >= IRQ16. That's a known issue on quite some chipsets
> >that the legacy PCI interrupt (which is used when IOAPIC is disabled) is
> >triggered when the IOAPIC >=16 interrupt fires.
> > 
> >  - Spurious interrupt caused by driver probing itself. I.e. the driver
> >probing code causes an interrupt issued from the device
> >inadvertently. That happens even on IRQ >= 16.
> > 
> >This problem might be handled by the device driver code itself, but
> >that's going to be ugly. See below.
> 
> That's pretty colorful behavior...
> 
> > We can try to sample more data from the machines of affected users, but I
> > doubt that it will give us more information than confirming that we really
> > have to deal with all that hardware wreckage out there in some way or the
> > other.
> 
> BTW., instead of trying to avoid the scenario, how about moving in the other
> direction: making CONFIG_DEBUG_SHIRQ=y an unconditional property in the IRQ
> core code starting from v4.12 or so, i.e. requiring device driver IRQ handlers
> to handle the invocation of IRQ handlers at pretty much any moment. (We could
> also extend it a bit, such as invoking IRQ handlers early after suspend/resume
> wakeup.)
> 
> Because it's not the requirement that hurts primarily, but the resulting
> non-determinism and the sporadic crashes. Which can be solved by making the
> race deterministic via the debug facility.
> 
> If the IRQ handler crashed the moment it was first written by the driver
> author we'd never see these problems.

Yes, I'd love to do that. That's just a nightmare as well.

See commit 6d83f94db95cf, which added the _FIXME suffix to that code.

So recently I tried to invoke the primary handler, which causes another
issue:

  Some of the low level code (e.g. IOAPIC interrupt migration, but also
  some PPC irq chip machinery) depends on being called in hard interrupt
  context. They invoke get_irq_regs(), which obviously does not work from
  thread context.

So I removed that one from -next as well and postponed it another time. And
I should have known before I tried it that it does not work. Simply because
of that stuff x86 cannot use the software based resend mechanism.

Still trying to wrap my head around a proper solution for the problem. On
x86 we might just check whether we are really in hard irq context and
otherwise skip the part which depends on get_irq_regs(). That would be a
sane thing to do. Have not yet looked at the PPC side of affairs, whether
that's easy to solve as well. But even if it is, then there might be still
other magic code in some irq chip drivers which relies on things which are
only available/correct when actually invoked by a hardware interrupt.

Not only the hardware has colorful behaviour 

Thanks,

tglx

Re: [PATCH 1/2] dts: mfd: axp20x: Add xpowers,master-mode property for AXP806 PMICs

2017-02-27 Thread Rob Herring

On Sat, Feb 18, 2017 at 08:51:18PM +0100, Rask Ingemann Lambertsen wrote:
> commit b101829a029a ("mfd: axp20x: Fix AXP806 access errors on cold boot")
> was intended to fix the case where a board uses an AXP806 in slave mode,
> but the boot loader leaves it in master mode for lack of AXP806 support.
> But now the driver breaks on boards where the PMIC is operating in master
> mode. To let the device tree describe which mode of operation is needed,
> this patch introduces a new property "xpowers,master-mode".
> 
> Fixes: b101829a029a ("mfd: axp20x: Fix AXP806 access errors on cold boot")
> Signed-off-by: Rask Ingemann Lambertsen 
> ---
>  Documentation/devicetree/bindings/mfd/axp20x.txt | 3 +++
>  1 file changed, 3 insertions(+)

Acked-by: Rob Herring

[PATCH v2 3.5/5] trace/kprobes: Add back warning about offset in return probes

2017-02-27 Thread Steven Rostedt (VMware)

Let's not remove the warning about offsets and return probes when the
offset is invalid.

Signed-off-by: Steven Rostedt (VMware) 
---
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 3f4f788..f626235 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -695,6 +695,11 @@ static int create_trace_kprobe(int argc, char **argv)
pr_info("Failed to parse symbol.\n");
return ret;
}
+   if (offset && is_return &&
+   !arch_function_offset_within_entry(offset)) {
+   pr_info("Given offset is not valid for return probe.\n");
+   return -EINVAL;
+   }
}
argc -= 2; argv += 2;

Re: [PATCH v2 3/3] percpu: improve allocation success rate for non-GFP_KERNEL callers

2017-02-27 Thread Tahsin Erdogan

On Mon, Feb 27, 2017 at 7:25 AM, Michal Hocko  wrote:
> /*
>  * No space left.  Create a new chunk.  We don't want multiple
>  * tasks to create chunks simultaneously.  Serialize and create iff
>  * there's still no empty chunk after grabbing the mutex.
>  */
> if (is_atomic)
> goto fail;
>
> right before pcpu_populate_chunk so is this actually a problem?

Yes, this prevents adding more pcpu chunks and so causes "atomic" allocations
to fail more easily.

>> By the way, I now noticed the might_sleep() in alloc_vmap_area() which makes
>> it unsafe to call vmalloc* in GFP_ATOMIC contexts. It was added recently:
>
> Do we call alloc_vmap_area from true atomic contexts (aka from under
> spinlocks etc)? I thought this was a nogo and GFP_NOWAIT resp.
> GFP_ATOMIC was more about optimistic request resp. access to memory
> reserves rather than true atomicity requirements.

In the call path that I am trying to fix, the caller uses GFP_NOWAIT mask.
The caller is holding a spinlock (request_queue->queue_lock) so we can't afford
to sleep.

Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}

2017-02-27 Thread Michal Hocko

[CC Rafael]

I've got lost in the acpi indirection (again). I can see
acpi_device_hotplug calling lock_device_hotplug() but I cannot find a
path down to add_memory() which might call add_memory_resource. But the
patch below sounds suspicious to me. Is it possible that this could lead
to a deadlock? I would suspect that it is the s390 code which needs to
do the locking. But I would have to double check - it is really easy to
get lost there.

On Sun 26-02-17 12:42:44, Sebastian Ott wrote:
> With 4.10.0-10265-gc4f3f22 the following warning is triggered on s390:
> 
> WARNING: CPU: 6 PID: 1 at drivers/base/core.c:643 
> assert_held_device_hotplug+0x4a/0x58
> [5.731214] Call Trace:
> [5.731219] ([<0067b8b0>] assert_held_device_hotplug+0x40/0x58)
> [5.731225]  [<00337914>] mem_hotplug_begin+0x34/0xc8
> [5.731231]  [<008b897e>] add_memory_resource+0x7e/0x1f8
> [5.731236]  [<008b8bd2>] add_memory+0xda/0x130
> [5.731243]  [<00d7f0dc>] add_memory_merged+0x15c/0x178
> [5.731247]  [<00d7f3a6>] sclp_detect_standby_memory+0x2ae/0x2f8
> [5.731252]  [<001002ba>] do_one_initcall+0xa2/0x150
> [5.731258]  [<00d3adc0>] kernel_init_freeable+0x228/0x2d8
> [5.731263]  [<008b6572>] kernel_init+0x2a/0x140
> [5.731267]  [<008c3972>] kernel_thread_starter+0x6/0xc
> [5.731272]  [<008c396c>] kernel_thread_starter+0x0/0xc
> [5.731276] no locks held by swapper/0/1.
> [5.731280] Last Breaking-Event-Address:
> [5.731285]  [<0067b8b6>] assert_held_device_hotplug+0x46/0x58
> [5.731292] ---[ end trace 46480df21194c96a ]---

Such information belongs in the changelog.

> ->8
> mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, 
> done}
> 
> With commit 3fc219241 ("mm: validate device_hotplug is held for memory 
> hotplug")
> a lock assertion was added to mem_hotplug_begin() which led to a warning
> when add_memory() is called. Fix this by acquiring device_hotplug_lock in
> add_memory_resource().
> 
> Signed-off-by: Sebastian Ott 
> ---
>  mm/memory_hotplug.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1d3ed58..c633bbc 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1361,6 +1361,7 @@ int __ref add_memory_resource(int nid, struct resource 
> *res, bool online)
>   new_pgdat = !p;
>   }
>  
> + lock_device_hotplug();
>   mem_hotplug_begin();
>  
>   /*
> @@ -1416,6 +1417,7 @@ int __ref add_memory_resource(int nid, struct resource 
> *res, bool online)
>  
>  out:
>   mem_hotplug_done();
> + unlock_device_hotplug();
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(add_memory_resource);
> -- 
> 2.3.0
> 

-- 
Michal Hocko
SUSE Labs

Re: [PATCH v2 1/2] of: add devm_ functions for populate and depopulate

2017-02-27 Thread Daniel Vetter

On Mon, Feb 27, 2017 at 07:30:24AM -0600, Rob Herring wrote:
> On Sun, Feb 26, 2017 at 2:11 PM, Daniel Vetter  wrote:
> > On Fri, Feb 24, 2017 at 10:31:25AM -0600, Rob Herring wrote:
> >> On Fri, Feb 24, 2017 at 10:14 AM, Benjamin Gaignard
> >>  wrote:
> >> > Lots of calls to of_platform_populate() are not balanced by a call
> >> > to of_platform_depopulate(). This creates issues when drivers are
> >> > bound/unbound.
> >> >
> >> > One way to solve those issues is to add devm_of_platform_populate()
> >> > which will call of_platform_depopulate() when the device is unbound
> >> > from the bus.
> >> >
> >> > Signed-off-by: Benjamin Gaignard 
> >> > ---
> >> > version 2:
> >> > - simplify function prototype to only keep device as parameter
> >> >
> >> >  drivers/of/platform.c   | 71 
> >> > +
> >> >  include/linux/of_platform.h | 11 +++
> >> >  2 files changed, 82 insertions(+)
> >>
> >> For this and patch 2:
> >>
> >> Acked-by: Rob Herring 
> >
> > Is this an ack for merging both through drm-misc, or should we do a
> > topic-branch dance here?
> 
> You can apply it directly.

Ok, this seems like a pretty cool idea, so applied both to drm-misc for
4.12.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [REGRESSION] pinctrl, of, unable to find hogs

2017-02-27 Thread Tony Lindgren

* Mika Penttilä  [170226 21:46]:
> 
> With current linus git (pre 4.11), unable to find the pinctrl hogs :
> 
> 
>  imx6q-pinctrl 20e.iomuxc: unable to find group for node hoggrp
> 
> 
> Device is i.MX6 based.

Sorry to hear about that, maybe imx_pinctrl_probe_dt() should be
called before devm_pinctrl_register_and_init()?

Things got moved around a bit with e566fc11ea76 ("pinctrl: imx: use
generic pinctrl helpers for managing groups") it seems. But maybe that
was done because we did not have commit 950b0d91dc10 ("pinctrl: core:
Fix regression caused by delayed work for hogs") when the imx_pinctrl
changes got merged.

Gary, are you able to reproduce this? Seems it should happen with
any imx with hogs configured in the dts.

Regards,

Tony

Re: [PATCH 0/6] ARM: dts: socfpga: Fix dtc warnings

2017-02-27 Thread Dinh Nguyen



On 02/24/2017 11:54 PM, Florian Vaussard wrote:
> We get a bunch of warnings when compiling the SoCFPGA device trees with W=1.
> These warnings happen because some nodes have a unit name but no 'reg' property,
> or are missing a unit name while having a 'reg' property. This series makes it
> possible to compile all SoCFPGA dts without warnings.
> 
> Patches 1 to 5 fix these warnings by adding the unit name or removing the
> 'reg' property when appropriate.
> 
> Patch 6 removes the inclusion of the deprecated skeleton.dtsi file, as SoCFPGA
> device trees already define the mandatory properties and nodes. This is needed
> to remove the latest warning.
> 
> This series was boot tested on a Cyclone5 SoC DK, thus covering some of the
> changes done by patches 1 -> 3 + 6.
> 
> Regards,
> Florian
> 
> Florian Vaussard (6):
>   ARM: dts: socfpga: Add unit name to clock nodes
>   ARM: dts: socfpga: Add unit name to memory nodes
>   ARM: dts: socfpga: Remove unneeded unit names
>   ARM: dts: socfpga: Remove unneeded reg from stmpe_touchscreen
>   ARM: dts: socfpga: Remove unit name for LEDs in EBV SOCrates
>   ARM: dts: socfpga: Do not include skeleton.dtsi
> 
>  arch/arm/boot/dts/socfpga.dtsi | 43 +-
>  arch/arm/boot/dts/socfpga_arria10.dtsi | 51 
> +++---
>  arch/arm/boot/dts/socfpga_arria10_socdk.dtsi   |  2 +-
>  arch/arm/boot/dts/socfpga_arria5_socdk.dts |  2 +-
>  arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts  |  2 +-
>  arch/arm/boot/dts/socfpga_cyclone5_mcv.dtsi|  2 +-
>  arch/arm/boot/dts/socfpga_cyclone5_mcvevk.dts  |  1 -
>  arch/arm/boot/dts/socfpga_cyclone5_socdk.dts   |  2 +-
>  arch/arm/boot/dts/socfpga_cyclone5_sockit.dts  |  2 +-
>  arch/arm/boot/dts/socfpga_cyclone5_socrates.dts|  8 ++--
>  arch/arm/boot/dts/socfpga_cyclone5_sodia.dts   |  2 +-
>  arch/arm/boot/dts/socfpga_cyclone5_vining_fpga.dts |  2 +-
>  arch/arm/boot/dts/socfpga_vt.dts   |  2 +-
>  13 files changed, 59 insertions(+), 62 deletions(-)
> 

All applied!

Thanks,
Dinh

Re: [PATCH] input: arizona-haptics - Add device property for HAP_ACT

2017-02-27 Thread Rob Herring

On Thu, Feb 16, 2017 at 05:35:58PM +, Richard Fitzgerald wrote:
> This adds a device property for setting the configuration for the
> HAP_ACT register field so that the connected actuator type can be
> configured on systems that are not using pdata.
> 
> Signed-off-by: Richard Fitzgerald 
> ---
>  Documentation/devicetree/bindings/input/arizona-haptics.txt | 10 ++
>  MAINTAINERS |  1 +
>  drivers/input/misc/arizona-haptics.c|  7 +++
>  3 files changed, 18 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/input/arizona-haptics.txt
> 
> diff --git a/Documentation/devicetree/bindings/input/arizona-haptics.txt 
> b/Documentation/devicetree/bindings/input/arizona-haptics.txt
> new file mode 100644
> index 000..a3e767b
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/input/arizona-haptics.txt
> @@ -0,0 +1,10 @@
> +Cirrus Logic Arizona class audio SoCs
> +
> +This document lists haptics bindings for these codecs.
> +Also see the primary binding document:
> +  ../mfd/arizona.txt
> +
> +Optional properties:
> +  - wlf,hap-act : Single value defining the actuator type, as per the HAP_ACT
> +register field. See the codec datasheet for the available HAP_ACT values
> +and their meaning.

What node does this property apply to? Document this in mfd/arizona.txt.

I would like the possible values documented here or a better
explanation as to what this is because I have no idea if this property
makes sense and I can't read everybody's datasheet. Just because you
have it as platform data doesn't mean it belongs in DT.

Rob

Re: [PATCH V5 2/6] mm: don't assume anonymous pages have SwapBacked flag

2017-02-27 Thread Michal Hocko

On Mon 27-02-17 08:10:24, Shaohua Li wrote:
> On Mon, Feb 27, 2017 at 03:35:34PM +0100, Michal Hocko wrote:
> > On Fri 24-02-17 13:31:45, Shaohua Li wrote:
> > > There are a few places the code assumes anonymous pages should have
> > > SwapBacked flag set. MADV_FREE pages are anonymous pages but we are
> > > going to add them to LRU_INACTIVE_FILE list and clear SwapBacked flag
> > > for them. The assumption doesn't hold any more, so fix them.
> > > 
> > > Cc: Michal Hocko 
> > > Cc: Minchan Kim 
> > > Cc: Hugh Dickins 
> > > Cc: Rik van Riel 
> > > Cc: Mel Gorman 
> > > Cc: Andrew Morton 
> > > Acked-by: Johannes Weiner 
> > > Signed-off-by: Shaohua Li 

Anyway, feel free to add
Acked-by: Michal Hocko 

> > 
> > Looks good to me.
> > [...]
> > > index 96eb85c..c621088 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1416,7 +1416,8 @@ static int try_to_unmap_one(struct page *page, 
> > > struct vm_area_struct *vma,
> > >* Store the swap location in the pte.
> > >* See handle_pte_fault() ...
> > >*/
> > > - VM_BUG_ON_PAGE(!PageSwapCache(page), page);
> > > + VM_BUG_ON_PAGE(!PageSwapCache(page) && 
> > > PageSwapBacked(page),
> > > + page);
> > 
> > > just this part makes me scratch my head. I really do not understand what
> > > kind of problem it tries to prevent, maybe I am missing something
> > > obvious...
> 
> Just check a page which isn't lazyfree but wrongly enters here without swap
> entry. Or maybe you suggest we delete this statement?

Ohh, I figured out when seeing later patch in the series, I then wanted
to get back to this one but forgot... This on its own didn't really tell
me much. Maybe a comment would be helpful or even drop the VM_BUG_ON
altogether.
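
For readers who hit the same question, here is a minimal userspace sketch (not kernel code) of what the relaxed assertion admits. The struct and function names are hypothetical; the two predicates only mirror the old and new VM_BUG_ON_PAGE() conditions from the hunk quoted above, and a lazyfree page is assumed to be an anonymous page with neither PageSwapCache nor PageSwapBacked set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two page flags that matter here. */
struct page_model {
	bool swap_cache;   /* models PageSwapCache(page) */
	bool swap_backed;  /* models PageSwapBacked(page) */
};

/* Old condition: VM_BUG_ON_PAGE(!PageSwapCache(page), page) */
static bool old_check_fires(struct page_model p)
{
	return !p.swap_cache;
}

/* New condition: VM_BUG_ON_PAGE(!PageSwapCache(page) && PageSwapBacked(page), page) */
static bool new_check_fires(struct page_model p)
{
	return !p.swap_cache && p.swap_backed;
}

/* A lazyfree page trips the old check but passes the new one. */
static void check_lazyfree_case(void)
{
	struct page_model lazyfree = { false, false };

	assert(old_check_fires(lazyfree));
	assert(!new_check_fires(lazyfree));
}
```

Under these assumptions the new check still catches a swap-backed page that reaches this path without a swap entry, which is the case Shaohua describes, while letting lazyfree pages through.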
-- 
Michal Hocko
SUSE Labs

Re: [ANNOUNCE] Git v2.12.0

2017-02-27 Thread Ævar Arnfjörð Bjarmason

On Fri, Feb 24, 2017 at 8:28 PM, Junio C Hamano  wrote:
> The latest feature release Git v2.12.0 is now available at the
> usual places.  It is comprised of 517 non-merge commits since
> v2.11.0, contributed by 80 people, 24 of which are new faces.

Yay, some explanations / notes / elaborations:

>  * "git diff" learned diff.interHunkContext configuration variable
>that gives the default value for its --inter-hunk-context option.

This is really cool. Now if you have e.g. lots of changed lines each
10 lines apart --inter-hunk-context=10 will show those all as one big
hunk, instead of needing to specify -U10 as you had to before, which
would give all hunks a context of 10 lines.
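
A rough way to see the difference is the small C model below. It is only a sketch under a stated assumption, namely that two neighbouring changes end up in the same hunk when the number of unchanged lines between them is at most 2 * context + inter-hunk context; count_hunks() is a hypothetical helper, not git code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the hunks a unified diff would contain, given the sorted line
 * numbers of single-line changes. Simplified model: two neighbouring
 * changes share a hunk when the number of unchanged lines between them
 * is at most 2 * context + inter_hunk.
 */
static size_t count_hunks(const int *changed, size_t n, int context,
			  int inter_hunk)
{
	size_t hunks, i;

	assert(context >= 0 && inter_hunk >= 0);
	if (n == 0)
		return 0;

	hunks = 1;
	for (i = 1; i < n; i++) {
		/* unchanged lines between two neighbouring changes */
		int gap = changed[i] - changed[i - 1] - 1;

		if (gap > 2 * context + inter_hunk)
			hunks++;
	}
	return hunks;
}
```

With changes on lines 10, 20, 30 and 40, the model gives four hunks for the default context of 3, and a single hunk for either inter_hunk = 10 or context = 10. The difference is that only the second variant also widens the context printed around the merged hunk.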

>  * An ancient repository conversion tool left in contrib/ has been
>removed.

I thought "what tool?" so here's what this is. git.git was born on
April 7, 2005. For the first 13 days we'd hash the contents of
*compressed* blobs, not their uncompressed contents. Linus changed
this in: https://github.com/git/git/commit/d98b46f8d9

This tool was the ancient tool to convert these old incompatible
repositories from the old format. If someone hasn't gotten around to
this since 2005 they probably aren't ever going to do it :)

>  * Some people feel the default set of colors used by "git log --graph"
>rather limiting.  A mechanism to customize the set of colors has
>been introduced.

This is controlled via the log.graphColors variable. E.g.:

git -c log.graphColors="red, green, yellow" log --graph HEAD~100..

Does anyone have a prettier invocation?

>  * "git diff" and its family had two experimental heuristics to shift
>the contents of a hunk to make the patch easier to read.  One of
>them turns out to be better than the other, so leave only the
>"--indent-heuristic" option and remove the other one.

... the other one being --compaction-heuristic.

Re: tip.today - scheduler bam boom crash (cpu hotplug)

2017-02-27 Thread Peter Zijlstra

On Mon, Feb 27, 2017 at 04:27:32PM +0100, Paolo Bonzini wrote:
> 
> 
> On 27/02/2017 14:04, Peter Zijlstra wrote:
>  This results in sched clock always unstable for kvm guest since there
>  is no invariant tsc cpuid bit exposed for kvm guest currently. 
> >>> What the heck is KVM_FEATURE_CLOCKSOURCE_STABLE_BIT /
> >>> PVCLOCK_TSC_STABLE_BIT about then?
> >> It checks that all the bugs in the host have been ironed out, and that
> >> the host itself supports invtsc.
> > But what does it mean if that is not so? That is, will kvm_clock_read()
> > still be stable even if !stable?
> 
> If kvmclock is !stable, nobody should have set the sched clock to
> stable, to begin with.

OK, so if !KVM_FEATURE_CLOCKSOURCE_STABLE_BIT nothing is stable, but if
it is set, TSC might still not be stable, but kvm_clock_read() is.

> However, if kvmclock is stable, we know that the sched clock is stable.

Right, so the problem is that we only ever want to allow marking
unstable -- once it's found unstable, for whatever reason, we should
never allow going stable. The corollary of this proposition is that we
must start out assuming it will become stable. And to avoid actually
using unstable TSC we do a 3 state bringup:

 1) sched_clock_running = 0, __stable_early = 1, __stable = 0
 2) sched_clock_running = 1 (__stable is effective, iow, we run unstable)
 3) sched_clock_running = 2 (__stable <- __stable_early)

2) happens 'early' but is 'safe'.
3) happens 'late', after we've brought up SMP and probed TSC

Between there, we should have detected the most common TSC wreckage and
made sure to not then switch to 'stable' at 3.
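
As a sketch only, the three stages can be modelled in plain userspace C like this. All model_* names are made up for illustration, and the real code has barriers and deferred work that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

static int  model_running;        /* models sched_clock_running: 0, 1 or 2 */
static bool model_stable_early;   /* models __stable_early */
static bool model_stable;         /* models __stable */

/* Stage 1: nothing is running, optimistically assume we will be stable. */
static void model_reset(void)
{
	model_running = 0;
	model_stable_early = true;
	model_stable = false;
}

/* Stage 2: early init; __stable (still 0) is what readers see. */
static void model_init_early(void)
{
	assert(model_running == 0);
	model_running = 1;
}

/* Stage 3: late init, after SMP bringup and TSC probing. */
static void model_init_late(void)
{
	assert(model_running == 1);
	model_running = 2;
	model_stable = model_stable_early;
}

/* One-way transition: may be called at any stage, never undone. */
static void model_clear_stable(void)
{
	model_stable_early = false;
	model_stable = false;
}

static bool model_is_stable(void)
{
	return model_stable;
}
```

The point of the model: a model_clear_stable() call between stages 2 and 3 means stage 3 never reports stable, and nothing in it can flip an unstable clock back to stable.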

Now the problem appears to be that we assume sched_clock will use RDTSC
(native_sched_clock) while sched_clock is a paravirt op.

Now, I've not yet figured out the ordering between when we set
pv_time_ops.sched_clock and when we do the 'normal' TSC init stuff.

But it appears to me, we should not be calling
clear_sched_clock_stable() on TSC bits when we don't end up using
native_sched_clock().

Re: [PATCH] [linux-next] xenbus: Remove duplicate inclusion of linux/init.h

2017-02-27 Thread Boris Ostrovsky

On 02/27/2017 02:56 AM, Juergen Gross wrote:
> On 26/02/17 08:15, Masanari Iida wrote:
>> This patch removes the duplicate inclusion of linux/init.h in
>> xenbus_dev_frontend.c.
>> Confirmed that it compiles successfully after removing the line.
>>
>> Signed-off-by: Masanari Iida 
> Reviewed-by: Juergen Gross 


Applied to for-linus-4.11.

-boris

Re: stack frame unwindind KASAN errors

2017-02-27 Thread Josh Poimboeuf

On Mon, Feb 27, 2017 at 12:49:59PM +0800, Daniel J Blueman wrote:
> On 4.9.13 with KASAN enabled [1], we see a number of stack unwinding
> errors reported [2,3].
> 
> This seems to occur at half of boots.
> 
> Let me know for further debug info/patch testing and thanks,
>   Daniel
> 
> [1] https://quora.org/config
> [2] https://quora.org/dmesg.txt

Hi Daniel,

Can you try the following patch?  It's a backport of the following
upstream commit:

  09ae68dd0a8d ("x86/unwind: Disable KASAN checks for non-current tasks")

If it fixes it then I'll submit it for 4.9 stable.

---

From: Josh Poimboeuf 
Subject: [PATCH] x86/unwind: Disable KASAN checks for non-current tasks

There are a handful of callers to save_stack_trace_tsk() and
show_stack() which try to unwind the stack of a task other than current.
In such cases, it's remotely possible that the task is running on one
CPU while the unwinder is reading its stack from another CPU, causing
the unwinder to see stack corruption.

These cases seem to be mostly harmless.  The unwinder has checks which
prevent it from following bad pointers beyond the bounds of the stack.
So it's not really a bug as long as the caller understands that
unwinding another task will not always succeed.

In such cases, it's possible that the unwinder may read a KASAN-poisoned
region of the stack.  Account for that by using READ_ONCE_NOCHECK() when
reading the stack of another task.

Use READ_ONCE() when reading the stack of the current task, since KASAN
warnings can still be useful for finding bugs in that case.

Reported-by: Dmitry Vyukov 
Signed-off-by: Josh Poimboeuf 
Cc: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Jones 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Miroslav Benes 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/stacktrace.h |  5 -
 arch/x86/kernel/unwind_frame.c| 20 ++--
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h 
b/arch/x86/include/asm/stacktrace.h
index 37f2e0b..4141ead 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -55,13 +55,16 @@ extern int kstack_depth_to_print;
 static inline unsigned long *
 get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
+   struct inactive_task_frame *frame;
+
if (regs)
return (unsigned long *)regs->bp;
 
if (task == current)
return __builtin_frame_address(0);
 
-   return (unsigned long *)((struct inactive_task_frame 
*)task->thread.sp)->bp;
+   frame = (struct inactive_task_frame *)task->thread.sp;
+   return (unsigned long *)READ_ONCE_NOCHECK(frame->bp);
 }
 #else
 static inline unsigned long *
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index a2456d4..caff129 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -6,6 +6,21 @@
 
 #define FRAME_HEADER_SIZE (sizeof(long) * 2)
 
+/*
+ * This disables KASAN checking when reading a value from another task's stack,
+ * since the other task could be running on another CPU and could have poisoned
+ * the stack in the meantime.
+ */
+#define READ_ONCE_TASK_STACK(task, x)  \
+({ \
+   unsigned long val;  \
+   if (task == current)\
+   val = READ_ONCE(x); \
+   else\
+   val = READ_ONCE_NOCHECK(x); \
+   val;\
+})
+
 unsigned long unwind_get_return_address(struct unwind_state *state)
 {
unsigned long addr;
@@ -14,7 +29,8 @@ unsigned long unwind_get_return_address(struct unwind_state 
*state)
if (unwind_done(state))
return 0;
 
-   addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
+   addr = READ_ONCE_TASK_STACK(state->task, *addr_p);
+   addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, addr,
 addr_p);
 
return __kernel_text_address(addr) ? addr : 0;
@@ -48,7 +64,7 @@ bool unwind_next_frame(struct unwind_state *state)
if (unwind_done(state))
return false;
 
-   next_bp = (unsigned long *)*state->bp;
+   next_bp = (unsigned long *)READ_ONCE_TASK_STACK(state->task, 
*state->bp);
 
/* make sure the next frame's data is accessible */
if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
-- 
2.7.4

Re: [PATCH v2 3/5] trace/kprobes: allow return probes with offsets and absolute addresses

2017-02-27 Thread Steven Rostedt

On Wed, 22 Feb 2017 19:23:39 +0530
"Naveen N. Rao"  wrote:

> Since the kernel includes many non-global functions with same names, we
> will need to use offsets from other symbols (typically _text/_stext) or
> absolute addresses to place return probes on specific functions. Also,
> the core register_kretprobe() API never forbid use of offsets or
> absolute addresses with kretprobes.
> 
> Allow its use with the trace infrastructure. To distinguish kernels that
> support this, update ftrace README to explicitly call this out.
> 
> Signed-off-by: Naveen N. Rao 
> ---
>  kernel/trace/trace.c| 1 +
>  kernel/trace/trace_kprobe.c | 8 
>  2 files changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index d7449783987a..ababe56b3e8f 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -4362,6 +4362,7 @@ static const char readme_msg[] =
>   "\t   -:[/]\n"
>  #ifdef CONFIG_KPROBE_EVENT
>   "\tplace: [:][+]|\n"
> +  "place (kretprobe): [:][+]|\n"
>  #endif
>  #ifdef CONFIG_UPROBE_EVENT
>   "\tplace: :\n"
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 7ad9e53ad174..2768cb60ebd7 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -679,10 +679,6 @@ static int create_trace_kprobe(int argc, char **argv)
>   return -EINVAL;
>   }
>   if (isdigit(argv[1][0])) {
> - if (is_return) {
> - pr_info("Return probe point must be a symbol.\n");
> - return -EINVAL;
> - }
>   /* an address specified */
>   ret = kstrtoul(&argv[1][0], 0, (unsigned long *)&addr);
>   if (ret) {
> @@ -698,10 +694,6 @@ static int create_trace_kprobe(int argc, char **argv)
>   pr_info("Failed to parse symbol.\n");
>   return ret;
>   }
> - if (offset && is_return) {
> - pr_info("Return probe must be used without offset.\n");
> - return -EINVAL;
> - }

I understand that your retprobes will now have an offset, but I'm
worried we are removing informative errors. For those archs that don't
allow an offset, will we still get the error telling users that offsets
are not allowed?

I don't want to lose informative error handling.

-- Steve


>   }
>   argc -= 2; argv += 2;
>

Re: [REGRESSION] pinctrl, of, unable to find hogs

2017-02-27 Thread Gary Bisson

Mika, Tony, All,

On Mon, Feb 27, 2017 at 07:53:53AM -0800, Tony Lindgren wrote:
> * Mika Penttilä  [170226 21:46]:
> > 
> > With current linus git (pre 4.11), unable to find the pinctrl hogs :
> > 
> > 
> >  imx6q-pinctrl 20e.iomuxc: unable to find group for node hoggrp
> > 
> > 
> > Device is i.MX6 based.
> 
> Sorry to hear about that, maybe imx_pinctrl_probe_dt() should be
> called before devm_pinctrl_register_and_init()?
> 
> Things got moved around a bit with e566fc11ea76 ("pinctrl: imx: use
> generic pinctrl helpers for managing groups") it seems. But maybe that
> was done because we did not have commit 950b0d91dc10 ("pinctrl: core:
> Fix regression caused by delayed work for hogs") when the imx_pinctrl
> changes got merged.

Indeed the i.MX changes were made before your rework.

The reason imx_pinctrl_probe_dt got moved around is because
devm_pinctrl_register is the one that initializes the radix trees that
are needed when probing the dt.

> Gary, are you able to reproduce this? Seems it should happen with
> any imx with hogs configured in the dts.

Yes I can reproduce the issue.

Not sure how to fix it though since we can't move the dt probing before
radix tree init.

Regards,
Gary

Re: tip.today - scheduler bam boom crash (cpu hotplug)

2017-02-27 Thread Peter Zijlstra

On Mon, Feb 27, 2017 at 05:11:34PM +0100, Paolo Bonzini wrote:
> On 27/02/2017 16:59, Peter Zijlstra wrote:

> > Now, I've not yet figured out the ordering between when we set
> > pv_time_ops.sched_clock and when we do the 'normal' TSC init stuff.
> 
> I think the ordering is fine:
> 
> - pv_time_ops.sched_clock is set here:
> 
>   start_kernel (init/main.c line 509)
> setup_arch
>   kvmclock_init
> kvm_sched_clock_init
> 
> - TSC can be declared unstable only after this:
> 
>   start_kernel (init/main.c line 628)
> late_time_init
>   tsc_init
> 
> So by the time the tsc_cs_mark_unstable or mark_tsc_unstable can call
> clear_sched_clock_stable, pv_time_ops.sched_clock has been set.
> 
> > But it appears to me, we should not be calling
> > clear_sched_clock_stable() on TSC bits when we don't end up using
> > native_sched_clock().
> 
> Yes, this makes sense.

Something like the below then (completely untested for now) has a chance
of working. It does however change behaviour a little.

I'm trying to debug something else; after that I'll give this a little
more consideration.

---
 arch/x86/kernel/cpu/amd.c   |  4 +---
 arch/x86/kernel/cpu/centaur.c   |  2 +-
 arch/x86/kernel/cpu/common.c|  4 ++--
 arch/x86/kernel/cpu/cyrix.c |  2 +-
 arch/x86/kernel/cpu/intel.c |  4 +---
 arch/x86/kernel/cpu/transmeta.c |  2 +-
 arch/x86/kernel/tsc.c   | 33 +
 7 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 4e95b2e0d95f..bc3bbb6a8ab0 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -555,10 +555,8 @@ static void early_init_amd(struct cpuinfo_x86 *c)
if (c->x86_power & (1 << 8)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
-   if (check_tsc_unstable())
-   clear_sched_clock_stable();
} else {
-   clear_sched_clock_stable();
+   mark_tsc_unstable("not invariant");
}
 
/* Bit 12 of 8000_0007 edx is accumulated power mechanism. */
diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
index 2c234a6d94c4..0fdff183aa30 100644
--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -105,7 +105,7 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_SYSENTER32);
 #endif
 
-   clear_sched_clock_stable();
+   mark_tsc_unstable("not invariant");
 }
 
 static void init_centaur(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f07005e6f461..2b7ff648ea25 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -86,7 +86,7 @@ static void default_init(struct cpuinfo_x86 *c)
strcpy(c->x86_model_id, "386");
}
 #endif
-   clear_sched_clock_stable();
+   mark_tsc_unstable("not invariant");
 }
 
 static const struct cpu_dev default_cpu = {
@@ -1076,7 +1076,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
if (this_cpu->c_init)
this_cpu->c_init(c);
else
-   clear_sched_clock_stable();
+   mark_tsc_unstable("not invariant");
 
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
index 47416f959a48..35057d67e864 100644
--- a/arch/x86/kernel/cpu/cyrix.c
+++ b/arch/x86/kernel/cpu/cyrix.c
@@ -184,7 +184,7 @@ static void early_init_cyrix(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_CYRIX_ARR);
break;
}
-   clear_sched_clock_stable();
+   mark_tsc_unstable("not invariant");
 }
 
 static void init_cyrix(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 017ecd3bb553..e0e192e43a4c 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -161,10 +161,8 @@ static void early_init_intel(struct cpuinfo_x86 *c)
if (c->x86_power & (1 << 8)) {
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
-   if (check_tsc_unstable())
-   clear_sched_clock_stable();
} else {
-   clear_sched_clock_stable();
+   mark_tsc_unstable("not invariant");
}
 
/* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
diff --git a/arch/x86/kernel/cpu/transmeta.c b/arch/x86/kernel/cpu/transmeta.c
index c1ea5b999839..fa243ffd1c84 100644
--- a/arch/x86/kernel/cpu/transmeta.c
+++ b/arch/x86/kernel/cpu/transmeta.c
@@ -16,7 +16,7 @@ static void early_init_transmeta(struct cpuinfo_x86 *c)
c->x86_capability[CPUID_8086_0001_EDX] = 
cpuid_edx(0x80860001);
}
 
-

Re: [PATCH V5 4/6] mm: reclaim MADV_FREE pages

2017-02-27 Thread Michal Hocko

On Mon 27-02-17 08:19:08, Shaohua Li wrote:
> On Mon, Feb 27, 2017 at 03:33:15PM +0900, Minchan Kim wrote:
[...]
> > > --- a/include/linux/rmap.h
> > > +++ b/include/linux/rmap.h
> > > @@ -298,6 +298,6 @@ static inline int page_mkclean(struct page *page)
> > >  #define SWAP_AGAIN   1
> > >  #define SWAP_FAIL2
> > >  #define SWAP_MLOCK   3
> > > -#define SWAP_LZFREE  4
> > > +#define SWAP_DIRTY   4
> > 
> > I'm still not convinced why we should introduce SWAP_DIRTY in try_to_unmap.
> > https://marc.info/?l=linux-mm&m=148797879123238&w=2
> > 
> > We have been doing SetPageMlocked in there, so why can't we SetPageSwapBacked
> > in there? It doesn't change the LRU type; it's just an indication that
> > we found the page's status changed late.
> 
> I don't have a strong preference on this one. Personally I agree with Johannes,
> handling failure in vmscan sounds better. But since the failure handling is
> just one statement, this probably doesn't make too much difference. If
> Johannes and you reach an agreement, I'll follow.

FWIW I like your current SWAP_DIRTY and the later handling at the vmscan
level more.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH V5 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list

2017-02-27 Thread Michal Hocko

On Mon 27-02-17 08:13:10, Shaohua Li wrote:
> On Mon, Feb 27, 2017 at 03:28:01PM +0900, Minchan Kim wrote:
[...]
> > This patch doesn't address what I pointed out in v4.
> > 
> > https://marc.info/?i=20170224233752.GB4635%40bbox
> > 
> > Let's discuss it if you are still against it.
> 
> I really think a separate patch makes the code clearer. There are a lot of
> places we introduce a function but don't use it immediately, if the way makes
> the code clearer. But anyway, I'll let Andrew decide if the two patches should
> be merged.

I agree that it is almost always _preferable_ to add new functions along
with their callers. In this particular case I would lean towards keeping
the separation the way Shaohua did it because it makes the code really
cleaner IMHO.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH 3/4] fpga dt: bindings for Altera Partial Reconfiguraion IP.

2017-02-27 Thread matthew . gerlach




On Mon, 27 Feb 2017, Rob Herring wrote:

Hi Rob,



On Wed, Feb 15, 2017 at 01:10:37PM -0800, matthew.gerl...@linux.intel.com wrote:

From: Matthew Gerlach 

Device Tree bindings for Altera Partial Reconfiguration IP.

Signed-off-by: Matthew Gerlach 
---
 Documentation/devicetree/bindings/fpga/altera-pr-ip.txt | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/fpga/altera-pr-ip.txt

diff --git a/Documentation/devicetree/bindings/fpga/altera-pr-ip.txt 
b/Documentation/devicetree/bindings/fpga/altera-pr-ip.txt
new file mode 100644
index 000..ada821f
--- /dev/null
+++ b/Documentation/devicetree/bindings/fpga/altera-pr-ip.txt
@@ -0,0 +1,12 @@
+Altera Partial Reconfiguration IP
+
+Required properties:
+- compatible : should contain "altr,pr-ip"


Kind of generic. There's only one version of h/w?



Fair point on being generic.  It does match the published documentation, 
but we could be more specific with "altr,a10-pr-ip" because it 
really is only for an Arria10.


Matthew Gerlach




+- reg: base address and size for memory mapped io.
+
+Example:
+
+   fpga_mgr: fpga-mgr@ff20c000 {
+   compatible = "altr,pr-ip";
+   reg = <0xff20c000 0x10>;
+   };
--
2.7.4

Re: [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c

2017-02-27 Thread Tony Lindgren

* Thomas Gleixner  [170227 08:20]:
> On Mon, 27 Feb 2017, Tony Lindgren wrote:
> > * Ingo Molnar  [170227 07:44]:
> > > Because it's not the requirement that hurts primarily, but the resulting
> > > non-determinism and the sporadic crashes. Which can be solved by making
> > > the race deterministic via the debug facility.
> > > 
> > > If the IRQ handler crashed the moment it was first written by the driver
> > > author we'd never see these problems.
> > 
> > Just in case this is PM related.. Maybe the spurious interrupt is pending
> > from earlier? This could be caused by glitches on the lines with runtime PM,
> > or a pending interrupt during suspend/resume. In that case
> > IRQ_DISABLE_UNLAZY might provide more clues if the problem goes away.
> 
> It's not PM related.  That's just silly hardware. At the moment when you
> enable some magic bit in the control register, which is required to probe
> the version, the fricking thing spits out a spurious interrupt despite the
> interrupt enable bit in the same control register being still disabled. Of
> course we cannot install an interrupt handler before having probed the
> version and setup other stuff, except we add magic 'if (!initialized)'
> crappola into the handler and lose the ability to install version dependent
> handlers afterwards.

OK and presumably no -EPROBE_DEFER happening either.

> Wonderful crap that, isn't it?

Sounds broken..

Regards,

Tony

Re: net/sctp: use-after-free in sctp_hash_transport

2017-02-27 Thread Xin Long

On Mon, Feb 27, 2017 at 11:45 PM, Andrey Konovalov
 wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570 (Feb 26).
>
> A reproducer and .config are attached.
>
> ===
> [ ERR: suspicious RCU usage.  ]
> 4.10.0+ #54 Not tainted
> ---
> ./include/linux/rhashtable.h:602 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 2, debug_locks = 0
> 1 lock held by a.out/4189:
>  #0:  (sk_lock-AF_INET6){+.+.+.}, at: []
> sctp_setsockopt+0x318/0x5f10
>
> stack backtrace:
> CPU: 1 PID: 4189 Comm: a.out Not tainted 4.10.0+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15
>  dump_stack+0x292/0x398 lib/dump_stack.c:51
>  lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
>  __rhashtable_lookup ./include/linux/rhashtable.h:602
>  rhltable_lookup ./include/linux/rhashtable.h:690
>  sctp_hash_transport+0x826/0xcc0 net/sctp/input.c:887
>  sctp_assoc_add_peer+0xd0b/0x1470 net/sctp/associola.c:716
>  __sctp_connect+0x26d/0xdb0 net/sctp/socket.c:1184
>  __sctp_setsockopt_connectx+0x197/0x200 net/sctp/socket.c:1338
>  sctp_setsockopt_connectx net/sctp/socket.c:1370
>  sctp_setsockopt+0x15fa/0x5f10 net/sctp/socket.c:3936
>  sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2725
>  SYSC_setsockopt net/socket.c:1786
>  SyS_setsockopt+0x270/0x3a0 net/socket.c:1765
>  entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:204
> RIP: 0033:0x7f3e27a55b79
> RSP: 002b:7f3e2296fd98 EFLAGS: 0206 ORIG_RAX: 0036
> RAX: ffda RBX: 7f3e229709c0 RCX: 7f3e27a55b79
> RDX: 006e RSI: 0084 RDI: 0003
> RBP: 7f3e27f21220 R08: 0010 R09: 
> R10: 20004000 R11: 0206 R12: 
> R13: 7f3e229709c0 R14: 7f3e2834c040 R15: 0003
> ==
> BUG: KASAN: use-after-free in sctp_hash_transport+0x855/0xcc0 at addr
> 8800671e1f8c
> Read of size 4 by task a.out/4189
> CPU: 1 PID: 4189 Comm: a.out Not tainted 4.10.0+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15
>  dump_stack+0x292/0x398 lib/dump_stack.c:51
>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
>  print_address_description mm/kasan/report.c:200
>  kasan_report_error mm/kasan/report.c:289
>  kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
>  kasan_report mm/kasan/report.c:331
>  __asan_report_load4_noabort+0x29/0x30 mm/kasan/report.c:331
>  rht_key_hashfn ./include/linux/rhashtable.h:254
>  __rhashtable_lookup ./include/linux/rhashtable.h:604
>  rhltable_lookup ./include/linux/rhashtable.h:690
>  sctp_hash_transport+0x855/0xcc0 net/sctp/input.c:887
>  sctp_assoc_add_peer+0xd0b/0x1470 net/sctp/associola.c:716
>  __sctp_connect+0x26d/0xdb0 net/sctp/socket.c:1184
>  __sctp_setsockopt_connectx+0x197/0x200 net/sctp/socket.c:1338
>  sctp_setsockopt_connectx net/sctp/socket.c:1370
>  sctp_setsockopt+0x15fa/0x5f10 net/sctp/socket.c:3936
>  sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2725
>  SYSC_setsockopt net/socket.c:1786
>  SyS_setsockopt+0x270/0x3a0 net/socket.c:1765
>  entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:204
> RIP: 0033:0x7f3e27a55b79
> RSP: 002b:7f3e2296fd98 EFLAGS: 0206 ORIG_RAX: 0036
> RAX: ffda RBX: 7f3e229709c0 RCX: 7f3e27a55b79
> RDX: 006e RSI: 0084 RDI: 0003
> RBP: 7f3e27f21220 R08: 0010 R09: 
> R10: 20004000 R11: 0206 R12: 
> R13: 7f3e229709c0 R14: 7f3e2834c040 R15: 0003
> Object at 8800671e1f80, in cache kmalloc-1024 size: 1024
> Allocated:
> PID = 1
> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:502
>  set_track mm/kasan/kasan.c:514
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
>  __kmalloc+0xa0/0x2d0 mm/slub.c:3745
>  kmalloc ./include/linux/slab.h:495
>  kzalloc ./include/linux/slab.h:663
>  bucket_table_alloc+0x618/0x930 lib/rhashtable.c:224
>  rhashtable_init+0x5f8/0xc60 lib/rhashtable.c:1006
>  rhltable_init+0x53/0xa0 lib/rhashtable.c:1037
>  sctp_transport_hashtable_init+0x1c/0x20 net/sctp/input.c:865
>  sctp_init+0x62c/0x88f net/sctp/protocol.c:1486
>  do_one_initcall+0xf3/0x390 init/main.c:788
>  do_initcall_level init/main.c:854
>  do_initcalls init/main.c:862
>  do_basic_setup init/main.c:880
>  kernel_init_freeable+0x5cc/0x6a6 init/main.c:1031
>  kernel_init+0x13/0x180 init/main.c:955
>  ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
> Freed:
> PID = 0
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c

RE: [PATCH] checkpatch: add warning on %pk instead of %pK usage

2017-02-27 Thread Roberts, William C



> -Original Message-
> From: Roberts, William C [mailto:william.c.robe...@intel.com]
> Sent: Wednesday, February 15, 2017 3:49 PM
> To: Joe Perches 
> Cc: linux-kernel@vger.kernel.org; a...@canonical.com; kernel-
> harden...@lists.openwall.com
> Subject: [kernel-hardening] RE: [PATCH] checkpatch: add warning on %pk instead
> of %pK usage
> 
> 
> 
> > -Original Message-
> > From: Joe Perches [mailto:j...@perches.com]
> > Sent: Monday, February 13, 2017 2:21 PM
> > To: Roberts, William C 
> > Cc: linux-kernel@vger.kernel.org; a...@canonical.com; kernel-
> > harden...@lists.openwall.com
> > Subject: Re: [PATCH] checkpatch: add warning on %pk instead of %pK
> > usage
> >
> > (Adding back the cc's)
> >
> > On Mon, 2017-02-13 at 21:28 +, Roberts, William C wrote:
> > > 
> > > > No worries.
> > > > No idea why it doesn't work for you.
> > > > Maybe the hand applying was somehow faulty?
> > > >
> > > > The attached is on top of -next so it does have offsets on Linus'
> > > > tree, but it seems to work.
> > > >
> > > > (on -linux)
> > > >
> > > > $ patch -p1 < cp_vsp.diff
> > > > > patching file scripts/checkpatch.pl
> > > > > Hunk #1 succeeded at 5634 (offset -36 lines).
> > > >
> > > > $ cat t_block.c
> > > > {
> > > > MY_DEBUG(drv->foo,
> > > >  "%pk",
> > > >  foo->boo);
> > > > }
> > > > $ ./scripts/checkpatch.pl -f t_block.c
> > > > WARNING: Invalid vsprintf pointer extension '%pk'
> > > > #2: FILE: t_block.c:2:
> > > > +   MY_DEBUG(drv->foo,
> > > > +    "%pk",
> > > > +    foo->boo);
> > > >
> > > > total: 0 errors, 1 warnings, 5 lines checked
> > > >
> > > > > NOTE: For some of the reported defects, checkpatch may be able to
> > > > >   mechanically convert to the typical style using --fix or --fix-inplace.
> > > >
> > > > t_block.c has style problems, please review.
> > > >
> > > > NOTE: If any of the errors are false positives, please report
> > > >   them to the maintainer, see CHECKPATCH in MAINTAINERS.
> > >
> > >
> > > > Applied. It works fine with your example (see attached
> > > > 0001-tblock.patch) but it doesn't produce any output for me with
> > > > 0002-drv-hack.patch (attached as well)
> > >
> > > $ ./scripts/checkpatch.pl 0002-drv-hack.patch
> > > total: 0 errors, 0 warnings, 10 lines checked
> > >
> > > > 0002-drv-hack.patch has no obvious style problems and is ready for submission.
> > >
> > > ./scripts/checkpatch.pl 0001-tblock.patch
> > > > WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> > > #13:
> > > new file mode 100644
> > >
> > > WARNING: Invalid vsprintf pointer extension '%pk'
> > > #19: FILE: t_block.c:2:
> > > + MY_DEBUG(drv->foo,
> > > + "%pk",
> > > +  foo->boo);
> > >
> > > total: 0 errors, 2 warnings, 6 lines checked
> > >
> > > > NOTE: For some of the reported defects, checkpatch may be able to
> > > >   mechanically convert to the typical style using --fix or --fix-inplace.
> > >
> > > 0001-tblock.patch has style problems, please review.
> > >
> > > NOTE: If any of the errors are false positives, please report
> > >   them to the maintainer, see CHECKPATCH in MAINTAINERS.
> >
> > This means _all_ the $stat checks aren't being done on patches that
> > add just a single multi-line statement.
> >
> > Andrew?  Any thoughts on how to enable $stat appropriately for patch
> > contexts with a single multi-line statement?
> 
> I'm for merging your patch as is, and then take up the fact that $stat is not
> working correctly as a separate change, does that seem reasonable?

I haven't seen anything on the list about your patch. Are we kind of stuck, or do
you have some plan for adding your stat patch in the future?

Re: [PATCH V5 2/6] mm: don't assume anonymous pages have SwapBacked flag

2017-02-27 Thread Shaohua Li

On Mon, Feb 27, 2017 at 03:35:34PM +0100, Michal Hocko wrote:
> On Fri 24-02-17 13:31:45, Shaohua Li wrote:
> > There are a few places the code assumes anonymous pages should have
> > SwapBacked flag set. MADV_FREE pages are anonymous pages but we are
> > going to add them to LRU_INACTIVE_FILE list and clear SwapBacked flag
> > for them. The assumption doesn't hold any more, so fix them.
> > 
> > Cc: Michal Hocko 
> > Cc: Minchan Kim 
> > Cc: Hugh Dickins 
> > Cc: Rik van Riel 
> > Cc: Mel Gorman 
> > Cc: Andrew Morton 
> > Acked-by: Johannes Weiner 
> > Signed-off-by: Shaohua Li 
> 
> Looks good to me.
> [...]
> > index 96eb85c..c621088 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1416,7 +1416,8 @@ static int try_to_unmap_one(struct page *page, struct 
> > vm_area_struct *vma,
> >  * Store the swap location in the pte.
> >  * See handle_pte_fault() ...
> >  */
> > -   VM_BUG_ON_PAGE(!PageSwapCache(page), page);
> > +   VM_BUG_ON_PAGE(!PageSwapCache(page) && 
> > PageSwapBacked(page),
> > +   page);
> 
> just this part makes me scratch my head. I really do not understand what
> kind of problem it tries to prevent, maybe I am missing something
> obvious...

It just checks for a page which isn't lazyfree but wrongly gets here without a
swap entry. Or are you suggesting we delete this statement?
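
To make the difference between the old and new assertion concrete, here is a
toy userspace model. It is only a sketch: struct toy_page and both helper
functions are made up for illustration and are not kernel code. The two
booleans stand in for PageSwapCache() and PageSwapBacked().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two page flags discussed above. */
struct toy_page {
	bool swap_cache;   /* models PageSwapCache(page) */
	bool swap_backed;  /* models PageSwapBacked(page) */
};

/* Old condition: fires for every page that has no swap cache entry. */
static bool old_check_fires(const struct toy_page *p)
{
	return !p->swap_cache;
}

/*
 * New condition: fires only when a page that is still swap-backed
 * (i.e. not lazyfree) arrives without a swap cache entry.
 */
static bool new_check_fires(const struct toy_page *p)
{
	return !p->swap_cache && p->swap_backed;
}
```

With this model, a lazyfree page (neither flag set) trips the old condition
but not the new one, while a swap-backed page without a swap cache entry
trips both.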

Thanks,
Shaohua

Re: [PATCH 1/4] dt-bindings: dma: uart: add uart dma bindings

2017-02-27 Thread Rob Herring

On Thu, Feb 16, 2017 at 07:07:28PM +0800, Long Cheng wrote:
> add uart dma bindings

This and the subject need to be more specific in that this is for 
Mediatek.

> 
> Signed-off-by: Long Cheng 
> ---
>  .../devicetree/bindings/dma/mtk_uart_dma.txt   |   32 
> 

mediatek,uart-dma.txt for filename.

>  1 file changed, 32 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/dma/mtk_uart_dma.txt
> 
> diff --git a/Documentation/devicetree/bindings/dma/mtk_uart_dma.txt 
> b/Documentation/devicetree/bindings/dma/mtk_uart_dma.txt
> new file mode 100644
> index 000..b8aa7f4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/dma/mtk_uart_dma.txt
> @@ -0,0 +1,32 @@
> +* Mediatek UART APDMA Controller

Is the block really for UARTs only, or does it just happen to be connected
to UARTs? If the latter, you should drop UART here and in the compatible.

> +
> +Required properties:
> +- compatible should contain:
> +  * "mediatek,mt2701-uart-dma" for MT2701 compatible APDMA
> +  * "mediatek,mt6577-uart-dma" for MT6577 and all of the above
> +
> +- reg: The base address of the APDMA register bank.
> +
> +- interrupts: A single interrupt specifier.
> +
> +- clocks : Must contain an entry for each entry in clock-names.
> +  See ../clocks/clock-bindings.txt for details.
> +- clock-names: The APDMA clock for register accesses

Needs to list "apdma" here. Is apdma really the right name here? It 
should be the name of the input to this block.

> +
> +Examples:
> +
> + apdma: dma-controller@11000380 {
> + compatible = "mediatek,mt2701-uart-dma";
> + reg = <0 0x11000380 0 0x400>;
> + interrupts = ,
> +  ,
> +  ,
> +  ,
> +  ,
> +  ,
> +  ,
> +  ;
> + clocks = <&pericfg CLK_PERI_AP_DMA>;
> + clock-names = "apdma";
> + #dma-cells = <1>;
> + };
> -- 
> 1.7.9.5
>

Re: tip.today - scheduler bam boom crash (cpu hotplug)

2017-02-27 Thread Paolo Bonzini



On 27/02/2017 16:59, Peter Zijlstra wrote:
> OK, so if !KVM_FEATURE_CLOCKSOURCE_STABLE_BIT nothing is stable, but if
> it is set, TSC might still not be stable, but kvm_clock_read() is.
> 
>> However, if kvmclock is stable, we know that the sched clock is stable.
> Right, so the problem is that we only ever want to allow marking
> unstable -- once its found unstable, for whatever reason, we should
> never allow going stable. The corollary of this proposition is that we
> must start out assuming it will become stable. And to avoid actually
> using unstable TSC we do a 3 state bringup:
> 
>  1) sched_clock_running = 0, __stable_early = 1, __stable = 0
>  2) sched_clock_running = 1 (__stable is effective, iow, we run unstable)
>  3) sched_clock_running = 2 (__stable <- __stable_early)
> 
> 2) happens 'early' but is 'safe'.
> 3) happens 'late', after we've brought up SMP and probed TSC
> 
> Between there, we should have detected the most common TSC wreckage and
> made sure to not then switch to 'stable' at 3.
> 
> Now the problem appears to be that we assume sched_clock will use RDTSC
> (native_sched_clock) while sched_clock is a paravirt op.
> 
> Now, I've not yet figured out the ordering between when we set
> pv_time_ops.sched_clock and when we do the 'normal' TSC init stuff.

I think the ordering is fine:

- pv_time_ops.sched_clock is set here:

start_kernel (init/main.c line 509)
  setup_arch
kvmclock_init
  kvm_sched_clock_init

- TSC can be declared unstable only after this:

start_kernel (init/main.c line 628)
  late_time_init
tsc_init

So by the time the tsc_cs_mark_unstable or mark_tsc_unstable can call
clear_sched_clock_stable, pv_time_ops.sched_clock has been set.

> But it appears to me, we should not be calling
> clear_sched_clock_stable() on TSC bits when we don't end up using
> native_sched_clock().

Yes, this makes sense.
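
For reference, the three-state bringup quoted above plus the proposed rule
can be modelled roughly like this. It is a userspace sketch under my own
assumptions: all toy_* names are invented, and the uses_tsc field is a
hypothetical flag meaning "sched_clock is native_sched_clock", not something
that exists in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_sched_clock {
	int running;        /* 0, 1 or 2, as in the three steps quoted above */
	bool stable_early;  /* optimistic value collected during bringup */
	bool stable;        /* the value that is actually acted upon */
	bool uses_tsc;      /* hypothetical: false with a paravirt sched_clock */
};

/* Step 1: running = 0, stable_early = 1, stable = 0. */
static void toy_init(struct toy_sched_clock *c, bool uses_tsc)
{
	c->running = 0;
	c->stable_early = true;
	c->stable = false;
	c->uses_tsc = uses_tsc;
}

/* Step 2: early, but safe because stable is still 0. */
static void toy_early_init(struct toy_sched_clock *c)
{
	c->running = 1;
}

/* Step 3: late, after TSC probing: stable <- stable_early. */
static void toy_late_init(struct toy_sched_clock *c)
{
	c->running = 2;
	c->stable = c->stable_early;
}

/*
 * Marking unstable is one-way. Per the proposal, a TSC problem is
 * ignored when sched_clock does not read the TSC directly.
 */
static void toy_mark_tsc_unstable(struct toy_sched_clock *c)
{
	if (!c->uses_tsc)
		return;
	c->stable_early = false;
	c->stable = false;
}

static bool toy_is_stable(const struct toy_sched_clock *c)
{
	return c->stable;
}
```

In this model a TSC marked unstable between steps 2 and 3 keeps the native
case unstable after step 3, while the paravirt case still ends up stable.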

Paolo

[PATCH 1/2] ASoC: es7134: add es7134 DAC driver

2017-02-27 Thread Jerome Brunet

The es7134 is a 24-bit, 192 kHz I2S DA converter for PCM audio.
Datasheet is available here : http://www.everest-semi.com/pdf/ES7134LV%20DS.pdf

This driver is also compatible with the es7144, which is the same as the
es7134, with 2 additional pins for filtering capacitors.

Signed-off-by: Jerome Brunet 
---
 sound/soc/codecs/Kconfig  |   4 ++
 sound/soc/codecs/Makefile |   2 +
 sound/soc/codecs/es7134.c | 116 ++
 3 files changed, 122 insertions(+)
 create mode 100644 sound/soc/codecs/es7134.c

diff --git a/sound/soc/codecs/Kconfig b/sound/soc/codecs/Kconfig
index e49e9da7f1f6..7c7c2e96b836 100644
--- a/sound/soc/codecs/Kconfig
+++ b/sound/soc/codecs/Kconfig
@@ -72,6 +72,7 @@ config SND_SOC_ALL_CODECS
select SND_SOC_DMIC
select SND_SOC_ES8328_SPI if SPI_MASTER
select SND_SOC_ES8328_I2C if I2C
+   select SND_SOC_ES7134
select SND_SOC_GTM601
select SND_SOC_HDAC_HDMI
select SND_SOC_ICS43432
@@ -525,6 +526,9 @@ config SND_SOC_HDMI_CODEC
select SND_PCM_IEC958
select HDMI
 
+config SND_SOC_ES7134
+   tristate "Everest Semi ES7134 CODEC"
+
 config SND_SOC_ES8328
tristate
 
diff --git a/sound/soc/codecs/Makefile b/sound/soc/codecs/Makefile
index 1796cb987e71..b65868c963c9 100644
--- a/sound/soc/codecs/Makefile
+++ b/sound/soc/codecs/Makefile
@@ -63,6 +63,7 @@ snd-soc-da7219-objs := da7219.o da7219-aad.o
 snd-soc-da732x-objs := da732x.o
 snd-soc-da9055-objs := da9055.o
 snd-soc-dmic-objs := dmic.o
+snd-soc-es7134-objs := es7134.o
 snd-soc-es8328-objs := es8328.o
 snd-soc-es8328-i2c-objs := es8328-i2c.o
 snd-soc-es8328-spi-objs := es8328-spi.o
@@ -293,6 +294,7 @@ obj-$(CONFIG_SND_SOC_DA7219)+= snd-soc-da7219.o
 obj-$(CONFIG_SND_SOC_DA732X)   += snd-soc-da732x.o
 obj-$(CONFIG_SND_SOC_DA9055)   += snd-soc-da9055.o
 obj-$(CONFIG_SND_SOC_DMIC) += snd-soc-dmic.o
+obj-$(CONFIG_SND_SOC_ES7134)   += snd-soc-es7134.o
 obj-$(CONFIG_SND_SOC_ES8328)   += snd-soc-es8328.o
 obj-$(CONFIG_SND_SOC_ES8328_I2C)+= snd-soc-es8328-i2c.o
 obj-$(CONFIG_SND_SOC_ES8328_SPI)+= snd-soc-es8328-spi.o
diff --git a/sound/soc/codecs/es7134.c b/sound/soc/codecs/es7134.c
new file mode 100644
index ..25ede825d349
--- /dev/null
+++ b/sound/soc/codecs/es7134.c
@@ -0,0 +1,116 @@
+/*
+ * Copyright (c) 2017 BayLibre, SAS.
+ * Author: Jerome Brunet 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ * The full GNU General Public License is included in this distribution
+ * in the file called COPYING.
+ */
+
+#include 
+#include 
+
+/*
+ * The everest 7134 is a very simple DA converter with no register
+ */
+
+static int es7134_set_fmt(struct snd_soc_dai *codec_dai, unsigned int fmt)
+{
+   fmt &= (SND_SOC_DAIFMT_FORMAT_MASK | SND_SOC_DAIFMT_INV_MASK |
+   SND_SOC_DAIFMT_MASTER_MASK);
+
+   if (fmt != (SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF |
+   SND_SOC_DAIFMT_CBS_CFS)) {
+   dev_err(codec_dai->dev, "Invalid DAI format\n");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+static const struct snd_soc_dai_ops es7134_dai_ops = {
+   .set_fmt= es7134_set_fmt,
+};
+
+static struct snd_soc_dai_driver es7134_dai = {
+   .name = "es7134-hifi",
+   .playback = {
+   .stream_name = "Playback",
+   .channels_min = 2,
+   .channels_max = 2,
+   .rates = SNDRV_PCM_RATE_8000_192000,
+   .formats = (SNDRV_PCM_FMTBIT_S16_LE  |
+   SNDRV_PCM_FMTBIT_S18_3LE |
+   SNDRV_PCM_FMTBIT_S20_3LE |
+   SNDRV_PCM_FMTBIT_S24_3LE |
+   SNDRV_PCM_FMTBIT_S24_LE),
+   },
+   .ops = &es7134_dai_ops,
+};
+
+static const struct snd_soc_dapm_widget es7134_dapm_widgets[] = {
+   SND_SOC_DAPM_OUTPUT("AOUTL"),
+   SND_SOC_DAPM_OUTPUT("AOUTR"),
+   SND_SOC_DAPM_DAC("DAC", "Playback", SND_SOC_NOPM, 0, 0),
+};
+
+static const struct snd_soc_dapm_route es7134_dapm_routes[] = {
+   { "AOUTL", NULL, "DAC" },
+   { "AOUTR", NULL, "DAC" },
+};
+
+static struct snd_soc_codec_driver es7134_codec_driver = {
+   .component_driver = {
+   .dapm_widgets   = es7134_dapm_widgets,
+   .num_dapm_widgets   = ARRAY_SIZE(es7134_dapm_widgets),
+   .dapm_routes= es7134_dapm_routes,
+   .num_dapm_

[RFC PATCH 2/2] kprobes/x86: Exit single-stepping before trying fixup_exception

2017-02-27 Thread Masami Hiramatsu

Exit out-of-line single-stepping and restore regs->ip to the original
(probed) address before trying fixup_exception() if the exception
happened in the single-step buffer, since fixup_exception() relies
on regs->ip to look up an entry in __ex_table.
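
As a rough illustration of why the ip has to be restored first, here is a
toy userspace model of an address-keyed exception table. Everything in it
(the toy_* names, the addresses, the linear search) is made up for the
example and does not reflect the real __ex_table layout.

```c
#include <assert.h>
#include <stddef.h>

/* Toy exception table entry: faulting instruction -> fixup address. */
struct toy_ex_entry {
	unsigned long insn;
	unsigned long fixup;
};

static const struct toy_ex_entry toy_ex_table[] = {
	{ 0x1000UL, 0x9000UL },
	{ 0x2000UL, 0x9100UL },
};

/* Look up a fixup by instruction pointer; NULL when there is none. */
static const struct toy_ex_entry *toy_search_ex_table(unsigned long ip)
{
	size_t i;

	for (i = 0; i < sizeof(toy_ex_table) / sizeof(toy_ex_table[0]); i++)
		if (toy_ex_table[i].insn == ip)
			return &toy_ex_table[i];
	return NULL;
}

/* Toy probe: original address and its out-of-line single-step buffer. */
struct toy_probe {
	unsigned long addr;
	unsigned long ss_buf;
};

/* If ip is in the single-step buffer, go back to the probed address. */
static unsigned long toy_exit_singlestep(const struct toy_probe *p,
					 unsigned long ip)
{
	return ip == p->ss_buf ? p->addr : ip;
}
```

A lookup with the buffer address finds nothing, because the table only
knows the original instruction address; after toy_exit_singlestep() the
same lookup finds the fixup.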

Signed-off-by: Masami Hiramatsu 
---
 arch/x86/include/asm/kprobes.h |1 
 arch/x86/kernel/kprobes/core.c |   83 +---
 arch/x86/kernel/traps.c|   19 +
 3 files changed, 71 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index d1d1e50..79e121a 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -111,6 +111,7 @@ struct kprobe_ctlblk {
struct prev_kprobe prev_kprobe;
 };
 
+extern int kprobe_exit_singlestep(struct pt_regs *regs);
 extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
 extern int kprobe_exceptions_notify(struct notifier_block *self,
unsigned long val, void *data);
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 34d3a52..f2a3f3b 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -949,43 +949,62 @@ int kprobe_debug_handler(struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(kprobe_debug_handler);
 
-int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
+/* Fixup current ip register and reset current kprobe, if needed. */
+int kprobe_exit_singlestep(struct pt_regs *regs)
 {
-   struct kprobe *cur = kprobe_running();
struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+   struct kprobe *cur = kprobe_running();
 
-   if (unlikely(regs->ip == (unsigned long)cur->ainsn.insn)) {
-   /* This must happen on single-stepping */
-   WARN_ON(kcb->kprobe_status != KPROBE_HIT_SS &&
-   kcb->kprobe_status != KPROBE_REENTER);
-   /*
-* We are here because the instruction being single
-* stepped caused a page fault. We reset the current
-* kprobe and the ip points back to the probe address
-* and allow the page fault handler to continue as a
-* normal page fault.
-*/
-   regs->ip = (unsigned long)cur->addr;
-   /*
-* Trap flag (TF) has been set here because this fault
-* happened where the single stepping will be done.
-* So clear it by resetting the current kprobe:
-*/
-   regs->flags &= ~X86_EFLAGS_TF;
+   if (unlikely(regs->ip != (unsigned long)cur->ainsn.insn))
+   return 0;
 
-   /*
-* If the TF flag was set before the kprobe hit,
-* don't touch it:
-*/
-   regs->flags |= kcb->kprobe_old_flags;
+   /* This must happen on single-stepping */
+   WARN_ON(kcb->kprobe_status != KPROBE_HIT_SS &&
+   kcb->kprobe_status != KPROBE_REENTER);
+   /*
+* We are here because the instruction being single
+* stepped caused a page fault. We reset the current
+* kprobe and the ip points back to the probe address
+* and allow the page fault handler to continue as a
+* normal page fault.
+*/
+   regs->ip = (unsigned long)cur->addr;
+   /*
+* Trap flag (TF) has been set here because this fault
+* happened where the single stepping will be done.
+* So clear it by resetting the current kprobe:
+*/
+   regs->flags &= ~X86_EFLAGS_TF;
 
-   if (kcb->kprobe_status == KPROBE_REENTER)
-   restore_previous_kprobe(kcb);
-   else
-   reset_current_kprobe();
-   preempt_enable_no_resched();
-   } else if (kcb->kprobe_status == KPROBE_HIT_ACTIVE ||
-  kcb->kprobe_status == KPROBE_HIT_SSDONE) {
+   /*
+* If the TF flag was set before the kprobe hit,
+* don't touch it:
+*/
+   regs->flags |= kcb->kprobe_old_flags;
+
+   if (kcb->kprobe_status == KPROBE_REENTER)
+   restore_previous_kprobe(kcb);
+   else
+   reset_current_kprobe();
+
+   /* Preempt has been disabled before single stepping */
+   preempt_enable_no_resched();
+
+   return 1;
+}
+NOKPROBE_SYMBOL(kprobe_exit_singlestep);
+
+int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
+{
+   struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+   struct kprobe *cur = kprobe_running();
+
+   /* If the fault happened on singlestep, finish it and retry */
+   if (kprobe_exit_singlestep(regs))
+   return 0;
+
+   if (kcb->kprobe_status == KPROBE_HIT_ACTIVE ||
+   kcb->kprobe_status == KPROBE_HIT_SSDONE) {
/*
 * We increment the nmissed count for accounting,
 * we

[PATCH 0/2] Add es7134 DAC driver support

2017-02-27 Thread Jerome Brunet

This patchset adds support for the es7134 from Everest Semiconductor.
The es7134 is a simple I2S DAC with no configuration interface.
It has been tested on Amlogic's meson-gxbb-p200 board.

Jerome Brunet (2):
  ASoC: es7134: add es7134 DAC driver
  ASoC: es7134: add dt-bindings for the es7134 dac

 .../devicetree/bindings/sound/everest,es7134.txt   |  10 ++
 sound/soc/codecs/Kconfig   |   4 +
 sound/soc/codecs/Makefile  |   2 +
 sound/soc/codecs/es7134.c  | 116 +
 4 files changed, 132 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/everest,es7134.txt
 create mode 100644 sound/soc/codecs/es7134.c

-- 
2.9.3

Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error

2017-02-27 Thread Michael S. Tsirkin

On Mon, Feb 27, 2017 at 03:28:43PM +0800, Cao jin wrote:
> Subject: Re: [PATCH] vfio pci: kernel support of error recovery only for non
> fatal error

Don't make the subject so long. This is why I had
[PATCH v3] vfio error recovery: kernel support
You also want to add versioning: as you inherited my v3,
you should make this v4 etc.

> 0. What happens now (PCIE AER only)
>Fatal errors cause a link reset.
>Non fatal errors don't.
>All errors stop the VM eventually, but not immediately
>because it's detected and reported asynchronously.
>Interrupts are forwarded as usual.
>Correctable errors are not reported to guest at all.
>Note: PPC EEH is different. This focuses on AER.
> 
> 1. Correctable errors
>There is no need to report these to guest. So let's not.
> 
> 2. Fatal errors
>It's not easy to handle them gracefully since link reset
>is needed. As a first step, let's use the existing mechanism
>in that case.
> 
> 3. Non-fatal errors
>Here we could make progress by reporting them to guest
>and have guest handle them.
> 
>Issues:
>a. this behaviour should only be enabled with new userspace,
>   old userspace should work without changes.
> 
>   Suggestion: One way to address this would be to add a new eventfd
>   non_fatal_err_trigger. If not set, invoke err_trigger.
> 
>b. drivers are supposed to stop MMIO when error is reported,
>   if vm keeps going, we will keep doing MMIO/config.
> 
>   Suggestion 1: ignore this. vm stop happens much later when
>   userspace runs anyway, so we are not making things much worse.
> 
>   Suggestion 2: try to stop MMIO/config, resume on resume call
> 
>   Patch below implements Suggestion 1.
> 
>   Note that although this is really against the documentation, which
>   states error_detected() is the point at which the driver should quiesce
>   the device and not touch it further (until diagnostic poking at
>   mmio_enabled or full access at resume callback).
> 
>   Fixing this won't be easy. However, this is not a regression.
> 
>   Also note this does nothing about interrupts, documentation
>   suggests returning IRQ_NONE until reset.
>   Again, not a regression.
> 
>c. PF driver might detect that function is completely broken,
>   if vm keeps going, we will keep doing MMIO/config.
> 
>   Suggestion 1: ignore this. vm stop happens much later when
>   userspace runs anyway, so we are not making things much worse.
> 
>   Suggestion 2: detect this and invoke err_trigger to stop VM.
> 
>   Patch below implements Suggestion 2.
> 
> Suggested-by: Michael S. Tsirkin 

It's more than this, you are really reusing parts of my patch,
so you should say so and include my signature.

If you only added a line or two you can keep
the original author. To do this you add
From: Michael S. Tsirkin 
before commit text.

> Signed-off-by: Cao jin 
> ---

Changelog from my v3?

>  drivers/vfio/pci/vfio_pci.c | 38 
> +++--
>  drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++
>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>  include/uapi/linux/vfio.h   |  1 +
>  4 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 324c52e..3551cc9 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device 
> *vdev, int irq_type)
>  
>   return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>   }
> - } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
> + } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
> +irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
>   if (pci_is_pcie(vdev->pdev))
>   return 1;
>   } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
> @@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
>   case VFIO_PCI_REQ_IRQ_INDEX:
>   break;
>   case VFIO_PCI_ERR_IRQ_INDEX:
> + case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
>   if (pci_is_pcie(vdev->pdev))
>   break;
>   /* pass thru to return error */
> @@ -1282,7 +1284,9 @@ static pci_ers_result_t 
> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  
>   mutex_lock(&vdev->igate);
>  
> - if (vdev->err_trigger)
> + if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
> + eventfd_signal(vdev->non_fatal_err_trigger, 1);
> + else if (vdev->err_trigger)
>   eventfd_signal(vdev->err_trigger, 1);
>  
>   mutex_unlock(&vdev->igate);
> @@ -1292,8 +1296,38 @@ static pci_ers_result_t 
> vfio_pci_aer_err_detected(struct pci_dev *pdev,
>   return PCI_ERS_RESULT_CAN_RECOVER;
>  }
>  
> +static pci_ers_result_t vfio_pci_a

Re: [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c

2017-02-27 Thread Tony Lindgren

* Ingo Molnar  [170227 07:44]:
> 
> * Thomas Gleixner  wrote:
> 
> > The pending interrupt issue happens, at least on my test boxen, mostly on
> > the 'legacy' interrupts (0 - 15). But even the IOAPIC interrupts >=16
> > happen occasionally.
> >
> > 
> >  - Spurious interrupts on IRQ7, which are triggered by IRQ 0 (PIT/HPET). On
> >one of the affected machines this stops when the interrupt system is
> >switched to interrupt remapping !?!?!?
> > 
> >  - Spurious interrupts on various interrupt lines, which are triggered by
> >IOAPIC interrupts >= IRQ16. That's a known issue on quite some chipsets
> >that the legacy PCI interrupt (which is used when IOAPIC is disabled) is
> >triggered when the IOAPIC >=16 interrupt fires.
> > 
> >  - Spurious interrupt caused by driver probing itself. I.e. the driver
> >probing code causes an interrupt issued from the device
> >inadvertently. That happens even on IRQ >= 16.

This sounds a lot like what we saw a few weeks ago with dwc3. See commit
12a7f17fac5b ("usb: dwc3: omap: fix race of pm runtime with irq handler in
probe"). It was caused by runtime PM and -EPROBE_DEFER, see the description
Grygorii wrote up in that commit.

> >This problem might be handled by the device driver code itself, but
> >that's going to be ugly. See below.
> 
> That's pretty colorful behavior...
> 
> > We can try to sample more data from the machines of affected users, but I
> > doubt that it will give us more information than confirming that we really
> > have to deal with all that hardware wreckage out there in some way or the
> > other.
> 
> BTW., instead of trying to avoid the scenario, how about moving in the other
> direction: making CONFIG_DEBUG_SHIRQ=y an unconditional property in the IRQ
> core code starting from v4.12 or so, i.e. requiring device driver IRQ
> handlers to handle the invocation of IRQ handlers at pretty much any moment.
> (We could also extend it a bit, such as invoking IRQ handlers early after
> suspend/resume wakeup.)
> 
> Because it's not the requirement that hurts primarily, but the resulting
> non-determinism and the sporadic crashes. Which can be solved by making the
> race deterministic via the debug facility.
>
> If the IRQ handler crashed the moment it was first written by the driver
> author we'd never see these problems.

Just in case this is PM related.. Maybe the spurious interrupt is pending
from earlier? This could be caused by glitches on the lines with runtime PM,
or a pending interrupt during suspend/resume. In that case IRQ_DISABLE_UNLAZY
might provide more clues if the problem goes away.
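
FWIW, the kind of handler that survives being invoked at pretty much any
moment looks roughly like the toy model below. This is a userspace sketch
only: struct toy_dev, the ready flag and the toy_* names are hypothetical
and not taken from any real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_irqreturn { TOY_IRQ_NONE, TOY_IRQ_HANDLED };

/* Hypothetical driver state; status models a device status register. */
struct toy_dev {
	int *status;   /* NULL until probe has mapped the device */
	bool ready;    /* set as the last step of probe */
	int handled;   /* number of interrupts actually serviced */
};

/*
 * Tolerates being called before probe has finished, or when the device
 * has nothing pending: in both cases it reports "not mine" instead of
 * dereferencing state that is not set up yet.
 */
static enum toy_irqreturn toy_irq_handler(struct toy_dev *dev)
{
	if (!dev || !dev->ready || !dev->status)
		return TOY_IRQ_NONE;

	if (*dev->status == 0)
		return TOY_IRQ_NONE;

	*dev->status = 0;  /* acknowledge the pending event */
	dev->handled++;
	return TOY_IRQ_HANDLED;
}
```

A spurious call during probe then simply returns TOY_IRQ_NONE instead of
crashing on a NULL pointer.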

Regards,

Tony

[PATCH net] rxrpc: Fix deadlock between call creation and sendmsg/recvmsg

2017-02-27 Thread David Howells

All the routines by which rxrpc is accessed from the outside are serialised
by means of the socket lock (sendmsg, recvmsg, bind,
rxrpc_kernel_begin_call(), ...) and this presents a problem:

 (1) If a number of calls on the same socket are in the process of
 connection to the same peer, a maximum of four concurrent live calls
 are permitted before further calls need to wait for a slot.

 (2) If a call is waiting for a slot, it is deep inside sendmsg() or
 rxrpc_kernel_begin_call() and the entry function is holding the socket
 lock.

 (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
 from servicing the other calls as they need to take the socket lock to
 do so.

 (4) The socket is stuck until a call is aborted and makes its slot
 available to the waiter.

Fix this by:

 (1) Provide each call with a mutex ('user_mutex') that arbitrates access
 by the users of rxrpc separately for each specific call.

 (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
 they've got a call and taken its mutex.

 Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
 set but someone else has the lock.  Should I instead only return
 EWOULDBLOCK if there's nothing currently to be done on a socket, and
 sleep in this particular instance because there is something to be
 done, but we appear to be blocked by the interrupt handler doing its
 ping?

 (3) Make rxrpc_new_client_call() unlock the socket after allocating a new
 call, locking its user mutex and adding it to the socket's call tree.
 The call is returned locked so that sendmsg() can add data to it
 immediately.

 From the moment the call is in the socket tree, it is subject to
 access by sendmsg() and recvmsg() - even if it isn't connected yet.

 (4) Lock new service calls in the UDP data_ready handler (in
 rxrpc_new_incoming_call()) because they may already be in the socket's
 tree and the data_ready handler makes them live immediately if a user
 ID has already been preassigned.

 Note that the new call is locked before any notifications are sent
 that it is live, so doing mutex_trylock() *ought* to always succeed.
 Userspace is prevented from doing sendmsg() on calls that are in a
 too-early state in rxrpc_do_sendmsg().

 (5) Make rxrpc_new_incoming_call() return the call with the user mutex
 held so that a ping can be scheduled immediately under it.

 Note that it might be worth moving the ping call into
 rxrpc_new_incoming_call() and then we can drop the mutex there.

 (6) Make rxrpc_accept_call() take the lock on the call it is accepting and
 release the socket after adding the call to the socket's tree.  This
 is slightly tricky as we've dequeued the call by that point and have
 to requeue it.

 Note that requeuing emits a trace event.

 (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
 new mutex immediately and don't bother with the socket mutex at all.

This patch has the nice bonus that calls on the same socket are now to some
extent parallelisable.


Note that we might want to move rxrpc_service_prealloc() calls out from the
socket lock and give it its own lock, so that we don't hang progress in
other calls because we're waiting for the allocator.

We probably also want to avoid calling rxrpc_notify_socket() from within
the socket lock (rxrpc_accept_call()).
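
The lock handoff in fix (2) can be sketched in userspace as below. This is a
toy model, not the rxrpc code: the toy_* names are invented, the locks are
try-locks built on C11 atomics rather than kernel mutexes, and a failed
try-lock stands in for the EWOULDBLOCK case.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NCALLS 2

/* Minimal try-lock built on a C11 atomic flag. */
struct toy_lock {
	atomic_bool held;
};

static void toy_lock_init(struct toy_lock *l)
{
	atomic_init(&l->held, false);
}

static bool toy_trylock(struct toy_lock *l)
{
	return !atomic_exchange(&l->held, true);
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_store(&l->held, false);
}

struct toy_call {
	struct toy_lock user_mutex;  /* per-call lock, as in fix (1) */
};

struct toy_sock {
	struct toy_lock lock;        /* models the socket lock */
	struct toy_call calls[TOY_NCALLS];
};

static void toy_sock_init(struct toy_sock *s)
{
	int i;

	toy_lock_init(&s->lock);
	for (i = 0; i < TOY_NCALLS; i++)
		toy_lock_init(&s->calls[i].user_mutex);
}

/*
 * Find the call under the socket lock, take the call's own lock, then
 * drop the socket lock straight away. Returns the locked call, or NULL
 * if the id is invalid or a lock is busy.
 */
static struct toy_call *toy_begin_op(struct toy_sock *s, int id)
{
	struct toy_call *call;

	if (id < 0 || id >= TOY_NCALLS)
		return NULL;
	if (!toy_trylock(&s->lock))
		return NULL;

	call = &s->calls[id];
	if (!toy_trylock(&call->user_mutex))
		call = NULL;

	toy_unlock(&s->lock);
	return call;
}

static void toy_end_op(struct toy_call *call)
{
	toy_unlock(&call->user_mutex);
}
```

While one call is held, an operation on a different call of the same socket
can still start, which is the parallelism mentioned above; only a second
operation on the same call is turned away.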

Signed-off-by: David Howells 
Tested-by: Marc Dionne 
---

 include/trace/events/rxrpc.h |2 +
 net/rxrpc/af_rxrpc.c |   12 +++--
 net/rxrpc/ar-internal.h  |1 +
 net/rxrpc/call_accept.c  |   48 +++
 net/rxrpc/call_object.c  |   18 -
 net/rxrpc/input.c|1 +
 net/rxrpc/recvmsg.c  |   39 -
 net/rxrpc/sendmsg.c  |   57 ++
 8 files changed, 156 insertions(+), 22 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 593f586545eb..39123c06a566 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -119,6 +119,7 @@ enum rxrpc_recvmsg_trace {
rxrpc_recvmsg_full,
rxrpc_recvmsg_hole,
rxrpc_recvmsg_next,
+   rxrpc_recvmsg_requeue,
rxrpc_recvmsg_return,
rxrpc_recvmsg_terminal,
rxrpc_recvmsg_to_be_accepted,
@@ -277,6 +278,7 @@ enum rxrpc_congest_change {
EM(rxrpc_recvmsg_full,  "FULL") \
EM(rxrpc_recvmsg_hole,  "HOLE") \
EM(rxrpc_recvmsg_next,  "NEXT") \
+   EM(rxrpc_recvmsg_requeue,   "REQU") \
EM(rxrpc_recvmsg_return,"RETN") \
EM(rxrpc_recvmsg_terminal,  "TERM") \
EM(rxrpc_recvmsg_to_be_accepted,"TBAC") \
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 199b4

Re: [PATCH] sd: close hole in > 2T device rejection when !CONFIG_LBDAF

2017-02-27 Thread Bart Van Assche

On Mon, 2017-02-27 at 09:22 -0600, Steven J. Magnani wrote:
> @@ -2122,7 +2122,10 @@ static int read_capacity_16(struct scsi_
>   return -ENODEV;
>   }
>  
> - if ((sizeof(sdkp->capacity) == 4) && (lba >= 0xffffffffULL)) {
> + /* Make sure logical_to_sectors() won't overflow */
> + lba_in_sectors = lba << (ilog2(sector_size) - 9);
> + if ((sizeof(sdkp->capacity) == 4) &&
> + ((lba >= 0xffffffffULL) || (lba_in_sectors >= 0xffffffffULL))) {
>   sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a "
>   "kernel compiled with support for large block "
>   "devices.\n");
> @@ -2162,6 +2165,7 @@ static int read_capacity_10(struct scsi_
>   int the_result;
>   int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
>   sector_t lba;
> + unsigned long long lba_in_sectors;
>   unsigned sector_size;
>  
>   do {
> @@ -2208,7 +2212,10 @@ static int read_capacity_10(struct scsi_
>   return sector_size;
>   }
>  
> - if ((sizeof(sdkp->capacity) == 4) && (lba == 0xffffffff)) {
> + /* Make sure logical_to_sectors() won't overflow */
> + lba_in_sectors = ((unsigned long long) lba) << (ilog2(sector_size) - 9);
> + if ((sizeof(sdkp->capacity) == 4) &&
> + (lba_in_sectors >= 0xffffffffULL)) {
>   sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a "
>   "kernel compiled with support for large block "
>   "devices.\n");
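
To make the hole concrete: a 4096-byte-sector device with lba = 0x20000000
passes a plain "lba >= 0xffffffff" test, yet it holds 2^29 * 8 = 2^32
512-byte sectors (2 TiB), which no longer fits in 32 bits. The following is a
standalone userspace sketch of the patched check, assuming the limit being
tested is the 32-bit maximum; demo_ilog2() and demo_too_big_for_32bit() are
hypothetical helpers and not functions from sd.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: log2 of a power-of-two sector size. */
static unsigned int demo_ilog2(uint32_t v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/*
 * Return true if the capacity cannot be expressed as a 32-bit count of
 * 512-byte sectors. Mirrors the shape of the read_capacity_16() check
 * quoted above; the 32-bit limit is an assumption of this sketch.
 */
static bool demo_too_big_for_32bit(uint64_t lba, uint32_t sector_size)
{
	uint64_t lba_in_sectors;

	/* Only power-of-two sizes of at least 512 bytes make sense here. */
	assert(sector_size >= 512 && (sector_size & (sector_size - 1)) == 0);

	lba_in_sectors = lba << (demo_ilog2(sector_size) - 9);
	return lba >= 0xffffffffULL || lba_in_sectors >= 0xffffffffULL;
}
```

With 512-byte sectors the shift is zero and the two comparisons coincide; the
second comparison only matters for larger logical sector sizes.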

Why are the two checks slightly different? Could the same code be used for
both checks? BTW, using the macro below would make the above checks less
verbose and easier to read:

/*
 * Test whether the result of a shift-left operation would be larger than
 * what fits in a variable with the type of @a.
 */
#define shift_left_overflows(a, b)  \
({  \
typeof(a) _minus_one = -1LL;\
typeof(a) _plus_one = 1;\
bool _a_is_signed = _minus_one < 0; \
int _shift = sizeof(a) * 8 - ((b) + _a_is_signed);  \
_shift < 0 || ((a) & ~((_plus_one << _shift) - 1)) != 0;\
})
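
For anyone who wants to try the macro outside the kernel, here is a small
userspace sketch. It assumes GCC or Clang (statement expressions are a GNU
extension) and spells typeof as __typeof__ so it also builds with strict -std=
settings. demo_u32_overflows() and demo_s32_overflows() are hypothetical
wrappers added only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same logic as the macro above, using the __typeof__ spelling. */
#define shift_left_overflows(a, b)					\
({									\
	__typeof__(a) _minus_one = -1LL;				\
	__typeof__(a) _plus_one = 1;					\
	bool _a_is_signed = _minus_one < 0;				\
	int _shift = sizeof(a) * 8 - ((b) + _a_is_signed);		\
	_shift < 0 || ((a) & ~((_plus_one << _shift) - 1)) != 0;	\
})

/* Would (lba << shift) lose bits in an unsigned 32-bit variable? */
static bool demo_u32_overflows(uint32_t lba, int shift)
{
	return shift_left_overflows(lba, shift);
}

/* Same question for a signed 32-bit variable (one bit less room). */
static bool demo_s32_overflows(int32_t val, int shift)
{
	return shift_left_overflows(val, shift);
}
```

For example, demo_u32_overflows(0x20000000, 3) reports an overflow, while
demo_u32_overflows(0x1fffffff, 3) does not. A shift count of zero on a 32-bit
type is best avoided in this sketch, since the macro would then shift
_plus_one by the full width of the type.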

Bart.

Re: LTP write03 writev07 xfs failures

2017-02-27 Thread Brian Foster

cc Christoph

On Mon, Feb 27, 2017 at 12:22:20PM +0800, Xiong Zhou wrote:
> Hi,
> 
> These 2 tests PASS on Linus tree commit:
>   37c8596 Merge tag 'tty-4.11-rc1' of git://git.kernel.org/pub/scm/linux...
> FAIL on commit:
>   60e8d3e Merge tag 'pci-v4.11-changes' of git://git.kernel.org/pub/scm/...
> 
> LTP latest commit: c60d3ca move_pages12: include lapi/mmap.h
> 
> Steps:
> 
> sh-4.2# pwd
> /root/ltp
> sh-4.2# git log --oneline -1
> c60d3ca move_pages12: include lapi/mmap.h
> sh-4.2# uname -r
> 4.10.0-master-60e8d3e+
> sh-4.2# mount | grep test1
> /dev/sda3 on /test1 type xfs (rw,relatime,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota)
> sh-4.2# xfs_info /test1
> meta-data=/dev/sda3  isize=512agcount=16, agsize=245696 blks
>  =   sectsz=512   attr=2, projid32bit=1
>  =   crc=1finobt=1 spinodes=0
> data =   bsize=4096   blocks=3931136, imaxpct=25
>  =   sunit=64 swidth=64 blks
> naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
> log  =internal   bsize=4096   blocks=2560, version=2
>  =   sectsz=512   sunit=64 blks, lazy-count=1
> realtime =none   extsz=4096   blocks=0, rtextents=0
> sh-4.2# 
> sh-4.2# TMPDIR=/test1 ./testcases/kernel/syscalls/write/write03
> write03 0  TINFO  :  Enter Block 1: test to check if write corrupts the file when write fails
> write03 1  TFAIL  :  write03.c:125: failure of write(2) corrupted the file
> write03 0  TINFO  :  Exit block 1
> sh-4.2# 

On a quick test, both of these are reproduced after commit fa7f138ac4
("xfs: clear delalloc and cache on buffered write failure"). That patch
fixed a problem where if the write allocates a block but fails to write
anything (written == 0), we'd leave a delalloc block lingering in the
inode.

With that change, this test now fails because it sends two writes within
a single block. The first allocates the block, writes 100 bytes and
returns successfully. The next attempts to write the next 100 bytes,
fails and triggers the cleanup of the block because we can't tell
whether this write or the previous had allocated it.
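
For reference, the sequence can be sketched in userspace roughly as follows.
This is a minimal approximation of the pattern described above and not the
LTP source; demo_failed_write_keeps_data() is a hypothetical helper, and the
PROT_NONE mapping is just one way to provoke EFAULT.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write 100 bytes, attempt a second 100-byte write from an inaccessible
 * buffer, then read the first 100 bytes back.
 * Returns 0 if the data survived, 1 if it changed, -1 on setup errors.
 */
static int demo_failed_write_keeps_data(const char *dir)
{
	char path[4096], wbuf[100], rbuf[100];
	void *bad;
	int fd, ret = -1;

	snprintf(path, sizeof(path), "%s/demo_write_XXXXXX", dir);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the open fd keeps the file alive */

	/* A mapping without read permission makes write() return EFAULT. */
	bad = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (bad == MAP_FAILED)
		goto out;

	/* First write: allocates the block and succeeds. */
	memset(wbuf, 'A', sizeof(wbuf));
	if (write(fd, wbuf, sizeof(wbuf)) != (ssize_t)sizeof(wbuf))
		goto out_unmap;

	/* Second write into the same block: expected to fail. */
	if (write(fd, bad, sizeof(wbuf)) != -1 || errno != EFAULT)
		goto out_unmap;

	/* The bytes from the first write must still be there. */
	if (pread(fd, rbuf, sizeof(rbuf), 0) != (ssize_t)sizeof(rbuf))
		ret = 1;
	else
		ret = memcmp(wbuf, rbuf, sizeof(wbuf)) ? 1 : 0;

out_unmap:
	munmap(bad, 4096);
out:
	close(fd);
	return ret;
}
```

Calling demo_failed_write_keeps_data("/test1") on the affected kernel would be
expected to return 1 if the failed write's cleanup removed the block that the
first write had populated.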

I'm not convinced the right solution is to just go back to the previous
code. That obviously reintroduces the original problem, but we'd also
still have a similar problem if the second (failed) write was a rewrite
of the first. The error handling of the second write would kill off the
blocks allocated and written to successfully by the first. I'm wondering
if the right thing to do here is factor in i_size as it appears that's
what this code did prior to the iomap transition. I'm not sure where
that leaves us wrt writes into sparse files, though. I may need to
play with this a bit..

Christoph, any thoughts on this?

Brian

> sh-4.2# TMPDIR=/test1 ./testcases/kernel/syscalls/writev/writev07
> tst_test.c:760: INFO: Timeout per run is 0h 05m 00s
> writev07.c:60: INFO: starting test with initial file offset: 0 
> writev07.c:82: INFO: got EFAULT
> writev07.c:87: FAIL: file was written to
> writev07.c:93: PASS: offset stayed unchanged
> writev07.c:60: INFO: starting test with initial file offset: 65 
> writev07.c:82: INFO: got EFAULT
> writev07.c:89: PASS: file stayed untouched
> writev07.c:93: PASS: offset stayed unchanged
> writev07.c:60: INFO: starting test with initial file offset: 4096 
> writev07.c:82: INFO: got EFAULT
> writev07.c:89: PASS: file stayed untouched
> writev07.c:93: PASS: offset stayed unchanged
> writev07.c:60: INFO: starting test with initial file offset: 4097 
> writev07.c:82: INFO: got EFAULT
> writev07.c:89: PASS: file stayed untouched
> writev07.c:93: PASS: offset stayed unchanged
> 
> Summary:
> passed   7
> failed   1
> skipped  0
> warnings 0
> sh-4.2# 
> sh-4.2# mkfs.xfs -V
> mkfs.xfs version 4.7.0
> sh-4.2# cd ../xfsprogs/
> sh-4.2# git log --oneline -1
> d7e1f5f xfsprogs: Release v4.7
> sh-4.2# 
> 
> Thanks,
> Xiong

Re: [PATCH] tracing: Fix code comment for ftrace_ops_get_func

2017-02-27 Thread Steven Rostedt

On Wed, 22 Feb 2017 08:29:26 +0800
Chunyu Hu  wrote:

> There is no function named ftrace_ops_recurs_func() in the code; it
> should be ftrace_ops_assist_func(). Fix the comment so it is not misleading.

Applied, thanks!

I'm not sure it will go in this merge window. It might, I'll have to
see what else there is.

-- Steve

> 
> Signed-off-by: Chunyu Hu 
> ---
>  kernel/trace/ftrace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 0c06093..fd84f2e 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -5487,7 +5487,7 @@ static void ftrace_ops_assist_func(unsigned long ip, unsigned long parent_ip,
>   * Normally the mcount trampoline will call the ops->func, but there
>   * are times that it should not. For example, if the ops does not
>   * have its own recursion protection, then it should call the
> - * ftrace_ops_recurs_func() instead.
> + * ftrace_ops_assist_func() instead.
>   *
>   * Returns the function that the trampoline should call for @ops.
>   */

[PATCH 2/2] ASoC: es7134: add dt-bindings for the es7134 dac

2017-02-27 Thread Jerome Brunet

Signed-off-by: Jerome Brunet 
---
 Documentation/devicetree/bindings/sound/everest,es7134.txt | 10 ++
 1 file changed, 10 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/everest,es7134.txt

diff --git a/Documentation/devicetree/bindings/sound/everest,es7134.txt b/Documentation/devicetree/bindings/sound/everest,es7134.txt
new file mode 100644
index 000000000000..5495a3cb8b7b
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/everest,es7134.txt
@@ -0,0 +1,10 @@
+ES7134 i2s DA converter
+
+Required properties:
+- compatible : "everest,es7134" or "everest,es7144"
+
+Example:
+
+i2s_codec: external-codec {
+   compatible = "everest,es7134";
+};
-- 
2.9.3

[RFC PATCH 1/2] kprobes/x86: Use probe_kernel_read instead of memcpy

2017-02-27 Thread Masami Hiramatsu

Use probe_kernel_read() to avoid unexpected faults while
copying kernel text in both __recover_probed_insn() and
__recover_optprobed_insn().

Signed-off-by: Masami Hiramatsu 
---
 arch/x86/kernel/kprobes/core.c |7 +--
 arch/x86/kernel/kprobes/opt.c  |5 -
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 6384eb7..34d3a52 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -264,7 +264,10 @@ __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
 * Fortunately, we know that the original code is the ideal 5-byte
 * long NOP.
 */
-   memcpy(buf, (void *)addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+   if (probe_kernel_read(buf, (void *)addr,
+   MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
+   return 0UL;
+
if (faddr)
memcpy(buf, ideal_nops[NOP_ATOMIC5], 5);
else
@@ -276,7 +279,7 @@ __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
  * Recover the probed instruction at addr for further analysis.
  * Caller must lock kprobes by kprobe_mutex, or disable preemption
  * for preventing to release referencing kprobes.
- * Returns zero if the instruction can not get recovered.
+ * Returns zero if the instruction can not get recovered (or access failed).
  */
 unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
 {
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 3d1bee9..06ddd0b 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -65,7 +65,10 @@ unsigned long __recover_optprobed_insn(kprobe_opcode_t *buf, unsigned long addr)
 * overwritten by jump destination address. In this case, original
 * bytes must be recovered from op->optinsn.copied_insn buffer.
 */
-   memcpy(buf, (void *)addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+   if (probe_kernel_read(buf, (void *)addr,
+   MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
+   return 0UL;
+
if (addr == (unsigned long)kp->addr) {
buf[0] = kp->opcode;
memcpy(buf + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);

Re: [PATCH V5 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list

2017-02-27 Thread Shaohua Li

On Mon, Feb 27, 2017 at 03:28:01PM +0900, Minchan Kim wrote:
> Hello Shaohua,
> 
> On Fri, Feb 24, 2017 at 01:31:46PM -0800, Shaohua Li wrote:
> > madvise(MADV_FREE) marks pages as 'lazyfree'. They are still anonymous
> > pages, but they can be freed without pageout. To distinguish them
> > from normal anonymous pages, we clear their SwapBacked flag.
> > 
> > MADV_FREE pages can be freed without pageout, so they are pretty much like
> > used-once file pages. We'd like to reclaim such pages once there is
> > memory pressure. Also, it might be unfair to always reclaim MADV_FREE
> > pages before used-once file pages, and we definitely want to
> > reclaim the pages before other anonymous and file pages.
> > 
> > To speed up reclaim of MADV_FREE pages, we put the pages on the
> > LRU_INACTIVE_FILE list. The rationale is that the LRU_INACTIVE_FILE list
> > is tiny nowadays and should be full of used-once file pages. Reclaiming
> > MADV_FREE pages will not interfere much with anonymous and active
> > file pages. The inactive file pages and MADV_FREE pages will be
> > reclaimed according to their age, so we don't reclaim too many MADV_FREE
> > pages either. Putting the MADV_FREE pages on the LRU_INACTIVE_FILE list
> > also means we can reclaim the pages without swap support. This idea was
> > suggested by Johannes.
> > 
> > This patch doesn't move MADV_FREE pages to the LRU_INACTIVE_FILE list yet,
> > to avoid bisect failures; the next patch will do it.
> > 
> > The patch is based on Minchan's original patch.
> > 
> > Cc: Michal Hocko 
> > Cc: Minchan Kim 
> > Cc: Hugh Dickins 
> > Cc: Rik van Riel 
> > Cc: Mel Gorman 
> > Cc: Andrew Morton 
> > Suggested-by: Johannes Weiner 
> > Signed-off-by: Shaohua Li 
> 
> This patch doesn't address what I pointed out in v4.
> 
> https://marc.info/?i=20170224233752.GB4635%40bbox
> 
> Let's discuss it if you are still against it.

I really think a separate patch makes the code clearer. There are a lot of
places where we introduce a function but don't use it immediately, if doing so
makes the code clearer. But anyway, I'll let Andrew decide if the two patches
should be merged.

Thanks,
Shaohua

[RFC PATCH 0/2] kprobes/x86: Handle probing on ex_table cases

2017-02-27 Thread Masami Hiramatsu

Hi Peter,

Here, I've tried to handle kprobes placed on instructions that are
expected to raise exceptions, i.e. those recorded in __ex_table.

The 1st patch changes recover_probed_instruction() to use
probe_kernel_read() instead of memcpy() when copying kernel text.
So, if you need to get the original instruction to check
the cause of a #UD, you can use recover_probed_instruction()
instead of probe_kernel_read().

The 2nd patch adds kprobe_exit_singlestep() right before
fixup_exception(). It fixes regs->ip to point to the original
address and resets the current single-stepping state so that
fixup_exception() can handle the exception correctly.

There seem to be some die() paths that are still not fixed up. I'm
not sure whether we should fix those die() messages too. Would it be
better to fix up regs->ip in those cases?

Thank you,

---

Masami Hiramatsu (2):
  kprobes/x86: Use probe_kernel_read instead of memcpy
  kprobes/x86: Exit single-stepping before trying fixup_exception


 arch/x86/include/asm/kprobes.h |1 
 arch/x86/kernel/kprobes/core.c |   90 +---
 arch/x86/kernel/kprobes/opt.c  |5 ++
 arch/x86/kernel/traps.c|   19 
 4 files changed, 80 insertions(+), 35 deletions(-)

--
Masami Hiramatsu
