Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table

2015-04-30 Thread Alexey Kardashevskiy

On 05/01/2015 03:12 PM, David Gibson wrote:

On Fri, May 01, 2015 at 02:10:58PM +1000, Alexey Kardashevskiy wrote:

On 04/29/2015 04:40 PM, David Gibson wrote:

On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote:

This adds a way for the IOMMU user to know how much a new table will
use so it can be accounted in the locked_vm limit before allocation
happens.

This stores the allocated table size in pnv_pci_create_table()
so the locked_vm counter can be updated correctly when a table is
disposed of.

This defines an iommu_table_group_ops callback to let VFIO know
how much memory will be locked if a table is created.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v9:
* reimplemented the whole patch
---
  arch/powerpc/include/asm/iommu.h  |  5 +
  arch/powerpc/platforms/powernv/pci-ioda.c | 14 
  arch/powerpc/platforms/powernv/pci.c  | 36 +++
  arch/powerpc/platforms/powernv/pci.h  |  2 ++
  4 files changed, 57 insertions(+)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 1472de3..9844c106 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -99,6 +99,7 @@ struct iommu_table {
unsigned long  it_size;  /* Size of iommu table in entries */
unsigned long  it_indirect_levels;
unsigned long  it_level_size;
+   unsigned long  it_allocated_size;
unsigned long  it_offset;/* Offset into global table */
unsigned long  it_base;  /* mapped address of tce table */
unsigned long  it_index; /* which iommu table this is */
@@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
  struct iommu_table_group;

  struct iommu_table_group_ops {
+   unsigned long (*get_table_size)(
+   __u32 page_shift,
+   __u64 window_size,
+   __u32 levels);
long (*create_table)(struct iommu_table_group *table_group,
int num,
__u32 page_shift,
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e0be556..7f548b4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
  }

  #ifdef CONFIG_IOMMU_API
+static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
+   __u64 window_size, __u32 levels)
+{
+   unsigned long ret = pnv_get_table_size(page_shift, window_size, levels);
+
+   if (!ret)
+   return ret;
+
+   /* Add size of it_userspace */
+   return ret + (window_size >> page_shift) * sizeof(unsigned long);


This doesn't make much sense.  The userspace view can't possibly be a
property of the specific low-level IOMMU model.



This it_userspace thing is all about memory preregistration.

I need some way to track how many actual mappings the
mm_iommu_table_group_mem_t has in order to decide whether to allow
unregistering or not.

When I clear a TCE, I can read the old value, which is a host physical
address; I cannot use it to find the preregistered region and adjust the
mappings counter. I can only use userspace addresses for this (not even
guest physical addresses, as this is VFIO and there is probably no KVM).
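The bookkeeping described above can be sketched in userspace. This is a minimal model under stated assumptions. The struct is a hypothetical, much-simplified stand-in for mm_iommu_table_group_mem_t, and all `sketch_*` names are made up for illustration. It shows that the owning region can only be found from a userspace address, and that unregistering is allowed only when the region's mappings counter is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified stand-in for mm_iommu_table_group_mem_t. */
struct sketch_prereg {
	uint64_t ua;          /* userspace start address of the region */
	uint64_t size;        /* region size in bytes */
	unsigned long mapped; /* TCEs currently pointing into this region */
};

/* Find the registered region that fully contains [ua, ua + len). */
static struct sketch_prereg *sketch_prereg_find(struct sketch_prereg *regs,
						size_t n, uint64_t ua,
						uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		if (ua >= regs[i].ua && ua + len <= regs[i].ua + regs[i].size)
			return &regs[i];
	}
	return NULL;
}

/* TCE set: the caller still knows the userspace address, so account it. */
static int sketch_prereg_get(struct sketch_prereg *regs, size_t n,
			     uint64_t ua, uint64_t page_size)
{
	struct sketch_prereg *r = sketch_prereg_find(regs, n, ua, page_size);

	if (!r)
		return -1;
	r->mapped++;
	return 0;
}

/*
 * TCE clear: the old TCE only holds a host physical address, so the
 * userspace address must have been saved elsewhere and passed in here.
 */
static int sketch_prereg_put(struct sketch_prereg *regs, size_t n,
			     uint64_t saved_ua, uint64_t page_size)
{
	struct sketch_prereg *r = sketch_prereg_find(regs, n, saved_ua,
						     page_size);

	if (!r || r->mapped == 0)
		return -1;
	r->mapped--;
	return 0;
}

/* Unregistering is only allowed once no TCE references the region. */
static int sketch_prereg_can_unregister(const struct sketch_prereg *r)
{
	return r->mapped == 0;
}
```

The put side takes the saved userspace address as a parameter. That is the point of the paragraph above: without a per-page record of userspace addresses, there is nothing to pass in.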

So I have to keep userspace addresses somewhere, one per IOMMU page, and the
iommu_table seems a natural place for this.
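Keeping one userspace address per IOMMU page has a cost, and this patch accounts for it in get_table_size(). The estimate below shows roughly how large that cost is. It is a single-level approximation under assumptions: 8-byte TCEs, a 4 KiB allocation granule and page size, and one 64-bit it_userspace slot per IOMMU page. The real pnv_get_table_size() also handles multi-level tables, so the `sketch_*` helpers are hypothetical and are not its code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch, not taken from pnv_get_table_size(): */
#define SKETCH_TCE_SIZE   8ULL    /* one 64-bit TCE per IOMMU page */
#define SKETCH_ALLOC_UNIT 4096ULL /* allocation granule and page size */

static uint64_t sketch_round_up(uint64_t x, uint64_t unit)
{
	return (x + unit - 1) / unit * unit;
}

/*
 * Single-level estimate of the bytes a new table would pin: the TCE table
 * itself plus an it_userspace array with one 64-bit userspace address per
 * IOMMU page. Returns 0 for unsupported input, like the "if (!ret)" check.
 */
static uint64_t sketch_table_size(uint32_t page_shift, uint64_t window_size)
{
	uint64_t entries;

	if (page_shift >= 64 || window_size == 0 ||
	    (window_size & (window_size - 1)) != 0) /* must be a power of two */
		return 0;

	entries = window_size >> page_shift;
	if (entries == 0)
		return 0;

	return sketch_round_up(entries * SKETCH_TCE_SIZE, SKETCH_ALLOC_UNIT) +
	       entries * sizeof(uint64_t);
}

/* Would pinning @bytes more keep the owner within a locked_vm limit? */
static int sketch_fits_locked_vm(uint64_t bytes, uint64_t locked_pages,
				 uint64_t limit_pages)
{
	uint64_t pages = sketch_round_up(bytes, SKETCH_ALLOC_UNIT) /
			 SKETCH_ALLOC_UNIT;

	return locked_pages + pages <= limit_pages;
}
```

Under these assumptions, a 1 GiB window of 4 KiB IOMMU pages needs 2 MiB of TCEs plus 2 MiB of it_userspace. The userspace shadow array doubles the amount charged to locked_vm.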


Well.. sort of.  But as noted elsewhere this pulls VFIO specific
constraints into a platform code structure.  And whether you get this
table depends on the platform IOMMU type rather than on what VFIO
wants to do with it, which doesn't make sense.

What might make more sense is an opaque pointer in iommu_table for use
by the table "owner" (in the take_ownership sense).  The pointer would
be stored in iommu_table, but VFIO is responsible for populating and
managing its contents.

Or you could just put the userspace mappings in the container.
Although you might want a different data structure in that case.


Nope. I need this table for in-kernel acceleration, to update the mappings
counter per mm_iommu_table_group_mem_t. In KVM's real mode handlers, I only
have IOMMU tables, not containers or groups. QEMU creates a guest view of
the table (KVM_CREATE_SPAPR_TCE) specifying a LIOBN, and then attaches TCE
tables to it via a set of ioctls (one per IOMMU group) to the VFIO KVM device.


So if I call it it_opaque (instead of it_userspace), I will still need a
common place, visible to both VFIO and PowerKVM, to put:

#define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry)

So far, that place has been arch/powerpc/include/asm/iommu.h and the
iommu_table struct.
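The two positions can be combined, under stated assumptions. The table carries an opaque pointer that its owner (VFIO) allocates and manages, as David suggests. A shared accessor macro indexes that pointer by TCE entry number, as Alexey needs. Only the macro name comes from the thread. The trimmed struct, the `it_opaque` field, the macro body, and the owner helpers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down iommu_table: only the fields used here. */
struct sketch_iommu_table {
	unsigned long it_size;   /* size of the table in entries */
	unsigned long it_offset; /* first entry number of this table */
	void *it_opaque;         /* allocated and managed by the table owner */
};

/*
 * Shared accessor for both VFIO and the KVM handlers: a pointer to the
 * saved userspace address for TCE @entry, or NULL if the owner has not
 * allocated the array.
 */
#define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry)                           \
	((tbl)->it_opaque ?                                               \
	 &((unsigned long *)(tbl)->it_opaque)[(entry) - (tbl)->it_offset] : \
	 NULL)

/* Owner side: allocate one zeroed userspace-address slot per TCE. */
static int sketch_owner_attach(struct sketch_iommu_table *tbl)
{
	tbl->it_opaque = calloc(tbl->it_size, sizeof(unsigned long));
	return tbl->it_opaque ? 0 : -1;
}

/* Owner side: release the array when giving the table back. */
static void sketch_owner_detach(struct sketch_iommu_table *tbl)
{
	free(tbl->it_opaque);
	tbl->it_opaque = NULL;
}
```

With this split, the platform code never interprets the contents. Only the owner and the macro agree on the layout.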




The other thing to bear in mind is that registered regions are likely
to be large contiguous blocks in user addresses, though obviously not
contiguous in physical addr.  So you migh

Re: [PATCH v2] scripts/gdb: Add command to check list consistency

2015-04-30 Thread Jan Kiszka
On 2015-04-24 03:57, Thiébaud Weksteen wrote:
> Add a gdb script to verify the consistency of lists.
> 
> Signed-off-by: Thiébaud Weksteen 
> ---
> Implement suggestions from Jan.
> 
> Changes in v2:
>  - Add copyright line
>  - Rename check_list to list_check
>  - Remove casting and only accept (struct list_head) object
>  - Add error message if argument is missing
>  - Reformat error messages to include address of nodes
> 

Thanks! I've queued it up (git.kiszka.org/linux.git queues/gdb-scripts)
along with two small improvements (completion and support for list
pointers). Will push to Andrew for 4.2. I'm also still thinking about
lx-list-for-each...

Jan
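A consistency check like this walks the struct list_head invariant: for every node, node->next->prev == node and node->prev->next == node, until the walk returns to the head. Below is a minimal C sketch of that invariant over a userspace copy of struct list_head. The real lx-list-check is a Python gdb script, and this is not its code. `list_check()` and its `max_nodes` bound are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/*
 * Returns -1 if the circular list is consistent, otherwise the index of the
 * first bad node (the head is index 0). Prints the addresses involved.
 * max_nodes bounds the walk so a corrupted list cannot loop forever; if the
 * walk never returns to the head, max_nodes is returned.
 */
static int list_check(const struct list_head *head, int max_nodes)
{
	const struct list_head *node = head;

	for (int i = 0; i < max_nodes; i++) {
		if (!node->next || !node->prev ||
		    node->next->prev != node || node->prev->next != node) {
			fprintf(stderr, "bad node %d at %p: next=%p prev=%p\n",
				i, (const void *)node, (void *)node->next,
				(void *)node->prev);
			return i;
		}
		node = node->next;
		if (node == head)
			return -1; /* walked the whole circle */
	}
	return max_nodes;
}
```

Checking both directions at each node means a single bad back-pointer is reported at the node that owns it, not somewhere later in the walk.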





[PATCH v2 2/2] arm64: dts: mt8173: Fixup pinctrl nodes

2015-04-30 Thread Yingjoe Chen
The mt8173 pinctrl nodes don't follow the DTS conventions. Fix them.
Also add a comment explaining the pinctrl register usage to make it
clearer.

Signed-off-by: Yingjoe Chen 
Reviewed-by: Daniel Kurtz 
---
 arch/arm64/boot/dts/mediatek/mt8173.dtsi | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/boot/dts/mediatek/mt8173.dtsi b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
index 924fdb6..4595196 100644
--- a/arch/arm64/boot/dts/mediatek/mt8173.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
@@ -106,14 +106,13 @@
compatible = "simple-bus";
ranges;
 
-   syscfg_pctl_a: syscfg_pctl_a@10005000 {
-   compatible = "mediatek,mt8173-pctl-a-syscfg", "syscon";
-   reg = <0 0x10005000 0 0x1000>;
-   };
-
-   pio: pinctrl@0x10005000 {
+   /*
+* Pinctrl access register at 0x10005000 through regmap.
+* Register 0x1000b000 is used by EINT.
+*/
+   pio: pinctrl@10005000 {
compatible = "mediatek,mt8173-pinctrl";
-   reg = <0 0x1000B000 0 0x1000>;
+   reg = <0 0x1000b000 0 0x1000>;
mediatek,pctl-regmap = <&syscfg_pctl_a>;
pins-are-numbered;
gpio-controller;
@@ -121,8 +120,13 @@
interrupt-controller;
#interrupt-cells = <2>;
interrupts = ,
-   ,
-   ;
+,
+;
+   };
+
+   syscfg_pctl_a: syscfg_pctl_a@10005000 {
+   compatible = "mediatek,mt8173-pctl-a-syscfg", "syscon";
+   reg = <0 0x10005000 0 0x1000>;
};
 
sysirq: intpol-controller@10200620 {
-- 
1.8.1.1.dirty

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 1/2] ARM: dts: mt8135: Add pinctrl/GPIO/EINT node for mt8135.

2015-04-30 Thread Yingjoe Chen
From: Hongzhou Yang 

Patches based on v4.1-rc1. Changed according to Matthias' suggestions:
- Remove comments on syscfg nodes
- Sort nodes by instance address & name.

---8<
Add pinctrl, GPIO and EINT nodes to mt8135.dtsi.

Signed-off-by: Hongzhou Yang 
Acked-by: Linus Walleij 
---
 arch/arm/boot/dts/mt8135-pinfunc.h | 1302 
 arch/arm/boot/dts/mt8135.dtsi  |   29 +
 2 files changed, 1331 insertions(+)
 create mode 100644 arch/arm/boot/dts/mt8135-pinfunc.h

diff --git a/arch/arm/boot/dts/mt8135-pinfunc.h b/arch/arm/boot/dts/mt8135-pinfunc.h
new file mode 100644
index 000..5a60987
--- /dev/null
+++ b/arch/arm/boot/dts/mt8135-pinfunc.h
@@ -0,0 +1,1302 @@
+/*
+ * Copyright (c) 2014 MediaTek Inc.
+ * Author: Hongzhou.Yang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __DTS_MT8135_PINFUNC_H
+#define __DTS_MT8135_PINFUNC_H
+
+#include 
+
+#define MT8135_PIN_0_MSDC0_DAT7__FUNC_GPIO0 (MTK_PIN_NO(0) | 0)
+#define MT8135_PIN_0_MSDC0_DAT7__FUNC_MSDC0_DAT7 (MTK_PIN_NO(0) | 1)
+#define MT8135_PIN_0_MSDC0_DAT7__FUNC_EINT49 (MTK_PIN_NO(0) | 2)
+#define MT8135_PIN_0_MSDC0_DAT7__FUNC_I2SOUT_DAT (MTK_PIN_NO(0) | 3)
+#define MT8135_PIN_0_MSDC0_DAT7__FUNC_DAC_DAT_OUT (MTK_PIN_NO(0) | 4)
+#define MT8135_PIN_0_MSDC0_DAT7__FUNC_PCM1_DO (MTK_PIN_NO(0) | 5)
+#define MT8135_PIN_0_MSDC0_DAT7__FUNC_SPI1_MO (MTK_PIN_NO(0) | 6)
+#define MT8135_PIN_0_MSDC0_DAT7__FUNC_NALE (MTK_PIN_NO(0) | 7)
+
+#define MT8135_PIN_1_MSDC0_DAT6__FUNC_GPIO1 (MTK_PIN_NO(1) | 0)
+#define MT8135_PIN_1_MSDC0_DAT6__FUNC_MSDC0_DAT6 (MTK_PIN_NO(1) | 1)
+#define MT8135_PIN_1_MSDC0_DAT6__FUNC_EINT48 (MTK_PIN_NO(1) | 2)
+#define MT8135_PIN_1_MSDC0_DAT6__FUNC_I2SIN_WS (MTK_PIN_NO(1) | 3)
+#define MT8135_PIN_1_MSDC0_DAT6__FUNC_DAC_WS (MTK_PIN_NO(1) | 4)
+#define MT8135_PIN_1_MSDC0_DAT6__FUNC_PCM1_WS (MTK_PIN_NO(1) | 5)
+#define MT8135_PIN_1_MSDC0_DAT6__FUNC_SPI1_CSN (MTK_PIN_NO(1) | 6)
+#define MT8135_PIN_1_MSDC0_DAT6__FUNC_NCLE (MTK_PIN_NO(1) | 7)
+
+#define MT8135_PIN_2_MSDC0_DAT5__FUNC_GPIO2 (MTK_PIN_NO(2) | 0)
+#define MT8135_PIN_2_MSDC0_DAT5__FUNC_MSDC0_DAT5 (MTK_PIN_NO(2) | 1)
+#define MT8135_PIN_2_MSDC0_DAT5__FUNC_EINT47 (MTK_PIN_NO(2) | 2)
+#define MT8135_PIN_2_MSDC0_DAT5__FUNC_I2SIN_CK (MTK_PIN_NO(2) | 3)
+#define MT8135_PIN_2_MSDC0_DAT5__FUNC_DAC_CK (MTK_PIN_NO(2) | 4)
+#define MT8135_PIN_2_MSDC0_DAT5__FUNC_PCM1_CK (MTK_PIN_NO(2) | 5)
+#define MT8135_PIN_2_MSDC0_DAT5__FUNC_SPI1_CLK (MTK_PIN_NO(2) | 6)
+#define MT8135_PIN_2_MSDC0_DAT5__FUNC_NLD4 (MTK_PIN_NO(2) | 7)
+
+#define MT8135_PIN_3_MSDC0_DAT4__FUNC_GPIO3 (MTK_PIN_NO(3) | 0)
+#define MT8135_PIN_3_MSDC0_DAT4__FUNC_MSDC0_DAT4 (MTK_PIN_NO(3) | 1)
+#define MT8135_PIN_3_MSDC0_DAT4__FUNC_EINT46 (MTK_PIN_NO(3) | 2)
+#define MT8135_PIN_3_MSDC0_DAT4__FUNC_A_FUNC_CK (MTK_PIN_NO(3) | 3)
+#define MT8135_PIN_3_MSDC0_DAT4__FUNC_LSCE1B_2X (MTK_PIN_NO(3) | 6)
+#define MT8135_PIN_3_MSDC0_DAT4__FUNC_NLD5 (MTK_PIN_NO(3) | 7)
+
+#define MT8135_PIN_4_MSDC0_CMD__FUNC_GPIO4 (MTK_PIN_NO(4) | 0)
+#define MT8135_PIN_4_MSDC0_CMD__FUNC_MSDC0_CMD (MTK_PIN_NO(4) | 1)
+#define MT8135_PIN_4_MSDC0_CMD__FUNC_EINT41 (MTK_PIN_NO(4) | 2)
+#define MT8135_PIN_4_MSDC0_CMD__FUNC_A_FUNC_DOUT_0 (MTK_PIN_NO(4) | 3)
+#define MT8135_PIN_4_MSDC0_CMD__FUNC_USB_TEST_IO_0 (MTK_PIN_NO(4) | 5)
+#define MT8135_PIN_4_MSDC0_CMD__FUNC_LRSTB_2X (MTK_PIN_NO(4) | 6)
+#define MT8135_PIN_4_MSDC0_CMD__FUNC_NRNB (MTK_PIN_NO(4) | 7)
+
+#define MT8135_PIN_5_MSDC0_CLK__FUNC_GPIO5 (MTK_PIN_NO(5) | 0)
+#define MT8135_PIN_5_MSDC0_CLK__FUNC_MSDC0_CLK (MTK_PIN_NO(5) | 1)
+#define MT8135_PIN_5_MSDC0_CLK__FUNC_EINT40 (MTK_PIN_NO(5) | 2)
+#define MT8135_PIN_5_MSDC0_CLK__FUNC_A_FUNC_DOUT_1 (MTK_PIN_NO(5) | 3)
+#define MT8135_PIN_5_MSDC0_CLK__FUNC_USB_TEST_IO_1 (MTK_PIN_NO(5) | 5)
+#define MT8135_PIN_5_MSDC0_CLK__FUNC_LPTE (MTK_PIN_NO(5) | 6)
+#define MT8135_PIN_5_MSDC0_CLK__FUNC_NREB (MTK_PIN_NO(5) | 7)
+
+#define MT8135_PIN_6_MSDC0_DAT3__FUNC_GPIO6 (MTK_PIN_NO(6) | 0)
+#define MT8135_PIN_6_MSDC0_DAT3__FUNC_MSDC0_DAT3 (MTK_PIN_NO(6) | 1)
+#define MT8135_PIN_6_MSDC0_DAT3__FUNC_EINT45 (MTK_PIN_NO(6) | 2)
+#define MT8135_PIN_6_MSDC0_DAT3__FUNC_A_FUNC_DOUT_2 (MTK_PIN_NO(6) | 3)
+#define MT8135_PIN_6_MSDC0_DAT3__FUNC_USB_TEST_IO_2 (MTK_PIN_NO(6) | 5)
+#define MT8135_PIN_6_MSDC0_DAT3__FUNC_LSCE0B_2X (MTK_PIN_NO(6) | 6)
+#define MT8135_PIN_6_MSDC0_DAT3__FUNC_NLD7 (MTK_PIN_NO(6) | 7)
+
+#define MT8135_PIN_7_MSDC0_DAT2__FUNC_GPIO7 (MTK_PIN_NO(7) | 0)
+#define MT8135_PIN_7_MSDC0_DAT2__FUNC_MSDC0_DAT2 (MTK_PIN_NO(7) | 1)
+#define MT8135_PIN_7_MSDC0_DAT2__FUNC_EINT44 (MT
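Each MT8135_PIN_* value above packs a pin number and a mux function number into one integer. A minimal sketch of that encoding follows. It assumes MTK_PIN_NO(x) shifts the pin number left by 8 bits, and that the function sits in the low 4 bits. MTK_PIN_NO comes from the dt-bindings header included above, whose name was lost in this archive, so treat both the shift and the mask as assumptions. The decode helpers are hypothetical.

```c
#include <assert.h>

/* Assumed encoding: pin number from bit 8 upwards, function in low bits. */
#define MTK_PIN_NO(x) ((x) << 8)

/* Two definitions copied from mt8135-pinfunc.h above. */
#define MT8135_PIN_0_MSDC0_DAT7__FUNC_EINT49 (MTK_PIN_NO(0) | 2)
#define MT8135_PIN_5_MSDC0_CLK__FUNC_NREB (MTK_PIN_NO(5) | 7)

/* Hypothetical decode helpers (function mask width assumed to be 4 bits). */
static unsigned int sketch_pin_no(unsigned int pinfunc)
{
	return pinfunc >> 8;
}

static unsigned int sketch_pin_func(unsigned int pinfunc)
{
	return pinfunc & 0xf;
}
```

Under this encoding, MSDC0_CLK as NREB is pin 5, function 7, that is 0x507.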

[PATCH 3/5] metag: use for_each_sg()

2015-04-30 Thread Akinobu Mita
Since metag doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to
use for_each_sg() in order to loop over each sg element.  But this can
help find problems with drivers that do not properly initialize their
sg tables when CONFIG_DEBUG_SG is enabled.

Signed-off-by: Akinobu Mita 
Cc: James Hogan 
Cc: linux-me...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
---
 arch/metag/include/asm/dma-mapping.h | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/metag/include/asm/dma-mapping.h b/arch/metag/include/asm/dma-mapping.h
index 14b23ef..eb5cdec 100644
--- a/arch/metag/include/asm/dma-mapping.h
+++ b/arch/metag/include/asm/dma-mapping.h
@@ -134,20 +134,24 @@ dma_sync_single_range_for_device(struct device *dev, dma_addr_t dma_handle,
 }
 
 static inline void
-dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems,
+dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems,
enum dma_data_direction direction)
 {
int i;
-   for (i = 0; i < nelems; i++, sg++)
+   struct scatterlist *sg;
+
+   for_each_sg(sglist, sg, nelems, i)
dma_sync_for_cpu(sg_virt(sg), sg->length, direction);
 }
 
 static inline void
-dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nelems,
-  enum dma_data_direction direction)
+dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
+  int nelems, enum dma_data_direction direction)
 {
int i;
-   for (i = 0; i < nelems; i++, sg++)
+   struct scatterlist *sg;
+
+   for_each_sg(sglist, sg, nelems, i)
dma_sync_for_device(sg_virt(sg), sg->length, direction);
 }
 
-- 
1.9.1
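Metag does not select ARCH_HAS_SG_CHAIN, so the old sg++ loop was correct there. The difference only matters where scatterlists can be chained. Below is a simplified userspace model of that difference. The struct and helpers are hypothetical; the kernel's struct scatterlist encodes the chain and end markers in page_link instead. The model mirrors for_each_sg() advancing with sg_next().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model, NOT the kernel's struct scatterlist. */
struct sketch_sg {
	unsigned int length;
	struct sketch_sg *chain; /* non-NULL: slot only links to another array */
	int is_last;             /* set on the final real entry */
};

/* Like sg_next(): stop at the end, and follow a chain slot if present. */
static struct sketch_sg *sketch_sg_next(struct sketch_sg *sg)
{
	if (sg->is_last)
		return NULL;
	sg++;
	if (sg->chain)
		sg = sg->chain;
	return sg;
}

#define sketch_for_each_sg(sglist, sg, nr, i) \
	for ((i) = 0, (sg) = (sglist); (i) < (nr); \
	     (i)++, (sg) = sketch_sg_next(sg))

/* Sum of entry lengths using the chain-aware iterator. */
static unsigned int total_len_for_each(struct sketch_sg *sglist, int nents)
{
	struct sketch_sg *sg;
	unsigned int total = 0;
	int i;

	sketch_for_each_sg(sglist, sg, nents, i)
		total += sg->length;
	return total;
}

/* Sum of entry lengths using the old "sg++" pattern. */
static unsigned int total_len_naive(struct sketch_sg *sglist, int nents)
{
	struct sketch_sg *sg = sglist;
	unsigned int total = 0;

	for (int i = 0; i < nents; i++, sg++)
		total += sg->length;
	return total;
}
```

Take a first array of {10, 20, chain}, chained to {30, 40}. For three entries, the iterator visits 10, 20 and 30. The plain increment treats the chain slot as a zero-length entry, and with more entries it would run off the end of the first array.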



[PATCH 5/5] mips: use for_each_sg()

2015-04-30 Thread Akinobu Mita
Since mips doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to
use for_each_sg() in order to loop over each sg element.  But this can
help find problems with drivers that do not properly initialize their
sg tables when CONFIG_DEBUG_SG is enabled.

Signed-off-by: Akinobu Mita 
Cc: Ralf Baechle 
Cc: linux-m...@linux-mips.org
Cc: linux-a...@vger.kernel.org
---
 arch/mips/mm/dma-default.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c
index 609d124..eeaf024 100644
--- a/arch/mips/mm/dma-default.c
+++ b/arch/mips/mm/dma-default.c
@@ -262,12 +262,13 @@ static void mips_dma_unmap_page(struct device *dev, dma_addr_t dma_addr,
plat_unmap_dma_mem(dev, dma_addr, size, direction);
 }
 
-static int mips_dma_map_sg(struct device *dev, struct scatterlist *sg,
+static int mips_dma_map_sg(struct device *dev, struct scatterlist *sglist,
int nents, enum dma_data_direction direction, struct dma_attrs *attrs)
 {
int i;
+   struct scatterlist *sg;
 
-   for (i = 0; i < nents; i++, sg++) {
+   for_each_sg(sglist, sg, nents, i) {
if (!plat_device_is_coherent(dev))
__dma_sync(sg_page(sg), sg->offset, sg->length,
   direction);
@@ -291,13 +292,14 @@ static dma_addr_t mips_dma_map_page(struct device *dev, struct page *page,
return plat_map_dma_mem_page(dev, page) + offset;
 }
 
-static void mips_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
+static void mips_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
int nhwentries, enum dma_data_direction direction,
struct dma_attrs *attrs)
 {
int i;
+   struct scatterlist *sg;
 
-   for (i = 0; i < nhwentries; i++, sg++) {
+   for_each_sg(sglist, sg, nhwentries, i) {
if (!plat_device_is_coherent(dev) &&
direction != DMA_TO_DEVICE)
__dma_sync(sg_page(sg), sg->offset, sg->length,
@@ -324,26 +326,34 @@ static void mips_dma_sync_single_for_device(struct device *dev,
 }
 
 static void mips_dma_sync_sg_for_cpu(struct device *dev,
-   struct scatterlist *sg, int nelems, enum dma_data_direction direction)
+   struct scatterlist *sglist, int nelems,
+   enum dma_data_direction direction)
 {
int i;
+   struct scatterlist *sg;
 
-   if (cpu_needs_post_dma_flush(dev))
-   for (i = 0; i < nelems; i++, sg++)
+   if (cpu_needs_post_dma_flush(dev)) {
+   for_each_sg(sglist, sg, nelems, i) {
__dma_sync(sg_page(sg), sg->offset, sg->length,
   direction);
+   }
+   }
plat_post_dma_flush(dev);
 }
 
 static void mips_dma_sync_sg_for_device(struct device *dev,
-   struct scatterlist *sg, int nelems, enum dma_data_direction direction)
+   struct scatterlist *sglist, int nelems,
+   enum dma_data_direction direction)
 {
int i;
+   struct scatterlist *sg;
 
-   if (!plat_device_is_coherent(dev))
-   for (i = 0; i < nelems; i++, sg++)
+   if (!plat_device_is_coherent(dev)) {
+   for_each_sg(sglist, sg, nelems, i) {
__dma_sync(sg_page(sg), sg->offset, sg->length,
   direction);
+   }
+   }
 }
 
 int mips_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
-- 
1.9.1
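The conditions under which these MIPS helpers call __dma_sync() on each entry can be summarised as one decision function. This is a userspace sketch read off the diff above. The enums and parameter names are hypothetical: `coherent` stands in for plat_device_is_coherent(dev), and `cpu_needs_post_flush` stands in for cpu_needs_post_dma_flush(dev).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of enum dma_data_direction and the four helpers. */
enum sketch_dir { SK_BIDIRECTIONAL, SK_TO_DEVICE, SK_FROM_DEVICE };
enum sketch_op { SK_MAP_SG, SK_UNMAP_SG, SK_SYNC_SG_FOR_CPU,
		 SK_SYNC_SG_FOR_DEVICE };

/* Does the helper for @op call __dma_sync() on each sg entry? */
static bool sketch_needs_sync(enum sketch_op op, bool coherent,
			      bool cpu_needs_post_flush, enum sketch_dir dir)
{
	switch (op) {
	case SK_MAP_SG:
	case SK_SYNC_SG_FOR_DEVICE:
		return !coherent;
	case SK_UNMAP_SG:
		/* A TO_DEVICE buffer was only read by the device. */
		return !coherent && dir != SK_TO_DEVICE;
	case SK_SYNC_SG_FOR_CPU:
		/* plat_post_dma_flush() runs afterwards either way. */
		return cpu_needs_post_flush;
	}
	return false;
}
```

The patch does not change any of these conditions. It only changes how the loop under each condition steps from one entry to the next.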



[PATCH 4/5] xtensa: use for_each_sg()

2015-04-30 Thread Akinobu Mita
Since xtensa doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to
use for_each_sg() in order to loop over each sg element.  But this can
help find problems with drivers that do not properly initialize their
sg tables when CONFIG_DEBUG_SG is enabled.

Signed-off-by: Akinobu Mita 
Cc: Chris Zankel 
Cc: Max Filippov 
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
---
 arch/xtensa/include/asm/dma-mapping.h | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/xtensa/include/asm/dma-mapping.h b/arch/xtensa/include/asm/dma-mapping.h
index 172a02a..54d2b22 100644
--- a/arch/xtensa/include/asm/dma-mapping.h
+++ b/arch/xtensa/include/asm/dma-mapping.h
@@ -52,14 +52,15 @@ dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
 }
 
 static inline int
-dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
   enum dma_data_direction direction)
 {
int i;
+   struct scatterlist *sg;
 
BUG_ON(direction == DMA_NONE);
 
-   for (i = 0; i < nents; i++, sg++ ) {
+   for_each_sg(sglist, sg, nents, i) {
BUG_ON(!sg_page(sg));
 
sg->dma_address = sg_phys(sg);
@@ -124,20 +125,24 @@ dma_sync_single_range_for_device(struct device *dev, dma_addr_t dma_handle,
consistent_sync((void *)bus_to_virt(dma_handle)+offset,size,direction);
 }
 static inline void
-dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems,
+dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems,
 enum dma_data_direction dir)
 {
int i;
-   for (i = 0; i < nelems; i++, sg++)
+   struct scatterlist *sg;
+
+   for_each_sg(sglist, sg, nelems, i)
consistent_sync(sg_virt(sg), sg->length, dir);
 }
 
 static inline void
-dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nelems,
-enum dma_data_direction dir)
+dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
+  int nelems, enum dma_data_direction dir)
 {
int i;
-   for (i = 0; i < nelems; i++, sg++)
+   struct scatterlist *sg;
+
+   for_each_sg(sglist, sg, nelems, i)
consistent_sync(sg_virt(sg), sg->length, dir);
 }
 static inline int
-- 
1.9.1



[PATCH 2/5] arc: use for_each_sg()

2015-04-30 Thread Akinobu Mita
Since arc doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to
use for_each_sg() in order to loop over each sg element.  But this can
help find problems with drivers that do not properly initialize their
sg tables when CONFIG_DEBUG_SG is enabled.

Signed-off-by: Akinobu Mita 
Cc: Vineet Gupta 
Cc: linux-a...@vger.kernel.org
---
 arch/arc/include/asm/dma-mapping.h | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/arc/include/asm/dma-mapping.h b/arch/arc/include/asm/dma-mapping.h
index 45b8e0c..f787894 100644
--- a/arch/arc/include/asm/dma-mapping.h
+++ b/arch/arc/include/asm/dma-mapping.h
@@ -178,22 +178,24 @@ dma_sync_single_range_for_device(struct device *dev, dma_addr_t dma_handle,
 }
 
 static inline void
-dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems,
+dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems,
enum dma_data_direction dir)
 {
int i;
+   struct scatterlist *sg;
 
-   for (i = 0; i < nelems; i++, sg++)
+   for_each_sg(sglist, sg, nelems, i)
_dma_cache_sync((unsigned int)sg_virt(sg), sg->length, dir);
 }
 
 static inline void
-dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nelems,
-  enum dma_data_direction dir)
+dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
+  int nelems, enum dma_data_direction dir)
 {
int i;
+   struct scatterlist *sg;
 
-   for (i = 0; i < nelems; i++, sg++)
+   for_each_sg(sglist, sg, nelems, i)
_dma_cache_sync((unsigned int)sg_virt(sg), sg->length, dir);
 }
 
-- 
1.9.1
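The CONFIG_DEBUG_SG point in these changelogs can be illustrated in userspace. With debugging enabled, the kernel's sg_next() is understood to check a magic value that sg_init_table() writes into each entry. Walking a table with for_each_sg() therefore catches drivers that skipped initialisation, while a plain sg++ loop never looks. The model below is hypothetical. The struct layout, the helper names, and the magic value are assumptions (the kernel's SG_MAGIC is believed to be 0x87654321), and the model reports an error instead of calling BUG_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SG_MAGIC 0x87654321UL /* assumed value of SG_MAGIC */

struct sketch_sg {
	unsigned long sg_magic; /* only present with CONFIG_DEBUG_SG */
	unsigned int length;
	int is_last;
};

/* Like sg_init_table(): zero the entries, stamp the magic, mark the end. */
static void sketch_sg_init_table(struct sketch_sg *sgl, unsigned int nents)
{
	assert(nents > 0);
	memset(sgl, 0, nents * sizeof(*sgl));
	for (unsigned int i = 0; i < nents; i++)
		sgl[i].sg_magic = SKETCH_SG_MAGIC;
	sgl[nents - 1].is_last = 1;
}

/* Debug-checking next(): sets *bad on an uninitialised entry. */
static struct sketch_sg *sketch_sg_next(struct sketch_sg *sg, int *bad)
{
	if (sg->sg_magic != SKETCH_SG_MAGIC) {
		*bad = 1;
		return NULL;
	}
	return sg->is_last ? NULL : sg + 1;
}

/* Walk @nents entries the for_each_sg() way; -1 if any was uninitialised. */
static int sketch_walk(struct sketch_sg *sgl, int nents)
{
	struct sketch_sg *sg = sgl;
	int bad = 0;

	for (int i = 0; i < nents && sg; i++)
		sg = sketch_sg_next(sg, &bad);
	return bad ? -1 : 0;
}
```

A driver that builds its table by hand without sketch_sg_init_table() is caught on the first step. The arithmetic sg++ loop that this series removes would have silently used the entries.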



[PATCH 1/5] m68k: use for_each_sg()

2015-04-30 Thread Akinobu Mita
Since m68k doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to
use for_each_sg() in order to loop over each sg element.  But this can
help find problems with drivers that do not properly initialize their
sg tables when CONFIG_DEBUG_SG is enabled.

Signed-off-by: Akinobu Mita 
Cc: Geert Uytterhoeven 
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-a...@vger.kernel.org
---
 arch/m68k/kernel/dma.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
index e546a55..564665f 100644
--- a/arch/m68k/kernel/dma.c
+++ b/arch/m68k/kernel/dma.c
@@ -120,13 +120,16 @@ void dma_sync_single_for_device(struct device *dev, dma_addr_t handle,
 }
 EXPORT_SYMBOL(dma_sync_single_for_device);
 
-void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nents,
-   enum dma_data_direction dir)
+void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
+   int nents, enum dma_data_direction dir)
 {
int i;
+   struct scatterlist *sg;
 
-   for (i = 0; i < nents; sg++, i++)
-   dma_sync_single_for_device(dev, sg->dma_address, sg->length, dir);
+   for_each_sg(sglist, sg, nents, i) {
+   dma_sync_single_for_device(dev, sg->dma_address, sg->length,
+  dir);
+   }
 }
 EXPORT_SYMBOL(dma_sync_sg_for_device);
 
@@ -151,14 +154,16 @@ dma_addr_t dma_map_page(struct device *dev, struct page *page,
 }
 EXPORT_SYMBOL(dma_map_page);
 
-int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+int dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
   enum dma_data_direction dir)
 {
int i;
+   struct scatterlist *sg;
 
-   for (i = 0; i < nents; sg++, i++) {
+   for_each_sg(sglist, sg, nents, i) {
sg->dma_address = sg_phys(sg);
-   dma_sync_single_for_device(dev, sg->dma_address, sg->length, dir);
+   dma_sync_single_for_device(dev, sg->dma_address, sg->length,
+  dir);
}
return nents;
 }
-- 
1.9.1



[BUG] Jiffies not incrementing, tick_handover_do_timer strictly tied to hotplug

2015-04-30 Thread pawandeep oza
Hi,

Linux version 3.10.17

Problem statement: timekeeping (do_timer) seems to have stopped, and the
aborting core (core0 in this case) is stuck in a loop that relies on
jiffies.


Root cause:

We have a tickless kernel, so a CPU that goes into a deep idle state
stops its sched tick (tick_nohz_stop_sched_tick()).

tick_sched_do_timer() should then take over the job: whichever CPU is
running transfers the jiffies-incrementing duty to itself.
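The takeover rule just described can be modelled in a few lines. This is a simplified userspace sketch under assumptions. It assumes a CPU may claim the duty only when it is marked TICK_DO_TIMER_NONE. It also assumes a CPU that stops its tick for idle releases the duty first, modelled on tick_nohz_stop_sched_tick(). The `sketch_*` names and the plain counter standing in for jiffies are hypothetical.

```c
#include <assert.h>

#define SKETCH_TICK_DO_TIMER_NONE (-1)

/* Hypothetical stand-ins for tick_do_timer_cpu and jiffies. */
static int sketch_do_timer_cpu = SKETCH_TICK_DO_TIMER_NONE;
static unsigned long sketch_jiffies;

/* Runs on every CPU that still receives its tick. */
static void sketch_tick_sched_do_timer(int cpu)
{
	/* A CPU may claim the duty only if nobody holds it. */
	if (sketch_do_timer_cpu == SKETCH_TICK_DO_TIMER_NONE)
		sketch_do_timer_cpu = cpu;

	/* Only the owner advances jiffies. */
	if (sketch_do_timer_cpu == cpu)
		sketch_jiffies++;
}

/* Idle path: the owner gives the duty up before stopping its tick. */
static void sketch_stop_tick_idle(int cpu)
{
	if (sketch_do_timer_cpu == cpu)
		sketch_do_timer_cpu = SKETCH_TICK_DO_TIMER_NONE;
}
```

The model shows the gap: a CPU that stops without going through the release step leaves the duty recorded against itself, and no other CPU ever reclaims it.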


But when, say, core0 raises a BUG, ipi_cpu_stop() makes the other CPUs
stop. clockevents_notify()/tick_notify() and the hrtimer notifier all
appear to be connected through cpu_chain.

But that code belongs to the hotplug path: cpu_down() runs, and then it
can successfully call tick_handover_do_timer(), which takes the duty over
from the dying CPU and assigns it to one that is online:

static void tick_handover_do_timer(int *cpup)
{
	if (*cpup == tick_do_timer_cpu) {
		int cpu = cpumask_first(cpu_online_mask);

		tick_do_timer_cpu = (cpu < nr_cpu_ids) ? cpu :
			TICK_DO_TIMER_NONE;
	}
}


But since cpu_down() is never called, this handover does not happen,
and tick_do_timer_cpu keeps pointing at a dead CPU (1, 2 or 3).

Core0 then waits forever wherever its code relies on jiffies being
incremented.
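The difference between the two stop paths can be modelled in userspace. The sketch reuses the handover logic quoted above, with a plain bitmask as the online mask. It is otherwise hypothetical. `sketch_ipi_stop()` models only what the report describes: the CPU goes away and no handover is performed.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 4
#define SKETCH_TICK_DO_TIMER_NONE (-1)

static unsigned int sketch_online_mask = 0xf; /* bit n set: CPU n online */
static int sketch_do_timer_cpu = 2;           /* current owner of the duty */

/* Like cpumask_first(): first set bit, or SKETCH_NR_CPUS if none. */
static int sketch_cpumask_first(unsigned int mask)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return SKETCH_NR_CPUS;
}

/* Same logic as tick_handover_do_timer() quoted above. */
static void sketch_handover(int dying_cpu)
{
	if (dying_cpu == sketch_do_timer_cpu) {
		int cpu = sketch_cpumask_first(sketch_online_mask);

		sketch_do_timer_cpu = (cpu < SKETCH_NR_CPUS) ?
				      cpu : SKETCH_TICK_DO_TIMER_NONE;
	}
}

/* Hotplug path: the CPU leaves the online mask, then hands the duty over. */
static void sketch_cpu_down(int cpu)
{
	sketch_online_mask &= ~(1u << cpu);
	sketch_handover(cpu);
}

/* Stop path from the report: the CPU stops, but nothing hands over. */
static void sketch_ipi_stop(int cpu)
{
	sketch_online_mask &= ~(1u << cpu);
}
```

Stopping CPUs 1 to 3 through sketch_ipi_stop() leaves the duty on CPU 2. The same sequence through sketch_cpu_down() moves it to CPU 0, the only CPU still online.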


What is the right way to approach this problem? At first glance, it looks
like the kernel should hand the jiffies duty over to another online core
independently of hotplug.

Regards,
Oza.


Re: [PATCH v6 4/4] clk: dt: Introduce binding for always-on clock support

2015-04-30 Thread Lee Jones
On Fri, 01 May 2015, Sascha Hauer wrote:
> On Thu, Apr 30, 2015 at 10:57:22AM +0100, Lee Jones wrote:
> > On Wed, 29 Apr 2015, Maxime Ripard wrote:
> > 
> > > On Wed, Apr 29, 2015 at 03:17:51PM +0100, Lee Jones wrote:
> > > > On Wed, 22 Apr 2015, Maxime Ripard wrote:
> > > > 
> > > > > On Wed, Apr 08, 2015 at 06:23:44PM +0100, Lee Jones wrote:
> > > > > > On Wed, 08 Apr 2015, Maxime Ripard wrote:
> > > > > > 
> > > > > > > On Wed, Apr 08, 2015 at 11:38:32AM +0100, Lee Jones wrote:
> > > > > > > > On Wed, 08 Apr 2015, Maxime Ripard wrote:
> > > > > > > > 
> > > > > > > > > On Wed, Apr 08, 2015 at 09:14:50AM +0100, Lee Jones wrote:
> > > > > > > > > > > > +
> > > > > > > > > > > > +   This property is not to be abused.  It is only to be used to
> > > > > > > > > > > > +   protect platforms from being crippled by gated clocks, not
> > > > > > > > > > > > +   as a convenience function to avoid using the framework
> > > > > > > > > > > > +   correctly inside device drivers.
> > > > > > > > > > > 
> > > > > > > > > > > Disregarding what's stated here, I'm pretty sure that this will
> > > > > > > > > > > actually happen. Where do you place the cursor?
> > > > > > > > > > 
> > > > > > > > > > That's up to Mike.
> > > > > > > > > 
> > > > > > > > > Except that Mike won't review any of the DT changes, so he won't be
> > > > > > > > > able to prevent users from using it. Let alone out-of-tree DTs using a
> > > > > > > > > mainline kernel.
> > > > > > > > 
> > > > > > > > Ideally Mike should be Cc'ed on patches using clock bindings, but if
> > > > > > > > he isn't, the DT guys are smart enough to either make the right
> > > > > > > > decisions themselves (Rob has Acked these bindings already, so will be
> > > > > > > > on the lookout for misuse, I'm sure), or ask for Mike's help.
> > > > > > > 
> > > > > > > Yeah, right, as if this strategy really worked in the past
> > > > > > > 
> > > > > > > Do we really want to look at even the DT bindings that have actually
> > > > > > > been reviewed by maintainers that got merged?
> > > > > > > 
> > > > > > > They don't have time for that, which is totally fine, but we really
> > > > > > > shouldn't bury our heads in the sand by actually thinking they will
> > > > > > > review every single DT-related patch.
> > > > > > > 
> > > > > > > Using that as an argument is just plain denial of what really happened
> > > > > > > for the past 4 years.
> > > > > > 
> > > > > > I agree that it's a problem, but this is a process problem and has
> > > > > > nothing to do with this set.  If you have a problem with the current
> > > > > > process and have a better alternative, submit your thoughts to the DT
> > > > > > list.  Rejecting all new bindings because you are frightened that they
> > > > > > will be used in a manner that they were not intended is not the way to
> > > > > > go though.
> > > > > 
> > > > > I'm not saying that this binding should not go in because of a process
> > > > > issue.
> > > > > 
> > > > > I'm saying that discarding arguments against your binding by adding
> > > > > restrictions that cannot be enforced is not reasonable.
> > > > 
> > > > I'm open to constructive suggestions/alternatives.
> > > > 
> > > > Hand-rolling this stuff in C per vendor is not one of them.
> > > 
> > > I'm sorry, but ruling out alternatives that work for everyone (and
> > > actually work better) just because you don't want to edit a C file is
> > > not really constructive either.
> > > 
> > > > > > > > > > > Should we create a new driver for our RAM controller, or do we want to
> > > > > > > > > > > use clock-always-on?
> > > > > > > > > > 
> > > > > > > > > > I would say that if all the driver did was to enable clocks, then you
> > > > > > > > > > should use this instead.  This binding was designed specifically for
> > > > > > > > > > that purpose.
> > > > > > > > > > 
> > > > > > > > > > However, if the aforementioned driver clock can be safely gated, then
> > > > > > > > > > it should not be an always-on clock.
> > > > > > > > > 
> > > > > > > > > Yeah, of course, I understand the original intent of it, but that
> > > > > > > > > argument, which might very well be true at one point in time, might
> > > > > > > > > not be true anymore two or three releases later.
> > > > > > > > 
> > > > > > > > Why?  The H/W isn't going to change in two or three releases.  The
> > > > > > > > clocks designated as 'always-on' will have to be on forever, or
> > > > > > > > synonymously, 'always'.
> > > > > > > >
> > > > > > > > > And that driv

Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry

2015-04-30 Thread Ingo Molnar

* r...@redhat.com  wrote:

> From: Rik van Riel 
> 
> On syscall entry with nohz_full on, we enable interrupts, call user_exit,
> disable interrupts, do something, re-enable interrupts, and go on our
> merry way.
> 
> Profiling shows that a large amount of the nohz_full overhead comes
> from the extraneous disabling and re-enabling of interrupts. Andy
> suggested simply not enabling interrupts until after the context
> tracking code has done its thing, which allows us to skip a whole
> interrupt disable & re-enable cycle.
> 
> This patch builds on top of these patches by Paolo:
> https://lkml.org/lkml/2015/4/28/188
> https://lkml.org/lkml/2015/4/29/139
> 
> Together with this patch I posted earlier this week, the syscall path
> on a nohz_full cpu seems to be about 10% faster.
> https://lkml.org/lkml/2015/4/24/394
> 
> My test is a simple microbenchmark that calls getpriority() in a loop
> 10 million times:
> 
>                 run time    system time
> vanilla          5.49s        2.08s
> __acct patch     5.21s        1.92s
> both patches     4.88s        1.71s
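Below is a minimal C sketch of a getpriority() loop of the kind described above. It could be used to reproduce numbers like these, or the baseline without nohz_full that Ingo asks for. It is not Rik's actual test program. The function name bench_getpriority() is hypothetical. Mapping the "run time" and "system time" columns to CLOCK_MONOTONIC and getrusage() ru_stime is also an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Elapsed wall-clock seconds between two CLOCK_MONOTONIC readings. */
static double ts_seconds(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) + (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Kernel ("system") CPU time consumed so far by this process, in seconds. */
static double sys_seconds(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return 0.0;
	return (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
}

/* Call getpriority() 'iterations' times; report run time and system time. */
static void bench_getpriority(long iterations, double *run_s, double *sys_s)
{
	struct timespec t0, t1;
	double s0 = sys_seconds();

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iterations; i++)
		(void)getpriority(PRIO_PROCESS, 0);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*run_s = ts_seconds(&t0, &t1);
	*sys_s = sys_seconds() - s0;
}
```

Run bench_getpriority(10000000, &run, &sys) once pinned to a nohz_full CPU and once pinned to a housekeeping CPU (for example with taskset). That gives both sides of the comparison.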

Just curious, what are the numbers if you don't have context tracking 
enabled, i.e. without nohz_full?

I.e. what's the baseline we are talking about?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 00/23] IB/Verbs: IB Management Helpers

2015-04-30 Thread ira.weiny
On Tue, Apr 28, 2015 at 05:10:00PM +0200, Michael Wang wrote:
> Since v6:
>   * Thanks to Ira, Devesh for the review and testing :-)
>   * Thanks for the comments from Sean, Tom, Jason, Doug, Devesh, Ira,
> Liran :-) Please remind me if I missed anything :-P
>   * Use query_protocol() and enum protocol type in 1#
>   * Use rdma_protocol_XX() in 2#
>   * Drop cma_set_legacy_transport()
>   * Reserve rdma_ib_or_iboe() and rdma_node_get_transport()
>   * Updated github repository to v7

I pulled these via Dougs for-4.2 branch and have done light testing with mlx4
and qib.

Now we need to look at converting to some bit mask.

Does anyone have a link to the emails which proposed bitmasks?  I can't find
them right now.


For the Series:

Reviewed-by: Ira Weiny 


> 
> There is plenty of lengthy code that checks the transport type of an IB device,
> or the link-layer type of its port, when what we actually want to know is
> whether a particular management feature is supported by the device/port.
> 
> Thus, instead of inferring, we should have our own mechanism for IB management
> capability/protocol/feature checking; several proposals are listed below.
> 
> This patch set introduces query_protocol(), along with a new protocol-type
> enum, to check management requirements instead of inferring them from the
> transport and link layer.
> 
> Mapping List:
>           node-type   link-layer  transport   protocol
> nes       RNIC        ETH         IWARP       IWARP
> amso1100  RNIC        ETH         IWARP       IWARP
> cxgb3     RNIC        ETH         IWARP       IWARP
> cxgb4     RNIC        ETH         IWARP       IWARP
> usnic     USNIC_UDP   ETH         USNIC_UDP   USNIC_UDP
> ocrdma    IB_CA       ETH         IB          IBOE
> mlx4      IB_CA       IB/ETH      IB          IB/IBOE
> mlx5      IB_CA       IB          IB          IB
> ehca      IB_CA       IB          IB          IB
> ipath     IB_CA       IB          IB          IB
> mthca     IB_CA       IB          IB          IB
> qib       IB_CA       IB          IB          IB
> 
> For example:
>   if (transport == IB) && (link-layer == ETH)
> will now become:
>   if (query_protocol() == IBOE)
> 
> Thus we will be able to get rid of the separate transport and link-layer
> checks, and it will become easier to add new protocols/technologies (like OPA).
> With the new management helpers, the IB management logic will also be
> clearer and easier to extend.
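The query_protocol() idea can be sketched as one function that folds the (transport, link-layer) pair from the mapping list into a single protocol value. Callers then test one thing instead of two. Everything below is hypothetical: the toy_ enums and functions are not the series' actual ib_verbs definitions. The sketch only encodes the IB/ETH to IBOE row and the iWARP and usNIC rows of the table.

```c
#include <stdbool.h>

/* Hypothetical enums mirroring the columns of the mapping list. */
enum toy_transport  { TOY_TRANSPORT_IB, TOY_TRANSPORT_IWARP, TOY_TRANSPORT_USNIC_UDP };
enum toy_link_layer { TOY_LINK_IB, TOY_LINK_ETH };
enum toy_protocol   { TOY_PROTO_IB, TOY_PROTO_IBOE, TOY_PROTO_IWARP, TOY_PROTO_USNIC_UDP };

/* Collapse the (transport, link-layer) pair into one protocol value. */
static enum toy_protocol toy_query_protocol(enum toy_transport t, enum toy_link_layer ll)
{
	switch (t) {
	case TOY_TRANSPORT_IWARP:
		return TOY_PROTO_IWARP;
	case TOY_TRANSPORT_USNIC_UDP:
		return TOY_PROTO_USNIC_UDP;
	case TOY_TRANSPORT_IB:
	default:
		/* "(transport == IB) && (link-layer == ETH)" becomes IBOE */
		return ll == TOY_LINK_ETH ? TOY_PROTO_IBOE : TOY_PROTO_IB;
	}
}

/* A capability helper then asks a single question (SMI exists only on native IB). */
static bool toy_cap_ib_smi(enum toy_protocol p)
{
	return p == TOY_PROTO_IB;
}
```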
> 
> Highlights:
> The 'mgmt-helpers' branch of 'g...@github.com:ywang-pb/infiniband-wy.git'
> contains this series, based on the latest 'infiniband/for-next'.
> 
> The patch set covers a wide range of IB code, so suggestions from those
> familiar with a particular part would be invaluable ;-)
> 
> Patches 1#~14# contain all the logic rework, and 15#~23# introduce the
> management helpers.
> 
> We would appreciate Tested-by tags from anyone who has the HW :-)
> 
> Doug suggested the bitmask mechanism:
>   https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23765.html
> which could be the plan for future reforms; we prefer that to be another
> series which focuses on semantics and performance.
> 
> This patch-set is somewhat 'bloated' now and it may be a good time for
> staging. I'd like to suggest we focus on improving the existing helpers and
> push all further reforms into the next series ;-)
> 
> 
> Proposals:
> Sean:
>   https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23339.html
> Doug:
>   https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23418.html
>   https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23765.html
> Jason:
>   https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23425.html
> 
> Michael Wang (23):
>   IB/Verbs: Implement new callback query_protocol()
>   IB/Verbs: Implement raw management helpers
>   IB/Verbs: Reform IB-core mad/agent/user_mad
>   IB/Verbs: Reform IB-core cm
>   IB/Verbs: Reform IB-core sa_query
>   IB/Verbs: Reform IB-core multicast
>   IB/Verbs: Reform IB-ulp ipoib
>   IB/Verbs: Reform IB-ulp xprtrdma
>   IB/Verbs: Reform IB-core verbs
>   IB/Verbs: Reform cm related part in IB-core cma/ucm
>   IB/Verbs: Reform route related part in IB-core cma
>   IB/Verbs: Reform mcast related part in IB-core cma
>   IB/Verbs: Reform cma_acquire_dev()
>   IB/Verbs: Reform rest part in IB-core cma
>   IB/Verbs: Use management helper cap_ib_mad()
>   IB/Verbs: Use management helper cap_ib_smi()
>   IB/Verbs: Use management helper cap_ib_cm()
>   IB/Verbs: Use management helper cap_iw_cm()
>   IB/Verbs: Use management helper cap_ib_sa()
>   IB/Verbs: Use management helper cap_ib_mcast()
>   IB/Verbs: Use manageme

Re: [GIT PULL 0/7] perf/urgent fixes

2015-04-30 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling, this is on top of my previous 
> 'perf-urgent-for-mingo'
> pull request.
> 
> - Arnaldo
> 
> The following changes since commit de28c15daf60e9625bece22f13a091fac8d05f1d:
> 
>   tools lib api: Undefine _FORTIFY_SOURCE before setting it (2015-04-23 17:08:23 -0300)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-2
> 
> for you to fetch changes up to 1d90a685eb75a56648d7dd22c704a1a6da516de9:
> 
>   perf bench numa: Fix immediate meeting of convergence condition (2015-04-27 13:57:50 -0300)
> 
> 
> perf/urgent fixes:
> 
> User visible:
> 
> . Fix a segfault in 'perf top' when kernel map is restricted (Wang Nan)
> 
> . Fix hung wakeup tasks after requeueing in 'perf bench futex' (Davidlohr Bueso)
> 
> . Fix bug in perf probe global variables handling, missing curly braces on
>   an if body (He Kuang)
> 
> . 'perf bench numa' fixes (command line help/handling, etc) (Petr Holasek)
> 
> Build fixes:
> 
> . 'perf kmem' on RHEL6/OL6 (David Ahern)
> 
> . libtraceevent on 32-bit arch (Namhyung Kim)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> David Ahern (1):
>   perf kmem: Fix compiles on RHEL6/OL6
> 
> Davidlohr Bueso (1):
>   perf bench futex: Fix hung wakeup tasks after requeueing
> 
> He Kuang (1):
>   perf probe: Fix bug with global variables handling
> 
> Namhyung Kim (1):
>   tools lib traceevent: Fix build failure on 32-bit arch
> 
> Petr Holasek (2):
>   perf bench numa: Fixes of --quiet argument
>   perf bench numa: Fix immediate meeting of convergence condition
> 
> Wang Nan (1):
>   perf top: Fix a segfault when kernel map is restricted.
> 
>  tools/lib/traceevent/event-parse.c |  2 +-
>  tools/perf/bench/futex-requeue.c   | 15 ++-
>  tools/perf/bench/numa.c            | 12 +++--
>  tools/perf/builtin-kmem.c          | 54 +++---
>  tools/perf/builtin-top.c           |  2 +-
>  tools/perf/util/probe-finder.c     |  4 ++-
>  6 files changed, 50 insertions(+), 39 deletions(-)

Pulled, thanks a lot Arnaldo!

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2

2015-04-30 Thread Alexey Kardashevskiy

On 05/01/2015 03:23 PM, David Gibson wrote:

On Fri, May 01, 2015 at 02:35:23PM +1000, Alexey Kardashevskiy wrote:

On 04/30/2015 04:55 PM, David Gibson wrote:

On Sat, Apr 25, 2015 at 10:14:53PM +1000, Alexey Kardashevskiy wrote:

The existing implementation accounts the whole DMA window in
the locked_vm counter. This is going to be worse with multiple
containers and huge DMA windows. Also, real-time accounting would require
additional tracking of accounted pages due to the page size difference -
IOMMU uses 4K pages and system uses 4K or 64K pages.
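The "additional tracking" needed for per-map accounting can be illustrated with a toy model. When 64K system pages back 4K TCEs, 16 consecutive TCEs share one system page. That page must be charged only once. Below is a minimal C sketch, assuming 64K system pages and a small fixed address space. The names toy_account_tce(), toy_accounted[] and toy_locked_sys_pages are hypothetical and do not come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_IOMMU_PAGE_SIZE 4096ULL  /* 4K TCE pages, as stated in the text */
#define TOY_SYS_PAGE_SHIFT  16       /* assumption: 64K system pages */
#define TOY_NR_SYS_PAGES    64       /* toy address space: 64 x 64K = 4MB */

/* One flag per system page: already charged to the locked_vm stand-in? */
static unsigned char toy_accounted[TOY_NR_SYS_PAGES];
static size_t toy_locked_sys_pages;

/* Charge the system page backing one 4K TCE at user address 'uaddr',
 * but only the first time any TCE inside that system page is mapped. */
static void toy_account_tce(uint64_t uaddr)
{
	uint64_t pfn = uaddr >> TOY_SYS_PAGE_SHIFT;

	if (pfn < TOY_NR_SYS_PAGES && !toy_accounted[pfn]) {
		toy_accounted[pfn] = 1;
		toy_locked_sys_pages++;
	}
}
```

Handling unmap correctly would also need a per-page reference count. That is exactly the kind of bookkeeping the commit message wants to avoid.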

Another issue is that actual pages pinning/unpinning happens on every
DMA map/unmap request. This does not affect performance much at the moment,
as we spend far more time switching context between
guest/userspace/host, but it will start to matter when we add in-kernel
DMA map/unmap acceleration.

This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
2 new ioctls to register/unregister DMA memory -
VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
which receive user space address and size of a memory region which
needs to be pinned/unpinned and counted in locked_vm.
New IOMMU splits physical pages pinning and TCE table update into 2 different
operations. It requires 1) guest pages to be registered first and 2) subsequent
map/unmap requests to work only with pre-registered memory.
For the default single window case this means that the entire guest
(instead of 2GB) needs to be pinned before using VFIO.
When a huge DMA window is added, no additional pinning will be
required, otherwise it would be guest RAM + 2GB.
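Here is a minimal C sketch of the two-step v2 flow described above. Memory is registered, and charged against a locked-memory limit, once. Later map requests only check that the buffer lies inside a registered region, without pinning anything. All names (struct toy_v2_container, toy_register_memory(), toy_map_allowed()) are hypothetical, and the page counter only stands in for locked_vm. This is not the code in drivers/vfio/vfio_iommu_spapr_tce.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REGIONS   8
#define TOY_SYS_PAGE_SIZE 65536ULL   /* assumption: 64K system pages */

struct toy_region {
	uint64_t vaddr;
	uint64_t size;
};

struct toy_v2_container {
	struct toy_region regions[TOY_MAX_REGIONS];
	size_t nr_regions;
	uint64_t locked_pages;   /* stand-in for the locked_vm counter */
	uint64_t locked_limit;   /* stand-in for the memlock limit, in pages */
};

/* Step 1: register (pin and account) a whole region up front.
 * Returns 0 on success, -1 for a bad request, -2 if over the limit. */
static int toy_register_memory(struct toy_v2_container *c, uint64_t vaddr, uint64_t size)
{
	uint64_t pages;

	if (size == 0 || vaddr % TOY_SYS_PAGE_SIZE || size % TOY_SYS_PAGE_SIZE)
		return -1;
	if (c->nr_regions == TOY_MAX_REGIONS)
		return -1;
	pages = size / TOY_SYS_PAGE_SIZE;
	if (c->locked_pages + pages > c->locked_limit)
		return -2;
	c->regions[c->nr_regions].vaddr = vaddr;
	c->regions[c->nr_regions].size = size;
	c->nr_regions++;
	c->locked_pages += pages;
	return 0;
}

/* Step 2: a map request pins nothing; it is allowed only if the whole
 * buffer falls inside one pre-registered region (overflow-safe check). */
static bool toy_map_allowed(const struct toy_v2_container *c, uint64_t vaddr, uint64_t size)
{
	for (size_t i = 0; i < c->nr_regions; i++) {
		const struct toy_region *r = &c->regions[i];

		if (vaddr >= r->vaddr && size <= r->size &&
		    vaddr - r->vaddr <= r->size - size)
			return true;
	}
	return false;
}
```

In the real interface, step 1 corresponds to the VFIO_IOMMU_SPAPR_REGISTER_MEMORY ioctl described in the commit message.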

The new memory registration ioctls are not supported by
VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
will require memory to be preregistered in order to work.

The accounting is done per the user process.

This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
can do with v1 or v2 IOMMUs.

Signed-off-by: Alexey Kardashevskiy 
[aw: for the vfio related changes]
Acked-by: Alex Williamson 
---
Changes:
v9:
* s/tce_get_hva_cached/tce_iommu_use_page_v2/

v7:
* now memory is registered per mm (i.e. process)
* moved memory registration code to powerpc/mmu
* merged "vfio: powerpc/spapr: Define v2 IOMMU" into this
* limited new ioctls to v2 IOMMU
* updated doc
* unsupported ioctls return -ENOTTY instead of -EPERM

v6:
* tce_get_hva_cached() returns hva via a pointer

v4:
* updated docs
* s/kzmalloc/vzalloc/
* in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
replaced offset with index
* renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
and removed duplicating vfio_iommu_spapr_register_memory
---
  Documentation/vfio.txt  |  23 
  drivers/vfio/vfio_iommu_spapr_tce.c | 230 +++-
  include/uapi/linux/vfio.h   |  27 +
  3 files changed, 274 insertions(+), 6 deletions(-)

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 96978ec..94328c8 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed:



+5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
+VFIO_IOMMU_DISABLE and implements 2 new ioctls:
+VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
+(which are unsupported in v1 IOMMU).


A summary of the semantic differences between v1 and v2 would be nice.
At this point it's not really clear to me if there's a case for
creating v2, or if this could just be done by adding (optional)
functionality to v1.


v1: memory preregistration is not supported; explicit enable/disable ioctls
are required

v2: memory preregistration is required; explicit enable/disable are
prohibited (as they are not needed).

Mixing these in one IOMMU type caused a lot of problems, such as: should I
increment locked_vm by the 32bit window size on enable() or not; what do I
do about page pinning on map/unmap (check whether it is from registered memory
and not pin?).

Having 2 IOMMU models makes everything a lot simpler.


Ok.  Would it simplify it further if you made v2 only usable on IODA2
hardware?



Very little. V2 addresses the memory pinning issue, which is handled the same
way on IODA2 and older hardware, including KVM acceleration. Whether DDW is
enabled or not is handled just fine via extra properties in the GET_INFO
ioctl().


IODA2 and others differ in how they handle multiple groups per container,
but this does not require changes to the userspace API.


And remember, the only machine I can use 100% of the time is POWER7/P5IOC2, so
it is really useful if at least some bits of the patchset can be tested
there; if it were a bit less different from IODA2, I would even have
implemented DDW there too :)




+PPC64 paravirtualized guests generate a lot of map/unmap requests,
+and the handling of those 

Re: [PATCH 2/2] brcmfmac: keep WiFi chip's power during system suspension

2015-04-30 Thread Fu, Zhonghui


On 2015/4/27 16:53, Arend van Spriel wrote:
> On 04/27/15 07:06, Fu, Zhonghui wrote:
>> Need to keep the power supply for WiFi chip during system suspension.
>> Otherwise, the context of WiFi chip will be lost.
>
> I already submitted a patch doing exactly the same thing [1]

OK, please ignore this patch.

What's the target kernel version of your patch?

Thanks,
Zhonghui
>
> Regards,
> Arend
>
> [1] https://patchwork.kernel.org/patch/6217391/
>
>> Signed-off-by: Zhonghui Fu
>> ---
>>   drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c |   10 ++
>>   1 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c
>> index fdf8feb..03d3671 100644
>> --- a/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c
>> +++ b/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c
>> @@ -1251,15 +1251,17 @@ static int brcmf_ops_sdio_suspend(struct device *dev)
>>   brcmf_sdiod_freezer_on(sdiodev);
>>   brcmf_sdio_wd_timer(sdiodev->bus, 0);
>>
>> +sdio_flags = MMC_PM_KEEP_POWER;
>>   if (sdiodev->wowl_enabled) {
>> -sdio_flags = MMC_PM_KEEP_POWER;
>>   if (sdiodev->pdata->oob_irq_supported)
>>   enable_irq_wake(sdiodev->pdata->oob_irq_nr);
>>   else
>> -sdio_flags = MMC_PM_WAKE_SDIO_IRQ;
>> -if (sdio_set_host_pm_flags(sdiodev->func[1], sdio_flags))
>> -brcmf_err("Failed to set pm_flags %x\n", sdio_flags);
>> +sdio_flags |= MMC_PM_WAKE_SDIO_IRQ;
>>   }
>> +
>> +if (sdio_set_host_pm_flags(sdiodev->func[1], sdio_flags))
>> +brcmf_err("Failed to set pm_flags %x\n", sdio_flags);
>> +
>>   return 0;
>>   }
>>
>> -- 1.7.1
>>
>
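The flag logic after this patch fits in a small pure function. Before the patch, the non-WoWL path set no flags at all. The WoWL path without an OOB IRQ passed only MMC_PM_WAKE_SDIO_IRQ, overwriting MMC_PM_KEEP_POWER. Afterwards, power is always kept, and the SDIO IRQ wake flag is ORed in. Below is a minimal C sketch of that rule. TOY_PM_KEEP_POWER and TOY_PM_WAKE_SDIO_IRQ are hypothetical stand-ins for the kernel's MMC flags, and toy_suspend_pm_flags() is not a real driver function.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for MMC_PM_KEEP_POWER / MMC_PM_WAKE_SDIO_IRQ. */
#define TOY_PM_KEEP_POWER     (1u << 0)
#define TOY_PM_WAKE_SDIO_IRQ  (1u << 1)

/* Flags the patched suspend path would hand to sdio_set_host_pm_flags():
 * power is kept unconditionally; the in-band SDIO IRQ becomes a wake
 * source only for WoWL when no out-of-band IRQ is available. */
static unsigned int toy_suspend_pm_flags(bool wowl_enabled, bool oob_irq_supported)
{
	unsigned int flags = TOY_PM_KEEP_POWER;

	if (wowl_enabled && !oob_irq_supported)
		flags |= TOY_PM_WAKE_SDIO_IRQ;
	return flags;
}
```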

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'

2015-04-30 Thread NeilBrown

Hi Peter,
 I recently had a report of a regression in 3.12.  I bisected it down to your
 patch
  f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")

 Sometimes a poll on a master-pty will report there is nothing to read after
 the slave has written something.
 A test program is below.
 On a kernel prior to your commit, this program never reports

Total bytes read is 0. PollRC=0

 On a kernel subsequent to your commit, that message is produced quite often.

 This was found while working on a debugger.

 Following the test program is my proposed patch which allows the program to
 run as it used to.  It re-introduces the call to tty_flush_to_ldisc(), but
 only if it appears that there is nothing to read.

 Do you think this is a suitable fix?  Do you even agree that it is a real
 bug?

Thanks,
NeilBrown



--
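#define _XOPEN_SOURCE 600
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>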


#define USEPTY
#define COUNT_MAX   (500)
#define MY_BREAKPOINT   { asm("int $3"); }
#define PTRACE_IGNORED  (void *)0

/*
** Open a pseudo-tty pair.
**
** Return the master fd, set spty to the slave fd.
*/
int
my_openpt(int *spty)
{
	int mfd = posix_openpt(O_RDWR | O_NOCTTY);
	char *slavedev;
	int sfd=-1;
	if(mfd == -1) return -1;
	if(grantpt(mfd) == -1) return -1;
	if(unlockpt(mfd) == -1) return -1;

	slavedev = (char *)ptsname(mfd);

	if((sfd = open(slavedev, O_RDWR | O_NOCTTY)) == -1)
	{
		close(mfd);
		return -1;
	}
	if(spty != NULL)
	{
		*spty = sfd;
	}
	return mfd;
}


/*
** Read from the provided file descriptor if poll says there's
** anything there.
*/
int
DoPollRead(int mpty)
{
	struct pollfd fds;
	int pollrc;
	ssize_t bread=0, totread=0;
	char readbuf[101];

	/*
	** Set up the poll.
	*/
	fds.fd = mpty;
	fds.events = POLLIN | POLLRDNORM | POLLRDBAND | POLLPRI;

	/*
	** poll for any output.
	*/

	while((pollrc = poll(&fds, 1, 0)) == 1)
	{
		if(fds.revents & POLLIN)
		{
			bread = read( mpty, readbuf, 100 );
			totread += bread;
			if(bread > 0)
			{
				//printf("Read %d bytes.\n", (int)bread);
				readbuf[bread] = '\0';
				//printf("\t%s", readbuf);
			} else
			{
				//puts("Nothing read.\n");
			}
		} else if (fds.revents & (POLLHUP | POLLERR | POLLNVAL)) {
			printf ("hangup/error/invalid on poll\n");
			return totread;
		} else { printf("No POLLIN, revents=%d\n", fds.revents); };
	}

	/*
	** This sometimes happens - we're expecting input on the pty,
	** but nothing is there.
	*/
	if(totread == 0)
		printf("Total bytes read is 0. PollRC=%d\n", pollrc);

	return totread;
}

static
void writeall (int fd, const char *buf, size_t count)
{
  while (count)
    {
      ssize_t r = write (fd, buf, count);
      if (r == 0)
        break;
      if (r < 0 && errno == EINTR)
        continue;
      if (r < 0)
        exit (2);
      count -= r;
      buf += r;
    }
}

int
thechild(void)
{
unsigned int i;

writeall (1, "debuggee starts\n", strlen ("debuggee starts\n"));

for(i=0 ; i
Subject: [PATCH] n_tty: Sometimes wait for buffer work in read() loop

Since commit
  f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")

it has been possible for poll to report that there is no data to read
on a master-pty even if a write to the slave has actually completed.

That patch removes a 'wait' when the wait isn't really necessary.
Unfortunately it also removed it in the case when it *is* necessary.
If the simple tests show that there is nothing to read, we really need
to flush the work queue in case there is something ready but which
hasn't arrived yet.

This patch restores the wait, but only if simple tests suggest there
is nothing ready.
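The restored logic amounts to "cheap check first, flush only if that check finds nothing". Below is a toy C model of that two-pass loop. struct toy_tty, its byte counters and toy_flush_to_ldisc() are hypothetical stand-ins for n_tty_data and tty_flush_to_ldisc(); this is not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: bytes already visible to the line discipline versus
 * bytes still queued, waiting for the buffer work to push them. */
struct toy_tty {
	size_t ldisc_bytes;
	size_t pending_bytes;
	int flushes;            /* how often the slow path ran */
};

/* Stand-in for tty_flush_to_ldisc(): push queued input into the ldisc. */
static void toy_flush_to_ldisc(struct toy_tty *t)
{
	t->ldisc_bytes += t->pending_bytes;
	t->pending_bytes = 0;
	t->flushes++;
}

/* Same shape as the patched input_available_p(): the second pass, with
 * a flush in front of it, only runs when the first check found nothing. */
static bool toy_input_available(struct toy_tty *t, size_t amt)
{
	bool ret = false;

	for (int i = 0; !ret && i < 2; i++) {
		if (i)
			toy_flush_to_ldisc(t);
		ret = t->ldisc_bytes >= amt;
	}
	return ret;
}
```

The flush counter shows the design point. When data is already visible, the common path never waits for the buffer work.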

Reported-by: Nic Percival 
Reported-by: Michael Matz 
Fixes: f95499c3030f ("n_tty: Don't wait for buffer work in read() loop")
Signed-off-by: NeilBrown 

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index cf6e0f2e1331..9884091819b6 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1942,11 +1942,18 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
 {
struct n_tty_data *ldata = tty->disc_data;
int amt = poll && !TIME_CHAR(tty) && MIN_CHAR(tty) ? MIN_CHAR(tty) : 1;
-
-   if (ldata->icanon && !L_EXTPROC(tty))
-   return ldata->canon_head != ldata->read_tail;
-   else
-   return ldata->commit_head - ldata->read_tail >= amt;
+   int i;
+   int ret = 0;
+
+   for (i = 0; !ret && i < 2; i++) {
+   if (i)
+   tty_flush_to_ldisc(tty);
+   if (ldata->icanon && !L_EXTPROC(tty))
+   ret = (ldata->canon_head 

[RFT][PATCH 2/2] regulator: max77843: Convert to use regulator_is_enabled_regmap

2015-04-30 Thread Axel Lin
Use regulator_is_enabled_regmap() to replace max77843_reg_is_enabled().

Signed-off-by: Axel Lin 
---
 drivers/regulator/max77843.c | 18 ++
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/regulator/max77843.c b/drivers/regulator/max77843.c
index 3ae2a9b..f4fd0d3 100644
--- a/drivers/regulator/max77843.c
+++ b/drivers/regulator/max77843.c
@@ -33,21 +33,6 @@ static const unsigned int max77843_safeout_voltage_table[] = {
330,
 };
 
-static int max77843_reg_is_enabled(struct regulator_dev *rdev)
-{
-   struct regmap *regmap = rdev->regmap;
-   int ret;
-   unsigned int reg;
-
-   ret = regmap_read(regmap, rdev->desc->enable_reg, &reg);
-   if (ret) {
-   dev_err(&rdev->dev, "Fialed to read charger register\n");
-   return ret;
-   }
-
-   return (reg & rdev->desc->enable_mask) == rdev->desc->enable_mask;
-}
-
 static int max77843_reg_get_current_limit(struct regulator_dev *rdev)
 {
struct regmap *regmap = rdev->regmap;
@@ -96,7 +81,7 @@ static int max77843_reg_set_current_limit(struct regulator_dev *rdev,
 }
 
 static struct regulator_ops max77843_charger_ops = {
-   .is_enabled = max77843_reg_is_enabled,
+   .is_enabled = regulator_is_enabled_regmap,
.enable = regulator_enable_regmap,
.disable= regulator_disable_regmap,
.get_current_limit  = max77843_reg_get_current_limit,
@@ -141,6 +126,7 @@ static const struct regulator_desc max77843_supported_regulators[] = {
.owner  = THIS_MODULE,
.enable_reg = MAX77843_CHG_REG_CHG_CNFG_00,
.enable_mask= MAX77843_CHG_MASK | MAX77843_CHG_BUCK_MASK,
+   .enable_val = MAX77843_CHG_MASK | MAX77843_CHG_BUCK_MASK,
},
 };
 
-- 
2.1.0
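Patch 2/2 has to add .enable_val because, as we understand the regmap helper in kernels of this era, regulator_is_enabled_regmap() treats any set bit under enable_mask as "enabled" when enable_val is zero. Only with a non-zero enable_val does it require an exact match, which reproduces the removed driver function's "all mask bits set" test. Below is a toy C model comparing the two. The bit values 0x01 and 0x04 are hypothetical, chosen only so that CHG | BUCK == 0x05 as stated in patch 1/2, and the toy_ names are illustrative.

```c
#include <stdbool.h>

/* Hypothetical bit values, chosen so that CHG | BUCK == 0x05. */
#define TOY_CHG_MASK      0x01u
#define TOY_CHG_BUCK_MASK 0x04u

struct toy_desc {
	unsigned int enable_mask;
	unsigned int enable_val;
};

/* Mirrors the non-inverted branch of regulator_is_enabled_regmap(). */
static bool toy_is_enabled_regmap(const struct toy_desc *d, unsigned int regval)
{
	regval &= d->enable_mask;
	if (d->enable_val)
		return regval == d->enable_val;
	return regval != 0;
}

/* The removed max77843_reg_is_enabled(): every mask bit must be set. */
static bool toy_old_is_enabled(const struct toy_desc *d, unsigned int regval)
{
	return (regval & d->enable_mask) == d->enable_mask;
}
```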



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFT][PATCH 1/2] regulator: max77843: Fix enable_mask for max77843 charger

2015-04-30 Thread Axel Lin
MAX77843_CHG_ENABLE is 0x05, so the enable_mask should be
MAX77843_CHG_MASK | MAX77843_CHG_BUCK_MASK.

Signed-off-by: Axel Lin 
---
Hi,
I don't have this h/w, so please help to review and test this patch series.
Thanks,
Axel
 drivers/regulator/max77843.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/max77843.c b/drivers/regulator/max77843.c
index e4e7687..3ae2a9b 100644
--- a/drivers/regulator/max77843.c
+++ b/drivers/regulator/max77843.c
@@ -140,7 +140,7 @@ static const struct regulator_desc max77843_supported_regulators[] = {
.type   = REGULATOR_CURRENT,
.owner  = THIS_MODULE,
.enable_reg = MAX77843_CHG_REG_CHG_CNFG_00,
-   .enable_mask= MAX77843_CHG_MASK,
+   .enable_mask= MAX77843_CHG_MASK | MAX77843_CHG_BUCK_MASK,
},
 };
 
-- 
2.1.0



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

2015-04-30 Thread Alexey Kardashevskiy

On 05/01/2015 02:33 PM, David Gibson wrote:

On Thu, Apr 30, 2015 at 07:33:09PM +1000, Alexey Kardashevskiy wrote:

On 04/30/2015 05:22 PM, David Gibson wrote:

On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:

At the moment only one group per container is supported.
POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
IOMMU group, so we can relax this limitation and support multiple groups
per container.


It's not obvious why allowing multiple TCE tables per PE has any
bearing on allowing multiple groups per container.



This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
outcomes:
1. reusing the same IOMMU table for multiple groups - patch 31;
2. allowing dynamic create/remove of IOMMU tables - patch 32.

I can remove this one from the patchset and post it separately later, but
since 1..30 aim to support both 1) and 2), I think I had better keep them all
together (it might explain some of the changes I make in 1..30).


The combined patchset is fine.  My comment is because your commit
message says that multiple groups are possible *because* 2 TCE tables
per group are allowed, and it's not at all clear why one follows from
the other.



Ah. That's wrong indeed, I'll fix it.



This adds TCE table descriptors to a container and uses iommu_table_group_ops
to create/set DMA windows on IOMMU groups so the same TCE tables will be
shared between several IOMMU groups.

Signed-off-by: Alexey Kardashevskiy 
[aw: for the vfio related changes]
Acked-by: Alex Williamson 
---
Changes:
v7:
* updated doc
---
  Documentation/vfio.txt  |   8 +-
  drivers/vfio/vfio_iommu_spapr_tce.c | 268 ++--
  2 files changed, 199 insertions(+), 77 deletions(-)

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 94328c8..7dcf2b5 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -289,10 +289,12 @@ PPC64 sPAPR implementation note

  This implementation has some specifics:

-1) Only one IOMMU group per container is supported as an IOMMU group
-represents the minimal entity which isolation can be guaranteed for and
-groups are allocated statically, one per a Partitionable Endpoint (PE)
+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
+container is supported as an IOMMU table is allocated at the boot time,
+one table per a IOMMU group which is a Partitionable Endpoint (PE)
  (PE is often a PCI domain but not always).


I thought the more fundamental problem was that different PEs tended
to use disjoint bus address ranges, so even by duplicating put_tce
across PEs you couldn't have a common address space.



Sorry, I am not following you here.

By duplicating put_tce, I can have multiple IOMMU groups on the same virtual
PHB in QEMU; "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple groups
per container" does this, and the address ranges will be the same.


Oh, ok.  For some reason I thought that (at least on the older
machines) the different PEs used different and not easily changeable
DMA windows in bus addresses space.



They do use different tables (which VFIO does not get to remove/create and 
uses these old helpers - iommu_take/release_ownership), correct. But all 
these windows are mapped at zero on a PE's PCI bus and nothing prevents me 
from updating all these tables with the same TCE values when handling 
H_PUT_TCE. Yes, it is slow, but it works (a bit more detail below).





What I cannot do on p5ioc2 is programming the same table to multiple
physical PHBs (or I could but it is very different than IODA2 and pretty
ugly and might not always be possible because I would have to allocate these
pages from some common pool and face problems like fragmentation).


So allowing multiple groups per container should be possible (at the
kernel rather than qemu level) by writing the same value to multiple
TCE tables.  I guess it's not worth doing for just the almost-obsolete
IOMMUs though.



It is done at QEMU level though. As it works now, QEMU opens a group, walks 
through all existing containers and tries attaching a new group there. If 
it succeeded (x86 always; POWER8 after this patch), a TCE table is shared. 
If it failed, QEMU creates another container, attaches it to the same 
VFIO/PHB address space and attaches a group there.


Then the only thing left is repeating ioctl() in vfio_container_ioctl() for 
every container in the VFIO address space; this is what that QEMU patch 
does (the first version of that patch called ioctl() only for the first 
container in the address space).
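Below is a toy C model of the QEMU-side scheme described above. QEMU tries each existing container, falls back to a new one, then repeats every TCE update for every container in the address space. This is a hypothetical sketch, not QEMU's vfio code. struct toy_container, toy_attach_group() and toy_put_tce() are illustrative names, and "shareable" stands for whether the kernel accepts a second group in a container (x86 always; POWER8 after this patch).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CONTAINERS 4
#define TOY_TCES_PER_TABLE 16

struct toy_container {
	bool shareable;
	int nr_groups;
	uint64_t tce[TOY_TCES_PER_TABLE];
};

struct toy_address_space {
	struct toy_container c[TOY_MAX_CONTAINERS];
	int nr;
};

/* Try to attach the new group to an existing container; otherwise create
 * another container in the same address space. Returns its index or -1. */
static int toy_attach_group(struct toy_address_space *as, bool shareable)
{
	for (int i = 0; i < as->nr; i++) {
		if (shareable && as->c[i].shareable) {
			as->c[i].nr_groups++;
			return i;
		}
	}
	if (as->nr == TOY_MAX_CONTAINERS)
		return -1;
	as->c[as->nr] = (struct toy_container){ .shareable = shareable, .nr_groups = 1 };
	return as->nr++;
}

/* H_PUT_TCE handling: repeat the same update for every container. */
static void toy_put_tce(struct toy_address_space *as, size_t idx, uint64_t tce)
{
	if (idx >= TOY_TCES_PER_TABLE)
		return;
	for (int i = 0; i < as->nr; i++)
		as->c[i].tce[idx] = tce;
}
```

On older hardware each group ends up in its own container and every update is written N times. On hardware that can share a table, the loop in toy_put_tce() runs once.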


From the kernel perspective there are 2 isolated containers; I'd like to
keep it this way.


btw thanks for the detailed review :)

--
Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86/mce: fix mce_restart() race with CPU hotplug operation

2015-04-30 Thread Ethan Zhao
On Fri, May 1, 2015 at 12:29 AM, Borislav Petkov  wrote:
> On Thu, Apr 30, 2015 at 12:04:53AM +0900, Ethan Zhao wrote:
>> while testing CPU hotplug and MCE with following two scripts,
>>
>> script 1:
>>
>>  for i in {1..30}; do
>>      while :; do
>>          ((a=$RANDOM%160))
>>          echo 0 >> /sys/devices/system/cpu/cpu${i}/online
>>          echo 1 >> /sys/devices/system/cpu/cpu${i}/online
>>      done &
>>  done
>>
>> script 2:
>>
>>  while :; do
>>      for i in $(ls /sys/devices/system/machinecheck/machinecheck*/check_interval); do
>>          echo 1 >> $i
>>      done
>>  done
>
> For the record, it is a public secret that CPU hotplug is broken. IOW,
> you're wasting your time with those senseless pounder tests but ok.

  :<  Someone else is stressing CPU hotplug, and it does seem fragile.
My job is to keep the system running rather than letting it panic.

 Thanks,
 Ethan

>
> ...
>
>> ---
>>  arch/x86/kernel/cpu/mcheck/mce.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>> index 3c036cb..fcc2794 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>> @@ -1338,8 +1338,10 @@ static void mce_timer_delete_all(void)
>>  {
>>   int cpu;
>>
>> + get_online_cpus();
>>   for_each_online_cpu(cpu)
>>   del_timer_sync(&per_cpu(mce_timer, cpu));
>> + put_online_cpus();
>>  }
>>
>>  static void mce_do_trigger(struct work_struct *work)
>> @@ -2085,7 +2087,9 @@ static void mce_cpu_restart(void *data)
>>  static void mce_restart(void)
>>  {
>>   mce_timer_delete_all();
>> + get_online_cpus();
>>   on_each_cpu(mce_cpu_restart, NULL, 1);
>> + put_online_cpus();
>
> With your patch applied I get on 4.1-rc1+:
>
> ---
> [   41.364909] kvm: disabling virtualization on CPU1
> [   41.371083] smpboot: CPU 1 is now offline
> [   41.381190] x86: Booting SMP configuration:
> [   41.385405] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   41.402901] kvm: enabling virtualization on CPU1
> [   41.440944] kvm: disabling virtualization on CPU1
> [   41.447010] smpboot: CPU 1 is now offline
> [   41.486082] kvm: disabling virtualization on CPU6
> [   41.491827] smpboot: CPU 6 is now offline
> [   41.497521] smpboot: Booting Node 0 Processor 6 APIC 0x5
> [   41.514983] kvm: enabling virtualization on CPU6
> [   41.561643] kvm: disabling virtualization on CPU6
> [   41.566848] smpboot: CPU 6 is now offline
> [   41.572606] smpboot: Booting Node 0 Processor 6 APIC 0x5

Re: [PATCH] Timer: fix a race condition between init_timers_cpu() and get_next_timer_interrupt()

2015-04-30 Thread Ethan Zhao
This patch works with 4.0.1, but doesn't work with 4.1-rc1+.

Thanks,
Ethan

On Wed, Apr 29, 2015 at 10:58 PM, Ethan Zhao  wrote:
> While testing CPU hotplug and MCE with the following two scripts,
>
> script 1:
>
>  for i in {1..30}; do
>      while :; do
>          ((a=$RANDOM%160))
>          echo 0 >> /sys/devices/system/cpu/cpu${i}/online
>          echo 1 >> /sys/devices/system/cpu/cpu${i}/online
>      done &
>  done
>
> script 2:
>
>  while :; do
>      for i in $(ls /sys/devices/system/machinecheck/machinecheck*/check_interval); do
>          echo 1 >> $i
>      done
>  done
>
> We got the following panic call trace:
>
> sh> bt
> PID: 0  TASK: 881028e28080  CPU: 14  COMMAND: "swapper/14"
>  #0 [881028e2ba90] machine_kexec at 810402aa
>  #1 [881028e2baf8] crash_kexec at 810c4ea4
>  #2 [881028e2bbc0] oops_end at 81575c50
>  #3 [881028e2bbe8] no_context at 8156a8f9
>  #4 [881028e2bc30] __bad_area_nosemaphore at 8156a979
>  #5 [881028e2bc78] bad_area_nosemaphore at 8156aae3
>  #6 [881028e2bc88] __do_page_fault at 8157852d
>  #7 [881028e2bd80] do_page_fault at 8157896e
>  #8 [881028e2bd90] page_fault at 815750d8
> [exception RIP: get_next_timer_interrupt+344]
> RIP: 8106c228  RSP: 881028e2be48  RFLAGS: 00010013
> RAX:   RBX: 0001006d7a29  RCX: 0001006d7dee
> RDX: 0001  RSI: 88602978d3f8  RDI: 88602978d028
> RBP: 881028e2be90   R8: 003d   R9: 003b
> R10: 01006d7b  R11: 881028e2be50  R12: 0001006d7dee
> R13: 0001406d7a28  R14: 88602978c000  R15: 881028e2be68
> ORIG_RAX:   CS: 0010  SS: 0018
>  #9 [881028e2be40] get_next_timer_interrupt at 8106c120
>
> This panic (NULL pointer dereference) was caused by a race condition between
> init_timers_cpu() and get_next_timer_interrupt(): init_timers_cpu() does its
> initialization without holding any lock.
>
> The two threads that cause the race condition are shown below:
>
> Thread A:
>   store_online()
>     cpu_up()
>       __cpu_notify(CPU_UP_PREPARE...)
>         timer_cpu_notify()
>           CPU_UP_PREPARE: init_timers_cpu()
>           {
>             ...
>             INIT_LIST_HEAD(base->tv5.vec + j);
>             ...
>           }
>
> Thread B:
>   tick_nohz_idle_enter()
>     __tick_nohz_idle_enter()
>       get_next_timer_interrupt()
>         __next_timer_interrupt()
>         {
>           ...
>           list_for_each_entry(nte, varp->vec + slot, entry) {
>             if (tbase_get_deferrable(nte->base))
>               ...
>           }
>
> This bug affects stable branches 4.0, 3.8 and 3.19; I didn't check other
> branches.
> The patch has been tested and verified on stable 4.0.
>
> Reported-by: Tim Uglow 
> Signed-off-by: Ethan Zhao 
> ---
>  kernel/time/timer.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 2d3f5c5..cc1cf35 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1573,6 +1573,9 @@ static int init_timers_cpu(int cpu)
> base = per_cpu(tvec_bases, cpu);
> }
>
> +   if (unlikely(!base))
> +   return -EINVAL;
> +   spin_lock_irq(&base->lock);
>
> for (j = 0; j < TVN_SIZE; j++) {
> INIT_LIST_HEAD(base->tv5.vec + j);
> @@ -1587,6 +1590,7 @@ static int init_timers_cpu(int cpu)
> base->next_timer = base->timer_jiffies;
> base->active_timers = 0;
> base->all_timers = 0;
> +   spin_unlock_irq(&base->lock);
> return 0;
>  }
>
> --
> 1.8.3.1
>
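A minimal userspace sketch of the idea behind this patch: the per-CPU timer base is (re)initialized under the same lock that the bucket walker takes, so a walker never observes a half-initialized (zeroed, NULL) list head. All names here (toy_tvec_base, toy_lock(), TOY_TVN_SIZE) are hypothetical stand-ins built on C11 atomic_flag; this is not the kernel's timer code and it ignores interrupt disabling.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins for list_head and spinlock_t. */
struct toy_list_head {
	struct toy_list_head *next, *prev;
};

static void toy_init_list_head(struct toy_list_head *h)
{
	h->next = h;
	h->prev = h;
}

typedef struct {
	atomic_flag f;
} toy_spinlock;

static void toy_lock(toy_spinlock *l)
{
	/* Spin until the previous holder clears the flag. */
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		;
}

static void toy_unlock(toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

#define TOY_TVN_SIZE 64	/* assumed bucket count, illustration only */

struct toy_tvec_base {
	toy_spinlock lock;
	struct toy_list_head vec[TOY_TVN_SIZE];
};

/* Writer: initialize every bucket while holding the lock, so a
 * concurrent walker never sees a zeroed (NULL) list head. */
static void toy_init_base(struct toy_tvec_base *base)
{
	toy_lock(&base->lock);
	for (int j = 0; j < TOY_TVN_SIZE; j++)
		toy_init_list_head(&base->vec[j]);
	toy_unlock(&base->lock);
}

/* Reader: walk one bucket under the same lock and count its entries. */
static int toy_count_bucket(struct toy_tvec_base *base, int slot)
{
	struct toy_list_head *head;
	int n = 0;

	assert(slot >= 0 && slot < TOY_TVN_SIZE);
	toy_lock(&base->lock);
	head = &base->vec[slot];
	for (struct toy_list_head *p = head->next; p != head; p = p->next)
		n++;
	toy_unlock(&base->lock);
	return n;
}
```

Declare the base with `.lock = { ATOMIC_FLAG_INIT }`, call toy_init_base() once, and walkers can then call toy_count_bucket() safely from other threads.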
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86/mce: fix mce_restart() race with CPU hotplug operation

2015-04-30 Thread Ethan Zhao
On Fri, May 1, 2015 at 12:29 AM, Borislav Petkov  wrote:
> On Thu, Apr 30, 2015 at 12:04:53AM +0900, Ethan Zhao wrote:
>> While testing CPU hotplug and MCE with the following two scripts,
>>
>> script 1:
>>
>>  for i in {1..30}; do
>>      while :; do
>>          ((a=$RANDOM%160))
>>          echo 0 >> /sys/devices/system/cpu/cpu${i}/online
>>          echo 1 >> /sys/devices/system/cpu/cpu${i}/online
>>      done &
>>  done
>>
>> script 2:
>>
>>  while :; do
>>      for i in $(ls /sys/devices/system/machinecheck/machinecheck*/check_interval); do
>>          echo 1 >> $i
>>      done
>>  done
>
> For the record, it is an open secret that CPU hotplug is broken. IOW,
> you're wasting your time with those senseless pounder tests but ok.
>
> ...
>
>> ---
>>  arch/x86/kernel/cpu/mcheck/mce.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>> index 3c036cb..fcc2794 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>> @@ -1338,8 +1338,10 @@ static void mce_timer_delete_all(void)
>>  {
>>   int cpu;
>>
>> + get_online_cpus();
>>   for_each_online_cpu(cpu)
>>   del_timer_sync(&per_cpu(mce_timer, cpu));
>> + put_online_cpus();
>>  }
>>
>>  static void mce_do_trigger(struct work_struct *work)
>> @@ -2085,7 +2087,9 @@ static void mce_cpu_restart(void *data)
>>  static void mce_restart(void)
>>  {
>>   mce_timer_delete_all();
>> + get_online_cpus();
>>   on_each_cpu(mce_cpu_restart, NULL, 1);
>> + put_online_cpus();
>
> With your patch applied I get on 4.1-rc1+:

 I haven't tested it with 4.1-rc1+ yet.
 Let us check it.

Thanks,
Ethan
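The lockdep report quoted below is, roughly, a classic AB-BA inversion: the CPU-down path holds the hotplug lock while mce_device_remove() waits for the sysfs file's active reference (s_active), whereas the check_interval write holds s_active (kernfs_fop_write) and, with the patch applied, then calls get_online_cpus() from mce_restart(). A greatly simplified sketch of how an order tracker spots this, assuming only two hypothetical lock classes and direct edges (real lockdep follows whole dependency chains and is far more involved):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes standing in for s_active and cpu_hotplug.lock. */
enum toy_lock_class { TOY_S_ACTIVE, TOY_CPU_HOTPLUG, TOY_NR_CLASSES };

/* toy_seen[a][b] becomes true once class b was taken while a was held. */
static bool toy_seen[TOY_NR_CLASSES][TOY_NR_CLASSES];

/* Record "taken acquired while held is held". Returns false if the
 * opposite order was recorded before, i.e. a possible AB-BA deadlock. */
static bool toy_note_acquire(enum toy_lock_class held, enum toy_lock_class taken)
{
	assert(held < TOY_NR_CLASSES && taken < TOY_NR_CLASSES);
	if (toy_seen[taken][held])
		return false;
	toy_seen[held][taken] = true;
	return true;
}
```

Recording toy_note_acquire(TOY_CPU_HOTPLUG, TOY_S_ACTIVE) for the CPU-down path and then toy_note_acquire(TOY_S_ACTIVE, TOY_CPU_HOTPLUG) for the check_interval write makes the second call return false, which mirrors the "possible circular locking dependency" splat.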
>
> ---
> [   41.364909] kvm: disabling virtualization on CPU1
> [   41.371083] smpboot: CPU 1 is now offline
> [   41.381190] x86: Booting SMP configuration:
> [   41.385405] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   41.402901] kvm: enabling virtualization on CPU1
> [   41.440944] kvm: disabling virtualization on CPU1
> [   41.447010] smpboot: CPU 1 is now offline
> [   41.486082] kvm: disabling virtualization on CPU6
> [   41.491827] smpboot: CPU 6 is now offline
> [   41.497521] smpboot: Booting Node 0 Processor 6 APIC 0x5
> [   41.514983] kvm: enabling virtualization on CPU6
> [   41.561643] kvm: disabling virtualization on CPU6
> [   41.566848] smpboot: CPU 6 is now offline
> [   41.572606] smpboot: Booting Node 0 Processor 6 APIC 0x5
> [   41.590049] kvm: enabling virtualization on CPU6
> [   41.636817] kvm: disabling virtualization on CPU6
> [   41.642575] smpboot: CPU 6 is now offline
> [   41.676812] kvm: disabling virtualization on CPU7
> [   41.682429] smpboot: CPU 7 is now offline
> [   41.687974] smpboot: Booting Node 0 Processor 7 APIC 0x7
> [   41.705416] kvm: enabling virtualization on CPU7
> [   41.752739] kvm: disabling virtualization on CPU7
> [   41.758455] smpboot: CPU 7 is now offline
> [   41.764089] smpboot: Booting Node 0 Processor 7 APIC 0x7
> [   41.781561] kvm: enabling virtualization on CPU7
> [   41.831610] kvm: disabling virtualization on CPU7
> [   41.837280] smpboot: CPU 7 is now offline
>
> [   41.843341] ==
> [   41.849561] [ INFO: possible circular locking dependency detected ]
> [   41.855883] 4.1.0-rc1+ #2 Not tainted
> [   41.859564] ---
> [   41.865871] script2.sh/2071 is trying to acquire lock:
> [   41.871044]  (cpu_hotplug.lock){++}, at: [] 
> get_online_cpus+0x32/0x80
> [   41.879521]
> but task is already holding lock:
> [   41.885392]  (s_active#121){.+}, at: [] 
> kernfs_fop_write+0x6e/0x1a0
> [   41.893695]
> which lock already depends on the new lock.
>
> [   41.901925]
> the existing dependency chain (in reverse order) is:
> [   41.909465]
> -> #2 (s_active#121){.+}:
> [   41.913739][] lock_acquire+0xd1/0x2b0
> [   41.919718][] __kernfs_remove+0x228/0x300
> [   41.926046][] kernfs_remove_by_name_ns+0x49/0xb0
> [   41.932976][] sysfs_remove_file_ns+0x15/0x20
> [   41.939552][] device_remove_file+0x19/0x20
> [   41.945968][] mce_device_remove+0x54/0xd0
> [   41.952284][] mce_cpu_callback+0x69/0x120
> [   41.958608][] notifier_call_chain+0x66/0x90
> [   41.965124][] __raw_notifier_call_chain+0xe/0x10
> [   41.972053][] cpu_notify+0x23/0x50
> [   41.977761][] cpu_notify_nofail+0xe/0x20
> [   41.983986][] _cpu_down+0x1b6/0x2d0
> [   41.989787][] cpu_down+0x36/0x50
> [   41.995324][] cpu_subsys_offline+0x14/0x20
> [   42.001734][] device_offline+0x95/0xc0
> [   42.007797][] online_store+0x3d/0x90
> [   42.013673][] dev_attr_store+0x18/0x30
> [   42.019735][] sysfs_kf_write+0x49/0x60
> [   42.025796][] kernfs_fop_write+0x140/0x1a0
> [   42.032211][] __vfs_write+0x28/0xf0
> [   42.038013][] vfs_write+0xa9/0x1b0
> [   42.043715][] SyS_write+0x49/0

Regression: Disk corruption with dm-crypt and kernels >= 4.0

2015-04-30 Thread Abelardo Ricart III
I made sure to run a completely vanilla kernel when testing why I was suddenly
seeing some nasty libata errors with all kernels >= v4.0. Here's a snippet:

>8
[  165.592136] ata5.00: exception Emask 0x60 SAct 0x7000 SErr 0x800 action 0x6 frozen
[  165.592140] ata5.00: irq_stat 0x2000, host bus error
[  165.592143] ata5: SError: { HostInt }
[  165.592145] ata5.00: failed command: READ FPDMA QUEUED
[  165.592149] ata5.00: cmd 60/08:60:a0:0d:89/00:00:07:00:00/40 tag 12 ncq 4096 in
         res 40/00:74:40:58:5d/00:00:00:00:00/40 Emask 0x60 (host bus error)
[  165.592151] ata5.00: status: { DRDY }
>8

After a few dozen of these errors, I'd suddenly find my system in read-only
mode with corrupted files throughout my encrypted filesystems (seemed like
either a read or a write would corrupt a file, though I could be mistaken). I
decided to do a git bisect with a random read-write-sync test to narrow down
the culprit, which turned out to be this commit (part of a series):

# first bad commit: [cf2f1abfbd0dba701f7f16ef619e4d2485de3366] dm crypt: don't allocate pages for a partial request

Just to be sure, I created a patch to revert the entire nine-patch series that
commit belonged to... and the bad behavior disappeared. I've now been running
kernel 4.0 for a few days without issue, and went so far as to stress test my
poor SSD for a few hours to be 100% positive.
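The reporter's actual bisect tool is not shown. One possible shape of a "random read-write-sync test", as a hypothetical sketch: write deterministic pseudo-random blocks, fsync(), read them back and count mismatching blocks. Note that the read-back may be served from the page cache; a real run would drop caches or remount between the write and verify passes so the data actually comes back through dm-crypt and the disk.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BLK 4096	/* assumed block size */

/* Deterministic xorshift32 pattern, so expected data can be regenerated
 * at verify time instead of being kept in memory. */
static void toy_fill(unsigned char *buf, uint32_t seed)
{
	uint32_t x = seed ? seed : 1;	/* xorshift32 must not start at 0 */

	for (size_t i = 0; i < TOY_BLK; i++) {
		x ^= x << 13;
		x ^= x >> 17;
		x ^= x << 5;
		buf[i] = (unsigned char)x;
	}
}

/* Write nblocks pattern blocks, fsync(), read them back and compare.
 * Returns the number of corrupted blocks, or -1 on an I/O error. */
static long toy_rw_sync_check(const char *path, size_t nblocks, uint32_t seed)
{
	unsigned char want[TOY_BLK], got[TOY_BLK];
	long bad = 0;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	for (size_t b = 0; b < nblocks; b++) {
		toy_fill(want, seed + (uint32_t)b);
		if (pwrite(fd, want, TOY_BLK, (off_t)(b * TOY_BLK)) != TOY_BLK)
			goto io_err;
	}
	if (fsync(fd) != 0)
		goto io_err;

	for (size_t b = 0; b < nblocks; b++) {
		toy_fill(want, seed + (uint32_t)b);
		if (pread(fd, got, TOY_BLK, (off_t)(b * TOY_BLK)) != TOY_BLK)
			goto io_err;
		if (memcmp(want, got, TOY_BLK) != 0)
			bad++;
	}
	close(fd);
	return bad;

io_err:
	close(fd);
	return -1;
}
```

Running it in a loop with varying seeds on each bisect step, and treating any non-zero return as "bad", is one way to make the bisect mechanical.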

Here's some more info on my setup.

>8
$ lsblk -f
NAME           FSTYPE      LABEL MOUNTPOINT
sda
├─sda1         vfat              /boot/EFI
├─sda2         ext4              /boot
└─sda3         LVM2_member
  ├─SSD-root   crypto_LUKS
  │ └─root     f2fs              /
  └─SSD-home   crypto_LUKS
    └─home     f2fs              /home

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-linux-memnix cryptdevice=/dev/SSD/root:root:allow-discards root=/dev/mapper/root acpi_osi=Linux security=tomoyo TOMOYO_trigger=/usr/lib/systemd/systemd intel_iommu=on modprobe.blacklist=nouveau rw quiet

$ cat /etc/lvm/lvm.conf | grep "issue_discards"
issue_discards = 1
>8

If there's anything else I can do to help diagnose the underlying problem, I'm
more than willing.

Thanks,

Abelardo Ricart.


[PATCH V2 1/2] mm/thp: Use new functions to clear pmd on splitting and collapse

2015-04-30 Thread Aneesh Kumar K.V
Some architectures may require an explicit IPI before a THP PMD split or
collapse. This enables us to use local_irq_disable() to prevent
a parallel THP PMD split or collapse.

Signed-off-by: Aneesh Kumar K.V 
---
 include/asm-generic/pgtable.h | 32 
 mm/huge_memory.c  |  9 +
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index fe617b7e4be6..e95c697bef25 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -184,6 +184,38 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_SPLITTING_FLUSH_NOTIFY
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define pmdp_splitting_flush_notify pmdp_clear_flush_notify
+#else
+static inline void pmdp_splitting_flush_notify(struct vm_area_struct *vma,
+  unsigned long address,
+  pmd_t *pmdp)
+{
+   BUILD_BUG();
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#endif
+
+#ifndef __HAVE_ARCH_PMDP_COLLAPSE_FLUSH
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
+  unsigned long address,
+  pmd_t *pmdp)
+{
+   return pmdp_clear_flush(vma, address, pmdp);
+}
+#else
+static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
+  unsigned long address,
+  pmd_t *pmdp)
+{
+   BUILD_BUG();
+   return __pmd(0);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#endif
+
 #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
 extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
   pgtable_t pgtable);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cce4604c192f..30c1b46fcf6d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2187,7 +2187,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 * huge and small TLB entries for the same virtual address
 * to avoid the risk of CPU bugs in that area.
 */
-   _pmd = pmdp_clear_flush(vma, address, pmd);
+   _pmd = pmdp_collapse_flush(vma, address, pmd);
spin_unlock(pmd_ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 
@@ -2606,9 +2606,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 
write = pmd_write(*pmd);
young = pmd_young(*pmd);
-
-   /* leave pmd empty until pte is filled */
-   pmdp_clear_flush_notify(vma, haddr, pmd);
+   /*
+* leave pmd empty until pte is filled.
+*/
+   pmdp_splitting_flush_notify(vma, haddr, pmd);
 
pgtable = pgtable_trans_huge_withdraw(mm, pmd);
pmd_populate(mm, &_pmd, pgtable);
-- 
2.1.4
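Both new blocks in include/asm-generic/pgtable.h follow the usual kernel pattern for overridable helpers: the generic header supplies a default only when the architecture has not defined the matching __HAVE_ARCH_* guard. A small standalone illustration of that pattern, with hypothetical toy_* helpers and a HAVE_ARCH_TOY_* guard (identifiers starting with a double underscore are reserved outside the kernel, so the guard name differs here):

```c
#include <assert.h>

enum toy_impl { TOY_GENERIC = 1, TOY_ARCH = 2 };

/* ---- hypothetical "arch" header -------------------------------------- */
/* This arch needs extra work on collapse (e.g. an IPI), so it provides
 * its own helper and defines the guard. It does not override splitting. */
#define HAVE_ARCH_TOY_COLLAPSE_FLUSH
static inline enum toy_impl toy_collapse_flush(void)
{
	return TOY_ARCH;
}

/* ---- hypothetical "generic" header ----------------------------------- */
static inline enum toy_impl toy_clear_flush(void)
{
	return TOY_GENERIC;
}

#ifndef HAVE_ARCH_TOY_COLLAPSE_FLUSH
static inline enum toy_impl toy_collapse_flush(void)
{
	return toy_clear_flush();	/* default: plain clear + flush */
}
#endif

#ifndef HAVE_ARCH_TOY_SPLITTING_FLUSH
/* No arch override: alias the default, much like the patch's #define
 * of pmdp_splitting_flush_notify to pmdp_clear_flush_notify. */
#define toy_splitting_flush toy_clear_flush
#endif
```

With the guard set as above, toy_collapse_flush() resolves to the arch version while toy_splitting_flush() falls back to the generic one; ppc64 in patch 2/2 defines both guards.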



[PATCH V2 2/2] powerpc/thp: Remove _PAGE_SPLITTING and related code

2015-04-30 Thread Aneesh Kumar K.V
With the new THP refcounting, we no longer need to mark the PMD as splitting.
Drop the code that handles this.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   6 --
 arch/powerpc/include/asm/pgtable-ppc64.h |  29 ++--
 arch/powerpc/mm/hugepage-hash64.c|   3 -
 arch/powerpc/mm/hugetlbpage.c|   2 +-
 arch/powerpc/mm/pgtable_64.c | 111 ---
 mm/gup.c |   2 +-
 6 files changed, 52 insertions(+), 101 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e202bdcc..9a96fe3caa48 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -298,12 +298,6 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
cpu_relax();
continue;
}
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-   /* If hugepage and is trans splitting return None */
-   if (unlikely(hugepage &&
-pmd_trans_splitting(pte_pmd(old_pte))))
-   return __pte(0);
-#endif
/* If pte is not present return None */
if (unlikely(!(old_pte & _PAGE_PRESENT)))
return __pte(0);
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 843cb35e6add..655dde8e9683 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -361,11 +361,6 @@ void pgtable_cache_init(void);
 #endif /* __ASSEMBLY__ */
 
 /*
- * THP pages can't be special. So use the _PAGE_SPECIAL
- */
-#define _PAGE_SPLITTING _PAGE_SPECIAL
-
-/*
  * We need to differentiate between explicit huge page and THP huge
  * page, since THP huge page also need to track real subpage details
  */
@@ -375,8 +370,7 @@ void pgtable_cache_init(void);
  * set of bits not changed in pmd_modify.
  */
 #define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS |  \
-_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_SPLITTING | \
-_PAGE_THP_HUGE)
+_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_THP_HUGE)
 
 #ifndef __ASSEMBLY__
 /*
@@ -458,13 +452,6 @@ static inline int pmd_trans_huge(pmd_t pmd)
return (pmd_val(pmd) & 0x3) && (pmd_val(pmd) & _PAGE_THP_HUGE);
 }
 
-static inline int pmd_trans_splitting(pmd_t pmd)
-{
-   if (pmd_trans_huge(pmd))
-   return pmd_val(pmd) & _PAGE_SPLITTING;
-   return 0;
-}
-
 extern int has_transparent_hugepage(void);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
@@ -517,12 +504,6 @@ static inline pmd_t pmd_mknotpresent(pmd_t pmd)
return pmd;
 }
 
-static inline pmd_t pmd_mksplitting(pmd_t pmd)
-{
-   pmd_val(pmd) |= _PAGE_SPLITTING;
-   return pmd;
-}
-
 #define __HAVE_ARCH_PMD_SAME
 static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
 {
@@ -577,8 +558,12 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr,
pmd_hugepage_update(mm, addr, pmdp, _PAGE_RW, 0);
 }
 
-#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
-extern void pmdp_splitting_flush(struct vm_area_struct *vma,
+#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH_NOTIFY
+extern void pmdp_splitting_flush_notify(struct vm_area_struct *vma,
+   unsigned long address, pmd_t *pmdp);
+
+#define __HAVE_ARCH_PMDP_COLLAPSE_FLUSH
+extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 unsigned long address, pmd_t *pmdp);
 
 #define __HAVE_ARCH_PGTABLE_DEPOSIT
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index 86686514ae13..078f7207afd2 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -39,9 +39,6 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
/* If PMD busy, retry the access */
if (unlikely(old_pmd & _PAGE_BUSY))
return 0;
-   /* If PMD is trans splitting retry the access */
-   if (unlikely(old_pmd & _PAGE_SPLITTING))
-   return 0;
/* If PMD permissions don't match, take page fault */
if (unlikely(access & ~old_pmd))
return 1;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index f30ae0f7f570..dfd7db0cfbee 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -1008,7 +1008,7 @@ pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift
 * hpte invalidate
 *
 */
-   if (pmd_none(pmd) || pmd_trans_splitting(pmd))
+   if (pmd_none(pmd))
return NULL;
 
  

[PATCH V2 0/2] Remove _PAGE_SPLITTING from ppc64

2015-04-30 Thread Aneesh Kumar K.V
The changes are on top of what is posted at

 
http://mid.gmane.org/1429823043-157133-1-git-send-email-kirill.shute...@linux.intel.com

 git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git thp/refcounting/v5

Changes from V1:
* Fold part of patch 3 into 1 and 2
* Drop patch 3.
* Make generic version of pmdp_splitting_flush_notify inline.

Aneesh Kumar K.V (2):
  mm/thp: Use new functions to clear pmd on splitting and collapse
  powerpc/thp: Remove _PAGE_SPLITTING and related code

 arch/powerpc/include/asm/kvm_book3s_64.h |   6 --
 arch/powerpc/include/asm/pgtable-ppc64.h |  29 ++--
 arch/powerpc/mm/hugepage-hash64.c|   3 -
 arch/powerpc/mm/hugetlbpage.c|   2 +-
 arch/powerpc/mm/pgtable_64.c | 111 ---
 include/asm-generic/pgtable.h|  32 +
 mm/gup.c |   2 +-
 mm/huge_memory.c |   9 +--
 8 files changed, 89 insertions(+), 105 deletions(-)

-- 
2.1.4



Re: [PATCH v6 4/4] clk: dt: Introduce binding for always-on clock support

2015-04-30 Thread Sascha Hauer
On Thu, Apr 30, 2015 at 10:57:22AM +0100, Lee Jones wrote:
> On Wed, 29 Apr 2015, Maxime Ripard wrote:
> 
> > On Wed, Apr 29, 2015 at 03:17:51PM +0100, Lee Jones wrote:
> > > On Wed, 22 Apr 2015, Maxime Ripard wrote:
> > > 
> > > > On Wed, Apr 08, 2015 at 06:23:44PM +0100, Lee Jones wrote:
> > > > > On Wed, 08 Apr 2015, Maxime Ripard wrote:
> > > > > 
> > > > > > On Wed, Apr 08, 2015 at 11:38:32AM +0100, Lee Jones wrote:
> > > > > > > On Wed, 08 Apr 2015, Maxime Ripard wrote:
> > > > > > > 
> > > > > > > > On Wed, Apr 08, 2015 at 09:14:50AM +0100, Lee Jones wrote:
> > > > > > > > > > > +
> > > > > > > > > > > + This property is not to be abused.  It is only to be used to
> > > > > > > > > > > + protect platforms from being crippled by gated clocks, not
> > > > > > > > > > > + as a convenience function to avoid using the framework
> > > > > > > > > > > + correctly inside device drivers.
> > > > > > > > > > 
> > > > > > > > > > Disregarding what's stated here, I'm pretty sure that this will
> > > > > > > > > > actually happen. Where do you place the cursor?
> > > > > > > > > 
> > > > > > > > > That's up to Mike.
> > > > > > > > 
> > > > > > > > Except that Mike won't review any of the DT changes, so he won't be
> > > > > > > > able to prevent users from using it. Let alone out-of-tree DTs using a
> > > > > > > > mainline kernel.
> > > > > > > 
> > > > > > > Ideally Mike should be Cc'ed on patches using clock bindings, but if
> > > > > > > he isn't, the DT guys are smart enough to either make the right
> > > > > > > decisions themselves (Rob has Acked these bindings already, so will be
> > > > > > > on the lookout for misuse, I'm sure), or ask for Mike's help.
> > > > > > 
> > > > > > Yeah, right, as if this strategy really worked in the past
> > > > > > 
> > > > > > Do we really want to look at even the DT bindings that have actually
> > > > > > been reviewed by maintainers that got merged?
> > > > > > 
> > > > > > They don't have time for that, which is totally fine, but we really
> > > > > > shouldn't bury our heads in the sand by actually thinking they will
> > > > > > review every single DT-related patch.
> > > > > > 
> > > > > > Using that as an argument is just plain denial of what has really
> > > > > > happened over the past 4 years.
> > > > > 
> > > > > I agree that it's a problem, but this is a process problem and has
> > > > > nothing to do with this set.  If you have a problem with the current
> > > > > process and have a better alternative, submit your thoughts to the DT
> > > > > list.  Rejecting all new bindings because you are frightened that they
> > > > > will be used in a manner that they were not intended is not the way to
> > > > > go though.
> > > > 
> > > > I'm not saying that this binding should not go in because of a process
> > > > issue.
> > > > 
> > > > I'm saying that discarding arguments against your binding by adding
> > > > restrictions that cannot be enforced is not reasonable.
> > > 
> > > I'm open to constructive suggestions/alternatives.
> > > 
> > > Hand-rolling this stuff in C per vendor is not one of them.
> > 
> > I'm sorry, but ruling out alternatives that work for everyone (and
> > actually work better) just because you don't want to edit a C file is
> > not really constructive either.
> > 
> > > > > > > > > > Should we create a new driver for our RAM controller, or do we want to
> > > > > > > > > > use clock-always-on?
> > > > > > > > > 
> > > > > > > > > I would say that if all the driver did was to enable clocks, then you
> > > > > > > > > should use this instead.  This binding was designed specifically for
> > > > > > > > > that purpose.
> > > > > > > > >
> > > > > > > > > However, if the aforementioned driver clock can be safely gated, then
> > > > > > > > > it should not be an always-on clock.
> > > > > > > > 
> > > > > > > > Yeah, of course, I understand the original intent of it, but that
> > > > > > > > argument, which might very well be true at one point in time, might
> > > > > > > > not be true anymore two or three releases later.
> > > > > > > 
> > > > > > > Why?  The H/W isn't going to change in two or three releases.  The
> > > > > > > clocks designated as 'always-on' will have to be on forever, or
> > > > > > > synonymously, 'always'.
> > > > > > >
> > > > > > > > And that driver might actually rely on the fact that the clock is shut
> > > > > > > > down, which won't be the case.
> > > > > > > 
> > > > > > > I think you are missing the point of this binding.  The driver can
> > > > > > > never rely on that in this use-case.  If the clock is off, there is no
> > > > > > > device driver, period

Re: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2

2015-04-30 Thread David Gibson
On Fri, May 01, 2015 at 02:35:23PM +1000, Alexey Kardashevskiy wrote:
> On 04/30/2015 04:55 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:53PM +1000, Alexey Kardashevskiy wrote:
> >>The existing implementation accounts the whole DMA window in
> >>the locked_vm counter. This is going to be worse with multiple
> >>containers and huge DMA windows. Also, real-time accounting would require
> >>additional tracking of accounted pages due to the page size difference -
> >>IOMMU uses 4K pages and system uses 4K or 64K pages.
> >>
> >>Another issue is that actual page pinning/unpinning happens on every
> >>DMA map/unmap request. This does not affect performance much now, as
> >>we spend far more time switching context between guest, userspace
> >>and host, but it will start to matter when we add in-kernel
> >>DMA map/unmap acceleration.
> >>
> >>This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
> >>The new IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
> >>2 new ioctls to register/unregister DMA memory,
> >>VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY,
> >>which receive the userspace address and size of a memory region which
> >>needs to be pinned/unpinned and counted in locked_vm.
> >>The new IOMMU splits physical page pinning and TCE table updates into 2
> >>separate operations. It requires 1) guest pages to be registered first and
> >>2) subsequent map/unmap requests to work only with pre-registered memory.
> >>For the default single window case this means that the entire guest
> >>(instead of 2GB) needs to be pinned before using VFIO.
> >>When a huge DMA window is added, no additional pinning will be
> >>required, otherwise it would be guest RAM + 2GB.
> >>
> >>The new memory registration ioctls are not supported by
> >>VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
> >>will require memory to be preregistered in order to work.
> >>
> >>Accounting is done per user process.
> >>
> >>This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
> >>can do with v1 or v2 IOMMUs.
> >>
> >>Signed-off-by: Alexey Kardashevskiy 
> >>[aw: for the vfio related changes]
> >>Acked-by: Alex Williamson 
> >>---
> >>Changes:
> >>v9:
> >>* s/tce_get_hva_cached/tce_iommu_use_page_v2/
> >>
> >>v7:
> >>* now memory is registered per mm (i.e. process)
> >>* moved memory registration code to powerpc/mmu
> >>* merged "vfio: powerpc/spapr: Define v2 IOMMU" into this
> >>* limited new ioctls to v2 IOMMU
> >>* updated doc
> >>* unsupported ioctls return -ENOTTY instead of -EPERM
> >>
> >>v6:
> >>* tce_get_hva_cached() returns hva via a pointer
> >>
> >>v4:
> >>* updated docs
> >>* s/kzmalloc/vzalloc/
> >>* in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
> >>replaced offset with index
> >>* renamed vfio_iommu_type_register_memory to 
> >>vfio_iommu_spapr_register_memory
> >>and removed duplicating vfio_iommu_spapr_register_memory
> >>---
> >>  Documentation/vfio.txt  |  23 
> >>  drivers/vfio/vfio_iommu_spapr_tce.c | 230 
> >> +++-
> >>  include/uapi/linux/vfio.h   |  27 +
> >>  3 files changed, 274 insertions(+), 6 deletions(-)
> >>
> >>diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> >>index 96978ec..94328c8 100644
> >>--- a/Documentation/vfio.txt
> >>+++ b/Documentation/vfio.txt
> >>@@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed:
> >>
> >>
> >>
> >>+5) There is a v2 of the SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
> >>+VFIO_IOMMU_DISABLE and implements 2 new ioctls:
> >>+VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
> >>+(which are unsupported in v1 IOMMU).
> >
> >A summary of the semantic differences between v1 and v2 would be nice.
> >At this point it's not really clear to me if there's a case for
> >creating v2, or if this could just be done by adding (optional)
> >functionality to v1.
> 
> v1: memory preregistration is not supported; explicit enable/disable ioctls
> are required
> 
> v2: memory preregistration is required; explicit enable/disable are
> prohibited (as they are not needed).
> 
> Mixing these in one IOMMU type caused a lot of problems, such as: should I
> increment locked_vm by the 32-bit window size on enable() or not; what do I
> do about page pinning on map/unmap (check whether it is from registered
> memory and skip pinning?).
> 
> Having 2 IOMMU models makes everything a lot simpler.

Ok.  Would it simplify it further if you made v2 only usable on IODA2
hardware?

> >>+PPC64 paravirtualized guests generate a lot of map/unmap requests,
> >>+and the handling of those includes pinning/unpinning pages and updating
> >>+the mm::locked_vm counter to make sure we do not exceed the rlimit.
> >>+The v2 IOMMU splits accounting and pinning into separate operations:
> >>+
> >>+- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
> >

Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table

2015-04-30 Thread David Gibson
On Fri, May 01, 2015 at 02:10:58PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 04:40 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote:
> >>This adds a way for the IOMMU user to know how much a new table will
> >>use so it can be accounted in the locked_vm limit before allocation
> >>happens.
> >>
> >>This stores the allocated table size in pnv_pci_create_table()
> >>so the locked_vm counter can be updated correctly when a table is
> >>being disposed.
> >>
> >>This defines an iommu_table_group_ops callback to let VFIO know
> >>how much memory will be locked if a table is created.
> >>
> >>Signed-off-by: Alexey Kardashevskiy 
> >>---
> >>Changes:
> >>v9:
> >>* reimplemented the whole patch
> >>---
> >>  arch/powerpc/include/asm/iommu.h  |  5 +
> >>  arch/powerpc/platforms/powernv/pci-ioda.c | 14 
> >>  arch/powerpc/platforms/powernv/pci.c  | 36 +++
> >>  arch/powerpc/platforms/powernv/pci.h  |  2 ++
> >>  4 files changed, 57 insertions(+)
> >>
> >>diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >>index 1472de3..9844c106 100644
> >>--- a/arch/powerpc/include/asm/iommu.h
> >>+++ b/arch/powerpc/include/asm/iommu.h
> >>@@ -99,6 +99,7 @@ struct iommu_table {
> >>unsigned long  it_size;  /* Size of iommu table in entries */
> >>unsigned long  it_indirect_levels;
> >>unsigned long  it_level_size;
> >>+   unsigned long  it_allocated_size;
> >>unsigned long  it_offset;/* Offset into global table */
> >>unsigned long  it_base;  /* mapped address of tce table */
> >>unsigned long  it_index; /* which iommu table this is */
> >>@@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
> >>  struct iommu_table_group;
> >>
> >>  struct iommu_table_group_ops {
> >>+   unsigned long (*get_table_size)(
> >>+   __u32 page_shift,
> >>+   __u64 window_size,
> >>+   __u32 levels);
> >>long (*create_table)(struct iommu_table_group *table_group,
> >>int num,
> >>__u32 page_shift,
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index e0be556..7f548b4 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
> >>  }
> >>
> >>  #ifdef CONFIG_IOMMU_API
> >>+static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
> >>+   __u64 window_size, __u32 levels)
> >>+{
> >>+   unsigned long ret = pnv_get_table_size(page_shift, window_size, levels);
> >>+
> >>+   if (!ret)
> >>+   return ret;
> >>+
> >>+   /* Add size of it_userspace */
> >>+   return ret + (window_size >> page_shift) * sizeof(unsigned long);
> >
> >This doesn't make much sense.  The userspace view can't possibly be a
> >property of the specific low-level IOMMU model.
> 
> 
> This it_userspace thing is all about memory preregistration.
> 
> I need some way to track how many actual mappings the
> mm_iommu_table_group_mem_t has in order to decide whether to allow
> unregistering or not.
> 
> When I clear a TCE, I can read the old value, which is a host physical
> address; I cannot use it to find the preregistered region and adjust its
> mappings counter. I can only use userspace addresses for this (not even guest
> physical addresses, as this is VFIO and there is probably no KVM).
> 
> So I have to keep userspace addresses somewhere, one per IOMMU page, and the
> iommu_table seems a natural place for this.
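To give a feel for what the extra per-page array costs (this is the term pnv_pci_ioda2_get_table_size() adds on top of the TCE table itself), here is a minimal userspace sketch of the arithmetic. It assumes a 64-bit host where unsigned long is 8 bytes; userspace_view_bytes() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for a "userspace view" holding one
 * 64-bit userspace address per IOMMU page of a DMA window. Mirrors the
 * (window_size >> page_shift) * sizeof(unsigned long) term in the patch,
 * assuming a 64-bit host.
 */
static uint64_t userspace_view_bytes(uint32_t page_shift, uint64_t window_size)
{
	uint64_t entries = window_size >> page_shift;	/* one entry per IOMMU page */

	return entries * sizeof(uint64_t);
}
```

With 64K IOMMU pages a 1GB window needs 16384 entries (128KB of userspace addresses); with 4K pages the same window needs 2MB, which is why the amount matters for locked_vm accounting.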

Well.. sort of.  But as noted elsewhere, this pulls VFIO-specific
constraints into a platform code structure.  And whether you get this
table depends on the platform IOMMU type rather than on what VFIO
wants to do with it, which doesn't make sense.

What might make more sense is an opaque pointer in iommu_table for use
by the table "owner" (in the take_ownership sense).  The pointer would
be stored in iommu_table, but VFIO is responsible for populating and
managing its contents.
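A rough sketch of the opaque-owner-pointer idea, written as plain userspace C so it compiles standalone. All names here (table_sketch, owner_data, owner_release, vfio_view) are hypothetical; the point is only that the platform structure carries a pointer it never interprets, while the owner allocates, fills and frees whatever it points to.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for the platform's table structure. */
struct table_sketch {
	unsigned long size;			/* number of TCE entries */
	void *owner_data;			/* opaque: meaning defined by the owner */
	void (*owner_release)(void *owner_data);
};

/* Hypothetical VFIO-side data that the platform code never looks into. */
struct vfio_view {
	unsigned long *ua;			/* one userspace address per entry */
};

static void vfio_view_release(void *data)
{
	struct vfio_view *view = data;

	free(view->ua);
	free(view);
}

/* Called by the owner (e.g. when taking ownership) to attach its data. */
static int vfio_attach_view(struct table_sketch *tbl)
{
	struct vfio_view *view = malloc(sizeof(*view));

	if (!view)
		return -ENOMEM;
	view->ua = calloc(tbl->size, sizeof(*view->ua));
	if (!view->ua) {
		free(view);
		return -ENOMEM;
	}
	tbl->owner_data = view;
	tbl->owner_release = vfio_view_release;
	return 0;
}

/* Platform code only calls back through the hook on teardown. */
static void table_release_owner(struct table_sketch *tbl)
{
	if (tbl->owner_release)
		tbl->owner_release(tbl->owner_data);
	tbl->owner_data = NULL;
	tbl->owner_release = NULL;
}
```

The platform side then only needs to invoke the release hook when the table goes away, and the layout of the userspace view becomes purely a VFIO decision.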

Or you could just put the userspace mappings in the container.
Although you might want a different data structure in that case.

The other thing to bear in mind is that registered regions are likely
to be large contiguous blocks in user addresses, though obviously not
contiguous in physical addresses.  So you might be able to compactify this
information by storing it as a list of variable-length blocks in
userspace address space, rather than as a per-page address.
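One possible compact layout, assuming each registered block is mapped as a run of consecutive TCE entries backed by consecutive userspace pages: store one extent per run and recover the userspace address of any entry arithmetically. The struct and function names are hypothetical, and the lookup is a plain linear scan.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical extent: a run of consecutive TCE entries backed by
 * consecutive userspace pages.
 */
struct ua_extent {
	unsigned long entry;	/* first TCE index in the run */
	unsigned long npages;	/* length of the run in IOMMU pages */
	unsigned long ua;	/* userspace address of the first page */
};

/* Returns 0 and fills *ua if @entry falls inside a known run, else -1. */
static int extent_lookup_ua(const struct ua_extent *ext, size_t n,
			    unsigned long entry, unsigned int page_shift,
			    unsigned long *ua)
{
	for (size_t i = 0; i < n; i++) {
		if (entry >= ext[i].entry && entry - ext[i].entry < ext[i].npages) {
			*ua = ext[i].ua + ((entry - ext[i].entry) << page_shift);
			return 0;
		}
	}
	return -1;
}
```

A single extent here stands in for 256 per-page entries; the trade-off is that mappings which are not contiguous in userspace degrade to one extent per page.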



But.. isn't there a bigger problem here?  As Paulus was pointing out,
there's nothing guaranteeing the page tables continue to contain the
same page as was there at gup() time.

What's going to happen if you REGISTER a memory region, then mremap()
over it?  Then attempt to PUT_TCE a page in the region? O

Re: mmotm 2015-04-30-15-43 uploaded

2015-04-30 Thread Guenter Roeck
On Thu, Apr 30, 2015 at 03:44:10PM -0700, a...@linux-foundation.org wrote:
> The mm-of-the-moment snapshot 2015-04-30-15-43 has been uploaded to
> 
>http://www.ozlabs.org/~akpm/mmotm/
> 
> mmotm-readme.txt says
> 
> README for mm-of-the-moment:
> 
> http://www.ozlabs.org/~akpm/mmotm/
> 
> This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
> more than once a week.
> 
My builders report lots of failures:

mm/bootmem.c: In function 'free_all_bootmem_core':
mm/bootmem.c:237:32: error: 'cur' undeclared (first use in this function)
mm/bootmem.c: In function 'mark_bootmem':
mm/bootmem.c:380:1: warning: control reaches end of non-void function [-Wreturn-type]

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [1/4] watchdog: MAX63XX_WATCHDOG does not depend on ARM

2015-04-30 Thread Guenter Roeck
On Thu, Jan 29, 2015 at 12:15:42PM -0500, Vivien Didelot wrote:
> Remove the ARM Kconfig dependency since the Maxim MAX63xx devices are
> architecture independent.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Guenter Roeck 


Re: [RFC 1/2] ACPI: activate&export acpi_os_get_physical_address

2015-04-30 Thread Matthew Garrett
On Fri, May 01, 2015 at 03:45:52AM +0200, Rafael J. Wysocki wrote:

> And I don't really understand Matthew's comment regarding limiting
> operation regions to system memory.  This is about a specific operation
> region (which BTW only seems to be used as a means to access system memory
> at the location pointed to by the arg) in that particular method.

My feeling was that it really ought to have been the ACPI code dealing 
with this in some way, but having looked at it again I accept that this 
is really something that's limited by the vendor implementation. 
virt_to_phys() isn't the worst thing to do here.

-- 
Matthew Garrett | mj...@srcf.ucam.org


Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

2015-04-30 Thread David Gibson
On Fri, May 01, 2015 at 10:46:08AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-04-30 at 19:33 +1000, Alexey Kardashevskiy wrote:
> > On 04/30/2015 05:22 PM, David Gibson wrote:
> > > On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> > >> At the moment only one group per container is supported.
> > >> POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
> > >> IOMMU group, so we can relax this limitation and support multiple groups
> > >> per container.
> > >
> > > It's not obvious why allowing multiple TCE tables per PE has any
> > > bearing on allowing multiple groups per container.
> > 
> > 
> > This patchset is a global TCE tables rework (patches 1..30, roughly) with 2 
> > outcomes:
> > 1. reusing the same IOMMU table for multiple groups - patch 31;
> > 2. allowing dynamic create/remove of IOMMU tables - patch 32.
> > 
> > I can remove this one from the patchset and post it separately later, but
> > since 1..30 aim to support both 1) and 2), I think I had better keep them all
> > together (it might explain some of the changes I make in 1..30).
> 
> I think you are talking past each other :-)
> 
> But yes, having 2 tables per group is orthogonal to the ability of
> having multiple groups per container.
> 
> The latter is made possible on P8 in large part because each PE has its
> own DMA address space (unlike P5IOC2 or P7IOC where a single address
> space is segmented).
> 
> Also, on P8 you can actually make the TVT entries point to the same
> table in memory, thus removing the need to duplicate the actual
> tables (though you still have to duplicate the invalidations). I would
> however recommend only sharing the table that way within a chip/node.
> 
>  .../..
> 
> > >>
> > >> -1) Only one IOMMU group per container is supported as an IOMMU group
> > >> -represents the minimal entity which isolation can be guaranteed for and
> > >> -groups are allocated statically, one per a Partitionable Endpoint (PE)
> > >> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> > >> +container is supported as an IOMMU table is allocated at the boot time,
> > >> +one table per a IOMMU group which is a Partitionable Endpoint (PE)
> > >>   (PE is often a PCI domain but not always).
> 
> > > I thought the more fundamental problem was that different PEs tended
> > > to use disjoint bus address ranges, so even by duplicating put_tce
> > > across PEs you couldn't have a common address space.
> 
> Yes. This is the problem with P7IOC and earlier. It *could* be doable on
> P7IOC by making them the same PE but let's not go there.
> 
> > Sorry, I am not following you here.
> > 
> > By duplicating put_tce, I can have multiple IOMMU groups on the same
> > virtual PHB in QEMU; "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple
> > groups per container" does this, and the address ranges will be the same.
> 
> But that is only possible on P8 because only there do we have separate
> address spaces between PEs.
> 
> > What I cannot do on p5ioc2 is program the same table to multiple
> > physical PHBs (or I could, but it is very different from IODA2, pretty
> > ugly, and might not always be possible because I would have to allocate
> > these pages from some common pool and face problems like fragmentation).
> 
> And P7IOC has a similar issue. The DMA address top bits index the
> window on P7IOC within a shared address space. It's possible to
> configure a TVT to cover multiple devices but with very serious
> limitations.

Ok.  To check my understanding, does this sound reasonable:

  * The table_group more-or-less represents a PE, but in a way you can
reference without first knowing the specific IOMMU hardware type.

  * When attaching multiple groups to the same container, the first PE
(i.e. table_group) attached is used as a representative so that
subsequent groups can be checked for compatibility with the first
PE, and therefore with all PEs currently included in the container.

 - This is why the table_group appears in some places where it
   doesn't seem sensible from a pure object ownership point of
   view
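The "first group as representative" scheme in the second bullet could look roughly like this. It is a sketch only: the compatibility criteria (identical page-size mask and window count) and all names are assumptions, not what the patchset actually checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUPS 8

/* Hypothetical per-PE properties relevant to sharing one DMA address space. */
struct table_group_sketch {
	unsigned long pgsizes;		/* bitmap of supported IOMMU page sizes */
	unsigned int max_windows;	/* number of DMA windows supported */
};

struct container_sketch {
	struct table_group_sketch *groups[MAX_GROUPS];
	size_t ngroups;
};

/*
 * The first attached group acts as the representative: each new group is
 * compared only against groups[0].
 */
static bool container_attach(struct container_sketch *c,
			     struct table_group_sketch *g)
{
	struct table_group_sketch *rep;

	if (c->ngroups == MAX_GROUPS)
		return false;
	if (c->ngroups > 0) {
		rep = c->groups[0];
		if (rep->pgsizes != g->pgsizes ||
		    rep->max_windows != g->max_windows)
			return false;
	}
	c->groups[c->ngroups++] = g;
	return true;
}
```

Because every member already matched the representative, and equality is transitive, one comparison against groups[0] is enough to cover the whole container.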

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table

2015-04-30 Thread David Gibson
On Fri, May 01, 2015 at 02:01:17PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 04:31 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote:
> >>In order to support memory pre-registration, we need a way to track
> >>the use of every registered memory region and only allow unregistration
> >>if a region is not in use anymore. So we need a way to tell which
> >>region the just-cleared TCE was from.
> >>
> >>This adds a userspace view of the TCE table into iommu_table struct.
> >>It contains userspace addresses, one per TCE entry. The table is only
> >>allocated when the ownership over an IOMMU group is taken which means
> >>it is only used from outside of the powernv code (such as VFIO).
> >>
> >>Signed-off-by: Alexey Kardashevskiy 
> >>---
> >>Changes:
> >>v9:
> >>* fixed code flow in error cases added in v8
> >>
> >>v8:
> >>* added ENOMEM on failed vzalloc()
> >>---
> >>  arch/powerpc/include/asm/iommu.h  |  6 ++
> >>  arch/powerpc/kernel/iommu.c   | 18 ++
> >>  arch/powerpc/platforms/powernv/pci-ioda.c | 22 --
> >>  3 files changed, 44 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/arch/powerpc/include/asm/iommu.h 
> >>b/arch/powerpc/include/asm/iommu.h
> >>index 7694546..1472de3 100644
> >>--- a/arch/powerpc/include/asm/iommu.h
> >>+++ b/arch/powerpc/include/asm/iommu.h
> >>@@ -111,9 +111,15 @@ struct iommu_table {
> >>unsigned long *it_map;   /* A simple allocation bitmap for now */
> >>unsigned long  it_page_shift;/* table iommu page size */
> >>struct iommu_table_group *it_table_group;
> >>+   unsigned long *it_userspace; /* userspace view of the table */
> >
> >A single unsigned long doesn't seem like enough.
> 
> Why single? This is an array.

As in single per page.

> > How do you know
> >which process's address space this address refers to?
> 
> It is the current task. Multiple userspaces cannot use the same 
> container/tables.

Where is that enforced?

More to the point, that's a VFIO constraint, but it's here affecting
the design of a structure owned by the platform code.

[snip]
> >>  static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb,
> >>@@ -2062,12 +2071,21 @@ static long pnv_pci_ioda2_create_table(struct 
> >>iommu_table_group *table_group,
> >>int nid = pe->phb->hose->node;
> >>__u64 bus_offset = num ? pe->tce_bypass_base : 0;
> >>long ret;
> >>+   unsigned long *uas, uas_cb = sizeof(*uas) * (window_size >> page_shift);
> >>+
> >>+   uas = vzalloc(uas_cb);
> >>+   if (!uas)
> >>+   return -ENOMEM;
> >
> >I don't see why this is allocated both here as well as in
> >take_ownership.
> 
> Where else? The only alternative is vfio_iommu_spapr_tce but I really do not
> want to touch iommu_table fields there.

Well, to put it another way, why isn't take_ownership calling create
itself (or at least a common helper)?

Clearly the it_userspace table needs to have a lifetime that matches
the TCE table itself, so there should be a single function that marks
the beginning of that joint lifetime.
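A minimal sketch of the "single function marks the start of the joint lifetime" suggestion, in plain C with calloc() standing in for vzalloc(). The struct and helper names are hypothetical; the idea is that both take_ownership and create would go through tce_table_create(), and teardown through tce_table_free().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for iommu_table together with its userspace view. */
struct tce_table_sketch {
	unsigned long size;		/* entries in the window */
	unsigned long *tces;		/* the hardware-visible table */
	unsigned long *userspace;	/* one userspace address per entry */
};

/* Single place where the table and its userspace view start life together. */
static int tce_table_create(struct tce_table_sketch *t, unsigned int page_shift,
			    unsigned long long window_size)
{
	t->size = window_size >> page_shift;
	t->tces = calloc(t->size, sizeof(*t->tces));
	t->userspace = calloc(t->size, sizeof(*t->userspace));
	if (!t->tces || !t->userspace) {
		free(t->tces);
		free(t->userspace);
		t->tces = t->userspace = NULL;
		return -ENOMEM;
	}
	return 0;
}

/* ...and the single place where they end it together. */
static void tce_table_free(struct tce_table_sketch *t)
{
	free(t->tces);
	free(t->userspace);
	t->tces = t->userspace = NULL;
	t->size = 0;
}
```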

> >Isn't this function used for core-kernel users of the
> >iommu as well, in which case it shouldn't need the it_userspace?
> 
> 
> No. This is an iommu_table_group_ops callback which calls what the platform
> code calls (pnv_pci_create_table()) plus allocates this it_userspace thing.
> The callback is only called from VFIO.

Ok.

As touched on above it seems more like this should be owned by VFIO
code than the platform code.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

2015-04-30 Thread David Gibson
On Thu, Apr 30, 2015 at 07:33:09PM +1000, Alexey Kardashevskiy wrote:
> On 04/30/2015 05:22 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> >>At the moment only one group per container is supported.
> >>POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
> >>IOMMU group, so we can relax this limitation and support multiple groups
> >>per container.
> >
> >It's not obvious why allowing multiple TCE tables per PE has any
> >bearing on allowing multiple groups per container.
> 
> 
> This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
> outcomes:
> 1. reusing the same IOMMU table for multiple groups - patch 31;
> 2. allowing dynamic create/remove of IOMMU tables - patch 32.
> 
> I can remove this one from the patchset and post it separately later, but
> since 1..30 aim to support both 1) and 2), I think I had better keep them all
> together (it might explain some of the changes I make in 1..30).

The combined patchset is fine.  My comment is because your commit
message says that multiple groups are possible *because* 2 TCE tables
per group are allowed, and it's not at all clear why one follows from
the other.

> >>This adds TCE table descriptors to a container and uses iommu_table_group_ops
> >>to create/set DMA windows on IOMMU groups so the same TCE tables will be
> >>shared between several IOMMU groups.
> >>
> >>Signed-off-by: Alexey Kardashevskiy 
> >>[aw: for the vfio related changes]
> >>Acked-by: Alex Williamson 
> >>---
> >>Changes:
> >>v7:
> >>* updated doc
> >>---
> >>  Documentation/vfio.txt  |   8 +-
> >>  drivers/vfio/vfio_iommu_spapr_tce.c | 268 ++--
> >>  2 files changed, 199 insertions(+), 77 deletions(-)
> >>
> >>diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> >>index 94328c8..7dcf2b5 100644
> >>--- a/Documentation/vfio.txt
> >>+++ b/Documentation/vfio.txt
> >>@@ -289,10 +289,12 @@ PPC64 sPAPR implementation note
> >>
> >>  This implementation has some specifics:
> >>
> >>-1) Only one IOMMU group per container is supported as an IOMMU group
> >>-represents the minimal entity which isolation can be guaranteed for and
> >>-groups are allocated statically, one per a Partitionable Endpoint (PE)
> >>+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> >>+container is supported as an IOMMU table is allocated at the boot time,
> >>+one table per a IOMMU group which is a Partitionable Endpoint (PE)
> >>  (PE is often a PCI domain but not always).
> >
> >I thought the more fundamental problem was that different PEs tended
> >to use disjoint bus address ranges, so even by duplicating put_tce
> >across PEs you couldn't have a common address space.
> 
> 
> Sorry, I am not following you here.
> 
> By duplicating put_tce, I can have multiple IOMMU groups on the same virtual
> PHB in QEMU; "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple groups
> per container" does this, and the address ranges will be the same.

Oh, ok.  For some reason I thought that (at least on the older
machines) the different PEs used different and not easily changeable
DMA windows in bus addresses space.

> What I cannot do on p5ioc2 is program the same table to multiple
> physical PHBs (or I could, but it is very different from IODA2, pretty
> ugly, and might not always be possible because I would have to allocate these
> pages from some common pool and face problems like fragmentation).

So allowing multiple groups per container should be possible (at the
kernel rather than qemu level) by writing the same value to multiple
TCE tables.  I guess it's not worth doing for just the almost-obsolete
IOMMUs though.
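For the older IOMMUs, "writing the same value to multiple TCE tables" would amount to something like the sketch below. It assumes every PE's table covers the same bus address range, the names (pe_table, put_tce_all) are hypothetical, and per-table TCE cache invalidation is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one per-PE TCE table the container must keep in sync. */
struct pe_table {
	unsigned long *tces;
	unsigned long size;
};

/*
 * Mirror one TCE update into every table so all PEs see the same
 * bus-address to host-page translation.
 */
static int put_tce_all(struct pe_table *tables, size_t ntables,
		       unsigned long entry, unsigned long tce)
{
	for (size_t i = 0; i < ntables; i++)
		if (entry >= tables[i].size)
			return -1;	/* reject before touching any table */
	for (size_t i = 0; i < ntables; i++)
		tables[i].tces[entry] = tce;
	return 0;
}
```

Checking all bounds first keeps the tables consistent: either every table gets the update or none does.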

> 
> 
> 
> >>+Newer systems (POWER8 with IODA2) have improved hardware design which allows
> >>+to remove this limitation and have multiple IOMMU groups per a VFIO container.
> >>
> >>  2) The hardware supports so called DMA windows - the PCI address range
> >>  within which DMA transfer is allowed, any attempt to access address space
> >>diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>index a7d6729..970e3a2 100644
> >>--- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >>+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>@@ -82,6 +82,11 @@ static void decrement_locked_vm(long npages)
> >>   * into DMA'ble space using the IOMMU
> >>   */
> >>
> >>+struct tce_iommu_group {
> >>+   struct list_head next;
> >>+   struct iommu_group *grp;
> >>+};
> >>+
> >>  /*
> >>   * The container descriptor supports only a single group per container.
> >>   * Required by the API as the container is not supplied with the IOMMU group
> >>@@ -89,10 +94,11 @@ static void decrement_locked_vm(long npages)
> >>   */
> >>  struct tce_container {
> >>struct mutex lock;
> >>-   struct iommu_group *grp;
> >>bool enabled;
> >>unsigned long locked_pages;
> >>bool v2;
> >>+   struct iommu_table tables[IOMMU_TABLE_GROUP

[PATCH 1/2] dt-bindings: Add pxa1928 clock binding

2015-04-30 Thread Rob Herring
This adds the clock binding documentation for the Marvell PXA1928 SOC.
The PXA1928 has 3 clock control blocks for different subsystems of the
chip.
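The header's comment says each clock ID is the control register offset divided by 4, and the mux table in the driver patch uses PXA1928_CLK_UART0 * 4 the same way. A tiny sketch of that mapping, with two ID values copied from the header; pxa1928_clk_reg_offset() is a hypothetical helper.

```c
#include <assert.h>

/* Two IDs copied from include/dt-bindings/clock/marvell,pxa1928.h. */
#define PXA1928_CLK_UART0	0xb
#define PXA1928_CLK_SSP0	0x13

/* A clock ID is its control register offset / 4, so offset = id * 4. */
static unsigned int pxa1928_clk_reg_offset(unsigned int id)
{
	return id * 4;
}
```

Since the IDs are offsets into different blocks, the same number can name different clocks: 0x15 is PXA1928_CLK_SSP2 on the APBC and PXA1928_CLK_SDH0 on the APMU, and the controller phandle disambiguates them.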

Signed-off-by: Rob Herring 
Cc: Pawel Moll 
Cc: Mark Rutland 
Cc: Ian Campbell 
Cc: Kumar Gala 
---
 .../devicetree/bindings/clock/marvell,pxa1928.txt  | 21 
 include/dt-bindings/clock/marvell,pxa1928.h| 57 ++
 2 files changed, 78 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/marvell,pxa1928.txt
 create mode 100644 include/dt-bindings/clock/marvell,pxa1928.h

diff --git a/Documentation/devicetree/bindings/clock/marvell,pxa1928.txt b/Documentation/devicetree/bindings/clock/marvell,pxa1928.txt
new file mode 100644
index 000..809c5a2
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/marvell,pxa1928.txt
@@ -0,0 +1,21 @@
+* Marvell PXA1928 Clock Controllers
+
+The PXA1928 clock subsystem generates and supplies clock to various
+controllers within the PXA1928 SoC. The PXA1928 contains 3 clock controller
+blocks called APMU, MPMU, and APBC roughly corresponding to internal buses.
+
+Required Properties:
+
+- compatible: should be one of the following.
+  - "marvell,pxa1928-apmu" - APMU controller compatible
+  - "marvell,pxa1928-mpmu" - MPMU controller compatible
+  - "marvell,pxa1928-apbc" - APBC controller compatible
+- reg: physical base address of the clock controller and length of memory mapped
+  region.
+- #clock-cells: should be 1.
+- #reset-cells: should be 1.
+
+Each clock is assigned an identifier and client nodes use the clock controller
+phandle and this identifier to specify the clock which they consume.
+
+All these identifiers can be found in .
diff --git a/include/dt-bindings/clock/marvell,pxa1928.h b/include/dt-bindings/clock/marvell,pxa1928.h
new file mode 100644
index 000..c393ca2
--- /dev/null
+++ b/include/dt-bindings/clock/marvell,pxa1928.h
@@ -0,0 +1,57 @@
+#ifndef __DTS_MARVELL_PXA1928_CLOCK_H
+#define __DTS_MARVELL_PXA1928_CLOCK_H
+
+/*
+ * Clock ID values here correspond to the control register offset/4.
+ */
+
+/* apb peripherals */
+#define PXA1928_CLK_RTC    0
+#define PXA1928_CLK_TWSI0  1
+#define PXA1928_CLK_TWSI1  2
+#define PXA1928_CLK_TWSI2  3
+#define PXA1928_CLK_TWSI3  4
+#define PXA1928_CLK_OWIRE  5
+#define PXA1928_CLK_KPC    6
+#define PXA1928_CLK_TB_ROTARY  7
+#define PXA1928_CLK_SW_JTAG    8
+#define PXA1928_CLK_TIMER1 9
+#define PXA1928_CLK_UART0  0xb
+#define PXA1928_CLK_UART1  0xc
+#define PXA1928_CLK_UART2  0xd
+#define PXA1928_CLK_GPIO   0xe
+#define PXA1928_CLK_PWM0   0xf
+#define PXA1928_CLK_PWM1   0x10
+#define PXA1928_CLK_PWM2   0x11
+#define PXA1928_CLK_PWM3   0x12
+#define PXA1928_CLK_SSP0   0x13
+#define PXA1928_CLK_SSP1   0x14
+#define PXA1928_CLK_SSP2   0x15
+
+#define PXA1928_CLK_TWSI4  0x1f
+#define PXA1928_CLK_TWSI5  0x20
+#define PXA1928_CLK_UART3  0x22
+#define PXA1928_CLK_THSENS_GLOB    0x24
+#define PXA1928_CLK_THSENS_CPU 0x26
+#define PXA1928_CLK_THSENS_VPU 0x27
+#define PXA1928_CLK_THSENS_GC  0x28
+#define PXA1928_APBC_NR_CLKS   0x30
+
+
+/* axi peripherals */
+#define PXA1928_CLK_SDH0   0x15
+#define PXA1928_CLK_SDH1   0x16
+#define PXA1928_CLK_USB    0x17
+#define PXA1928_CLK_NAND   0x18
+#define PXA1928_CLK_DMA    0x19
+
+#define PXA1928_CLK_SDH2   0x3a
+#define PXA1928_CLK_SDH3   0x3b
+#define PXA1928_CLK_HSIC   0x3e
+#define PXA1928_CLK_SDH4   0x57
+#define PXA1928_CLK_GC3D   0x5d
+#define PXA1928_CLK_GC2D   0x5f
+
+#define PXA1928_APMU_NR_CLKS   0x60
+
+#endif
-- 
2.1.0



[PATCH 2/2] clk: mmp: add PXA1928 clock support

2015-04-30 Thread Rob Herring
Add initial clock support for Marvell PXA1928. The PXA1928 is a mobile
SOC and is similar to other MMP/PXA series of SOCs, so a lot of the
existing infrastructure is reused here.

Currently the PLLs are just fixed clocks, and not all leaf clocks are
implemented.
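For reference, the 58.5MHz noted next to the first uart_factor_tbl entry can be reproduced from the patch's numbers: pll1_416 is pll1_624 scaled by 2/3, and the UART divider uses num/den with a factor of 2. The sketch assumes the MMP factor clock computes rate = parent * den / (num * factor) in 10kHz units; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* pll1_624 feeds pll1_416 (x2/3), which in turn feeds uart_pll. */
#define PLL1_624_HZ	624000000UL

/*
 * Assumed fractional-divider model: rate = parent * den / (num * factor),
 * computed in 10kHz units to keep intermediates small.
 */
static unsigned long uart_pll_rate(unsigned long parent, unsigned int num,
				   unsigned int den, unsigned int factor)
{
	return ((parent / 10000) * den / (num * factor)) * 10000;
}

/* Register image: num in bits 28..16, den in bits 12..0 (0x1fff masks). */
static uint32_t uart_pll_reg(unsigned int num, unsigned int den)
{
	return ((uint32_t)(num & 0x1fff) << 16) | (den & 0x1fff);
}
```

The same helper shows the register image for that entry, using the 0x1fff masks and 16/0 shifts from uart_factor_masks.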

Signed-off-by: Rob Herring 
Cc: Mike Turquette 
Cc: Stephen Boyd 
---
 drivers/clk/mmp/Makefile |   2 +
 drivers/clk/mmp/clk-of-pxa1928.c | 265 +++
 2 files changed, 267 insertions(+)
 create mode 100644 drivers/clk/mmp/clk-of-pxa1928.c

diff --git a/drivers/clk/mmp/Makefile b/drivers/clk/mmp/Makefile
index 3caaf7c..9d4bc41 100644
--- a/drivers/clk/mmp/Makefile
+++ b/drivers/clk/mmp/Makefile
@@ -12,3 +12,5 @@ obj-$(CONFIG_MACH_MMP2_DT) += clk-of-mmp2.o
 obj-$(CONFIG_CPU_PXA168) += clk-pxa168.o
 obj-$(CONFIG_CPU_PXA910) += clk-pxa910.o
 obj-$(CONFIG_CPU_MMP2) += clk-mmp2.o
+
+obj-y += clk-of-pxa1928.o
diff --git a/drivers/clk/mmp/clk-of-pxa1928.c b/drivers/clk/mmp/clk-of-pxa1928.c
new file mode 100644
index 000..b7cb540b
--- /dev/null
+++ b/drivers/clk/mmp/clk-of-pxa1928.c
@@ -0,0 +1,265 @@
+/*
+ * pxa1928 clock framework source file
+ *
+ * Copyright (C) 2015 Linaro, Ltd.
+ * Rob Herring 
+ *
+ * Based on drivers/clk/mmp/clk-of-mmp2.c:
+ * Copyright (C) 2012 Marvell
+ * Chao Xie 
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "clk.h"
+#include "reset.h"
+
+#define MPMU_UART_PLL  0x14
+
+struct pxa1928_clk_unit {
+   struct mmp_clk_unit unit;
+   void __iomem *mpmu_base;
+   void __iomem *apmu_base;
+   void __iomem *apbc_base;
+   void __iomem *apbcp_base;
+};
+
+static struct mmp_param_fixed_rate_clk fixed_rate_clks[] = {
+   {0, "clk32", NULL, CLK_IS_ROOT, 32768},
+   {0, "vctcxo", NULL, CLK_IS_ROOT, 26000000},
+   {0, "pll1_624", NULL, CLK_IS_ROOT, 624000000},
+   {0, "pll5p", NULL, CLK_IS_ROOT, 832000000},
+   {0, "pll5", NULL, CLK_IS_ROOT, 1248000000},
+   {0, "usb_pll", NULL, CLK_IS_ROOT, 480000000},
+};
+
+static struct mmp_param_fixed_factor_clk fixed_factor_clks[] = {
+   {0, "pll1_d2", "pll1_624", 1, 2, 0},
+   {0, "pll1_d9", "pll1_624", 1, 9, 0},
+   {0, "pll1_d12", "pll1_624", 1, 12, 0},
+   {0, "pll1_d16", "pll1_624", 1, 16, 0},
+   {0, "pll1_d20", "pll1_624", 1, 20, 0},
+   {0, "pll1_416", "pll1_624", 2, 3, 0},
+   {0, "vctcxo_d2", "vctcxo", 1, 2, 0},
+   {0, "vctcxo_d4", "vctcxo", 1, 4, 0},
+};
+
+static struct mmp_clk_factor_masks uart_factor_masks = {
+   .factor = 2,
+   .num_mask = 0x1fff,
+   .den_mask = 0x1fff,
+   .num_shift = 16,
+   .den_shift = 0,
+};
+
+static struct mmp_clk_factor_tbl uart_factor_tbl[] = {
+   {.num = 832, .den = 234},   /*58.5MHZ */
+   {.num = 1, .den = 1},   /*26MHZ */
+};
+
+static void pxa1928_pll_init(struct pxa1928_clk_unit *pxa_unit)
+{
+   struct clk *clk;
+   struct mmp_clk_unit *unit = &pxa_unit->unit;
+
+   mmp_register_fixed_rate_clks(unit, fixed_rate_clks,
+   ARRAY_SIZE(fixed_rate_clks));
+
+   mmp_register_fixed_factor_clks(unit, fixed_factor_clks,
+   ARRAY_SIZE(fixed_factor_clks));
+
+   clk = mmp_clk_register_factor("uart_pll", "pll1_416",
+   CLK_SET_RATE_PARENT,
+   pxa_unit->mpmu_base + MPMU_UART_PLL,
+   &uart_factor_masks, uart_factor_tbl,
+   ARRAY_SIZE(uart_factor_tbl), NULL);
+}
+
+static DEFINE_SPINLOCK(uart0_lock);
+static DEFINE_SPINLOCK(uart1_lock);
+static DEFINE_SPINLOCK(uart2_lock);
+static DEFINE_SPINLOCK(uart3_lock);
+static const char *uart_parent_names[] = {"uart_pll", "vctcxo"};
+
+static DEFINE_SPINLOCK(ssp0_lock);
+static DEFINE_SPINLOCK(ssp1_lock);
+static const char *ssp_parent_names[] = {"vctcxo_d4", "vctcxo_d2", "vctcxo", "pll1_d12"};
+
+static DEFINE_SPINLOCK(reset_lock);
+
+static struct mmp_param_mux_clk apbc_mux_clks[] = {
+   {0, "uart0_mux", uart_parent_names, ARRAY_SIZE(uart_parent_names), CLK_SET_RATE_PARENT, PXA1928_CLK_UART0 * 4, 4, 3, 0, &uart0_lock},
+   {0, "uart1_mux", uart_parent_names, ARRAY_SIZE(uart_parent_names), CLK_SET_RATE_PARENT, PXA1928_CLK_UART1 * 4, 4, 3, 0, &uart1_lock},
+   {0, "uart2_mux", uart_parent_names, ARRAY_SIZE(uart_parent_names), CLK_SET_RATE_PARENT, PXA1928_CLK_UART2 * 4, 4, 3, 0, &uart2_lock},
+   {0, "uart3_mux", uart_parent_names, ARRAY_SIZE(uart_parent_names), CLK_SET_RATE_PARENT, PXA1928_CLK_UART3 * 4, 4, 3, 0, &uart3_lock},
+   {0, "ssp0_mux", ssp_parent_names, ARRAY_SIZE(ssp_parent_names), CLK_SET_RATE_PARENT, PXA1928_CLK_SSP0 * 4, 4, 3, 0, &ssp0_lock},
+   {0, "ssp1_mux", ssp_parent_names, ARRAY_

Re: 3.17.0+ files disappearing after playing old dos game on nfsroot laptop

2015-04-30 Thread Eric W. Biederman
Hans de Bruin  writes:

>> I expect what needs to happen is to confirm that nfs directory entry
>> revalidation is buggy, and at least for the short term re-add the nfs
>> logic that will avoid dropping a dentry if it is a mount point, or
>> path to a mount point, to avoid the nfs bugs.
>>
>
> ok, I will go in waiting mode.

I dropped the ball on this, but it looks like someone else hit the
problem and the following two commits fixed this issue:

Can you confirm that things are working again?

commit fa9233699cc1dc236f4cf42245d13e40966938c5
Author: Trond Myklebust 
Date:   Mon Feb 23 18:51:32 2015 -0500

NFS: Don't require a filehandle to refresh the inode in nfs_prime_dcache()

If the server does not return a valid set of attributes that we can
use to either create a file or refresh the inode, then there is no
value in calling nfs_prime_dcache().

However if we're just refreshing the inode using the attributes that
the server returned, then it shouldn't matter whether or not we have
a filehandle, as long as we check the fsid+fileid combination.

Signed-off-by: Trond Myklebust 

commit 6c441c254eea2354d686be7f5544bcd79fb6a61f
Author: Trond Myklebust 
Date:   Sun Feb 22 16:35:36 2015 -0500

NFS: Don't invalidate a submounted dentry in nfs_prime_dcache()

If we're traversing a directory which contains a submounted filesystem,
or one that has a referral, the NFS server that is processing the READDIR
request will often return information for the underlying (mounted-on)
directory. It may, or may not, also return filehandle information.

If this happens, and the lookup in nfs_prime_dcache() returns the
dentry for the submounted directory, the filehandle comparison will
fail, and we call d_invalidate(). Post-commit 8ed936b5671bf
("vfs: Lazily remove mounts on unlinked files and directories."), this
means the entire subtree is unmounted.

The following minimal patch addresses this problem by punting on
the invalidation if there is a submount.

Kudos to Neil Brown  for having tracked down this
issue (see link).

Reported-by: Nix 
Link: http://lkml.kernel.org/r/87iofju9ht@spindle.srvr.nix
Cc: sta...@vger.kernel.org # 3.18+
Signed-off-by: Trond Myklebust 
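The decision the two commits leave nfs_prime_dcache() with can be summarised roughly as follows. This is a simplified model of the behaviour described in the commit messages, not the actual function; the struct fields and enum are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what is known about one READDIR entry. */
struct prime_entry {
	bool have_fh;		/* the server returned a filehandle */
	bool fh_matches;	/* ...and it matches the cached inode */
	bool fileid_matches;	/* fsid+fileid combination matches */
	bool has_submount;	/* something is mounted on this dentry */
};

enum prime_action { PRIME_REFRESH, PRIME_INVALIDATE, PRIME_SKIP };

static enum prime_action prime_decide(const struct prime_entry *e)
{
	if (e->has_submount)
		return PRIME_SKIP;		/* punt: never invalidate a mount point */
	if (e->fileid_matches && (!e->have_fh || e->fh_matches))
		return PRIME_REFRESH;		/* no filehandle needed to refresh */
	return PRIME_INVALIDATE;
}
```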



Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-04-30 Thread Alexei Starovoitov

On 4/30/15 3:52 AM, Wang Nan wrote:

This series of patches is an approach to integrate eBPF with perf.
After applying these patches, users can use the following
command to load an eBPF program compiled by LLVM into the kernel:

  $ perf bpf sample_bpf.o

The required BPF code and the loading procedure are similar to Alexei
Starovoitov's libbpf in samples/bpf, with the following exceptions:

  1. Section names are not required to start with 'kprobe/' or
 'kretprobe/'. Without such a prefix, any valid C variable name can be used.

  2. A 'config' section can be provided to describe the position and
 arguments of a program. The syntax is identical to that of 'perf probe'.

An example is pasted at the bottom of this cover letter. In that
example, mybpfprog is configured by a string in the config section and will
be probed at __alloc_pages_nodemask. sample_bpf.o is generated using:

  $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \
 -Wno-unused-value -Wno-pointer-sign \
 -O2 -emit-llvm -c sample_bpf.c -o -| $LLC -march=bpf -filetype=obj -o \
 sample_bpf.o

And can be loaded using:

  $ perf bpf sample_bpf.o

This series provides only limited functionality. The following items are on
the todo list:

  1. Unprobe kprobe stubs used by eBPF programs when unloading;

  2. Enable eBPF programs to access local variables and arguments
 by utilizing debuginfo;

  3. Output data in perf way.

In this series:

Patch 1/22 is a bugfix in perf probe, which may be triggered by the following
patches;

Patches 2-3/22 are preparation: they add the required macros and syscall
definitions to the perf source tree.

Patch 4/22 adds the 'perf bpf' command.

Patches 5-20/22 do the bulk of the work: they parse the ELF object file,
collect information from it, create the maps needed by programs, link maps
and programs, configure programs and load them into the kernel.

Patches 21-22/22 finish the job. Patch 21 creates the kprobe points which
will be used by eBPF programs; patch 22 creates perf file descriptors and
then attaches the eBPF programs to them.


I'm very happy to see this work. Looks great. All patches are 
impressively clean and concise.

I think patches 1-3 are ready to go into Arnaldo's perf tree right now.
4 and above are clean and polished, but probably need to go into
some 'staging area' like a branch of the perf tree, since I suspect the
user interface may change a little in the coming months and it's
a bit too early to expose the 'perf bpf' command to every perf user?
Arnaldo, Ingo, what do you guys think should be the arrangement?
'perf/bpf' branch in acme/linux.git or in tip/tip.git ?

I have a few comments on patches 18 and 19, but let's figure out
the long term plan first.

We're also working in parallel on a new tracing language that,
together with the llvm backend, can be used as a single shared library
callable from perf or anything else.
Then the clang compilation step will be gone and programs can be run
as 'perf bpf file.bpf'.

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 net-next 1/1] hv_netvsc: Use the xmit_more skb flag to optimize signaling the host

2015-04-30 Thread Eric Dumazet
On Thu, 2015-04-30 at 16:29 -0700, K. Y. Srinivasan wrote:
> Based on the information given to this driver (via the xmit_more skb flag),
> we can defer signaling the host if more packets are on the way. This will help
> make the host more efficient since it can potentially process a larger batch
> of packets. Implement this optimization.
> 
> Signed-off-by: K. Y. Srinivasan 
> ---
>   v2: Fixed up indentation based on feedback from David Miller.
> 
>  drivers/net/hyperv/netvsc.c |   20 
>  1 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 2e8ad06..5fdc5e1 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -743,6 +743,7 @@ static inline int netvsc_send_pkt(
>   u64 req_id;
>   int ret;
>   struct hv_page_buffer *pgbuf;
> + u32 vmbus_flags = VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED;
>  
>   nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
>   if (packet->is_data_pkt) {
> @@ -772,19 +773,22 @@ static inline int netvsc_send_pkt(
>   if (packet->page_buf_cnt) {
>   pgbuf = packet->cp_partial ? packet->page_buf +
>   packet->rmsg_pgcnt : packet->page_buf;
> - ret = vmbus_sendpacket_pagebuffer(out_channel,
> -   pgbuf,
> -   packet->page_buf_cnt,
> -   &nvmsg,
> -   sizeof(struct nvsp_message),
> -   req_id);
> + ret = vmbus_sendpacket_pagebuffer_ctl(out_channel,
> +   pgbuf,
> +   packet->page_buf_cnt,
> +   &nvmsg,
> +   sizeof(struct
> +  nvsp_message),
> +   req_id,
> +   vmbus_flags,
> +   !packet->xmit_more);
>   } else {
> - ret = vmbus_sendpacket(
> + ret = vmbus_sendpacket_ctl(
>   out_channel, &nvmsg,
>   sizeof(struct nvsp_message),
>   req_id,
>   VM_PKT_DATA_INBAND,
> - VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
> + vmbus_flags, !packet->xmit_more);
>   }
>  
>   if (ret == 0) {

This might be problematic if the queue is stopped (netif_tx_stop_queue()).

You need to force a kick if we are about to stop the queue.

Random example: commit ddd0ca5d60b350bbfbfb60b25885a9779ce6d6c7
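Eric's point can be illustrated with a small user-space model. This is a hedged sketch, not netvsc code: it assumes a toy TX ring with a fixed number of slots, and every name in it (struct toy_txq, toy_xmit(), ring_doorbell(), TOY_RING_SLOTS) is hypothetical. Signaling is deferred while xmit_more is set, but forced when the ring fills and the queue is about to be stopped, because no later transmit call would arrive to deliver the kick.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING_SLOTS 4 /* hypothetical ring capacity */

/* Hypothetical model of a TX queue shared with the host. */
struct toy_txq {
	int  used;      /* slots holding queued or in-flight packets */
	int  pending;   /* packets queued since the last doorbell */
	int  doorbells; /* number of times the host was signaled */
	bool stopped;   /* models netif_tx_stop_queue() having been called */
};

static void ring_doorbell(struct toy_txq *q)
{
	q->doorbells++;
	q->pending = 0;
}

/*
 * Queue one packet. The doorbell is skipped while xmit_more promises
 * more packets, unless the ring is now full: the queue is about to be
 * stopped, so no further xmit call will come along to kick the host.
 */
static int toy_xmit(struct toy_txq *q, bool xmit_more)
{
	if (q->stopped)
		return -1; /* caller must not transmit on a stopped queue */

	q->used++;
	q->pending++;

	bool must_stop = (q->used == TOY_RING_SLOTS);

	if (!xmit_more || must_stop)
		ring_doorbell(q); /* forced kick before stopping */

	if (must_stop)
		q->stopped = true;
	return 0;
}
```

With four back-to-back packets all flagged xmit_more, the host is signaled exactly once, on the packet that fills the ring; without the forced kick it would never be signaled.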





Re: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2

2015-04-30 Thread Alexey Kardashevskiy

On 04/30/2015 04:55 PM, David Gibson wrote:

On Sat, Apr 25, 2015 at 10:14:53PM +1000, Alexey Kardashevskiy wrote:

The existing implementation accounts the whole DMA window in
the locked_vm counter. This is going to be worse with multiple
containers and huge DMA windows. Also, real-time accounting would require
additional tracking of accounted pages due to the page size difference:
the IOMMU uses 4K pages and the system uses 4K or 64K pages.

Another issue is that actual page pinning/unpinning happens on every
DMA map/unmap request. This does not affect performance much now, as
we spend far more time switching context between
guest/userspace/host, but it will start to matter when we add in-kernel
DMA map/unmap acceleration.

This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
The new IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
2 new ioctls to register/unregister DMA memory -
VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
which receive the user space address and size of a memory region which
needs to be pinned/unpinned and counted in locked_vm.
The new IOMMU splits physical page pinning and TCE table updates into 2
separate operations. It requires 1) guest pages to be registered first and
2) subsequent map/unmap requests to work only with pre-registered memory.
For the default single window case this means that the entire guest
(instead of 2GB) needs to be pinned before using VFIO.
When a huge DMA window is added, no additional pinning will be
required; otherwise it would be guest RAM + 2GB.

The new memory registration ioctls are not supported by
VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
will require memory to be preregistered in order to work.

The accounting is done per user process.

This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
can do with v1 or v2 IOMMUs.
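The registration/mapping split described above can be modelled roughly as follows. This is a hedged user-space sketch, not the kernel's implementation or the VFIO ioctl ABI; the fixed-size region table, the chosen error codes and all function names (toy_register(), toy_map(), toy_unmap(), toy_unregister()) are hypothetical. It encodes three rules from this thread: accounting happens once at registration, a map only checks that the address lies in a registered block, and unregistration needs the exact address and size and is refused while mappings remain.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_REGIONS 8 /* hypothetical limit */

struct toy_region {
	uint64_t ua;     /* userspace address of the registered block */
	uint64_t size;   /* size in bytes */
	unsigned mapped; /* live DMA mappings pointing into this block */
	int      in_use;
};

static struct toy_region regions[TOY_MAX_REGIONS];
static uint64_t locked_bytes; /* stands in for locked_vm accounting */

/* Register: pin and account the whole block once, up front. */
static int toy_register(uint64_t ua, uint64_t size)
{
	for (int i = 0; i < TOY_MAX_REGIONS; i++) {
		if (!regions[i].in_use) {
			regions[i] = (struct toy_region){ ua, size, 0, 1 };
			locked_bytes += size;
			return 0;
		}
	}
	return -ENOMEM;
}

/* Map: no pinning, only check [ua, ua + len) is inside a registered block. */
static int toy_map(uint64_t ua, uint64_t len)
{
	for (int i = 0; i < TOY_MAX_REGIONS; i++) {
		struct toy_region *r = &regions[i];

		if (r->in_use && ua >= r->ua && len <= r->size &&
		    ua - r->ua <= r->size - len) {
			r->mapped++;
			return i; /* region index, used later by toy_unmap() */
		}
	}
	return -EFAULT;
}

static void toy_unmap(int idx)
{
	regions[idx].mapped--;
}

/* Unregister: exact match only (no splitting), and only when unused. */
static int toy_unregister(uint64_t ua, uint64_t size)
{
	for (int i = 0; i < TOY_MAX_REGIONS; i++) {
		struct toy_region *r = &regions[i];

		if (r->in_use && r->ua == ua && r->size == size) {
			if (r->mapped)
				return -EBUSY;
			r->in_use = 0;
			locked_bytes -= size;
			return 0;
		}
	}
	return -ENOENT;
}
```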

Signed-off-by: Alexey Kardashevskiy 
[aw: for the vfio related changes]
Acked-by: Alex Williamson 
---
Changes:
v9:
* s/tce_get_hva_cached/tce_iommu_use_page_v2/

v7:
* now memory is registered per mm (i.e. process)
* moved memory registration code to powerpc/mmu
* merged "vfio: powerpc/spapr: Define v2 IOMMU" into this
* limited new ioctls to v2 IOMMU
* updated doc
* unsupported ioctls return -ENOTTY instead of -EPERM

v6:
* tce_get_hva_cached() returns hva via a pointer

v4:
* updated docs
* s/kzmalloc/vzalloc/
* in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
replaced offset with index
* renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
and removed duplicating vfio_iommu_spapr_register_memory
---
  Documentation/vfio.txt  |  23 
  drivers/vfio/vfio_iommu_spapr_tce.c | 230 +++-
  include/uapi/linux/vfio.h   |  27 +
  3 files changed, 274 insertions(+), 6 deletions(-)

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 96978ec..94328c8 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed:



+5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
+VFIO_IOMMU_DISABLE and implements 2 new ioctls:
+VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
+(which are unsupported in v1 IOMMU).


A summary of the semantic differences between v1 and v2 would be nice.
At this point it's not really clear to me if there's a case for
creating v2, or if this could just be done by adding (optional)
functionality to v1.


v1: memory preregistration is not supported; explicit enable/disable ioctls
are required.


v2: memory preregistration is required; explicit enable/disable are
prohibited (as they are not needed).


Mixing these in one IOMMU type caused a lot of problems, such as: should I
increment locked_vm by the 32-bit window size on enable() or not; what do I
do about page pinning on map/unmap (check whether the address is from
registered memory and skip pinning?).


Having 2 IOMMU models makes everything a lot simpler.
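One way to read this summary is as a per-model table of which operations are accepted. In this hedged sketch the enum values and spapr_cmd_allowed() are hypothetical stand-ins, not the real definitions in include/uapi/linux/vfio.h; returning -ENOTTY follows the v7 changelog note that unsupported ioctls return -ENOTTY.

```c
#include <assert.h>
#include <errno.h>

enum spapr_model { SPAPR_TCE_V1, SPAPR_TCE_V2 };

/* Hypothetical command identifiers standing in for the real ioctls. */
enum spapr_cmd {
	CMD_IOMMU_ENABLE,
	CMD_IOMMU_DISABLE,
	CMD_REGISTER_MEMORY,
	CMD_UNREGISTER_MEMORY,
	CMD_MAP_DMA,
};

/* Returns 0 if the command is accepted for this model, -ENOTTY if not. */
static int spapr_cmd_allowed(enum spapr_model model, enum spapr_cmd cmd)
{
	switch (cmd) {
	case CMD_IOMMU_ENABLE:
	case CMD_IOMMU_DISABLE:
		/* v1 needs explicit enable/disable; v2 does not use them. */
		return model == SPAPR_TCE_V1 ? 0 : -ENOTTY;
	case CMD_REGISTER_MEMORY:
	case CMD_UNREGISTER_MEMORY:
		/* Preregistration exists only in v2. */
		return model == SPAPR_TCE_V2 ? 0 : -ENOTTY;
	case CMD_MAP_DMA:
		/* Both map; v2 additionally requires preregistered memory. */
		return 0;
	}
	return -ENOTTY;
}
```

Keeping the two models separate means each branch answers the locked_vm and pinning questions exactly one way, rather than having to test at runtime which style of container it is serving.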



+PPC64 paravirtualized guests generate a lot of map/unmap requests,
+and the handling of those includes pinning/unpinning pages and updating
+mm::locked_vm counter to make sure we do not exceed the rlimit.
+The v2 IOMMU splits accounting and pinning into separate operations:
+
+- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
+receive a user space address and size of the block to be pinned.
+Bisecting is not supported and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY is expected to
+be called with the exact address and size used for registering
+the memory block. The userspace is not expected to call these often.
+The ranges are stored in a linked list in a VFIO container.
+
+- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
+IOMMU table and do not do pinning; instead these check that the userspace
+address is from pre-registered range.
+
+This separation h

linux-next: Tree for May 1

2015-04-30 Thread Stephen Rothwell
Hi all,

Changes since 20150430:

The sound-asoc tree lost its build failure.

The akpm-current tree gained a build failure for which I applied a
fix patch.

Non-merge commits (relative to Linus' tree): 1233
 1252 files changed, 78167 insertions(+), 22093 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm
defconfig.

Below is a summary of the state of the merge.

I am currently merging 214 trees (counting Linus' and 30 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (4a152c3913fb Merge tag 'pm+acpi-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging fixes/master (b94d525e58dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging kbuild-current/rc-fixes (c517d838eb7d Linux 4.0-rc1)
Merging arc-current/for-curr (e4140819dadc ARC: signal handling robustify)
Merging arm-current/fixes (6c5c2a01fcfd ARM: proc-arm94*.S: fix setup function)
Merging m68k-current/for-linus (b24f670b7f5b m68k/mac: Fix out-of-bounds array index in OSS IRQ source initialization)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors)
Merging mips-fixes/mips-fixes (1795cd9b3a91 Linux 3.16-rc5)
Merging powerpc-merge-mpe/fixes (68fc378ce332 Revert "powerpc/tm: Abort syscalls in active transactions")
Merging powerpc-merge/merge (c517d838eb7d Linux 4.0-rc1)
Merging sparc/master (acc455cffa75 sparc64: Setup sysfs to mark LDOM sockets, cores and threads correctly)
Merging net/master (e813bb2b955d net: fec: Fix RGMII-ID mode)
Merging ipsec/master (bdddbf6996c0 xfrm: fix a race in xfrm_state_lookup_byspi)
Merging sound-current/for-linus (0ae3aba2865a Merge tag 'asoc-v4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus)
Merging pci-current/for-linus (b787f68c36d4 Linux 4.1-rc1)
Merging wireless-drivers/master (414b7e3b9ce8 rtlwifi: rtl8192cu: Fix kernel deadlock)
Merging driver-core.current/driver-core-linus (b787f68c36d4 Linux 4.1-rc1)
Merging tty.current/tty-linus (96a5d18bc133 serial: 8250_pci: Add support for 16 port Exar boards)
Merging usb.current/usb-linus (0d3bba0287d4 cdc-acm: prevent infinite loop when parsing CDC headers.)
Merging usb-gadget-fixes/fixes (c94e289f195e usb: gadget: remove incorrect __init/__exit annotations)
Merging usb-serial-fixes/usb-linus (82ee3aeb9295 USB: visor: Match I330 phone more precisely)
Merging staging.current/staging-linus (b787f68c36d4 Linux 4.1-rc1)
Merging char-misc.current/char-misc-linus (f26443a8ab76 Merge tag 'extcon-fixes-for-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-linus)
Merging input-current/for-linus (48853389f206 Merge branch 'next' into for-linus)
Merging crypto-current/master (8c98ebd7a6ff crypto: img-hash - CRYPTO_DEV_IMGTEC_HASH should depend on HAS_DMA)
Merging ide/master (d681f1166919 ide: remove deprecated use of pci api)
Merging devicetree-current/devicetree/merge (41d9489319f2 drivers/of: Add empty ranges quirk for PA-Semi)
Merging rr-fixes/fixes (f47689345931 lguest: update help text.)
Merging vfio-fixes/for-linus (82a0eaab980a vfio-pci: Log device requests more verbosely)
Merging kselftest-fixes/fixes (67d8712dcc70 selftests: Fix build failures when invoked from kselftest target)
Merging drm-intel-fixes/for-linux-next-fixes (a04f90a33fab drm/i915/chv: Implement WaDisableShadowRegForCpd)
Merging asm-generic/master (643165c8bbc8 Merge tag 'uaccess_for_

[PATCH 3/3] Change all uses of JOBCTL_* from int to long

2015-04-30 Thread Palmer Dabbelt
c56fb6564dcd ("Fix a misaligned load inside ptrace_attach()") makes
jobctl an "unsigned long".  It makes sense to have the masks applied
to it match that type.  This is currently just a cosmetic change, but
it will prevent the mask from being unexpectedly truncated if we ever
end up with masks with more bits.

One instance of "signr" is an int, but I left this alone because the
mask ensures that it will never overflow.
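The truncation risk mentioned here can be shown in plain C. In this hedged sketch the bit number and the two helpers are hypothetical: an unsigned long mask survives an unsigned long parameter intact, while a bit above 31 is silently dropped when it passes through an unsigned int parameter on an LP64 target (the conversion is well defined and reduces the value modulo UINT_MAX + 1). With a plain int literal, 1 << 31 would also overflow int, which is undefined behaviour, so 1UL is the safer spelling for masks.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical bit number; the real JOBCTL_*_BIT values live in sched.h. */
#define DEMO_LISTENING_BIT 22
#define DEMO_LISTENING     (1UL << DEMO_LISTENING_BIT)

/* Old-style helper: mask parameter narrower than the jobctl field. */
static unsigned int take_uint_mask(unsigned int mask)
{
	return mask;
}

/* New-style helper: mask parameter matches an unsigned long jobctl. */
static unsigned long take_ulong_mask(unsigned long mask)
{
	return mask;
}
```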

Reviewed-by: Chris Metcalf 
Signed-off-by: Palmer Dabbelt 
---
 include/linux/sched.h | 18 +-
 kernel/signal.c   |  6 +++---
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 391827db0a2d..9251155bf27f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2077,22 +2077,22 @@ TASK_PFA_CLEAR(SPREAD_SLAB, spread_slab)
 #define JOBCTL_TRAPPING_BIT    21  /* switching to TRACED */
 #define JOBCTL_LISTENING_BIT   22  /* ptracer is listening for events */
 
-#define JOBCTL_STOP_DEQUEUED   (1 << JOBCTL_STOP_DEQUEUED_BIT)
-#define JOBCTL_STOP_PENDING    (1 << JOBCTL_STOP_PENDING_BIT)
-#define JOBCTL_STOP_CONSUME    (1 << JOBCTL_STOP_CONSUME_BIT)
-#define JOBCTL_TRAP_STOP       (1 << JOBCTL_TRAP_STOP_BIT)
-#define JOBCTL_TRAP_NOTIFY     (1 << JOBCTL_TRAP_NOTIFY_BIT)
-#define JOBCTL_TRAPPING        (1 << JOBCTL_TRAPPING_BIT)
-#define JOBCTL_LISTENING       (1 << JOBCTL_LISTENING_BIT)
+#define JOBCTL_STOP_DEQUEUED   (1UL << JOBCTL_STOP_DEQUEUED_BIT)
+#define JOBCTL_STOP_PENDING    (1UL << JOBCTL_STOP_PENDING_BIT)
+#define JOBCTL_STOP_CONSUME    (1UL << JOBCTL_STOP_CONSUME_BIT)
+#define JOBCTL_TRAP_STOP       (1UL << JOBCTL_TRAP_STOP_BIT)
+#define JOBCTL_TRAP_NOTIFY     (1UL << JOBCTL_TRAP_NOTIFY_BIT)
+#define JOBCTL_TRAPPING        (1UL << JOBCTL_TRAPPING_BIT)
+#define JOBCTL_LISTENING       (1UL << JOBCTL_LISTENING_BIT)
 
 #define JOBCTL_TRAP_MASK       (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK    (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
 extern bool task_set_jobctl_pending(struct task_struct *task,
-   unsigned int mask);
+   unsigned long mask);
 extern void task_clear_jobctl_trapping(struct task_struct *task);
 extern void task_clear_jobctl_pending(struct task_struct *task,
- unsigned int mask);
+ unsigned long mask);
 
 static inline void rcu_copy_process(struct task_struct *p)
 {
diff --git a/kernel/signal.c b/kernel/signal.c
index d51c5ddd855c..f19833b5db3c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -245,7 +245,7 @@ static inline void print_dropped_signal(int sig)
  * RETURNS:
  * %true if @mask is set, %false if made noop because @task was dying.
  */
-bool task_set_jobctl_pending(struct task_struct *task, unsigned int mask)
+bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
 {
BUG_ON(mask & ~(JOBCTL_PENDING_MASK | JOBCTL_STOP_CONSUME |
JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
@@ -297,7 +297,7 @@ void task_clear_jobctl_trapping(struct task_struct *task)
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
  */
-void task_clear_jobctl_pending(struct task_struct *task, unsigned int mask)
+void task_clear_jobctl_pending(struct task_struct *task, unsigned long mask)
 {
BUG_ON(mask & ~JOBCTL_PENDING_MASK);
 
@@ -2000,7 +2000,7 @@ static bool do_signal_stop(int signr)
struct signal_struct *sig = current->signal;
 
if (!(current->jobctl & JOBCTL_STOP_PENDING)) {
-   unsigned int gstop = JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME;
+   unsigned long gstop = JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME;
struct task_struct *t;
 
/* signr will be recorded in task->jobctl for retries */
-- 
2.0.5



[PATCH 1/3] Fix a misaligned load inside ptrace_attach()

2015-04-30 Thread Palmer Dabbelt
The misaligned load exception arises when running ptrace_attach() on
RISC-V (whose Linux port hasn't been upstreamed yet).  The problem is that
wait_on_bit() takes a void* but then proceeds to call test_bit(),
which takes a long*.  This allows an int-aligned pointer to be passed
to test_bit(), which promptly fails.  This will manifest on any other
asm-generic port where unaligned loads trap, where sizeof(long) >
sizeof(int), and where task_struct.jobctl ends up not being
long-aligned.

This patch changes task_struct.jobctl to be a long, which ensures it
has the correct alignment.
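The alignment problem can be reproduced in miniature. This hedged sketch uses hypothetical two-field structs standing in for the relevant part of task_struct, and demo_test_bit() is written in the style of the generic C test_bit(), which reads whole unsigned long words. On an LP64 target the unsigned int field sits at offset 4, which is not a multiple of _Alignof(unsigned long), so a long-sized load from it would be misaligned; the unsigned long field is placed on a long boundary by the compiler.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * 8)

/*
 * In the style of the generic C test_bit(): addr is treated as an
 * array of unsigned long, so it must be long-aligned wherever
 * misaligned loads trap.
 */
static int demo_test_bit(unsigned long nr, const volatile unsigned long *addr)
{
	return 1UL & (addr[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG));
}

/* Hypothetical excerpts of task_struct before and after patch 1/3. */
struct task_before {
	int          pdeath_signal;
	unsigned int jobctl;  /* only guaranteed int alignment */
};

struct task_after {
	int           pdeath_signal;
	unsigned long jobctl; /* compiler pads this onto a long boundary */
};
```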

Reviewed-by: Chris Metcalf 
Signed-off-by: Palmer Dabbelt 
---
 include/linux/sched.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 26a2e6122734..391827db0a2d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1369,7 +1369,7 @@ struct task_struct {
int exit_state;
int exit_code, exit_signal;
int pdeath_signal;  /*  The signal sent when the parent dies  */
-   unsigned int jobctl;/* JOBCTL_*, siglock protected */
+   unsigned long jobctl;   /* JOBCTL_*, siglock protected */
 
/* Used for emulating ABI behavior of previous Linux versions */
unsigned int personality;
-- 
2.0.5



[PATCH 2/3] Change wait_on_bit*() to take an unsigned long*, not a void*

2015-04-30 Thread Palmer Dabbelt
The implementations of wait_on_bit*() will only work with long-aligned
memory on systems that don't support misaligned loads and stores.
This patch changes the function prototypes to ensure that the compiler
will enforce alignment.
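This patch relies on the compiler's pointer type checking, which a void * parameter bypasses. The C11 _Generic sketch here makes that check explicit; IS_LONG_WORD_PTR() and the demo structs are hypothetical and not part of the patch. With the new prototypes, passing the address of an unsigned int field to wait_on_bit() draws an incompatible-pointer-type diagnostic instead of being silently accepted.

```c
#include <assert.h>

/* Hypothetical check: 1 if p has the type the new prototypes require. */
#define IS_LONG_WORD_PTR(p) _Generic((p), unsigned long *: 1, default: 0)

struct demo_old { unsigned int  jobctl; }; /* would be rejected */
struct demo_new { unsigned long jobctl; }; /* matches unsigned long * */
```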

Running

  make defconfig
  make KFLAGS="-Werror"

seems to indicate that, as of c56fb6564dcd ("Fix a misaligned load
inside ptrace_attach()"), there are now no users of non-long-aligned
calls to wait_on_bit*().  I additionally tried a few "make randconfig"
attempts, none of which failed to compile for this reason.

Reviewed-by: Chris Metcalf 
Signed-off-by: Palmer Dabbelt 
---
 include/linux/wait.h | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 2db83349865b..d69ac4ecc88b 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -969,7 +969,7 @@ extern int bit_wait_io_timeout(struct wait_bit_key *);
  * on that signal.
  */
 static inline int
-wait_on_bit(void *word, int bit, unsigned mode)
+wait_on_bit(unsigned long *word, int bit, unsigned mode)
 {
might_sleep();
if (!test_bit(bit, word))
@@ -994,7 +994,7 @@ wait_on_bit(void *word, int bit, unsigned mode)
  * on that signal.
  */
 static inline int
-wait_on_bit_io(void *word, int bit, unsigned mode)
+wait_on_bit_io(unsigned long *word, int bit, unsigned mode)
 {
might_sleep();
if (!test_bit(bit, word))
@@ -1020,7 +1020,8 @@ wait_on_bit_io(void *word, int bit, unsigned mode)
  * received a signal and the mode permitted wakeup on that signal.
  */
 static inline int
-wait_on_bit_timeout(void *word, int bit, unsigned mode, unsigned long timeout)
+wait_on_bit_timeout(unsigned long *word, int bit, unsigned mode,
+   unsigned long timeout)
 {
might_sleep();
if (!test_bit(bit, word))
@@ -1047,7 +1048,8 @@ wait_on_bit_timeout(void *word, int bit, unsigned mode, unsigned long timeout)
  * on that signal.
  */
 static inline int
-wait_on_bit_action(void *word, int bit, wait_bit_action_f *action, unsigned mode)
+wait_on_bit_action(unsigned long *word, int bit, wait_bit_action_f *action,
+  unsigned mode)
 {
might_sleep();
if (!test_bit(bit, word))
@@ -1075,7 +1077,7 @@ wait_on_bit_action(void *word, int bit, wait_bit_action_f *action, unsigned mode
  * the @mode allows that signal to wake the process.
  */
 static inline int
-wait_on_bit_lock(void *word, int bit, unsigned mode)
+wait_on_bit_lock(unsigned long *word, int bit, unsigned mode)
 {
might_sleep();
if (!test_and_set_bit(bit, word))
@@ -1099,7 +1101,7 @@ wait_on_bit_lock(void *word, int bit, unsigned mode)
  * the @mode allows that signal to wake the process.
  */
 static inline int
-wait_on_bit_lock_io(void *word, int bit, unsigned mode)
+wait_on_bit_lock_io(unsigned long *word, int bit, unsigned mode)
 {
might_sleep();
if (!test_and_set_bit(bit, word))
@@ -1125,7 +1127,8 @@ wait_on_bit_lock_io(void *word, int bit, unsigned mode)
  * the @mode allows that signal to wake the process.
  */
 static inline int
-wait_on_bit_lock_action(void *word, int bit, wait_bit_action_f *action, unsigned mode)
+wait_on_bit_lock_action(unsigned long *word, int bit, wait_bit_action_f *action,
+   unsigned mode)
 {
might_sleep();
if (!test_and_set_bit(bit, word))
-- 
2.0.5



[PATCH 0/3] Fix a misaligned load inside ptrace_attach()

2015-04-30 Thread Palmer Dabbelt
I ran across what I believe is a bug in some asm-generic code while
working on the RISC-V Linux port.  Essentially the problem is that
wait_on_bit() takes a void *, but then performs long-aligned
operations.  As far as I can tell, this bug could manifest on any other
architecture that doesn't support misaligned operations and uses this
particular asm-generic implementation.

The patch set is split into three parts:

* #1 fixes the bug by making task_struct.jobctl an unsigned long,
   which ensures wait_on_bit() always ends up with a long-aligned
   argument.

* #2 changes the prototype of wait_on_bit() and friends to take an
   "unsigned long *" instead of a "void *", with the intent of
   ensuring these problems don't happen again.

* #3 is a bit more intrusive: it goes and changes all uses of
   task_struct.jobctl from int to long.

I'm not sure if #3 has gone too far, but I think #1 and #2 are sane.
The cost is making task_struct larger on machines where
sizeof(long)>sizeof(int), but since it's so big already this isn't too
much cost.  I thought about making test_bit() perform byte-aligned
accesses to avoid this cost, but since there are very similar looking
atomic functions I thought that would be too odd.



linux-next: build failure after merge of the akpm-current tree

2015-04-30 Thread Stephen Rothwell
Hi Andrew,

After merging the akpm tree, today's linux-next build (sparc defconfig)
failed like this:

mm/bootmem.c: In function 'free_all_bootmem_core':
mm/bootmem.c:237:32: error: 'cur' undeclared (first use in this function)
   __free_pages_bootmem(page++, cur++, 0);
^

Caused by commit "mm: page_alloc: pass PFN to __free_pages_bootmem".
This only happens because CONFIG_NO_BOOTMEM is *not* set (it is set on
powerpc, x86, arm and sparc64).  Clearly it was never built for this
config.  :-(

Reverting would be a real pain, so I added this (probably incorrect)
patch to make it build:

From: Stephen Rothwell 
Date: Fri, 1 May 2015 14:21:08 +1000
Subject: [PATCH] mm: page_alloc: pass PFN to __free_pages_bootmem fix

Signed-off-by: Stephen Rothwell 
---
 mm/bootmem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/bootmem.c b/mm/bootmem.c
index daf956bb4782..0a0eb62b1c92 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -173,6 +173,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 {
struct page *page;
unsigned long *map, start, end, pages, count = 0;
+   unsigned long cur;
 
if (!bdata->node_bootmem_map)
return 0;
@@ -214,7 +215,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
count += BITS_PER_LONG;
start += BITS_PER_LONG;
} else {
-   unsigned long cur = start;
+   cur = start;
 
start = ALIGN(start + 1, BITS_PER_LONG);
while (vec && cur != start) {
@@ -233,6 +234,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
pages = bdata->node_low_pfn - bdata->node_min_pfn;
pages = bootmem_bootmap_pages(pages);
count += pages;
+   cur = bdata->node_min_pfn;
while (pages--)
__free_pages_bootmem(page++, cur++, 0);
 
-- 
2.1.4

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

2015-04-30 Thread Aravind Gopalakrishnan


On 4/30/15 3:41 PM, Andy Lutomirski wrote:

On Thu, Apr 30, 2015 at 7:49 AM, Aravind Gopalakrishnan
 wrote:

Changes introduced in the patch-
   - Assign vector number 0xf4 for Deferred errors
   - Declare deferred_interrupt, allocate gate and bind it
 to DEFERRED_APIC_VECTOR.
   - Declare smp_deferred_interrupt to be used as the
 entry point for the interrupt in mce_amd.c
   - Define trace_deferred_interrupt for tracing
   - Enable deferred error interrupt selectively upon detection
 of 'succor' bitfield
   - Setup amd_deferred_error_interrupt() to handle the interrupt
 and assign it to def_int_vector if feature is present in HW.
 Else, let default handler deal with it.
   - Provide Deferred error interrupt stats on
 /proc/interrupts by incrementing irq_deferred_count

You're calling these "deferred interrupts" all over (e.g.
irq_deferred_count, deferred_int_handler, etc).  That seems like it'll
be confusing.  They're deferred errors, not deferred interrupts.



I used the term because it is an interrupt caused by the deferred error.
Would 'deferred_err_interrupt' be more apt? Maybe
'irq_deferred_error_count' for the counter?


Thanks,
-Aravind.


Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table

2015-04-30 Thread Alexey Kardashevskiy

On 04/29/2015 04:40 PM, David Gibson wrote:

On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote:

This adds a way for the IOMMU user to know how much a new table will
use so it can be accounted in the locked_vm limit before allocation
happens.

This stores the allocated table size in pnv_pci_create_table()
so the locked_vm counter can be updated correctly when a table is
being disposed.

This defines an iommu_table_group_ops callback to let VFIO know
how much memory will be locked if a table is created.
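How a caller might use such a callback before allocating can be sketched as follows, under stated assumptions: a single-level table only (one 8-byte TCE per IOMMU page), plus one unsigned long per entry for it_userspace, as in pnv_pci_ioda2_get_table_size() in this patch. The multi-level calculation in pnv_get_table_size() is not reproduced, and demo_table_size(), demo_try_account() and the 64K system page size are hypothetical choices for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 16 /* assumed 64K system pages */

/* Single-level approximation: TCE entries plus the userspace view. */
static uint64_t demo_table_size(uint32_t page_shift, uint64_t window_size)
{
	uint64_t entries = window_size >> page_shift;
	uint64_t tce_bytes = entries * sizeof(uint64_t);     /* one 64-bit TCE each */
	uint64_t ua_bytes = entries * sizeof(unsigned long); /* it_userspace */

	return tce_bytes + ua_bytes;
}

/* Charge the table against a locked-page limit before allocating it. */
static int demo_try_account(uint64_t *locked_pages, uint64_t limit_pages,
			    uint32_t page_shift, uint64_t window_size)
{
	uint64_t bytes = demo_table_size(page_shift, window_size);
	uint64_t pages = (bytes + (1ULL << DEMO_PAGE_SHIFT) - 1) >> DEMO_PAGE_SHIFT;

	if (*locked_pages + pages > limit_pages)
		return -ENOMEM;
	*locked_pages += pages;
	return 0;
}
```

For a 2GB window with 4K IOMMU pages there are 2^19 entries, so on a 64-bit host this comes to 4MB of TCEs plus 4MB of userspace view, or 128 64K pages charged before anything is allocated.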

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v9:
* reimplemented the whole patch
---
  arch/powerpc/include/asm/iommu.h  |  5 +
  arch/powerpc/platforms/powernv/pci-ioda.c | 14 
  arch/powerpc/platforms/powernv/pci.c  | 36 +++
  arch/powerpc/platforms/powernv/pci.h  |  2 ++
  4 files changed, 57 insertions(+)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 1472de3..9844c106 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -99,6 +99,7 @@ struct iommu_table {
unsigned long  it_size;  /* Size of iommu table in entries */
unsigned long  it_indirect_levels;
unsigned long  it_level_size;
+   unsigned long  it_allocated_size;
unsigned long  it_offset;/* Offset into global table */
unsigned long  it_base;  /* mapped address of tce table */
unsigned long  it_index; /* which iommu table this is */
@@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
  struct iommu_table_group;

  struct iommu_table_group_ops {
+   unsigned long (*get_table_size)(
+   __u32 page_shift,
+   __u64 window_size,
+   __u32 levels);
long (*create_table)(struct iommu_table_group *table_group,
int num,
__u32 page_shift,
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e0be556..7f548b4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
  }

  #ifdef CONFIG_IOMMU_API
+static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift,
+   __u64 window_size, __u32 levels)
+{
+   unsigned long ret = pnv_get_table_size(page_shift, window_size, levels);
+
+   if (!ret)
+   return ret;
+
+   /* Add size of it_userspace */
+   return ret + (window_size >> page_shift) * sizeof(unsigned long);


This doesn't make much sense.  The userspace view can't possibly be a
property of the specific low-level IOMMU model.



This it_userspace thing is all about memory preregistration.

I need some way to track how many actual mappings the
mm_iommu_table_group_mem_t has in order to decide whether to allow
unregistering or not.


When I clear a TCE, I can read the old value, which is a host physical
address; I cannot use it to find the preregistered region and adjust the
mappings counter. I can only use userspace addresses for this (not even
guest physical addresses, as this is VFIO and there is probably no KVM).


So I have to keep userspace addresses somewhere, one per IOMMU page, and
the iommu_table seems a natural place for this.









+}
+
  static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
int num, __u32 page_shift, __u64 window_size, __u32 levels,
struct iommu_table *tbl)
@@ -2086,6 +2098,7 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,

BUG_ON(tbl->it_userspace);
tbl->it_userspace = uas;
+   tbl->it_allocated_size += uas_cb;
tbl->it_ops = &pnv_ioda2_iommu_ops;
if (pe->tce_inval_reg)
tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
@@ -2160,6 +2173,7 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
  }

  static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
+   .get_table_size = pnv_pci_ioda2_get_table_size,
.create_table = pnv_pci_ioda2_create_table,
.set_window = pnv_pci_ioda2_set_window,
.unset_window = pnv_pci_ioda2_unset_window,
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index fc129c4..1b5b48a 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -662,6 +662,38 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
tbl->it_type = TCE_PCI;
  }

+unsigned long pnv_get_table_size(__u32 page_shift,
+   __u64 window_size, __u32 levels)
+{
+   unsigned long bytes = 0;
+   const unsigned window_shift = ilog2(window_size);
+   unsigned entries_shift = window_shift - page_shift;
+   unsigned table_shift =

Re: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table

2015-04-30 Thread Alexey Kardashevskiy

On 04/29/2015 04:31 PM, David Gibson wrote:

On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote:

In order to support memory pre-registration, we need a way to track
the use of every registered memory region and only allow unregistration
if a region is not in use anymore. So we need a way to tell which
region the just-cleared TCE belonged to.

This adds a userspace view of the TCE table into the iommu_table struct.
It contains one userspace address per TCE entry. The table is only
allocated when ownership of an IOMMU group is taken, which means
it is only used from outside the powernv code (such as VFIO).

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v9:
* fixed code flow in error cases added in v8

v8:
* added ENOMEM on failed vzalloc()
---
  arch/powerpc/include/asm/iommu.h  |  6 ++
  arch/powerpc/kernel/iommu.c   | 18 ++
  arch/powerpc/platforms/powernv/pci-ioda.c | 22 --
  3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7694546..1472de3 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -111,9 +111,15 @@ struct iommu_table {
unsigned long *it_map;   /* A simple allocation bitmap for now */
unsigned long  it_page_shift;/* table iommu page size */
struct iommu_table_group *it_table_group;
+   unsigned long *it_userspace; /* userspace view of the table */


A single unsigned long doesn't seem like enough.


Why single? This is an array.


 How do you know
which process's address space this address refers to?


It is a current task. Multiple userspaces cannot use the same container/tables.




struct iommu_table_ops *it_ops;
  };

+#define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
+   ((tbl)->it_userspace ? \
+   &((tbl)->it_userspace[(entry) - (tbl)->it_offset]) : \
+   NULL)
+
  /* Pure 2^n version of get_order */
  static inline __attribute_const__
  int get_iommu_order(unsigned long size, struct iommu_table *tbl)
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 2eaba0c..74a3f52 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -38,6 +38,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -739,6 +740,8 @@ void iommu_reset_table(struct iommu_table *tbl, const char *node_name)
free_pages((unsigned long) tbl->it_map, order);
}

+   WARN_ON(tbl->it_userspace);
+
memset(tbl, 0, sizeof(*tbl));
  }

@@ -1016,6 +1019,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
  {
unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
int ret = 0;
+   unsigned long *uas;

/*
 * VFIO does not control TCE entries allocation and the guest
@@ -1027,6 +1031,10 @@ int iommu_take_ownership(struct iommu_table *tbl)
if (!tbl->it_ops->exchange)
return -EINVAL;

+   uas = vzalloc(sizeof(*uas) * tbl->it_size);
+   if (!uas)
+   return -ENOMEM;
+
spin_lock_irqsave(&tbl->large_pool.lock, flags);
for (i = 0; i < tbl->nr_pools; i++)
spin_lock(&tbl->pools[i].lock);
@@ -1044,6 +1052,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
memset(tbl->it_map, 0xff, sz);
}

+   if (ret) {
+   vfree(uas);
+   } else {
+   BUG_ON(tbl->it_userspace);
+   tbl->it_userspace = uas;
+   }
+
for (i = 0; i < tbl->nr_pools; i++)
spin_unlock(&tbl->pools[i].lock);
spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
@@ -1056,6 +1071,9 @@ void iommu_release_ownership(struct iommu_table *tbl)
  {
unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;

+   vfree(tbl->it_userspace);
+   tbl->it_userspace = NULL;
+
spin_lock_irqsave(&tbl->large_pool.lock, flags);
for (i = 0; i < tbl->nr_pools; i++)
spin_lock(&tbl->pools[i].lock);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 45bc131..e0be556 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -25,6 +25,7 @@
  #include 
  #include 
  #include 
+#include 

  #include 
  #include 
@@ -1827,6 +1828,14 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
  }

+void pnv_pci_ioda2_free_table(struct iommu_table *tbl)
+{
+   vfree(tbl->it_userspace);
+   tbl->it_userspace = NULL;
+
+   pnv_pci_free_table(tbl);
+}
+
  static struct iommu_table_ops pnv_ioda2_iommu_ops = {
.set = pnv_ioda2_tce_build,
  #ifdef CONFIG_IOMMU_API
@@ -1834,7 +1843,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
  #
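
A minimal user-space model of the per-entry userspace view, using simplified stand-ins for the relevant iommu_table fields. demo_userspace_entry() mirrors IOMMU_TABLE_USERSPACE_ENTRY: it indexes relative to it_offset and returns NULL before ownership is taken. The bounds assert is an addition for illustration. calloc()/free() stand in for vzalloc()/vfree(), and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant iommu_table fields */
struct demo_table {
	unsigned long it_offset;     /* first entry number covered by the table */
	unsigned long it_size;       /* number of entries */
	unsigned long *it_userspace; /* one userspace address per entry, or NULL */
};

/* Mirrors IOMMU_TABLE_USERSPACE_ENTRY: NULL until ownership is taken */
static unsigned long *demo_userspace_entry(struct demo_table *tbl, unsigned long entry)
{
	if (!tbl->it_userspace)
		return NULL;
	/* Bounds check added for the sketch; the macro does not have one */
	assert(entry >= tbl->it_offset && entry - tbl->it_offset < tbl->it_size);
	return &tbl->it_userspace[entry - tbl->it_offset];
}

/* calloc() plays the role of vzalloc() in iommu_take_ownership() */
static int demo_take_ownership(struct demo_table *tbl)
{
	unsigned long *uas = calloc(tbl->it_size, sizeof(*uas));

	if (!uas)
		return -1;
	tbl->it_userspace = uas;
	return 0;
}

/* free() plays the role of vfree() in iommu_release_ownership() */
static void demo_release_ownership(struct demo_table *tbl)
{
	free(tbl->it_userspace);
	tbl->it_userspace = NULL;
}
```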

Re: [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache

2015-04-30 Thread David Gibson
On Thu, Apr 30, 2015 at 06:25:25PM +1000, Paul Mackerras wrote:
> On Thu, Apr 30, 2015 at 04:34:55PM +1000, David Gibson wrote:
> > On Sat, Apr 25, 2015 at 10:14:52PM +1000, Alexey Kardashevskiy wrote:
> > > We are adding support for DMA memory pre-registration to be used in
> > > conjunction with VFIO. The idea is that the userspace which is going to
> > > run a guest may want to pre-register a user space memory region so
> > > it all gets pinned once and never goes away. Having this done,
> > > a hypervisor will not have to pin/unpin pages on every DMA map/unmap
> > > request. This is going to help with multiple pinning of the same memory
> > > and in-kernel acceleration of DMA requests.
> > > 
> > > This adds a list of memory regions to mm_context_t. Each region consists
> > > of a header and a list of physical addresses. This adds API to:
> > > 1. register/unregister memory regions;
> > > 2. do final cleanup (which puts all pre-registered pages);
> > > 3. do userspace to physical address translation;
> > > 4. manage a mapped pages counter; when it is zero, it is safe to
> > > unregister the region.
> > > 
> > > Multiple registration of the same region is allowed, kref is used to
> > > track the number of registrations.
> > 
> > [snip]
> > > +long mm_iommu_alloc(unsigned long ua, unsigned long entries,
> > > + struct mm_iommu_table_group_mem_t **pmem)
> > > +{
> > > + struct mm_iommu_table_group_mem_t *mem;
> > > + long i, j;
> > > + struct page *page = NULL;
> > > +
> > > + list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
> > > + next) {
> > > + if ((mem->ua == ua) && (mem->entries == entries))
> > > + return -EBUSY;
> > > +
> > > + /* Overlap? */
> > > + if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
> > > + (ua < (mem->ua + (mem->entries << PAGE_SHIFT))))
> > > + return -EINVAL;
> > > + }
> > > +
> > > + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > > + if (!mem)
> > > + return -ENOMEM;
> > > +
> > > + mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
> > > + if (!mem->hpas) {
> > > + kfree(mem);
> > > + return -ENOMEM;
> > > + }
> > 
> > So, I've thought more about this and I'm really confused as to what
> > this is supposed to be accomplishing.
> > 
> > I see that you need to keep track of what regions are registered, so
> > you don't double lock or unlock, but I don't see what the point of
> > actually storing the translations in hpas is.
> > 
> > I had assumed it was so that you could later on get to the
> > translations in real mode when you do in-kernel acceleration.  But
> > that doesn't make sense, because the array is vmalloc()ed, so can't be
> > accessed in real mode anyway.
> 
> We can access vmalloc'd arrays in real mode using real_vmalloc_addr().

Ah, ok.

> > I can't think of a circumstance in which you can use hpas where you
> > couldn't just walk the page tables anyway.
> 
> The problem with walking the page tables is that there is no guarantee
> that the page you find that way is the page that was returned by the
> gup_fast() we did earlier.  Storing the hpas means that we know for
> sure that the page we're doing DMA to is one that we have an elevated
> page count on.
> 
> Also, there are various points where a Linux PTE is made temporarily
> invalid for a short time.  If we happened to do a H_PUT_TCE on one cpu
> while another cpu was doing that, we'd get a spurious failure returned
> by the H_PUT_TCE.

I think we want this explanation in the commit message.  And/or in a
comment somewhere, I'm not sure.
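
To illustrate Paul's point, here is a simplified sketch of a pre-registered region that records host physical addresses at pin time. A later lookup is then an array index rather than a page-table walk, and it always returns a page we hold a reference on. This is not the patch's code. It assumes a 4 KiB page size for the example, and the demo_* names are hypothetical. demo_overlaps() uses the same half-open interval test as mm_iommu_alloc().

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE (1UL << DEMO_PAGE_SHIFT)

/* Hypothetical, simplified analogue of mm_iommu_table_group_mem_t */
struct demo_region {
	unsigned long ua;      /* userspace start address, page aligned */
	unsigned long entries; /* number of pages */
	uint64_t *hpas;        /* host physical address of each pinned page */
};

/* Same test as mm_iommu_alloc(): do [ua, ua + size) ranges intersect? */
static int demo_overlaps(const struct demo_region *mem, unsigned long ua,
			 unsigned long entries)
{
	return (mem->ua < ua + (entries << DEMO_PAGE_SHIFT)) &&
	       (ua < mem->ua + (mem->entries << DEMO_PAGE_SHIFT));
}

/*
 * Translate a userspace address using the hpas recorded when the region
 * was pinned, instead of walking the page tables again (which might find
 * a different page or a transiently invalid PTE).
 */
static int demo_ua_to_hpa(const struct demo_region *mem, unsigned long ua,
			  uint64_t *hpa)
{
	unsigned long idx;

	assert(mem && hpa);
	if (ua < mem->ua)
		return -1;
	idx = (ua - mem->ua) >> DEMO_PAGE_SHIFT;
	if (idx >= mem->entries)
		return -1;
	/* Page address from the array, offset within the page from ua */
	*hpa = mem->hpas[idx] | (ua & (DEMO_PAGE_SIZE - 1));
	return 0;
}
```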

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH kernel v9 23/32] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

2015-04-30 Thread David Gibson
On Thu, Apr 30, 2015 at 07:56:17PM +1000, Alexey Kardashevskiy wrote:
> On 04/30/2015 02:37 PM, David Gibson wrote:
> >On Wed, Apr 29, 2015 at 07:44:20PM +1000, Alexey Kardashevskiy wrote:
> >>On 04/29/2015 03:30 PM, David Gibson wrote:
> >>>On Sat, Apr 25, 2015 at 10:14:47PM +1000, Alexey Kardashevskiy wrote:
> This extends iommu_table_group_ops by a set of callbacks to support
> dynamic DMA windows management.
> 
> create_table() creates a TCE table with specific parameters.
> It receives iommu_table_group to know the nodeid in order to allocate
> TCE table memory closer to the PHB. The exact format of the allocated
> multi-level table might also be specific to the PHB model (not
> the case now though).
> This callback calculates the DMA window offset on a PCI bus from @num
> and stores it in the just-created table.
> 
> set_window() sets the window at specified TVT index + @num on PHB.
> 
> unset_window() unsets the window from specified TVT.
> 
> This adds a free() callback to iommu_table_ops to free the memory
> (potentially a tree of tables) allocated for the TCE table.
> >>>
> >>>Doesn't the free callback belong with the previous patch introducing
> >>>multi-level tables?
> >>
> >>
> >>
> >>If I did that, you would say "why is it here if nothing calls it" on
> >>"multilevel" patch and "I see the allocation but I do not see memory
> >>release" ;)
> >
> >Yeah, fair enough ;)
> >
> >>I need some rule of thumb here. I think it is a bit cleaner if the same
> >>patch adds a callback for memory allocation and its counterpart, no?
> >
> >On further consideration, yes, I think you're right.
> >
> create_table() and free() are supposed to be called once per
> VFIO container and set_window()/unset_window() are supposed to be
> called for every group in a container.
> 
> This adds IOMMU capabilities to iommu_table_group such as default
> 32bit window parameters and others.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>   arch/powerpc/include/asm/iommu.h| 19 
>   arch/powerpc/platforms/powernv/pci-ioda.c   | 75 ++---
>   arch/powerpc/platforms/powernv/pci-p5ioc2.c | 12 +++--
>   3 files changed, 96 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 0f50ee2..7694546 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -70,6 +70,7 @@ struct iommu_table_ops {
>   /* get() returns a physical address */
>   unsigned long (*get)(struct iommu_table *tbl, long index);
>   void (*flush)(struct iommu_table *tbl);
> + void (*free)(struct iommu_table *tbl);
>   };
> 
>   /* These are used by VIO */
> @@ -148,6 +149,17 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
>   struct iommu_table_group;
> 
>   struct iommu_table_group_ops {
> + long (*create_table)(struct iommu_table_group *table_group,
> + int num,
> + __u32 page_shift,
> + __u64 window_size,
> + __u32 levels,
> + struct iommu_table *tbl);
> + long (*set_window)(struct iommu_table_group *table_group,
> + int num,
> + struct iommu_table *tblnew);
> + long (*unset_window)(struct iommu_table_group *table_group,
> + int num);
>   /*
>    * Switches ownership from the kernel itself to an external
>    * user. While ownership is taken, the kernel cannot use IOMMU itself.
> @@ -160,6 +172,13 @@ struct iommu_table_group {
>   #ifdef CONFIG_IOMMU_API
>   struct iommu_group *group;
>   #endif
> + /* Some key properties of IOMMU */
> + __u32 tce32_start;
> + __u32 tce32_size;
> + __u64 pgsizes; /* Bitmap of supported page sizes */
> + __u32 max_dynamic_windows_supported;
> + __u32 max_levels;
> >>>
> >>>With this information, table_group seems even more like a bad name.
> >>>"iommu_state" maybe?
> >>
> >>
> >>Please, no. We will never come to agreement then :( And "iommu_state" is too
> >>general anyway, it won't pass.
> >>
> >>
>   struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
>   struct iommu_table_group_ops *ops;
>   };
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index cc1d09c..4828837 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -24,6 +24,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
> 
>   #include 
>   #include 
> @@ -1846,6 +1847,7 @@ static struct iommu_table_ops pnv_ioda2
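
A toy model of the call pattern described above: create_table() and the table's free() run once per container, while set_window()/unset_window() run for every group. The demo_* types are hypothetical simplifications, not the kernel structures. A group's windows[] array stands in for its TVT slots. The error path unwinds the groups already programmed and then frees the table once.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_WINDOWS 2

struct demo_table {
	int users; /* groups that currently have this table programmed */
	int freed;
};

struct demo_group;

/* Hypothetical subset of iommu_table_group_ops */
struct demo_group_ops {
	long (*create_table)(struct demo_group *grp, int num, struct demo_table *tbl);
	long (*set_window)(struct demo_group *grp, int num, struct demo_table *tbl);
	long (*unset_window)(struct demo_group *grp, int num);
};

struct demo_group {
	const struct demo_group_ops *ops;
	struct demo_table *windows[DEMO_MAX_WINDOWS]; /* stands in for TVT slots */
};

static long demo_create_table(struct demo_group *grp, int num, struct demo_table *tbl)
{
	(void)grp;
	(void)num;
	tbl->users = 0;
	tbl->freed = 0;
	return 0;
}

static long demo_set_window(struct demo_group *grp, int num, struct demo_table *tbl)
{
	if (num < 0 || num >= DEMO_MAX_WINDOWS || grp->windows[num])
		return -1;
	grp->windows[num] = tbl;
	tbl->users++;
	return 0;
}

static long demo_unset_window(struct demo_group *grp, int num)
{
	if (num < 0 || num >= DEMO_MAX_WINDOWS || !grp->windows[num])
		return -1;
	grp->windows[num]->users--;
	grp->windows[num] = NULL;
	return 0;
}

/* Plays the role of iommu_table_ops::free(): called once per container */
static void demo_free_table(struct demo_table *tbl)
{
	tbl->freed = 1;
}

static const struct demo_group_ops demo_ops = {
	.create_table = demo_create_table,
	.set_window = demo_set_window,
	.unset_window = demo_unset_window,
};

/* One create_table() per container, then set_window() on every group */
static long demo_container_add_window(struct demo_group **groups, size_t n,
				      int num, struct demo_table *tbl)
{
	size_t i;
	long ret;

	assert(n > 0);
	ret = groups[0]->ops->create_table(groups[0], num, tbl);
	if (ret)
		return ret;
	for (i = 0; i < n; i++) {
		ret = groups[i]->ops->set_window(groups[i], num, tbl);
		if (ret) {
			/* Undo the groups already programmed, then free once */
			while (i--)
				groups[i]->ops->unset_window(groups[i], num);
			demo_free_table(tbl);
			return ret;
		}
	}
	return 0;
}
```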

Re: [PATCH] i2c: rk3x: Increase wait timeout to 1 second

2015-04-30 Thread Caesar Wang



On 2015-05-01 05:44, Doug Anderson wrote:

While it's not sensible for an i2c command to _actually_ need more
than 200ms to complete, let's increase the timeout anyway.  Why?  It
turns out that if you've got a large number of printks going out to a
serial console, interrupts on a CPU can be disabled for hundreds of
milliseconds. That's not a great situation to be in to start with
(maybe we should put a cap in vprintk_emit()) but it's pretty annoying
to start seeing unexplained i2c timeouts.

A normal system shouldn't see i2c timeouts anyway, so increasing the
timeout should help people debugging without hurting other people
excessively.

Signed-off-by: Doug Anderson 
---
  drivers/i2c/busses/i2c-rk3x.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/i2c/busses/i2c-rk3x.c b/drivers/i2c/busses/i2c-rk3x.c
index 019d542..72e97e30 100644
--- a/drivers/i2c/busses/i2c-rk3x.c
+++ b/drivers/i2c/busses/i2c-rk3x.c
@@ -72,7 +72,7 @@ enum {
  #define REG_INT_ALL   0x7f
  
  /* Constants */

-#define WAIT_TIMEOUT  200 /* ms */
+#define WAIT_TIMEOUT  1000 /* ms */

Yeah, verified on veyron device.

Tested-by: Caesar Wang 

Thanks.
Caesar


  #define DEFAULT_SCL_RATE  (100 * 1000) /* Hz */
  
  enum rk3x_i2c_state {



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v6] block: loop: avoiding too many pending per work I/O

2015-04-30 Thread Ming Lei
If there are too many pending per-work I/Os, too many
high-priority work threads can be created, and system
performance can be affected.

This patch limits the number of pending per-work I/Os to 16
and falls back to single-queue mode when that limit
is reached.
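
The throttling rule can be sketched with C11 atomics as follows. This is a simplified model, not the driver code. struct demo_loop and the demo_* functions are hypothetical. As in the patch, writes always take the single queue. Because the check and the increment are separate operations, the limit of 16 is a soft one, which is also true of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_MAX_PENDING_PER_WORK_IO 16

/* Hypothetical stand-in for the counter added to struct loop_device */
struct demo_loop {
	atomic_int pending_per_work_io;
};

/* Returns true if the request gets its own work item (per_work_io) */
static bool demo_queue_rq(struct demo_loop *lo, bool is_write)
{
	/* Writes always use the single queue, as before the patch */
	if (is_write)
		return false;
	/* Fall back to the single queue once 16 per-command works are pending */
	if (atomic_load(&lo->pending_per_work_io) >= DEMO_MAX_PENDING_PER_WORK_IO)
		return false;
	/* Otherwise give the request its own work item and count it */
	atomic_fetch_add(&lo->pending_per_work_io, 1);
	return true;
}

/* Completion path: only per-work commands decrement the counter */
static void demo_complete(struct demo_loop *lo, bool per_work_io)
{
	if (per_work_io) {
		assert(atomic_load(&lo->pending_per_work_io) > 0);
		atomic_fetch_sub(&lo->pending_per_work_io, 1);
	}
}
```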

This patch fixes a Fedora 22 live-boot performance
regression when booting from squashfs over dm
based on loop; the following reasons appear to be
related to the problem:

- unlike other filesystems (such as ext4), squashfs
is a bit special: I observed that increasing the number of
I/O jobs accessing files in squashfs improves I/O performance
only a little, whereas it makes a big difference for ext4

- nested loop: both squashfs.img and ext3fs.img are mounted
as loop block devices, and ext3fs.img is inside the squashfs

- during booting, lots of tasks may run concurrently

Fixes: b5dd2f6047ca108001328aac0e8588edd15f1778
Cc: sta...@vger.kernel.org (v4.0)
Reported-by: Justin M. Forbes 
Tested-by: Justin M. Forbes 
Signed-off-by: Ming Lei 
---
 drivers/block/loop.c | 19 +--
 drivers/block/loop.h |  2 ++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ae3fcb4..5a728c6 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1425,13 +1425,24 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
 {
struct loop_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
+   struct loop_device *lo = cmd->rq->q->queuedata;
+   bool single_queue = !!(cmd->rq->cmd_flags & REQ_WRITE);
+
+   /*
+    * Fall back to single queue mode if the pending per-work
+    * I/O count reaches 16; otherwise too many high-priority
+    * worker threads may affect system performance, as reported
+    * for Fedora live booting from squashfs over loop.
+    */
+   if (atomic_read(&lo->pending_per_work_io) >= 16)
+   single_queue = true;
 
blk_mq_start_request(bd->rq);
 
-   if (cmd->rq->cmd_flags & REQ_WRITE) {
-   struct loop_device *lo = cmd->rq->q->queuedata;
+   if (single_queue) {
bool need_sched = true;
 
+   cmd->per_work_io = false;
spin_lock_irq(&lo->lo_lock);
if (lo->write_started)
need_sched = false;
@@ -1443,6 +1454,8 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
if (need_sched)
queue_work(loop_wq, &lo->write_work);
} else {
+   atomic_inc(&lo->pending_per_work_io);
+   cmd->per_work_io = true;
queue_work(loop_wq, &cmd->read_work);
}
 
@@ -1467,6 +1480,8 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
if (ret)
cmd->rq->errors = -EIO;
blk_mq_complete_request(cmd->rq);
+   if (cmd->per_work_io)
+   atomic_dec(&lo->pending_per_work_io);
 }
 
 static void loop_queue_write_work(struct work_struct *work)
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 301c27f..eb855f5 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -57,6 +57,7 @@ struct loop_device {
struct list_head    write_cmd_head;
struct work_struct  write_work;
bool                write_started;
+   atomic_t            pending_per_work_io;
int                 lo_state;
struct mutex        lo_ctl_mutex;
 
@@ -68,6 +69,7 @@ struct loop_device {
 struct loop_cmd {
struct work_struct read_work;
struct request *rq;
+   bool per_work_io;
struct list_head list;
 };
 
-- 
1.9.1



[PATCH] fs: ext3: super: fixed a space coding style issue

2015-04-30 Thread Adir Kuhn
Fixed a coding style issue

Signed-off-by: Adir Kuhn 
---
 fs/ext3/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index a9312f0..5ed0044 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1908,7 +1908,7 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent)
sbi->s_mount_state = le16_to_cpu(es->s_state);
sbi->s_addr_per_block_bits = ilog2(EXT3_ADDR_PER_BLOCK(sb));
sbi->s_desc_per_block_bits = ilog2(EXT3_DESC_PER_BLOCK(sb));
-   for (i=0; i < 4; i++)
+   for (i = 0; i < 4; i++)
sbi->s_hash_seed[i] = le32_to_cpu(es->s_hash_seed[i]);
sbi->s_def_hash_version = es->s_def_hash_version;
i = le32_to_cpu(es->s_flags);
-- 
2.1.4



Re: [PATCH 01/11] drivers/crypto: include for modular caam code

2015-04-30 Thread Herbert Xu
On Thu, Apr 30, 2015 at 09:47:37PM -0400, Paul Gortmaker wrote:
> This file is built off of a tristate Kconfig option and also contains
> modular function calls so it should explicitly include module.h to
> avoid compile breakage during header shuffles done in the future.
> 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> Signed-off-by: Paul Gortmaker 

Please post patches to linux-cry...@vger.kernel.org if you want them
to go through the crypto tree.

Also it actually gets module.h through caam/compat.h.  So your
patch is unnecessary.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 1/2] crypto: Constify (de)compression parameters

2015-04-30 Thread Herbert Xu
On Tue, Apr 28, 2015 at 03:36:30PM +0100, David Howells wrote:
> In testmgr, struct pcomp_testvec takes a non-const 'params' field, which is
> pointed to a const deflate_comp_params or deflate_decomp_params object.  With
> gcc-5 this incurs the following warnings:
> 
> In file included from ../crypto/testmgr.c:44:0:
> ../crypto/testmgr.h:28736:13: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-array-qualifiers]
>   .params = &deflate_comp_params,
>             ^
> ../crypto/testmgr.h:28748:13: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-array-qualifiers]
>   .params = &deflate_comp_params,
>             ^
> ../crypto/testmgr.h:28776:13: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-array-qualifiers]
>   .params = &deflate_decomp_params,
>             ^
> ../crypto/testmgr.h:28800:13: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-array-qualifiers]
>   .params = &deflate_decomp_params,
>             ^
> 
> Fix this by making the parameters pointer const and constifying the things
> that use it.
> 
> Signed-off-by: David Howells 

Both patches applied.
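
A minimal illustration of the fix. The hypothetical demo_* types stand in for pcomp_testvec and the deflate parameter objects. Once the pointer member is const void *, pointing it at a const object no longer discards the qualifier. Consumers then have to accept a const pointer as well.

```c
#include <assert.h>

/* Hypothetical parameter object, standing in for deflate_comp_params */
struct demo_params {
	int level;
};

static const struct demo_params demo_comp_params = { .level = 9 };

/* Hypothetical test vector, standing in for struct pcomp_testvec */
struct demo_testvec {
	/* const here lets the initialiser below point at a const object */
	const void *params;
};

static const struct demo_testvec demo_vec = {
	.params = &demo_comp_params,
};

/* Consumers must take a const pointer too, or the qualifier is lost again */
static int demo_level(const void *params)
{
	const struct demo_params *p = params;

	assert(p != 0);
	return p->level;
}
```
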
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v3 4/6] crypto: drbg - add async seeding operation

2015-04-30 Thread Herbert Xu
On Tue, Apr 28, 2015 at 05:00:03AM +0200, Stephan Mueller wrote:
>
> @@ -1081,6 +1115,11 @@ static int drbg_seed(struct drbg_state *drbg, struct drbg_string *pers,
>   return -EINVAL;
>   }
>  
> + /* cancel any previously invoked seeding */
> + mutex_unlock(&drbg->drbg_mutex);
> + drbg_async_work_cancel(&drbg->seed_work);
> + mutex_lock(&drbg->drbg_mutex);

This seems dangerous and unnecessary.  Releasing and reacquiring
the locks may invalidate previous checks.  Even if it doesn't
matter today if somebody modifies the callers later on this could
explode.

You can easily remove this by making get_blocking_random_bytes_cb
idempotent, i.e., do nothing if the work is already queued, which
is what it would do anyway if you simply move the INIT_WORK out of
it.
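
Herbert's suggestion amounts to making the queueing step idempotent. A minimal model using C11 atomics shows the idea; the demo_* names are hypothetical and this is not the kernel workqueue API. The work item is initialised once, and a second request while the first is still pending is a no-op. This is much like queue_work() returning false for work that is already pending, so nothing needs to drop and retake drbg_mutex to cancel earlier work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a work item that is initialised once, up front */
struct demo_work {
	atomic_bool pending;
	int times_queued;
};

/* Stands in for INIT_WORK(): called once at setup, not on every request */
static void demo_init_work(struct demo_work *w)
{
	atomic_init(&w->pending, false);
	w->times_queued = 0;
}

/*
 * Idempotent queueing: only the caller that flips pending from false to
 * true actually queues the work; repeat calls while it is pending do nothing.
 */
static bool demo_queue_work(struct demo_work *w)
{
	if (atomic_exchange(&w->pending, true))
		return false; /* already queued */
	w->times_queued++;
	return true;
}

/* The worker clears pending when it starts running */
static void demo_work_started(struct demo_work *w)
{
	assert(atomic_load(&w->pending));
	atomic_store(&w->pending, false);
}
```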

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFD 5/5] tracing: Add trace_irqsoff tracepoints

2015-04-30 Thread Steven Rostedt
On Thu, 30 Apr 2015 21:14:52 -0500
Tom Zanussi  wrote:


> > 'hist:key=latency.bucket:val=hitcount:sort=latency if cpu==0'
> > 
> > but I haven't got this working. I didn't spend much time figuring out
> > why this doesn't work. Even if the above is working you still
> 
> I think it doesn't work because the tracepoint doesn't actually have a
> 'cpu' field to use in the filter...

Perhaps we should add special fields that don't come from the tracepoint
itself, but use generically known fields that are always available when
the tracepoint is triggered. COMM could be one, as well as CPU.

-- Steve


Re: [PATCH 2/2] sched/rt: Optimizate task_woken_rt()

2015-04-30 Thread Steven Rostedt
On Fri, 1 May 2015 10:02:47 +0800
pang.xun...@zte.com.cn wrote:

> > > 
> > > - Remove "!test_tsk_need_resched(rq->curr)" condition, because
> > > the flag might be set right before the waking up, but we still
> > > need to push equal or lower priority tasks, it should be removed.
> > > Without this condition, we actually get the right logic.
> > 
> > But doesn't that happen when we schedule?
> 
> It does, but will have some latency.

What latency? The need_resched flag is set, that means as soon as this
CPU is in a preemptable situation, it will schedule. The most common
case would be return from interrupt, as interrupts or softirqs are
usually what trigger the wakeups.

But if we do it here, the need resched flag could be set because
another higher priority task is about to preempt the current one that
is higher than what just woke up. So we move it now to another CPU, and
then on return of the interrupt we schedule. Then the current running
task gets preempted by the higher priority task and it too can migrate
to the CPU we just pushed the other one to, and it still doesn't run,
but yet it got moved for no reason at all.

I'd feel better if, when the need_resched flag is set, we wait until a
schedule happens to see where everything is about to be moved.


> 
> Still "rq->curr->prio <= p->prio" will be enough for us to ensure the proper
> push_rt_tasks() without this condition.
> push_rt_tasks() without this condition.

I have no idea what the above means.

> 
> Beside that, for "rq->curr->nr_cpus_allowed < 2", I noticed it was introduced
> by commit b3bc211cfe7d5fe94b, but with "!test_tsk_need_resched(rq->curr)",
> it actually can't be satisfied.

What can't be satisfied?

> 
> So, I think this condition should be removed.

I'm still not convinced.

-- Steve



Good day

2015-04-30 Thread Loan Engine®
Hello

Get a loan now with Loan Engine®, at an interest rate of 3%. Fill out the form below
if you are interested:

Gender:
Country:
Amount:
Duration:
Purpose:

There are many reasons why a loan can help

Regards,

Kennel Turid


Re: [PATCH 09/11] drivers/scsi: include for modular ufshcd-pltfrm code

2015-04-30 Thread James Bottomley
On Thu, 2015-04-30 at 21:47 -0400, Paul Gortmaker wrote:
> This file is built off of a tristate Kconfig option and also contains
> modular function calls so it should explicitly include module.h to
> avoid compile breakage during header shuffles done in the future.

I don't understand your logic.  The ufs code made a design choice to
consolidate most headers for the hcd code in a local include (ufshcd.h),
which includes module.h, so why would they explicitly need it here as
well?  And if we follow your logic, why wouldn't they also need to
duplicate everything else (like the scsi includes)?

James




Re: [RFD 5/5] tracing: Add trace_irqsoff tracepoints

2015-04-30 Thread Tom Zanussi
On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote:
> Finally we place a few tracepoints at the end of critical sections. With
> the hist trigger in place we can generate the plots.
> 
> There are a few drawbacks compared to the latency_hist.patch [1]
> 
> The latency plots contain the values from all CPUs. In theory you
> can also filter with something like
> 
> 'hist:key=latency.bucket:val=hitcount:sort=latency if cpu==0'
> 
> but I haven't got this working. I didn't spend much time figuring out
> why this doesn't work. Even if the above is working you still

I think it doesn't work because the tracepoint doesn't actually have a
'cpu' field to use in the filter...

Tom



Re: [RFD 3/5] tracing: Add option to quantize key values

2015-04-30 Thread Tom Zanussi
On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote:
> Let's group some values together. This avoids a too detailed
> histogram. Some sort of logarithmic scale could be useful
> for latency plots.
> 
> Now we can write something like:
> 
> 'hist:key=latency.bucket:val=hitcount:sort=latency'
> 
> latency:  0 hitcount: 166440
> latency:256 hitcount:  21104
> latency:512 hitcount:   7754
> latency:768 hitcount:   3269
> latency:   1024 hitcount:   1647
> latency:   1280 hitcount:841
> latency:   1536 hitcount:524
> latency:   1792 hitcount:371
> latency:   2048 hitcount:302
> latency:   2304 hitcount:240
> latency:   2560 hitcount:207
> latency:   2816 hitcount:149
> latency:   3072 hitcount:123
> latency:   3328 hitcount:119
> latency:   3584 hitcount:102
> latency:   3840 hitcount: 94
> latency:   4096 hitcount: 89
> latency:   4352 hitcount: 79
> latency:   4608 hitcount: 88
> 

Nice addition!

> One thing I struggled with in the grammar above is that I haven't found
> a nice way to pass in arguments, for example the bucket size. There are a lot
> of options to do it. Just a couple of random ideas, not necessarily consistent
> or clever:
> 
> 'hist:key=latency.bucket[10,1.5]:val=hitcount:sort=latency'
>where [x,y]: x first bucket size, y scaling factor
> 

I like this notation - it's consistent with the other uses of the dot
notation in that it's modifying the way things are displayed, in this
case displaying latency as a histogram with specific [non-default]
parameters.
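
For comparison, here is a sketch of the fixed 256-wide quantization done by the RFD hunk next to one possible reading of the proposed [x,y] parameters. The hunk's field_contents &= ~0xff produces the 0, 256, 512, ... keys in the output above. The geometric semantics (first bucket x wide, each following bucket y times wider) is an assumption based on Daniel's description, and the demo_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width buckets, as in the RFD hunk: clear the low 8 bits */
static uint64_t demo_bucket_256(uint64_t value)
{
	return value & ~(uint64_t)0xff;
}

/*
 * Hypothetical reading of 'latency.bucket[x,y]': the first bucket is
 * 'first' wide and each following bucket is 'scale' times wider than
 * the previous one. Returns the lower bound of the bucket holding value.
 */
static uint64_t demo_bucket_geometric(uint64_t value, double first, double scale)
{
	double lower = 0.0;
	double width = first;

	/* A non-positive width or a shrinking scale would never terminate */
	assert(first > 0.0 && scale >= 1.0);
	while ((double)value >= lower + width) {
		lower += width;
		width *= scale;
	}
	return (uint64_t)lower;
}
```

With [10,1.5] the bucket edges are 0, 10, 25, 47.5, ..., so a latency of 30 lands in the bucket starting at 25.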

Tom

> 'hist:key=latency:val=hitcount:sort=latency:bucket=latency,10,1.5'
> 
> Not for inclusion!
> 
> Not-Signed-off-by: Daniel Wagner 
> ---
>  kernel/trace/trace_events_hist.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index fe06707..cac94a6 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -84,6 +84,7 @@ enum hist_field_flags {
>   HIST_FIELD_STRING   = 8,
>   HIST_FIELD_EXECNAME = 16,
>   HIST_FIELD_SYSCALL  = 32,
> + HIST_FIELD_BUCKET   = 64,
>  };
>  
>  struct hist_trigger_sort_key {
> @@ -400,6 +401,8 @@ static int create_key_field(struct hist_trigger_data *hist_data,
>   flags |= HIST_FIELD_EXECNAME;
>   else if (!strcmp(field_str, "syscall"))
>   flags |= HIST_FIELD_SYSCALL;
> + else if (!strcmp(field_str, "bucket"))
> + flags |= HIST_FIELD_BUCKET;
>   }
>  
>   field = trace_find_event_field(file->event_call, field_name);
> @@ -900,6 +903,9 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec)
>   key = entries;
>   } else {
>   field_contents = hist_data->key->fn(hist_data->key, rec);
> + if (hist_data->key->flags & HIST_FIELD_BUCKET)
> + field_contents &= ~0xff;
> +
>   if (hist_data->key->flags & HIST_FIELD_STRING)
>   key = (void *)field_contents;
>   else
> @@ -1343,6 +1349,8 @@ static const char *get_hist_field_flags(struct hist_field *hist_field)
>   flags_str = "hex";
>   else if (hist_field->flags & HIST_FIELD_SYSCALL)
>   flags_str = "syscall";
> + else if (hist_field->flags & HIST_FIELD_BUCKET)
> + flags_str = "bucket";
>   else if (hist_field->flags & HIST_FIELD_EXECNAME)
>   flags_str = "execname";
>  


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFD 2/5] tracing: Add support to sort on the key

2015-04-30 Thread Tom Zanussi
On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote:
> The hist patch allows sorting on values only. By also allowing sorting
> on the key we can do something like this:
> 
> 'hist:key=latency:val=hitcount:sort=latency'
> 
> latency: 16 hitcount:    3
> latency: 17 hitcount:  171
> latency: 18 hitcount:  626
> latency: 19 hitcount:  594
> latency: 20 hitcount:  306
> latency: 21 hitcount:  214
> latency: 22 hitcount:  232
> latency: 23 hitcount:  283
> latency: 24 hitcount:  235
> latency: 25 hitcount:  105
> latency: 26 hitcount:   54
> latency: 27 hitcount:   79
> latency: 28 hitcount:  214
> latency: 29 hitcount:  895
> latency: 30 hitcount: 1400
> latency: 31 hitcount:  774
> latency: 32 hitcount:  653
> [...]
> 
> The obvious choice for the bool was already taken. I haven't found a
> good name for the flag. I guess it would make sense to refactor the
> sorting code so that it doesn't really matter what kind of field it
> is.
> 

I think you're right - the current flag is kind of an internal thing to
the implementation, and uses a name that's too generic.  Of course, you
should be able to sort on keys as well as values, and the code shouldn't
care too much about which is specified.  The original code was more
capable wrt sorting and I probably simplified it a bit too much - I'll
refactor things taking all that into account for the next version.

> BTW, I wonder if it would be possible to drop the need to always provide
> the 'val' argument and just assume 'val=hitcount' in that case.
> 

That also makes a lot of sense - I'll make that change too.

Tom

> Not for inclusion!
> 
> Not-Signed-off-by: Daniel Wagner 
> ---
>  kernel/trace/trace_events_hist.c | 35 +++
>  1 file changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 9a7a675..fe06707 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -89,6 +89,7 @@ enum hist_field_flags {
>  struct hist_trigger_sort_key {
>   bool            use_hitcount;
>   bool            use_key;
> + bool            use_real_key;
>   bool            descending;
>   unsigned int    idx;
>  };
> @@ -260,7 +261,7 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
>   }
>  }
>  
> -static inline struct hist_trigger_sort_key *create_default_sort_key(void)
> +static inline struct hist_trigger_sort_key *create_hitcount_sort_key(void)
>  {
>   struct hist_trigger_sort_key *sort_key;
>  
> @@ -273,6 +274,19 @@ static inline struct hist_trigger_sort_key *create_default_sort_key(void)
>   return sort_key;
>  }
>  
> +static inline struct hist_trigger_sort_key *create_real_key_sort_key(void)
> +{
> + struct hist_trigger_sort_key *sort_key;
> +
> + sort_key = kzalloc(sizeof(*sort_key), GFP_KERNEL);
> + if (!sort_key)
> + return ERR_PTR(-ENOMEM);
> +
> + sort_key->use_real_key = true;
> +
> + return sort_key;
> +}
> +
>  static inline struct hist_trigger_sort_key *
>  create_sort_key(char *field_name, struct hist_trigger_data *hist_data)
>  {
> @@ -280,7 +294,10 @@ create_sort_key(char *field_name, struct hist_trigger_data *hist_data)
>   unsigned int i;
>  
>   if (!strcmp(field_name, "hitcount"))
> - return create_default_sort_key();
> + return create_hitcount_sort_key();
> +
> + if (!strcmp(field_name, hist_data->key->field->name))
> + return create_real_key_sort_key();
>  
>   for (i = 0; i < hist_data->n_vals; i++)
>   if (!strcmp(field_name, hist_data->vals[i]->field->name))
> @@ -306,7 +323,7 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
>   int ret = 0;
>  
>   if (!fields_str) {
> - sort_key = create_default_sort_key();
> + sort_key = create_hitcount_sort_key();
>   if (IS_ERR(sort_key)) {
>   ret = PTR_ERR(sort_key);
>   goto out;
> @@ -984,6 +1001,12 @@ static int cmp_entries(const struct hist_trigger_sort_entry **a,
>   hist_data = entry_a->hist_data;
>   sort_key = hist_data->sort_key_cur;
>  
> + if (sort_key->use_real_key) {
> + val_a = *(u64 *)entry_a->key;
> + val_b = *(u64 *)entry_b->key;
> + goto out;
> + }
> +
>   if (sort_key->use_key) {
>   if (memcmp((*a)->key, (*b)->key, hist_data->map->key_size))
>   ret = 1;
> @@ -998,6 +1021,7 @@ static int cmp_entries(const struct hist_trigger_sort_entry **a,
>   val_b = atomic64_read(&entry_b->sums[sort_key->idx]);
>   }
>  
> +out:
>   if (val_a > val_b)

Re: [RFD 0/5] Add latency histogram

2015-04-30 Thread Tom Zanussi
Hi Daniel,

On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote:
> Hi,
> 
> I would like to discuss a possible way of getting the feature of the
> latecy_hist.patch [1] added to mainline.
> 
> "Latency histograms are primarily relevant in the context of real-time
> enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT) and are used in the
> quality management of the Linux real-time capabilities."
> 
> Steven pointed out that this might be doable based on Tom Zanussi's
> "[PATCH v4 0/7] tracing: 'hist' triggers" [2].
> 
> Here are my findings. It was not too complicated to get it working,
> though I had to add some hacks. I have added comments to each patch.
> 

It looks like you were able to do quite a bit here with not much code -
nice!

Just FYI, I'll be working on a v5 of the hist triggers patchset that
will incorporate the stuff from patch 1 (needs to be split into a
separate patch for the triggers code already upstream, and one for hist
triggers) and your comments from patch 2 (see comments in my reply to
that patch), along with a couple of other unrelated changes...

Tom

> cheers,
> daniel
> 
> [2] https://lkml.org/lkml/2015/4/10/591
> [1] 
> https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.14-rt-rebase&id=56d50cc34943bbba12b8c5942ee1ae3b29f73acb
> 
> Daniel Wagner (4):
>   tracing: Add support to sort on the key
>   tracing: Add option to quantize key values
>   tracing: Deference pointers without RCU checks
>   tracing: Add trace_irqsoff tracepoints
> 
> Tom Zanussi (1):
>   tracing: 'hist' triggers
> 
>  include/linux/rculist.h             | 36 +
>  include/linux/tracepoint.h          |  4 ++--
>  include/trace/events/latency.h      | 40
>  kernel/trace/trace_events_hist.c    | 46 +
>  kernel/trace/trace_events_trigger.c | 18 +--
>  kernel/trace/trace_irqsoff.c        | 38 ++
>  6 files changed, 168 insertions(+), 14 deletions(-)
>  create mode 100644 include/trace/events/latency.h
> 




[PATCH 04/11] drivers/gpu: include for modular rockchip code

2015-04-30 Thread Paul Gortmaker
These files are built off of a tristate Kconfig option and also contain
modular function calls so they should explicitly include module.h to
avoid compile breakage during header shuffles done in the future.

Cc: David Airlie 
Cc: Mark Yao 
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Paul Gortmaker 
---
 drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 1 +
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
index 3962176ee713..01b558fe3695 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index ccb0ce073ef2..38155215efcd 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -19,6 +19,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.2.1



[PATCH 11/11] sh: mach-highlander/psw.c is tristate and should use module.h

2015-04-30 Thread Paul Gortmaker
This file is controlled by a tristate Kconfig option, and hence
needs to include module.h so that it can get module_init() once
we relocate it from init.h into module.h in the future.

Note that module_exit() appears to be missing from the driver, so
it is questionable whether it would actually work for a removal
and reload cycle if it was configured for a modular build.

Cc: Paul Mundt 
Cc: linux...@vger.kernel.org
Signed-off-by: Paul Gortmaker 
---
 arch/sh/boards/mach-highlander/psw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sh/boards/mach-highlander/psw.c b/arch/sh/boards/mach-highlander/psw.c
index 522786318d36..40e2b585d488 100644
--- a/arch/sh/boards/mach-highlander/psw.c
+++ b/arch/sh/boards/mach-highlander/psw.c
@@ -10,7 +10,7 @@
  * for more details.
  */
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
-- 
2.2.1



[PATCH 01/11] drivers/crypto: include for modular caam code

2015-04-30 Thread Paul Gortmaker
This file is built off of a tristate Kconfig option and also contains
modular function calls so it should explicitly include module.h to
avoid compile breakage during header shuffles done in the future.

Cc: Herbert Xu 
Cc: "David S. Miller" 
Signed-off-by: Paul Gortmaker 
---
 drivers/crypto/caam/ctrl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c
index efba4ccd4fac..b9ad19df372d 100644
--- a/drivers/crypto/caam/ctrl.c
+++ b/drivers/crypto/caam/ctrl.c
@@ -5,6 +5,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 
-- 
2.2.1



[PATCH 07/11] drivers/pcmcia: include for modular xxs1500_ss code

2015-04-30 Thread Paul Gortmaker
This file is built off of a tristate Kconfig option and also contains
modular function calls so it should explicitly include module.h to
avoid compile breakage during header shuffles done in the future.

Cc: Wolfram Sang 
Cc: linux-pcm...@lists.infradead.org
Signed-off-by: Paul Gortmaker 
---
 drivers/pcmcia/xxs1500_ss.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pcmcia/xxs1500_ss.c b/drivers/pcmcia/xxs1500_ss.c
index 4c04360f378b..b2a189507fc3 100644
--- a/drivers/pcmcia/xxs1500_ss.c
+++ b/drivers/pcmcia/xxs1500_ss.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.2.1



[PATCH 06/11] drivers/net: include for modular stmmac_platform code

2015-04-30 Thread Paul Gortmaker
This file is built off of a tristate Kconfig option and also contains
modular function calls so it should explicitly include module.h to
avoid compile breakage during header shuffles done in the future.

Cc: Giuseppe Cavallaro 
Cc: net...@vger.kernel.org
Signed-off-by: Paul Gortmaker 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 705bbdf93940..68aec5c460db 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -23,6 +23,7 @@
 
***/
 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.2.1



[PATCH 02/11] drivers/clk: include for clk-max77xxx modular code

2015-04-30 Thread Paul Gortmaker
These files are built off of the tristate COMMON_CLK_MAX77686 and
COMMON_CLK_MAX77802 respectively.  They also contain modular function
calls so they should explicitly include module.h to avoid compile
breakage during header shuffles done in the future.

Cc: Mike Turquette 
Cc: Stephen Boyd 
Signed-off-by: Paul Gortmaker 
---
 drivers/clk/clk-max77686.c | 1 +
 drivers/clk/clk-max77802.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/clk/clk-max77686.c b/drivers/clk/clk-max77686.c
index 86cdb3a28629..446c2fe76dc2 100644
--- a/drivers/clk/clk-max77686.c
+++ b/drivers/clk/clk-max77686.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/clk/clk-max77802.c b/drivers/clk/clk-max77802.c
index 0729dc723a8f..74c49b93a6eb 100644
--- a/drivers/clk/clk-max77802.c
+++ b/drivers/clk/clk-max77802.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.2.1



[PATCH 08/11] drivers/pcmcia: include for modular max77802 code

2015-04-30 Thread Paul Gortmaker
This file is built off of a tristate Kconfig option and also contains
modular function calls so it should explicitly include module.h to
avoid compile breakage during header shuffles done in the future.

Cc: Liam Girdwood 
Cc: Mark Brown 
Signed-off-by: Paul Gortmaker 
---
 drivers/regulator/max77802.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/regulator/max77802.c b/drivers/regulator/max77802.c
index 6af41abccacb..c07ee13bd470 100644
--- a/drivers/regulator/max77802.c
+++ b/drivers/regulator/max77802.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.2.1



[PATCH 03/11] drivers/gpio: include for modular crystalcove code

2015-04-30 Thread Paul Gortmaker
This file is built off of a tristate Kconfig option and also contains
modular function calls so it should explicitly include module.h to
avoid compile breakage during header shuffles done in the future.

Cc: Linus Walleij 
Cc: Alexandre Courbot 
Cc: linux-g...@vger.kernel.org
Signed-off-by: Paul Gortmaker 
---
 drivers/gpio/gpio-crystalcove.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpio/gpio-crystalcove.c b/drivers/gpio/gpio-crystalcove.c
index 91a7ffe83135..cf28ec525e93 100644
--- a/drivers/gpio/gpio-crystalcove.c
+++ b/drivers/gpio/gpio-crystalcove.c
@@ -16,6 +16,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.2.1



[PATCH 10/11] drivers/staging: include for modular android tegra_ion code

2015-04-30 Thread Paul Gortmaker
This file is built off of a tristate Kconfig option and also contains
modular function calls so it should explicitly include module.h to
avoid compile breakage during header shuffles done in the future.

Cc: Greg Kroah-Hartman 
Cc: "Arve Hjønnevåg" 
Cc: Riley Andrews 
Cc: Stephen Warren 
Cc: Thierry Reding 
Cc: Alexandre Courbot 
Cc: de...@driverdev.osuosl.org
Cc: linux-te...@vger.kernel.org
Signed-off-by: Paul Gortmaker 
---
 drivers/staging/android/ion/tegra/tegra_ion.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/android/ion/tegra/tegra_ion.c b/drivers/staging/android/ion/tegra/tegra_ion.c
index 5b8ef0e66010..4d3c516cc15e 100644
--- a/drivers/staging/android/ion/tegra/tegra_ion.c
+++ b/drivers/staging/android/ion/tegra/tegra_ion.c
@@ -15,6 +15,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include "../ion.h"
-- 
2.2.1



[PATCH 09/11] drivers/scsi: include for modular ufshcd-pltfrm code

2015-04-30 Thread Paul Gortmaker
This file is built off of a tristate Kconfig option and also contains
modular function calls so it should explicitly include module.h to
avoid compile breakage during header shuffles done in the future.

Cc: Vinayak Holikatti 
Cc: "James E.J. Bottomley" 
Cc: linux-s...@vger.kernel.org
Signed-off-by: Paul Gortmaker 
---
 drivers/scsi/ufs/ufshcd-pltfrm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ufs/ufshcd-pltfrm.c b/drivers/scsi/ufs/ufshcd-pltfrm.c
index 7db9564f507d..1c0bac8a7e4a 100644
--- a/drivers/scsi/ufs/ufshcd-pltfrm.c
+++ b/drivers/scsi/ufs/ufshcd-pltfrm.c
@@ -33,6 +33,7 @@
  * this program.
  */
 
+#include 
 #include 
 #include 
 #include 
-- 
2.2.1



[PATCH 05/11] drivers/hsi: include for modular omap_ssi code

2015-04-30 Thread Paul Gortmaker
These files are built off of a tristate Kconfig option and also contain
modular function calls so they should explicitly include module.h to
avoid compile breakage during header shuffles done in the future.

We change the one header file which gives us coverage of both files:
   drivers/hsi/controllers/omap_ssi.c
   drivers/hsi/controllers/omap_ssi_port.c

Cc: Sebastian Reichel 
Signed-off-by: Paul Gortmaker 
---
 drivers/hsi/controllers/omap_ssi.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/hsi/controllers/omap_ssi.h b/drivers/hsi/controllers/omap_ssi.h
index 9d056417d88c..f9aaf37262be 100644
--- a/drivers/hsi/controllers/omap_ssi.h
+++ b/drivers/hsi/controllers/omap_ssi.h
@@ -24,6 +24,7 @@
 #define __LINUX_HSI_OMAP_SSI_H__
 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.2.1



[PATCH 00/11] Fix implicit includes of that will break.

2015-04-30 Thread Paul Gortmaker
The files changed here are simply modular source files that are implicitly
relying on  being present.  We fix them up now, so that we can
decouple some of the module related init code from the core init code in
another pending series.

This is the second series, a pseudo follow-up to the first series [1], factored out
from what was previously a larger series [2] so that there is a common theme and a
lower patch count to ease review.

In this case, adding a module.h include to several files is the common
theme. It is a no-op from a code generation point of view, and even from
a compile point of view at this point in time.

There are probably lots more implicit includes of  in tree, but
these are the ones that must be fixed in order to avoid build breakage
fallout for the pending module.h <---> init.h code relocations.

Paul.

[1] https://lkml.org/lkml/2015/4/27/777
[2] https://marc.info/?l=linux-kernel&m=139033951228828

---

Paul Gortmaker (11):
  drivers/crypto: include  for modular caam code
  drivers/clk: include  for clk-max77xxx modular code
  drivers/gpio: include  for modular crystalcove code
  drivers/gpu: include  for modular rockchip code
  drivers/hsi: include  for modular omap_ssi code
  drivers/net: include  for modular stmmac_platform code
  drivers/pcmcia: include  for modular xxs1500_ss code
  drivers/pcmcia: include  for modular max77802 code
  drivers/scsi: include  for modular ufshcd-pltfrm code
  drivers/staging: include  for modular android tegra_ion code
  sh: mach-highlander/psw.c is tristate and should use module.h

 arch/sh/boards/mach-highlander/psw.c                  | 2 +-
 drivers/clk/clk-max77686.c                            | 1 +
 drivers/clk/clk-max77802.c                            | 1 +
 drivers/crypto/caam/ctrl.c                            | 1 +
 drivers/gpio/gpio-crystalcove.c                       | 1 +
 drivers/gpu/drm/rockchip/rockchip_drm_drv.c           | 1 +
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c           | 1 +
 drivers/hsi/controllers/omap_ssi.h                    | 1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 +
 drivers/pcmcia/xxs1500_ss.c                           | 1 +
 drivers/regulator/max77802.c                          | 1 +
 drivers/scsi/ufs/ufshcd-pltfrm.c                      | 1 +
 drivers/staging/android/ion/tegra/tegra_ion.c         | 1 +
 13 files changed, 13 insertions(+), 1 deletion(-)

-- 
2.2.1



Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

2015-04-30 Thread Benjamin Herrenschmidt
On Thu, 2015-04-30 at 19:33 +1000, Alexey Kardashevskiy wrote:
> On 04/30/2015 05:22 PM, David Gibson wrote:
> > On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> >> At the moment only one group per container is supported.
> >> POWER8 CPUs have a more flexible design which allows having 2 TCE tables per
> >> IOMMU group, so we can relax this limitation and support multiple groups
> >> per container.
> >
> > It's not obvious why allowing multiple TCE tables per PE has any
> > bearing on allowing multiple groups per container.
> 
> 
> This patchset is a global TCE tables rework (patches 1..30, roughly) with 2 
> outcomes:
> 1. reusing the same IOMMU table for multiple groups - patch 31;
> 2. allowing dynamic create/remove of IOMMU tables - patch 32.
> 
> I can remove this one from the patchset and post it separately later, but
> since 1..30 aim to support both 1) and 2), I think I had better keep them all
> together (that might explain some of the changes I make in 1..30).

I think you are talking past each other :-)

But yes, having 2 tables per group is orthogonal to the ability of
having multiple groups per container.

The latter is made possible on P8 in large part because each PE has its
own DMA address space (unlike P5IOC2 or P7IOC where a single address
space is segmented).

Also, on P8 you can actually make the TVT entries point to the same
table in memory, thus removing the need to duplicate the actual
tables (though you still have to duplicate the invalidations). I would
however recommend only sharing the table that way within a chip/node.

 .../..

> >>
> >> -1) Only one IOMMU group per container is supported as an IOMMU group
> >> -represents the minimal entity which isolation can be guaranteed for and
> >> -groups are allocated statically, one per a Partitionable Endpoint (PE)
> >> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> >> +container is supported as an IOMMU table is allocated at boot time,
> >> +one table per IOMMU group, which is a Partitionable Endpoint (PE)
> >>   (PE is often a PCI domain but not always).

> > I thought the more fundamental problem was that different PEs tended
> > to use disjoint bus address ranges, so even by duplicating put_tce
> > across PEs you couldn't have a common address space.

Yes. This is the problem with P7IOC and earlier. It *could* be doable on
P7IOC by making them the same PE but let's not go there.

> Sorry, I am not following you here.
> 
> By duplicating put_tce, I can have multiple IOMMU groups on the same 
> virtual PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple 
> groups per container" does this; the address ranges will be the same.

But that is only possible on P8 because only there do we have separate
address spaces between PEs.

> What I cannot do on p5ioc2 is program the same table to multiple
> physical PHBs (or I could, but it is very different from IODA2 and pretty
> ugly and might not always be possible because I would have to allocate 
> these pages from some common pool and face problems like fragmentation).

And P7IOC has a similar issue. The top bits of the DMA address index the
window within a shared address space on P7IOC. It's possible to
configure a TVT to cover multiple devices but with very serious
limitations.

> >> +Newer systems (POWER8 with IODA2) have improved hardware design which allows
> >> +to remove this limitation and have multiple IOMMU groups per a VFIO container.
> >>
> >>   2) The hardware supports so called DMA windows - the PCI address range
> >>   within which DMA transfer is allowed, any attempt to access address space
> >> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> index a7d6729..970e3a2 100644
> >> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> @@ -82,6 +82,11 @@ static void decrement_locked_vm(long npages)
> >>* into DMA'ble space using the IOMMU
> >>*/
> >>
> >> +struct tce_iommu_group {
> >> +  struct list_head next;
> >> +  struct iommu_group *grp;
> >> +};
> >> +
> >>   /*
> >>* The container descriptor supports only a single group per container.
> >>* Required by the API as the container is not supplied with the IOMMU group
> >> @@ -89,10 +94,11 @@ static void decrement_locked_vm(long npages)
> >>*/
> >>   struct tce_container {
> >>struct mutex lock;
> >> -  struct iommu_group *grp;
> >>bool enabled;
> >>unsigned long locked_pages;
> >>bool v2;
> >> +  struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> >
> > Hrm,  so here we have more copies of the full iommu_table structures,
> > which again muddies the lifetime.  The table_group pointer is
> > presumably meaningless in these copies, which seems dangerously
> > confusing.
> 
> 
> Ouch. This is bad. No, table_group is not pointless here as it is used to 
> get to the PE number to invalidate TCE cache. I just realized although I 
> ne

Re: [PATCH 2/2] clk: qcom: Fix MSM8916 gfx3d_clk_src configuration

2015-04-30 Thread Stephen Boyd
On 04/29, Georgi Djakov wrote:
> The gfx3d_clk_src parents configuration is incorrect. Fix it.
> 
> Fixes: 3966fab8b6ab "clk: qcom: Add MSM8916 Global Clock Controller support"
> Signed-off-by: Georgi Djakov 

Applied to clk-fixes

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 1/2] clk: qcom: Fix MSM8916 venus divider value

2015-04-30 Thread Stephen Boyd
On 04/29, Georgi Djakov wrote:
> One of the video codec clock frequencies has incorrect divider value. Fix it.
> 
> Fixes: 3966fab8b6ab "clk: qcom: Add MSM8916 Global Clock Controller support"
> Signed-off-by: Georgi Djakov 

Applied to clk-fixes

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH RESEND] clk: Fix JSON output in debugfs

2015-04-30 Thread Stephen Boyd
On 04/30, Felipe Balbi wrote:
> On Thu, Apr 30, 2015 at 05:37:12PM -0700, Stephen Boyd wrote:
> > On 04/29, Stefan Wahren wrote:
> > > key/value pairs in a JSON object must be separated by a comma.
> > > After adding the properties "accuracy" and "phase" the JSON output
> > > of /sys/kernel/debug/clk/clk_dump is invalid.
> > > 
> > > So add the missing commas to fix it.
> > > 
> > > Fixes: 5279fc4 ("clk: add clk accuracy retrieval support")
> > > Signed-off-by: Stefan Wahren 
> > 
> > Hmph, this regression is old, v3.14 days. We probably ought to
> > have a comment in here stating this should be JSON format.
> > 
> > Applied to clk-next with the comment below squashed in.
> > 
> > 8<
> > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> > index 5edbec6dfb20..b850a0ef5b9f 100644
> > --- a/drivers/clk/clk.c
> > +++ b/drivers/clk/clk.c
> > @@ -1974,6 +1974,7 @@ static void clk_dump_one(struct seq_file *s, struct clk_core *c, int level)
> > if (!c)
> > return;
> >  
> > +   /* This should be JSON format, i.e. elements separated with a comma */
> > seq_printf(s, "\"%s\": { ", c->name);
> > seq_printf(s, "\"enable_count\": %d,", c->enable_count);
> > seq_printf(s, "\"prepare_count\": %d,", c->prepare_count);
> 
> you probably want to add a newline character after all clocks have been
> dumped.

Sure. Please send it as a separate patch with signed-off and I'll
apply. It doesn't seem like a fix for a regression.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


context tracking vs. syscall_trace_leave & do_notify_resume loop

2015-04-30 Thread Rik van Riel
Andy pointed out to me something I should have seen earlier: both
syscall_trace_leave and do_notify_resume call both user_exit()
and user_enter(), which has the potential to greatly increase the
cost of context tracking.

I believe (though it is hard to know for sure) there are legitimate
reasons why there is a loop around syscall_trace_leave and
do_notify_resume, but I strongly suspect the context tracking code
does not need to be in that loop.

I suspect it would be possible to stick a call to a new function
(return_to_user ?) right after the DISABLE_INTERRUPTS below, which
could be used to do the context tracking user_enter just once, and
later on also to load the user FPU context (patches I have sitting
around).

syscall_return:
/* The IRETQ could re-enable interrupts: */
DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_IRETQ

Andy, Denys, do you guys see any issues with that idea?

I realize that would mean a RESTORE_EXTRA_REGS after that call
to return_to_user(), but it looks like that could be achieved
without making the code any worse than it already is :)
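A rough user-space model of the cost argument: it counts context-tracking transitions instead of doing any real entry work. user_exit()/user_enter() here are counter stubs, return_to_user() is the hypothetical helper named above, and none of this is the x86 entry code.

```c
#include <assert.h>

/* Counter stubs standing in for the context-tracking hooks. */
static int transitions;

static void user_exit(void)
{
	transitions++;
}

static void user_enter(void)
{
	transitions++;
}

/*
 * Current shape: each helper (think syscall_trace_leave or
 * do_notify_resume) brackets its work with user_exit()/user_enter(),
 * so every pass through the exit-work loop pays for two transitions.
 */
static void exit_work_current(int passes)
{
	int i;

	assert(passes >= 0);
	for (i = 0; i < passes; i++) {
		user_exit();
		/* ... tracing, signal delivery, rescheduling ... */
		user_enter();
	}
}

/* Hypothetical helper called once, after interrupts are disabled. */
static void return_to_user(void)
{
	user_enter();
}

/* Proposed shape: the loop stays in kernel context throughout. */
static void exit_work_proposed(int passes)
{
	int i;

	for (i = 0; i < passes; i++) {
		/* ... same work, no context-tracking calls ... */
	}
	return_to_user();
}
```

With three passes through the loop, the current shape records six transitions and the proposed one records a single user_enter().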

-- 
All rights reversed


Re: [PATCH] clk: at91: Constify irq_domain_ops

2015-04-30 Thread Stephen Boyd
On 04/27, Boris Brezillon wrote:
> On Mon, 27 Apr 2015 21:52:38 +0900
> Krzysztof Kozlowski  wrote:
> 
> > The irq_domain_ops are not modified by the driver and the irqdomain core
> > code accepts pointer to a const data.
> > 
> > Signed-off-by: Krzysztof Kozlowski 
> 
> Acked-by: Boris Brezillon 
> 

Thanks. Applied to clk-next.


Re: [PATCH] block: loop: avoiding too many pending per work I/O

2015-04-30 Thread Ming Lei
On Fri, May 1, 2015 at 12:59 AM, Jeff Moyer  wrote:
> Ming Lei  writes:
>
>> On Wed, Apr 29, 2015 at 12:36 AM, Jeff Moyer  wrote:
>>> Ming Lei  writes:
>>>
>>>> If there are too many pending per-work I/Os, too many
>>>> high-priority work threads can be generated, which can
>>>> affect system performance.
>>>>
>>>> This patch limits the max pending per-work I/O to 16,
>>>> and will fall back to single queue mode when the max
>>>> number is reached.
>>>
>>> Actually, it limits it to 32.  Also, there is no discussion on what
>>> variables might affect this number.  Will that magic number change
>>> depending on the number of cpus on the system, for example?
>>
>> My fault, it should have been 16.
>>
>> It is just used to keep more IOs in flight, but can't cause obvious
>> costs like the case of Fedora live booting.
>>
>> IMO, it shouldn't depend much on the number of CPUs; it is more
>> related to the I/O performance of the backing file, and the number
>> is like fio's 'iodepth'.
>
> OK, that makes more sense.  I'm still not a huge fan of hard-coding
> numbers that are storage-specific, but I don't have a better suggestion
> at the moment, either.

OK, thanks for your review.
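The limit in the description can be pictured as a per-device counter checked at queue time. A minimal sketch under stated assumptions: struct loop_sketch, loop_queue_io(), loop_io_done() and LOOP_MAX_PENDING_PER_WORK are hypothetical names, not the loop driver's, and the constant is simply the 16 from the description rather than anything derived from the backing file.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-device cap on I/Os that get their own work item (hypothetical). */
#define LOOP_MAX_PENDING_PER_WORK 16

struct loop_sketch {
	int pending_per_work;	/* I/Os currently handed to per-I/O work */
	int queued_single;	/* I/Os that fell back to the single queue */
};

/* Returns true if the I/O got its own work item, false on fallback. */
static bool loop_queue_io(struct loop_sketch *lo)
{
	if (lo->pending_per_work < LOOP_MAX_PENDING_PER_WORK) {
		lo->pending_per_work++;
		return true;
	}
	lo->queued_single++;
	return false;
}

/* Called when a per-I/O work item completes. */
static void loop_io_done(struct loop_sketch *lo)
{
	assert(lo->pending_per_work > 0);
	lo->pending_per_work--;
}
```

Keeping the cap in one named constant at least makes Jeff's concern visible: if it ever needs to track the backing storage, like fio's iodepth, there is one place to change.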


Re: [RFC 1/2] ACPI: activate&export acpi_os_get_physical_address

2015-04-30 Thread Rafael J. Wysocki
On Friday, May 01, 2015 03:32:29 AM Rafael J. Wysocki wrote:
> On Thursday, April 30, 2015 11:10:25 AM Darren Hart wrote:
> > On Wed, Apr 22, 2015 at 04:12:24PM +0200, Kast Bernd wrote:
> > > acpi_os_get_physical_address will be needed by an acpi driver (asus-wmi.c).
> > > Additionally it could be used by dell-laptop.c instead of directly calling
> > > virt_to_phys.
> > > 
> > > acpi_os_get_physical_address gets exported and ACPI_FUTURE_USAGE is removed
> > > 
> > 
> > Hrm, well... this doesn't get rid of virt_to_phys, it just wraps it really.
> > I'm not sure that makes this any more acceptable than the original from
> > Felipe - but that's not my call.
> 
> Use virt_to_phys() if you need to.
> 
> This one is in case ACPICA needs to get the virtual-to-physical mapping (hence
> ACPI_FUTURE_USAGE).

More to the point, the reason why virt_to_phys() needs to be used in patch [2/2]
seems to be a nasty hack in the ASUS AML that pretty much expects us to provide
the physical address as an argument.

And I don't really understand Matthew's comment regarding limiting
operation regions to system memory.  This is about a specific operation
region (which BTW only seems to be used as a means to access system memory
at the location pointed to by the arg) in that particular method.

Matthew?


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [PATCH 6/6] dt-bindings: ARM: Mediatek: Document devicetree bindings for clock/reset controllers

2015-04-30 Thread Stephen Boyd
On 04/23, Sascha Hauer wrote:
> This adds the binding documentation for the apmixedsys, perisys and
> infracfg controllers found on Mediatek SoCs.
> 
> Signed-off-by: Sascha Hauer 

Please Cc devicetree reviewers on bindings (CCed now).

> ---
>  .../bindings/arm/mediatek/mediatek,apmixedsys.txt  | 23 +
>  .../bindings/arm/mediatek/mediatek,infracfg.txt| 30 ++
>  .../bindings/arm/mediatek/mediatek,pericfg.txt | 30 ++
>  .../bindings/arm/mediatek/mediatek,topckgen.txt| 23 +
>  4 files changed, 106 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
>  create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt
>  create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt
>  create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,topckgen.txt
> 
> diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
> new file mode 100644
> index 000..5af6d73
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
> @@ -0,0 +1,23 @@
> +Mediatek apmixedsys controller
> +==
> +
> +The Mediatek apmixedsys controller provides the PLLs to the system.
> +
> +Required Properties:
> +
> +- compatible: Should be:
> + - "mediatek,mt8135-apmixedsys"
> + - "mediatek,mt8173-apmixedsys"
> +- #clock-cells: Must be 1
> +
> +The apmixedsys controller uses the common clk binding from
> +Documentation/devicetree/bindings/clock/clock-bindings.txt
> +The available clocks are defined in dt-bindings/clock/mt*-clk.h.
> +
> +Example:
> +
> +apmixedsys: apmixedsys@10209000 {

apmixedsys: clock-controller@10209000 {

would be more standard. The same comment applies throughout this
patch. Otherwise it looks good to me.

-Stephen

> + compatible = "mediatek,mt8173-apmixedsys";
> + reg = <0 0x10209000 0 0x1000>;
> + #clock-cells = <1>;
> +};
> diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt
> new file mode 100644
> index 000..684da473
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt
> @@ -0,0 +1,30 @@
> +Mediatek infracfg controller
> +============================
> +
> +The Mediatek infracfg controller provides various clocks and reset
> +outputs to the system.
> +
> +Required Properties:
> +
> +- compatible: Should be:
> + - "mediatek,mt8135-infracfg", "syscon"
> + - "mediatek,mt8173-infracfg", "syscon"
> +- #clock-cells: Must be 1
> +- #reset-cells: Must be 1
> +
> +The infracfg controller uses the common clk binding from
> +Documentation/devicetree/bindings/clock/clock-bindings.txt
> +The available clocks are defined in dt-bindings/clock/mt*-clk.h.
> +Also it uses the common reset controller binding from
> +Documentation/devicetree/bindings/reset/reset.txt.
> +The available reset outputs are defined in
> +dt-bindings/reset-controller/mt*-resets.h
> +
> +Example:
> +
> +infracfg: infracfg@10001000 {
> + compatible = "mediatek,mt8173-infracfg", "syscon";
> + reg = <0 0x10001000 0 0x1000>;
> + #clock-cells = <1>;
> + #reset-cells = <1>;
> +};
> diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt
> new file mode 100644
> index 000..fdb45c6
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt
> @@ -0,0 +1,30 @@
> +Mediatek pericfg controller
> +===========================
> +
> +The Mediatek pericfg controller provides various clocks and reset
> +outputs to the system.
> +
> +Required Properties:
> +
> +- compatible: Should be:
> + - "mediatek,mt8135-pericfg", "syscon"
> + - "mediatek,mt8173-pericfg", "syscon"
> +- #clock-cells: Must be 1
> +- #reset-cells: Must be 1
> +
> +The pericfg controller uses the common clk binding from
> +Documentation/devicetree/bindings/clock/clock-bindings.txt
> +The available clocks are defined in dt-bindings/clock/mt*-clk.h.
> +Also it uses the common reset controller binding from
> +Documentation/devicetree/bindings/reset/reset.txt.
> +The available reset outputs are defined in
> +dt-bindings/reset-controller/mt*-resets.h
> +
> +Example:
> +
> +pericfg: pericfg@10003000 {
> + compatible = "mediatek,mt8173-pericfg", "syscon";
> + reg = <0 0x10003000 0 0x1000>;
> + #clock-cells = <1>;
> + #reset-cells = <1>;
> +};
> diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,topckgen.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,topckgen.txt
> new file mode 100644
> index 000..a425248
> --- /dev/null
> +++ b/Documenta

Re: [PATCH 1/3] HID: wacom: Do not add suffix to name of devices with an unknown type

2015-04-30 Thread Ping Cheng
On Thu, Apr 30, 2015 at 5:51 PM, Jason Gerecke  wrote:
> The naming logic currently assumes that all devices will be a pen, finger,
> or pad. Though this has historically been the case, the new HID_GENERIC
> catch-all may cause us to probe devices with Wacom's 056A VID which aren't
> any of these types (e.g. the "Cintiq 24HDT Monitor Control"). This patch
> updates the logic so that no suffix will be added to the device name if
> the device type is unknown.
>
> Signed-off-by: Jason Gerecke 

Reviewed-by: Ping Cheng  for the whole set.

Cheers,

Ping

> ---
>  drivers/hid/wacom_sys.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
> index 9c57ac0..222baf5 100644
> --- a/drivers/hid/wacom_sys.c
> +++ b/drivers/hid/wacom_sys.c
> @@ -1440,12 +1440,15 @@ static void wacom_update_name(struct wacom *wacom)
> snprintf(wacom_wac->pad_name, sizeof(wacom_wac->pad_name),
> "%s Pad", wacom_wac->name);
>
> -   if (features->device_type != BTN_TOOL_FINGER)
> +   if (features->device_type == BTN_TOOL_PEN) {
> strlcat(wacom_wac->name, " Pen", WACOM_NAME_MAX);
> -   else if (features->touch_max)
> -   strlcat(wacom_wac->name, " Finger", WACOM_NAME_MAX);
> -   else
> -   strlcat(wacom_wac->name, " Pad", WACOM_NAME_MAX);
> +   }
> +   else if (features->device_type == BTN_TOOL_FINGER) {
> +   if (features->touch_max)
> +   strlcat(wacom_wac->name, " Finger", WACOM_NAME_MAX);
> +   else
> +   strlcat(wacom_wac->name, " Pad", WACOM_NAME_MAX);
> +   }
>  }
>
>  static int wacom_probe(struct hid_device *hdev,
> --
> 2.3.5
>


Re: [PATCH RFC v1 5/5] clk: introduce clk_core_enable_lock and clk_core_disable_lock functions

2015-04-30 Thread Stephen Boyd
On 04/15, Dong Aisheng wrote:
> This can be useful when the clock core wants to enable/disable clocks.
> Then we don't have to convert the struct clk_core to struct clk to call
> clk_enable/clk_disable, which is a bit inconsistent with existing usage.
> 
> Cc: Mike Turquette 
> Cc: Stephen Boyd 
> Signed-off-by: Dong Aisheng 
> ---

Yeah let's add this patch either before patch 4 or squash it into
patch 4. Also, avoid adding more function prototypes please.



Re: [PATCH RFC v1 4/5] clk: core: add CLK_SET_PARENT_ON flags to support clocks require parent on

2015-04-30 Thread Stephen Boyd
On 04/15, Dong Aisheng wrote:
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 7af553d..f2470e5 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -43,6 +43,11 @@ static int clk_core_get_phase(struct clk_core *clk);
>  static bool clk_core_is_prepared(struct clk_core *clk);
>  static bool clk_core_is_enabled(struct clk_core *clk);
>  static struct clk_core *clk_core_lookup(const char *name);
> +static struct clk *clk_core_get_parent(struct clk_core *clk);
> +static int clk_core_prepare(struct clk_core *clk);
> +static void clk_core_unprepare(struct clk_core *clk);
> +static int clk_core_enable(struct clk_core *clk);
> +static void clk_core_disable(struct clk_core *clk);

Let's avoid adding more here if we can.

>  
>  /***private data structures***/
>  
> @@ -508,6 +513,7 @@ static void clk_unprepare_unused_subtree(struct clk_core *clk)
>  static void clk_disable_unused_subtree(struct clk_core *clk)
>  {
>   struct clk_core *child;
> + struct clk *parent = clk_core_get_parent(clk);
>   unsigned long flags;
>  
>   lockdep_assert_held(&prepare_lock);
> @@ -515,6 +521,13 @@ static void clk_disable_unused_subtree(struct clk_core *clk)
>   hlist_for_each_entry(child, &clk->children, child_node)
>   clk_disable_unused_subtree(child);
>  
> + if (clk->flags & CLK_SET_PARENT_ON && parent) {
> + clk_core_prepare(parent->core);
> + flags = clk_enable_lock();
> + clk_core_enable(parent->core);
> + clk_enable_unlock(flags);
> + }

If there's a parent and this clock is on, why wouldn't the parent
also be on? It doesn't seem right to have a clock that's on
without its parent on that we're trying to turn off. Put another
way, how is this fixing anything?

> +
>   flags = clk_enable_lock();
>  
>   if (clk->enable_count)
> @@ -608,6 +627,14 @@ struct clk *__clk_get_parent(struct clk *clk)
>  }
>  EXPORT_SYMBOL_GPL(__clk_get_parent);
>  
> +static struct clk *clk_core_get_parent(struct clk_core *clk)
> +{
> + if (!clk)
> + return NULL;
> +
> + return !clk->parent ? NULL : clk->parent->hw->clk;
> +}

s/clk/core/ in this function

> +
>  static struct clk_core *clk_core_get_parent_by_index(struct clk_core *clk,
>u8 index)
>  {
> @@ -1456,13 +1483,27 @@ static struct clk_core *__clk_set_parent_before(struct clk_core *clk,
>* hardware and software states.
>*
>* See also: Comment for clk_set_parent() below.
> +  *
> +  * 2. enable both parent clocks for the .set_parent() operation if the
> +  * CLK_SET_PARENT_ON flag is set
>*/
> - if (clk->prepare_count) {
> + if (clk->prepare_count || clk->flags & CLK_SET_PARENT_ON) {
>   clk_core_prepare(parent);
>   flags = clk_enable_lock();
>   clk_core_enable(parent);
> - clk_core_enable(clk);
>   clk_enable_unlock(flags);
> +
> + if (clk->prepare_count) {
> + flags = clk_enable_lock();
> + clk_core_enable(clk);
> + clk_enable_unlock(flags);
> + } else {
> +
> + clk_core_prepare(old_parent);
> + flags = clk_enable_lock();
> + clk_core_enable(old_parent);
> + clk_enable_unlock(flags);
> + }
>   }
>  
>   /* update the clk tree topology */
> @@ -1483,12 +1524,22 @@ static void __clk_set_parent_after(struct clk_core *clk,
>* Finish the migration of prepare state and undo the changes done
>* for preventing a race with clk_enable().
>*/
> - if (clk->prepare_count) {
> + if (clk->prepare_count || clk->flags & CLK_SET_PARENT_ON) {
>   flags = clk_enable_lock();
> - clk_core_disable(clk);
>   clk_core_disable(old_parent);
>   clk_enable_unlock(flags);
>   clk_core_unprepare(old_parent);
> +
> + if (clk->prepare_count) {
> + flags = clk_enable_lock();
> + clk_core_disable(clk);
> + clk_enable_unlock(flags);
> + } else {
> + flags = clk_enable_lock();
> + clk_core_disable(parent);
> + clk_enable_unlock(flags);
> + clk_core_unprepare(parent);
> + }

Is there a reason why the clk itself can't be on when we switch
parents? It seems that if the clk was on during the parent
switch, then it should be possible to just add a flag check on
both these if conditions and be done. It may be possible to
change the behavior here and not enable the clk in hardware, just
up the count and turn on both the parents. I'm trying to recall
why we enable the clk itself across the switch.
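The bracketing under discussion, turning on both parents around .set_parent() and restoring the counts afterwards, can be modelled with plain counters. This is a user-space sketch with hypothetical names (struct core_sketch, set_parent_both_on(), mux_needs_both_parents()); it ignores the real split between prepare (sleeping, under prepare_lock) and enable (under the enable spinlock) as well as error unwinding.

```c
#include <assert.h>

/* Hypothetical stand-in for struct clk_core: just the two counts. */
struct core_sketch {
	int prepare_count;
	int enable_count;
};

/* Combined prepare+enable and disable+unprepare, with no locking. */
static void core_on(struct core_sketch *c)
{
	c->prepare_count++;
	c->enable_count++;
}

static void core_off(struct core_sketch *c)
{
	assert(c->enable_count > 0 && c->prepare_count > 0);
	c->enable_count--;
	c->prepare_count--;
}

/* Signature of a hypothetical hardware .set_parent() callback. */
typedef int (*set_parent_fn)(const struct core_sketch *old_parent,
			     const struct core_sketch *new_parent);

/*
 * Hold both the old and the new parent on across the mux switch, then
 * drop the extra references. The clock itself is not touched.
 */
static int set_parent_both_on(struct core_sketch *old_parent,
			      struct core_sketch *new_parent,
			      set_parent_fn op)
{
	int ret;

	core_on(new_parent);
	core_on(old_parent);
	ret = op(old_parent, new_parent);
	core_off(old_parent);
	core_off(new_parent);
	return ret;
}

/* Example mux that can only switch while both inputs are running. */
static int mux_needs_both_parents(const struct core_sketch *old_parent,
				  const struct core_sketch *new_parent)
{
	return (old_parent->enable_count > 0 &&
		new_parent->enable_count > 0) ? 0 : -1;
}
```

Whether the clock itself should also be enabled across the switch is exactly the open question above; the sketch deliberately leaves it out and only shows that the parents' counts return to where they started.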

>   }
>  }
>  
> @@ -1514,12 +1565,23 @@ static int __clk_set_parent(

Re: [RFC 1/2] ACPI: activate&export acpi_os_get_physical_address

2015-04-30 Thread Rafael J. Wysocki
On Thursday, April 30, 2015 11:10:25 AM Darren Hart wrote:
> On Wed, Apr 22, 2015 at 04:12:24PM +0200, Kast Bernd wrote:
> > acpi_os_get_physical_address will be needed by an acpi driver (asus-wmi.c).
> > Additionally it could  be used by dell-laptop.c instead of directly calling 
> > virt_to_phys.
> > 
> > acpi_os_get_physical_address gets exported and ACPI_FUTURE_USAGE is removed
> > 
> 
> Hrm, well... this doesn't get rid of virt_to_phys, it just wraps it really.
> I'm not sure that makes this any more acceptable than the original from
> Felipe - but that's not my call.

Use virt_to_phys() if you need to.

This one is in case ACPICA needs to get the virtual-to-physical mapping (hence
ACPI_FUTURE_USAGE).

Rafael



Re: [PATCH RESEND] clk: Fix JSON output in debugfs

2015-04-30 Thread Felipe Balbi
On Thu, Apr 30, 2015 at 05:37:12PM -0700, Stephen Boyd wrote:
> On 04/29, Stefan Wahren wrote:
> > key/value pairs in a JSON object must be separated by a comma.
> > After adding the properties "accuracy" and "phase" the JSON output
> > of /sys/kernel/debug/clk/clk_dump is invalid.
> > 
> > So add the missing commas to fix it.
> > 
> > Fixes: 5279fc4 ("clk: add clk accuracy retrieval support")
> > Signed-off-by: Stefan Wahren 
> 
> Hmph, this regression is old, v3.14 days. We probably ought to
> have a comment in here stating this should be JSON format.
> 
> Applied to clk-next with the comment below squashed in.
> 
> 8<
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 5edbec6dfb20..b850a0ef5b9f 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -1974,6 +1974,7 @@ static void clk_dump_one(struct seq_file *s, struct clk_core *c, int level)
>   if (!c)
>   return;
>  
> + /* This should be JSON format, i.e. elements separated with a comma */
>   seq_printf(s, "\"%s\": { ", c->name);
>   seq_printf(s, "\"enable_count\": %d,", c->enable_count);
>   seq_printf(s, "\"prepare_count\": %d,", c->prepare_count);

you probably want to add a newline character after all clocks have
been dumped.

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 459ce9da13e0..c2de94238e75 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -276,7 +276,7 @@ static int clk_dump(struct seq_file *s, void *data)
 
clk_prepare_unlock();
 
-   seq_printf(s, "}");
+   seq_printf(s, "}\n");
return 0;
 }
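The fix, the squashed-in comment and this follow-up come down to three rules for a valid dump: pairs inside an object are comma-separated, sibling objects are comma-separated, and the final closing brace should end with a newline. A minimal user-space sketch of those rules, assuming a flat list of clocks (the real clk_dump() recurses into children) and a hypothetical struct fake_clk, with snprintf() into a buffer standing in for seq_printf():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct clk_core: only the fields printed. */
struct fake_clk {
	const char *name;
	int enable_count;
	int prepare_count;
	unsigned long rate;
};

/*
 * Print a flat list of clocks as one JSON object. Pairs inside each
 * clock's object are comma-separated, every clock after the first is
 * preceded by a comma, and the closing brace is followed by a newline.
 * Returns the number of characters produced; a value >= len means the
 * output was truncated.
 */
static size_t dump_clks_json(const struct fake_clk *clks, size_t n,
			     char *buf, size_t len)
{
	size_t off;
	size_t i;

	assert(buf != NULL && len > 0);
	off = (size_t)snprintf(buf, len, "{");
	for (i = 0; i < n && off < len; i++)
		off += (size_t)snprintf(buf + off, len - off,
					"%s\"%s\": { \"enable_count\": %d,"
					"\"prepare_count\": %d,"
					"\"rate\": %lu }",
					i ? "," : "", clks[i].name,
					clks[i].enable_count,
					clks[i].prepare_count, clks[i].rate);
	if (off < len)
		off += (size_t)snprintf(buf + off, len - off, "}\n");
	return off;
}
```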

cheers

-- 
balbi




Re: [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support

2015-04-30 Thread Rafael J. Wysocki
On Thursday, April 30, 2015 05:39:06 PM Dan Williams wrote:
> On Thu, Apr 30, 2015 at 4:23 PM, Rafael J. Wysocki  wrote:
> > On Tuesday, April 28, 2015 02:24:23 PM Dan Williams wrote:
> >> 1/ Autodetect an NFIT table for the ACPI namespace device with _HID of
> >>"ACPI0012"
> >>
> >> 2/ libnd bus registration
> >>
> >> The NFIT provided by ACPI is one possible method by which platforms will
> >> discover NVDIMM resources.  However, the intent of the nd_bus_descriptor
> >> abstraction is to abstract "provider" specific details, leaving libnd
> >> to be independent of the specific NVDIMM resource discovery mechanism.
> >> This flexibility is later exploited later to implement custom-defined nd
> >> buses.
> >>
> >> Cc: 
> >> Cc: Robert Moore 
> >> Cc: Rafael J. Wysocki 
> >> Signed-off-by: Dan Williams 
> >> ---
> >>  drivers/block/Kconfig |2
> >>  drivers/block/Makefile|1
> >>  drivers/block/nd/Kconfig  |   40 +++
> >>  drivers/block/nd/Makefile |6 +
> >>  drivers/block/nd/acpi.c   |  475 +
> >>  drivers/block/nd/acpi_nfit.h  |  254 ++
> >>  drivers/block/nd/core.c   |   67 ++
> >>  drivers/block/nd/libnd.h  |   33 +++
> >>  drivers/block/nd/nd-private.h |   23 ++
> >>  9 files changed, 901 insertions(+)
> >>  create mode 100644 drivers/block/nd/Kconfig
> >>  create mode 100644 drivers/block/nd/Makefile
> >>  create mode 100644 drivers/block/nd/acpi.c
> >>  create mode 100644 drivers/block/nd/acpi_nfit.h
> >>  create mode 100644 drivers/block/nd/core.c
> >>  create mode 100644 drivers/block/nd/libnd.h
> >>  create mode 100644 drivers/block/nd/nd-private.h
> >>
> >> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
> >> index eb1fed5bd516..dfe40e5ca9bd 100644
> >> --- a/drivers/block/Kconfig
> >> +++ b/drivers/block/Kconfig
> >> @@ -321,6 +321,8 @@ config BLK_DEV_NVME
> >> To compile this driver as a module, choose M here: the
> >> module will be called nvme.
> >>
> >> +source "drivers/block/nd/Kconfig"
> >> +
> >>  config BLK_DEV_SKD
> >>   tristate "STEC S1120 Block Driver"
> >>   depends on PCI
> >> diff --git a/drivers/block/Makefile b/drivers/block/Makefile
> >> index 9cc6c18a1c7e..07a6acecf4d8 100644
> >> --- a/drivers/block/Makefile
> >> +++ b/drivers/block/Makefile
> >> @@ -24,6 +24,7 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
> >>  obj-$(CONFIG_MG_DISK)+= mg_disk.o
> >>  obj-$(CONFIG_SUNVDC) += sunvdc.o
> >>  obj-$(CONFIG_BLK_DEV_NVME)   += nvme.o
> >> +obj-$(CONFIG_ND_DEVICES) += nd/
> >>  obj-$(CONFIG_BLK_DEV_SKD)+= skd.o
> >>  obj-$(CONFIG_BLK_DEV_OSD)+= osdblk.o
> >>
> >> diff --git a/drivers/block/nd/Kconfig b/drivers/block/nd/Kconfig
> >> new file mode 100644
> >> index ..6d5d6b732f82
> >> --- /dev/null
> >> +++ b/drivers/block/nd/Kconfig
> >> @@ -0,0 +1,40 @@
> >> +menuconfig ND_DEVICES
> >> + bool "NVDIMM Support"
> >> + depends on PHYS_ADDR_T_64BIT
> >> + help
> >> +   Generic support for non-volatile memory devices including
> >> +   ACPI-6-NFIT defined resources.  On platforms that define an
> >> +   NFIT, or otherwise can discover NVDIMM resources, a libnd
> >> +   bus is registered to advertise PMEM (persistent memory)
> >> +   namespaces (/dev/pmemX) and BLK (sliding mmio window(s))
> >> +   namespaces (/dev/ndX). A PMEM namespace refers to a memory
> >> +   resource that may span multiple DIMMs and support DAX (see
> >> +   CONFIG_DAX).  A BLK namespace refers to an NVDIMM control
> >> +   region which exposes an mmio register set for windowed
> >> +   access mode to non-volatile memory.
> >> +
> >> +if ND_DEVICES
> >> +
> >> +config LIBND
> >> + tristate "LIBND: libnd device driver support"
> >> + help
> >> +   Platform agnostic device model for a libnd bus.  Publishes
> >> +   resources for a PMEM (persistent-memory) driver and/or BLK
> >> +   (sliding mmio window(s)) driver to attach.  Exposes a device
> >> +   topology under a "ndX" bus device, a "/dev/ndctlX" bus-ioctl
> >> +   message passing interface, and a "/dev/nmemX" dimm-ioctl
> >> +   message interface for each memory device registered on the
> >> +   bus instance.  A userspace library "ndctl" provides an API
> >> +   to enumerate/manage this subsystem.
> >> +
> >> +config ND_ACPI
> >> + tristate "ACPI: NFIT to libnd bus support"
> >> + select LIBND
> >> + depends on ACPI
> >> + help
> >> +   Infrastructure to probe ACPI 6 compliant platforms for
> >> +   NVDIMMs (NFIT) and register a libnd device tree.  In
> >> +   addition to storage devices this also enables libnd to craft
> >> +   ACPI._DSM messages for platform/dimm configuration.
> >
> > I'm wondering if the two CONFIG options above really need to be user-selectable?
> >
> > For example, what reason people (who've already selected ND_DEVICE

[PATCH 2/3] HID: wacom: Discover device_type from HID descriptor for all devices

2015-04-30 Thread Jason Gerecke
Currently, we assume a device_type of BTN_TOOL_PEN before scanning the
HID descriptor and then change the device_type if what we discover
proves that assumption wrong. This way of doing things makes it more
difficult to figure out if a device (particularly a HID_GENERIC device)
actually does tablet/touch input or is something completely different.

This patch leaves device_type at its initial value of 0 and then calls
'wacom_parse_hid' for every device (not just those that have touch).
As we map the usages, we can set the device_type as before. After we're
finished, we can then check if the value is still zero and do whatever
is most appropriate.

Detecting the pen can be a little tricky on most Wacom devices because
the descriptors describe opaque blobs. Fortunately, older Wacom tablets
have the HID_DG_DIGITIZER usage on the pen's application collection and
newer tablets seem to have a similar vendor-defined usage that we can
trigger on.
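HID usages combine a 16-bit usage page with a 16-bit usage ID, so 0x000d0001 is the Digitizer usage on the Digitizers page (0x0D) and 0xff0d0001 is the same ID on Wacom's vendor-defined page 0xFF0D. A minimal sketch of the check described above, assuming it is keyed only on the application collection's usage (the patch's WACOM_PEN_FIELD macro is cut off here and may test more); hid_usage() and is_pen_application() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Digitizers page 0x0d, usage 0x01, matching HID_DG_DIGITIZER. */
#define HID_DG_DIGITIZER	0x000d0001u
/* Wacom's vendor-defined counterpart, as defined in this patch. */
#define WACOM_VENDORDEFINED_PEN	0xff0d0001u

/* Extended usage: (usage page << 16) | usage ID. */
static uint32_t hid_usage(uint16_t page, uint16_t id)
{
	return ((uint32_t)page << 16) | id;
}

/* Treat an application collection as the pen if its usage matches. */
static bool is_pen_application(uint32_t application)
{
	return application == HID_DG_DIGITIZER ||
	       application == WACOM_VENDORDEFINED_PEN;
}
```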

Signed-off-by: Jason Gerecke 
---
 drivers/hid/wacom_sys.c | 23 +--
 drivers/hid/wacom_wac.c |  8 +---
 drivers/hid/wacom_wac.h |  6 +-
 3 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
index 222baf5..157aa7a 100644
--- a/drivers/hid/wacom_sys.c
+++ b/drivers/hid/wacom_sys.c
@@ -181,7 +181,11 @@ static void wacom_usage_mapping(struct hid_device *hdev,
* X/Y values and some cases of invalid Digitizer X/Y
* values commonly reported.
*/
-   if (!pen && !finger)
+   if (pen)
+   features->device_type = BTN_TOOL_PEN;
+   else if (finger)
+   features->device_type = BTN_TOOL_FINGER;
+   else
return;
 
/*
@@ -198,14 +202,11 @@ static void wacom_usage_mapping(struct hid_device *hdev,
case HID_GD_X:
features->x_max = field->logical_maximum;
if (finger) {
-   features->device_type = BTN_TOOL_FINGER;
features->x_phy = field->physical_maximum;
if (features->type != BAMBOO_PT) {
features->unit = field->unit;
features->unitExpo = field->unit_exponent;
}
-   } else {
-   features->device_type = BTN_TOOL_PEN;
}
break;
case HID_GD_Y:
@@ -425,7 +426,6 @@ static void wacom_retrieve_hid_descriptor(struct hid_device *hdev,
struct usb_interface *intf = wacom->intf;
 
/* default features */
-   features->device_type = BTN_TOOL_PEN;
features->x_fuzz = 4;
features->y_fuzz = 4;
features->pressure_fuzz = 0;
@@ -446,10 +446,6 @@ static void wacom_retrieve_hid_descriptor(struct hid_device *hdev,
}
}
 
-   /* only devices that support touch need to retrieve the info */
-   if (features->type < BAMBOO_PT)
-   return;
-
wacom_parse_hid(hdev, features);
 }
 
@@ -1529,8 +1525,15 @@ static int wacom_probe(struct hid_device *hdev,
 
/* Retrieve the physical and logical size for touch devices */
wacom_retrieve_hid_descriptor(hdev, features);
-
wacom_setup_device_quirks(wacom);
+
+   if (!features->device_type && features->type != WIRELESS) {
+   dev_warn(&hdev->dev, "Unknown device_type for '%s'. %s.",
+hdev->name, "Assuming pen");
+
+   features->device_type = BTN_TOOL_PEN;
+   }
+
wacom_calculate_res(features);
 
wacom_update_name(wacom);
diff --git a/drivers/hid/wacom_wac.c b/drivers/hid/wacom_wac.c
index dff99ff..a52fc25 100644
--- a/drivers/hid/wacom_wac.c
+++ b/drivers/hid/wacom_wac.c
@@ -2186,13 +2186,15 @@ void wacom_setup_device_quirks(struct wacom *wacom)
 
features->x_max = 4096;
features->y_max = 4096;
-   } else {
-   features->device_type = BTN_TOOL_PEN;
}
}
 
/*
-* Same thing for Bamboo PAD
+* Raw Wacom-mode pen and touch events both come from interface
+* 0, whose HID descriptor has an application usage of 0xFF0D
+* (i.e., WACOM_VENDORDEFINED_PEN). We route pen packets back
+* out through the HID_GENERIC device created for interface 1,
+* so rewrite this one to be of type BTN_TOOL_FINGER.
 */
if (features->type == BAMBOO_PAD)
features->device_type = BTN_TOOL_FINGER;
diff --git a/drivers/hid/wacom_wac.h b/drivers/hid/wacom_wac.h
index f5a5f68..9a5ee62 100644
--- a/drivers/hid/wacom_wac.h
+++ b/drivers/hid/wacom_wac.h
@@ -72,10 +72,14 @@
 #define WACOM_QUIRK_MONITOR0x0004
 #define WACOM_QUIRK_BATTERY0x0008
 
+#define WACOM_VENDORDEFINED_PEN0xff0d0001
+
 #define WACOM_PEN_FIELD(f) (((f)->logical == HID_DG_STYLUS) || \
 

[PATCH 3/3] HID: wacom: Fail probe if HID_GENERIC device has unknown device_type

2015-04-30 Thread Jason Gerecke
The last patch was careful to maintain backwards-compatible behavior
by forcing device_type to BTN_TOOL_PEN (and printing a warning) if it
were still uninitialized after scanning the HID descriptor and applying
quirks. We should be more strict with HID_GENERIC devices, however,
since there is no a priori guarantee that it is a tablet or touchpad.
If the device_type is still uninitialized for a HID_GENERIC device then
we assume that it isn't something the driver can work with and so fail
the probe.

Signed-off-by: Jason Gerecke 
---
 drivers/hid/wacom_sys.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
index 157aa7a..7abf52c 100644
--- a/drivers/hid/wacom_sys.c
+++ b/drivers/hid/wacom_sys.c
@@ -1528,8 +1528,14 @@ static int wacom_probe(struct hid_device *hdev,
wacom_setup_device_quirks(wacom);
 
if (!features->device_type && features->type != WIRELESS) {
+   error = features->type == HID_GENERIC ? -ENODEV : 0;
+
dev_warn(&hdev->dev, "Unknown device_type for '%s'. %s.",
-hdev->name, "Assuming pen");
+hdev->name,
+error ? "Ignoring" : "Assuming pen");
+
+   if (error)
+   goto fail_shared_data;
 
features->device_type = BTN_TOOL_PEN;
}
-- 
2.3.5



[PATCH 1/3] HID: wacom: Do not add suffix to name of devices with an unknown type

2015-04-30 Thread Jason Gerecke
The naming logic currently assumes that all devices will be a pen, finger,
or pad. Though this has historically been the case, the new HID_GENERIC
catch-all may cause us to probe devices with Wacom's 056A VID which aren't
any of these types (e.g. the "Cintiq 24HDT Monitor Control"). This patch
updates the logic so that no suffix will be added to the device name if
the device type is unknown.

Signed-off-by: Jason Gerecke 
---
 drivers/hid/wacom_sys.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
index 9c57ac0..222baf5 100644
--- a/drivers/hid/wacom_sys.c
+++ b/drivers/hid/wacom_sys.c
@@ -1440,12 +1440,15 @@ static void wacom_update_name(struct wacom *wacom)
snprintf(wacom_wac->pad_name, sizeof(wacom_wac->pad_name),
"%s Pad", wacom_wac->name);
 
-   if (features->device_type != BTN_TOOL_FINGER)
+   if (features->device_type == BTN_TOOL_PEN) {
strlcat(wacom_wac->name, " Pen", WACOM_NAME_MAX);
-   else if (features->touch_max)
-   strlcat(wacom_wac->name, " Finger", WACOM_NAME_MAX);
-   else
-   strlcat(wacom_wac->name, " Pad", WACOM_NAME_MAX);
+   }
+   else if (features->device_type == BTN_TOOL_FINGER) {
+   if (features->touch_max)
+   strlcat(wacom_wac->name, " Finger", WACOM_NAME_MAX);
+   else
+   strlcat(wacom_wac->name, " Pad", WACOM_NAME_MAX);
+   }
 }
 
 static int wacom_probe(struct hid_device *hdev,
-- 
2.3.5
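A userspace model of the patched suffix selection, for illustration only. sketch_update_name() is hypothetical, the two constants mirror BTN_TOOL_PEN and BTN_TOOL_FINGER from linux/input-event-codes.h, and strncat() bounded by the remaining space stands in for the kernel's strlcat(). The pad_name handling is left out.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the kernel's input event codes. */
#define SKETCH_BTN_TOOL_PEN    0x140
#define SKETCH_BTN_TOOL_FINGER 0x145

/*
 * Appends " Pen", " Finger" or " Pad" as the patched wacom_update_name()
 * does; any other device_type (e.g. a HID_GENERIC monitor-control
 * interface) leaves the name untouched.
 */
static void sketch_update_name(char *name, size_t len, int device_type,
			       int touch_max)
{
	const char *suffix = "";

	assert(name != NULL && strlen(name) < len);

	if (device_type == SKETCH_BTN_TOOL_PEN)
		suffix = " Pen";
	else if (device_type == SKETCH_BTN_TOOL_FINGER)
		suffix = touch_max ? " Finger" : " Pad";

	/* Append at most the space left, like strlcat(name, suffix, len). */
	strncat(name, suffix, len - strlen(name) - 1);
}
```

Defaulting the suffix to an empty string means an unrecognised type needs no extra branch; only pen and finger devices are renamed.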



Re: [PATCH RESEND] clk: Fix JSON output in debugfs

2015-04-30 Thread Stefan Wahren

> Stephen Boyd wrote on May 1, 2015 at 02:37:
>
>
> On 04/29, Stefan Wahren wrote:
> > Key/value pairs in a JSON object must be separated by a comma.
> > After adding the properties "accuracy" and "phase" the JSON output
> > of /sys/kernel/debug/clk/clk_dump is invalid.
> >
> > So add the missing commas to fix it.
> >
> > Fixes: 5279fc4 ("clk: add clk accuracy retrieval support")
> > Signed-off-by: Stefan Wahren 
>
> Hmph, this regression is old, v3.14 days. We probably ought to
> have a comment in here stating this should be JSON format.
>
> Applied to clk-next with the comment below squashed in.

Thanks

Stefan
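To illustrate the rule the fix restores (n key/value pairs need n - 1 separating commas), here is a userspace sketch. It is not the clk.c code. snprintf() stands in for the seq_file output used by clk_dump, struct clk_sample and its field names other than accuracy and phase are assumptions, and count_char() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-clock fields printed by clk_dump. */
struct clk_sample {
	const char *name;
	int enable_count;
	int prepare_count;
	unsigned long rate;
	unsigned long accuracy;
	int phase;
};

/* Formats one clock as a "name": { ... } member. Every key/value pair
 * except the last is followed by a comma, including after "accuracy". */
static int clk_to_json(char *buf, size_t len, const struct clk_sample *c)
{
	assert(buf != NULL && c != NULL);
	return snprintf(buf, len,
			"\"%s\": { \"enable_count\": %d, \"prepare_count\": %d, "
			"\"rate\": %lu, \"accuracy\": %lu, \"phase\": %d }",
			c->name, c->enable_count, c->prepare_count,
			c->rate, c->accuracy, c->phase);
}

/* Counts occurrences of ch (which must not be '\0') in s. */
static size_t count_char(const char *s, char ch)
{
	size_t n = 0;

	while ((s = strchr(s, ch)) != NULL) {
		n++;
		s++;
	}
	return n;
}
```

With five pairs and a comma-free clock name, a valid object contains exactly four commas; dropping one separator is the kind of breakage the patch fixes.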


Re: [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support

2015-04-30 Thread Dan Williams
On Thu, Apr 30, 2015 at 4:23 PM, Rafael J. Wysocki  wrote:
> On Tuesday, April 28, 2015 02:24:23 PM Dan Williams wrote:
>> 1/ Autodetect an NFIT table for the ACPI namespace device with _HID of
>>"ACPI0012"
>>
>> 2/ libnd bus registration
>>
>> The NFIT provided by ACPI is one possible method by which platforms will
>> discover NVDIMM resources.  However, the intent of the nd_bus_descriptor
>> abstraction is to hide "provider"-specific details, leaving libnd
>> independent of the specific NVDIMM resource discovery mechanism.
>> This flexibility is exploited later to implement custom-defined nd
>> buses.
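A rough sketch of the provider split described above, under stated assumptions: nd_provider_sketch and its discover() callback are made up for illustration and do not reflect the real nd_bus_descriptor layout, and the ranges returned by the two toy providers are placeholders. The point is only that the core routine never needs to know whether the ranges came from an NFIT or from some other discovery mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* A discovered NVDIMM range (hypothetical type). */
struct nd_range_sketch {
	unsigned long long start;
	unsigned long long size;
};

/* Hypothetical provider descriptor: the core only sees this interface. */
struct nd_provider_sketch {
	const char *provider_name;
	/* Fills out[] with up to max ranges; returns how many were found. */
	size_t (*discover)(struct nd_range_sketch *out, size_t max);
};

/* Provider-agnostic "core": totals whatever capacity the provider reports. */
static unsigned long long nd_core_capacity(const struct nd_provider_sketch *p)
{
	struct nd_range_sketch ranges[8];
	unsigned long long total = 0;
	size_t i, n;

	assert(p != NULL && p->discover != NULL);
	n = p->discover(ranges, 8);
	for (i = 0; i < n; i++)
		total += ranges[i].size;
	return total;
}

/* Toy NFIT-style provider with two placeholder 1 GiB ranges. */
static size_t toy_nfit_discover(struct nd_range_sketch *out, size_t max)
{
	if (max < 2)
		return 0;
	out[0].start = 0x100000000ULL;
	out[0].size = 0x40000000ULL;
	out[1].start = 0x140000000ULL;
	out[1].size = 0x40000000ULL;
	return 2;
}

/* Toy provider standing in for a non-NFIT mechanism: one 1 GiB range. */
static size_t toy_legacy_discover(struct nd_range_sketch *out, size_t max)
{
	if (max < 1)
		return 0;
	out[0].start = 0x200000000ULL;
	out[0].size = 0x40000000ULL;
	return 1;
}

static const struct nd_provider_sketch toy_nfit = {
	"toy-nfit", toy_nfit_discover
};
static const struct nd_provider_sketch toy_legacy = {
	"toy-legacy", toy_legacy_discover
};
```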
>>
>> Cc: 
>> Cc: Robert Moore 
>> Cc: Rafael J. Wysocki 
>> Signed-off-by: Dan Williams 
>> ---
>>  drivers/block/Kconfig         |    2
>>  drivers/block/Makefile        |    1
>>  drivers/block/nd/Kconfig      |   40 +++
>>  drivers/block/nd/Makefile     |    6 +
>>  drivers/block/nd/acpi.c       |  475 +
>>  drivers/block/nd/acpi_nfit.h  |  254 ++
>>  drivers/block/nd/core.c       |   67 ++
>>  drivers/block/nd/libnd.h      |   33 +++
>>  drivers/block/nd/nd-private.h |   23 ++
>>  9 files changed, 901 insertions(+)
>>  create mode 100644 drivers/block/nd/Kconfig
>>  create mode 100644 drivers/block/nd/Makefile
>>  create mode 100644 drivers/block/nd/acpi.c
>>  create mode 100644 drivers/block/nd/acpi_nfit.h
>>  create mode 100644 drivers/block/nd/core.c
>>  create mode 100644 drivers/block/nd/libnd.h
>>  create mode 100644 drivers/block/nd/nd-private.h
>>
>> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
>> index eb1fed5bd516..dfe40e5ca9bd 100644
>> --- a/drivers/block/Kconfig
>> +++ b/drivers/block/Kconfig
>> @@ -321,6 +321,8 @@ config BLK_DEV_NVME
>> To compile this driver as a module, choose M here: the
>> module will be called nvme.
>>
>> +source "drivers/block/nd/Kconfig"
>> +
>>  config BLK_DEV_SKD
>>   tristate "STEC S1120 Block Driver"
>>   depends on PCI
>> diff --git a/drivers/block/Makefile b/drivers/block/Makefile
>> index 9cc6c18a1c7e..07a6acecf4d8 100644
>> --- a/drivers/block/Makefile
>> +++ b/drivers/block/Makefile
>> @@ -24,6 +24,7 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
>>  obj-$(CONFIG_MG_DISK)+= mg_disk.o
>>  obj-$(CONFIG_SUNVDC) += sunvdc.o
>>  obj-$(CONFIG_BLK_DEV_NVME)   += nvme.o
>> +obj-$(CONFIG_ND_DEVICES) += nd/
>>  obj-$(CONFIG_BLK_DEV_SKD)+= skd.o
>>  obj-$(CONFIG_BLK_DEV_OSD)+= osdblk.o
>>
>> diff --git a/drivers/block/nd/Kconfig b/drivers/block/nd/Kconfig
>> new file mode 100644
>> index ..6d5d6b732f82
>> --- /dev/null
>> +++ b/drivers/block/nd/Kconfig
>> @@ -0,0 +1,40 @@
>> +menuconfig ND_DEVICES
>> + bool "NVDIMM Support"
>> + depends on PHYS_ADDR_T_64BIT
>> + help
>> +   Generic support for non-volatile memory devices including
>> +   ACPI-6-NFIT defined resources.  On platforms that define an
>> +   NFIT, or otherwise can discover NVDIMM resources, a libnd
>> +   bus is registered to advertise PMEM (persistent memory)
>> +   namespaces (/dev/pmemX) and BLK (sliding mmio window(s))
>> +   namespaces (/dev/ndX). A PMEM namespace refers to a memory
>> +   resource that may span multiple DIMMs and support DAX (see
>> +   CONFIG_DAX).  A BLK namespace refers to an NVDIMM control
>> +   region which exposes an mmio register set for windowed
>> +   access mode to non-volatile memory.
>> +
>> +if ND_DEVICES
>> +
>> +config LIBND
>> + tristate "LIBND: libnd device driver support"
>> + help
>> +   Platform agnostic device model for a libnd bus.  Publishes
>> +   resources for a PMEM (persistent-memory) driver and/or BLK
>> +   (sliding mmio window(s)) driver to attach.  Exposes a device
>> +   topology under a "ndX" bus device, a "/dev/ndctlX" bus-ioctl
>> +   message passing interface, and a "/dev/nmemX" dimm-ioctl
>> +   message interface for each memory device registered on the
>> +   bus instance.  A userspace library "ndctl" provides an API
>> +   to enumerate/manage this subsystem.
>> +
>> +config ND_ACPI
>> + tristate "ACPI: NFIT to libnd bus support"
>> + select LIBND
>> + depends on ACPI
>> + help
>> +   Infrastructure to probe ACPI 6 compliant platforms for
>> +   NVDIMMs (NFIT) and register a libnd device tree.  In
>> +   addition to storage devices, this also enables libnd to craft
>> +   ACPI._DSM messages for platform/dimm configuration.
>
> I'm wondering whether the two CONFIG options above really need to be
> user-selectable.
>
> For example, what reason might people (who've already selected ND_DEVICES)
> have for not selecting ND_ACPI if ACPI is set?


Later on in the series we introduce ND_E820, which supports creating a
libnd-bus from e820-type-12 memory ranges on pre-NFIT systems.  I'm
also considering a configfs defined libnd-bus because e820 types are
not nearly enough i

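The e820 path mentioned in the reply amounts to filtering the firmware memory map by type. A minimal sketch, assuming a simplified entry struct (struct e820_entry_sketch and collect_type12_ranges() are hypothetical) and using 12, the type number named in the reply, for the pre-NFIT persistent-memory ranges. The map contents in the usage below are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* e820 type used for pre-NFIT persistent memory, per the reply. */
#define SKETCH_E820_TYPE_12 12

/* Simplified e820 map entry (hypothetical layout). */
struct e820_entry_sketch {
	unsigned long long addr;
	unsigned long long size;
	unsigned int type;
};

/* Copies up to max type-12 entries from map[] into out[]; returns the count. */
static size_t collect_type12_ranges(const struct e820_entry_sketch *map,
				    size_t n, struct e820_entry_sketch *out,
				    size_t max)
{
	size_t i, found = 0;

	assert(map != NULL || n == 0);
	assert(out != NULL || max == 0);

	for (i = 0; i < n && found < max; i++)
		if (map[i].type == SKETCH_E820_TYPE_12)
			out[found++] = map[i];
	return found;
}
```

Each collected range would then be handed to a libnd bus in the same way an NFIT-described range is, which is what keeps the core independent of the discovery mechanism.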