Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table
On 05/01/2015 03:12 PM, David Gibson wrote: On Fri, May 01, 2015 at 02:10:58PM +1000, Alexey Kardashevskiy wrote: On 04/29/2015 04:40 PM, David Gibson wrote: On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote: This adds a way for the IOMMU user to know how much a new table will use so it can be accounted in the locked_vm limit before allocation happens. This stores the allocated table size in pnv_pci_create_table() so the locked_vm counter can be updated correctly when a table is being disposed. This defines an iommu_table_group_ops callback to let VFIO know how much memory will be locked if a table is created. Signed-off-by: Alexey Kardashevskiy --- Changes: v9: * reimplemented the whole patch --- arch/powerpc/include/asm/iommu.h | 5 + arch/powerpc/platforms/powernv/pci-ioda.c | 14 arch/powerpc/platforms/powernv/pci.c | 36 +++ arch/powerpc/platforms/powernv/pci.h | 2 ++ 4 files changed, 57 insertions(+) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 1472de3..9844c106 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -99,6 +99,7 @@ struct iommu_table { unsigned long it_size; /* Size of iommu table in entries */ unsigned long it_indirect_levels; unsigned long it_level_size; + unsigned long it_allocated_size; unsigned long it_offset;/* Offset into global table */ unsigned long it_base; /* mapped address of tce table */ unsigned long it_index; /* which iommu table this is */ @@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl, struct iommu_table_group; struct iommu_table_group_ops { + unsigned long (*get_table_size)( + __u32 page_shift, + __u64 window_size, + __u32 levels); long (*create_table)(struct iommu_table_group *table_group, int num, __u32 page_shift, diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index e0be556..7f548b4 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ 
b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb, } #ifdef CONFIG_IOMMU_API +static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift, + __u64 window_size, __u32 levels) +{ + unsigned long ret = pnv_get_table_size(page_shift, window_size, levels); + + if (!ret) + return ret; + + /* Add size of it_userspace */ + return ret + (window_size >> page_shift) * sizeof(unsigned long); This doesn't make much sense. The userspace view can't possibly be a property of the specific low-level IOMMU model. This it_userspace thing is all about memory preregistration. I need some way to track how many actual mappings the mm_iommu_table_group_mem_t has in order to decide whether to allow unregistering or not. When I clear a TCE, I can read the old value, which is a host physical address; I cannot use it to find the preregistered region and adjust the mappings counter. I can only use userspace addresses for this (not even guest physical addresses, as it is VFIO and probably no KVM). So I have to keep userspace addresses somewhere, one per IOMMU page, and the iommu_table seems a natural place for this. Well.. sort of. But as noted elsewhere this pulls VFIO-specific constraints into a platform code structure. And whether you get this table depends on the platform IOMMU type rather than on what VFIO wants to do with it, which doesn't make sense. What might make more sense is an opaque pointer in iommu_table for use by the table "owner" (in the take_ownership sense). The pointer would be stored in iommu_table, but VFIO is responsible for populating and managing its contents. Or you could just put the userspace mappings in the container. Although you might want a different data structure in that case. Nope. I need this table in the in-kernel acceleration to update the mappings counter per mm_iommu_table_group_mem_t. In KVM's real mode handlers, I only have IOMMU tables, not containers or groups.
QEMU creates a guest view of the table (KVM_CREATE_SPAPR_TCE) specifying a LIOBN, and then attaches TCE tables to it via set of ioctls (one per IOMMU group) to VFIO KVM device. So if I call it it_opaque (instead of it_userspace), I will still need a common place (visible to VFIO and PowerKVM) for this to put: #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) So far this place was arch/powerpc/include/asm/iommu.h and the iommu_table struct. The other thing to bear in mind is that registered regions are likely to be large contiguous blocks in user addresses, though obviously not contiguous in physical addr. So you migh
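To make the locked_vm accounting concrete, here is a minimal C sketch of the kind of number get_table_size() is meant to return. The body of pnv_get_table_size() is not shown in this excerpt, so `sketch_get_table_size()` is a hypothetical single-level approximation only: it assumes 8-byte TCEs, adds one unsigned long per IOMMU page for it_userspace (the term being debated above), and ignores multi-level tables and rounding to the host page size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch, NOT pnv_get_table_size(): estimate the bytes a
 * single-level TCE table would lock, plus the per-entry it_userspace
 * array that pnv_pci_ioda2_get_table_size() adds on top.
 * Assumptions: 8-byte TCEs, one unsigned long of it_userspace per IOMMU
 * page, no rounding up to the host page size, levels == 1 only.
 * Returns 0 for inputs it cannot size, like the "if (!ret)" check in
 * the patch.
 */
static unsigned long sketch_get_table_size(uint32_t page_shift,
					   uint64_t window_size,
					   uint32_t levels)
{
	uint64_t entries;

	if (levels != 1 || page_shift >= 64)
		return 0;

	entries = window_size >> page_shift;	/* one TCE per IOMMU page */
	if (entries == 0)
		return 0;

	return (unsigned long)(entries * sizeof(uint64_t)	/* TCE table */
			       + entries * sizeof(unsigned long)); /* it_userspace */
}
```

For example, a 1 GiB window of 64 KiB IOMMU pages has 16384 entries, so on a 64-bit host the sketch reports 256 KiB, half of which is the it_userspace array.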
Re: [PATCH v2] scripts/gdb: Add command to check list consistency
On 2015-04-24 03:57, Thiébaud Weksteen wrote:
> Add a gdb script to verify the consistency of lists.
>
> Signed-off-by: Thiébaud Weksteen
> ---
> Implement suggestions from Jan.
>
> Changes in v2:
> - Add copyright line
> - Rename check_list to list_check
> - Remove casting and only accept (struct list_head) object
> - Add error message if argument is missing
> - Reformat error messages to include address of nodes
>

Thanks! I've queued it up (git.kiszka.org/linux.git queues/gdb-scripts)
along with two small improvements (completion and support for list
pointers). Will push to Andrew for 4.2.

I'm also still thinking about lx-list-for-each...

Jan
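"Consistency" here means the usual invariant of a circular doubly linked list: every node's next->prev and prev->next point back at the node. The gdb command itself is a Python script that is not quoted in this thread; the following is only a C sketch of that invariant over a hypothetical stand-in for struct list_head, with a bound on the walk so a corrupted list cannot loop forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

/*
 * Walk the circular list from 'head' and verify that each node's
 * next->prev points back at the node. Because the walk visits every
 * node, this also checks every prev pointer, including head->prev.
 * Returns the number of entries (head excluded), or -1 on a NULL or
 * inconsistent link, or if more than 'max_nodes' entries are seen.
 */
static long sketch_list_check(const struct sketch_list_head *head,
			      long max_nodes)
{
	const struct sketch_list_head *node = head;
	long count = 0;

	do {
		const struct sketch_list_head *next = node->next;

		if (next == NULL || next->prev != node)
			return -1;	/* broken forward or back link */
		node = next;
		if (node != head && ++count > max_nodes)
			return -1;	/* too long: likely a loop that skips head */
	} while (node != head);

	return count;
}
```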
[PATCH v2 2/2] arm64: dts: mt8173: Fixup pinctrl nodes
The mt8173 pinctrl nodes don't follow DTS conventions. Fix them. Also add a comment explaining the pinctrl register usage to make it clearer. Signed-off-by: Yingjoe Chen Reviewed-by: Daniel Kurtz --- arch/arm64/boot/dts/mediatek/mt8173.dtsi | 22 +- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/arch/arm64/boot/dts/mediatek/mt8173.dtsi b/arch/arm64/boot/dts/mediatek/mt8173.dtsi index 924fdb6..4595196 100644 --- a/arch/arm64/boot/dts/mediatek/mt8173.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8173.dtsi @@ -106,14 +106,13 @@ compatible = "simple-bus"; ranges; - syscfg_pctl_a: syscfg_pctl_a@10005000 { - compatible = "mediatek,mt8173-pctl-a-syscfg", "syscon"; - reg = <0 0x10005000 0 0x1000>; - }; - - pio: pinctrl@0x10005000 { + /* +* Pinctrl access register at 0x10005000 through regmap. +* Register 0x1000b000 is used by EINT. +*/ + pio: pinctrl@10005000 { compatible = "mediatek,mt8173-pinctrl"; - reg = <0 0x1000B000 0 0x1000>; + reg = <0 0x1000b000 0 0x1000>; mediatek,pctl-regmap = <&syscfg_pctl_a>; pins-are-numbered; gpio-controller; @@ -121,8 +120,13 @@ interrupt-controller; #interrupt-cells = <2>; interrupts = , - , - ; +, +; + }; + + syscfg_pctl_a: syscfg_pctl_a@10005000 { + compatible = "mediatek,mt8173-pctl-a-syscfg", "syscon"; + reg = <0 0x10005000 0 0x1000>; }; sysirq: intpol-controller@10200620 { -- 1.8.1.1.dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 1/2] ARM: dts: mt8135: Add pinctrl/GPIO/EINT node for mt8135.
From: Hongzhou Yang Patches based on v4.1-rc1. Changes according to Matthias' suggestions: - Remove comments on syscfg nodes - Sort nodes by instance address & name. ---8< Add pinctrl, GPIO and EINT nodes to mt8135.dtsi. Signed-off-by: Hongzhou Yang Acked-by: Linus Walleij --- arch/arm/boot/dts/mt8135-pinfunc.h | 1302 arch/arm/boot/dts/mt8135.dtsi | 29 + 2 files changed, 1331 insertions(+) create mode 100644 arch/arm/boot/dts/mt8135-pinfunc.h diff --git a/arch/arm/boot/dts/mt8135-pinfunc.h b/arch/arm/boot/dts/mt8135-pinfunc.h new file mode 100644 index 000..5a60987 --- /dev/null +++ b/arch/arm/boot/dts/mt8135-pinfunc.h @@ -0,0 +1,1302 @@ +/* + * Copyright (c) 2014 MediaTek Inc. + * Author: Hongzhou.Yang + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details.
+ */ + +#ifndef __DTS_MT8135_PINFUNC_H +#define __DTS_MT8135_PINFUNC_H + +#include + +#define MT8135_PIN_0_MSDC0_DAT7__FUNC_GPIO0 (MTK_PIN_NO(0) | 0) +#define MT8135_PIN_0_MSDC0_DAT7__FUNC_MSDC0_DAT7 (MTK_PIN_NO(0) | 1) +#define MT8135_PIN_0_MSDC0_DAT7__FUNC_EINT49 (MTK_PIN_NO(0) | 2) +#define MT8135_PIN_0_MSDC0_DAT7__FUNC_I2SOUT_DAT (MTK_PIN_NO(0) | 3) +#define MT8135_PIN_0_MSDC0_DAT7__FUNC_DAC_DAT_OUT (MTK_PIN_NO(0) | 4) +#define MT8135_PIN_0_MSDC0_DAT7__FUNC_PCM1_DO (MTK_PIN_NO(0) | 5) +#define MT8135_PIN_0_MSDC0_DAT7__FUNC_SPI1_MO (MTK_PIN_NO(0) | 6) +#define MT8135_PIN_0_MSDC0_DAT7__FUNC_NALE (MTK_PIN_NO(0) | 7) + +#define MT8135_PIN_1_MSDC0_DAT6__FUNC_GPIO1 (MTK_PIN_NO(1) | 0) +#define MT8135_PIN_1_MSDC0_DAT6__FUNC_MSDC0_DAT6 (MTK_PIN_NO(1) | 1) +#define MT8135_PIN_1_MSDC0_DAT6__FUNC_EINT48 (MTK_PIN_NO(1) | 2) +#define MT8135_PIN_1_MSDC0_DAT6__FUNC_I2SIN_WS (MTK_PIN_NO(1) | 3) +#define MT8135_PIN_1_MSDC0_DAT6__FUNC_DAC_WS (MTK_PIN_NO(1) | 4) +#define MT8135_PIN_1_MSDC0_DAT6__FUNC_PCM1_WS (MTK_PIN_NO(1) | 5) +#define MT8135_PIN_1_MSDC0_DAT6__FUNC_SPI1_CSN (MTK_PIN_NO(1) | 6) +#define MT8135_PIN_1_MSDC0_DAT6__FUNC_NCLE (MTK_PIN_NO(1) | 7) + +#define MT8135_PIN_2_MSDC0_DAT5__FUNC_GPIO2 (MTK_PIN_NO(2) | 0) +#define MT8135_PIN_2_MSDC0_DAT5__FUNC_MSDC0_DAT5 (MTK_PIN_NO(2) | 1) +#define MT8135_PIN_2_MSDC0_DAT5__FUNC_EINT47 (MTK_PIN_NO(2) | 2) +#define MT8135_PIN_2_MSDC0_DAT5__FUNC_I2SIN_CK (MTK_PIN_NO(2) | 3) +#define MT8135_PIN_2_MSDC0_DAT5__FUNC_DAC_CK (MTK_PIN_NO(2) | 4) +#define MT8135_PIN_2_MSDC0_DAT5__FUNC_PCM1_CK (MTK_PIN_NO(2) | 5) +#define MT8135_PIN_2_MSDC0_DAT5__FUNC_SPI1_CLK (MTK_PIN_NO(2) | 6) +#define MT8135_PIN_2_MSDC0_DAT5__FUNC_NLD4 (MTK_PIN_NO(2) | 7) + +#define MT8135_PIN_3_MSDC0_DAT4__FUNC_GPIO3 (MTK_PIN_NO(3) | 0) +#define MT8135_PIN_3_MSDC0_DAT4__FUNC_MSDC0_DAT4 (MTK_PIN_NO(3) | 1) +#define MT8135_PIN_3_MSDC0_DAT4__FUNC_EINT46 (MTK_PIN_NO(3) | 2) +#define MT8135_PIN_3_MSDC0_DAT4__FUNC_A_FUNC_CK (MTK_PIN_NO(3) | 3) +#define 
MT8135_PIN_3_MSDC0_DAT4__FUNC_LSCE1B_2X (MTK_PIN_NO(3) | 6) +#define MT8135_PIN_3_MSDC0_DAT4__FUNC_NLD5 (MTK_PIN_NO(3) | 7) + +#define MT8135_PIN_4_MSDC0_CMD__FUNC_GPIO4 (MTK_PIN_NO(4) | 0) +#define MT8135_PIN_4_MSDC0_CMD__FUNC_MSDC0_CMD (MTK_PIN_NO(4) | 1) +#define MT8135_PIN_4_MSDC0_CMD__FUNC_EINT41 (MTK_PIN_NO(4) | 2) +#define MT8135_PIN_4_MSDC0_CMD__FUNC_A_FUNC_DOUT_0 (MTK_PIN_NO(4) | 3) +#define MT8135_PIN_4_MSDC0_CMD__FUNC_USB_TEST_IO_0 (MTK_PIN_NO(4) | 5) +#define MT8135_PIN_4_MSDC0_CMD__FUNC_LRSTB_2X (MTK_PIN_NO(4) | 6) +#define MT8135_PIN_4_MSDC0_CMD__FUNC_NRNB (MTK_PIN_NO(4) | 7) + +#define MT8135_PIN_5_MSDC0_CLK__FUNC_GPIO5 (MTK_PIN_NO(5) | 0) +#define MT8135_PIN_5_MSDC0_CLK__FUNC_MSDC0_CLK (MTK_PIN_NO(5) | 1) +#define MT8135_PIN_5_MSDC0_CLK__FUNC_EINT40 (MTK_PIN_NO(5) | 2) +#define MT8135_PIN_5_MSDC0_CLK__FUNC_A_FUNC_DOUT_1 (MTK_PIN_NO(5) | 3) +#define MT8135_PIN_5_MSDC0_CLK__FUNC_USB_TEST_IO_1 (MTK_PIN_NO(5) | 5) +#define MT8135_PIN_5_MSDC0_CLK__FUNC_LPTE (MTK_PIN_NO(5) | 6) +#define MT8135_PIN_5_MSDC0_CLK__FUNC_NREB (MTK_PIN_NO(5) | 7) + +#define MT8135_PIN_6_MSDC0_DAT3__FUNC_GPIO6 (MTK_PIN_NO(6) | 0) +#define MT8135_PIN_6_MSDC0_DAT3__FUNC_MSDC0_DAT3 (MTK_PIN_NO(6) | 1) +#define MT8135_PIN_6_MSDC0_DAT3__FUNC_EINT45 (MTK_PIN_NO(6) | 2) +#define MT8135_PIN_6_MSDC0_DAT3__FUNC_A_FUNC_DOUT_2 (MTK_PIN_NO(6) | 3) +#define MT8135_PIN_6_MSDC0_DAT3__FUNC_USB_TEST_IO_2 (MTK_PIN_NO(6) | 5) +#define MT8135_PIN_6_MSDC0_DAT3__FUNC_LSCE0B_2X (MTK_PIN_NO(6) | 6) +#define MT8135_PIN_6_MSDC0_DAT3__FUNC_NLD7 (MTK_PIN_NO(6) | 7) + +#define MT8135_PIN_7_MSDC0_DAT2__FUNC_GPIO7 (MTK_PIN_NO(7) | 0) +#define MT8135_PIN_7_MSDC0_DAT2__FUNC_MSDC0_DAT2 (MTK_PIN_NO(7) | 1) +#define MT8135_PIN_7_MSDC0_DAT2__FUNC_EINT44 (MT
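Each value in this header packs a pin number and a mux function into one integer as `(MTK_PIN_NO(n) | func)`. The C sketch below illustrates that encoding and how to split it back apart. The macro names are hypothetical, and the bit layout (pin number shifted left by 8, function in the low 4 bits) is an assumption to verify against the real dt-bindings header, whose name is elided in the `#include` above.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for the MTK_PIN_NO() helpers. Assumption: the
 * pin number occupies bits 8 and up, the function (mux mode) the low
 * 4 bits; check the real dt-bindings header before relying on this.
 */
#define SKETCH_PIN_NO(x)	((x) << 8)
#define SKETCH_GET_PIN_NO(x)	((x) >> 8)
#define SKETCH_GET_PIN_FUNC(x)	((x) & 0xf)

struct sketch_pinmux {
	unsigned int pin;	/* pin index, e.g. 0 for MSDC0_DAT7 */
	unsigned int func;	/* mux mode; 0 is GPIO in this table */
};

/* Split an encoded value such as (SKETCH_PIN_NO(0) | 2) back into parts. */
static struct sketch_pinmux sketch_decode_pinmux(unsigned int pinmux)
{
	struct sketch_pinmux m = {
		.pin = SKETCH_GET_PIN_NO(pinmux),
		.func = SKETCH_GET_PIN_FUNC(pinmux),
	};

	return m;
}
```

Under that assumption, MT8135_PIN_0_MSDC0_DAT7__FUNC_EINT49 decodes to pin 0, mode 2.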
[PATCH 3/5] metag: use for_each_sg()
Since metag doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita Cc: James Hogan Cc: linux-me...@vger.kernel.org Cc: linux-a...@vger.kernel.org --- arch/metag/include/asm/dma-mapping.h | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/arch/metag/include/asm/dma-mapping.h b/arch/metag/include/asm/dma-mapping.h index 14b23ef..eb5cdec 100644 --- a/arch/metag/include/asm/dma-mapping.h +++ b/arch/metag/include/asm/dma-mapping.h @@ -134,20 +134,24 @@ dma_sync_single_range_for_device(struct device *dev, dma_addr_t dma_handle, } static inline void -dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems, +dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems, enum dma_data_direction direction) { int i; - for (i = 0; i < nelems; i++, sg++) + struct scatterlist *sg; + + for_each_sg(sglist, sg, nelems, i) dma_sync_for_cpu(sg_virt(sg), sg->length, direction); } static inline void -dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nelems, - enum dma_data_direction direction) +dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, + int nelems, enum dma_data_direction direction) { int i; - for (i = 0; i < nelems; i++, sg++) + struct scatterlist *sg; + + for_each_sg(sglist, sg, nelems, i) dma_sync_for_device(sg_virt(sg), sg->length, direction); } -- 1.9.1
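The CONFIG_DEBUG_SG benefit comes from routing every step through sg_next(), where a debug build can check each entry. The C sketch below models that with hypothetical types: `sketch_sg_next()` counts entries whose debug marker is missing (the real kernel uses BUG_ON() rather than counting), so a table that a driver never initialised is noticed, whereas a plain `sg++` loop would walk over it silently.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SG_MAGIC 0x87654321UL	/* hypothetical debug marker */

/* Simplified entry; not the kernel's struct scatterlist. */
struct sketch_sg {
	unsigned long magic;	/* written by sketch_init_table() */
	unsigned int length;
	int is_last;		/* end-of-table marker */
};

static int sketch_bad_entries;	/* entries that failed the magic check */

/* What a well-behaved driver does first (compare sg_init_table()). */
static void sketch_init_table(struct sketch_sg *sgl, int nents)
{
	for (int i = 0; i < nents; i++) {
		sgl[i].magic = SKETCH_SG_MAGIC;
		sgl[i].length = 0;
		sgl[i].is_last = (i == nents - 1);
	}
}

/* sg_next()-style step: a debug build can validate the entry here,
 * which a bare "sg++" gives it no opportunity to do. */
static struct sketch_sg *sketch_sg_next(struct sketch_sg *sg)
{
	if (sg->magic != SKETCH_SG_MAGIC)
		sketch_bad_entries++;
	return sg->is_last ? NULL : sg + 1;
}

#define sketch_for_each_sg(sglist, sg, nr, i) \
	for ((i) = 0, (sg) = (sglist); (i) < (nr); \
	     (i)++, (sg) = sketch_sg_next(sg))

static unsigned int sketch_total_length(struct sketch_sg *sglist, int nents)
{
	struct sketch_sg *sg;
	unsigned int total = 0;
	int i;

	sketch_for_each_sg(sglist, sg, nents, i)
		total += sg->length;
	return total;
}
```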
[PATCH 5/5] mips: use for_each_sg()
Since mips doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita Cc: Ralf Baechle Cc: linux-m...@linux-mips.org Cc: linux-a...@vger.kernel.org --- arch/mips/mm/dma-default.c | 30 -- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c index 609d124..eeaf024 100644 --- a/arch/mips/mm/dma-default.c +++ b/arch/mips/mm/dma-default.c @@ -262,12 +262,13 @@ static void mips_dma_unmap_page(struct device *dev, dma_addr_t dma_addr, plat_unmap_dma_mem(dev, dma_addr, size, direction); } -static int mips_dma_map_sg(struct device *dev, struct scatterlist *sg, +static int mips_dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction, struct dma_attrs *attrs) { int i; + struct scatterlist *sg; - for (i = 0; i < nents; i++, sg++) { + for_each_sg(sglist, sg, nents, i) { if (!plat_device_is_coherent(dev)) __dma_sync(sg_page(sg), sg->offset, sg->length, direction); @@ -291,13 +292,14 @@ static dma_addr_t mips_dma_map_page(struct device *dev, struct page *page, return plat_map_dma_mem_page(dev, page) + offset; } -static void mips_dma_unmap_sg(struct device *dev, struct scatterlist *sg, +static void mips_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, int nhwentries, enum dma_data_direction direction, struct dma_attrs *attrs) { int i; + struct scatterlist *sg; - for (i = 0; i < nhwentries; i++, sg++) { + for_each_sg(sglist, sg, nhwentries, i) { if (!plat_device_is_coherent(dev) && direction != DMA_TO_DEVICE) __dma_sync(sg_page(sg), sg->offset, sg->length, @@ -324,26 +326,34 @@ static void mips_dma_sync_single_for_device(struct device *dev, } static void mips_dma_sync_sg_for_cpu(struct device *dev, - struct scatterlist *sg, int nelems, enum 
dma_data_direction direction) + struct scatterlist *sglist, int nelems, + enum dma_data_direction direction) { int i; + struct scatterlist *sg; - if (cpu_needs_post_dma_flush(dev)) - for (i = 0; i < nelems; i++, sg++) + if (cpu_needs_post_dma_flush(dev)) { + for_each_sg(sglist, sg, nelems, i) { __dma_sync(sg_page(sg), sg->offset, sg->length, direction); + } + } plat_post_dma_flush(dev); } static void mips_dma_sync_sg_for_device(struct device *dev, - struct scatterlist *sg, int nelems, enum dma_data_direction direction) + struct scatterlist *sglist, int nelems, + enum dma_data_direction direction) { int i; + struct scatterlist *sg; - if (!plat_device_is_coherent(dev)) - for (i = 0; i < nelems; i++, sg++) + if (!plat_device_is_coherent(dev)) { + for_each_sg(sglist, sg, nelems, i) { __dma_sync(sg_page(sg), sg->offset, sg->length, direction); + } + } } int mips_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) -- 1.9.1
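The other reason for_each_sg() matters is chaining. On architectures that select ARCH_HAS_SG_CHAIN, the last slot of a scatterlist array may be a link to another array, which sg_next() follows and `sg++` does not. mips does not select it, so the old and new loops behave the same here; the hypothetical C sketch below only illustrates the difference. It uses an explicit chain pointer for clarity, whereas the kernel encodes the link inside the entry itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified entry: either a data segment or a chain link. */
struct sketch_sgc {
	unsigned int length;
	struct sketch_sgc *chain;	/* non-NULL: slot links to another array */
	int is_last;			/* end-of-list marker */
};

/* sg_next()-style step: advance, then follow a chain slot if we land on one. */
static struct sketch_sgc *sketch_sgc_next(struct sketch_sgc *sg)
{
	if (sg->is_last)
		return NULL;
	sg++;
	if (sg->chain != NULL)
		sg = sg->chain;
	return sg;
}

/* Sum lengths of 'nents' data entries, following chain links. */
static unsigned int sketch_sum_with_next(struct sketch_sgc *sgl, int nents)
{
	struct sketch_sgc *sg = sgl;
	unsigned int total = 0;

	for (int i = 0; i < nents; i++, sg = sketch_sgc_next(sg))
		total += sg->length;
	return total;
}

/* Same walk with a bare pointer increment: treats a chain slot as data. */
static unsigned int sketch_sum_with_increment(struct sketch_sgc *sgl, int nents)
{
	struct sketch_sgc *sg = sgl;
	unsigned int total = 0;

	for (int i = 0; i < nents; i++, sg++)
		total += sg->length;
	return total;
}
```

With a first array of {10, 20, chain} linked to {30, 40}, walking three entries with sg_next-style steps sums 10 + 20 + 30 = 60, while the `sg++` walk lands on the chain slot and sums 10 + 20 + 0 = 30.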
[PATCH 4/5] xtensa: use for_each_sg()
Since xtensa doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita Cc: Chris Zankel Cc: Max Filippov Cc: linux-xte...@linux-xtensa.org Cc: linux-a...@vger.kernel.org --- arch/xtensa/include/asm/dma-mapping.h | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/arch/xtensa/include/asm/dma-mapping.h b/arch/xtensa/include/asm/dma-mapping.h index 172a02a..54d2b22 100644 --- a/arch/xtensa/include/asm/dma-mapping.h +++ b/arch/xtensa/include/asm/dma-mapping.h @@ -52,14 +52,15 @@ dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size, } static inline int -dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, +dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction) { int i; + struct scatterlist *sg; BUG_ON(direction == DMA_NONE); - for (i = 0; i < nents; i++, sg++ ) { + for_each_sg(sglist, sg, nents, i) { BUG_ON(!sg_page(sg)); sg->dma_address = sg_phys(sg); @@ -124,20 +125,24 @@ dma_sync_single_range_for_device(struct device *dev, dma_addr_t dma_handle, consistent_sync((void *)bus_to_virt(dma_handle)+offset,size,direction); } static inline void -dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems, +dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems, enum dma_data_direction dir) { int i; - for (i = 0; i < nelems; i++, sg++) + struct scatterlist *sg; + + for_each_sg(sglist, sg, nelems, i) consistent_sync(sg_virt(sg), sg->length, dir); } static inline void -dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nelems, -enum dma_data_direction dir) +dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, + int nelems, enum dma_data_direction dir) { int i; - for (i = 0; i < nelems; i++, sg++) + 
struct scatterlist *sg; + + for_each_sg(sglist, sg, nelems, i) consistent_sync(sg_virt(sg), sg->length, dir); } static inline int -- 1.9.1
[PATCH 2/5] arc: use for_each_sg()
Since arc doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita Cc: Vineet Gupta Cc: linux-a...@vger.kernel.org --- arch/arc/include/asm/dma-mapping.h | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/arc/include/asm/dma-mapping.h b/arch/arc/include/asm/dma-mapping.h index 45b8e0c..f787894 100644 --- a/arch/arc/include/asm/dma-mapping.h +++ b/arch/arc/include/asm/dma-mapping.h @@ -178,22 +178,24 @@ dma_sync_single_range_for_device(struct device *dev, dma_addr_t dma_handle, } static inline void -dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems, +dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist, int nelems, enum dma_data_direction dir) { int i; + struct scatterlist *sg; - for (i = 0; i < nelems; i++, sg++) + for_each_sg(sglist, sg, nelems, i) _dma_cache_sync((unsigned int)sg_virt(sg), sg->length, dir); } static inline void -dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nelems, - enum dma_data_direction dir) +dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, + int nelems, enum dma_data_direction dir) { int i; + struct scatterlist *sg; - for (i = 0; i < nelems; i++, sg++) + for_each_sg(sglist, sg, nelems, i) _dma_cache_sync((unsigned int)sg_virt(sg), sg->length, dir); } -- 1.9.1
[PATCH 1/5] m68k: use for_each_sg()
Since m68k doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita Cc: Geert Uytterhoeven Cc: linux-m...@lists.linux-m68k.org Cc: linux-a...@vger.kernel.org --- arch/m68k/kernel/dma.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c index e546a55..564665f 100644 --- a/arch/m68k/kernel/dma.c +++ b/arch/m68k/kernel/dma.c @@ -120,13 +120,16 @@ void dma_sync_single_for_device(struct device *dev, dma_addr_t handle, } EXPORT_SYMBOL(dma_sync_single_for_device); -void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nents, - enum dma_data_direction dir) +void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, + int nents, enum dma_data_direction dir) { int i; + struct scatterlist *sg; - for (i = 0; i < nents; sg++, i++) - dma_sync_single_for_device(dev, sg->dma_address, sg->length, dir); + for_each_sg(sglist, sg, nents, i) { + dma_sync_single_for_device(dev, sg->dma_address, sg->length, + dir); + } } EXPORT_SYMBOL(dma_sync_sg_for_device); @@ -151,14 +154,16 @@ dma_addr_t dma_map_page(struct device *dev, struct page *page, } EXPORT_SYMBOL(dma_map_page); -int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, +int dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction dir) { int i; + struct scatterlist *sg; - for (i = 0; i < nents; sg++, i++) { + for_each_sg(sglist, sg, nents, i) { sg->dma_address = sg_phys(sg); - dma_sync_single_for_device(dev, sg->dma_address, sg->length, dir); + dma_sync_single_for_device(dev, sg->dma_address, sg->length, + dir); } return nents; } -- 1.9.1
[BUG] Jiffies not incrementing, tick_handover_do_timer strictly tied to hotplug
Hi,

Linux version: 3.10.17

Problem statement: timekeeping/do_timer appears to have stopped, and the core that is aborting (core0 in this case) is stuck in a loop that relies on jiffies.

Root cause: we have a tickless kernel, so a CPU goes into a deep idle state and stops its sched tick (tick_nohz_stop_sched_tick). tick_sched_do_timer should then take over the job: whichever CPU is running transfers the jiffies-incrementing duty to itself.

However, when, say, core0 raises a BUG, ipi_cpu_stop makes the other CPUs stop. clockevents_notify/tick_notify/hrtimer_notify eventually seem to be connected through cpu_chain, but that code belongs to hotplug: only when cpu_down happens can it call tick_handover_do_timer, which takes over the duty from the dying CPU and assigns it to one that is online:

static void tick_handover_do_timer(int *cpup)
{
	if (*cpup == tick_do_timer_cpu) {
		int cpu = cpumask_first(cpu_online_mask);

		tick_do_timer_cpu = (cpu < nr_cpu_ids) ? cpu :
			TICK_DO_TIMER_NONE;
	}
}

Since cpu_down is not called, this handover does not happen, and tick_do_timer_cpu keeps pointing to a DEAD CPU (1, 2 or 3). core0 then waits forever wherever the code relies on jiffies being incremented.

What is the right way to approach this problem? At first sight it looks like the kernel should hand the jiffies duty over to another online core independently of hotplug.

Regards,
Oza.
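A user-space C model of the two pieces involved may help frame the question. It is a hypothetical sketch, not kernel code: `sketch_tick()` mirrors the takeover rule in tick_sched_do_timer() (claim the duty only if nobody holds it, and only the owner advances jiffies; the nohz_full case is ignored), and `sketch_handover_do_timer()` mirrors the function quoted above, with a 4-CPU bitmask standing in for cpu_online_mask.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS			4
#define SKETCH_TICK_DO_TIMER_NONE	(-1)

/* Like cpumask_first(): lowest set bit, or SKETCH_NR_CPUS if none. */
static int sketch_first_cpu(uint32_t online_mask)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (online_mask & (1u << cpu))
			return cpu;
	return SKETCH_NR_CPUS;
}

/* Mirrors tick_handover_do_timer(): if the departing CPU owns the
 * duty, pass it to the first online CPU, or to NONE if none is left. */
static void sketch_handover_do_timer(int dying_cpu, uint32_t online_mask,
				     int *do_timer_cpu)
{
	if (dying_cpu == *do_timer_cpu) {
		int cpu = sketch_first_cpu(online_mask);

		*do_timer_cpu = (cpu < SKETCH_NR_CPUS) ?
				cpu : SKETCH_TICK_DO_TIMER_NONE;
	}
}

/* Simplified tick_sched_do_timer(): claim the duty only if nobody
 * holds it; only the owner advances jiffies. */
static void sketch_tick(int this_cpu, int *do_timer_cpu,
			unsigned long *jiffies)
{
	if (*do_timer_cpu == SKETCH_TICK_DO_TIMER_NONE)
		*do_timer_cpu = this_cpu;
	if (*do_timer_cpu == this_cpu)
		(*jiffies)++;
}
```

If CPU 1 holds the duty and is stopped by IPI without a handover, CPU 0's ticks never advance jiffies; after a handover CPU 0 owns the duty and jiffies move again, which is the behaviour the report asks for outside the hotplug path.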
Re: [PATCH v6 4/4] clk: dt: Introduce binding for always-on clock support
On Fri, 01 May 2015, Sascha Hauer wrote: > On Thu, Apr 30, 2015 at 10:57:22AM +0100, Lee Jones wrote: > > On Wed, 29 Apr 2015, Maxime Ripard wrote: > > > > > On Wed, Apr 29, 2015 at 03:17:51PM +0100, Lee Jones wrote: > > > > On Wed, 22 Apr 2015, Maxime Ripard wrote: > > > > > > > > > On Wed, Apr 08, 2015 at 06:23:44PM +0100, Lee Jones wrote: > > > > > > On Wed, 08 Apr 2015, Maxime Ripard wrote: > > > > > > > > > > > > > On Wed, Apr 08, 2015 at 11:38:32AM +0100, Lee Jones wrote: > > > > > > > > On Wed, 08 Apr 2015, Maxime Ripard wrote: > > > > > > > > > > > > > > > > > On Wed, Apr 08, 2015 at 09:14:50AM +0100, Lee Jones wrote: > > > > > > > > > > > > + > > > > > > > > > > > > + This property is not to be abused. > > > > > > > > > > > > It is only to be used to > > > > > > > > > > > > + protect platforms from being > > > > > > > > > > > > crippled by gated clocks, not > > > > > > > > > > > > + as a convenience function to avoid > > > > > > > > > > > > using the framework > > > > > > > > > > > > + correctly inside device drivers. > > > > > > > > > > > > > > > > > > > > > > Disregarding what's stated here, I'm pretty sure that > > > > > > > > > > > this will > > > > > > > > > > > actually happen. Where do you place the cursor? > > > > > > > > > > > > > > > > > > > > That's up to Mike. > > > > > > > > > > > > > > > > > > Except that Mike won't review any of the DT changes, so he > > > > > > > > > won't be > > > > > > > > > able to refrain users from using it. Let alone out-of-tree > > > > > > > > > DTs using a > > > > > > > > > mainline kernel. > > > > > > > > > > > > > > > > Ideally Mike should be Cc'ed on patches using clock bindings, > > > > > > > > but if > > > > > > > > he isn't the DT guys are smart enough to either make the right > > > > > > > > decisions themselves (Rob has Acked these bindings already, so > > > > > > > > will be > > > > > > > > on the lookout for misuse, I'm sure), or ask for Mike's help. 
> > > > > > > > > > > > > > Yeah, right, as if this strategy really worked in the past > > > > > > > > > > > > > > Do we really want to look at even the DT bindings that have > > > > > > > actually > > > > > > > been reviewed by maintainers that got merged? > > > > > > > > > > > > > > They don't have time for that, which is totally fine, but we > > > > > > > really > > > > > > > shouldn't bury our head in the sand by actually thinking they will > > > > > > > review > > > > > > > every single DT-related patch. > > > > > > > > > > > > > > Using that as an argument is just plain denial of what really > > > > > > > happened > > > > > > > for the past 4 years. > > > > > > > > > > > > I agree that it's a problem, but this is a process problem and has > > > > > > nothing to do with this set. If you have a problem with the current > > > > > > process and have a better alternative, submit your thoughts to the > > > > > > DT > > > > > > list. Rejecting all new bindings because you are frightened that > > > > > > they > > > > > > will be used in a manner that they were not intended is not the way > > > > > > to > > > > > > go though. > > > > > > > > > > I'm not saying that this binding should not go in because of a process > > > > > issue. > > > > > > > > > > I'm saying that discarding arguments against your binding by adding > > > > > restrictions that cannot be enforced is not reasonable. > > > > > > > > I'm open to constructive suggestions/alternatives. > > > > > > > > Hand rolling this stuff in C per vendor is not one of them. > > > > > > I'm sorry, but ruling out alternatives that work for everyone (and > > > actually work better) just because you don't want to edit a C file is > > > not really constructive either. > > > > > > > > > > > > > > Should we create a new driver for our RAM controller, or > > > > > > > > > > > do we want to > > > > > > > > > > > use clock-always-on?
> > > > > > > > > > > > > > > > > > > > I would say that if all the driver did was to enable > > > > > > > > > > clocks, then you > > > > > > > > > > should use this instead. This binding was designed > > > > > > > > > > specifically for > > > > > > > > > > that purpose. > > > > > > > > > > > > > > > > > > > > However, if the aforementioned driver clock can be safely > > > > > > > > > > gated, then > > > > > > > > > > it should not be an always-on clock. > > > > > > > > > > > > > > > > > > Yeah, of course, I understand the original intent of it, but > > > > > > > > > that > > > > > > > > > argument, which might very well be true at one point in time, > > > > > > > > > might > > > > > > > > > not be true anymore two or three releases later. > > > > > > > > > > > > > > > > Why? The H/W isn't going to change in two or three releases. > > > > > > > > The > > > > > > > > clocks designated as 'always-on' will have to be on forever, or > > > > > > > > synonymously, 'always'. > > > > > > > > > > > > > > > > > And that driv
Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry
* r...@redhat.com wrote:

> From: Rik van Riel
>
> On syscall entry with nohz_full on, we enable interrupts, call user_exit,
> disable interrupts, do something, re-enable interrupts, and go on our
> merry way.
>
> Profiling shows that a large amount of the nohz_full overhead comes
> from the extraneous disabling and re-enabling of interrupts. Andy
> suggested simply not enabling interrupts until after the context
> tracking code has done its thing, which allows us to skip a whole
> interrupt disable & re-enable cycle.
>
> This patch builds on top of these patches by Paolo:
> https://lkml.org/lkml/2015/4/28/188
> https://lkml.org/lkml/2015/4/29/139
>
> Together with this patch I posted earlier this week, the syscall path
> on a nohz_full cpu seems to be about 10% faster.
> https://lkml.org/lkml/2015/4/24/394
>
> My test is a simple microbenchmark that calls getpriority() in a loop
> 10 million times:
>
>                 run time    system time
> vanilla         5.49s       2.08s
> __acct patch    5.21s       1.92s
> both patches    4.88s       1.71s

Just curious, what are the numbers if you don't have context tracking
enabled, i.e. without nohz_full?

I.e. what's the baseline we are talking about?

Thanks,

	Ingo
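For anyone wanting to reproduce the comparison, or Ingo's baseline, here is a minimal C sketch of a getpriority() loop that reports both run time and system time. Rik's actual test program is not shown in the thread, so this is an assumption about its shape, not a copy: it uses clock_gettime(CLOCK_MONOTONIC) for run time and getrusage(RUSAGE_SELF) for system time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Run time (wall clock) and system (kernel) CPU time of one run. */
struct loop_timing {
	double run_s;
	double system_s;
};

static double timespec_diff(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

static double timeval_diff(const struct timeval *a, const struct timeval *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	       (double)(b->tv_usec - a->tv_usec) / 1e6;
}

/*
 * Call getpriority() 'iterations' times; fill *out and return 0, or -1
 * on error. getpriority() may legitimately return -1, so errno is
 * cleared before each call and checked afterwards.
 */
static int time_getpriority_loop(long iterations, struct loop_timing *out)
{
	struct timespec t0, t1;
	struct rusage r0, r1;

	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0 ||
	    getrusage(RUSAGE_SELF, &r0) != 0)
		return -1;

	for (long i = 0; i < iterations; i++) {
		errno = 0;
		if (getpriority(PRIO_PROCESS, 0) == -1 && errno != 0)
			return -1;
	}

	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0 ||
	    getrusage(RUSAGE_SELF, &r1) != 0)
		return -1;

	out->run_s = timespec_diff(&t0, &t1);
	out->system_s = timeval_diff(&r0.ru_stime, &r1.ru_stime);
	return 0;
}
```

Calling `time_getpriority_loop(10000000, &t)` matches the 10-million-call loop; answering the baseline question means running the same loop on the same machine with and without nohz_full.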
Re: [PATCH v7 00/23] IB/Verbs: IB Management Helpers
On Tue, Apr 28, 2015 at 05:10:00PM +0200, Michael Wang wrote:
> Since v6:
>  * Thanks to Ira, Devesh for the review and testing :-)
>  * Thanks for the comments from Sean, Tom, Jason, Doug, Devesh, Ira,
>    Liran :-) Please remind me if I missed anything :-P
>  * Use query_protocol() and enum protocol type in 1#
>  * Use rdma_protocol_XX() in 2#
>  * Drop cma_set_legacy_transport()
>  * Reserve rdma_ib_or_iboe() and rdma_node_get_transport()
>  * Updated github repository to v7

I pulled these via Doug's for-4.2 branch and have done light testing with
mlx4 and qib.

Now we need to look at converting to some bit mask. Does anyone have a
link to the emails which proposed bitmasks? I can't find them right now.

For the Series:

Reviewed-by: Ira Weiny

>
> There is plenty of lengthy code to check the transport type of an IB
> device, or the link layer type of its port, but actually we are just
> speculating whether a particular management/feature is supported by the
> device/port.
>
> Thus instead of inferring, we should have our own mechanism for IB
> management capability/protocol/feature checking; several proposals are
> listed below.
>
> This patch set introduces query_protocol() to check management
> requirements instead of inferring them from transport and link layer
> respectively, along with a new enum for the protocol type.
>
> Mapping List:
>             node-type   link-layer  transport   protocol
> nes         RNIC        ETH         IWARP       IWARP
> amso1100    RNIC        ETH         IWARP       IWARP
> cxgb3       RNIC        ETH         IWARP       IWARP
> cxgb4       RNIC        ETH         IWARP       IWARP
> usnic       USNIC_UDP   ETH         USNIC_UDP   USNIC_UDP
> ocrdma      IB_CA       ETH         IB          IBOE
> mlx4        IB_CA       IB/ETH      IB          IB/IBOE
> mlx5        IB_CA       IB          IB          IB
> ehca        IB_CA       IB          IB          IB
> ipath       IB_CA       IB          IB          IB
> mthca       IB_CA       IB          IB          IB
> qib         IB_CA       IB          IB          IB
>
> For example: > if (transport == IB) && (link-layer == ETH) > will now become: > if (query_protocol() == IBOE) > > Thus we will be able to get rid of the respective transport and link-layer > checks, and it will make it easier to add new protocols/technologies (like OPA); also, with the introduced management helpers, the IB management logic > will be clearer and easier to extend. > > Highlights: > The 'mgmt-helpers' branch of 'g...@github.com:ywang-pb/infiniband-wy.git' > contains this series based on the latest 'infiniband/for-next' > > The patch set covers a wide range of IB code, so for those who are > familiar with a particular part, your suggestions would be invaluable ;-) > > Patches 1#~14# include all the logic reform; 15#~23# introduce the > management helpers. > > We would appreciate a Tested-by from anyone who has the HW > :-) > > Doug suggested the bitmask mechanism: > https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23765.html > which could be the plan for future reforming; we prefer that to be another > series which focuses on semantics and performance. 
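The mapping table can be read as a function from (transport, link layer) to protocol. The C sketch below is illustrative only: every name in it (sketch_infer_protocol(), sketch_is_iboe(), the SK_* enums) is hypothetical, and in the series the protocol is reported by the new query_protocol() callback rather than derived like this.

```c
#include <assert.h>

/* Hypothetical enums modelled on the "Mapping List"; not the kernel's definitions. */
enum sketch_transport  { SK_TRANSPORT_IB, SK_TRANSPORT_IWARP, SK_TRANSPORT_USNIC_UDP };
enum sketch_link_layer { SK_LINK_IB, SK_LINK_ETH };
enum sketch_protocol   { SK_PROTO_IB, SK_PROTO_IBOE, SK_PROTO_IWARP, SK_PROTO_USNIC_UDP };

/* Old style: infer the protocol from transport + link layer at each call site. */
enum sketch_protocol sketch_infer_protocol(enum sketch_transport t, enum sketch_link_layer ll)
{
    switch (t) {
    case SK_TRANSPORT_IWARP:
        return SK_PROTO_IWARP;
    case SK_TRANSPORT_USNIC_UDP:
        return SK_PROTO_USNIC_UDP;
    case SK_TRANSPORT_IB:
    default:
        /* IB transport over an Ethernet link layer is IBoE (ocrdma, mlx4 ETH ports). */
        return ll == SK_LINK_ETH ? SK_PROTO_IBOE : SK_PROTO_IB;
    }
}

/* New style: one question about the protocol instead of two about transport/link. */
int sketch_is_iboe(enum sketch_protocol p)
{
    return p == SK_PROTO_IBOE;
}
```

Here sketch_is_iboe(sketch_infer_protocol(SK_TRANSPORT_IB, SK_LINK_ETH)) is true, which corresponds to the single check that replaces `(transport == IB) && (link-layer == ETH)`.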
> > This patch-set is somewhat 'bloated' now and it may be a good time for > staging; I'd like to suggest we focus on improving the existing helpers and > push all further reforms into the next series ;-) > > > Proposals: > Sean: > https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23339.html > Doug: > https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23418.html > https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23765.html > Jason: > https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23425.html > > Michael Wang (23): > IB/Verbs: Implement new callback query_protocol() > IB/Verbs: Implement raw management helpers > IB/Verbs: Reform IB-core mad/agent/user_mad > IB/Verbs: Reform IB-core cm > IB/Verbs: Reform IB-core sa_query > IB/Verbs: Reform IB-core multicast > IB/Verbs: Reform IB-ulp ipoib > IB/Verbs: Reform IB-ulp xprtrdma > IB/Verbs: Reform IB-core verbs > IB/Verbs: Reform cm related part in IB-core cma/ucm > IB/Verbs: Reform route related part in IB-core cma > IB/Verbs: Reform mcast related part in IB-core cma > IB/Verbs: Reform cma_acquire_dev() > IB/Verbs: Reform rest part in IB-core cma > IB/Verbs: Use management helper cap_ib_mad() > IB/Verbs: Use management helper cap_ib_smi() > IB/Verbs: Use management helper cap_ib_cm() > IB/Verbs: Use management helper cap_iw_cm() > IB/Verbs: Use management helper cap_ib_sa() > IB/Verbs: Use management helper cap_ib_mcast() > IB/Verbs: Use manageme
Re: [GIT PULL 0/7] perf/urgent fixes
* Arnaldo Carvalho de Melo wrote: > Hi Ingo, > > Please consider pulling, this is on top of my previous > 'perf-urgent-for-mingo' > pull request. > > - Arnaldo > > The following changes since commit de28c15daf60e9625bece22f13a091fac8d05f1d: > > tools lib api: Undefine _FORTIFY_SOURCE before setting it (2015-04-23 > 17:08:23 -0300) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git > tags/perf-urgent-for-mingo-2 > > for you to fetch changes up to 1d90a685eb75a56648d7dd22c704a1a6da516de9: > > perf bench numa: Fix immediate meeting of convergence condition (2015-04-27 > 13:57:50 -0300) > > > perf/urgent fixes: > > User visible: > > . Fix a segfault in 'perf top' when kernel map is restricted (Wang Nan) > > . Fix hung wakeup tasks after requeueing in 'perf bench futex' (Davidlohr > Bueso) > > . Fix bug in perf probe global variables handling, missing curly braces on > an if body (He Kuang) > > . 'perf bench numa' fixes (command line help/handling, etc) (Petr Holasek) > > Build fixes: > > . 'perf kmem' on RHEL6/OL6 (David Ahern) > > . libtraceevent on 32-bit arch (Namhyung Kim) > > Signed-off-by: Arnaldo Carvalho de Melo > > > David Ahern (1): > perf kmem: Fix compiles on RHEL6/OL6 > > Davidlohr Bueso (1): > perf bench futex: Fix hung wakeup tasks after requeueing > > He Kuang (1): > perf probe: Fix bug with global variables handling > > Namhyung Kim (1): > tools lib traceevent: Fix build failure on 32-bit arch > > Petr Holasek (2): > perf bench numa: Fixes of --quiet argument > perf bench numa: Fix immediate meeting of convergence condition > > Wang Nan (1): > perf top: Fix a segfault when kernel map is restricted. 
> > tools/lib/traceevent/event-parse.c | 2 +- > tools/perf/bench/futex-requeue.c | 15 ++- > tools/perf/bench/numa.c| 12 +++-- > tools/perf/builtin-kmem.c | 54 > +++--- > tools/perf/builtin-top.c | 2 +- > tools/perf/util/probe-finder.c | 4 ++- > 6 files changed, 50 insertions(+), 39 deletions(-) Pulled, thanks a lot Arnaldo! Ingo
Re: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2
On 05/01/2015 03:23 PM, David Gibson wrote: On Fri, May 01, 2015 at 02:35:23PM +1000, Alexey Kardashevskiy wrote: On 04/30/2015 04:55 PM, David Gibson wrote: On Sat, Apr 25, 2015 at 10:14:53PM +1000, Alexey Kardashevskiy wrote: The existing implementation accounts the whole DMA window in the locked_vm counter. This is going to be worse with multiple containers and huge DMA windows. Also, real-time accounting would require additional tracking of accounted pages due to the page size difference - IOMMU uses 4K pages and system uses 4K or 64K pages. Another issue is that actual pages pinning/unpinning happens on every DMA map/unmap request. This does not affect performance much now, as we spend far more time on switching context between guest/userspace/host, but this will start to matter when we add in-kernel DMA map/unmap acceleration. This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU. New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces 2 new ioctls to register/unregister DMA memory - VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY - which receive user space address and size of a memory region which needs to be pinned/unpinned and counted in locked_vm. New IOMMU splits physical pages pinning and TCE table update into 2 different operations. It requires 1) guest pages to be registered first and 2) subsequent map/unmap requests to work only with pre-registered memory. For the default single window case this means that the entire guest (instead of 2GB) needs to be pinned before using VFIO. When a huge DMA window is added, no additional pinning will be required, otherwise it would be guest RAM + 2GB. The new memory registration ioctls are not supported by VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration will require memory to be preregistered in order to work. The accounting is done per user process. 
This advertises v2 SPAPR TCE IOMMU and restricts what the userspace can do with v1 or v2 IOMMUs. Signed-off-by: Alexey Kardashevskiy [aw: for the vfio related changes] Acked-by: Alex Williamson --- Changes: v9: * s/tce_get_hva_cached/tce_iommu_use_page_v2/ v7: * now memory is registered per mm (i.e. process) * moved memory registration code to powerpc/mmu * merged "vfio: powerpc/spapr: Define v2 IOMMU" into this * limited new ioctls to v2 IOMMU * updated doc * unsupported ioctls return -ENOTTY instead of -EPERM v6: * tce_get_hva_cached() returns hva via a pointer v4: * updated docs * s/kzmalloc/vzalloc/ * in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and replaced offset with index * renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory and removed duplicating vfio_iommu_spapr_register_memory --- Documentation/vfio.txt | 23 drivers/vfio/vfio_iommu_spapr_tce.c | 230 +++- include/uapi/linux/vfio.h | 27 + 3 files changed, 274 insertions(+), 6 deletions(-) diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt index 96978ec..94328c8 100644 --- a/Documentation/vfio.txt +++ b/Documentation/vfio.txt @@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed: +5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/ +VFIO_IOMMU_DISABLE and implements 2 new ioctls: +VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY +(which are unsupported in v1 IOMMU). A summary of the semantic differences between v1 and v2 would be nice. At this point it's not really clear to me if there's a case for creating v2, or if this could just be done by adding (optional) functionality to v1.
v1: memory preregistration is not supported; explicit enable/disable ioctls are required.
v2: memory preregistration is required; explicit enable/disable are prohibited (as they are not needed).
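The v1/v2 split summarised above reduces to two rules: which ioctls each flavour accepts, and what a map request needs before it can proceed. The following is a hedged C model, not the code in drivers/vfio/vfio_iommu_spapr_tce.c. All names (sketch_op_supported(), sketch_can_map(), the SK_* enums) are hypothetical, and locked_vm accounting and the actual error codes are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two SPAPR TCE IOMMU flavours. */
enum sketch_model { SK_V1, SK_V2 };
enum sketch_op { SK_OP_ENABLE, SK_OP_DISABLE, SK_OP_REGISTER_MEMORY, SK_OP_UNREGISTER_MEMORY };

struct sketch_region { unsigned long ua, size; };

struct sketch_container {
    enum sketch_model model;
    bool enabled;                        /* v1 only: set by ENABLE */
    const struct sketch_region *regions; /* v2 only: pre-registered memory */
    size_t nr_regions;
};

/* v1 accepts enable/disable but not registration; v2 is the other way round. */
bool sketch_op_supported(enum sketch_model m, enum sketch_op op)
{
    bool is_registration = (op == SK_OP_REGISTER_MEMORY || op == SK_OP_UNREGISTER_MEMORY);

    return m == SK_V2 ? is_registration : !is_registration;
}

/* Can a map request for [ua, ua + size) proceed? */
bool sketch_can_map(const struct sketch_container *c, unsigned long ua, unsigned long size)
{
    if (c->model == SK_V1)
        return c->enabled; /* v1 pins pages on every map request */

    /* v2: pinning happened at registration, so the range must lie inside a
     * registered region (written to avoid overflow in ua + size). */
    for (size_t i = 0; i < c->nr_regions; i++) {
        const struct sketch_region *r = &c->regions[i];

        if (ua >= r->ua && size <= r->size && ua - r->ua <= r->size - size)
            return true;
    }
    return false;
}
```

The model makes the motivation for two types visible: in one type, every map request would have to decide whether to pin or to look up registered memory, which is the ambiguity Alexey wanted to avoid.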
Mixing these in one IOMMU type caused a lot of problems, such as whether I should increment locked_vm by the 32bit window size on enable() or not, and what to do about page pinning on map/unmap (check if it is from registered memory and do not pin?). Having 2 IOMMU models makes everything a lot simpler. Ok. Would it simplify it further if you made v2 only usable on IODA2 hardware? Very little. V2 addresses the memory pinning issue, which is handled the same way on ioda2 and older hardware, including KVM acceleration. Whether to enable DDW or not - this is handled just fine via extra properties in the GET_INFO ioctl(). IODA2 and others are different in handling multiple groups per container but this does not require changes to userspace API. And remember, the only machine I can use 100% of the time is POWER7/P5IOC2 so it is really useful if at least some bits of the patchset can be tested there; if it was a bit less different from IODA2, I would have even implemented DDW there too :) +PPC64 paravirtualized guests generate a lot of map/unmap requests, +and the handling of those
Re: [PATCH 2/2] brcmfmac: keep WiFi chip's power during system suspension
On 2015/4/27 16:53, Arend van Spriel wrote: > On 04/27/15 07:06, Fu, Zhonghui wrote: >> Need to keep the power supply for WiFi chip during system suspension. >> Otherwise, the context of WiFi chip will be lost. > > I already submitted a patch doing exactly the same thing [1] OK, please ignore this patch. What's the target kernel version of your patch? Thanks, Zhonghui > > Regards, > Arend > > [1] https://patchwork.kernel.org/patch/6217391/ > >> Signed-off-by: Zhonghui Fu >> --- >> drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c | 10 ++ >> 1 files changed, 6 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c >> b/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c >> index fdf8feb..03d3671 100644 >> --- a/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c >> +++ b/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c >> @@ -1251,15 +1251,17 @@ static int brcmf_ops_sdio_suspend(struct device *dev) >> brcmf_sdiod_freezer_on(sdiodev); >> brcmf_sdio_wd_timer(sdiodev->bus, 0); >> >> +sdio_flags = MMC_PM_KEEP_POWER; >> if (sdiodev->wowl_enabled) { >> -sdio_flags = MMC_PM_KEEP_POWER; >> if (sdiodev->pdata->oob_irq_supported) >> enable_irq_wake(sdiodev->pdata->oob_irq_nr); >> else >> -sdio_flags = MMC_PM_WAKE_SDIO_IRQ; >> -if (sdio_set_host_pm_flags(sdiodev->func[1], sdio_flags)) >> -brcmf_err("Failed to set pm_flags %x\n", sdio_flags); >> +sdio_flags |= MMC_PM_WAKE_SDIO_IRQ; >> } >> + >> +if (sdio_set_host_pm_flags(sdiodev->func[1], sdio_flags)) >> +brcmf_err("Failed to set pm_flags %x\n", sdio_flags); >> + >> return 0; >> } >> >> -- 1.7.1 >> >
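The flag selection in brcmf_ops_sdio_suspend() before and after the quoted diff can be compared with a small C model. It is a hedged sketch: the function names are hypothetical, the enable_irq_wake() call is left out, a return value of 0 stands for "sdio_set_host_pm_flags() is not called", and the flag values are my recollection of include/linux/mmc/pm.h (only their being distinct bits matters here).

```c
#include <assert.h>
#include <stdbool.h>

/* Values as I recall them from include/linux/mmc/pm.h; treat as an assumption. */
#define SKETCH_MMC_PM_KEEP_POWER    (1u << 0)
#define SKETCH_MMC_PM_WAKE_SDIO_IRQ (1u << 1)

/* Before the diff: flags are only applied when WoWL is enabled. */
unsigned int sketch_flags_before(bool wowl_enabled, bool oob_irq_supported)
{
    unsigned int flags = 0; /* 0: no pm flags requested at all */

    if (wowl_enabled) {
        flags = SKETCH_MMC_PM_KEEP_POWER;
        if (!oob_irq_supported)
            flags = SKETCH_MMC_PM_WAKE_SDIO_IRQ; /* overwrites KEEP_POWER */
    }
    return flags;
}

/* After the diff: always keep power, and add SDIO IRQ wake-up when needed. */
unsigned int sketch_flags_after(bool wowl_enabled, bool oob_irq_supported)
{
    unsigned int flags = SKETCH_MMC_PM_KEEP_POWER;

    if (wowl_enabled && !oob_irq_supported)
        flags |= SKETCH_MMC_PM_WAKE_SDIO_IRQ;
    return flags;
}
```

Besides keeping power when WoWL is off, the diff also changes the WoWL-without-OOB-IRQ case: previously MMC_PM_WAKE_SDIO_IRQ replaced MMC_PM_KEEP_POWER instead of being OR-ed into it.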
[PATCH bisected regression] input_available_p() sometimes says 'no' when it should say 'yes'
Hi Peter, I recently had a report of a regression in 3.12. I bisected it down to your patch f95499c3030f ("n_tty: Don't wait for buffer work in read() loop") Sometimes a poll on a master-pty will report there is nothing to read after the slave has written something. A test program is below. On a kernel prior to your commit, this program never reports "Total bytes read is 0. PollRC=0". On a kernel subsequent to your commit, that message is produced quite often. This was found while working on a debugger. Following the test program is my proposed patch which allows the program to run as it used to. It re-introduces the call to tty_flush_to_ldisc(), but only if it appears that there is nothing to read. Do you think this is a suitable fix? Do you even agree that it is a real bug? Thanks, NeilBrown -- #define _XOPEN_SOURCE #include #include #include #include #include #include #include #include #include #include #include #include #define USEPTY #define COUNT_MAX (500) #define MY_BREAKPOINT { asm("int $3"); } #define PTRACE_IGNORED (void *)0 /* ** Open a pseudo-tty pair. ** ** Return the master fd, set spty to the slave fd. */ int my_openpt(int *spty) { int mfd = posix_openpt(O_RDWR | O_NOCTTY); char *slavedev; int sfd=-1; if(mfd == -1) return -1; if(grantpt(mfd) == -1) return -1; if(unlockpt(mfd) == -1) return -1; slavedev = (char *)ptsname(mfd); if((sfd = open(slavedev, O_RDWR | O_NOCTTY)) == -1) { close(mfd); return -1; } if(spty != NULL) { *spty = sfd; } return mfd; } /* ** Read from the provided file descriptor if poll says there's ** anything there. */ int DoPollRead(int mpty) { struct pollfd fds; int pollrc; ssize_t bread=0, totread=0; char readbuf[101]; /* ** Set up the poll. */ fds.fd = mpty; fds.events = POLLIN | POLLRDNORM | POLLRDBAND | POLLPRI; /* ** poll for any output.
*/ while((pollrc = poll(&fds, 1, 0)) == 1) { if(fds.revents & POLLIN) { bread = read( mpty, readbuf, 100 ); totread += bread; if(bread > 0) { //printf("Read %d bytes.\n", (int)bread); readbuf[bread] = '\0'; //printf("\t%s", readbuf); } else { //puts("Nothing read.\n"); } } else if (fds.revents & (POLLHUP | POLLERR | POLLNVAL)) { printf ("hangup/error/invalid on poll\n"); return totread; } else { printf("No POLLIN, revents=%d\n", fds.revents); }; } /* ** This sometimes happens - we're expecting input on the pty, ** but nothing is there. */ if(totread == 0) printf("Total bytes read is 0. PollRC=%d\n", pollrc); return totread; } static void writeall (int fd, const char *buf, size_t count) { while (count) { ssize_t r = write (fd, buf, count); if (r == 0) break; if (r < 0 && errno == EINTR) continue; if (r < 0) exit (2); count -= r; buf += r; } } int thechild(void) { unsigned int i; writeall (1, "debuggee starts\n", strlen ("debuggee starts\n")); for(i=0 ; i Subject: [PATCH] n_tty: Sometimes wait for buffer work in read() loop Since commit f95499c3030f ("n_tty: Don't wait for buffer work in read() loop") it has been possible for poll to report that there is no data to read on a master-pty even if a write to the slave has actually completed. That patch removed a 'wait' where the wait isn't really necessary. Unfortunately it also removed it in the case where it *is* necessary. If the simple tests show that there is nothing to read, we really need to flush the work queue in case there is something ready but which hasn't arrived yet. This patch restores the wait, but only if simple tests suggest there is nothing ready.
Reported-by: Nic Percival Reported-by: Michael Matz Fixes: f95499c3030f ("n_tty: Don't wait for buffer work in read() loop") Signed-off-by: NeilBrown diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index cf6e0f2e1331..9884091819b6 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1942,11 +1942,18 @@ static inline int input_available_p(struct tty_struct *tty, int poll) { struct n_tty_data *ldata = tty->disc_data; int amt = poll && !TIME_CHAR(tty) && MIN_CHAR(tty) ? MIN_CHAR(tty) : 1; - - if (ldata->icanon && !L_EXTPROC(tty)) - return ldata->canon_head != ldata->read_tail; - else - return ldata->commit_head - ldata->read_tail >= amt; + int i; + int ret = 0; + + for (i = 0; !ret && i < 2; i++) { + if (i) + tty_flush_to_ldisc(tty); + if (ldata->icanon && !L_EXTPROC(tty)) + ret = (ldata->canon_head
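The shape of the fix (a cheap check first, then a flush of deferred buffer work and one more check only when the cheap check says "nothing") can be sketched in isolation. This C model is illustrative: struct sketch_tty, sketch_flush() and sketch_input_available() are hypothetical stand-ins for the n_tty state, tty_flush_to_ldisc() and input_available_p(), and the canonical-mode branch is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tty model: bytes already visible to the reader, and bytes
 * still sitting in deferred buffer work that has not run yet. */
struct sketch_tty {
    unsigned int committed; /* plays the role of commit_head - read_tail */
    unsigned int pending;   /* data queued for the flush work */
};

/* Stand-in for tty_flush_to_ldisc(): run the deferred work synchronously. */
static void sketch_flush(struct sketch_tty *t)
{
    t->committed += t->pending;
    t->pending = 0;
}

/* Cheap check first; only if it reports nothing, flush and check once more. */
bool sketch_input_available(struct sketch_tty *t, unsigned int amt)
{
    bool ret = false;

    for (int i = 0; !ret && i < 2; i++) {
        if (i)
            sketch_flush(t);
        ret = t->committed >= amt;
    }
    return ret;
}
```

The fast path of f95499c3030f is kept: when data is already visible, the flush is never called, and the cost of waiting is only paid when skipping it could produce a wrong "no".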
[RFT][PATCH 2/2] regulator: max77843: Convert to use regulator_is_enabled_regmap
Use regulator_is_enabled_regmap() to replace max77843_reg_is_enabled(). Signed-off-by: Axel Lin --- drivers/regulator/max77843.c | 18 ++ 1 file changed, 2 insertions(+), 16 deletions(-) diff --git a/drivers/regulator/max77843.c b/drivers/regulator/max77843.c index 3ae2a9b..f4fd0d3 100644 --- a/drivers/regulator/max77843.c +++ b/drivers/regulator/max77843.c @@ -33,21 +33,6 @@ static const unsigned int max77843_safeout_voltage_table[] = { 330, }; -static int max77843_reg_is_enabled(struct regulator_dev *rdev) -{ - struct regmap *regmap = rdev->regmap; - int ret; - unsigned int reg; - - ret = regmap_read(regmap, rdev->desc->enable_reg, &reg); - if (ret) { - dev_err(&rdev->dev, "Fialed to read charger register\n"); - return ret; - } - - return (reg & rdev->desc->enable_mask) == rdev->desc->enable_mask; -} - static int max77843_reg_get_current_limit(struct regulator_dev *rdev) { struct regmap *regmap = rdev->regmap; @@ -96,7 +81,7 @@ static int max77843_reg_set_current_limit(struct regulator_dev *rdev, } static struct regulator_ops max77843_charger_ops = { - .is_enabled = max77843_reg_is_enabled, + .is_enabled = regulator_is_enabled_regmap, .enable = regulator_enable_regmap, .disable= regulator_disable_regmap, .get_current_limit = max77843_reg_get_current_limit, @@ -141,6 +126,7 @@ static const struct regulator_desc max77843_supported_regulators[] = { .owner = THIS_MODULE, .enable_reg = MAX77843_CHG_REG_CHG_CNFG_00, .enable_mask= MAX77843_CHG_MASK | MAX77843_CHG_BUCK_MASK, + .enable_val = MAX77843_CHG_MASK | MAX77843_CHG_BUCK_MASK, }, }; -- 2.1.0
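The reason this patch also sets .enable_val becomes clear when the two checks are written side by side. This is a hedged C model, not kernel source: it paraphrases from memory the non-inverted branch of the generic regulator_is_enabled_regmap() helper, leaves out the regmap_read() and the enable_is_inverted case, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Removed driver check: enabled only if *all* bits of enable_mask are set. */
bool sketch_old_is_enabled(unsigned int reg, unsigned int enable_mask)
{
    return (reg & enable_mask) == enable_mask;
}

/* Roughly the non-inverted branch of the generic regmap helper: with
 * enable_val set, the masked value must equal it; with enable_val == 0,
 * any set bit under the mask counts as enabled. */
bool sketch_regmap_is_enabled(unsigned int reg, unsigned int enable_mask,
                              unsigned int enable_val)
{
    unsigned int val = reg & enable_mask;

    if (enable_val)
        return val == enable_val;
    return val != 0;
}
```

With a mask of 0x05 (MAX77843_CHG_ENABLE, per patch 1/2) and only bit 0 set, the removed function reports "disabled", but the generic helper without .enable_val would report "enabled". Setting .enable_val equal to the mask restores the all-bits-set semantics.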
[RFT][PATCH 1/2] regulator: max77843: Fix enable_mask for max77843 charger
MAX77843_CHG_ENABLE is 0x05, so the enable_mask should be MAX77843_CHG_MASK | MAX77843_CHG_BUCK_MASK. Signed-off-by: Axel Lin --- Hi, I don't have this h/w, so please help to review and test this patch series. Thanks, Axel drivers/regulator/max77843.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/regulator/max77843.c b/drivers/regulator/max77843.c index e4e7687..3ae2a9b 100644 --- a/drivers/regulator/max77843.c +++ b/drivers/regulator/max77843.c @@ -140,7 +140,7 @@ static const struct regulator_desc max77843_supported_regulators[] = { .type = REGULATOR_CURRENT, .owner = THIS_MODULE, .enable_reg = MAX77843_CHG_REG_CHG_CNFG_00, - .enable_mask= MAX77843_CHG_MASK, + .enable_mask= MAX77843_CHG_MASK | MAX77843_CHG_BUCK_MASK, }, }; -- 2.1.0
Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible
On 05/01/2015 02:33 PM, David Gibson wrote: On Thu, Apr 30, 2015 at 07:33:09PM +1000, Alexey Kardashevskiy wrote: On 04/30/2015 05:22 PM, David Gibson wrote: On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote: At the moment only one group per container is supported. POWER8 CPUs have a more flexible design and allow having 2 TCE tables per IOMMU group so we can relax this limitation and support multiple groups per container. It's not obvious why allowing multiple TCE tables per PE has any bearing on allowing multiple groups per container. This patchset is a global TCE tables rework (patches 1..30, roughly) with 2 outcomes: 1. reusing the same IOMMU table for multiple groups - patch 31; 2. allowing dynamic create/remove of IOMMU tables - patch 32. I can remove this one from the patchset and post it separately later but since 1..30 aim to support both 1) and 2), I think I had better keep them all together (might explain some of the changes I do in 1..30). The combined patchset is fine. My comment is because your commit message says that multiple groups are possible *because* 2 TCE tables per group are allowed, and it's not at all clear why one follows from the other. Ah. That's wrong indeed, I'll fix it. This adds TCE table descriptors to a container and uses iommu_table_group_ops to create/set DMA windows on IOMMU groups so the same TCE tables will be shared between several IOMMU groups. 
Signed-off-by: Alexey Kardashevskiy [aw: for the vfio related changes] Acked-by: Alex Williamson --- Changes: v7: * updated doc --- Documentation/vfio.txt | 8 +- drivers/vfio/vfio_iommu_spapr_tce.c | 268 ++-- 2 files changed, 199 insertions(+), 77 deletions(-) diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt index 94328c8..7dcf2b5 100644 --- a/Documentation/vfio.txt +++ b/Documentation/vfio.txt @@ -289,10 +289,12 @@ PPC64 sPAPR implementation note This implementation has some specifics: -1) Only one IOMMU group per container is supported as an IOMMU group -represents the minimal entity which isolation can be guaranteed for and -groups are allocated statically, one per a Partitionable Endpoint (PE) +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per +container is supported as an IOMMU table is allocated at the boot time, +one table per a IOMMU group which is a Partitionable Endpoint (PE) (PE is often a PCI domain but not always). I thought the more fundamental problem was that different PEs tended to use disjoint bus address ranges, so even by duplicating put_tce across PEs you couldn't have a common address space. Sorry, I am not following you here. By duplicating put_tce, I can have multiple IOMMU groups on the same virtual PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple groups per container" does this, the address ranges will be the same. Oh, ok. For some reason I thought that (at least on the older machines) the different PEs used different and not easily changeable DMA windows in bus address space. They do use different tables (which VFIO does not get to remove/create and uses these old helpers - iommu_take/release_ownership), correct. But all these windows are mapped at zero on a PE's PCI bus and nothing prevents me from updating all these tables with the same TCE values when handling H_PUT_TCE. Yes it is slow but it works (a bit more detail below). 
What I cannot do on p5ioc2 is program the same table into multiple physical PHBs (or I could, but it is very different from IODA2, pretty ugly, and might not always be possible, because I would have to allocate these pages from some common pool and face problems like fragmentation). So allowing multiple groups per container should be possible (at the kernel rather than qemu level) by writing the same value to multiple TCE tables. I guess it's not worth doing for just the almost-obsolete IOMMUs though. It is done at QEMU level though. As it works now, QEMU opens a group, walks through all existing containers and tries attaching a new group there. If it succeeds (x86 always; POWER8 after this patch), a TCE table is shared. If it fails, QEMU creates another container, attaches it to the same VFIO/PHB address space and attaches a group there. Then the only thing left is repeating ioctl() in vfio_container_ioctl() for every container in the VFIO address space; this is what that QEMU patch does (the first version of that patch called ioctl() only for the first container in the address space). From the kernel perspective there are 2 isolated containers; I'd like to keep it this way. btw thanks for the detailed review :) -- Alexey
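Alexey's point is that, as long as every PE's window starts at bus address zero, one guest address space can be kept over several tables simply by repeating each update. A minimal C sketch of that fan-out follows. struct sketch_tce_table and sketch_put_tce_all() are hypothetical, and in the setup described here the repetition actually happens in QEMU (one ioctl() per container), not in a loop like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TCE table: one 64-bit entry per IOMMU page of the DMA window. */
struct sketch_tce_table {
    uint64_t *entries;
    size_t nr_entries;
};

/* Write the same TCE at the same index into every table, so that all groups
 * see one address space. Returns 0, or -1 if the index is out of range for
 * any table (checked up front so no table is left half-updated). */
int sketch_put_tce_all(struct sketch_tce_table *tables, size_t nr_tables,
                       size_t ioba_index, uint64_t tce)
{
    for (size_t i = 0; i < nr_tables; i++)
        if (ioba_index >= tables[i].nr_entries)
            return -1;

    for (size_t i = 0; i < nr_tables; i++)
        tables[i].entries[ioba_index] = tce;
    return 0;
}
```

The cost grows linearly with the number of tables on every H_PUT_TCE, which is the "slow but it works" trade-off Alexey accepts for the older hardware.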
Re: [PATCH] x86/mce: fix mce_restart() race with CPU hotplug operation
On Fri, May 1, 2015 at 12:29 AM, Borislav Petkov wrote: > On Thu, Apr 30, 2015 at 12:04:53AM +0900, Ethan Zhao wrote: >> while testing CPU hotplug and MCE with following two scripts, >> >> script 1: >> >> for i in {1..30}; do while :; do ((a=$RANDOM%160)); echo 0 >> >> /sys/devices/system/cpu/cpu${i}/online; echo 1 >> >> /sys/devices/system/cpu/cpu${i}/online; done & done >> >> script 2: >> >> while :; do for i in $(ls >> /sys/devices/system/machinecheck/machinecheck*/check_interval); do echo 1 >> >> >> $i; done; done > > For the record, it is a public secret that CPU hotplug is broken. IOW, > you're wasting your time with those senseless pounder tests but ok. :< Someone else is stress-testing CPU hotplug, and it seems fragile. My job is to keep the system up, not let it panic. Thanks, Ethan > > ... > >> --- >> arch/x86/kernel/cpu/mcheck/mce.c | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c >> b/arch/x86/kernel/cpu/mcheck/mce.c >> index 3c036cb..fcc2794 100644 >> --- a/arch/x86/kernel/cpu/mcheck/mce.c >> +++ b/arch/x86/kernel/cpu/mcheck/mce.c >> @@ -1338,8 +1338,10 @@ static void mce_timer_delete_all(void) >> { >> int cpu; >> >> + get_online_cpus(); >> for_each_online_cpu(cpu) >> del_timer_sync(&per_cpu(mce_timer, cpu)); >> + put_online_cpus(); >> } >> >> static void mce_do_trigger(struct work_struct *work) >> @@ -2085,7 +2087,9 @@ static void mce_cpu_restart(void *data) >> static void mce_restart(void) >> { >> mce_timer_delete_all(); >> + get_online_cpus(); >> on_each_cpu(mce_cpu_restart, NULL, 1); >> + put_online_cpus(); > > With your patch applied I get on 4.1-rc1+: > > --- > [ 41.364909] kvm: disabling virtualization on CPU1 > [ 41.371083] smpboot: CPU 1 is now offline > [ 41.381190] x86: Booting SMP configuration: > [ 41.385405] smpboot: Booting Node 0 Processor 1 APIC 0x2 > [ 41.402901] kvm: enabling virtualization on CPU1 > [ 41.440944] kvm: disabling virtualization on CPU1 > [ 41.447010] smpboot: CPU 1 is now
offline > [ 41.486082] kvm: disabling virtualization on CPU6 > [ 41.491827] smpboot: CPU 6 is now offline > [ 41.497521] smpboot: Booting Node 0 Processor 6 APIC 0x5 > [ 41.514983] kvm: enabling virtualization on CPU6 > [ 41.561643] kvm: disabling virtualization on CPU6 > [ 41.566848] smpboot: CPU 6 is now offline > [ 41.572606] smpboot: Booting Node 0 Processor 6 APIC 0x5 > [ 41.590049] kvm: enabling virtualization on CPU6 > [ 41.636817] kvm: disabling virtualization on CPU6 > [ 41.642575] smpboot: CPU 6 is now offline > [ 41.676812] kvm: disabling virtualization on CPU7 > [ 41.682429] smpboot: CPU 7 is now offline > [ 41.687974] smpboot: Booting Node 0 Processor 7 APIC 0x7 > [ 41.705416] kvm: enabling virtualization on CPU7 > [ 41.752739] kvm: disabling virtualization on CPU7 > [ 41.758455] smpboot: CPU 7 is now offline > [ 41.764089] smpboot: Booting Node 0 Processor 7 APIC 0x7 > [ 41.781561] kvm: enabling virtualization on CPU7 > [ 41.831610] kvm: disabling virtualization on CPU7 > [ 41.837280] smpboot: CPU 7 is now offline > > [ 41.843341] == > [ 41.849561] [ INFO: possible circular locking dependency detected ] > [ 41.855883] 4.1.0-rc1+ #2 Not tainted > [ 41.859564] --- > [ 41.865871] script2.sh/2071 is trying to acquire lock: > [ 41.871044] (cpu_hotplug.lock){++}, at: [] > get_online_cpus+0x32/0x80 > [ 41.879521] > but task is already holding lock: > [ 41.885392] (s_active#121){.+}, at: [] > kernfs_fop_write+0x6e/0x1a0 > [ 41.893695] > which lock already depends on the new lock. 
> > [ 41.901925] > the existing dependency chain (in reverse order) is: > [ 41.909465] > -> #2 (s_active#121){.+}: > [ 41.913739][] lock_acquire+0xd1/0x2b0 > [ 41.919718][] __kernfs_remove+0x228/0x300 > [ 41.926046][] kernfs_remove_by_name_ns+0x49/0xb0 > [ 41.932976][] sysfs_remove_file_ns+0x15/0x20 > [ 41.939552][] device_remove_file+0x19/0x20 > [ 41.945968][] mce_device_remove+0x54/0xd0 > [ 41.952284][] mce_cpu_callback+0x69/0x120 > [ 41.958608][] notifier_call_chain+0x66/0x90 > [ 41.965124][] __raw_notifier_call_chain+0xe/0x10 > [ 41.972053][] cpu_notify+0x23/0x50 > [ 41.977761][] cpu_notify_nofail+0xe/0x20 > [ 41.983986][] _cpu_down+0x1b6/0x2d0 > [ 41.989787][] cpu_down+0x36/0x50 > [ 41.995324][] cpu_subsys_offline+0x14/0x20 > [ 42.001734][] device_offline+0x95/0xc0 > [ 42.007797][] online_store+0x3d/0x90 > [ 42.013673][] dev_attr_store+0x18/0x30 > [ 42.019735][] sysfs_kf_write+0x49/0x60 > [ 42.025796][] kernfs_fop_write+0x140/0x1a0 > [ 42.032211][] __vfs_write+0x28/0xf0 > [ 42.038013]
Re: [PATCH] Timer: fix a race condition between init_timers_cpu() and get_next_timer_interrupt()
This patch works with 4.0.1, but doesn't work with 4.1-rc1+ Thanks, Ethan On Wed, Apr 29, 2015 at 10:58 PM, Ethan Zhao wrote: > while testing CPU hotplug and MCE with following two scripts, > > script 1: > > for i in {1..30}; do while :; do ((a=$RANDOM%160)); echo 0 >> > /sys/devices/system/cpu/cpu${i}/online; echo 1 >> > /sys/devices/system/cpu/cpu${i}/online; done & done > > script 2: > > while :; do for i in $(ls > /sys/devices/system/machinecheck/machinecheck*/check_interval); do echo 1 >> > $i; done; done > > We got this panic call trace: > > sh> bt > PID: 0 TASK: 881028e28080 CPU: 14 COMMAND: "swapper/14" > #0 [881028e2ba90] machine_kexec at 810402aa > #1 [881028e2baf8] crash_kexec at 810c4ea4 > #2 [881028e2bbc0] oops_end at 81575c50 > #3 [881028e2bbe8] no_context at 8156a8f9 > #4 [881028e2bc30] __bad_area_nosemaphore at 8156a979 > #5 [881028e2bc78] bad_area_nosemaphore at 8156aae3 > #6 [881028e2bc88] __do_page_fault at 8157852d > #7 [881028e2bd80] do_page_fault at 8157896e > #8 [881028e2bd90] page_fault at 815750d8 > [exception RIP: get_next_timer_interrupt+344] > RIP: 8106c228 RSP: 881028e2be48 RFLAGS: 00010013 > RAX: RBX: 0001006d7a29 RCX: 0001006d7dee > RDX: 0001 RSI: 88602978d3f8 RDI: 88602978d028 > RBP: 881028e2be90 R8: 003d R9: 003b > R10: 01006d7b R11: 881028e2be50 R12: 0001006d7dee > R13: 0001406d7a28 R14: 88602978c000 R15: 881028e2be68 > ORIG_RAX: CS: 0010 SS: 0018 > #9 [881028e2be40] get_next_timer_interrupt at 8106c120 > > This panic (NULL pointer dereference) was caused by a race condition between > init_timers_cpu() and get_next_timer_interrupt(): init_timers_cpu() does its > initialization without holding any lock. > > The two threads that cause the race condition are shown below: > > Thread A: > store_online() > cpu_up() > __cpu_notify(CPU_UP_PREPARE...) > timer_cpu_notify() > CPU_UP_PREPARE: init_timers_cpu() > { > ... > INIT_LIST_HEAD(base->tv5.vec + j); > ... 
> } > Thread B: > tick_nohz_idle_enter() > __tick_nohz_idle_enter() > get_next_timer_interrupt() >__next_timer_interrupt() >{ >... >list_for_each_entry(nte, varp->vec + slot, entry) { >if (tbase_get_deferrable(nte->base)) >... >} > > This bug affects stable branches 4.0, 3.8 and 3.19; I didn't check other > branches. > The patch has been tested and verified on stable 4.0. > > Reported-by: Tim Uglow > Signed-off-by: Ethan Zhao > --- > kernel/time/timer.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > index 2d3f5c5..cc1cf35 100644 > --- a/kernel/time/timer.c > +++ b/kernel/time/timer.c > @@ -1573,6 +1573,9 @@ static int init_timers_cpu(int cpu) > base = per_cpu(tvec_bases, cpu); > } > > + if (unlikely(!base)) > + return -EINVAL; > + spin_lock_irq(&base->lock); > > for (j = 0; j < TVN_SIZE; j++) { > INIT_LIST_HEAD(base->tv5.vec + j); > @@ -1587,6 +1590,7 @@ static int init_timers_cpu(int cpu) > base->next_timer = base->timer_jiffies; > base->active_timers = 0; > base->all_timers = 0; > + spin_unlock_irq(&base->lock); > return 0; > } > > -- > 1.8.3.1 >
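The idea behind the patch is that the code resetting the list heads must hold the same lock the walker holds. As far as I can tell, get_next_timer_interrupt() takes base->lock around __next_timer_interrupt(). The pattern can be shown with a user-space model. This is a hedged pthread sketch, not kernel code: struct sketch_base, sketch_init_base() and sketch_count_slot() are hypothetical, a mutex stands in for base->lock, and it says nothing about whether this is the right fix for the kernel (Ethan reports that the patch does not help on 4.1-rc1+).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_SLOTS 64

/* Minimal circular doubly linked list node, in the style of list_head. */
struct sketch_node { struct sketch_node *next, *prev; };

/* Hypothetical per-CPU timer base: a lock plus a vector of list heads. */
struct sketch_base {
    pthread_mutex_t lock;
    struct sketch_node vec[SKETCH_SLOTS];
};

/* Writer side (like init_timers_cpu() after the patch): reset every head
 * while holding the lock, so no reader can observe a half-reset vector. */
void sketch_init_base(struct sketch_base *b)
{
    pthread_mutex_lock(&b->lock);
    for (size_t j = 0; j < SKETCH_SLOTS; j++) {
        b->vec[j].next = &b->vec[j];
        b->vec[j].prev = &b->vec[j];
    }
    pthread_mutex_unlock(&b->lock);
}

/* Reader side (like __next_timer_interrupt() under base->lock): walk one
 * slot; because it holds the same lock, it sees either the old lists or
 * fully initialized empty ones. */
size_t sketch_count_slot(struct sketch_base *b, size_t slot)
{
    size_t n = 0;

    pthread_mutex_lock(&b->lock);
    for (struct sketch_node *p = b->vec[slot].next; p != &b->vec[slot]; p = p->next)
        n++;
    pthread_mutex_unlock(&b->lock);
    return n;
}
```

The lock only helps if both sides take it. A reader that walks the lists without the lock would still race with the reset, which is why the sketch puts the lock on both paths.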
Re: [PATCH] x86/mce: fix mce_restart() race with CPU hotplug operation
On Fri, May 1, 2015 at 12:29 AM, Borislav Petkov wrote: > On Thu, Apr 30, 2015 at 12:04:53AM +0900, Ethan Zhao wrote: >> while testing CPU hotplug and MCE with following two scripts, >> >> script 1: >> >> for i in {1..30}; do while :; do ((a=$RANDOM%160)); echo 0 >> >> /sys/devices/system/cpu/cpu${i}/online; echo 1 >> >> /sys/devices/system/cpu/cpu${i}/online; done & done >> >> script 2: >> >> while :; do for i in $(ls >> /sys/devices/system/machinecheck/machinecheck*/check_interval); do echo 1 >> >> >> $i; done; done > > For the record, it is a public secret that CPU hotplug is broken. IOW, > you're wasting your time with those senseless pounder tests but ok. > > ... > >> --- >> arch/x86/kernel/cpu/mcheck/mce.c | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c >> b/arch/x86/kernel/cpu/mcheck/mce.c >> index 3c036cb..fcc2794 100644 >> --- a/arch/x86/kernel/cpu/mcheck/mce.c >> +++ b/arch/x86/kernel/cpu/mcheck/mce.c >> @@ -1338,8 +1338,10 @@ static void mce_timer_delete_all(void) >> { >> int cpu; >> >> + get_online_cpus(); >> for_each_online_cpu(cpu) >> del_timer_sync(&per_cpu(mce_timer, cpu)); >> + put_online_cpus(); >> } >> >> static void mce_do_trigger(struct work_struct *work) >> @@ -2085,7 +2087,9 @@ static void mce_cpu_restart(void *data) >> static void mce_restart(void) >> { >> mce_timer_delete_all(); >> + get_online_cpus(); >> on_each_cpu(mce_cpu_restart, NULL, 1); >> + put_online_cpus(); > > With your patch applied I get on 4.1-rc1+: I didn't test it with 4.1-rc1+ yet. Let's check it.
Thanks, Ethan > > --- > [ 41.364909] kvm: disabling virtualization on CPU1 > [ 41.371083] smpboot: CPU 1 is now offline > [ 41.381190] x86: Booting SMP configuration: > [ 41.385405] smpboot: Booting Node 0 Processor 1 APIC 0x2 > [ 41.402901] kvm: enabling virtualization on CPU1 > [ 41.440944] kvm: disabling virtualization on CPU1 > [ 41.447010] smpboot: CPU 1 is now offline > [ 41.486082] kvm: disabling virtualization on CPU6 > [ 41.491827] smpboot: CPU 6 is now offline > [ 41.497521] smpboot: Booting Node 0 Processor 6 APIC 0x5 > [ 41.514983] kvm: enabling virtualization on CPU6 > [ 41.561643] kvm: disabling virtualization on CPU6 > [ 41.566848] smpboot: CPU 6 is now offline > [ 41.572606] smpboot: Booting Node 0 Processor 6 APIC 0x5 > [ 41.590049] kvm: enabling virtualization on CPU6 > [ 41.636817] kvm: disabling virtualization on CPU6 > [ 41.642575] smpboot: CPU 6 is now offline > [ 41.676812] kvm: disabling virtualization on CPU7 > [ 41.682429] smpboot: CPU 7 is now offline > [ 41.687974] smpboot: Booting Node 0 Processor 7 APIC 0x7 > [ 41.705416] kvm: enabling virtualization on CPU7 > [ 41.752739] kvm: disabling virtualization on CPU7 > [ 41.758455] smpboot: CPU 7 is now offline > [ 41.764089] smpboot: Booting Node 0 Processor 7 APIC 0x7 > [ 41.781561] kvm: enabling virtualization on CPU7 > [ 41.831610] kvm: disabling virtualization on CPU7 > [ 41.837280] smpboot: CPU 7 is now offline > > [ 41.843341] == > [ 41.849561] [ INFO: possible circular locking dependency detected ] > [ 41.855883] 4.1.0-rc1+ #2 Not tainted > [ 41.859564] --- > [ 41.865871] script2.sh/2071 is trying to acquire lock: > [ 41.871044] (cpu_hotplug.lock){++}, at: [] > get_online_cpus+0x32/0x80 > [ 41.879521] > but task is already holding lock: > [ 41.885392] (s_active#121){.+}, at: [] > kernfs_fop_write+0x6e/0x1a0 > [ 41.893695] > which lock already depends on the new lock. 
> > [ 41.901925] > the existing dependency chain (in reverse order) is: > [ 41.909465] > -> #2 (s_active#121){.+}: > [ 41.913739][] lock_acquire+0xd1/0x2b0 > [ 41.919718][] __kernfs_remove+0x228/0x300 > [ 41.926046][] kernfs_remove_by_name_ns+0x49/0xb0 > [ 41.932976][] sysfs_remove_file_ns+0x15/0x20 > [ 41.939552][] device_remove_file+0x19/0x20 > [ 41.945968][] mce_device_remove+0x54/0xd0 > [ 41.952284][] mce_cpu_callback+0x69/0x120 > [ 41.958608][] notifier_call_chain+0x66/0x90 > [ 41.965124][] __raw_notifier_call_chain+0xe/0x10 > [ 41.972053][] cpu_notify+0x23/0x50 > [ 41.977761][] cpu_notify_nofail+0xe/0x20 > [ 41.983986][] _cpu_down+0x1b6/0x2d0 > [ 41.989787][] cpu_down+0x36/0x50 > [ 41.995324][] cpu_subsys_offline+0x14/0x20 > [ 42.001734][] device_offline+0x95/0xc0 > [ 42.007797][] online_store+0x3d/0x90 > [ 42.013673][] dev_attr_store+0x18/0x30 > [ 42.019735][] sysfs_kf_write+0x49/0x60 > [ 42.025796][] kernfs_fop_write+0x140/0x1a0 > [ 42.032211][] __vfs_write+0x28/0xf0 > [ 42.038013][] vfs_write+0xa9/0x1b0 > [ 42.043715][] SyS_write+0x49/0
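The report above is lockdep flagging an inverted lock order: the CPU-offline path has already established cpu_hotplug.lock before s_active (the chain is truncated in this log, so any intermediate lock is not visible), and the patched sysfs write path to check_interval now takes cpu_hotplug.lock through get_online_cpus() while holding s_active. The toy dependency tracker below is purely illustrative and is not how lockdep is implemented: it records "acquired B while holding A" edges and refuses any edge that would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for this sketch (names borrowed from the report). */
enum lock_class {
	LOCK_CPU_HOTPLUG, /* cpu_hotplug.lock, taken by get_online_cpus() */
	LOCK_S_ACTIVE,    /* kernfs s_active of a sysfs attribute */
	NR_LOCK_CLASSES
};

/* dep[a][b] becomes true once b has been acquired while a was held. */
static bool dep[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NR_LOCK_CLASSES; n++)
		if (dep[from][n] && !seen[n] && reachable(n, to, seen))
			return true;
	return false;
}

/* Record "acquire 'next' while holding 'held'". Returns false, without
 * recording the edge, if it would close a cycle (a possible deadlock). */
static bool record_acquire(enum lock_class held, enum lock_class next)
{
	bool seen[NR_LOCK_CLASSES] = { false };

	if (reachable(next, held, seen))
		return false;
	dep[held][next] = true;
	return true;
}
```

Collapsing the real chain into a single direct edge is a simplification; the point is only that the second acquisition order contradicts the first.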
Regression: Disk corruption with dm-crypt and kernels >= 4.0
I made sure to run a completely vanilla kernel when testing why I was suddenly seeing some nasty libata errors with all kernels >= v4.0. Here's a snippet:

>8
[ 165.592136] ata5.00: exception Emask 0x60 SAct 0x7000 SErr 0x800 action 0x6 frozen
[ 165.592140] ata5.00: irq_stat 0x2000, host bus error
[ 165.592143] ata5: SError: { HostInt }
[ 165.592145] ata5.00: failed command: READ FPDMA QUEUED
[ 165.592149] ata5.00: cmd 60/08:60:a0:0d:89/00:00:07:00:00/40 tag 12 ncq 4096 in res 40/00:74:40:58:5d/00:00:00:00:00/40 Emask 0x60 (host bus error)
[ 165.592151] ata5.00: status: { DRDY }
>8

After a few dozen of these errors, I'd suddenly find my system in read-only mode with corrupted files throughout my encrypted filesystems (seemed like either a read or a write would corrupt a file, though I could be mistaken). I decided to do a git bisect with a random read-write-sync test to narrow down the culprit, which turned out to be this commit (part of a series):

# first bad commit: [cf2f1abfbd0dba701f7f16ef619e4d2485de3366] dm crypt: don't allocate pages for a partial request

Just to be sure, I created a patch to revert the entire nine patch series that commit belonged to... and the bad behavior disappeared. I've now been running kernel 4.0 for a few days without issue, and went so far as to stress test my poor SSD for a few hours to be 100% positive.

Here's some more info on my setup.

>8
$ lsblk -f
NAME FSTYPE LABEL MOUNTPOINT
sda
├─sda1 vfat /boot/EFI
├─sda2 ext4 /boot
└─sda3 LVM2_member
  ├─SSD-root crypto_LUKS
  │ └─root f2fs /
  └─SSD-home crypto_LUKS
    └─home f2fs /home

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-linux-memnix cryptdevice=/dev/SSD/root:root:allow-discards root=/dev/mapper/root acpi_osi=Linux security=tomoyo TOMOYO_trigger=/usr/lib/systemd/systemd intel_iommu=on modprobe.blacklist=nouveau rw quiet

$ cat /etc/lvm/lvm.conf | grep "issue_discards"
issue_discards = 1
>8

If there's anything else I can do to help diagnose the underlying problem, I'm more than willing.
Thanks, Abelardo Ricart.
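The reporter's actual bisect test is not shown. As a hypothetical stand-in, the sketch below writes pseudo-random blocks, calls fsync(), asks the kernel to drop the now-clean pages with posix_fadvise(POSIX_FADV_DONTNEED), and reads everything back for comparison. The path, block size and seed are arbitrary choices; posix_fadvise() is only advisory, so a stricter test would drop caches system-wide or use O_DIRECT, and a zero return only means this particular run saw no mismatch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* xorshift32: lets the expected contents of block i be regenerated from its seed. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *state = x;
}

static void fill_block(unsigned char *buf, size_t len, uint32_t seed)
{
	uint32_t s = seed ? seed : 1; /* xorshift must not start from 0 */

	for (size_t i = 0; i < len; i++)
		buf[i] = (unsigned char)xorshift32(&s);
}

/* Write nblocks pseudo-random blocks, fsync, try to evict them from the page
 * cache, then read them back. Returns the number of mismatching blocks, or
 * -1 on an allocation or I/O error. */
static int rw_sync_check(const char *path, int nblocks, size_t bs, uint32_t seed)
{
	unsigned char *want = malloc(bs), *got = malloc(bs);
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	int bad = 0;

	if (!want || !got || fd < 0) {
		bad = -1;
		goto out;
	}
	for (int i = 0; i < nblocks; i++) {
		fill_block(want, bs, seed + (uint32_t)i);
		if (pwrite(fd, want, bs, (off_t)i * (off_t)bs) != (ssize_t)bs) {
			bad = -1;
			goto out;
		}
	}
	if (fsync(fd) != 0) {
		bad = -1;
		goto out;
	}
	/* Advisory: the clean pages may be dropped so reads go to the device. */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	for (int i = 0; i < nblocks; i++) {
		fill_block(want, bs, seed + (uint32_t)i);
		if (pread(fd, got, bs, (off_t)i * (off_t)bs) != (ssize_t)bs) {
			bad = -1;
			goto out;
		}
		if (memcmp(want, got, bs) != 0)
			bad++;
	}
out:
	if (fd >= 0)
		close(fd);
	free(want);
	free(got);
	return bad;
}
```

In a bisect loop, a script would run such a check on the dm-crypt filesystem many times and mark the commit bad on any non-zero result.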
[PATCH V2 1/2] mm/thp: Use new functions to clear pmd on splitting and collapse
Some architectures may require an explicit IPI before a THP PMD split or collapse. This enables us to use local_irq_disable() to prevent a parallel THP PMD split or collapse. Signed-off-by: Aneesh Kumar K.V --- include/asm-generic/pgtable.h | 32 mm/huge_memory.c | 9 + 2 files changed, 37 insertions(+), 4 deletions(-) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index fe617b7e4be6..e95c697bef25 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -184,6 +184,38 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif +#ifndef __HAVE_ARCH_PMDP_SPLITTING_FLUSH_NOTIFY +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#define pmdp_splitting_flush_notify pmdp_clear_flush_notify +#else +static inline void pmdp_splitting_flush_notify(struct vm_area_struct *vma, + unsigned long address, + pmd_t *pmdp) +{ + BUILD_BUG(); +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#endif + +#ifndef __HAVE_ARCH_PMDP_COLLAPSE_FLUSH +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, + unsigned long address, + pmd_t *pmdp) +{ + return pmdp_clear_flush(vma, address, pmdp); +} +#else +static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, + unsigned long address, + pmd_t *pmdp) +{ + BUILD_BUG(); + return __pmd(0); +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#endif + #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, pgtable_t pgtable); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index cce4604c192f..30c1b46fcf6d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2187,7 +2187,7 @@ static void collapse_huge_page(struct mm_struct *mm, * huge and small TLB entries for the same virtual address * to avoid the risk of CPU bugs in that area.
*/ - _pmd = pmdp_clear_flush(vma, address, pmd); + _pmd = pmdp_collapse_flush(vma, address, pmd); spin_unlock(pmd_ptl); mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); @@ -2606,9 +2606,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, write = pmd_write(*pmd); young = pmd_young(*pmd); - - /* leave pmd empty until pte is filled */ - pmdp_clear_flush_notify(vma, haddr, pmd); + /* +* leave pmd empty until pte is filled. +*/ + pmdp_splitting_flush_notify(vma, haddr, pmd); pgtable = pgtable_trans_huge_withdraw(mm, pmd); pmd_populate(mm, &_pmd, pgtable); -- 2.1.4
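The header changes follow the usual asm-generic override pattern: an architecture that needs special behaviour (here, an explicit IPI) defines a __HAVE_ARCH_* macro together with its own function, and the generic header supplies a default only when that macro is absent. Below is a compilable sketch of the pattern with hypothetical names; the SKETCH_ prefix avoids the reserved double-underscore namespace, and a counter stands in for the IPI.

```c
#include <assert.h>

/* Hypothetical stand-in for pmd_t: one word of page-table bits. */
typedef struct {
	unsigned long val;
} sketch_pmd_t;

/* Generic helper: clear the entry and hand back its old value. */
static sketch_pmd_t sketch_pmdp_clear_flush(sketch_pmd_t *pmdp)
{
	sketch_pmd_t old = *pmdp;

	pmdp->val = 0;
	return old;
}

/* --- "arch" header: this architecture overrides only the collapse hook. --- */
#define SKETCH_HAVE_ARCH_PMDP_COLLAPSE_FLUSH

static int sketch_ipis_sent; /* stand-in for the extra IPI the changelog mentions */

static sketch_pmd_t sketch_pmdp_collapse_flush(sketch_pmd_t *pmdp)
{
	sketch_pmd_t old = sketch_pmdp_clear_flush(pmdp);

	sketch_ipis_sent++;
	return old;
}

/* --- "generic" header: defaults used only when the arch did not override. --- */
#ifndef SKETCH_HAVE_ARCH_PMDP_SPLITTING_FLUSH
#define sketch_pmdp_splitting_flush sketch_pmdp_clear_flush
#endif

#ifndef SKETCH_HAVE_ARCH_PMDP_COLLAPSE_FLUSH
static inline sketch_pmd_t sketch_pmdp_collapse_flush(sketch_pmd_t *pmdp)
{
	return sketch_pmdp_clear_flush(pmdp);
}
#endif
```

Because the generic definitions sit behind #ifndef, an architecture that does not define the macro silently gets the old pmdp_clear_flush()-style behaviour, which is why the generic fallbacks in the patch simply forward to the existing helpers.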
[PATCH V2 2/2] powerpc/thp: Remove _PAGE_SPLITTING and related code
With the new thp refcounting we don't need to mark the PMD splitting. Drop the code to handle this. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/kvm_book3s_64.h | 6 -- arch/powerpc/include/asm/pgtable-ppc64.h | 29 ++-- arch/powerpc/mm/hugepage-hash64.c| 3 - arch/powerpc/mm/hugetlbpage.c| 2 +- arch/powerpc/mm/pgtable_64.c | 111 --- mm/gup.c | 2 +- 6 files changed, 52 insertions(+), 101 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 2d81e202bdcc..9a96fe3caa48 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -298,12 +298,6 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing, cpu_relax(); continue; } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - /* If hugepage and is trans splitting return None */ - if (unlikely(hugepage && -pmd_trans_splitting(pte_pmd(old_pte - return __pte(0); -#endif /* If pte is not present return None */ if (unlikely(!(old_pte & _PAGE_PRESENT))) return __pte(0); diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h index 843cb35e6add..655dde8e9683 100644 --- a/arch/powerpc/include/asm/pgtable-ppc64.h +++ b/arch/powerpc/include/asm/pgtable-ppc64.h @@ -361,11 +361,6 @@ void pgtable_cache_init(void); #endif /* __ASSEMBLY__ */ /* - * THP pages can't be special. So use the _PAGE_SPECIAL - */ -#define _PAGE_SPLITTING _PAGE_SPECIAL - -/* * We need to differentiate between explicit huge page and THP huge * page, since THP huge page also need to track real subpage details */ @@ -375,8 +370,7 @@ void pgtable_cache_init(void); * set of bits not changed in pmd_modify. 
*/ #define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | \ -_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_SPLITTING | \ -_PAGE_THP_HUGE) +_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_THP_HUGE) #ifndef __ASSEMBLY__ /* @@ -458,13 +452,6 @@ static inline int pmd_trans_huge(pmd_t pmd) return (pmd_val(pmd) & 0x3) && (pmd_val(pmd) & _PAGE_THP_HUGE); } -static inline int pmd_trans_splitting(pmd_t pmd) -{ - if (pmd_trans_huge(pmd)) - return pmd_val(pmd) & _PAGE_SPLITTING; - return 0; -} - extern int has_transparent_hugepage(void); #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ @@ -517,12 +504,6 @@ static inline pmd_t pmd_mknotpresent(pmd_t pmd) return pmd; } -static inline pmd_t pmd_mksplitting(pmd_t pmd) -{ - pmd_val(pmd) |= _PAGE_SPLITTING; - return pmd; -} - #define __HAVE_ARCH_PMD_SAME static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) { @@ -577,8 +558,12 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, pmd_hugepage_update(mm, addr, pmdp, _PAGE_RW, 0); } -#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH -extern void pmdp_splitting_flush(struct vm_area_struct *vma, +#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH_NOTIFY +extern void pmdp_splitting_flush_notify(struct vm_area_struct *vma, + unsigned long address, pmd_t *pmdp); + +#define __HAVE_ARCH_PMDP_COLLAPSE_FLUSH +extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp); #define __HAVE_ARCH_PGTABLE_DEPOSIT diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c index 86686514ae13..078f7207afd2 100644 --- a/arch/powerpc/mm/hugepage-hash64.c +++ b/arch/powerpc/mm/hugepage-hash64.c @@ -39,9 +39,6 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid, /* If PMD busy, retry the access */ if (unlikely(old_pmd & _PAGE_BUSY)) return 0; - /* If PMD is trans splitting retry the access */ - if (unlikely(old_pmd & _PAGE_SPLITTING)) - return 0; /* If PMD permissions don't match, take page fault */ if (unlikely(access & ~old_pmd)) 
return 1; diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index f30ae0f7f570..dfd7db0cfbee 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -1008,7 +1008,7 @@ pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift * hpte invalidate * */ - if (pmd_none(pmd) || pmd_trans_splitting(pmd)) + if (pmd_none(pmd)) return NULL;
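_HPAGE_CHG_MASK lists the bits that pmd_modify() carries over when the protection of a huge PMD changes. The sketch below assumes pmd_modify() keeps the masked bits and ORs in the new protection bits; all bit values are invented for illustration, and the real ppc64 mask also contains _PAGE_HPTEFLAGS, which is omitted here. With _PAGE_SPLITTING gone, the remaining mask is the page number plus the dirty, accessed and THP-huge bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments; the real ppc64 values differ. */
#define SK_PAGE_PRESENT  0x001UL
#define SK_PAGE_RW       0x002UL
#define SK_PAGE_DIRTY    0x004UL
#define SK_PAGE_ACCESSED 0x008UL
#define SK_PAGE_SPECIAL  0x010UL       /* the bit _PAGE_SPLITTING used to alias */
#define SK_PAGE_THP_HUGE 0x020UL
#define SK_RPN_MASK      0xfffff000UL  /* page-number bits */

/* Bits preserved across a protection change, after the patch. */
#define SK_HPAGE_CHG_MASK \
	(SK_RPN_MASK | SK_PAGE_DIRTY | SK_PAGE_ACCESSED | SK_PAGE_THP_HUGE)

/* pmd_modify-style update: keep the masked bits, replace everything else. */
static uint64_t sketch_pmd_modify(uint64_t pmd, uint64_t newprot)
{
	return (pmd & SK_HPAGE_CHG_MASK) | newprot;
}
```

Any bit outside the mask, such as RW or SPECIAL here, is dropped and must come from newprot.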
[PATCH V2 0/2] Remove _PAGE_SPLITTING from ppc64
The changes are on top of what is posted at http://mid.gmane.org/1429823043-157133-1-git-send-email-kirill.shute...@linux.intel.com git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git thp/refcounting/v5 Changes from V1: * Fold part of patch 3 to 1 and 2 * Drop patch 3. * Make generic version of pmdp_splitting_flush_notify inline. Aneesh Kumar K.V (2): mm/thp: Use new functions to clear pmd on splitting and collapse powerpc/thp: Remove _PAGE_SPLITTING and related code arch/powerpc/include/asm/kvm_book3s_64.h | 6 -- arch/powerpc/include/asm/pgtable-ppc64.h | 29 ++-- arch/powerpc/mm/hugepage-hash64.c| 3 - arch/powerpc/mm/hugetlbpage.c| 2 +- arch/powerpc/mm/pgtable_64.c | 111 --- include/asm-generic/pgtable.h| 32 + mm/gup.c | 2 +- mm/huge_memory.c | 9 +-- 8 files changed, 89 insertions(+), 105 deletions(-) -- 2.1.4
Re: [PATCH v6 4/4] clk: dt: Introduce binding for always-on clock support
On Thu, Apr 30, 2015 at 10:57:22AM +0100, Lee Jones wrote: > On Wed, 29 Apr 2015, Maxime Ripard wrote: > > > On Wed, Apr 29, 2015 at 03:17:51PM +0100, Lee Jones wrote: > > > On Wed, 22 Apr 2015, Maxime Ripard wrote: > > > > > > > On Wed, Apr 08, 2015 at 06:23:44PM +0100, Lee Jones wrote: > > > > > On Wed, 08 Apr 2015, Maxime Ripard wrote: > > > > > > > > > > > On Wed, Apr 08, 2015 at 11:38:32AM +0100, Lee Jones wrote: > > > > > > > On Wed, 08 Apr 2015, Maxime Ripard wrote: > > > > > > > > > > > > > > > On Wed, Apr 08, 2015 at 09:14:50AM +0100, Lee Jones wrote: > > > > > > > > > > > + > > > > > > > > > > > + This property is not to be abused. It is > > > > > > > > > > > only to be used to > > > > > > > > > > > + protect platforms from being crippled by > > > > > > > > > > > gated clocks, not > > > > > > > > > > > + as a convenience function to avoid using > > > > > > > > > > > the framework > > > > > > > > > > > + correctly inside device drivers. > > > > > > > > > > > > > > > > > > > > Disregarding what's stated here, I'm pretty sure that this > > > > > > > > > > will > > > > > > > > > > actually happen. Where do you place the cursor? > > > > > > > > > > > > > > > > > > That's up to Mike. > > > > > > > > > > > > > > > > Except that Mike won't review any of the DT changes, so he > > > > > > > > won't be > > > > > > > > able to refrain users from using it. Let alone out-of-tree DTs > > > > > > > > using a > > > > > > > > mainline kernel. > > > > > > > > > > > > > > Ideally Mike should be Cc'ed on patches using clock bindings, but > > > > > > > if > > > > > > > he isn't the DT guys are smart enough to either make the right > > > > > > > decisions themselves (Rob has Acked these bindings already, so > > > > > > > will be > > > > > > > on the lookout for misuse, I'm sure), or ask for Mike's help. 
> > > > > > > > > > > > Yeah, right, as if this strategy really worked in the past > > > > > > > > > > > > Do we really want to look at even the DT bindings that have actually > > > > > > been reviewed by maintainers that got merged? > > > > > > > > > > > > They don't have time for that, which is totally fine, but we really > > > > > > should bury our head in the sand by actually thinking they will > > > > > > review > > > > > > every single DT-related patch. > > > > > > > > > > > > Using that as an argument is just plain denial of what really > > > > > > happened > > > > > > for the past 4 years. > > > > > > > > > > I agree that it's a problem, but this is a process problem and has > > > > > nothing to do with this set. If you have a problem with the current > > > > > process and have a better alternative, submit your thoughts to the DT > > > > > list. Rejecting all new bindings because you are frightened that they > > > > > will be used in a manner that they were not intended is not the way to > > > > > go though. > > > > > > > > I'm not saying that this binding should not go in because of a process > > > > issue. > > > > > > > > I'm saying that discarding arguments against your binding by adding > > > > restrictions that cannot be enforced is not reasonable. > > > > > > I'm open to constructive suggestions/alternatives. > > > > > > Hand rolling this stuff in C per vendor is not of of them. > > > > I'm sorry, but ruling out alternatives that work for everyone (and > > actually work better) just because you don't want to edit a C file is > > not really constructive either. > > > > > > > > > > > > Should we create a new driver for our RAM controller, or do > > > > > > > > > > we want to > > > > > > > > > > use clock-always-on? > > > > > > > > > > > > > > > > > > I would say that if all the driver did was to enable clocks, > > > > > > > > > then you > > > > > > > > > should use this instead. 
This binding was designed > > > > > > > > > specifically for > > > > > > > > > that purpose. > > > > > > > > > > > > > > > > > > However, if the aforementioned driver clock can be safely > > > > > > > > > gated, then > > > > > > > > > it should not be an always-on clock. > > > > > > > > > > > > > > > > Yeah, of course, I understand the original intent of it, but > > > > > > > > that > > > > > > > > argument, which might very well be true at one point in time, > > > > > > > > might > > > > > > > > not be true anymore two or three releases later. > > > > > > > > > > > > > > Why? The H/W isn't going to change in two or three releases. The > > > > > > > clocks designated as 'always-on' will have to be on forever, or > > > > > > > synonymously, 'always'. > > > > > > > > > > > > > > > And that driver might actually rely on the fact that the clock > > > > > > > > is shut > > > > > > > > down, which won't be the case. > > > > > > > > > > > > > > I think you are missing the point of this binding. The driver can > > > > > > > never rely on that in this use-case. If the clock is off, there > > > > > > > is no > > > > > > > device driver, period
Re: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2
On Fri, May 01, 2015 at 02:35:23PM +1000, Alexey Kardashevskiy wrote: > On 04/30/2015 04:55 PM, David Gibson wrote: > >On Sat, Apr 25, 2015 at 10:14:53PM +1000, Alexey Kardashevskiy wrote: > >>The existing implementation accounts the whole DMA window in > >>the locked_vm counter. This is going to be worse with multiple > >>containers and huge DMA windows. Also, real-time accounting would require > >>additional tracking of accounted pages due to the page size difference - > >>IOMMU uses 4K pages and system uses 4K or 64K pages. > >> > >>Another issue is that actual pages pinning/unpinning happens on every > >>DMA map/unmap request. This does not affect the performance much now as > >>we spend way too much time now on switching context between > >>guest/userspace/host but this will start to matter when we add in-kernel > >>DMA map/unmap acceleration. > >> > >>This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU. > >>New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces > >>2 new ioctls to register/unregister DMA memory - > >>VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY - > >>which receive user space address and size of a memory region which > >>needs to be pinned/unpinned and counted in locked_vm. > >>New IOMMU splits physical pages pinning and TCE table update into 2 > >>different > >>operations. It requires 1) guest pages to be registered first 2) consequent > >>map/unmap requests to work only with pre-registered memory. > >>For the default single window case this means that the entire guest > >>(instead of 2GB) needs to be pinned before using VFIO. > >>When a huge DMA window is added, no additional pinning will be > >>required, otherwise it would be guest RAM + 2GB. > >> > >>The new memory registration ioctls are not supported by > >>VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration > >>will require memory to be preregistered in order to work.
> >> > >>The accounting is done per the user process. > >> > >>This advertises v2 SPAPR TCE IOMMU and restricts what the userspace > >>can do with v1 or v2 IOMMUs. > >> > >>Signed-off-by: Alexey Kardashevskiy > >>[aw: for the vfio related changes] > >>Acked-by: Alex Williamson > >>--- > >>Changes: > >>v9: > >>* s/tce_get_hva_cached/tce_iommu_use_page_v2/ > >> > >>v7: > >>* now memory is registered per mm (i.e. process) > >>* moved memory registration code to powerpc/mmu > >>* merged "vfio: powerpc/spapr: Define v2 IOMMU" into this > >>* limited new ioctls to v2 IOMMU > >>* updated doc > >>* unsupported ioctls return -ENOTTY instead of -EPERM > >> > >>v6: > >>* tce_get_hva_cached() returns hva via a pointer > >> > >>v4: > >>* updated docs > >>* s/kzmalloc/vzalloc/ > >>* in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and > >>replaced offset with index > >>* renamed vfio_iommu_type_register_memory to > >>vfio_iommu_spapr_register_memory > >>and removed duplicating vfio_iommu_spapr_register_memory > >>--- > >> Documentation/vfio.txt | 23 > >> drivers/vfio/vfio_iommu_spapr_tce.c | 230 > >> +++- > >> include/uapi/linux/vfio.h | 27 + > >> 3 files changed, 274 insertions(+), 6 deletions(-) > >> > >>diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt > >>index 96978ec..94328c8 100644 > >>--- a/Documentation/vfio.txt > >>+++ b/Documentation/vfio.txt > >>@@ -427,6 +427,29 @@ The code flow from the example above should be > >>slightly changed: > >> > >> > >> > >>+5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/ > >>+VFIO_IOMMU_DISABLE and implements 2 new ioctls: > >>+VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY > >>+(which are unsupported in v1 IOMMU). > > > >A summary of the semantic differences between v1 and v2 would be nice. > >At this point it's not really clear to me if there's a case for > >creating v2, or if this could just be done by adding (optional) > >functionality to v1.
> > v1: memory preregistration is not supported; explicit enable/disable ioctls > are required > > v2: memory preregistration is required; explicit enable/disable are > prohibited (as they are not needed). > > Mixing these in one IOMMU type caused a lot of problems like should I > increment locked_vm by the 32bit window size on enable() or not; what do I > do about pages pinning when map/map (check if it is from registered memory > and do not pin?). > > Having 2 IOMMU models makes everything a lot simpler. Ok. Would it simplify it further if you made v2 only usable on IODA2 hardware? > >>+PPC64 paravirtualized guests generate a lot of map/unmap requests, > >>+and the handling of those includes pinning/unpinning pages and updating > >>+mm::locked_vm counter to make sure we do not exceed the rlimit. > >>+The v2 IOMMU splits accounting and pinning into separate operations: > >>+ > >>+- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY > >>ioctls > >
Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table
On Fri, May 01, 2015 at 02:10:58PM +1000, Alexey Kardashevskiy wrote: > On 04/29/2015 04:40 PM, David Gibson wrote: > >On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote: > >>This adds a way for the IOMMU user to know how much a new table will > >>use so it can be accounted in the locked_vm limit before allocation > >>happens. > >> > >>This stores the allocated table size in pnv_pci_create_table() > >>so the locked_vm counter can be updated correctly when a table is > >>being disposed. > >> > >>This defines an iommu_table_group_ops callback to let VFIO know > >>how much memory will be locked if a table is created. > >> > >>Signed-off-by: Alexey Kardashevskiy > >>--- > >>Changes: > >>v9: > >>* reimplemented the whole patch > >>--- > >> arch/powerpc/include/asm/iommu.h | 5 + > >> arch/powerpc/platforms/powernv/pci-ioda.c | 14 > >> arch/powerpc/platforms/powernv/pci.c | 36 > >> +++ > >> arch/powerpc/platforms/powernv/pci.h | 2 ++ > >> 4 files changed, 57 insertions(+) > >> > >>diff --git a/arch/powerpc/include/asm/iommu.h > >>b/arch/powerpc/include/asm/iommu.h > >>index 1472de3..9844c106 100644 > >>--- a/arch/powerpc/include/asm/iommu.h > >>+++ b/arch/powerpc/include/asm/iommu.h > >>@@ -99,6 +99,7 @@ struct iommu_table { > >>unsigned long it_size; /* Size of iommu table in entries */ > >>unsigned long it_indirect_levels; > >>unsigned long it_level_size; > >>+ unsigned long it_allocated_size; > >>unsigned long it_offset;/* Offset into global table */ > >>unsigned long it_base; /* mapped address of tce table */ > >>unsigned long it_index; /* which iommu table this is */ > >>@@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct > >>iommu_table * tbl, > >> struct iommu_table_group; > >> > >> struct iommu_table_group_ops { > >>+ unsigned long (*get_table_size)( > >>+ __u32 page_shift, > >>+ __u64 window_size, > >>+ __u32 levels); > >>long (*create_table)(struct iommu_table_group *table_group, > >>int num, > >>__u32 page_shift, > >>diff 
--git a/arch/powerpc/platforms/powernv/pci-ioda.c > >>b/arch/powerpc/platforms/powernv/pci-ioda.c > >>index e0be556..7f548b4 100644 > >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c > >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c > >>@@ -2062,6 +2062,18 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct > >>pnv_phb *phb, > >> } > >> > >> #ifdef CONFIG_IOMMU_API > >>+static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift, > >>+ __u64 window_size, __u32 levels) > >>+{ > >>+ unsigned long ret = pnv_get_table_size(page_shift, window_size, levels); > >>+ > >>+ if (!ret) > >>+ return ret; > >>+ > >>+ /* Add size of it_userspace */ > >>+ return ret + (window_size >> page_shift) * sizeof(unsigned long); > > > >This doesn't make much sense. The userspace view can't possibly be a > >property of the specific low-level IOMMU model. > > > This it_userspace thing is all about memory preregistration. > > I need some way to track how many actual mappings the > mm_iommu_table_group_mem_t has in order to decide whether to allow > unregistering or not. > > When I clear TCE, I can read the old value which is host physical address > which I cannot use to find the preregistered region and adjust the mappings > counter; I can only use userspace addresses for this (not even guest > physical addresses as it is VFIO and probably no KVM). > > So I have to keep userspace addresses somewhere, one per IOMMU page, and the > iommu_table seems a natural place for this. Well.. sort of. But as noted elsewhere this pulls VFIO-specific constraints into a platform code structure. And whether you get this table depends on the platform IOMMU type rather than on what VFIO wants to do with it, which doesn't make sense. What might make more sense is an opaque pointer in iommu_table for use by the table "owner" (in the take_ownership sense). The pointer would be stored in iommu_table, but VFIO is responsible for populating and managing its contents.
Or you could just put the userspace mappings in the container. Although you might want a different data structure in that case. The other thing to bear in mind is that registered regions are likely to be large contiguous blocks in user addresses, though obviously not contiguous in physical addr. So you might be able to compactify this information by storing it as a list of variable-length blocks in userspace address space, rather than a per-page address. But... isn't there a bigger problem here? As Paulus was pointing out, there's nothing guaranteeing the page tables continue to contain the same page as was there at gup() time. What's going to happen if you REGISTER a memory region, then mremap() over it? Then attempt to PUT_TCE a page in the region? O
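For a sense of scale, the extra it_userspace term in pnv_pci_ioda2_get_table_size() is one unsigned long per IOMMU page, on top of one 64-bit TCE per page. The helpers below are a hypothetical sketch of that arithmetic only; they ignore the indirect-level overhead that pnv_get_table_size() accounts for, whose implementation is not shown in this thread. The last helper sizes the alternative suggested above: one (start, length) record per contiguous userspace block.

```c
#include <assert.h>
#include <stdint.h>

/* Number of TCEs: one per IOMMU page in the DMA window. */
static uint64_t sk_tce_entries(uint32_t page_shift, uint64_t window_size)
{
	return window_size >> page_shift;
}

/* Bytes for a flat, single-level TCE table of 64-bit entries. */
static uint64_t sk_flat_tce_bytes(uint32_t page_shift, uint64_t window_size)
{
	return sk_tce_entries(page_shift, window_size) * sizeof(uint64_t);
}

/* Extra bytes the patch adds for it_userspace: one unsigned long per entry. */
static uint64_t sk_it_userspace_bytes(uint32_t page_shift, uint64_t window_size)
{
	return sk_tce_entries(page_shift, window_size) * sizeof(unsigned long);
}

/* Hypothetical extent record for a contiguous block of userspace memory. */
struct sk_extent {
	uint64_t ua_start;
	uint64_t npages;
};

static uint64_t sk_extent_list_bytes(uint64_t nblocks)
{
	return nblocks * sizeof(struct sk_extent);
}
```

For a 1 GiB window with 64K IOMMU pages that is 16384 entries: 128 KiB of TCEs plus another 128 KiB of it_userspace on a 64-bit kernel, whereas a registration made of a few contiguous blocks costs 16 bytes per block in the extent form.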
Re: mmotm 2015-04-30-15-43 uploaded
On Thu, Apr 30, 2015 at 03:44:10PM -0700, a...@linux-foundation.org wrote: > The mm-of-the-moment snapshot 2015-04-30-15-43 has been uploaded to > >http://www.ozlabs.org/~akpm/mmotm/ > > mmotm-readme.txt says > > README for mm-of-the-moment: > > http://www.ozlabs.org/~akpm/mmotm/ > > This is a snapshot of my -mm patch queue. Uploaded at random hopefully > more than once a week. > My builders report lots of failures: mm/bootmem.c: In function 'free_all_bootmem_core': mm/bootmem.c:237:32: error: 'cur' undeclared (first use in this function) mm/bootmem.c: In function 'mark_bootmem': mm/bootmem.c:380:1: warning: control reaches end of non-void function [-Wreturn-type] Guenter
Re: [1/4] watchdog: MAX63XX_WATCHDOG does not depend on ARM
On Thu, Jan 29, 2015 at 12:15:42PM -0500, Vivien Didelot wrote: > Remove the ARM Kconfig dependency since the Maxim MAX63xx devices are > architecture independent. > > Signed-off-by: Vivien Didelot Reviewed-by: Guenter Roeck
Re: [RFC 1/2] ACPI: activate&export acpi_os_get_physical_address
On Fri, May 01, 2015 at 03:45:52AM +0200, Rafael J. Wysocki wrote: > And I don't really understand the Matthew's comment regarding limiting > operation regions to system memory. This is about a specific operation > region (which BTW only seems to be used as a means to access system memory > at the location pointed to by the arg) in that particular method. My feeling was that it really ought to have been the ACPI code dealing with this in some way, but having looked at it again I accept that this is really something that's limited by the vendor implementation. virt_to_phys() isn't the worst thing to do here. -- Matthew Garrett | mj...@srcf.ucam.org
Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible
On Fri, May 01, 2015 at 10:46:08AM +1000, Benjamin Herrenschmidt wrote: > On Thu, 2015-04-30 at 19:33 +1000, Alexey Kardashevskiy wrote: > > On 04/30/2015 05:22 PM, David Gibson wrote: > > > On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote: > > >> At the moment only one group per container is supported. > > >> POWER8 CPUs have more flexible design and allows naving 2 TCE tables per > > >> IOMMU group so we can relax this limitation and support multiple groups > > >> per container. > > > > > > It's not obvious why allowing multiple TCE tables per PE has any > > > pearing on allowing multiple groups per container. > > > > > > This patchset is a global TCE tables rework (patches 1..30, roughly) with 2 > > outcomes: > > 1. reusing the same IOMMU table for multiple groups - patch 31; > > 2. allowing dynamic create/remove of IOMMU tables - patch 32. > > > > I can remove this one from the patchset and post it separately later but > > since 1..30 aim to support both 1) and 2), I'd think I better keep them all > > together (might explain some of changes I do in 1..30). > > I think you are talking past each other :-) > > But yes, having 2 tables per group is orthogonal to the ability of > having multiple groups per container. > > The latter is made possible on P8 in large part because each PE has its > own DMA address space (unlike P5IOC2 or P7IOC where a single address > space is segmented). > > Also, on P8 you can actually make the TVT entries point to the same > table in memory, thus removing the need to duplicate the actual > tables (though you still have to duplicate the invalidations). I would > however recommend only sharing the table that way within a chip/node. > > .../.. 
> > > >> > > >> -1) Only one IOMMU group per container is supported as an IOMMU group > > >> -represents the minimal entity which isolation can be guaranteed for and > > >> -groups are allocated statically, one per a Partitionable Endpoint (PE) > > >> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per > > >> +container is supported as an IOMMU table is allocated at the boot time, > > >> +one table per a IOMMU group which is a Partitionable Endpoint (PE) > > >> (PE is often a PCI domain but not always). > > > > I thought the more fundamental problem was that different PEs tended > > > to use disjoint bus address ranges, so even by duplicating put_tce > > > across PEs you couldn't have a common address space. > > Yes. This is the problem with P7IOC and earlier. It *could* be doable on > P7IOC by making them the same PE but let's not go there. > > > Sorry, I am not following you here. > > > > By duplicating put_tce, I can have multiple IOMMU groups on the same > > virtual PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple > > groups per container" does this, the address ranges will the same. > > But that is only possible on P8 because only there do we have separate > address spaces between PEs. > > > What I cannot do on p5ioc2 is programming the same table to multiple > > physical PHBs (or I could but it is very different than IODA2 and pretty > > ugly and might not always be possible because I would have to allocate > > these pages from some common pool and face problems like fragmentation). > > And P7IOC has a similar issue. The DMA address top bits indexes the > window on P7IOC within a shared address space. It's possible to > configure a TVT to cover multiple devices but with very serious > limitations. Ok. To check my understanding does this sound reasonable: * The table_group more-or-less represents a PE, but in a way you can reference without first knowing the specific IOMMU hardware type. 
* When attaching multiple groups to the same container, the first PE (i.e. table_group) attached is used as a representative so that subsequent groups can be checked for compatibility with the first PE and therefore all PEs currently included in the container - This is why the table_group appears in some places where it doesn't seem sensible from a pure object ownership point of view -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
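The "first PE as representative" scheme David summarises can be sketched in plain C. This is a hypothetical userspace model, not the vfio_iommu_spapr_tce code: struct pe_model, container_attach() and the two compatibility criteria (supported page sizes and number of DMA windows) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a table_group / PE (not a kernel type). */
struct pe_model {
	unsigned int page_shift_mask; /* assumed: bitmask of supported IOMMU page shifts */
	unsigned int max_windows;     /* assumed: number of DMA windows supported */
};

#define MAX_GROUPS 8

struct container_model {
	const struct pe_model *groups[MAX_GROUPS];
	size_t ngroups;
};

/* The representative is simply the first group attached, if any. */
static const struct pe_model *container_rep(const struct container_model *c)
{
	return c->ngroups ? c->groups[0] : NULL;
}

/* Assumed compatibility rule, for illustration only. */
static bool pe_compatible(const struct pe_model *a, const struct pe_model *b)
{
	return a->page_shift_mask == b->page_shift_mask &&
	       a->max_windows == b->max_windows;
}

/* Returns 0 on success, -1 if the container is full or the PE does not match. */
static int container_attach(struct container_model *c, const struct pe_model *pe)
{
	const struct pe_model *rep = container_rep(c);

	assert(pe != NULL);
	if (c->ngroups == MAX_GROUPS)
		return -1;
	/* Checking against the first PE suffices: every other PE already matched it. */
	if (rep && !pe_compatible(rep, pe))
		return -1;
	c->groups[c->ngroups++] = pe;
	return 0;
}
```

Because compatibility is checked only against the representative, any equivalence-style rule keeps all attached PEs mutually compatible, which is why the table_group ends up referenced from container-level code.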
Re: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table
On Fri, May 01, 2015 at 02:01:17PM +1000, Alexey Kardashevskiy wrote: > On 04/29/2015 04:31 PM, David Gibson wrote: > >On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote: > >>In order to support memory pre-registration, we need a way to track > >>the use of every registered memory region and only allow unregistration > >>if a region is not in use anymore. So we need a way to tell from what > >>region the just cleared TCE was from. > >> > >>This adds a userspace view of the TCE table into iommu_table struct. > >>It contains userspace address, one per TCE entry. The table is only > >>allocated when the ownership over an IOMMU group is taken which means > >>it is only used from outside of the powernv code (such as VFIO). > >> > >>Signed-off-by: Alexey Kardashevskiy > >>--- > >>Changes: > >>v9: > >>* fixed code flow in error cases added in v8 > >> > >>v8: > >>* added ENOMEM on failed vzalloc() > >>--- > >> arch/powerpc/include/asm/iommu.h | 6 ++ > >> arch/powerpc/kernel/iommu.c | 18 ++ > >> arch/powerpc/platforms/powernv/pci-ioda.c | 22 -- > >> 3 files changed, 44 insertions(+), 2 deletions(-) > >> > >>diff --git a/arch/powerpc/include/asm/iommu.h > >>b/arch/powerpc/include/asm/iommu.h > >>index 7694546..1472de3 100644 > >>--- a/arch/powerpc/include/asm/iommu.h > >>+++ b/arch/powerpc/include/asm/iommu.h > >>@@ -111,9 +111,15 @@ struct iommu_table { > >>unsigned long *it_map; /* A simple allocation bitmap for now */ > >>unsigned long it_page_shift;/* table iommu page size */ > >>struct iommu_table_group *it_table_group; > >>+ unsigned long *it_userspace; /* userspace view of the table */ > > > >A single unsigned long doesn't seem like enough. > > Why single? This is an array. As in single per page. > > How do you know > >which process's address space this address refers to? > > It is a current task. Multiple userspaces cannot use the same > container/tables. Where is that enforced? 
More to the point, that's a VFIO constraint, but it's here affecting the design of a structure owned by the platform code. [snip] > >> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb, > >>@@ -2062,12 +2071,21 @@ static long pnv_pci_ioda2_create_table(struct > >>iommu_table_group *table_group, > >>int nid = pe->phb->hose->node; > >>__u64 bus_offset = num ? pe->tce_bypass_base : 0; > >>long ret; > >>+ unsigned long *uas, uas_cb = sizeof(*uas) * (window_size >> page_shift); > >>+ > >>+ uas = vzalloc(uas_cb); > >>+ if (!uas) > >>+ return -ENOMEM; > > > >I don't see why this is allocated both here as well as in > >take_ownership. > > Where else? The only alternative is vfio_iommu_spapr_tce but I really do not > want to touch iommu_table fields there. Well to put it another way, why isn't take_ownership calling create itself (or at least a common helper). Clearly the it_userspace table needs to have lifetime which matches the TCE table itself, so there should be a single function that marks the beginning of that joint lifetime. > >Isn't this function used for core-kernel users of the > >iommu as well, in which case it shouldn't need the it_userspace. > > > No. This is an iommu_table_group_ops callback which calls what the platform > code calls (pnv_pci_create_table()) plus allocates this it_userspace thing. > The callback is only called from VFIO. Ok. As touched on above it seems more like this should be owned by VFIO code than the platform code.
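The uas_cb arithmetic in the quoted hunk, and David's point that the userspace view should share one lifetime with the TCE table, can be illustrated with a small userspace model. This is a sketch under assumptions: struct tce_table_model, tce_table_create() and tce_table_destroy() are hypothetical names, and calloc() stands in for vzalloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of an iommu_table plus its it_userspace view. */
struct tce_table_model {
	uint64_t *tces;           /* stands in for the hardware TCE table */
	unsigned long *userspace; /* stands in for it_userspace: one address per TCE */
	unsigned long entries;
};

/* Same arithmetic as uas_cb: one unsigned long per IOMMU page in the window. */
static size_t userspace_view_bytes(uint64_t window_size, uint32_t page_shift)
{
	assert(page_shift < 64);
	return sizeof(unsigned long) * (size_t)(window_size >> page_shift);
}

/*
 * A single constructor for both arrays (the "common helper" idea), so the
 * userspace view is never created without, or kept beyond, its table.
 */
static int tce_table_create(struct tce_table_model *tbl, uint64_t window_size,
			    uint32_t page_shift)
{
	tbl->entries = (unsigned long)(window_size >> page_shift);
	tbl->tces = calloc(tbl->entries, sizeof(*tbl->tces));
	tbl->userspace = calloc(tbl->entries, sizeof(*tbl->userspace));
	if (!tbl->tces || !tbl->userspace) {
		free(tbl->tces);
		free(tbl->userspace);
		tbl->tces = NULL;
		tbl->userspace = NULL;
		return -1;
	}
	return 0;
}

/* The matching destructor releases both arrays together. */
static void tce_table_destroy(struct tce_table_model *tbl)
{
	free(tbl->tces);
	free(tbl->userspace);
	tbl->tces = NULL;
	tbl->userspace = NULL;
	tbl->entries = 0;
}
```

For example, a 2 GiB window with 4 KiB IOMMU pages has 524288 entries, so on a 64-bit host the userspace view alone costs 4 MiB.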
Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible
On Thu, Apr 30, 2015 at 07:33:09PM +1000, Alexey Kardashevskiy wrote: > On 04/30/2015 05:22 PM, David Gibson wrote: > >On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote: > >>At the moment only one group per container is supported. > >>POWER8 CPUs have more flexible design and allows naving 2 TCE tables per > >>IOMMU group so we can relax this limitation and support multiple groups > >>per container. > > > >It's not obvious why allowing multiple TCE tables per PE has any > >pearing on allowing multiple groups per container. > > > This patchset is a global TCE tables rework (patches 1..30, roughly) with 2 > outcomes: > 1. reusing the same IOMMU table for multiple groups - patch 31; > 2. allowing dynamic create/remove of IOMMU tables - patch 32. > > I can remove this one from the patchset and post it separately later but > since 1..30 aim to support both 1) and 2), I'd think I better keep them all > together (might explain some of changes I do in 1..30). The combined patchset is fine. My comment is because your commit message says that multiple groups are possible *because* 2 TCE tables per group are allowed, and it's not at all clear why one follows from the other. > >>This adds TCE table descriptors to a container and uses > >>iommu_table_group_ops > >>to create/set DMA windows on IOMMU groups so the same TCE tables will be > >>shared between several IOMMU groups. 
> >> > >>Signed-off-by: Alexey Kardashevskiy > >>[aw: for the vfio related changes] > >>Acked-by: Alex Williamson > >>--- > >>Changes: > >>v7: > >>* updated doc > >>--- > >> Documentation/vfio.txt | 8 +- > >> drivers/vfio/vfio_iommu_spapr_tce.c | 268 > >> ++-- > >> 2 files changed, 199 insertions(+), 77 deletions(-) > >> > >>diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt > >>index 94328c8..7dcf2b5 100644 > >>--- a/Documentation/vfio.txt > >>+++ b/Documentation/vfio.txt > >>@@ -289,10 +289,12 @@ PPC64 sPAPR implementation note > >> > >> This implementation has some specifics: > >> > >>-1) Only one IOMMU group per container is supported as an IOMMU group > >>-represents the minimal entity which isolation can be guaranteed for and > >>-groups are allocated statically, one per a Partitionable Endpoint (PE) > >>+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per > >>+container is supported as an IOMMU table is allocated at the boot time, > >>+one table per a IOMMU group which is a Partitionable Endpoint (PE) > >> (PE is often a PCI domain but not always). > > > >I thought the more fundamental problem was that different PEs tended > >to use disjoint bus address ranges, so even by duplicating put_tce > >across PEs you couldn't have a common address space. > > > Sorry, I am not following you here. > > By duplicating put_tce, I can have multiple IOMMU groups on the same virtual > PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple groups > per container" does this, the address ranges will the same. Oh, ok. For some reason I thought that (at least on the older machines) the different PEs used different and not easily changeable DMA windows in bus address space.
> What I cannot do on p5ioc2 is programming the same table to multiple > physical PHBs (or I could but it is very different than IODA2 and pretty > ugly and might not always be possible because I would have to allocate these > pages from some common pool and face problems like fragmentation). So allowing multiple groups per container should be possible (at the kernel rather than qemu level) by writing the same value to multiple TCE tables. I guess it's not worth doing for just the almost-obsolete IOMMUs though. > > > > >>+Newer systems (POWER8 with IODA2) have improved hardware design which > >>allows > >>+to remove this limitation and have multiple IOMMU groups per a VFIO > >>container. > >> > >> 2) The hardware supports so called DMA windows - the PCI address range > >> within which DMA transfer is allowed, any attempt to access address space > >>diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c > >>b/drivers/vfio/vfio_iommu_spapr_tce.c > >>index a7d6729..970e3a2 100644 > >>--- a/drivers/vfio/vfio_iommu_spapr_tce.c > >>+++ b/drivers/vfio/vfio_iommu_spapr_tce.c > >>@@ -82,6 +82,11 @@ static void decrement_locked_vm(long npages) > >> * into DMA'ble space using the IOMMU > >> */ > >> > >>+struct tce_iommu_group { > >>+ struct list_head next; > >>+ struct iommu_group *grp; > >>+}; > >>+ > >> /* > >> * The container descriptor supports only a single group per container. > >> * Required by the API as the container is not supplied with the IOMMU > >> group > >>@@ -89,10 +94,11 @@ static void decrement_locked_vm(long npages) > >> */ > >> struct tce_container { > >>struct mutex lock; > >>- struct iommu_group *grp; > >>bool enabled; > >>unsigned long locked_pages; > >>bool v2; > >>+ struct iommu_table tables[IOMMU_TABLE_GROUP
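David's suggestion in this message (keep one logical address space by writing the same TCE value into every attached group's hardware table) can be modelled as below. It is a hypothetical sketch: struct hw_table_model, container_put_tce() and the fixed table size are illustrative assumptions, not kernel interfaces. It also counts one invalidation per table, reflecting Ben's note earlier in the thread that invalidations still have to be duplicated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TBL_ENTRIES 16 /* assumed tiny table size for the sketch */
#define MAX_GROUPS 4

/* Hypothetical per-group hardware TCE table. */
struct hw_table_model {
	uint64_t tce[TBL_ENTRIES];
	unsigned int invalidations; /* counts TCE-cache kills issued for this table */
};

/* Hypothetical container holding one table per attached group. */
struct container_model {
	struct hw_table_model *tables[MAX_GROUPS];
	size_t ntables;
};

/* Write one entry into every group's table so all groups share one DMA view. */
static int container_put_tce(struct container_model *c, size_t index, uint64_t tce)
{
	size_t i;

	if (index >= TBL_ENTRIES)
		return -1;
	for (i = 0; i < c->ntables; i++) {
		c->tables[i]->tce[index] = tce;
		/* Each table's TCE cache still needs its own invalidation. */
		c->tables[i]->invalidations++;
	}
	return 0;
}
```

The cost grows linearly with the number of groups per update, one reason sharing a single in-memory table (possible on P8) is attractive.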
[PATCH 1/2] dt-bindings: Add pxa1928 clock binding
This adds the clock binding documentation for the Marvell PXA1928 SOC. The PXA1928 has 3 clock control blocks for different subsystems of the chip. Signed-off-by: Rob Herring Cc: Pawel Moll Cc: Mark Rutland Cc: Ian Campbell Cc: Kumar Gala --- .../devicetree/bindings/clock/marvell,pxa1928.txt | 21 include/dt-bindings/clock/marvell,pxa1928.h| 57 ++ 2 files changed, 78 insertions(+) create mode 100644 Documentation/devicetree/bindings/clock/marvell,pxa1928.txt create mode 100644 include/dt-bindings/clock/marvell,pxa1928.h diff --git a/Documentation/devicetree/bindings/clock/marvell,pxa1928.txt b/Documentation/devicetree/bindings/clock/marvell,pxa1928.txt new file mode 100644 index 000..809c5a2 --- /dev/null +++ b/Documentation/devicetree/bindings/clock/marvell,pxa1928.txt @@ -0,0 +1,21 @@ +* Marvell PXA1928 Clock Controllers + +The PXA1928 clock subsystem generates and supplies clock to various +controllers within the PXA1928 SoC. The PXA1928 contains 3 clock controller +blocks called APMU, MPMU, and APBC roughly corresponding to internal buses. + +Required Properties: + +- compatible: should be one of the following. + - "marvell,pxa1928-apmu" - APMU controller compatible + - "marvell,pxa1928-mpmu" - MPMU controller compatible + - "marvell,pxa1928-apbc" - APBC controller compatible +- reg: physical base address of the clock controller and length of memory mapped + region. +- #clock-cells: should be 1. +- #reset-cells: should be 1. + +Each clock is assigned an identifier and client nodes use the clock controller +phandle and this identifier to specify the clock which they consume. + +All these identifiers can be found in . 
diff --git a/include/dt-bindings/clock/marvell,pxa1928.h b/include/dt-bindings/clock/marvell,pxa1928.h
new file mode 100644
index 0000000..c393ca2
--- /dev/null
+++ b/include/dt-bindings/clock/marvell,pxa1928.h
@@ -0,0 +1,57 @@
+#ifndef __DTS_MARVELL_PXA1928_CLOCK_H
+#define __DTS_MARVELL_PXA1928_CLOCK_H
+
+/*
+ * Clock ID values here correspond to the control register offset/4.
+ */
+
+/* apb peripherals */
+#define PXA1928_CLK_RTC		0
+#define PXA1928_CLK_TWSI0	1
+#define PXA1928_CLK_TWSI1	2
+#define PXA1928_CLK_TWSI2	3
+#define PXA1928_CLK_TWSI3	4
+#define PXA1928_CLK_OWIRE	5
+#define PXA1928_CLK_KPC		6
+#define PXA1928_CLK_TB_ROTARY	7
+#define PXA1928_CLK_SW_JTAG	8
+#define PXA1928_CLK_TIMER1	9
+#define PXA1928_CLK_UART0	0xb
+#define PXA1928_CLK_UART1	0xc
+#define PXA1928_CLK_UART2	0xd
+#define PXA1928_CLK_GPIO	0xe
+#define PXA1928_CLK_PWM0	0xf
+#define PXA1928_CLK_PWM1	0x10
+#define PXA1928_CLK_PWM2	0x11
+#define PXA1928_CLK_PWM3	0x12
+#define PXA1928_CLK_SSP0	0x13
+#define PXA1928_CLK_SSP1	0x14
+#define PXA1928_CLK_SSP2	0x15
+
+#define PXA1928_CLK_TWSI4	0x1f
+#define PXA1928_CLK_TWSI5	0x20
+#define PXA1928_CLK_UART3	0x22
+#define PXA1928_CLK_THSENS_GLOB	0x24
+#define PXA1928_CLK_THSENS_CPU	0x26
+#define PXA1928_CLK_THSENS_VPU	0x27
+#define PXA1928_CLK_THSENS_GC	0x28
+#define PXA1928_APBC_NR_CLKS	0x30
+
+
+/* axi peripherals */
+#define PXA1928_CLK_SDH0	0x15
+#define PXA1928_CLK_SDH1	0x16
+#define PXA1928_CLK_USB		0x17
+#define PXA1928_CLK_NAND	0x18
+#define PXA1928_CLK_DMA		0x19
+
+#define PXA1928_CLK_SDH2	0x3a
+#define PXA1928_CLK_SDH3	0x3b
+#define PXA1928_CLK_HSIC	0x3e
+#define PXA1928_CLK_SDH4	0x57
+#define PXA1928_CLK_GC3D	0x5d
+#define PXA1928_CLK_GC2D	0x5f
+
+#define PXA1928_APMU_NR_CLKS	0x60
+
+#endif
-- 
2.1.0
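The header comment says each clock ID is its control register offset divided by 4, and the driver in patch 2/2 indexes registers as `PXA1928_CLK_UART0 * 4`. A minimal sketch of that mapping follows; pxa1928_clk_reg_offset() is a hypothetical helper, and the offset is relative to whichever controller block (APBC or APMU) the ID belongs to.

```c
#include <assert.h>

/* A few IDs copied from include/dt-bindings/clock/marvell,pxa1928.h. */
#define PXA1928_CLK_UART0	0xb
#define PXA1928_CLK_SSP0	0x13
#define PXA1928_CLK_SSP2	0x15 /* APBC clock */
#define PXA1928_CLK_SDH0	0x15 /* APMU clock with the same numeric ID */

/* Hypothetical helper: control register offset within the owning block. */
static unsigned int pxa1928_clk_reg_offset(unsigned int clk_id)
{
	return clk_id * 4;
}
```

Because SSP2 (APBC) and SDH0 (APMU) share the value 0x15, an ID is meaningful only together with the phandle of the controller node it is used with, which matches the binding's "phandle plus identifier" scheme.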
[PATCH 2/2] clk: mmp: add PXA1928 clock support
Add initial clock support for Marvell PXA1928. The PXA1928 is a mobile SOC and is similar to other MMP/PXA series of SOCs, so a lot of the existing infrastructure is reused here. Currently the PLLs are just fixed clocks, and not all leaf clocks are implemented. Signed-off-by: Rob Herring Cc: Mike Turquette Cc: Stephen Boyd --- drivers/clk/mmp/Makefile | 2 + drivers/clk/mmp/clk-of-pxa1928.c | 265 +++ 2 files changed, 267 insertions(+) create mode 100644 drivers/clk/mmp/clk-of-pxa1928.c diff --git a/drivers/clk/mmp/Makefile b/drivers/clk/mmp/Makefile index 3caaf7c..9d4bc41 100644 --- a/drivers/clk/mmp/Makefile +++ b/drivers/clk/mmp/Makefile @@ -12,3 +12,5 @@ obj-$(CONFIG_MACH_MMP2_DT) += clk-of-mmp2.o obj-$(CONFIG_CPU_PXA168) += clk-pxa168.o obj-$(CONFIG_CPU_PXA910) += clk-pxa910.o obj-$(CONFIG_CPU_MMP2) += clk-mmp2.o + +obj-y += clk-of-pxa1928.o diff --git a/drivers/clk/mmp/clk-of-pxa1928.c b/drivers/clk/mmp/clk-of-pxa1928.c new file mode 100644 index 000..b7cb540b --- /dev/null +++ b/drivers/clk/mmp/clk-of-pxa1928.c @@ -0,0 +1,265 @@ +/* + * pxa1928 clock framework source file + * + * Copyright (C) 2015 Linaro, Ltd. + * Rob Herring + * + * Based on drivers/clk/mmp/clk-of-mmp2.c: + * Copyright (C) 2012 Marvell + * Chao Xie + * + * This file is licensed under the terms of the GNU General Public + * License version 2. This program is licensed "as is" without any + * warranty of any kind, whether express or implied. 
+ */ +#include +#include +#include +#include +#include + +#include + +#include "clk.h" +#include "reset.h" + +#define MPMU_UART_PLL 0x14 + +struct pxa1928_clk_unit { + struct mmp_clk_unit unit; + void __iomem *mpmu_base; + void __iomem *apmu_base; + void __iomem *apbc_base; + void __iomem *apbcp_base; +}; + +static struct mmp_param_fixed_rate_clk fixed_rate_clks[] = { + {0, "clk32", NULL, CLK_IS_ROOT, 32768}, + {0, "vctcxo", NULL, CLK_IS_ROOT, 2600}, + {0, "pll1_624", NULL, CLK_IS_ROOT, 62400}, + {0, "pll5p", NULL, CLK_IS_ROOT, 83200}, + {0, "pll5", NULL, CLK_IS_ROOT, 124800}, + {0, "usb_pll", NULL, CLK_IS_ROOT, 48000}, +}; + +static struct mmp_param_fixed_factor_clk fixed_factor_clks[] = { + {0, "pll1_d2", "pll1_624", 1, 2, 0}, + {0, "pll1_d9", "pll1_624", 1, 9, 0}, + {0, "pll1_d12", "pll1_624", 1, 12, 0}, + {0, "pll1_d16", "pll1_624", 1, 16, 0}, + {0, "pll1_d20", "pll1_624", 1, 20, 0}, + {0, "pll1_416", "pll1_624", 2, 3, 0}, + {0, "vctcxo_d2", "vctcxo", 1, 2, 0}, + {0, "vctcxo_d4", "vctcxo", 1, 4, 0}, +}; + +static struct mmp_clk_factor_masks uart_factor_masks = { + .factor = 2, + .num_mask = 0x1fff, + .den_mask = 0x1fff, + .num_shift = 16, + .den_shift = 0, +}; + +static struct mmp_clk_factor_tbl uart_factor_tbl[] = { + {.num = 832, .den = 234}, /*58.5MHZ */ + {.num = 1, .den = 1}, /*26MHZ */ +}; + +static void pxa1928_pll_init(struct pxa1928_clk_unit *pxa_unit) +{ + struct clk *clk; + struct mmp_clk_unit *unit = &pxa_unit->unit; + + mmp_register_fixed_rate_clks(unit, fixed_rate_clks, + ARRAY_SIZE(fixed_rate_clks)); + + mmp_register_fixed_factor_clks(unit, fixed_factor_clks, + ARRAY_SIZE(fixed_factor_clks)); + + clk = mmp_clk_register_factor("uart_pll", "pll1_416", + CLK_SET_RATE_PARENT, + pxa_unit->mpmu_base + MPMU_UART_PLL, + &uart_factor_masks, uart_factor_tbl, + ARRAY_SIZE(uart_factor_tbl), NULL); +} + +static DEFINE_SPINLOCK(uart0_lock); +static DEFINE_SPINLOCK(uart1_lock); +static DEFINE_SPINLOCK(uart2_lock); +static DEFINE_SPINLOCK(uart3_lock); +static 
const char *uart_parent_names[] = {"uart_pll", "vctcxo"}; + +static DEFINE_SPINLOCK(ssp0_lock); +static DEFINE_SPINLOCK(ssp1_lock); +static const char *ssp_parent_names[] = {"vctcxo_d4", "vctcxo_d2", "vctcxo", "pll1_d12"}; + +static DEFINE_SPINLOCK(reset_lock); + +static struct mmp_param_mux_clk apbc_mux_clks[] = { + {0, "uart0_mux", uart_parent_names, ARRAY_SIZE(uart_parent_names), CLK_SET_RATE_PARENT, PXA1928_CLK_UART0 * 4, 4, 3, 0, &uart0_lock}, + {0, "uart1_mux", uart_parent_names, ARRAY_SIZE(uart_parent_names), CLK_SET_RATE_PARENT, PXA1928_CLK_UART1 * 4, 4, 3, 0, &uart1_lock}, + {0, "uart2_mux", uart_parent_names, ARRAY_SIZE(uart_parent_names), CLK_SET_RATE_PARENT, PXA1928_CLK_UART2 * 4, 4, 3, 0, &uart2_lock}, + {0, "uart3_mux", uart_parent_names, ARRAY_SIZE(uart_parent_names), CLK_SET_RATE_PARENT, PXA1928_CLK_UART3 * 4, 4, 3, 0, &uart3_lock}, + {0, "ssp0_mux", ssp_parent_names, ARRAY_SIZE(ssp_parent_names), CLK_SET_RATE_PARENT, PXA1928_CLK_SSP0 * 4, 4, 3, 0, &ssp0_lock}, + {0, "ssp1_mux", ssp_parent_names, ARRAY_
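Two mechanisms from the driver above can be illustrated in isolation: the fixed-factor clocks (rate = parent * mult / div, so "pll1_416" is pll1_624 * 2 / 3) and the UART mux entries, reading their `4, 3` fields as the shift and width of the parent-select field in the register at `PXA1928_CLK_UARTn * 4`. This is a sketch assuming pll1_624 runs at 624 MHz as its name suggests; mux_parent(), fixed_factor_rate() and the register values used are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parent list copied from the driver. */
static const char *const uart_parent_names[] = { "uart_pll", "vctcxo" };

/* Decode which parent a mux field selects; NULL if out of range. */
static const char *mux_parent(uint32_t reg, unsigned int shift, unsigned int width,
			      const char *const *parents, size_t nparents)
{
	uint32_t idx = (reg >> shift) & ((1u << width) - 1);

	return idx < nparents ? parents[idx] : NULL;
}

/* Fixed-factor rate; widen before multiplying to avoid 32-bit overflow. */
static unsigned long fixed_factor_rate(unsigned long parent, unsigned int mult,
				       unsigned int div)
{
	unsigned long long rate = (unsigned long long)parent * mult;

	return (unsigned long)(rate / div);
}
```

With shift 4 and width 3, a hypothetical register value of 0x10 selects index 1 ("vctcxo"); the 3-bit field could encode eight parents, but only two are listed for the UARTs.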
Re: 3.17.0+ files disappearing after playing old dos game on nfsroot laptop
Hans de Bruin writes: >> I expect what needs to happen is to confirm that nfs directory entry >> revalidation is buggy, and at least for the short term re-add the nfs >> logic that will avoid dropping a dentry if it is a mount point, or >> path to a mount point, to avoid the nfs bugs. >> > > ok, I will go in waiting mode. I dropped the ball on this but it looks like someone else hit the problem and the following two commits fixed this issue: Can you confirm that things are working again? commit fa9233699cc1dc236f4cf42245d13e40966938c5 Author: Trond Myklebust Date: Mon Feb 23 18:51:32 2015 -0500 NFS: Don't require a filehandle to refresh the inode in nfs_prime_dcache() If the server does not return a valid set of attributes that we can use to either create a file or refresh the inode, then there is no value in calling nfs_prime_dcache(). However if we're just refreshing the inode using the attributes that the server returned, then it shouldn't matter whether or not we have a filehandle, as long as we check the fsid+fileid combination. Signed-off-by: Trond Myklebust commit 6c441c254eea2354d686be7f5544bcd79fb6a61f Author: Trond Myklebust Date: Sun Feb 22 16:35:36 2015 -0500 NFS: Don't invalidate a submounted dentry in nfs_prime_dcache() If we're traversing a directory which contains a submounted filesystem, or one that has a referral, the NFS server that is processing the READDIR request will often return information for the underlying (mounted-on) directory. It may, or may not, also return filehandle information. If this happens, and the lookup in nfs_prime_dcache() returns the dentry for the submounted directory, the filehandle comparison will fail, and we call d_invalidate(). Post-commit 8ed936b5671bf ("vfs: Lazily remove mounts on unlinked files and directories."), this means the entire subtree is unmounted. The following minimal patch addresses this problem by punting on the invalidation if there is a submount. 
Kudos to Neil Brown for having tracked down this issue (see link). Reported-by: Nix Link: http://lkml.kernel.org/r/87iofju9ht@spindle.srvr.nix Cc: sta...@vger.kernel.org # 3.18+ Signed-off-by: Trond Myklebust
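The combined effect of the two commits on an existing dentry can be summarised as a decision table. The following is a deliberately simplified, hypothetical model, not the code in fs/nfs/dir.c: the enum, struct and function names are invented, and "same file" stands for the fsid+fileid check the first commit describes.

```c
#include <assert.h>
#include <stdbool.h>

enum prime_action { PRIME_SKIP, PRIME_REFRESH, PRIME_INVALIDATE };

/* Hypothetical summary of what READDIR returned for one entry. */
struct readdir_entry_model {
	bool have_attrs; /* server returned usable attributes */
	bool same_file;  /* fsid + fileid match the cached inode */
};

static enum prime_action prime_existing_dentry(const struct readdir_entry_model *e,
					       bool dentry_is_mountpoint)
{
	if (!e->have_attrs)
		return PRIME_SKIP;       /* nothing useful to prime with */
	if (e->same_file)
		return PRIME_REFRESH;    /* a filehandle is no longer required */
	if (dentry_is_mountpoint)
		return PRIME_SKIP;       /* punt: the server described the mounted-on dir */
	return PRIME_INVALIDATE;
}
```

The mountpoint check is what prevents the lazy-unmount behaviour from commit 8ed936b5671bf from tearing down the submounted subtree.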
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 4/30/15 3:52 AM, Wang Nan wrote: This series of patches is an approach to integrate eBPF with perf. After applying these patches, users are allowed to use following command to load eBPF program compiled by LLVM into kernel: $ perf bpf sample_bpf.o The required BPF code and the loading procedure is similar to Alexei Starovoitov's libbpf in sample/bpf, with following exceptions: 1. The section name are not required leading with 'kprobe/' or 'kretprobe/'. Without such leading, any valid C var name can be use. 2. A 'config' section can be provided to describe the position and arguments of a program. Syntax is identical to 'perf probe'. An example is pasted at the bottom of this cover letter. In that example, mybpfprog is configured by string in config section, and will be probed at __alloc_pages_nodemask. sample_bpf.o is generated using: $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \ -Wno-unused-value -Wno-pointer-sign \ -O2 -emit-llvm -c sample_bpf.c -o -| $LLC -march=bpf -filetype=obj -o \ sample_bpf.o And can be loaded using: $ perf bpf sample_bpf.o This series is only a limited functional. Following works are on the todo list: 1. Unprobe kprobe stubs used by eBPF programs when unloading; 2. Enable eBPF programs to access local variables and arguments by utilizing debuginfo; 3. Output data in perf way. In this series: Patch 1/22 is a bugfix in perf probe, and may be triggered by following patches; Patch 2-3/22 are preparation, add required macros and syscall definition into perf source tree. Patch 4/22 add 'perf bpf' command. Patch 5-20/22 are labor works, which parse the ELF object file, collect information in object files, create maps needed by programs, link map and programs, config programs and load programs into kernel. Patch 21-22/22 are the final work. Patch 21 creates kprobe points which will be used by eBPF programs, patch 22 creates perf file descriptors then attach eBPF programs on them. I'm very happy to see this work. 
Looks great. All patches are impressively clean and concise. I think patches 1-3 are ready to go into Arnaldo's perf tree right now. Patches 4 and above are clean and polished, but probably need to go into some 'staging area' like a branch of the perf tree, since I suspect the user interface may change a little in the coming months and it's a bit too early to expose the 'perf bpf' command to every perf user? Arnaldo, Ingo, what do you think the arrangement should be? A 'perf/bpf' branch in acme/linux.git or in tip/tip.git? I have a few comments on patches 18 and 19, but let's figure out the long-term plan first. We're also working in parallel on creating a new tracing language that, together with the llvm backend, can be used as a single shared library callable from perf or anything else. Then the clang compilation step will be gone and programs can be run as 'perf bpf file.bpf'. Thanks!
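The cover letter's rule that program sections need not start with 'kprobe/' or 'kretprobe/' (any valid C identifier works, with placement given in a 'config' section) can be sketched as a section-name classifier. This is a hypothetical illustration, not perf's loader code; other special sections a real loader handles (for example license or maps) are ignored here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

enum bpf_sec_kind { SEC_KPROBE, SEC_KRETPROBE, SEC_CONFIG, SEC_NAMED, SEC_INVALID };

/* True if s is a non-empty valid C identifier. */
static bool is_c_identifier(const char *s)
{
	if (!*s || !(isalpha((unsigned char)*s) || *s == '_'))
		return false;
	for (s++; *s; s++)
		if (!(isalnum((unsigned char)*s) || *s == '_'))
			return false;
	return true;
}

static enum bpf_sec_kind classify_section(const char *name)
{
	if (strncmp(name, "kprobe/", strlen("kprobe/")) == 0)
		return SEC_KPROBE;
	if (strncmp(name, "kretprobe/", strlen("kretprobe/")) == 0)
		return SEC_KRETPROBE;
	if (strcmp(name, "config") == 0)
		return SEC_CONFIG; /* holds 'perf probe'-style placement strings */
	return is_c_identifier(name) ? SEC_NAMED : SEC_INVALID;
}
```

A SEC_NAMED program such as "mybpfprog" would then be looked up in the config section to find its probe point, for example __alloc_pages_nodemask in the cover letter's example.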
Re: [PATCH V2 net-next 1/1] hv_netvsc: Use the xmit_more skb flag to optimize signaling the host
On Thu, 2015-04-30 at 16:29 -0700, K. Y. Srinivasan wrote: > Based on the information given to this driver (via the xmit_more skb flag), > we can defer signaling the host if more packets are on the way. This will help > make the host more efficient since it can potentially process a larger batch > of > packets. Implement this optimization. > > Signed-off-by: K. Y. Srinivasan > --- > v2: Fixed up indentation based on feedback from David Miller. > > drivers/net/hyperv/netvsc.c | 20 > 1 files changed, 12 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c > index 2e8ad06..5fdc5e1 100644 > --- a/drivers/net/hyperv/netvsc.c > +++ b/drivers/net/hyperv/netvsc.c > @@ -743,6 +743,7 @@ static inline int netvsc_send_pkt( > u64 req_id; > int ret; > struct hv_page_buffer *pgbuf; > + u32 vmbus_flags = VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED; > > nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT; > if (packet->is_data_pkt) { > @@ -772,19 +773,22 @@ static inline int netvsc_send_pkt( > if (packet->page_buf_cnt) { > pgbuf = packet->cp_partial ? 
packet->page_buf + > packet->rmsg_pgcnt : packet->page_buf; > - ret = vmbus_sendpacket_pagebuffer(out_channel, > - pgbuf, > - packet->page_buf_cnt, > - &nvmsg, > - sizeof(struct nvsp_message), > - req_id); > + ret = vmbus_sendpacket_pagebuffer_ctl(out_channel, > + pgbuf, > + packet->page_buf_cnt, > + &nvmsg, > + sizeof(struct > + nvsp_message), > + req_id, > + vmbus_flags, > + !packet->xmit_more); > } else { > - ret = vmbus_sendpacket( > + ret = vmbus_sendpacket_ctl( > out_channel, &nvmsg, > sizeof(struct nvsp_message), > req_id, > VM_PKT_DATA_INBAND, > - VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); > + vmbus_flags, !packet->xmit_more); > } > > if (ret == 0) { This might be problematic if the queue is stopped (netif_tx_stop_queue()): you need to force a kick if we are about to stop the queue. Random example: commit ddd0ca5d60b350bbfbfb60b25885a9779ce6d6c7
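Eric's point can be expressed as a one-line rule: defer signalling the host while xmit_more says more packets are coming, but always signal when the queue is about to be stopped, since otherwise nothing may flush the deferred batch. The sketch below is a hypothetical model; should_signal_host() and simulate_burst() are illustrative names, not hv_netvsc functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Signal unless more packets follow, and always signal before stopping the queue. */
static bool should_signal_host(bool xmit_more, bool queue_stopping)
{
	return !xmit_more || queue_stopping;
}

/* Count the host signals a burst of n packets would generate. */
static unsigned int simulate_burst(const bool *xmit_more, const bool *stopping,
				   unsigned int n)
{
	unsigned int i, signals = 0;

	for (i = 0; i < n; i++)
		if (should_signal_host(xmit_more[i], stopping[i]))
			signals++;
	return signals;
}
```

A four-packet burst whose first three packets carry xmit_more produces one signal; if the queue fills and must be stopped after the second packet, a second signal is forced at that point.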
Re: [PATCH kernel v9 29/32] vfio: powerpc/spapr: Register memory and define IOMMU v2
On 04/30/2015 04:55 PM, David Gibson wrote: On Sat, Apr 25, 2015 at 10:14:53PM +1000, Alexey Kardashevskiy wrote: The existing implementation accounts the whole DMA window in the locked_vm counter. This is going to be worse with multiple containers and huge DMA windows. Also, real-time accounting would requite additional tracking of accounted pages due to the page size difference - IOMMU uses 4K pages and system uses 4K or 64K pages. Another issue is that actual pages pinning/unpinning happens on every DMA map/unmap request. This does not affect the performance much now as we spend way too much time now on switching context between guest/userspace/host but this will start to matter when we add in-kernel DMA map/unmap acceleration. This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU. New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces 2 new ioctls to register/unregister DMA memory - VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY - which receive user space address and size of a memory region which needs to be pinned/unpinned and counted in locked_vm. New IOMMU splits physical pages pinning and TCE table update into 2 different operations. It requires 1) guest pages to be registered first 2) consequent map/unmap requests to work only with pre-registered memory. For the default single window case this means that the entire guest (instead of 2GB) needs to be pinned before using VFIO. When a huge DMA window is added, no additional pinning will be required, otherwise it would be guest RAM + 2GB. The new memory registration ioctls are not supported by VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration will require memory to be preregistered in order to work. The accounting is done per the user process. This advertises v2 SPAPR TCE IOMMU and restricts what the userspace can do with v1 or v2 IOMMUs. 
Signed-off-by: Alexey Kardashevskiy [aw: for the vfio related changes] Acked-by: Alex Williamson --- Changes: v9: * s/tce_get_hva_cached/tce_iommu_use_page_v2/ v7: * now memory is registered per mm (i.e. process) * moved memory registration code to powerpc/mmu * merged "vfio: powerpc/spapr: Define v2 IOMMU" into this * limited new ioctls to v2 IOMMU * updated doc * unsupported ioclts return -ENOTTY instead of -EPERM v6: * tce_get_hva_cached() returns hva via a pointer v4: * updated docs * s/kzmalloc/vzalloc/ * in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and replaced offset with index * renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory and removed duplicating vfio_iommu_spapr_register_memory --- Documentation/vfio.txt | 23 drivers/vfio/vfio_iommu_spapr_tce.c | 230 +++- include/uapi/linux/vfio.h | 27 + 3 files changed, 274 insertions(+), 6 deletions(-) diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt index 96978ec..94328c8 100644 --- a/Documentation/vfio.txt +++ b/Documentation/vfio.txt @@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed: +5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/ +VFIO_IOMMU_DISABLE and implements 2 new ioctls: +VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY +(which are unsupported in v1 IOMMU). A summary of the semantic differeces between v1 and v2 would be nice. At this point it's not really clear to me if there's a case for creating v2, or if this could just be done by adding (optional) functionality to v1. v1: memory preregistration is not supported; explicit enable/disable ioctls are required v2: memory preregistration is required; explicit enable/disable are prohibited (as they are not needed). 
Mixing these in one IOMMU type caused a lot of problems, such as: should I increment locked_vm by the 32bit window size on enable() or not; what do I do about page pinning on map/unmap (check if it is from registered memory and do not pin?). Having 2 IOMMU models makes everything a lot simpler. +PPC64 paravirtualized guests generate a lot of map/unmap requests, +and the handling of those includes pinning/unpinning pages and updating +mm::locked_vm counter to make sure we do not exceed the rlimit. +The v2 IOMMU splits accounting and pinning into separate operations: + +- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls +receive a user space address and size of the block to be pinned. +Bisecting is not supported and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY is expected to +be called with the exact address and size used for registering +the memory block. The userspace is not expected to call these often. +The ranges are stored in a linked list in a VFIO container. + +- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual +IOMMU table and do not do pinning; instead these check that the userspace +address is from a pre-registered range. + +This separation h
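To make the v1/v2 split concrete, here is a small userspace model of the v2 rules as described in this patch: registration is the only step that would pin and account memory, unregistration needs the exact (address, size) pair, and map/unmap merely check that the address falls inside a registered block. Everything in it (the model_* names, the fixed-size region array, the per-region mapping counter and the chosen error codes) is a hypothetical illustration, not the VFIO or powerpc mm code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the v2 SPAPR TCE rules; not kernel code. */
#define MODEL_MAX_REGIONS 8

struct model_region {
	uint64_t ua;          /* userspace start address of the block */
	uint64_t size;        /* block size in bytes */
	unsigned long mapped; /* live DMA mappings inside the block */
	bool used;
};

static struct model_region model_regions[MODEL_MAX_REGIONS];

/* REGISTER_MEMORY: the only place where pinning/locked_vm accounting would happen. */
static long model_register(uint64_t ua, uint64_t size)
{
	for (int i = 0; i < MODEL_MAX_REGIONS; i++) {
		if (!model_regions[i].used) {
			model_regions[i] = (struct model_region){
				.ua = ua, .size = size, .used = true };
			return 0;
		}
	}
	return -ENOMEM;
}

/* UNREGISTER_MEMORY: exact (ua, size) match only, and only when unused. */
static long model_unregister(uint64_t ua, uint64_t size)
{
	for (int i = 0; i < MODEL_MAX_REGIONS; i++) {
		struct model_region *r = &model_regions[i];

		if (r->used && r->ua == ua && r->size == size) {
			if (r->mapped)
				return -EBUSY;
			r->used = false;
			return 0;
		}
	}
	return -ENOENT;
}

/* MAP_DMA: no pinning, just check [ua, ua + len) lies inside one registered block. */
static long model_map(uint64_t ua, uint64_t len)
{
	for (int i = 0; i < MODEL_MAX_REGIONS; i++) {
		struct model_region *r = &model_regions[i];

		if (r->used && ua >= r->ua && len <= r->size &&
		    ua - r->ua <= r->size - len) {
			r->mapped++;
			return 0;
		}
	}
	return -EFAULT;
}

/* UNMAP_DMA: drop one mapping from the block containing ua. */
static long model_unmap(uint64_t ua)
{
	for (int i = 0; i < MODEL_MAX_REGIONS; i++) {
		struct model_region *r = &model_regions[i];

		if (r->used && r->mapped && ua >= r->ua && ua - r->ua < r->size) {
			r->mapped--;
			return 0;
		}
	}
	return -EFAULT;
}
```

The point of the model is the ordering it enforces: a map outside any registered block fails, and a block cannot be unregistered while mappings into it are still live.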
linux-next: Tree for May 1
Hi all, Changes since 20150430: The sound-asoc tree lost its build failure. The akpm-current tree gained a build failure for which I applied a fix patch. Non-merge commits (relative to Linus' tree): 1233; 1252 files changed, 78167 insertions(+), 22093 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a multi_v7_defconfig for arm. After the final fixups (if any), it is also built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm defconfig. Below is a summary of the state of the merge. I am currently merging 214 trees (counting Linus' and 30 trees of patches pending for Linus' tree). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to adding more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes.
-- Cheers, Stephen Rothwells...@canb.auug.org.au $ git checkout master $ git reset --hard stable Merging origin/master (4a152c3913fb Merge tag 'pm+acpi-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm) Merging fixes/master (b94d525e58dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net) Merging kbuild-current/rc-fixes (c517d838eb7d Linux 4.0-rc1) Merging arc-current/for-curr (e4140819dadc ARC: signal handling robustify) Merging arm-current/fixes (6c5c2a01fcfd ARM: proc-arm94*.S: fix setup function) Merging m68k-current/for-linus (b24f670b7f5b m68k/mac: Fix out-of-bounds array index in OSS IRQ source initialization) Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors) Merging mips-fixes/mips-fixes (1795cd9b3a91 Linux 3.16-rc5) Merging powerpc-merge-mpe/fixes (68fc378ce332 Revert "powerpc/tm: Abort syscalls in active transactions") Merging powerpc-merge/merge (c517d838eb7d Linux 4.0-rc1) Merging sparc/master (acc455cffa75 sparc64: Setup sysfs to mark LDOM sockets, cores and threads correctly) Merging net/master (e813bb2b955d net: fec: Fix RGMII-ID mode) Merging ipsec/master (bdddbf6996c0 xfrm: fix a race in xfrm_state_lookup_byspi) Merging sound-current/for-linus (0ae3aba2865a Merge tag 'asoc-v4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus) Merging pci-current/for-linus (b787f68c36d4 Linux 4.1-rc1) Merging wireless-drivers/master (414b7e3b9ce8 rtlwifi: rtl8192cu: Fix kernel deadlock) Merging driver-core.current/driver-core-linus (b787f68c36d4 Linux 4.1-rc1) Merging tty.current/tty-linus (96a5d18bc133 serial: 8250_pci: Add support for 16 port Exar boards) Merging usb.current/usb-linus (0d3bba0287d4 cdc-acm: prevent infinite loop when parsing CDC headers.) 
Merging usb-gadget-fixes/fixes (c94e289f195e usb: gadget: remove incorrect __init/__exit annotations) Merging usb-serial-fixes/usb-linus (82ee3aeb9295 USB: visor: Match I330 phone more precisely) Merging staging.current/staging-linus (b787f68c36d4 Linux 4.1-rc1) Merging char-misc.current/char-misc-linus (f26443a8ab76 Merge tag 'extcon-fixes-for-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-linus) Merging input-current/for-linus (48853389f206 Merge branch 'next' into for-linus) Merging crypto-current/master (8c98ebd7a6ff crypto: img-hash - CRYPTO_DEV_IMGTEC_HASH should depend on HAS_DMA) Merging ide/master (d681f1166919 ide: remove deprecated use of pci api) Merging devicetree-current/devicetree/merge (41d9489319f2 drivers/of: Add empty ranges quirk for PA-Semi) Merging rr-fixes/fixes (f47689345931 lguest: update help text.) Merging vfio-fixes/for-linus (82a0eaab980a vfio-pci: Log device requests more verbosely) Merging kselftest-fixes/fixes (67d8712dcc70 selftests: Fix build failures when invoked from kselftest target) Merging drm-intel-fixes/for-linux-next-fixes (a04f90a33fab drm/i915/chv: Implement WaDisableShadowRegForCpd) Merging asm-generic/master (643165c8bbc8 Merge tag 'uaccess_for_
[PATCH 3/3] Change all uses of JOBCTL_* from int to long
c56fb6564dcd ("Fix a misaligned load inside ptrace_attach()") makes jobctl an "unsigned long". It makes sense to have the masks applied to it match that type. This is currently just a cosmetic change, but it will prevent the mask from being unexpectedly truncated if we ever end up with masks with more bits. One instance of "signr" is an int, but I left this alone because the mask ensures that it will never overflow. Reviewed-by: Chris Metcalf Signed-off-by: Palmer Dabbelt --- include/linux/sched.h | 18 +- kernel/signal.c | 6 +++--- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 391827db0a2d..9251155bf27f 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2077,22 +2077,22 @@ TASK_PFA_CLEAR(SPREAD_SLAB, spread_slab) #define JOBCTL_TRAPPING_BIT 21 /* switching to TRACED */ #define JOBCTL_LISTENING_BIT 22 /* ptracer is listening for events */ -#define JOBCTL_STOP_DEQUEUED (1 << JOBCTL_STOP_DEQUEUED_BIT) -#define JOBCTL_STOP_PENDING (1 << JOBCTL_STOP_PENDING_BIT) -#define JOBCTL_STOP_CONSUME (1 << JOBCTL_STOP_CONSUME_BIT) -#define JOBCTL_TRAP_STOP (1 << JOBCTL_TRAP_STOP_BIT) -#define JOBCTL_TRAP_NOTIFY (1 << JOBCTL_TRAP_NOTIFY_BIT) -#define JOBCTL_TRAPPING (1 << JOBCTL_TRAPPING_BIT) -#define JOBCTL_LISTENING (1 << JOBCTL_LISTENING_BIT) +#define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT) +#define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT) +#define JOBCTL_STOP_CONSUME (1UL << JOBCTL_STOP_CONSUME_BIT) +#define JOBCTL_TRAP_STOP (1UL << JOBCTL_TRAP_STOP_BIT) +#define JOBCTL_TRAP_NOTIFY (1UL << JOBCTL_TRAP_NOTIFY_BIT) +#define JOBCTL_TRAPPING (1UL << JOBCTL_TRAPPING_BIT) +#define JOBCTL_LISTENING (1UL << JOBCTL_LISTENING_BIT) #define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY) #define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK) extern bool task_set_jobctl_pending(struct task_struct *task, - unsigned int mask); + unsigned long mask); extern void
task_clear_jobctl_trapping(struct task_struct *task); extern void task_clear_jobctl_pending(struct task_struct *task, - unsigned int mask); + unsigned long mask); static inline void rcu_copy_process(struct task_struct *p) { diff --git a/kernel/signal.c b/kernel/signal.c index d51c5ddd855c..f19833b5db3c 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -245,7 +245,7 @@ static inline void print_dropped_signal(int sig) * RETURNS: * %true if @mask is set, %false if made noop because @task was dying. */ -bool task_set_jobctl_pending(struct task_struct *task, unsigned int mask) +bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask) { BUG_ON(mask & ~(JOBCTL_PENDING_MASK | JOBCTL_STOP_CONSUME | JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING)); @@ -297,7 +297,7 @@ void task_clear_jobctl_trapping(struct task_struct *task) * CONTEXT: * Must be called with @task->sighand->siglock held. */ -void task_clear_jobctl_pending(struct task_struct *task, unsigned int mask) +void task_clear_jobctl_pending(struct task_struct *task, unsigned long mask) { BUG_ON(mask & ~JOBCTL_PENDING_MASK); @@ -2000,7 +2000,7 @@ static bool do_signal_stop(int signr) struct signal_struct *sig = current->signal; if (!(current->jobctl & JOBCTL_STOP_PENDING)) { - unsigned int gstop = JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME; + unsigned long gstop = JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME; struct task_struct *t; /* signr will be recorded in task->jobctl for retries */ -- 2.0.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
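A small illustration of the truncation this patch guards against, assuming an LP64 target (64-bit unsigned long, 32-bit unsigned int): a mask built with 1UL can carry bits above 31, but it silently loses them when it passes through an unsigned int parameter, as it would have with the old task_set_jobctl_pending() prototype. A plain (1 << n) constant cannot express such bits at all, since it has type int. DEMO_HIGH_BIT and the demo_* helpers are hypothetical; no current JOBCTL bit is this high.

```c
#include <assert.h>
#include <limits.h>

/* This sketch assumes an LP64 target. */
_Static_assert(sizeof(unsigned long) * CHAR_BIT == 64, "64-bit long assumed");

/* Hypothetical bit number above 31; no JOBCTL_* bit is this high today. */
#define DEMO_HIGH_BIT  40
#define DEMO_HIGH_MASK (1UL << DEMO_HIGH_BIT) /* 1UL keeps the shift in unsigned long */

/* Old-style prototype: the mask is converted to unsigned int at the call. */
static unsigned long demo_set_uint(unsigned long jobctl, unsigned int mask)
{
	return jobctl | mask;
}

/* New-style prototype, matching an unsigned long jobctl. */
static unsigned long demo_set_ulong(unsigned long jobctl, unsigned long mask)
{
	return jobctl | mask;
}
```

With today's bit numbers (all below 32) both helpers give the same result, which is why the patch describes the change as cosmetic for now.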
[PATCH 1/3] Fix a misaligned load inside ptrace_attach()
The misaligned load exception arises when running ptrace_attach() on the RISC-V (which hasn't been upstreamed yet). The problem is that wait_on_bit() takes a void* but then proceeds to call test_bit(), which takes a long*. This allows an int-aligned pointer to be passed to test_bit(), which promptly fails. This will manifest on any other asm-generic port where unaligned loads trap, where sizeof(long) > sizeof(int), and where task_struct.jobctl ends up not being long-aligned. This patch changes task_struct.jobctl to be a long, which ensures it has the correct alignment. Reviewed-by: Chris Metcalf Signed-off-by: Palmer Dabbelt --- include/linux/sched.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 26a2e6122734..391827db0a2d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1369,7 +1369,7 @@ struct task_struct { int exit_state; int exit_code, exit_signal; int pdeath_signal; /* The signal sent when the parent dies */ - unsigned int jobctl; /* JOBCTL_*, siglock protected */ + unsigned long jobctl; /* JOBCTL_*, siglock protected */ /* Used for emulating ABI behavior of previous Linux versions */ unsigned int personality; -- 2.0.5
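The alignment problem is visible from struct layout alone. In this sketch (LP64 assumed; the two structs are hypothetical stand-ins for the fields around task_struct.jobctl, not the real layout), an unsigned int member that follows another int sits at offset 4, which is not a multiple of the 8-byte alignment of unsigned long, so treating its address as an unsigned long * (which is what test_bit() does) yields a misaligned pointer even inside a long-aligned struct. Declaring the member unsigned long makes the compiler insert padding instead.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumes LP64: 64-bit unsigned long, 32-bit int. */
_Static_assert(sizeof(unsigned long) * CHAR_BIT == 64, "LP64 assumed");
_Static_assert(sizeof(int) == 4, "32-bit int assumed");

/* Hypothetical layouts, loosely modeled on the fields around jobctl. */
struct demo_task_old {
	int pdeath_signal;
	unsigned int jobctl;       /* only guaranteed int alignment */
	unsigned int personality;
};

struct demo_task_new {
	int pdeath_signal;
	unsigned long jobctl;      /* compiler pads so this is long-aligned */
	unsigned int personality;
};

/* True if a member at this offset is long-aligned whenever the enclosing
 * struct itself sits at an address aligned for unsigned long. */
static int demo_offset_long_aligned(size_t off)
{
	return off % _Alignof(unsigned long) == 0;
}
```

The cost mentioned in the cover letter shows up here as the 4 bytes of padding before jobctl in demo_task_new.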
[PATCH 2/3] Change wait_on_bit*() to take an unsigned long*, not a void*
The implementations of wait_on_bit*() will only work with long-aligned memory on systems that don't support misaligned loads and stores. This patch changes the function prototypes to ensure that the compiler will enforce alignment. Running

  make defconfig
  make KFLAGS="-Werror"

seems to indicate that, as of c56fb6564dcd ("Fix a misaligned load inside ptrace_attach()"), there are now no users of non-long-aligned calls to wait_on_bit*(). I additionally tried a few "make randconfig" attempts, none of which failed to compile for this reason. Reviewed-by: Chris Metcalf Signed-off-by: Palmer Dabbelt --- include/linux/wait.h | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/include/linux/wait.h b/include/linux/wait.h index 2db83349865b..d69ac4ecc88b 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -969,7 +969,7 @@ extern int bit_wait_io_timeout(struct wait_bit_key *); * on that signal. */ static inline int -wait_on_bit(void *word, int bit, unsigned mode) +wait_on_bit(unsigned long *word, int bit, unsigned mode) { might_sleep(); if (!test_bit(bit, word)) @@ -994,7 +994,7 @@ wait_on_bit(void *word, int bit, unsigned mode) * on that signal. */ static inline int -wait_on_bit_io(void *word, int bit, unsigned mode) +wait_on_bit_io(unsigned long *word, int bit, unsigned mode) { might_sleep(); if (!test_bit(bit, word)) @@ -1020,7 +1020,8 @@ wait_on_bit_io(void *word, int bit, unsigned mode) * received a signal and the mode permitted wakeup on that signal. */ static inline int -wait_on_bit_timeout(void *word, int bit, unsigned mode, unsigned long timeout) +wait_on_bit_timeout(unsigned long *word, int bit, unsigned mode, + unsigned long timeout) { might_sleep(); if (!test_bit(bit, word)) @@ -1047,7 +1048,8 @@ wait_on_bit_timeout(void *word, int bit, unsigned mode, unsigned long timeout) * on that signal.
*/ static inline int -wait_on_bit_action(void *word, int bit, wait_bit_action_f *action, unsigned mode) +wait_on_bit_action(unsigned long *word, int bit, wait_bit_action_f *action, + unsigned mode) { might_sleep(); if (!test_bit(bit, word)) @@ -1075,7 +1077,7 @@ wait_on_bit_action(void *word, int bit, wait_bit_action_f *action, unsigned mode * the @mode allows that signal to wake the process. */ static inline int -wait_on_bit_lock(void *word, int bit, unsigned mode) +wait_on_bit_lock(unsigned long *word, int bit, unsigned mode) { might_sleep(); if (!test_and_set_bit(bit, word)) @@ -1099,7 +1101,7 @@ wait_on_bit_lock(void *word, int bit, unsigned mode) * the @mode allows that signal to wake the process. */ static inline int -wait_on_bit_lock_io(void *word, int bit, unsigned mode) +wait_on_bit_lock_io(unsigned long *word, int bit, unsigned mode) { might_sleep(); if (!test_and_set_bit(bit, word)) @@ -1125,7 +1127,8 @@ wait_on_bit_lock_io(void *word, int bit, unsigned mode) * the @mode allows that signal to wake the process. */ static inline int -wait_on_bit_lock_action(void *word, int bit, wait_bit_action_f *action, unsigned mode) +wait_on_bit_lock_action(unsigned long *word, int bit, wait_bit_action_f *action, + unsigned mode) { might_sleep(); if (!test_and_set_bit(bit, word)) -- 2.0.5
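The reason the word argument needs to be an unsigned long * shows up in how a generic C test_bit() is written: it indexes the bitmap as an array of unsigned long and loads a whole word. The sketch below is modeled on that shape; demo_test_bit() and demo_wait_on_bit_fastpath() are hypothetical names, and the sleeping path of wait_on_bit() is reduced to a return value. With the typed prototype, passing the address of an unsigned int becomes an incompatible-pointer-type diagnostic at compile time rather than a misaligned load at run time.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Modeled on a generic (non-atomic) C test_bit(): it loads a whole
 * unsigned long, so addr must be long-aligned on strict-alignment CPUs. */
static inline int demo_test_bit(unsigned long nr,
				const volatile unsigned long *addr)
{
	return 1UL & (addr[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG));
}

/* Hypothetical fast path in the spirit of the retyped wait_on_bit():
 * returns 0 when the bit is already clear, 1 where the real function
 * would go to sleep. Passing &some_unsigned_int here no longer
 * compiles silently, because word is typed unsigned long *. */
static int demo_wait_on_bit_fastpath(unsigned long *word, int bit)
{
	return demo_test_bit((unsigned long)bit, word) ? 1 : 0;
}
```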
[PATCH 0/3] Fix a misaligned load inside ptrace_attach()
I ran across what I believe is a bug in some asm-generic code while working on the RISC-V Linux port. Essentially the problem is that wait_on_bit() takes a void *, but then performs long-aligned operations. As far as I can tell, this bug could manifest on any other architecture that doesn't support misaligned operations and uses this particular asm-generic implementation. The patch set is split into three parts:

* #1 fixes the bug by making task_struct.jobctl an unsigned long, which ensures wait_on_bit() always ends up with a long-aligned argument.
* #2 changes the prototype of wait_on_bit() and friends to take an "unsigned long *" instead of a "void *", with the intent of ensuring these problems don't happen again.
* #3 is a bit more intrusive: it goes and changes all uses of task_struct.jobctl from int to long.

I'm not sure if #3 has gone too far, but I think #1 and #2 are sane. The cost is making task_struct larger on machines where sizeof(long) > sizeof(int), but since it's so big already this isn't too much cost. I thought about making test_bit() perform byte-aligned accesses to avoid this cost, but since there are very similar looking atomic functions I thought that would be too odd.
linux-next: build failure after merge of the akpm-current tree
Hi Andrew, After merging the akpm tree, today's linux-next build (sparc defconfig) failed like this: mm/bootmem.c: In function 'free_all_bootmem_core': mm/bootmem.c:237:32: error: 'cur' undeclared (first use in this function) __free_pages_bootmem(page++, cur++, 0); ^ Caused by commit "mm: page_alloc: pass PFN to __free_pages_bootmem". This only happens because CONFIG_NO_BOOTMEM is *not* set (it is set on powerpc, x86, arm and sparc64). Clearly it was never built for this config. :-( Reverting would be a real pain, so I added this (probably incorrect) patch to make it build: From: Stephen Rothwell Date: Fri, 1 May 2015 14:21:08 +1000 Subject: [PATCH] mm: page_alloc: pass PFN to __free_pages_bootmem fix Signed-off-by: Stephen Rothwell --- mm/bootmem.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/bootmem.c b/mm/bootmem.c index daf956bb4782..0a0eb62b1c92 100644 --- a/mm/bootmem.c +++ b/mm/bootmem.c @@ -173,6 +173,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata) { struct page *page; unsigned long *map, start, end, pages, count = 0; + unsigned long cur; if (!bdata->node_bootmem_map) return 0; @@ -214,7 +215,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata) count += BITS_PER_LONG; start += BITS_PER_LONG; } else { - unsigned long cur = start; + cur = start; start = ALIGN(start + 1, BITS_PER_LONG); while (vec && cur != start) { @@ -233,6 +234,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata) pages = bdata->node_low_pfn - bdata->node_min_pfn; pages = bootmem_bootmap_pages(pages); count += pages; + cur = bdata->node_min_pfn; while (pages--) __free_pages_bootmem(page++, cur++, 0); -- 2.1.4 -- Cheers, Stephen Rothwells...@canb.auug.org.au
Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
On 4/30/15 3:41 PM, Andy Lutomirski wrote: On Thu, Apr 30, 2015 at 7:49 AM, Aravind Gopalakrishnan wrote: Changes introduced in the patch- - Assign vector number 0xf4 for Deferred errors - Declare deferred_interrupt, allocate gate and bind it to DEFERRED_APIC_VECTOR. - Declare smp_deferred_interrupt to be used as the entry point for the interrupt in mce_amd.c - Define trace_deferred_interrupt for tracing - Enable deferred error interrupt selectively upon detection of 'succor' bitfield - Setup amd_deferred_error_interrupt() to handle the interrupt and assign it to def_int_vector if feature is present in HW. Else, let default handler deal with it. - Provide Deferred error interrupt stats on /proc/interrupts by incrementing irq_deferred_count You're calling these "deferred interrupts" all over (e.g. irq_deferred_count, deferred_int_handler, etc). That seems like it'll be confusing. They're deferred errors, not deferred interrupts. I used the term as it is an interrupt due to the deferred error. Would 'deferred_err_interrupt' be more apt? Maybe 'irq_deferred_error_count' for the counter? Thanks, -Aravind.
Re: [PATCH kernel v9 27/32] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table
On 04/29/2015 04:40 PM, David Gibson wrote: On Sat, Apr 25, 2015 at 10:14:51PM +1000, Alexey Kardashevskiy wrote: This adds a way for the IOMMU user to know how much a new table will use so it can be accounted in the locked_vm limit before allocation happens. This stores the allocated table size in pnv_pci_create_table() so the locked_vm counter can be updated correctly when a table is being disposed. This defines an iommu_table_group_ops callback to let VFIO know how much memory will be locked if a table is created. Signed-off-by: Alexey Kardashevskiy --- Changes: v9: * reimplemented the whole patch --- arch/powerpc/include/asm/iommu.h | 5 + arch/powerpc/platforms/powernv/pci-ioda.c | 14 arch/powerpc/platforms/powernv/pci.c | 36 +++ arch/powerpc/platforms/powernv/pci.h | 2 ++ 4 files changed, 57 insertions(+) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 1472de3..9844c106 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -99,6 +99,7 @@ struct iommu_table { unsigned long it_size; /* Size of iommu table in entries */ unsigned long it_indirect_levels; unsigned long it_level_size; + unsigned long it_allocated_size; unsigned long it_offset;/* Offset into global table */ unsigned long it_base; /* mapped address of tce table */ unsigned long it_index; /* which iommu table this is */ @@ -155,6 +156,10 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl, struct iommu_table_group; struct iommu_table_group_ops { + unsigned long (*get_table_size)( + __u32 page_shift, + __u64 window_size, + __u32 levels); long (*create_table)(struct iommu_table_group *table_group, int num, __u32 page_shift, diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index e0be556..7f548b4 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2062,6 +2062,18 @@ static void 
pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb, } #ifdef CONFIG_IOMMU_API +static unsigned long pnv_pci_ioda2_get_table_size(__u32 page_shift, + __u64 window_size, __u32 levels) +{ + unsigned long ret = pnv_get_table_size(page_shift, window_size, levels); + + if (!ret) + return ret; + + /* Add size of it_userspace */ + return ret + (window_size >> page_shift) * sizeof(unsigned long); This doesn't make much sense. The userspace view can't possibly be a property of the specific low-level IOMMU model. This it_userspace thing is all about memory preregistration. I need some way to track how many actual mappings the mm_iommu_table_group_mem_t has in order to decide whether to allow unregistering or not. When I clear TCE, I can read the old value which is host physical address which I cannot use to find the preregistered region and adjust the mappings counter; I can only use userspace addresses for this (not even guest physical addresses as it is VFIO and probably no KVM). So I have to keep userspace addresses somewhere, one per IOMMU page, and the iommu_table seems a natural place for this. 
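Rough numbers help when judging what this callback reports against locked_vm. The sketch below assumes a single-level table with one 8-byte TCE per IOMMU page (the multi-level logic of pnv_get_table_size() is cut off in this message and is not reproduced), then adds one unsigned long per entry for it_userspace, the same way pnv_pci_ioda2_get_table_size() does above. The demo_* names are hypothetical. For a 2GB window with 4K IOMMU pages this gives 512K entries, i.e. 4MB of TCEs plus 4MB of userspace view on a 64-bit kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: single-level table, one 64-bit TCE per IOMMU page. */
static unsigned long demo_tce_table_bytes(uint32_t page_shift,
					  uint64_t window_size)
{
	uint64_t entries = window_size >> page_shift;

	return (unsigned long)(entries * sizeof(uint64_t));
}

/* Same shape as pnv_pci_ioda2_get_table_size(): a zero base size is
 * passed through unchanged, otherwise one unsigned long per entry is
 * added for the it_userspace array. */
static unsigned long demo_ioda2_get_table_size(uint32_t page_shift,
					       uint64_t window_size)
{
	unsigned long ret = demo_tce_table_bytes(page_shift, window_size);

	if (!ret)
		return ret;

	return ret + (unsigned long)((window_size >> page_shift) *
				     sizeof(unsigned long));
}
```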
+} + static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group, int num, __u32 page_shift, __u64 window_size, __u32 levels, struct iommu_table *tbl) @@ -2086,6 +2098,7 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group, BUG_ON(tbl->it_userspace); tbl->it_userspace = uas; + tbl->it_allocated_size += uas_cb; tbl->it_ops = &pnv_ioda2_iommu_ops; if (pe->tce_inval_reg) tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE); @@ -2160,6 +2173,7 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group) } static struct iommu_table_group_ops pnv_pci_ioda2_ops = { + .get_table_size = pnv_pci_ioda2_get_table_size, .create_table = pnv_pci_ioda2_create_table, .set_window = pnv_pci_ioda2_set_window, .unset_window = pnv_pci_ioda2_unset_window, diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c index fc129c4..1b5b48a 100644 --- a/arch/powerpc/platforms/powernv/pci.c +++ b/arch/powerpc/platforms/powernv/pci.c @@ -662,6 +662,38 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl, tbl->it_type = TCE_PCI; } +unsigned long pnv_get_table_size(__u32 page_shift, + __u64 window_size, __u32 levels) +{ + unsigned long bytes = 0; + const unsigned window_shift = ilog2(window_size); + unsigned entries_shift = window_shift - page_shift; + unsigned table_shift =
Re: [PATCH kernel v9 26/32] powerpc/iommu: Add userspace view of TCE table
On 04/29/2015 04:31 PM, David Gibson wrote: On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote: In order to support memory pre-registration, we need a way to track the use of every registered memory region and only allow unregistration if a region is not in use anymore. So we need a way to tell which region the just-cleared TCE came from. This adds a userspace view of the TCE table into iommu_table struct. It contains one userspace address per TCE entry. The table is only allocated when the ownership over an IOMMU group is taken which means it is only used from outside of the powernv code (such as VFIO). Signed-off-by: Alexey Kardashevskiy --- Changes: v9: * fixed code flow in error cases added in v8 v8: * added ENOMEM on failed vzalloc() --- arch/powerpc/include/asm/iommu.h | 6 ++ arch/powerpc/kernel/iommu.c | 18 ++ arch/powerpc/platforms/powernv/pci-ioda.c | 22 -- 3 files changed, 44 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 7694546..1472de3 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -111,9 +111,15 @@ struct iommu_table { unsigned long *it_map; /* A simple allocation bitmap for now */ unsigned long it_page_shift; /* table iommu page size */ struct iommu_table_group *it_table_group; + unsigned long *it_userspace; /* userspace view of the table */ A single unsigned long doesn't seem like enough. Why single? This is an array. How do you know which process's address space this address refers to? It is the current task. Multiple userspaces cannot use the same container/tables. struct iommu_table_ops *it_ops; }; +#define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \ + ((tbl)->it_userspace ?
\ + &((tbl)->it_userspace[(entry) - (tbl)->it_offset]) : \ + NULL) + /* Pure 2^n version of get_order */ static inline __attribute_const__ int get_iommu_order(unsigned long size, struct iommu_table *tbl) diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index 2eaba0c..74a3f52 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -739,6 +740,8 @@ void iommu_reset_table(struct iommu_table *tbl, const char *node_name) free_pages((unsigned long) tbl->it_map, order); } + WARN_ON(tbl->it_userspace); + memset(tbl, 0, sizeof(*tbl)); } @@ -1016,6 +1019,7 @@ int iommu_take_ownership(struct iommu_table *tbl) { unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; int ret = 0; + unsigned long *uas; /* * VFIO does not control TCE entries allocation and the guest @@ -1027,6 +1031,10 @@ int iommu_take_ownership(struct iommu_table *tbl) if (!tbl->it_ops->exchange) return -EINVAL; + uas = vzalloc(sizeof(*uas) * tbl->it_size); + if (!uas) + return -ENOMEM; + spin_lock_irqsave(&tbl->large_pool.lock, flags); for (i = 0; i < tbl->nr_pools; i++) spin_lock(&tbl->pools[i].lock); @@ -1044,6 +1052,13 @@ int iommu_take_ownership(struct iommu_table *tbl) memset(tbl->it_map, 0xff, sz); } + if (ret) { + vfree(uas); + } else { + BUG_ON(tbl->it_userspace); + tbl->it_userspace = uas; + } + for (i = 0; i < tbl->nr_pools; i++) spin_unlock(&tbl->pools[i].lock); spin_unlock_irqrestore(&tbl->large_pool.lock, flags); @@ -1056,6 +1071,9 @@ void iommu_release_ownership(struct iommu_table *tbl) { unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; + vfree(tbl->it_userspace); + tbl->it_userspace = NULL; + spin_lock_irqsave(&tbl->large_pool.lock, flags); for (i = 0; i < tbl->nr_pools; i++) spin_lock(&tbl->pools[i].lock); diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 45bc131..e0be556 100644 --- 
a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -1827,6 +1828,14 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index, pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false); } +void pnv_pci_ioda2_free_table(struct iommu_table *tbl) +{ + vfree(tbl->it_userspace); + tbl->it_userspace = NULL; + + pnv_pci_free_table(tbl); +} + static struct iommu_table_ops pnv_ioda2_iommu_ops = { .set = pnv_ioda2_tce_build, #ifdef CONFIG_IOMMU_API @@ -1834,7 +1843,7 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = { #
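A minimal sketch of how IOMMU_TABLE_USERSPACE_ENTRY() is meant to be used, with a cut-down, hypothetical stand-in for struct iommu_table: the view is indexed by the global TCE index minus it_offset, and the macro yields NULL when no view has been allocated (that is, when the group is not owned by an external user such as VFIO). Because one arm of the conditional is a null pointer constant, the result keeps the unsigned long * type and can be dereferenced directly when the view exists. demo_record_ua() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct iommu_table. */
struct demo_iommu_table {
	unsigned long it_offset;      /* first TCE index covered by the table */
	unsigned long it_size;        /* number of TCEs */
	unsigned long *it_userspace;  /* one userspace address per TCE, or NULL */
};

/* Same shape as IOMMU_TABLE_USERSPACE_ENTRY() in the patch above. */
#define DEMO_TABLE_USERSPACE_ENTRY(tbl, entry) \
	((tbl)->it_userspace ? \
		&((tbl)->it_userspace[(entry) - (tbl)->it_offset]) : \
		NULL)

/* Record the userspace address backing one TCE. */
static int demo_record_ua(struct demo_iommu_table *tbl, unsigned long entry,
			  unsigned long ua)
{
	unsigned long *slot;

	if (entry - tbl->it_offset >= tbl->it_size)
		return -1;  /* entry outside this table (also catches entry < it_offset) */
	slot = DEMO_TABLE_USERSPACE_ENTRY(tbl, entry);
	if (!slot)
		return -1;  /* no userspace view allocated */
	*slot = ua;
	return 0;
}
```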
Re: [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache
On Thu, Apr 30, 2015 at 06:25:25PM +1000, Paul Mackerras wrote: > On Thu, Apr 30, 2015 at 04:34:55PM +1000, David Gibson wrote: > > On Sat, Apr 25, 2015 at 10:14:52PM +1000, Alexey Kardashevskiy wrote: > > > We are adding support for DMA memory pre-registration to be used in > > > conjunction with VFIO. The idea is that the userspace which is going to > > > run a guest may want to pre-register a user space memory region so > > > it all gets pinned once and never goes away. Having this done, > > > a hypervisor will not have to pin/unpin pages on every DMA map/unmap > > > request. This is going to help with multiple pinning of the same memory > > > and in-kernel acceleration of DMA requests. > > > > > > This adds a list of memory regions to mm_context_t. Each region consists > > > of a header and a list of physical addresses. This adds API to: > > > 1. register/unregister memory regions; > > > 2. do final cleanup (which puts all pre-registered pages); > > > 3. do userspace to physical address translation; > > > 4. manage a mapped pages counter; when it is zero, it is safe to > > > unregister the region. > > > > > > Multiple registration of the same region is allowed, kref is used to > > > track the number of registrations. > > > > [snip] > > > +long mm_iommu_alloc(unsigned long ua, unsigned long entries, > > > + struct mm_iommu_table_group_mem_t **pmem) > > > +{ > > > + struct mm_iommu_table_group_mem_t *mem; > > > + long i, j; > > > + struct page *page = NULL; > > > + > > > + list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list, > > > + next) { > > > + if ((mem->ua == ua) && (mem->entries == entries)) > > > + return -EBUSY; > > > + > > > + /* Overlap?
*/ > > > + if ((mem->ua < (ua + (entries << PAGE_SHIFT))) && > > > + (ua < (mem->ua + (mem->entries << PAGE_SHIFT)))) > > > + return -EINVAL; > > > + } > > > + > > > + mem = kzalloc(sizeof(*mem), GFP_KERNEL); > > > + if (!mem) > > > + return -ENOMEM; > > > + > > > + mem->hpas = vzalloc(entries * sizeof(mem->hpas[0])); > > > + if (!mem->hpas) { > > > + kfree(mem); > > > + return -ENOMEM; > > > + } > > > > So, I've thought more about this and I'm really confused as to what > > this is supposed to be accomplishing. > > > > I see that you need to keep track of what regions are registered, so > > you don't double lock or unlock, but I don't see what the point of > > actually storing the translations in hpas is. > > > > I had assumed it was so that you could later on get to the > > translations in real mode when you do in-kernel acceleration. But > > that doesn't make sense, because the array is vmalloc()ed, so can't be > > accessed in real mode anyway. > > We can access vmalloc'd arrays in real mode using real_vmalloc_addr(). Ah, ok. > > I can't think of a circumstance in which you can use hpas where you > > couldn't just walk the page tables anyway. > > The problem with walking the page tables is that there is no guarantee > that the page you find that way is the page that was returned by the > gup_fast() we did earlier. Storing the hpas means that we know for > sure that the page we're doing DMA to is one that we have an elevated > page count on. > > Also, there are various points where a Linux PTE is made temporarily > invalid for a short time. If we happened to do a H_PUT_TCE on one cpu > while another cpu was doing that, we'd get a spurious failure returned > by the H_PUT_TCE. I think we want this explanation in the commit message. And/or in a comment somewhere, I'm not sure. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_!
http://www.ozlabs.org/~dgibson
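The duplicate and overlap tests in mm_iommu_alloc() quoted above treat each registration as the half-open byte range [ua, ua + (entries << PAGE_SHIFT)). A minimal userspace sketch of the same two checks, assuming 4 KiB pages (the kernel's PAGE_SHIFT depends on the configuration) and a hypothetical `struct region` standing in for mm_iommu_table_group_mem_t:

```c
#include <errno.h>

/* Assumption for this sketch only: 4 KiB pages. The kernel's PAGE_SHIFT
 * depends on the configuration (powerpc commonly uses 64 KiB pages). */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for struct mm_iommu_table_group_mem_t. */
struct region {
	unsigned long ua;      /* userspace start address */
	unsigned long entries; /* number of pages */
};

/*
 * Mirrors the quoted checks: an identical registration returns -EBUSY,
 * any partial overlap returns -EINVAL, and a disjoint range returns 0.
 */
static long check_new_region(const struct region *regs, int n,
			     unsigned long ua, unsigned long entries)
{
	unsigned long end = ua + (entries << SKETCH_PAGE_SHIFT);

	for (int i = 0; i < n; i++) {
		unsigned long mem_end = regs[i].ua +
			(regs[i].entries << SKETCH_PAGE_SHIFT);

		if (regs[i].ua == ua && regs[i].entries == entries)
			return -EBUSY;
		/* Half-open ranges [a, b) and [c, d) overlap iff a < d && c < b. */
		if (regs[i].ua < end && ua < mem_end)
			return -EINVAL;
	}
	return 0;
}
```

Because the ranges are half-open, a region that starts exactly where an existing one ends (for example 0x14000 after [0x10000, 0x14000)) is accepted.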
Re: [PATCH kernel v9 23/32] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks
On Thu, Apr 30, 2015 at 07:56:17PM +1000, Alexey Kardashevskiy wrote: > On 04/30/2015 02:37 PM, David Gibson wrote: > >On Wed, Apr 29, 2015 at 07:44:20PM +1000, Alexey Kardashevskiy wrote: > >>On 04/29/2015 03:30 PM, David Gibson wrote: > >>>On Sat, Apr 25, 2015 at 10:14:47PM +1000, Alexey Kardashevskiy wrote: > This extends iommu_table_group_ops by a set of callbacks to support > dynamic DMA windows management. > > create_table() creates a TCE table with specific parameters. > it receives iommu_table_group to know nodeid in order to allocate > TCE table memory closer to the PHB. The exact format of allocated > multi-level table might be also specific to the PHB model (not > the case now though). > This callback calculated the DMA window offset on a PCI bus from @num > and stores it in a just created table. > > set_window() sets the window at specified TVT index + @num on PHB. > > unset_window() unsets the window from specified TVT. > > This adds a free() callback to iommu_table_ops to free the memory > (potentially a tree of tables) allocated for the TCE table. > >>> > >>>Doesn't the free callback belong with the previous patch introducing > >>>multi-level tables? > >> > >> > >> > >>If I did that, you would say "why is it here if nothing calls it" on > >>"multilevel" patch and "I see the allocation but I do not see memory > >>release" ;) > > > >Yeah, fair enough ;) > > > >>I need some rule of thumb here. I think it is a bit cleaner if the same > >>patch adds a callback for memory allocation and its counterpart, no? > > > >On further consideration, yes, I think you're right. > > > create_table() and free() are supposed to be called once per > VFIO container and set_window()/unset_window() are supposed to be > called for every group in a container. > > This adds IOMMU capabilities to iommu_table_group such as default > 32bit window parameters and others. 
> > Signed-off-by: Alexey Kardashevskiy > --- > arch/powerpc/include/asm/iommu.h| 19 > arch/powerpc/platforms/powernv/pci-ioda.c | 75 > ++--- > arch/powerpc/platforms/powernv/pci-p5ioc2.c | 12 +++-- > 3 files changed, 96 insertions(+), 10 deletions(-) > > diff --git a/arch/powerpc/include/asm/iommu.h > b/arch/powerpc/include/asm/iommu.h > index 0f50ee2..7694546 100644 > --- a/arch/powerpc/include/asm/iommu.h > +++ b/arch/powerpc/include/asm/iommu.h > @@ -70,6 +70,7 @@ struct iommu_table_ops { > /* get() returns a physical address */ > unsigned long (*get)(struct iommu_table *tbl, long index); > void (*flush)(struct iommu_table *tbl); > + void (*free)(struct iommu_table *tbl); > }; > > /* These are used by VIO */ > @@ -148,6 +149,17 @@ extern struct iommu_table *iommu_init_table(struct > iommu_table * tbl, > struct iommu_table_group; > > struct iommu_table_group_ops { > + long (*create_table)(struct iommu_table_group *table_group, > + int num, > + __u32 page_shift, > + __u64 window_size, > + __u32 levels, > + struct iommu_table *tbl); > + long (*set_window)(struct iommu_table_group *table_group, > + int num, > + struct iommu_table *tblnew); > + long (*unset_window)(struct iommu_table_group *table_group, > + int num); > /* > * Switches ownership from the kernel itself to an external > * user. While onwership is taken, the kernel cannot use IOMMU > itself. > @@ -160,6 +172,13 @@ struct iommu_table_group { > #ifdef CONFIG_IOMMU_API > struct iommu_group *group; > #endif > + /* Some key properties of IOMMU */ > + __u32 tce32_start; > + __u32 tce32_size; > + __u64 pgsizes; /* Bitmap of supported page sizes */ > + __u32 max_dynamic_windows_supported; > + __u32 max_levels; > >>> > >>>With this information, table_group seems even more like a bad name. > >>>"iommu_state" maybe? > >> > >> > >>Please, no. We will never come to agreement then :( And "iommu_state" is too > >>general anyway, it won't pass. 
> >> > >> > struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES]; > struct iommu_table_group_ops *ops; > }; > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c > b/arch/powerpc/platforms/powernv/pci-ioda.c > index cc1d09c..4828837 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -24,6 +24,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -1846,6 +1847,7 @@ static struct iommu_table_ops pnv_ioda2
Re: [PATCH] i2c: rk3x: Increase wait timeout to 1 second
On 2015/05/01 05:44, Doug Anderson wrote: While it's not sensible for an i2c command to _actually_ need more than 200ms to complete, let's increase the timeout anyway. Why? It turns out that if you've got a large number of printks going out to a serial console, interrupts on a CPU can be disabled for hundreds of milliseconds. That's not a great situation to be in to start with (maybe we should put a cap in vprintk_emit()) but it's pretty annoying to start seeing unexplained i2c timeouts. A normal system shouldn't see i2c timeouts anyway, so increasing the timeout should help people debugging without hurting other people excessively. Signed-off-by: Doug Anderson --- drivers/i2c/busses/i2c-rk3x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-rk3x.c b/drivers/i2c/busses/i2c-rk3x.c index 019d542..72e97e30 100644 --- a/drivers/i2c/busses/i2c-rk3x.c +++ b/drivers/i2c/busses/i2c-rk3x.c @@ -72,7 +72,7 @@ enum { #define REG_INT_ALL 0x7f /* Constants */ -#define WAIT_TIMEOUT 200 /* ms */ +#define WAIT_TIMEOUT 1000 /* ms */ Yeah, verified on veyron device. Tested-by: Caesar Wang Thanks. Caesar #define DEFAULT_SCL_RATE (100 * 1000) /* Hz */ enum rk3x_i2c_state { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v6] block: loop: avoiding too many pending per work I/O
If there are too many pending per-work I/Os, too many high-priority worker threads can be created, which hurts system performance. This patch limits the number of pending per-work I/Os to 16 and falls back to single-queue mode when that limit is reached. This patch fixes a Fedora 22 live-boot performance regression when booting from squashfs over dm based on loop; the following factors appear to be related to the problem: - unlike other filesystems (such as ext4), squashfs is a bit special: I observed that increasing the number of I/O jobs accessing a file in squashfs improves I/O performance only a little, whereas it can make a big difference for ext4 - nested loop: both squashfs.img and ext3fs.img are mounted as loop block devices, and ext3fs.img is inside the squashfs - during boot, lots of tasks may run concurrently Fixes: b5dd2f6047ca108001328aac0e8588edd15f1778 Cc: sta...@vger.kernel.org (v4.0) Reported-by: Justin M. Forbes Tested-by: Justin M. Forbes Signed-off-by: Ming Lei --- drivers/block/loop.c | 19 +-- drivers/block/loop.h | 2 ++ 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index ae3fcb4..5a728c6 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1425,13 +1425,24 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd) { struct loop_cmd *cmd = blk_mq_rq_to_pdu(bd->rq); + struct loop_device *lo = cmd->rq->q->queuedata; + bool single_queue = !!(cmd->rq->cmd_flags & REQ_WRITE); + + /* +* Fallback to single queue mode if the pending per work +* I/O number reaches 16, otherwise too many high priority +* worker thread may effect system performance as reported +* in fedora live booting from squashfs over loop.
+*/ + if (atomic_read(&lo->pending_per_work_io) >= 16) + single_queue = true; blk_mq_start_request(bd->rq); - if (cmd->rq->cmd_flags & REQ_WRITE) { - struct loop_device *lo = cmd->rq->q->queuedata; + if (single_queue) { bool need_sched = true; + cmd->per_work_io = false; spin_lock_irq(&lo->lo_lock); if (lo->write_started) need_sched = false; @@ -1443,6 +1454,8 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx, if (need_sched) queue_work(loop_wq, &lo->write_work); } else { + atomic_inc(&lo->pending_per_work_io); + cmd->per_work_io = true; queue_work(loop_wq, &cmd->read_work); } @@ -1467,6 +1480,8 @@ static void loop_handle_cmd(struct loop_cmd *cmd) if (ret) cmd->rq->errors = -EIO; blk_mq_complete_request(cmd->rq); + if (cmd->per_work_io) + atomic_dec(&lo->pending_per_work_io); } static void loop_queue_write_work(struct work_struct *work) diff --git a/drivers/block/loop.h b/drivers/block/loop.h index 301c27f..eb855f5 100644 --- a/drivers/block/loop.h +++ b/drivers/block/loop.h @@ -57,6 +57,7 @@ struct loop_device { struct list_head write_cmd_head; struct work_struct write_work; bool write_started; + atomic_t pending_per_work_io; int lo_state; struct mutex lo_ctl_mutex; @@ -68,6 +69,7 @@ struct loop_device { struct loop_cmd { struct work_struct read_work; struct request *rq; + bool per_work_io; struct list_head list; }; -- 1.9.1
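The throttling rule in this patch can be modelled in userspace with C11 atomics. This is a simplified sketch, not the driver code: `struct loop_model`, `choose_queue()` and `complete_cmd()` are hypothetical names; the limit of 16 and the rule that writes always take the single queue come from the patch. As in the patch, reading the counter and later incrementing it are two separate atomic operations, so concurrent submitters can briefly push the count slightly past 16; the limit behaves as a soft cap.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_PENDING_PER_WORK_IO 16 /* limit used by the patch */

/* Hypothetical userspace model of the relevant loop_device state. */
struct loop_model {
	atomic_int pending_per_work_io;
};

/*
 * Returns true if the command goes to the single (serialized) queue and
 * false if it gets its own work item. *per_work_io records which path was
 * taken so completion knows whether to drop the counter.
 */
static bool choose_queue(struct loop_model *lo, bool is_write, bool *per_work_io)
{
	bool single_queue = is_write;

	/* Check-then-increment: not atomic as a whole, so the cap is soft. */
	if (atomic_load(&lo->pending_per_work_io) >= MAX_PENDING_PER_WORK_IO)
		single_queue = true;

	if (single_queue) {
		*per_work_io = false;
	} else {
		atomic_fetch_add(&lo->pending_per_work_io, 1);
		*per_work_io = true;
	}
	return single_queue;
}

/* On completion, only per-work commands decrement the counter. */
static void complete_cmd(struct loop_model *lo, bool per_work_io)
{
	if (per_work_io)
		atomic_fetch_sub(&lo->pending_per_work_io, 1);
}
```

Recording the chosen path per command (as the patch does with `cmd->per_work_io`) is what keeps the counter balanced: a command that fell back to the single queue never incremented it, so it must not decrement it either.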
[PATCH] fs: ext3: super: fixed a space coding style issue
Fixed a coding style issue Signed-off-by: Adir Kuhn --- fs/ext3/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext3/super.c b/fs/ext3/super.c index a9312f0..5ed0044 100644 --- a/fs/ext3/super.c +++ b/fs/ext3/super.c @@ -1908,7 +1908,7 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent) sbi->s_mount_state = le16_to_cpu(es->s_state); sbi->s_addr_per_block_bits = ilog2(EXT3_ADDR_PER_BLOCK(sb)); sbi->s_desc_per_block_bits = ilog2(EXT3_DESC_PER_BLOCK(sb)); - for (i=0; i < 4; i++) + for (i = 0; i < 4; i++) sbi->s_hash_seed[i] = le32_to_cpu(es->s_hash_seed[i]); sbi->s_def_hash_version = es->s_def_hash_version; i = le32_to_cpu(es->s_flags); -- 2.1.4
Re: [PATCH 01/11] drivers/crypto: include for modular caam code
On Thu, Apr 30, 2015 at 09:47:37PM -0400, Paul Gortmaker wrote: > This file is built off of a tristate Kconfig option and also contains > modular function calls so it should explicitly include module.h to > avoid compile breakage during header shuffles done in the future. > > Cc: Herbert Xu > Cc: "David S. Miller" > Signed-off-by: Paul Gortmaker Please post patches to linux-cry...@vger.kernel.org if you want them to go through the crypto tree. Also it actually gets module.h through caam/compat.h. So your patch is unnecessary. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 1/2] crypto: Constify (de)compression parameters
On Tue, Apr 28, 2015 at 03:36:30PM +0100, David Howells wrote: > In testmgr, struct pcomp_testvec takes a non-const 'params' field, which is > pointed to a const deflate_comp_params or deflate_decomp_params object. With > gcc-5 this incurs the following warnings: > > In file included from ../crypto/testmgr.c:44:0: > ../crypto/testmgr.h:28736:13: warning: initialization discards 'const' > qualifier from pointer target type [-Wdiscarded-array-qualifiers] >.params = &deflate_comp_params, > ^ > ../crypto/testmgr.h:28748:13: warning: initialization discards 'const' > qualifier from pointer target type [-Wdiscarded-array-qualifiers] >.params = &deflate_comp_params, > ^ > ../crypto/testmgr.h:28776:13: warning: initialization discards 'const' > qualifier from pointer target type [-Wdiscarded-array-qualifiers] >.params = &deflate_decomp_params, > ^ > ../crypto/testmgr.h:28800:13: warning: initialization discards 'const' > qualifier from pointer target type [-Wdiscarded-array-qualifiers] >.params = &deflate_decomp_params, > ^ > > Fix this by making the parameters pointer const and constifying the things > that use it. > > Signed-off-by: David Howells Both patches applied. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
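The warning and the fix described here come down to pointer qualifiers: storing the address of a `const` object in a plain `void *` field discards the qualifier, while declaring the field as `const void *` does not, and every consumer must then accept a const pointer too. A minimal sketch with hypothetical, simplified types (a plain struct rather than the objects actually defined in testmgr.h; the names are not the real testmgr ones):

```c
#include <stddef.h>

/* Hypothetical parameter object, defined const as the deflate params are. */
struct comp_params {
	int level;
	int window_bits;
};

static const struct comp_params example_comp_params = { 9, 15 };

/*
 * With 'void *params' the initializer below would discard 'const' and
 * warn; making the field 'const void *' removes the warning.
 */
struct comp_testvec {
	const void *params;
	size_t paramsize;
};

static const struct comp_testvec example_vec = {
	.params = &example_comp_params,
	.paramsize = sizeof(example_comp_params),
};

/* Constifying the users: consumers take a const pointer as well. */
static int params_level(const void *params)
{
	const struct comp_params *p = params;

	return p->level;
}
```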
Re: [PATCH v3 4/6] crypto: drbg - add async seeding operation
On Tue, Apr 28, 2015 at 05:00:03AM +0200, Stephan Mueller wrote: > > @@ -1081,6 +1115,11 @@ static int drbg_seed(struct drbg_state *drbg, struct > drbg_string *pers, > return -EINVAL; > } > > + /* cancel any previously invoked seeding */ > + mutex_unlock(&drbg->drbg_mutex); > + drbg_async_work_cancel(&drbg->seed_work); > + mutex_lock(&drbg->drbg_mutex); This seems dangerous and unnecessary. Releasing and reacquiring the locks may invalidate previous checks. Even if it doesn't matter today if somebody modifies the callers later on this could explode. You can easily remove this by making get_blocking_random_bytes_cb idempotent, i.e., do nothing if the work is already queued, which is what it would do anyway if you simply move the INIT_WORK out of it. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
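Herbert's suggestion relies on queue_work() returning false, and doing nothing, when the work item is already pending, provided INIT_WORK() is done once at setup rather than on every call. A userspace model of that idempotent pattern, with hypothetical names and a C11 atomic flag standing in for the workqueue's pending bit (this is not the drbg or workqueue code):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a work item: set up once, queued many times. */
struct work_model {
	atomic_bool pending;     /* stands in for the workqueue's pending bit */
	atomic_int times_queued; /* how often the work was actually scheduled */
};

/* One-time setup, analogous to calling INIT_WORK() outside the callback. */
static void work_model_init(struct work_model *w)
{
	atomic_init(&w->pending, false);
	atomic_init(&w->times_queued, 0);
}

/*
 * Idempotent queueing: like queue_work(), report false and do nothing if
 * the work is already pending, so callers never need to drop a lock to
 * cancel and re-queue it.
 */
static bool work_model_queue(struct work_model *w)
{
	if (atomic_exchange(&w->pending, true))
		return false;
	atomic_fetch_add(&w->times_queued, 1);
	return true;
}

/* The worker clears the pending state when it starts running. */
static void work_model_run(struct work_model *w)
{
	atomic_store(&w->pending, false);
}
```

Re-initializing the item on every call would reset the pending state underneath an already-queued instance, which is why moving the INIT_WORK out of the callback is what makes repeated calls safe.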
Re: [RFD 5/5] tracing: Add trace_irqsoff tracepoints
On Thu, 30 Apr 2015 21:14:52 -0500 Tom Zanussi wrote: > > 'hist:key=latency.bucket:val=hitcount:sort=latency if cpu==0' > > > > but I haven't got this working. I didn't spend much time figuring out > > why this doesn't work. Even if the above is working you still > > I think it doesn't work because the tracepoint doesn't actually have a > 'cpu' field to use in the filter... Perhaps we should add special fields that don't use the tracepoint field, but can use generically known fields that are always known when the tracepoint is triggered. COMM could be one, as well as CPU. -- Steve
Re: [PATCH 2/2] sched/rt: Optimizate task_woken_rt()
On Fri, 1 May 2015 10:02:47 +0800 pang.xun...@zte.com.cn wrote: > > > > > > - Remove "!test_tsk_need_resched(rq->curr)" condition, because > > > the flag might be set right before the waking up, but we still > > > need to push equal or lower priority tasks, it should be removed. > > > Without this condition, we actually get the right logic. > > > > But doesn't that happen when we schedule? > > It does, but will have some latency. What latency? The need_resched flag is set, that means as soon as this CPU is in a preemptable situation, it will schedule. The most common case would be return from interrupt, as interrupts or softirqs are usually what trigger the wakeups. But if we do it here, the need resched flag could be set because another higher priority task is about to preempt the current one that is higher than what just woke up. So we move it now to another CPU, and then on return of the interrupt we schedule. Then the current running task gets preempted by the higher priority task and it too can migrate to the CPU we just pushed the other one to, and it still doesn't run, but yet it got moved for no reason at all. I feel better if the need resched flag is set, we wait till a schedule happens to see where everything is about to be moved. > > Still "rq->curr->prio <= p->prio" will be enough for us to ensure the > proper > push_rt_tasks() without this condition. I have no idea what the above means. > > Beside that, for "rq->curr->nr_cpus_allowed < 2", I noticed it was > introduced > by commit b3bc211cfe7d5fe94b, but with "!test_tsk_need_resched(rq->curr)", > it actaully can't be satisfied. What can't be satisfied? > > So, I think this condition should be removed. I'm still not convinced. -- Steve
Good day
Hello, get a loan now with loan engine®, at an interest rate of 3%. Fill out the form below if you are interested: Gender: Country: Spending: Duration: Purpose: There are many reasons why a loan can help. Regards, Kennel Turid
Re: [PATCH 09/11] drivers/scsi: include for modular ufshcd-pltfrm code
On Thu, 2015-04-30 at 21:47 -0400, Paul Gortmaker wrote: > This file is built off of a tristate Kconfig option and also contains > modular function calls so it should explicitly include module.h to > avoid compile breakage during header shuffles done in the future. I don't understand your logic. The ufs code made a design choice to consolidate most headers for the hcd code in a local include (ufshcd.h), which includes module.h, so why would they explicitly need it here as well? And if we follow your logic, why wouldn't they also need to duplicate everything else (like the scsi includes)? James
Re: [RFD 5/5] tracing: Add trace_irqsoff tracepoints
On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote: > Finally we place a few tracepoint at the end of critical section. With > the hist trigger in place we can generate the plots. > > There are a few drawbacks compared to the latency_hist.patch [1] > > The latency plots contain the values from all CPUs. In theory you > can also filter with something like > > 'hist:key=latency.bucket:val=hitcount:sort=latency if cpu==0' > > but I haven't got this working. I didn't spend much time figuring out > why this doesn't work. Even if the above is working you still I think it doesn't work because the tracepoint doesn't actually have a 'cpu' field to use in the filter... Tom
Re: [RFD 3/5] tracing: Add option to quantize key values
On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote: > Let's group some values together. This avoids a too detailed > histogram. Some sort of logarythmic scale could be useful > for latency plots. > > Now we can write something like: > > 'hist:key=latency.bucket:val=hitcount:sort=latency' > > latency: 0 hitcount: 166440 > latency:256 hitcount: 21104 > latency:512 hitcount: 7754 > latency:768 hitcount: 3269 > latency: 1024 hitcount: 1647 > latency: 1280 hitcount:841 > latency: 1536 hitcount:524 > latency: 1792 hitcount:371 > latency: 2048 hitcount:302 > latency: 2304 hitcount:240 > latency: 2560 hitcount:207 > latency: 2816 hitcount:149 > latency: 3072 hitcount:123 > latency: 3328 hitcount:119 > latency: 3584 hitcount:102 > latency: 3840 hitcount: 94 > latency: 4096 hitcount: 89 > latency: 4352 hitcount: 79 > latency: 4608 hitcount: 88 > Nice addition! > One thing I struggled with the grammatic above is that I haven't found > a nice way to pass in arguments, for example the bucket size. There a lot > of options to do it. Just a couple random ideas, not necessarly consistent > or clever: > > 'hist:key=latency.bucket[10,1.5]:val=hitcount:sort=latency' >where [x,y]: x first bucket size, y scaling factor > I like this notation - it's consistent with the other uses of the dot notation in that it's modifying the way things are displayed, in this case displaying latency as a histogram with specific [non-default] parameters. Tom > 'hist:key=latency:val=hitcount:sort=latency:bucket=latency,10,1.5' > > Not for inclusion! 
> > Not-Signed-off-by: Daniel Wagner > --- > kernel/trace/trace_events_hist.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/kernel/trace/trace_events_hist.c > b/kernel/trace/trace_events_hist.c > index fe06707..cac94a6 100644 > --- a/kernel/trace/trace_events_hist.c > +++ b/kernel/trace/trace_events_hist.c > @@ -84,6 +84,7 @@ enum hist_field_flags { > HIST_FIELD_STRING = 8, > HIST_FIELD_EXECNAME = 16, > HIST_FIELD_SYSCALL = 32, > + HIST_FIELD_BUCKET = 64, > }; > > struct hist_trigger_sort_key { > @@ -400,6 +401,8 @@ static int create_key_field(struct hist_trigger_data > *hist_data, > flags |= HIST_FIELD_EXECNAME; > else if (!strcmp(field_str, "syscall")) > flags |= HIST_FIELD_SYSCALL; > + else if (!strcmp(field_str, "bucket")) > + flags |= HIST_FIELD_BUCKET; > } > > field = trace_find_event_field(file->event_call, field_name); > @@ -900,6 +903,9 @@ static void event_hist_trigger(struct event_trigger_data > *data, void *rec) > key = entries; > } else { > field_contents = hist_data->key->fn(hist_data->key, rec); > + if (hist_data->key->flags & HIST_FIELD_BUCKET) > + field_contents &= ~0xff; > + > if (hist_data->key->flags & HIST_FIELD_STRING) > key = (void *)field_contents; > else > @@ -1343,6 +1349,8 @@ static const char *get_hist_field_flags(struct > hist_field *hist_field) > flags_str = "hex"; > else if (hist_field->flags & HIST_FIELD_SYSCALL) > flags_str = "syscall"; > + else if (hist_field->flags & HIST_FIELD_BUCKET) > + flags_str = "bucket"; > else if (hist_field->flags & HIST_FIELD_EXECNAME) > flags_str = "execname"; >
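The quantization in this RFD clears the low 8 bits of the key (`field_contents &= ~0xff`), so every bucket is 256 units wide, which matches the 0, 256, 512, ... keys in the sample output. The proposed `latency.bucket[x,y]` syntax (first bucket size x, scaling factor y) would instead give geometrically growing buckets. A sketch of both: `bucket_linear()` mirrors the patch, while `bucket_log()` is only a hypothetical reading of the [x,y] proposal, not anything the patch implements.

```c
#include <stdint.h>

/* What the RFD patch does: clear the low 8 bits, giving 256-wide buckets. */
static uint64_t bucket_linear(uint64_t value)
{
	return value & ~(uint64_t)0xff;
}

/*
 * Hypothetical reading of 'bucket[first,scale]': bucket 0 covers
 * [0, first) and each following bucket is 'scale' times wider than the
 * previous one (widths truncated to integers). Returns the lower bound of
 * the bucket containing 'value'. Assumes first >= 1, scale >= 1.0 and
 * values well below UINT64_MAX; there is no overflow handling.
 */
static uint64_t bucket_log(uint64_t value, uint64_t first, double scale)
{
	uint64_t lower = 0;
	double width = (double)first;

	while (value >= lower + (uint64_t)width) {
		lower += (uint64_t)width;
		width *= scale;
	}
	return lower;
}
```

With first = 10 and scale = 1.5 the bucket lower bounds run 0, 10, 25, 47, 80, ..., so short latencies keep fine resolution while the long tail is grouped coarsely; first = 256 with scale = 1.0 reproduces the patch's linear buckets.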
Re: [RFD 2/5] tracing: Add support to sort on the key
On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote: > The hist patch allows sorting on values only. By allowing to > sort also on the key we can do something like this: > > 'hist:key=latency:val=hitcount:sort=latency' > > latency: 16 hitcount: 3 > latency: 17 hitcount:171 > latency: 18 hitcount:626 > latency: 19 hitcount:594 > latency: 20 hitcount:306 > latency: 21 hitcount:214 > latency: 22 hitcount:232 > latency: 23 hitcount:283 > latency: 24 hitcount:235 > latency: 25 hitcount:105 > latency: 26 hitcount: 54 > latency: 27 hitcount: 79 > latency: 28 hitcount:214 > latency: 29 hitcount:895 > latency: 30 hitcount: 1400 > latency: 31 hitcount:774 > latency: 32 hitcount:653 > [...] > > The obvious choice for the bool was already taken. I haven't found a > good name for the the flag. I guess it would make sense to refactor the > sorting code so that it doesn't really matter what kind of field it > is. > I think you're right - the current flag is kind of an internal thing to the implementation, and uses a name that's too generic. Of course, you should be able to sort on keys as well as values, and the code shouldn't care too much about which is specified. The original code was more capable wrt sorting and I probably simplified it a bit too much - I'll refactor things taking all that into account for the next version. > BTW, I wonder if it would possible to drop the need to always provide > the 'val' argument and just assume the 'val=hitcount' in this case. > That also makes a lot of sense - I'll make that change too. Tom > Not for inclusion! 
> > Not-Signed-off-by: Daniel Wagner > --- > kernel/trace/trace_events_hist.c | 35 +++ > 1 file changed, 31 insertions(+), 4 deletions(-) > > diff --git a/kernel/trace/trace_events_hist.c > b/kernel/trace/trace_events_hist.c > index 9a7a675..fe06707 100644 > --- a/kernel/trace/trace_events_hist.c > +++ b/kernel/trace/trace_events_hist.c > @@ -89,6 +89,7 @@ enum hist_field_flags { > struct hist_trigger_sort_key { > bool use_hitcount; > bool use_key; > + bool use_real_key; > bool descending; > unsigned int idx; > }; > @@ -260,7 +261,7 @@ static void destroy_hist_fields(struct hist_trigger_data > *hist_data) > } > } > > -static inline struct hist_trigger_sort_key *create_default_sort_key(void) > +static inline struct hist_trigger_sort_key *create_hitcount_sort_key(void) > { > struct hist_trigger_sort_key *sort_key; > > @@ -273,6 +274,19 @@ static inline struct hist_trigger_sort_key > *create_default_sort_key(void) > return sort_key; > } > > +static inline struct hist_trigger_sort_key *create_real_key_sort_key(void) > +{ > + struct hist_trigger_sort_key *sort_key; > + > + sort_key = kzalloc(sizeof(*sort_key), GFP_KERNEL); > + if (!sort_key) > + return ERR_PTR(-ENOMEM); > + > + sort_key->use_real_key = true; > + > + return sort_key; > +} > + > static inline struct hist_trigger_sort_key * > create_sort_key(char *field_name, struct hist_trigger_data *hist_data) > { > @@ -280,7 +294,10 @@ create_sort_key(char *field_name, struct > hist_trigger_data *hist_data) > unsigned int i; > > if (!strcmp(field_name, "hitcount")) > - return create_default_sort_key(); > + return create_hitcount_sort_key(); > + > + if (!strcmp(field_name, hist_data->key->field->name)) > + return create_real_key_sort_key(); > > for (i = 0; i < hist_data->n_vals; i++) > if (!strcmp(field_name, hist_data->vals[i]->field->name)) > @@ -306,7 +323,7 @@ static int create_sort_keys(struct hist_trigger_data > *hist_data) > int ret = 0; > > if (!fields_str) { > - sort_key = create_default_sort_key(); > + sort_key =
create_hitcount_sort_key(); > if (IS_ERR(sort_key)) { > ret = PTR_ERR(sort_key); > goto out; > @@ -984,6 +1001,12 @@ static int cmp_entries(const struct > hist_trigger_sort_entry **a, > hist_data = entry_a->hist_data; > sort_key = hist_data->sort_key_cur; > > + if (sort_key->use_real_key) { > + val_a = *(u64 *)entry_a->key; > + val_b = *(u64 *)entry_b->key; > + goto out; > + } > + > if (sort_key->use_key) { > if (memcmp((*a)->key, (*b)->key, hist_data->map->key_size)) > ret = 1; > @@ -998,6 +1021,7 @@ static int cmp_entries(const struct > hist_trigger_sort_entry **a, > val_b = atomic64_read(&entry_b->sums[sort_key->idx]); > } > > +out: > if (val_a > val_b)
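Sorting on the key, as the `use_real_key` branch of cmp_entries() does, amounts to comparing the u64 keys directly instead of a summed value (the patch reads them with `*(u64 *)entry->key`, i.e. it assumes an 8-byte key). A self-contained qsort() sketch, with a hypothetical `struct hist_entry_model` in place of the real sort entry; it returns -1/0/1 rather than subtracting, because the difference of two u64 values does not fit in the int that qsort() expects:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified histogram entry: a u64 key and its hit count. */
struct hist_entry_model {
	uint64_t key;
	uint64_t hitcount;
};

/* Ascending order on the key itself. Returns -1, 0 or 1. */
static int cmp_by_key(const void *a, const void *b)
{
	uint64_t ka = ((const struct hist_entry_model *)a)->key;
	uint64_t kb = ((const struct hist_entry_model *)b)->key;

	return (ka > kb) - (ka < kb);
}

static void sort_by_key(struct hist_entry_model *entries, size_t n)
{
	qsort(entries, n, sizeof(*entries), cmp_by_key);
}
```

Applied to entries keyed by latency, this yields the monotonically increasing latency column shown in the sample output of this RFD.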
Re: [RFD 0/5] Add latency histogram
Hi Daniel, On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote: > Hi, > > I would like to discuss a possible way of getting the feature of the > latecy_hist.patch [1] added to mainline. > > "Latency histograms are primarily relevant in the context of real-time > enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT)and are used in the > quality management of the Linux real-time capabilities." > > Steven pointed out that this might be doable based on Tom Zanussi's > "[PATCH v4 0/7] tracing: 'hist' triggers" [2]. > > Here are my findings. It was not too complicated to get it working, > though I had to add some hacks. I have added comments to each patch. > It looks like you were able to do quite a bit here with not much code - nice! Just FYI, I'll be working on a v5 of the hist triggers patchset that will incorporate the stuff from patch 1 (needs to be split into a separate patch for the triggers code already upstream, and one for hist triggers) and your comments from patch 2 (see comments in my reply to that patch), along with a couple other unrelated changes... 
Tom > cheers, > daniel > > [2] https://lkml.org/lkml/2015/4/10/591 > [1] > https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.14-rt-rebase&id=56d50cc34943bbba12b8c5942ee1ae3b29f73acb > > Daniel Wagner (4): > tracing: Add support to sort on the key > tracing: Add option to quantize key values > tracing: Deference pointers without RCU checks > tracing: Add trace_irqsoff tracepoints > > Tom Zanussi (1): > tracing: 'hist' triggers > > include/linux/rculist.h | 36 + > include/linux/tracepoint.h | 4 ++-- > include/trace/events/latency.h | 40 > kernel/trace/trace_events_hist.c| 46 > + > kernel/trace/trace_events_trigger.c | 18 +-- > kernel/trace/trace_irqsoff.c| 38 ++ > 6 files changed, 168 insertions(+), 14 deletions(-) > create mode 100644 include/trace/events/latency.h >
[PATCH 04/11] drivers/gpu: include for modular rockchip code
These files are built off of a tristate Kconfig option and also contain modular function calls so they should explicitly include module.h to avoid compile breakage during header shuffles done in the future. Cc: David Airlie Cc: Mark Yao Cc: dri-de...@lists.freedesktop.org Signed-off-by: Paul Gortmaker --- drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 1 + drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c index 3962176ee713..01b558fe3695 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c index ccb0ce073ef2..38155215efcd 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c @@ -19,6 +19,7 @@ #include #include +#include #include #include #include -- 2.2.1
[PATCH 11/11] sh: mach-highlander/psw.c is tristate and should use module.h
This file is controlled by a tristate Kconfig option, and hence needs to include module.h so that it can get module_init() once we relocate it from init.h into module.h in the future. Note that module_exit() appears to be missing from the driver, so it is questionable whether it would actually work for a removal and reload cycle if it was configured for a modular build. Cc: Paul Mundt Cc: linux...@vger.kernel.org Signed-off-by: Paul Gortmaker --- arch/sh/boards/mach-highlander/psw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/sh/boards/mach-highlander/psw.c b/arch/sh/boards/mach-highlander/psw.c index 522786318d36..40e2b585d488 100644 --- a/arch/sh/boards/mach-highlander/psw.c +++ b/arch/sh/boards/mach-highlander/psw.c @@ -10,7 +10,7 @@ * for more details. */ #include -#include +#include #include #include #include -- 2.2.1
[PATCH 01/11] drivers/crypto: include for modular caam code
This file is built off of a tristate Kconfig option and also contains modular function calls so it should explicitly include module.h to avoid compile breakage during header shuffles done in the future. Cc: Herbert Xu Cc: "David S. Miller" Signed-off-by: Paul Gortmaker --- drivers/crypto/caam/ctrl.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c index efba4ccd4fac..b9ad19df372d 100644 --- a/drivers/crypto/caam/ctrl.c +++ b/drivers/crypto/caam/ctrl.c @@ -5,6 +5,7 @@ */ #include +#include #include #include -- 2.2.1
[PATCH 07/11] drivers/pcmcia: include for modular xxs1500_ss code
This file is built off of a tristate Kconfig option and also contains modular function calls so it should explicitly include module.h to avoid compile breakage during header shuffles done in the future. Cc: Wolfram Sang Cc: linux-pcm...@lists.infradead.org Signed-off-by: Paul Gortmaker --- drivers/pcmcia/xxs1500_ss.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/pcmcia/xxs1500_ss.c b/drivers/pcmcia/xxs1500_ss.c index 4c04360f378b..b2a189507fc3 100644 --- a/drivers/pcmcia/xxs1500_ss.c +++ b/drivers/pcmcia/xxs1500_ss.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include -- 2.2.1
[PATCH 06/11] drivers/net: include for modular stmmac_platform code
This file is built off of a tristate Kconfig option and also contains modular function calls so it should explicitly include module.h to avoid compile breakage during header shuffles done in the future. Cc: Giuseppe Cavallaro Cc: net...@vger.kernel.org Signed-off-by: Paul Gortmaker --- drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 705bbdf93940..68aec5c460db 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -23,6 +23,7 @@ ***/ #include +#include #include #include #include -- 2.2.1
[PATCH 02/11] drivers/clk: include for clk-max77xxx modular code
These files are built off of the tristate COMMON_CLK_MAX77686 and COMMON_CLK_MAX77802 Kconfig options, respectively. They also contain modular function calls so they should explicitly include module.h to avoid compile breakage during header shuffles done in the future. Cc: Mike Turquette Cc: Stephen Boyd Signed-off-by: Paul Gortmaker --- drivers/clk/clk-max77686.c | 1 + drivers/clk/clk-max77802.c | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/clk/clk-max77686.c b/drivers/clk/clk-max77686.c index 86cdb3a28629..446c2fe76dc2 100644 --- a/drivers/clk/clk-max77686.c +++ b/drivers/clk/clk-max77686.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include diff --git a/drivers/clk/clk-max77802.c b/drivers/clk/clk-max77802.c index 0729dc723a8f..74c49b93a6eb 100644 --- a/drivers/clk/clk-max77802.c +++ b/drivers/clk/clk-max77802.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include -- 2.2.1
[PATCH 08/11] drivers/pcmcia: include for modular max77802 code
This file is built off of a tristate Kconfig option and also contains modular function calls so it should explicitly include module.h to avoid compile breakage during header shuffles done in the future. Cc: Liam Girdwood Cc: Mark Brown Signed-off-by: Paul Gortmaker --- drivers/regulator/max77802.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/regulator/max77802.c b/drivers/regulator/max77802.c index 6af41abccacb..c07ee13bd470 100644 --- a/drivers/regulator/max77802.c +++ b/drivers/regulator/max77802.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include -- 2.2.1
[PATCH 03/11] drivers/gpio: include for modular crystalcove code
This file is built off of a tristate Kconfig option and also contains modular function calls so it should explicitly include module.h to avoid compile breakage during header shuffles done in the future. Cc: Linus Walleij Cc: Alexandre Courbot Cc: linux-g...@vger.kernel.org Signed-off-by: Paul Gortmaker --- drivers/gpio/gpio-crystalcove.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpio/gpio-crystalcove.c b/drivers/gpio/gpio-crystalcove.c index 91a7ffe83135..cf28ec525e93 100644 --- a/drivers/gpio/gpio-crystalcove.c +++ b/drivers/gpio/gpio-crystalcove.c @@ -16,6 +16,7 @@ */ #include +#include #include #include #include -- 2.2.1
[PATCH 10/11] drivers/staging: include for modular android tegra_ion code
This file is built off of a tristate Kconfig option and also contains modular function calls so it should explicitly include module.h to avoid compile breakage during header shuffles done in the future. Cc: Greg Kroah-Hartman Cc: "Arve Hjønnevåg" Cc: Riley Andrews Cc: Stephen Warren Cc: Thierry Reding Cc: Alexandre Courbot Cc: de...@driverdev.osuosl.org Cc: linux-te...@vger.kernel.org Signed-off-by: Paul Gortmaker --- drivers/staging/android/ion/tegra/tegra_ion.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/staging/android/ion/tegra/tegra_ion.c b/drivers/staging/android/ion/tegra/tegra_ion.c index 5b8ef0e66010..4d3c516cc15e 100644 --- a/drivers/staging/android/ion/tegra/tegra_ion.c +++ b/drivers/staging/android/ion/tegra/tegra_ion.c @@ -15,6 +15,7 @@ */ #include +#include #include #include #include "../ion.h" -- 2.2.1
[PATCH 09/11] drivers/scsi: include for modular ufshcd-pltfrm code
This file is built off of a tristate Kconfig option and also contains modular function calls so it should explicitly include module.h to avoid compile breakage during header shuffles done in the future. Cc: Vinayak Holikatti Cc: "James E.J. Bottomley" Cc: linux-s...@vger.kernel.org Signed-off-by: Paul Gortmaker --- drivers/scsi/ufs/ufshcd-pltfrm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/ufs/ufshcd-pltfrm.c b/drivers/scsi/ufs/ufshcd-pltfrm.c index 7db9564f507d..1c0bac8a7e4a 100644 --- a/drivers/scsi/ufs/ufshcd-pltfrm.c +++ b/drivers/scsi/ufs/ufshcd-pltfrm.c @@ -33,6 +33,7 @@ * this program. */ +#include #include #include #include -- 2.2.1
[PATCH 05/11] drivers/hsi: include for modular omap_ssi code
These files are built off of a tristate Kconfig option and also contain modular function calls so they should explicitly include module.h to avoid compile breakage during header shuffles done in the future. We change the one header file which gives us coverage on both files: drivers/hsi/controllers/omap_ssi.c drivers/hsi/controllers/omap_ssi_port.c Cc: Sebastian Reichel Signed-off-by: Paul Gortmaker --- drivers/hsi/controllers/omap_ssi.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/hsi/controllers/omap_ssi.h b/drivers/hsi/controllers/omap_ssi.h index 9d056417d88c..f9aaf37262be 100644 --- a/drivers/hsi/controllers/omap_ssi.h +++ b/drivers/hsi/controllers/omap_ssi.h @@ -24,6 +24,7 @@ #define __LINUX_HSI_OMAP_SSI_H__ #include +#include #include #include #include -- 2.2.1
[PATCH 00/11] Fix implicit includes of that will break.
The files changed here are simply modular source files that are implicitly relying on being present. We fix them up now, so that we can decouple some of the module related init code from the core init code in another pending series. This is the second series; a pseudo followup to the 1st series[1] factored out from what was a previously larger series[2] so that there is a common theme and lower patch count to ease review. In this case the addition of module.h include to several files is the common theme, and it is a no-op from a code generation point of view, and even from a compile point of view at this point in time. There are probably lots more implicit includes of in tree, but these are the ones that must be fixed in order to avoid build breakage fallout for the pending module.h <---> init.h code relocations. Paul. [1] https://lkml.org/lkml/2015/4/27/777 [2] https://marc.info/?l=linux-kernel&m=139033951228828 --- Paul Gortmaker (11): drivers/crypto: include for modular caam code drivers/clk: include for clk-max77xxx modular code drivers/gpio: include for modular crystalcove code drivers/gpu: include for modular rockchip code drivers/hsi: include for modular omap_ssi code drivers/net: include for modular stmmac_platform code drivers/pcmcia: include for modular xxs1500_ss code drivers/pcmcia: include for modular max77802 code drivers/scsi: include for modular ufshcd-pltfrm code drivers/staging: include for modular android tegra_ion code sh: mach-highlander/psw.c is tristate and should use module.h arch/sh/boards/mach-highlander/psw.c | 2 +- drivers/clk/clk-max77686.c| 1 + drivers/clk/clk-max77802.c| 1 + drivers/crypto/caam/ctrl.c| 1 + drivers/gpio/gpio-crystalcove.c | 1 + drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 1 + drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 1 + drivers/hsi/controllers/omap_ssi.h| 1 + drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 + drivers/pcmcia/xxs1500_ss.c | 1 + drivers/regulator/max77802.c | 1 + 
drivers/scsi/ufs/ufshcd-pltfrm.c | 1 + drivers/staging/android/ion/tegra/tegra_ion.c | 1 + 13 files changed, 13 insertions(+), 1 deletion(-) -- 2.2.1
Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible
On Thu, 2015-04-30 at 19:33 +1000, Alexey Kardashevskiy wrote: > On 04/30/2015 05:22 PM, David Gibson wrote: > > On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote: > >> At the moment only one group per container is supported. > >> POWER8 CPUs have more flexible design and allows having 2 TCE tables per > >> IOMMU group so we can relax this limitation and support multiple groups > >> per container. > > > > It's not obvious why allowing multiple TCE tables per PE has any > > bearing on allowing multiple groups per container. > > > This patchset is a global TCE tables rework (patches 1..30, roughly) with 2 > outcomes: > 1. reusing the same IOMMU table for multiple groups - patch 31; > 2. allowing dynamic create/remove of IOMMU tables - patch 32. > > I can remove this one from the patchset and post it separately later but > since 1..30 aim to support both 1) and 2), I'd think I better keep them all > together (might explain some of changes I do in 1..30). I think you are talking past each other :-) But yes, having 2 tables per group is orthogonal to the ability of having multiple groups per container. The latter is made possible on P8 in large part because each PE has its own DMA address space (unlike P5IOC2 or P7IOC where a single address space is segmented). Also, on P8 you can actually make the TVT entries point to the same table in memory, thus removing the need to duplicate the actual tables (though you still have to duplicate the invalidations). I would however recommend only sharing the table that way within a chip/node. .../..
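The P8 arrangement described above, where several PEs point their TVT entries at one table in memory but each PE still needs its own invalidation, can be modelled roughly as follows. This is a hypothetical userspace sketch: `struct tce_table`, `struct pe_group`, `struct container` and `shared_put_tce()` are illustrative names, not the powernv or VFIO API, and the "invalidation" is only a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_ENTRIES 16
#define MAX_GROUPS 4

/* Hypothetical model of one TCE table shared by several PEs. */
struct tce_table {
	uint64_t entries[TABLE_ENTRIES];
};

/* Each group (PE) points at the shared table and has its own TCE cache. */
struct pe_group {
	struct tce_table *tbl;      /* shared, not duplicated */
	unsigned int invalidations; /* per-PE cache invalidations issued */
};

struct container {
	struct pe_group *groups[MAX_GROUPS];
	size_t ngroups;
};

/* Write the entry once, then invalidate every group's TCE cache. */
static int shared_put_tce(struct container *c, size_t idx, uint64_t tce)
{
	if (idx >= TABLE_ENTRIES || c->ngroups == 0)
		return -1;

	/* All groups reference the same memory, so one store suffices. */
	c->groups[0]->tbl->entries[idx] = tce;

	/* The invalidation still has to be repeated for each PE. */
	for (size_t i = 0; i < c->ngroups; i++) {
		assert(c->groups[i]->tbl == c->groups[0]->tbl);
		c->groups[i]->invalidations++;
	}
	return 0;
}
```

The single store versus per-group loop is the point: table memory is shared, while the cost of invalidation still scales with the number of groups in the container.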
> >> > >> -1) Only one IOMMU group per container is supported as an IOMMU group > >> -represents the minimal entity which isolation can be guaranteed for and > >> -groups are allocated statically, one per a Partitionable Endpoint (PE) > >> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per > >> +container is supported as an IOMMU table is allocated at the boot time, > >> +one table per a IOMMU group which is a Partitionable Endpoint (PE) > >> (PE is often a PCI domain but not always). > > I thought the more fundamental problem was that different PEs tended > > to use disjoint bus address ranges, so even by duplicating put_tce > > across PEs you couldn't have a common address space. Yes. This is the problem with P7IOC and earlier. It *could* be doable on P7IOC by making them the same PE but let's not go there. > Sorry, I am not following you here. > > By duplicating put_tce, I can have multiple IOMMU groups on the same > virtual PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple > groups per container" does this, the address ranges will be the same. But that is only possible on P8 because only there do we have separate address spaces between PEs. > What I cannot do on p5ioc2 is programming the same table to multiple > physical PHBs (or I could but it is very different than IODA2 and pretty > ugly and might not always be possible because I would have to allocate > these pages from some common pool and face problems like fragmentation). And P7IOC has a similar issue. The DMA address top bits index the window on P7IOC within a shared address space. It's possible to configure a TVT to cover multiple devices but with very serious limitations. > >> +Newer systems (POWER8 with IODA2) have improved hardware design which > >> allows > >> +to remove this limitation and have multiple IOMMU groups per a VFIO > >> container.
> >> > >> 2) The hardware supports so called DMA windows - the PCI address range > >> within which DMA transfer is allowed, any attempt to access address space > >> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c > >> b/drivers/vfio/vfio_iommu_spapr_tce.c > >> index a7d6729..970e3a2 100644 > >> --- a/drivers/vfio/vfio_iommu_spapr_tce.c > >> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c > >> @@ -82,6 +82,11 @@ static void decrement_locked_vm(long npages) > >>* into DMA'ble space using the IOMMU > >>*/ > >> > >> +struct tce_iommu_group { > >> + struct list_head next; > >> + struct iommu_group *grp; > >> +}; > >> + > >> /* > >>* The container descriptor supports only a single group per container. > >>* Required by the API as the container is not supplied with the IOMMU > >> group > >> @@ -89,10 +94,11 @@ static void decrement_locked_vm(long npages) > >>*/ > >> struct tce_container { > >>struct mutex lock; > >> - struct iommu_group *grp; > >>bool enabled; > >>unsigned long locked_pages; > >>bool v2; > >> + struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES]; > > > > Hrm, so here we have more copies of the full iommu_table structures, > > which again muddies the lifetime. The table_group pointer is > > presumably meaningless in these copies, which seems dangerously > > confusing. > > > Ouch. This is bad. No, table_group is not pointless here as it is used to > get to the PE number to invalidate TCE cache. I just realized although I > ne
Re: [PATCH 2/2] clk: qcom: Fix MSM8916 gfx3d_clk_src configuration
On 04/29, Georgi Djakov wrote: > The gfx3d_clk_src parents configuration is incorrect. Fix it. > > Fixes: 3966fab8b6ab "clk: qcom: Add MSM8916 Global Clock Controller support" > Signed-off-by: Georgi Djakov Applied to clk-fixes -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 1/2] clk: qcom: Fix MSM8916 venus divider value
On 04/29, Georgi Djakov wrote: > One of the video codec clock frequencies has incorrect divider value. Fix it. > > Fixes: 3966fab8b6ab "clk: qcom: Add MSM8916 Global Clock Controller support" > Signed-off-by: Georgi Djakov Applied to clk-fixes
Re: [PATCH RESEND] clk: Fix JSON output in debugfs
On 04/30, Felipe Balbi wrote: > On Thu, Apr 30, 2015 at 05:37:12PM -0700, Stephen Boyd wrote: > > On 04/29, Stefan Wahren wrote: > > > key/value pairs in a JSON object must be separated by a comma. > > > After adding the properties "accuracy" and "phase" the JSON output > > > of /sys/kernel/debug/clk/clk_dump is invalid. > > > > > > So add the missing commas to fix it. > > > > > > Fixes: 5279fc4 ("clk: add clk accuracy retrieval support") > > > Signed-off-by: Stefan Wahren > > > > Hmph, this regression is old, v3.14 days. We probably ought to > > have a comment in here stating this should be JSON format. > > > > Applied to clk-next with the comment below squashed in. > > > > 8< > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > > index 5edbec6dfb20..b850a0ef5b9f 100644 > > --- a/drivers/clk/clk.c > > +++ b/drivers/clk/clk.c > > @@ -1974,6 +1974,7 @@ static void clk_dump_one(struct seq_file *s, struct > > clk_core *c, int level) > > if (!c) > > return; > > > > + /* This should be JSON format, i.e. elements separated with a comma */ > > seq_printf(s, "\"%s\": { ", c->name); > > seq_printf(s, "\"enable_count\": %d,", c->enable_count); > > seq_printf(s, "\"prepare_count\": %d,", c->prepare_count); > > you probably want to add a newline character after all clocks have been > dumped. Sure. Please send it as a separate patch with signed-off and I'll apply. It doesn't seem like a fix for a regression.
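To make the comma rule concrete, here is a userspace sketch that formats one clock node in the same style as the `seq_printf()` calls quoted above, with a comma after every key/value pair except the last one before the closing brace. `struct clk_info` and `format_clk_node()` are hypothetical; `snprintf()` stands in for `seq_printf()`, and the exact field list beyond `enable_count`/`prepare_count` is an assumption based on the commit message (rate, accuracy, phase).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the per-clock fields in clk_dump. */
struct clk_info {
	const char *name;
	int enable_count;
	int prepare_count;
	unsigned long rate;
	unsigned long accuracy;
	int phase;
};

/*
 * Format one clock as a JSON member. Every pair is followed by a comma
 * except the last one; dropping any of those commas yields invalid JSON.
 */
static int format_clk_node(char *buf, size_t len, const struct clk_info *c)
{
	assert(buf != NULL && c != NULL);
	return snprintf(buf, len,
			"\"%s\": { \"enable_count\": %d,"
			"\"prepare_count\": %d,"
			"\"rate\": %lu,"
			"\"accuracy\": %lu,"
			"\"phase\": %d }",
			c->name, c->enable_count, c->prepare_count,
			c->rate, c->accuracy, c->phase);
}
```

For a clock named `osc` running at 24 MHz this produces `"osc": { "enable_count": 1,"prepare_count": 1,"rate": 24000000,"accuracy": 0,"phase": 0 }`, which a JSON parser accepts when wrapped in an enclosing object.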
context tracking vs. syscall_trace_leave & do_notify_resume loop
Andy pointed out to me something I should have seen earlier: both syscall_trace_leave and do_notify_resume call both user_exit() and user_enter(), which has the potential to greatly increase the cost of context tracking. I believe (though it is hard to know for sure) there are legitimate reasons why there is a loop around syscall_trace_leave and do_notify_resume, but I strongly suspect the context tracking code does not need to be in that loop. I suspect it would be possible to stick a call to a new function (return_to_user ?) right after the DISABLE_INTERRUPTS below, which could be used to do the context tracking user_enter just once, and later on also to load the user FPU context (patches I have sitting around).

syscall_return:
	/* The IRETQ could re-enable interrupts: */
	DISABLE_INTERRUPTS(CLBR_ANY)
	TRACE_IRQS_IRETQ

Andy, Denys, do you guys see any issues with that idea? I realize that would mean a RESTORE_EXTRA_REGS after that call to return_to_user(), but it looks like that could be achieved without making the code any worse than it already is :) -- All rights reversed
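The cost argument can be illustrated by counting context-tracking transitions in the two shapes of the exit path. This is a hypothetical userspace model: `user_exit()` and `user_enter()` here are stubs that only count calls, `return_to_user()` is the function name floated above (it does not exist yet), and the loop bodies stand in for the syscall_trace_leave / do_notify_resume work.

```c
#include <assert.h>

/* Hypothetical counters standing in for the cost of context tracking. */
static unsigned int user_exit_calls, user_enter_calls;

static void user_exit(void)  { user_exit_calls++; }
static void user_enter(void) { user_enter_calls++; }

/*
 * Current shape: every pass through the exit-work loop leaves and
 * re-enters "user" state in the context tracking code.
 */
static void exit_loop_tracked_per_pass(unsigned int passes)
{
	for (unsigned int i = 0; i < passes; i++) {
		user_exit();
		/* ... syscall_trace_leave / do_notify_resume work ... */
		user_enter();
	}
}

/* Proposed hook, run once after interrupts are disabled. */
static void return_to_user(void)
{
	user_enter();
}

/* Proposed shape: the loop only does the work; one transition at the end. */
static void exit_loop_tracked_once(unsigned int passes)
{
	for (unsigned int i = 0; i < passes; i++) {
		/* ... same work, no context tracking transitions ... */
	}
	return_to_user();
}
```

With three passes the first shape makes three `user_exit()` and three `user_enter()` calls, while the second makes a single `user_enter()`, which is the saving being proposed; whether it is safe depends on the entry-code details Andy and Denys are being asked about.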
Re: [PATCH] clk: at91: Constify irq_domain_ops
On 04/27, Boris Brezillon wrote: > On Mon, 27 Apr 2015 21:52:38 +0900 > Krzysztof Kozlowski wrote: > > > The irq_domain_ops are not modified by the driver and the irqdomain core > > code accepts pointer to a const data. > > > > Signed-off-by: Krzysztof Kozlowski > > Acked-by: Boris Brezillon > Thanks. Applied to clk-next.
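The constification works because the core only ever reads the ops table through a pointer to const, so the driver's table can be declared `const` and placed in read-only data. A minimal userspace sketch of that pattern follows; `struct domain_ops`, `struct domain` and `domain_init()` are hypothetical stand-ins for `irq_domain_ops`, `irq_domain` and the irqdomain registration call, not the real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ops table such as irq_domain_ops. */
struct domain_ops {
	int (*map)(unsigned int hwirq);
};

struct domain {
	const struct domain_ops *ops; /* the core only reads through this */
};

static int example_map(unsigned int hwirq)
{
	return (int)hwirq + 32; /* arbitrary mapping for illustration */
}

/* const: the table is never modified, so it can live in read-only data. */
static const struct domain_ops example_ops = {
	.map = example_map,
};

/* The "core" accepts a pointer to const, so a const table can be passed. */
static void domain_init(struct domain *d, const struct domain_ops *ops)
{
	assert(d != NULL && ops != NULL);
	d->ops = ops;
}
```

If `domain_init()` took a non-const pointer, passing `&example_ops` would discard the qualifier and draw a compiler warning, which is why the change is only possible once the core's signature is const-correct.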
Re: [PATCH] block: loop: avoiding too many pending per work I/O
On Fri, May 1, 2015 at 12:59 AM, Jeff Moyer wrote: > Ming Lei writes: > >> On Wed, Apr 29, 2015 at 12:36 AM, Jeff Moyer wrote: >>> Ming Lei writes: >>> If there are too many pending per work I/O, too many high priority work threads can be generated so that system performance can be affected. This patch limits the max pending per work I/O as 16, and will fall back to single queue mode when the max number is reached. >>> >>> Actually, it limits it to 32. Also, there is no discussion on what >>> variables might affect this number. Will that magic number change >>> depending on the number of cpus on the system, for example? >> >> My fault, it should have been 16. >> >> It is just used to keep more IOs in flight, but can't cause obvious >> costs like the case of Fedora live booting. >> >> IMO, it shouldn't depend much on number of CPUs, and more >> related with I/O performance of the backing file, and the number >> is like 'iodepth' of fio. > > OK, that makes more sense. I'm still not a huge fan of hard-coding > numbers that are storage-specific, but I don't have a better suggestion > at the moment, either. OK, thanks for your review.
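The policy in the patch description (dispatch each I/O to its own work item until a cap is reached, then fall back to single queue mode) can be sketched as below. This is a hypothetical userspace model, not the loop driver: `struct loop_state`, `loop_dispatch()` and `loop_complete()` are illustrative, the cap uses the value 16 from the description (the thread notes the posted code actually used 32), and a plain counter replaces whatever atomic accounting the real driver would need.

```c
#include <assert.h>

#define MAX_PENDING_PER_WORK 16 /* value stated in the patch description */

/* Hypothetical per-device state; not the loop driver's structure. */
struct loop_state {
	unsigned int pending; /* I/Os currently dispatched per-work */
};

enum dispatch { DISPATCH_PER_WORK, DISPATCH_SINGLE_QUEUE };

/*
 * Give the I/O its own work item while under the cap; otherwise fall
 * back to the single queue. Not thread-safe: a real driver would use
 * an atomic counter.
 */
static enum dispatch loop_dispatch(struct loop_state *lo)
{
	if (lo->pending < MAX_PENDING_PER_WORK) {
		lo->pending++;
		return DISPATCH_PER_WORK;
	}
	return DISPATCH_SINGLE_QUEUE;
}

/* Called when a per-work I/O completes, freeing a slot. */
static void loop_complete(struct loop_state *lo)
{
	assert(lo->pending > 0);
	lo->pending--;
}
```

The cap plays the role Ming compares to fio's `iodepth`: it bounds how many per-work I/Os (and hence worker threads) are in flight, independent of the CPU count.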
Re: [RFC 1/2] ACPI: activate&export acpi_os_get_physical_address
On Friday, May 01, 2015 03:32:29 AM Rafael J. Wysocki wrote: > On Thursday, April 30, 2015 11:10:25 AM Darren Hart wrote: > > On Wed, Apr 22, 2015 at 04:12:24PM +0200, Kast Bernd wrote: > > > acpi_os_get_physical_address will be needed by an acpi driver > > > (asus-wmi.c). > > > Additionally it could be used by dell-laptop.c instead of directly > > > calling virt_to_phys. > > > > > > acpi_os_get_physical_address gets exported and ACPI_FUTURE_USAGE is > > > removed > > > > > > > Hrm, well... this doesn't get rid of virt_to_phys, it just wraps it really. > > I'm > > not sure that makes this any more acceptable than the original from Felipe > > - but > > that's not my call. > > Use virt_to_phys() if you need to. > > This one is in case ACPICA needs to get the virtual-to-physical mapping (hence > ACPI_FUTURE_USAGE). More to the point, the reason why virt_to_phys() needs to be used in patch [2/2] seems to be a nasty hack in the ASUS AML that pretty much expects us to provide the physical address as an argument. And I don't really understand Matthew's comment regarding limiting operation regions to system memory. This is about a specific operation region (which BTW only seems to be used as a means to access system memory at the location pointed to by the arg) in that particular method. Matthew? -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center.
Re: [PATCH 6/6] dt-bindings: ARM: Mediatek: Document devicetree bindings for clock/reset controllers
On 04/23, Sascha Hauer wrote: > This adds the binding documentation for the apmixedsys, perisys and > infracfg controllers found on Mediatek SoCs. > > Signed-off-by: Sascha Hauer Please Cc devicetree reviewers on bindings (CCed now). > --- > .../bindings/arm/mediatek/mediatek,apmixedsys.txt | 23 + > .../bindings/arm/mediatek/mediatek,infracfg.txt| 30 > ++ > .../bindings/arm/mediatek/mediatek,pericfg.txt | 30 > ++ > .../bindings/arm/mediatek/mediatek,topckgen.txt| 23 + > 4 files changed, 106 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt > create mode 100644 > Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt > create mode 100644 > Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt > create mode 100644 > Documentation/devicetree/bindings/arm/mediatek/mediatek,topckgen.txt > > diff --git > a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt > b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt > new file mode 100644 > index 000..5af6d73 > --- /dev/null > +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt > @@ -0,0 +1,23 @@ > +Mediatek apmixedsys controller > +== > + > +The Mediatek apmixedsys controller provides the PLLs to the system. > + > +Required Properties: > + > +- compatible: Should be: > + - "mediatek,mt8135-apmixedsys" > + - "mediatek,mt8173-apmixedsys" > +- #clock-cells: Must be 1 > + > +The apmixedsys controller uses the common clk binding from > +Documentation/devicetree/bindings/clock/clock-bindings.txt > +The available clocks are defined in dt-bindings/clock/mt*-clk.h. > + > +Example: > + > +apmixedsys: apmixedsys@10209000 { apmixedsys: clock-controller@10209000 { would be more standard. The same comment applies throughout this patch. Otherwise it looks good to me. 
-Stephen > + compatible = "mediatek,mt8173-apmixedsys"; > + reg = <0 0x10209000 0 0x1000>; > + #clock-cells = <1>; > +}; > diff --git > a/Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt > b/Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt > new file mode 100644 > index 000..684da473 > --- /dev/null > +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,infracfg.txt > @@ -0,0 +1,30 @@ > +Mediatek infracfg controller > + > + > +The Mediatek infracfg controller provides various clocks and reset > +outputs to the system. > + > +Required Properties: > + > +- compatible: Should be: > + - "mediatek,mt8135-infracfg", "syscon" > + - "mediatek,mt8173-infracfg", "syscon" > +- #clock-cells: Must be 1 > +- #reset-cells: Must be 1 > + > +The infracfg controller uses the common clk binding from > +Documentation/devicetree/bindings/clock/clock-bindings.txt > +The available clocks are defined in dt-bindings/clock/mt*-clk.h. > +Also it uses the common reset controller binding from > +Documentation/devicetree/bindings/reset/reset.txt. > +The available reset outputs are defined in > +dt-bindings/reset-controller/mt*-resets.h > + > +Example: > + > +infracfg: infracfg@10001000 { > + compatible = "mediatek,mt8173-infracfg", "syscon"; > + reg = <0 0x10001000 0 0x1000>; > + #clock-cells = <1>; > + #reset-cells = <1>; > +}; > diff --git > a/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt > b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt > new file mode 100644 > index 000..fdb45c6 > --- /dev/null > +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt > @@ -0,0 +1,30 @@ > +Mediatek pericfg controller > +=== > + > +The Mediatek pericfg controller provides various clocks and reset > +outputs to the system. 
> + > +Required Properties: > + > +- compatible: Should be: > + - "mediatek,mt8135-pericfg", "syscon" > + - "mediatek,mt8173-pericfg", "syscon" > +- #clock-cells: Must be 1 > +- #reset-cells: Must be 1 > + > +The pericfg controller uses the common clk binding from > +Documentation/devicetree/bindings/clock/clock-bindings.txt > +The available clocks are defined in dt-bindings/clock/mt*-clk.h. > +Also it uses the common reset controller binding from > +Documentation/devicetree/bindings/reset/reset.txt. > +The available reset outputs are defined in > +dt-bindings/reset-controller/mt*-resets.h > + > +Example: > + > +pericfg: pericfg@10003000 { > + compatible = "mediatek,mt8173-pericfg", "syscon"; > + reg = <0 0x10003000 0 0x1000>; > + #clock-cells = <1>; > + #reset-cells = <1>; > +}; > diff --git > a/Documentation/devicetree/bindings/arm/mediatek/mediatek,topckgen.txt > b/Documentation/devicetree/bindings/arm/mediatek/mediatek,topckgen.txt > new file mode 100644 > index 000..a425248 > --- /dev/null > +++ b/Documenta
Re: [PATCH 1/3] HID: wacom: Do not add suffix to name of devices with an unknown type
On Thu, Apr 30, 2015 at 5:51 PM, Jason Gerecke wrote: > The naming logic currently assumes that all devices will be a pen, finger, > or pad. Though this has historically been the case, the new HID_GENERIC > catch-all may cause us to probe devices with Wacom's 056A VID which aren't > any of these types (e.g. the "Cintiq 24HDT Monitor Control"). This patch > updates the logic so that no suffix will be added to the device name if > the device type is unknown. > > Signed-off-by: Jason Gerecke Reviewed-by: Ping Cheng for the whole set. Cheers, Ping > --- > drivers/hid/wacom_sys.c | 13 - > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c > index 9c57ac0..222baf5 100644 > --- a/drivers/hid/wacom_sys.c > +++ b/drivers/hid/wacom_sys.c > @@ -1440,12 +1440,15 @@ static void wacom_update_name(struct wacom *wacom) > snprintf(wacom_wac->pad_name, sizeof(wacom_wac->pad_name), > "%s Pad", wacom_wac->name); > > - if (features->device_type != BTN_TOOL_FINGER) > + if (features->device_type == BTN_TOOL_PEN) { > strlcat(wacom_wac->name, " Pen", WACOM_NAME_MAX); > - else if (features->touch_max) > - strlcat(wacom_wac->name, " Finger", WACOM_NAME_MAX); > - else > - strlcat(wacom_wac->name, " Pad", WACOM_NAME_MAX); > + } > + else if (features->device_type == BTN_TOOL_FINGER) { > + if (features->touch_max) > + strlcat(wacom_wac->name, " Finger", WACOM_NAME_MAX); > + else > + strlcat(wacom_wac->name, " Pad", WACOM_NAME_MAX); > + } > } > > static int wacom_probe(struct hid_device *hdev, > -- > 2.3.5 >
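The patched naming decision can be exercised in userspace as below. This sketch assumes a few stand-ins: `enum tool_type` replaces the `BTN_TOOL_*` values, `SKETCH_NAME_MAX` is an arbitrary buffer size (not the driver's `WACOM_NAME_MAX`), and `append_suffix()` is a hypothetical bounded append built on `strncat()`, since `strlcat()` is a kernel/BSD function that is not part of standard C.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_NAME_MAX 64 /* assumed size for illustration only */

/* Hypothetical stand-ins for the device_type values used in the patch. */
enum tool_type { TOOL_UNKNOWN, TOOL_PEN, TOOL_FINGER };

/* Bounded append, similar in spirit to strlcat(): never overflows size. */
static void append_suffix(char *name, size_t size, const char *suffix)
{
	size_t len = strlen(name);

	if (len + 1 < size)
		strncat(name, suffix, size - len - 1);
}

/* Mirrors the patched logic: unknown device types get no suffix at all. */
static void update_name(char *name, size_t size, enum tool_type type,
			int touch_max)
{
	assert(name != NULL && size > 0);
	if (type == TOOL_PEN)
		append_suffix(name, size, " Pen");
	else if (type == TOOL_FINGER)
		append_suffix(name, size, touch_max ? " Finger" : " Pad");
}
```

Before the patch, anything that was not a finger device fell into the " Pen" branch, so a device such as the "Cintiq 24HDT Monitor Control" would have been misnamed; the explicit `TOOL_PEN` check leaves unknown types untouched.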
Re: [PATCH RFC v1 5/5] clk: introduce clk_core_enable_lock and clk_core_disable_lock functions
On 04/15, Dong Aisheng wrote: > This can be useful when the clock core wants to enable/disable clocks. > Then we don't have to convert the struct clk_core to struct clk to call > clk_enable/clk_disable, which is a bit inconsistent with existing usage. > > Cc: Mike Turquette > Cc: Stephen Boyd > Signed-off-by: Dong Aisheng > --- Yeah let's add this patch either before patch 4 or squash it into patch 4. Also, avoid adding more function prototypes please. -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
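The locked helpers discussed above can be sketched as a userspace model. This is an illustration under stated assumptions, not the patch itself: the struct is a trimmed, hypothetical stand-in for struct clk_core, and the enable spinlock is replaced by a plain flag. The point it shows is the one raised in the review: if the lock/enable/unlock wrappers are defined after clk_core_enable()/clk_core_disable(), no extra forward prototypes are needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct clk_core. */
struct clk_core {
	const char *name;
	struct clk_core *parent;
	unsigned int enable_count;
	int hw_enabled;            /* models the ->enable()/->disable() hw state */
};

/* Mock of the enable spinlock: only tracks whether it is held. */
static int enable_lock_held;

static unsigned long clk_enable_lock(void)
{
	assert(!enable_lock_held);
	enable_lock_held = 1;
	return 0; /* the kernel returns saved IRQ flags here */
}

static void clk_enable_unlock(unsigned long flags)
{
	(void)flags;
	assert(enable_lock_held);
	enable_lock_held = 0;
}

/* Caller holds the lock; on a 0 -> 1 transition the parent is enabled first. */
static int clk_core_enable(struct clk_core *core)
{
	int ret;

	if (!core)
		return 0;
	assert(enable_lock_held);

	if (core->enable_count == 0) {
		ret = clk_core_enable(core->parent);
		if (ret)
			return ret;
		core->hw_enabled = 1;
	}
	core->enable_count++;
	return 0;
}

/* Caller holds the lock; on a 1 -> 0 transition the parent is released. */
static void clk_core_disable(struct clk_core *core)
{
	if (!core)
		return;
	assert(enable_lock_held);
	assert(core->enable_count > 0);

	if (--core->enable_count == 0) {
		core->hw_enabled = 0;
		clk_core_disable(core->parent);
	}
}

/* Locked wrappers, defined after their callees so no prototypes are needed. */
static int clk_core_enable_lock(struct clk_core *core)
{
	unsigned long flags = clk_enable_lock();
	int ret = clk_core_enable(core);

	clk_enable_unlock(flags);
	return ret;
}

static void clk_core_disable_lock(struct clk_core *core)
{
	unsigned long flags = clk_enable_lock();

	clk_core_disable(core);
	clk_enable_unlock(flags);
}
```

Enabling a child through clk_core_enable_lock() turns on its parent chain and releases the lock before returning; clk_core_disable_lock() undoes both counts.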
Re: [PATCH RFC v1 4/5] clk: core: add CLK_SET_PARENT_ON flags to support clocks require parent on
On 04/15, Dong Aisheng wrote: > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > index 7af553d..f2470e5 100644 > --- a/drivers/clk/clk.c > +++ b/drivers/clk/clk.c > @@ -43,6 +43,11 @@ static int clk_core_get_phase(struct clk_core *clk); > static bool clk_core_is_prepared(struct clk_core *clk); > static bool clk_core_is_enabled(struct clk_core *clk); > static struct clk_core *clk_core_lookup(const char *name); > +static struct clk *clk_core_get_parent(struct clk_core *clk); > +static int clk_core_prepare(struct clk_core *clk); > +static void clk_core_unprepare(struct clk_core *clk); > +static int clk_core_enable(struct clk_core *clk); > +static void clk_core_disable(struct clk_core *clk); Let's avoid adding more here if we can. > > /***private data structures***/ > > @@ -508,6 +513,7 @@ static void clk_unprepare_unused_subtree(struct clk_core > *clk) > static void clk_disable_unused_subtree(struct clk_core *clk) > { > struct clk_core *child; > + struct clk *parent = clk_core_get_parent(clk); > unsigned long flags; > > lockdep_assert_held(&prepare_lock); > @@ -515,6 +521,13 @@ static void clk_disable_unused_subtree(struct clk_core > *clk) > hlist_for_each_entry(child, &clk->children, child_node) > clk_disable_unused_subtree(child); > > + if (clk->flags & CLK_SET_PARENT_ON && parent) { > + clk_core_prepare(parent->core); > + flags = clk_enable_lock(); > + clk_core_enable(parent->core); > + clk_enable_unlock(flags); > + } If there's a parent and this clock is on, why wouldn't the parent also be on? It doesn't seem right to have a clock that's on without it's parent on that we're trying to turn off. Put another way, how is this fixing anything? > + > flags = clk_enable_lock(); > > if (clk->enable_count) > @@ -608,6 +627,14 @@ struct clk *__clk_get_parent(struct clk *clk) > } > EXPORT_SYMBOL_GPL(__clk_get_parent); > > +static struct clk *clk_core_get_parent(struct clk_core *clk) > +{ > + if (!clk) > + return NULL; > + > + return !clk->parent ? 
NULL : clk->parent->hw->clk; > +} s/clk/core/ in this function > + > static struct clk_core *clk_core_get_parent_by_index(struct clk_core *clk, >u8 index) > { > @@ -1456,13 +1483,27 @@ static struct clk_core > *__clk_set_parent_before(struct clk_core *clk, >* hardware and software states. >* >* See also: Comment for clk_set_parent() below. > + * > + * 2. enable two parents clock for .set_parent() operation if finding > + * flag CLK_SET_PARENT_ON >*/ > - if (clk->prepare_count) { > + if (clk->prepare_count || clk->flags & CLK_SET_PARENT_ON) { > clk_core_prepare(parent); > flags = clk_enable_lock(); > clk_core_enable(parent); > - clk_core_enable(clk); > clk_enable_unlock(flags); > + > + if (clk->prepare_count) { > + flags = clk_enable_lock(); > + clk_core_enable(clk); > + clk_enable_unlock(flags); > + } else { > + > + clk_core_prepare(old_parent); > + flags = clk_enable_lock(); > + clk_core_enable(old_parent); > + clk_enable_unlock(flags); > + } > } > > /* update the clk tree topology */ > @@ -1483,12 +1524,22 @@ static void __clk_set_parent_after(struct clk_core > *clk, >* Finish the migration of prepare state and undo the changes done >* for preventing a race with clk_enable(). >*/ > - if (clk->prepare_count) { > + if (clk->prepare_count || clk->flags & CLK_SET_PARENT_ON) { > flags = clk_enable_lock(); > - clk_core_disable(clk); > clk_core_disable(old_parent); > clk_enable_unlock(flags); > clk_core_unprepare(old_parent); > + > + if (clk->prepare_count) { > + flags = clk_enable_lock(); > + clk_core_disable(clk); > + clk_enable_unlock(flags); > + } else { > + flags = clk_enable_lock(); > + clk_core_disable(parent); > + clk_enable_unlock(flags); > + clk_core_unprepare(parent); > + } Is there a reason why the clk itself can't be on when we switch parents? It seems that if the clk was on during the parent switch, then it should be possible to just add a flag check on both these if conditions and be done. 
It may be possible to change the behavior here and not enable the clk in hardware, just up the count and turn on both the parents. I'm trying to recall why we enable the clk itself across the switch. > } > } > > @@ -1514,12 +1565,23 @@ static int __clk_set_parent(
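The effect the CLK_SET_PARENT_ON branch is after (both the old and the new parent running while the mux is switched, for a clock that is not itself prepared) can be modelled in a few lines. This is a simplified sketch, not the framework code: the struct, the flag's bit value and hw_mux_select() are hypothetical, prepare/unprepare calls are left out, and the prepare_count path that migrates the clock's own enable count is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define CLK_SET_PARENT_ON (1u << 0)   /* hypothetical bit value */

/* Hypothetical, simplified stand-in for struct clk_core. */
struct clk_core {
	struct clk_core *parent;
	unsigned int flags;
	unsigned int prepare_count;
	unsigned int enable_count;
};

/* Records whether both parents were running when the mux was switched. */
static int both_parents_on_at_switch;

static void core_enable(struct clk_core *core)
{
	if (core)
		core->enable_count++;
}

static void core_disable(struct clk_core *core)
{
	if (core) {
		assert(core->enable_count > 0);
		core->enable_count--;
	}
}

/* Hypothetical hardware mux write; checks the invariant the flag provides. */
static void hw_mux_select(struct clk_core *old_parent, struct clk_core *new_parent)
{
	both_parents_on_at_switch = old_parent && new_parent &&
				    old_parent->enable_count > 0 &&
				    new_parent->enable_count > 0;
}

/* Unprepared clock with CLK_SET_PARENT_ON: hold both parents across the switch. */
static void set_parent_with_parents_on(struct clk_core *core, struct clk_core *parent)
{
	struct clk_core *old_parent = core->parent;
	int hold = !core->prepare_count && (core->flags & CLK_SET_PARENT_ON);

	if (hold) {
		core_enable(parent);
		core_enable(old_parent);
	}

	hw_mux_select(old_parent, parent);
	core->parent = parent;

	if (hold) {
		core_disable(old_parent);
		core_disable(parent);
	}
}
```

Without the flag, an unprepared clock is switched with neither parent held, which is the case the patch is trying to change.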
Re: [RFC 1/2] ACPI: activate&export acpi_os_get_physical_address
On Thursday, April 30, 2015 11:10:25 AM Darren Hart wrote: > On Wed, Apr 22, 2015 at 04:12:24PM +0200, Kast Bernd wrote: > > acpi_os_get_physical_address will be needed by an acpi driver (asus-wmi.c). > > Additionally it could be used by dell-laptop.c instead of directly calling > > virt_to_phys. > > > > acpi_os_get_physical_address gets exported and ACPI_FUTURE_USAGE is removed > > > > Hrm, well... this doesn't get rid of virt_to_phys, it just wraps it really. > I'm > not sure that makes this any more acceptable than the original from Felipe - > but > that's not my call. Use virt_to_phys() if you need to. This one is in case ACPICA needs to get the virtual-to-physical mapping (hence ACPI_FUTURE_USAGE). Rafael
Re: [PATCH RESEND] clk: Fix JSON output in debugfs
On Thu, Apr 30, 2015 at 05:37:12PM -0700, Stephen Boyd wrote: > On 04/29, Stefan Wahren wrote: > > key/value pairs in a JSON object must be separated by a comma. > > After adding the properties "accuracy" and "phase" the JSON output > > of /sys/kernel/debug/clk/clk_dump is invalid. > > > > So add the missing commas to fix it. > > > > Fixes: 5279fc4 ("clk: add clk accuracy retrieval support") > > Signed-off-by: Stefan Wahren > > Hmph, this regression is old, v3.14 days. We probably ought to > have a comment in here stating this should be JSON format. > > Applied to clk-next with the comment below squashed in. > > 8< > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > index 5edbec6dfb20..b850a0ef5b9f 100644 > --- a/drivers/clk/clk.c > +++ b/drivers/clk/clk.c > @@ -1974,6 +1974,7 @@ static void clk_dump_one(struct seq_file *s, struct > clk_core *c, int level) > if (!c) > return; > > + /* This should be JSON format, i.e. elements separated with a comma */ > seq_printf(s, "\"%s\": { ", c->name); > seq_printf(s, "\"enable_count\": %d,", c->enable_count); > seq_printf(s, "\"prepare_count\": %d,", c->prepare_count); You probably want to add a newline character after all clocks have been dumped. diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 459ce9da13e0..c2de94238e75 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -276,7 +276,7 @@ static int clk_dump(struct seq_file *s, void *data) clk_prepare_unlock(); - seq_printf(s, "}"); + seq_printf(s, "}\n"); return 0; } cheers -- balbi
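A userspace sketch of the output format under discussion may help when checking the fix. It assumes a flat list of clocks (the real clk_dump nests children inside their parent's object) and a hypothetical clk_info struct; the field order follows the seq_printf() calls quoted above. Every key/value pair except the last ends in a comma, objects are separated by commas, and the dump ends with "}\n" as suggested in the reply.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened view of one clock's debugfs fields. */
struct clk_info {
	const char *name;
	int enable_count;
	int prepare_count;
	unsigned long rate;
	unsigned long accuracy;
	int phase;
};

/* Append formatted text at *off without writing past len bytes. */
static void append(char *buf, size_t len, size_t *off, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL);
	if (*off >= len)
		return; /* buffer already full: output is truncated */
	va_start(ap, fmt);
	n = vsnprintf(buf + *off, len - *off, fmt, ap);
	va_end(ap);
	if (n > 0)
		*off += (size_t)n;
}

/* One clock object: each key/value pair except the last ends in a comma. */
static void dump_one(char *buf, size_t len, size_t *off, const struct clk_info *c)
{
	append(buf, len, off, "\"%s\": { ", c->name);
	append(buf, len, off, "\"enable_count\": %d,", c->enable_count);
	append(buf, len, off, "\"prepare_count\": %d,", c->prepare_count);
	append(buf, len, off, "\"rate\": %lu,", c->rate);
	append(buf, len, off, "\"accuracy\": %lu,", c->accuracy);
	append(buf, len, off, "\"phase\": %d }", c->phase);
}

/* Whole dump: objects separated by commas, then closing brace and newline. */
static void dump_all(char *buf, size_t len, const struct clk_info *clks, size_t n)
{
	size_t off = 0;
	size_t i;

	if (len)
		buf[0] = '\0';
	append(buf, len, &off, "{");
	for (i = 0; i < n; i++) {
		if (i)
			append(buf, len, &off, ",");
		dump_one(buf, len, &off, &clks[i]);
	}
	append(buf, len, &off, "}\n");
}

/* Returns 1 if the dump ends with the closing brace followed by a newline. */
static int ends_with_newline(const char *s)
{
	size_t n = strlen(s);

	return n >= 2 && s[n - 2] == '}' && s[n - 1] == '\n';
}
```

Feeding the result to any JSON parser is a quick way to catch a missing comma after a new field is added.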
Re: [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support
On Thursday, April 30, 2015 05:39:06 PM Dan Williams wrote: > On Thu, Apr 30, 2015 at 4:23 PM, Rafael J. Wysocki wrote: > > On Tuesday, April 28, 2015 02:24:23 PM Dan Williams wrote: > >> 1/ Autodetect an NFIT table for the ACPI namespace device with _HID of > >>"ACPI0012" > >> > >> 2/ libnd bus registration > >> > >> The NFIT provided by ACPI is one possible method by which platforms will > >> discover NVDIMM resources. However, the intent of the nd_bus_descriptor > >> abstraction is to abstract "provider" specific details, leaving libnd > >> to be independent of the specific NVDIMM resource discovery mechanism. > >> This flexibility is later exploited later to implement custom-defined nd > >> buses. > >> > >> Cc: > >> Cc: Robert Moore > >> Cc: Rafael J. Wysocki > >> Signed-off-by: Dan Williams > >> --- > >> drivers/block/Kconfig |2 > >> drivers/block/Makefile|1 > >> drivers/block/nd/Kconfig | 40 +++ > >> drivers/block/nd/Makefile |6 + > >> drivers/block/nd/acpi.c | 475 > >> + > >> drivers/block/nd/acpi_nfit.h | 254 ++ > >> drivers/block/nd/core.c | 67 ++ > >> drivers/block/nd/libnd.h | 33 +++ > >> drivers/block/nd/nd-private.h | 23 ++ > >> 9 files changed, 901 insertions(+) > >> create mode 100644 drivers/block/nd/Kconfig > >> create mode 100644 drivers/block/nd/Makefile > >> create mode 100644 drivers/block/nd/acpi.c > >> create mode 100644 drivers/block/nd/acpi_nfit.h > >> create mode 100644 drivers/block/nd/core.c > >> create mode 100644 drivers/block/nd/libnd.h > >> create mode 100644 drivers/block/nd/nd-private.h > >> > >> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig > >> index eb1fed5bd516..dfe40e5ca9bd 100644 > >> --- a/drivers/block/Kconfig > >> +++ b/drivers/block/Kconfig > >> @@ -321,6 +321,8 @@ config BLK_DEV_NVME > >> To compile this driver as a module, choose M here: the > >> module will be called nvme. 
> >> > >> +source "drivers/block/nd/Kconfig" > >> + > >> config BLK_DEV_SKD > >> tristate "STEC S1120 Block Driver" > >> depends on PCI > >> diff --git a/drivers/block/Makefile b/drivers/block/Makefile > >> index 9cc6c18a1c7e..07a6acecf4d8 100644 > >> --- a/drivers/block/Makefile > >> +++ b/drivers/block/Makefile > >> @@ -24,6 +24,7 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o > >> obj-$(CONFIG_MG_DISK)+= mg_disk.o > >> obj-$(CONFIG_SUNVDC) += sunvdc.o > >> obj-$(CONFIG_BLK_DEV_NVME) += nvme.o > >> +obj-$(CONFIG_ND_DEVICES) += nd/ > >> obj-$(CONFIG_BLK_DEV_SKD)+= skd.o > >> obj-$(CONFIG_BLK_DEV_OSD)+= osdblk.o > >> > >> diff --git a/drivers/block/nd/Kconfig b/drivers/block/nd/Kconfig > >> new file mode 100644 > >> index ..6d5d6b732f82 > >> --- /dev/null > >> +++ b/drivers/block/nd/Kconfig > >> @@ -0,0 +1,40 @@ > >> +menuconfig ND_DEVICES > >> + bool "NVDIMM Support" > >> + depends on PHYS_ADDR_T_64BIT > >> + help > >> + Generic support for non-volatile memory devices including > >> + ACPI-6-NFIT defined resources. On platforms that define an > >> + NFIT, or otherwise can discover NVDIMM resources, a libnd > >> + bus is registered to advertise PMEM (persistent memory) > >> + namespaces (/dev/pmemX) and BLK (sliding mmio window(s)) > >> + namespaces (/dev/ndX). A PMEM namespace refers to a memory > >> + resource that may span multiple DIMMs and support DAX (see > >> + CONFIG_DAX). A BLK namespace refers to an NVDIMM control > >> + region which exposes an mmio register set for windowed > >> + access mode to non-volatile memory. > >> + > >> +if ND_DEVICES > >> + > >> +config LIBND > >> + tristate "LIBND: libnd device driver support" > >> + help > >> + Platform agnostic device model for a libnd bus. Publishes > >> + resources for a PMEM (persistent-memory) driver and/or BLK > >> + (sliding mmio window(s)) driver to attach. 
Exposes a device > >> + topology under a "ndX" bus device, a "/dev/ndctlX" bus-ioctl > >> + message passing interface, and a "/dev/nmemX" dimm-ioctl > >> + message interface for each memory device registered on the > >> + bus. instance. A userspace library "ndctl" provides an API > >> + to enumerate/manage this subsystem. > >> + > >> +config ND_ACPI > >> + tristate "ACPI: NFIT to libnd bus support" > >> + select LIBND > >> + depends on ACPI > >> + help > >> + Infrastructure to probe ACPI 6 compliant platforms for > >> + NVDIMMs (NFIT) and register a libnd device tree. In > >> + addition to storage devices this also enables libnd craft > >> + ACPI._DSM messages for platform/dimm configuration. > > > > I'm wondering if the two CONFIG options above really need to be > > user-selectable? > > > > For example, what reason people (who've already selected ND_DEVICE
[PATCH 2/3] HID: wacom: Discover device_type from HID descriptor for all devices
Currently, we assume a device_type of BTN_TOOL_PEN before scanning the HID descriptor and then change the device_type if what we discover proves that assumption wrong. This way of doing things makes it more difficult to figure out if a device (particularly a HID_GENERIC device) actually does tablet/touch input or is something completely different. This patch leaves device_type at its initial value of 0 and then calls 'wacom_parse_hid' for every device (not just those that have touch). As we map the usages, we can set the device_type as before. After we're finished, we can then check if the value is still zero and do whatever is most appropriate. Detecting the pen can be a little tricky on most Wacom devices because the descriptors describe opaque blobs. Fortunately, older Wacom tablets have the HID_DG_DIGITIZER usage on the pen's application collection and newer tablets seem to have a similar vendor-defined usage that we can trigger on. Signed-off-by: Jason Gerecke --- drivers/hid/wacom_sys.c | 23 +-- drivers/hid/wacom_wac.c | 8 +--- drivers/hid/wacom_wac.h | 6 +- 3 files changed, 23 insertions(+), 14 deletions(-) diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c index 222baf5..157aa7a 100644 --- a/drivers/hid/wacom_sys.c +++ b/drivers/hid/wacom_sys.c @@ -181,7 +181,11 @@ static void wacom_usage_mapping(struct hid_device *hdev, * X/Y values and some cases of invalid Digitizer X/Y * values commonly reported.
*/ - if (!pen && !finger) + if (pen) + features->device_type = BTN_TOOL_PEN; + else if (finger) + features->device_type = BTN_TOOL_FINGER; + else return; /* @@ -198,14 +202,11 @@ static void wacom_usage_mapping(struct hid_device *hdev, case HID_GD_X: features->x_max = field->logical_maximum; if (finger) { - features->device_type = BTN_TOOL_FINGER; features->x_phy = field->physical_maximum; if (features->type != BAMBOO_PT) { features->unit = field->unit; features->unitExpo = field->unit_exponent; } - } else { - features->device_type = BTN_TOOL_PEN; } break; case HID_GD_Y: @@ -425,7 +426,6 @@ static void wacom_retrieve_hid_descriptor(struct hid_device *hdev, struct usb_interface *intf = wacom->intf; /* default features */ - features->device_type = BTN_TOOL_PEN; features->x_fuzz = 4; features->y_fuzz = 4; features->pressure_fuzz = 0; @@ -446,10 +446,6 @@ static void wacom_retrieve_hid_descriptor(struct hid_device *hdev, } } - /* only devices that support touch need to retrieve the info */ - if (features->type < BAMBOO_PT) - return; - wacom_parse_hid(hdev, features); } @@ -1529,8 +1525,15 @@ static int wacom_probe(struct hid_device *hdev, /* Retrieve the physical and logical size for touch devices */ wacom_retrieve_hid_descriptor(hdev, features); - wacom_setup_device_quirks(wacom); + + if (!features->device_type && features->type != WIRELESS) { + dev_warn(&hdev->dev, "Unknown device_type for '%s'. 
%s.", +hdev->name, "Assuming pen"); + + features->device_type = BTN_TOOL_PEN; + } + wacom_calculate_res(features); wacom_update_name(wacom); diff --git a/drivers/hid/wacom_wac.c b/drivers/hid/wacom_wac.c index dff99ff..a52fc25 100644 --- a/drivers/hid/wacom_wac.c +++ b/drivers/hid/wacom_wac.c @@ -2186,13 +2186,15 @@ void wacom_setup_device_quirks(struct wacom *wacom) features->x_max = 4096; features->y_max = 4096; - } else { - features->device_type = BTN_TOOL_PEN; } } /* -* Same thing for Bamboo PAD +* Raw Wacom-mode pen and touch events both come from interface +* 0, whose HID descriptor has an application usage of 0xFF0D +* (i.e., WACOM_VENDORDEFINED_PEN). We route pen packets back +* out through the HID_GENERIC device created for interface 1, +* so rewrite this one to be of type BTN_TOOL_FINGER. */ if (features->type == BAMBOO_PAD) features->device_type = BTN_TOOL_FINGER; diff --git a/drivers/hid/wacom_wac.h b/drivers/hid/wacom_wac.h index f5a5f68..9a5ee62 100644 --- a/drivers/hid/wacom_wac.h +++ b/drivers/hid/wacom_wac.h @@ -72,10 +72,14 @@ #define WACOM_QUIRK_MONITOR0x0004 #define WACOM_QUIRK_BATTERY0x0008 +#define WACOM_VENDORDEFINED_PEN0xff0d0001 + #define WACOM_PEN_FIELD(f) (((f)->logical == HID_DG_STYLUS) || \
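The discovery flow described in this patch (start from device_type 0, let each mapped usage set pen or finger, then fall back to pen with a warning unless the device is WIRELESS) can be sketched outside the driver. The structs below are hypothetical stand-ins for hid_field and wacom_features; the BTN_TOOL_* values are those of linux/input-event-codes.h and the HID_DG_* values are the standard digitizer usages. Because the full WACOM_PEN_FIELD macro is truncated in the patch above, the pen test here is a simplified assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Event codes as defined in linux/input-event-codes.h. */
#define BTN_TOOL_PEN    0x140
#define BTN_TOOL_FINGER 0x145

/* HID digitizer usages; the vendor-defined pen usage is taken from the patch. */
#define HID_DG_DIGITIZER        0x000d0001
#define HID_DG_STYLUS           0x000d0020
#define HID_DG_FINGER           0x000d0022
#define WACOM_VENDORDEFINED_PEN 0xff0d0001

/* Hypothetical, trimmed-down stand-ins for hid_field and wacom_features. */
struct field_sketch {
	unsigned int application;
	unsigned int logical;
};

struct features_sketch {
	int is_wireless;
	unsigned int device_type;
};

/* Simplified pen test (assumption: the real macro checks more cases). */
static int is_pen_field(const struct field_sketch *f)
{
	return f->logical == HID_DG_STYLUS ||
	       f->application == HID_DG_DIGITIZER ||
	       f->application == WACOM_VENDORDEFINED_PEN;
}

static int is_finger_field(const struct field_sketch *f)
{
	return f->logical == HID_DG_FINGER;
}

/* Per mapped usage: only pen or finger fields set device_type. */
static void map_usage(struct features_sketch *features, const struct field_sketch *f)
{
	if (is_pen_field(f))
		features->device_type = BTN_TOOL_PEN;
	else if (is_finger_field(f))
		features->device_type = BTN_TOOL_FINGER;
}

/* Start at 0, parse every device, then default to pen with a warning. */
static void resolve_device_type(struct features_sketch *features, const char *name,
				const struct field_sketch *fields, size_t n)
{
	size_t i;

	assert(fields != NULL || n == 0);
	features->device_type = 0;
	for (i = 0; i < n; i++)
		map_usage(features, &fields[i]);

	if (!features->device_type && !features->is_wireless) {
		fprintf(stderr, "Unknown device_type for '%s'. Assuming pen.\n", name);
		features->device_type = BTN_TOOL_PEN;
	}
}
```

Keeping the fallback separate from the parsing is what lets the next patch treat the "still zero" case differently for HID_GENERIC devices.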
[PATCH 3/3] HID: wacom: Fail probe if HID_GENERIC device has unknown device_type
The last patch was careful to maintain backwards-compatible behavior by forcing device_type to BTN_TOOL_PEN (and printing a warning) if it were still uninitialized after scanning the HID descriptor and applying quirks. We should be more strict with HID_GENERIC devices, however, since there is no a priori guarantee that it is a tablet or touchpad. If the device_type is still uninitialized for a HID_GENERIC device then we assume that it isn't something the driver can work with and so fail the probe. Signed-off-by: Jason Gerecke --- drivers/hid/wacom_sys.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c index 157aa7a..7abf52c 100644 --- a/drivers/hid/wacom_sys.c +++ b/drivers/hid/wacom_sys.c @@ -1528,8 +1528,14 @@ static int wacom_probe(struct hid_device *hdev, wacom_setup_device_quirks(wacom); if (!features->device_type && features->type != WIRELESS) { + error = features->type == HID_GENERIC ? -ENODEV : 0; + dev_warn(&hdev->dev, "Unknown device_type for '%s'. %s.", -hdev->name, "Assuming pen"); +hdev->name, +error ? "Ignoring" : "Assuming pen"); + + if (error) + goto fail_shared_data; features->device_type = BTN_TOOL_PEN; } -- 2.3.5
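The decision this patch adds to wacom_probe() can be isolated as a small function. This is a sketch only: the enum is a hypothetical subset of the driver's device types, the warning goes to stderr instead of dev_warn(), and returning -ENODEV stands in for the patch's jump to fail_shared_data.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define BTN_TOOL_PEN 0x140 /* from linux/input-event-codes.h */

/* Hypothetical subset of the driver's device-type enum. */
enum type_sketch { TYPE_INTUOS, TYPE_WIRELESS, TYPE_HID_GENERIC };

/*
 * Returns 0 (possibly after defaulting to pen) or -ENODEV when a
 * HID_GENERIC device still has no device_type after parsing and quirks.
 */
static int check_device_type(enum type_sketch type, unsigned int *device_type,
			     const char *name)
{
	int error;

	assert(device_type != NULL);
	if (*device_type || type == TYPE_WIRELESS)
		return 0;

	error = type == TYPE_HID_GENERIC ? -ENODEV : 0;
	fprintf(stderr, "Unknown device_type for '%s'. %s.\n",
		name, error ? "Ignoring" : "Assuming pen");
	if (error)
		return error;

	*device_type = BTN_TOOL_PEN;
	return 0;
}
```

Non-generic devices keep the backwards-compatible pen default, while a HID_GENERIC device with no recognizable pen or touch usage is rejected instead of being registered as a pen.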
[PATCH 1/3] HID: wacom: Do not add suffix to name of devices with an unknown type
The naming logic currently assumes that all devices will be a pen, finger, or pad. Though this has historically been the case, the new HID_GENERIC catch-all may cause us to probe devices with Wacom's 056A VID which aren't any of these types (e.g. the "Cintiq 24HDT Monitor Control"). This patch updates the logic so that no suffix will be added to the device name if the device type is unknown. Signed-off-by: Jason Gerecke --- drivers/hid/wacom_sys.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c index 9c57ac0..222baf5 100644 --- a/drivers/hid/wacom_sys.c +++ b/drivers/hid/wacom_sys.c @@ -1440,12 +1440,15 @@ static void wacom_update_name(struct wacom *wacom) snprintf(wacom_wac->pad_name, sizeof(wacom_wac->pad_name), "%s Pad", wacom_wac->name); - if (features->device_type != BTN_TOOL_FINGER) + if (features->device_type == BTN_TOOL_PEN) { strlcat(wacom_wac->name, " Pen", WACOM_NAME_MAX); - else if (features->touch_max) - strlcat(wacom_wac->name, " Finger", WACOM_NAME_MAX); - else - strlcat(wacom_wac->name, " Pad", WACOM_NAME_MAX); + } + else if (features->device_type == BTN_TOOL_FINGER) { + if (features->touch_max) + strlcat(wacom_wac->name, " Finger", WACOM_NAME_MAX); + else + strlcat(wacom_wac->name, " Pad", WACOM_NAME_MAX); + } } static int wacom_probe(struct hid_device *hdev, -- 2.3.5
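The new suffix rule can be checked in isolation with a small sketch. strlcat() is not available in every userspace libc, so a bounded snprintf() append stands in for it; NAME_MAX_SKETCH is a hypothetical substitute for WACOM_NAME_MAX, and the BTN_TOOL_* values are those of linux/input-event-codes.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BTN_TOOL_PEN    0x140 /* from linux/input-event-codes.h */
#define BTN_TOOL_FINGER 0x145
#define NAME_MAX_SKETCH 64    /* hypothetical stand-in for WACOM_NAME_MAX */

/* Bounded append, playing the role of strlcat() in the driver. */
static void append_suffix(char *name, size_t size, const char *suffix)
{
	size_t len = strlen(name);

	if (len < size)
		snprintf(name + len, size - len, "%s", suffix);
}

/* Pen -> " Pen"; finger -> " Finger" or " Pad"; anything else: no suffix. */
static void update_name(char *name, size_t size, unsigned int device_type,
			int touch_max)
{
	assert(name != NULL);
	if (device_type == BTN_TOOL_PEN)
		append_suffix(name, size, " Pen");
	else if (device_type == BTN_TOOL_FINGER)
		append_suffix(name, size, touch_max ? " Finger" : " Pad");
}
```

Only an explicit pen or finger type gets a suffix, so a HID_GENERIC device of unknown type keeps its bare name.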
Re: [PATCH RESEND] clk: Fix JSON output in debugfs
> Stephen Boyd wrote on 1 May 2015 at 02:37: > > > On 04/29, Stefan Wahren wrote: > > key/value pairs in a JSON object must be separated by a comma. > > After adding the properties "accuracy" and "phase" the JSON output > > of /sys/kernel/debug/clk/clk_dump is invalid. > > > > So add the missing commas to fix it. > > > > Fixes: 5279fc4 ("clk: add clk accuracy retrieval support") > > Signed-off-by: Stefan Wahren > > Hmph, this regression is old, v3.14 days. We probably ought to > have a comment in here stating this should be JSON format. > > Applied to clk-next with the comment below squashed in. Thanks Stefan
Re: [PATCH v2 02/20] libnd, nd_acpi: initial libnd infrastructure and NFIT support
On Thu, Apr 30, 2015 at 4:23 PM, Rafael J. Wysocki wrote: > On Tuesday, April 28, 2015 02:24:23 PM Dan Williams wrote: >> 1/ Autodetect an NFIT table for the ACPI namespace device with _HID of >>"ACPI0012" >> >> 2/ libnd bus registration >> >> The NFIT provided by ACPI is one possible method by which platforms will >> discover NVDIMM resources. However, the intent of the nd_bus_descriptor >> abstraction is to abstract "provider" specific details, leaving libnd >> to be independent of the specific NVDIMM resource discovery mechanism. >> This flexibility is later exploited later to implement custom-defined nd >> buses. >> >> Cc: >> Cc: Robert Moore >> Cc: Rafael J. Wysocki >> Signed-off-by: Dan Williams >> --- >> drivers/block/Kconfig |2 >> drivers/block/Makefile|1 >> drivers/block/nd/Kconfig | 40 +++ >> drivers/block/nd/Makefile |6 + >> drivers/block/nd/acpi.c | 475 >> + >> drivers/block/nd/acpi_nfit.h | 254 ++ >> drivers/block/nd/core.c | 67 ++ >> drivers/block/nd/libnd.h | 33 +++ >> drivers/block/nd/nd-private.h | 23 ++ >> 9 files changed, 901 insertions(+) >> create mode 100644 drivers/block/nd/Kconfig >> create mode 100644 drivers/block/nd/Makefile >> create mode 100644 drivers/block/nd/acpi.c >> create mode 100644 drivers/block/nd/acpi_nfit.h >> create mode 100644 drivers/block/nd/core.c >> create mode 100644 drivers/block/nd/libnd.h >> create mode 100644 drivers/block/nd/nd-private.h >> >> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig >> index eb1fed5bd516..dfe40e5ca9bd 100644 >> --- a/drivers/block/Kconfig >> +++ b/drivers/block/Kconfig >> @@ -321,6 +321,8 @@ config BLK_DEV_NVME >> To compile this driver as a module, choose M here: the >> module will be called nvme. 
>> >> +source "drivers/block/nd/Kconfig" >> + >> config BLK_DEV_SKD >> tristate "STEC S1120 Block Driver" >> depends on PCI >> diff --git a/drivers/block/Makefile b/drivers/block/Makefile >> index 9cc6c18a1c7e..07a6acecf4d8 100644 >> --- a/drivers/block/Makefile >> +++ b/drivers/block/Makefile >> @@ -24,6 +24,7 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o >> obj-$(CONFIG_MG_DISK)+= mg_disk.o >> obj-$(CONFIG_SUNVDC) += sunvdc.o >> obj-$(CONFIG_BLK_DEV_NVME) += nvme.o >> +obj-$(CONFIG_ND_DEVICES) += nd/ >> obj-$(CONFIG_BLK_DEV_SKD)+= skd.o >> obj-$(CONFIG_BLK_DEV_OSD)+= osdblk.o >> >> diff --git a/drivers/block/nd/Kconfig b/drivers/block/nd/Kconfig >> new file mode 100644 >> index ..6d5d6b732f82 >> --- /dev/null >> +++ b/drivers/block/nd/Kconfig >> @@ -0,0 +1,40 @@ >> +menuconfig ND_DEVICES >> + bool "NVDIMM Support" >> + depends on PHYS_ADDR_T_64BIT >> + help >> + Generic support for non-volatile memory devices including >> + ACPI-6-NFIT defined resources. On platforms that define an >> + NFIT, or otherwise can discover NVDIMM resources, a libnd >> + bus is registered to advertise PMEM (persistent memory) >> + namespaces (/dev/pmemX) and BLK (sliding mmio window(s)) >> + namespaces (/dev/ndX). A PMEM namespace refers to a memory >> + resource that may span multiple DIMMs and support DAX (see >> + CONFIG_DAX). A BLK namespace refers to an NVDIMM control >> + region which exposes an mmio register set for windowed >> + access mode to non-volatile memory. >> + >> +if ND_DEVICES >> + >> +config LIBND >> + tristate "LIBND: libnd device driver support" >> + help >> + Platform agnostic device model for a libnd bus. Publishes >> + resources for a PMEM (persistent-memory) driver and/or BLK >> + (sliding mmio window(s)) driver to attach. Exposes a device >> + topology under a "ndX" bus device, a "/dev/ndctlX" bus-ioctl >> + message passing interface, and a "/dev/nmemX" dimm-ioctl >> + message interface for each memory device registered on the >> + bus. instance. 
A userspace library "ndctl" provides an API >> + to enumerate/manage this subsystem. >> + >> +config ND_ACPI >> + tristate "ACPI: NFIT to libnd bus support" >> + select LIBND >> + depends on ACPI >> + help >> + Infrastructure to probe ACPI 6 compliant platforms for >> + NVDIMMs (NFIT) and register a libnd device tree. In >> + addition to storage devices this also enables libnd craft >> + ACPI._DSM messages for platform/dimm configuration. > > I'm wondering if the two CONFIG options above really need to be > user-selectable? > > For example, what reason people (who've already selected ND_DEVICES) may have > for not selecting ND_ACPI if ACPI is set? Later on in the series we introduce ND_E820 which supports creating a libnd-bus from e820-type-12 memory ranges on pre-NFIT systems. I'm also considering a configfs defined libnd-bus because e820 types are not nearly enough i