[PATCH] smp: Fix sending func call IPI to empty cpu mask
I get the warning below every day with 3.7, one or two times per day:

[ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
[ 2235.186030] Hardware name: Aspire 4741
[ 2235.186032] empty IPI mask
[ 2235.186034] Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia(PO) wl(O)
[ 2235.186046] Pid: 5542, comm: pool Tainted: P O 3.7.2+ #41
[ 2235.186049] Call Trace:
[ 2235.186059] [] warn_slowpath_common+0x65/0x7a
[ 2235.186064] [] ? default_send_IPI_mask_logical+0x2f/0xb8
[ 2235.186069] [] warn_slowpath_fmt+0x26/0x2a
[ 2235.186074] [] default_send_IPI_mask_logical+0x2f/0xb8
[ 2235.186079] [] native_send_call_func_ipi+0x4f/0x57
[ 2235.186087] [] smp_call_function_many+0x191/0x1a9
[ 2235.186092] [] ? do_flush_tlb_all+0x3f/0x3f
[ 2235.186097] [] native_flush_tlb_others+0x21/0x24
[ 2235.186101] [] flush_tlb_page+0x63/0x89
[ 2235.186105] [] ptep_set_access_flags+0x20/0x26
[ 2235.186111] [] do_wp_page+0x234/0x502
[ 2235.186117] [] ? T.2009+0x31/0x35
[ 2235.186121] [] handle_pte_fault+0x50d/0x54c
[ 2235.186128] [] ? irq_exit+0x5f/0x61
[ 2235.186133] [] ? smp_call_function_interrupt+0x2c/0x2e
[ 2235.186143] [] ? call_function_interrupt+0x2d/0x34
[ 2235.186148] [] handle_mm_fault+0xd0/0xe2
[ 2235.186153] [] __do_page_fault+0x411/0x42d
[ 2235.186158] [] ? sys_futex+0xa9/0xee
[ 2235.186162] [] ? __do_page_fault+0x42d/0x42d
[ 2235.186166] [] do_page_fault+0x8/0xa
[ 2235.186170] [] error_code+0x5a/0x60
[ 2235.186174] [] ? __do_page_fault+0x42d/0x42d
[ 2235.186177] ---[ end trace 089b20858c3cb340 ]---

This patch fixes it.

This patch also fixes a system hang problem: if data->cpumask gets
cleared after passing the

	if (WARN_ONCE(!mask, "empty IPI mask"))
		return;

check, then the problem that commit 83d349f3 fixed can happen again.
Signed-off-by: Wang YanQing
---
 kernel/smp.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 29dd40a..7c56aba 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -33,6 +33,7 @@ struct call_function_data {
 	struct call_single_data	csd;
 	atomic_t		refs;
 	cpumask_var_t		cpumask;
+	cpumask_var_t		cpumask_ipi;
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
@@ -526,6 +527,13 @@ void smp_call_function_many(const struct cpumask *mask,
 		return;
 	}
 
+	/*
+	 * After we put the entry on the list, data->cpumask may be
+	 * cleared when other cpus respond to other function-call
+	 * IPIs, so data->cpumask can become zero.
+	 */
+	cpumask_copy(data->cpumask_ipi, data->cpumask);
 	raw_spin_lock_irqsave(&call_function.lock, flags);
 	/*
 	 * Place entry at the _HEAD_ of the list, so that any cpu still
@@ -549,7 +557,7 @@ void smp_call_function_many(const struct cpumask *mask,
 	smp_mb();
 
 	/* Send a message to all CPUs in the map */
-	arch_send_call_function_ipi_mask(data->cpumask);
+	arch_send_call_function_ipi_mask(data->cpumask_ipi);
 
 	/* Optionally wait for the CPUs to complete */
 	if (wait)
-- 
1.7.11.1.116.g8228a23
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[patch] edac: test correct variable in ->store function
We're testing for ->show but calling ->store().

Signed-off-by: Dan Carpenter

diff --git a/drivers/edac/edac_pci_sysfs.c b/drivers/edac/edac_pci_sysfs.c
index 7684426..e8658e4 100644
--- a/drivers/edac/edac_pci_sysfs.c
+++ b/drivers/edac/edac_pci_sysfs.c
@@ -256,7 +256,7 @@ static ssize_t edac_pci_dev_store(struct kobject *kobj,
 	struct edac_pci_dev_attribute *edac_pci_dev;
 	edac_pci_dev = (struct edac_pci_dev_attribute *)attr;
 
-	if (edac_pci_dev->show)
+	if (edac_pci_dev->store)
 		return edac_pci_dev->store(edac_pci_dev->value, buffer, count);
 	return -EIO;
 }
Re: Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory
Hi Paul,

Ben Hutchings wrote:

> If you can identify where it was fixed then your patch for older
> versions should go to stable with a reference to the upstream fix (see
> Documentation/stable_kernel_rules.txt).

How about this patch? It was applied in mainline during the 3.3 merge
window, so kernels newer than 3.2.y shouldn't need it.

-- >8 --
From: Johannes Weiner
Date: Tue, 10 Jan 2012 15:07:42 -0800
Subject: mm: exclude reserved pages from dirtyable memory

commit ab8fabd46f811d5153d8a0cd2fac9a0d41fb593d upstream.

Per-zone dirty limits try to distribute page cache pages allocated for
writing across zones in proportion to the individual zone sizes, to
reduce the likelihood of reclaim having to write back individual pages
from the LRU lists in order to make progress.

This patch:

The amount of dirtyable pages should not include the full number of
free pages: there is a number of reserved pages that the page allocator
and kswapd always try to keep free.

The closer (reclaimable pages - dirty pages) is to the number of
reserved pages, the more likely it becomes for reclaim to run into
dirty pages:

       +----------+ ---
       |   anon   |  |
       +----------+  |
       |          |  |
       |          |  -- dirty limit new    -- flusher new
       |   file   |  |                          |
       |          |  |                          |
       |          |  -- dirty limit old    -- flusher old
       |          |  |                          |
       +----------+ --- reclaim
       | reserved |
       +----------+
       |  kernel  |
       +----------+

This patch introduces a per-zone dirty reserve that takes both the
lowmem reserve as well as the high watermark of the zone into account,
and a global sum of those per-zone values that is subtracted from the
global amount of dirtyable pages.  The lowmem reserve is unavailable to
page cache allocations and kswapd tries to keep the high watermark
free.  We don't want to end up in a situation where reclaim has to
clean pages in order to balance zones.

Not treating reserved pages as dirtyable on a global level is only a
conceptual fix.  In reality, dirty pages are not distributed equally
across zones and reclaim runs into dirty pages on a regular basis.
But it is important to get this right before tackling the problem on a
per-zone level, where the distance between reclaim and the dirty pages
is mostly much smaller in absolute numbers.

[a...@linux-foundation.org: fix highmem build]
Signed-off-by: Johannes Weiner
Reviewed-by: Rik van Riel
Reviewed-by: Michal Hocko
Reviewed-by: Minchan Kim
Acked-by: Mel Gorman
Cc: KAMEZAWA Hiroyuki
Cc: Christoph Hellwig
Cc: Wu Fengguang
Cc: Dave Chinner
Cc: Jan Kara
Cc: Shaohua Li
Cc: Chris Mason
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Jonathan Nieder
---
 include/linux/mmzone.h |  6 ++
 include/linux/swap.h   |  1 +
 mm/page-writeback.c    |  5 +++--
 mm/page_alloc.c        | 19 +++
 4 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 25842b6e72e1..a594af3278bc 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -319,6 +319,12 @@ struct zone {
 	 */
 	unsigned long		lowmem_reserve[MAX_NR_ZONES];
 
+	/*
+	 * This is a per-zone reserve of pages that should not be
+	 * considered dirtyable memory.
+	 */
+	unsigned long		dirty_balance_reserve;
+
 #ifdef CONFIG_NUMA
 	int node;
 	/*
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 67b3fa308988..3e60228e7299 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -207,6 +207,7 @@ struct swap_list_t {
 /* linux/mm/page_alloc.c */
 extern unsigned long totalram_pages;
 extern unsigned long totalreserve_pages;
+extern unsigned long dirty_balance_reserve;
 extern unsigned int nr_free_buffer_pages(void);
 extern unsigned int nr_free_pagecache_pages(void);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 50f08241f981..f620e7b0dc26 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -320,7 +320,7 @@ static unsigned long highmem_dirtyable_memory(unsigned long total)
 			&NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
 
 		x += zone_page_state(z, NR_FREE_PAGES) +
-		     zone_reclaimable_pages(z);
+		     zone_reclaimable_pages(z) - z->dirty_balance_reserve;
 	}
 	/*
 	 * Make sure that the number of highmem pages is never larger
@@ -344,7 +344,8 @@ unsigned long determine_dirtyable_memory(void)
 {
 	unsigned long x;
 
-	x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
+	x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages() -
+	    dirty_balance_reserve;
 
 	if (!vm_highmem_is_dirtyable)
 		x -= highmem_dirtyable_memory(x);
diff --git a/mm/page_alloc.c b/mm/pag
Re: [PATCH 1/2] spi: spi-gpio: Add checks for the dt properties
On Fri, Jan 25, 2013 at 09:39:34AM +0100, Maxime Ripard wrote:

> The bindings assumed that the gpios properties were always there, which
> made the NO_TX and NO_RX mode not usable from device tree. Add extra
> checks to make sure that the driver can work if either MOSI or MISO is
> not used.

Applied, thanks.
Re: [PATCH] regulators/db8500: Fix compile failure for drivers/regulator/dbx500-prcmu.c
On Thu, Jan 24, 2013 at 10:29:26AM -0500, Steven Rostedt wrote:

> Building for the snowball board, I ran into this compile failure:

Applied, thanks. Please use subject lines appropriate for the subsystem
(I see I let the original one through).
Re: [PATCH v3 04/10] spi/pxa2xx: convert to the common clk framework
On Tue, Jan 22, 2013 at 12:26:27PM +0200, Mika Westerberg wrote:

> Convert clk_enable() to clk_prepare_enable() and clk_disable() to
> clk_disable_unprepare() respectively in order to support the common clk
> framework. Otherwise we get warnings on the console as the clock is not
> prepared before it is enabled.

Applied, thanks.
Re: [PATCH v3 03/10] spi/pxa2xx: convert to the pump message infrastructure
On Tue, Jan 22, 2013 at 12:26:26PM +0200, Mika Westerberg wrote:

> The SPI core provides infrastructure for standard message queueing so use
> that instead of handling everything in the driver. This simplifies the
> driver.

Applied, thanks.
Re: [PATCH v3 02/10] spi/pxa2xx: fix warnings when compiling a 64-bit kernel
On Tue, Jan 22, 2013 at 12:26:25PM +0200, Mika Westerberg wrote:

> Fix following warnings seen when compiling 64-bit:

Applied, thanks.
Re: [PATCH v3 01/10] spi/pxa2xx: allow building on a 64-bit kernel
On Tue, Jan 22, 2013 at 12:26:24PM +0200, Mika Westerberg wrote:

> We are going to use it on 64-bit kernel on Intel Lynxpoint so make sure we
> can build it into such kernel.

Applied, thanks.
Re: [PATCH -v4 0/5] x86,smp: make ticket spinlock proportional backoff w/ auto tuning
On Fri, 2013-01-25 at 14:05 -0500, Rik van Riel wrote:

> The performance issue observed with AIM7 is still a mystery.

Hm.  The AIM7 mystery _may_ be the same crud I see on a 4 node 40 core
box.  Stock scheduler knobs are too preempt happy and produce unstable
results, so I twiddle them as below to stabilize results.

I'm testing a load balancing series from Alex Shi with AIM7 and
whatnot, and added your series on top of it and retested.  What I see
is improvement.  Oodles of numbers follow.  Sorry that your numbers are
mixed in with my numbers, but this is just an excerpt from my test log,
and I'm too lazy to reformat and filter.  You can save wear and tear on
your eyeballs by just poking 'D'.

There does appear to be evidence that your patch set improved this load
though, so in case you want to see numbers, here come a bunch; a quick
scroll-by may be worth it.  The very heavy load end did not improve,
which seems odd, but whatever.

Numbers...

sched_latency_ns = 24ms
sched_min_granularity_ns = 8ms
sched_wakeup_granularity_ns = 10ms

aim7 compute
           3.8.0-performance                        3.8.0-balance                            3.8.0-powersaving
Tasks   jobs/min jti jobs/min/task  real    cpu    jobs/min jti jobs/min/task  real    cpu    jobs/min jti jobs/min/task  real    cpu
    1     432.86 100      432.8571 14.00   3.99      433.48 100      433.4764 13.98   3.97      433.17 100      433.1665 13.99   3.98
    1     437.23 100      437.2294 13.86   3.85      436.60 100      436.5994 13.88   3.86      435.66 100      435.6578 13.91   3.90
    1     434.10 100      434.0974 13.96   3.95      436.29 100      436.2851 13.89   3.89      436.29 100      436.2851 13.89   3.87
    5    2400.95  99      480.1902 12.62  12.49     2554.81  98      510.9612 11.86   7.55     2487.68  98      497.5369 12.18   8.22
    5    2341.58  99      468.3153 12.94  13.95     2578.72  99      515.7447 11.75   7.25     2527.11  99      505.4212 11.99   7.90
    5    2350.66  99      470.1319 12.89  13.66     2600.86  99      520.1717 11.65   7.09     2508.28  98      501.6556 12.08   8.24
   10    4291.78  99      429.1785 14.12  40.14     5334.51  99      533.4507 11.36  11.13     5183.92  98      518.3918 11.69  12.15
   10    4334.76  99      433.4764 13.98  38.70     5311.13  99      531.1131 11.41  11.23     5215.15  99      521.5146 11.62  12.53
   10    4273.62  99      427.3625 14.18  40.29     5287.96  99      528.7958 11.46  11.46     5144.31  98      514.4312 11.78  12.32
   20    8487.39  94      424.3697 14.28  63.14    10594.41  99      529.7203 11.44  23.72    10575.92  99      528.7958 11.46  22.08
   20    8387.54  97      419.3772 14.45  77.01    10575.92  98      528.7958 11.46  23.41    10520.83  99      526.0417 11.52  21.88
   20    8713.16  95      435.6578 13.91  55.10    10659.63  99      532.9815 11.37  24.17    10539.13  99      526.9565 11.50  22.13
   40   16786.70  99      419.6676 14.44 170.08    19469.88  98      486.7470 12.45  60.78    19967.05  98      499.1763 12.14  51.40
   40   16728.78  99      418.2195 14.49 172.96    19627.53  98      490.6883 12.35  65.26    20386.88  98      509.6720 11.89  46.91
   40   16763.49  99      419.0871 14.46 171.42    20033.06  98      500.8264 12.10  51.44    20682.59  98      517.0648 11.72  42.45
   80   33024.52  98      412.8065 14.68 355.10    33205.48  98      415.0685 14.60 336.90    33690.06  97      421.1258 14.39 248.91
   80   33002.04  99      412.5255 14.69 356.27    33949.58  96      424.3697 14.28 283.87    33160.05  97      414.5007 14.62 264.85
   80   33047.03  99      413.0879 14.67 355.22    33137.39  98      414.2174 14.63 338.92    33526.97  97      419.0871 14.46 257.31
  160   64254.47  98      401.5905 15.09 391.30    64000.00  98      400.0000 15.15 396.87    65073.83  97      406.7114 14.90 371.09
  160   64468.09  98      402.9255 15.04 390.28    64553.93  98      403.4621 15.02 389.49    64640.00  98      404.0000 15.00 379.82
  160   64297.08  98      401.8568 15.08 389.45    64856.19  98      405.3512 14.95 383.64    64683.12  98      404.2695 14.99 379.43
  320  121579.94  98
Re: [PATCH RESEND] ARM: dts: max77686: Add DTS file for max77686 PMIC
Hello Mark,

Yes, this is not an ARM-specific chip at all. I just wanted it to be
reviewed by you and others to check whether the format is OK before
integrating it into my board file. I had sent a similar one before,
https://patchwork.kernel.org/patch/1287711, and you advised that it was
too board specific. I plan to integrate it the way OMAP boards do with
twl6030.dtsi and twl6040.dtsi. It would be nice if you could specify
somewhere a directory for such device files.

Regards,
Dongjin.

On Sat, Jan 26, 2013 at 2:06 PM, Mark Brown wrote:
> On Fri, Jan 25, 2013 at 03:46:08AM +0900, Dongjin Kim wrote:
>
>> ---
>>  arch/arm/boot/dts/max77686.dtsi | 156 +++
>
> Why is this in arch/arm? This isn't an ARM-specific chip.
[RFC PATCH 2/4] lib: add support for LZ4-compressed kernels
This patch adds support for extracting LZ4-compressed kernel images, as
well as LZ4-compressed ramdisk images, in the kernel boot process.

This depends on the patch below:

  decompressors: add lz4 decompressor module

Signed-off-by: Kyungsik Lee
---
 include/linux/decompress/unlz4.h |  10 ++
 init/Kconfig                     |  13 ++-
 lib/Kconfig                      |   7 ++
 lib/Makefile                     |   2 +
 lib/decompress.c                 |   5 +
 lib/decompress_unlz4.c           | 199 +++
 lib/lz4/Makefile                 |   1 +
 lib/lz4/lz4_decompress.c         |   2 +-
 scripts/Makefile.lib             |   5 +
 usr/Kconfig                      |   9 ++
 10 files changed, 251 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/decompress/unlz4.h
 create mode 100644 lib/decompress_unlz4.c
 create mode 100644 lib/lz4/Makefile

diff --git a/include/linux/decompress/unlz4.h b/include/linux/decompress/unlz4.h
new file mode 100644
index 000..d5b68bf
--- /dev/null
+++ b/include/linux/decompress/unlz4.h
@@ -0,0 +1,10 @@
+#ifndef DECOMPRESS_UNLZ4_H
+#define DECOMPRESS_UNLZ4_H
+
+int unlz4(unsigned char *inbuf, int len,
+	int(*fill)(void*, unsigned int),
+	int(*flush)(void*, unsigned int),
+	unsigned char *output,
+	int *pos,
+	void(*error)(char *x));
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index 1aefe1a..be3753e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -102,10 +102,13 @@ config HAVE_KERNEL_XZ
 config HAVE_KERNEL_LZO
 	bool
 
+config HAVE_KERNEL_LZ4
+	bool
+
 choice
 	prompt "Kernel compression mode"
 	default KERNEL_GZIP
-	depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA || HAVE_KERNEL_XZ || HAVE_KERNEL_LZO
+	depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA || HAVE_KERNEL_XZ || HAVE_KERNEL_LZO || HAVE_KERNEL_LZ4
 	help
 	  The linux kernel is a kind of self-extracting executable.
 	  Several compression algorithms are available, which differ
@@ -172,6 +175,14 @@ config KERNEL_LZO
 	  size is about 10% bigger than gzip; however its speed
 	  (both compression and decompression) is the fastest.
 
+config KERNEL_LZ4
+	bool "LZ4"
+	depends on HAVE_KERNEL_LZ4
+	help
+	  Its compression ratio is worse than LZO. The size of the kernel
+	  is about 5% bigger than LZO. But the decompression speed is
+	  faster than LZO.
+
 endchoice
 
 config DEFAULT_HOSTNAME
diff --git a/lib/Kconfig b/lib/Kconfig
index 75cdb77..b108047 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -189,6 +189,9 @@ config LZO_COMPRESS
 config LZO_DECOMPRESS
 	tristate
 
+config LZ4_DECOMPRESS
+	tristate
+
 source "lib/xz/Kconfig"
 
 #
@@ -213,6 +216,10 @@ config DECOMPRESS_LZO
 	select LZO_DECOMPRESS
 	tristate
 
+config DECOMPRESS_LZ4
+	select LZ4_DECOMPRESS
+	tristate
+
 #
 # Generic allocator support is selected if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index 02ed6c0..c2073bf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -72,6 +72,7 @@ obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
 obj-$(CONFIG_BCH) += bch.o
 obj-$(CONFIG_LZO_COMPRESS) += lzo/
 obj-$(CONFIG_LZO_DECOMPRESS) += lzo/
+obj-$(CONFIG_LZ4_DECOMPRESS) += lz4/
 obj-$(CONFIG_XZ_DEC) += xz/
 obj-$(CONFIG_RAID6_PQ) += raid6/
 
@@ -80,6 +81,7 @@ lib-$(CONFIG_DECOMPRESS_BZIP2) += decompress_bunzip2.o
 lib-$(CONFIG_DECOMPRESS_LZMA) += decompress_unlzma.o
 lib-$(CONFIG_DECOMPRESS_XZ) += decompress_unxz.o
 lib-$(CONFIG_DECOMPRESS_LZO) += decompress_unlzo.o
+lib-$(CONFIG_DECOMPRESS_LZ4) += decompress_unlz4.o
 
 obj-$(CONFIG_TEXTSEARCH) += textsearch.o
 obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
diff --git a/lib/decompress.c b/lib/decompress.c
index 31a8042..c70810e 100644
--- a/lib/decompress.c
+++ b/lib/decompress.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -31,6 +32,9 @@
 #ifndef CONFIG_DECOMPRESS_LZO
 # define unlzo NULL
 #endif
+#ifndef CONFIG_DECOMPRESS_LZ4
+# define unlz4 NULL
+#endif
 
 struct compress_format {
 	unsigned char magic[2];
@@ -45,6 +49,7 @@ static const struct compress_format compressed_formats[] __initdata = {
 	{ {0x5d, 0x00}, "lzma", unlzma },
 	{ {0xfd, 0x37}, "xz", unxz },
 	{ {0x89, 0x4c}, "lzo", unlzo },
+	{ {0x02, 0x21}, "lz4", unlz4 },
 	{ {0, 0}, NULL, NULL }
 };
diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c
new file mode 100644
index 000..6b6a8d0
--- /dev/null
+++ b/lib/decompress_unlz4.c
@@ -0,0 +1,199 @@
+/*
+ * LZ4 decompressor for the Linux kernel.
+ *
+ * Linux kernel adaptation:
+ * Copyright (C) 2013, LG Electronics, Kyungsik Lee
+ *
+ * Based on LZ4 implementation by Yann Collet.
+ *
+ * LZ4 - Fast LZ compression algorithm
+ * Copyright (C) 2011-2012, Yann Collet.
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution an
[RFC PATCH 3/4] arm: add support for LZ4-compressed kernels
This patch integrates the LZ4 decompression code into the arm pre-boot
code. It depends on the two patches below:

  lib: add support for LZ4-compressed kernels
  decompressors: add lz4 decompressor module

Signed-off-by: Kyungsik Lee
---
 arch/arm/Kconfig                      | 1 +
 arch/arm/boot/compressed/.gitignore   | 1 +
 arch/arm/boot/compressed/Makefile     | 3 ++-
 arch/arm/boot/compressed/decompress.c | 4
 arch/arm/boot/compressed/piggy.lz4.S  | 6 ++
 5 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/boot/compressed/piggy.lz4.S

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 91f8d78..1b3621d 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -37,6 +37,7 @@ config ARM
 	select HAVE_HW_BREAKPOINT if (PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7))
 	select HAVE_IDE if PCI || ISA || PCMCIA
 	select HAVE_KERNEL_GZIP
+	select HAVE_KERNEL_LZ4
 	select HAVE_KERNEL_LZMA
 	select HAVE_KERNEL_LZO
 	select HAVE_KERNEL_XZ
diff --git a/arch/arm/boot/compressed/.gitignore b/arch/arm/boot/compressed/.gitignore
index f79a08e..47279aa 100644
--- a/arch/arm/boot/compressed/.gitignore
+++ b/arch/arm/boot/compressed/.gitignore
@@ -6,6 +6,7 @@ piggy.gzip
 piggy.lzo
 piggy.lzma
 piggy.xzkern
+piggy.lz4
 vmlinux
 vmlinux.lds
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 5cad8a6..8b5c79a 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -88,6 +88,7 @@ suffix_$(CONFIG_KERNEL_GZIP) = gzip
 suffix_$(CONFIG_KERNEL_LZO)  = lzo
 suffix_$(CONFIG_KERNEL_LZMA) = lzma
 suffix_$(CONFIG_KERNEL_XZ)   = xzkern
+suffix_$(CONFIG_KERNEL_LZ4)  = lz4
 
 # Borrowed libfdt files for the ATAG compatibility mode
 
@@ -112,7 +113,7 @@ targets       := vmlinux vmlinux.lds \
 		 font.o font.c head.o misc.o $(OBJS)
 
 # Make sure files are removed during clean
-extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
+extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \
 		 lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
 
 ifeq ($(CONFIG_FUNCTION_TRACER),y)
diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c
index 9deb56a..a95f071 100644
--- a/arch/arm/boot/compressed/decompress.c
+++ b/arch/arm/boot/compressed/decompress.c
@@ -53,6 +53,10 @@ extern char * strstr(const char * s1, const char *s2);
 #include "../../../../lib/decompress_unxz.c"
 #endif
 
+#ifdef CONFIG_KERNEL_LZ4
+#include "../../../../lib/decompress_unlz4.c"
+#endif
+
 int do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x))
 {
 	return decompress(input, len, NULL, NULL, output, NULL, error);
diff --git a/arch/arm/boot/compressed/piggy.lz4.S b/arch/arm/boot/compressed/piggy.lz4.S
new file mode 100644
index 000..3d9a575
--- /dev/null
+++ b/arch/arm/boot/compressed/piggy.lz4.S
@@ -0,0 +1,6 @@
+	.section .piggydata,#alloc
+	.globl	input_data
+input_data:
+	.incbin	"arch/arm/boot/compressed/piggy.lz4"
+	.globl	input_data_end
+input_data_end:
-- 
1.8.0.3
[RFC PATCH 1/4] decompressors: add lz4 decompressor module
This patch adds support for LZ4 decompression in the kernel. The LZ4
decompression APIs for the kernel are based on the LZ4 implementation
by Yann Collet.

LZ4 homepage : http://fastcompression.blogspot.com/p/lz4.html
LZ4 source repository : http://code.google.com/p/lz4/

Signed-off-by: Kyungsik Lee
---
 include/linux/lz4.h      |  62 +++
 lib/lz4/lz4_decompress.c | 199 +++
 lib/lz4/lz4defs.h        | 129 ++
 3 files changed, 390 insertions(+)
 create mode 100644 include/linux/lz4.h
 create mode 100644 lib/lz4/lz4_decompress.c
 create mode 100644 lib/lz4/lz4defs.h

diff --git a/include/linux/lz4.h b/include/linux/lz4.h
new file mode 100644
index 000..df03dd8
--- /dev/null
+++ b/include/linux/lz4.h
@@ -0,0 +1,62 @@
+#ifndef __LZ4_H__
+#define __LZ4_H__
+/*
+ * LZ4 Decompressor Kernel Interface
+ *
+ * Copyright (C) 2013, LG Electronics, Kyungsik Lee
+ * Based on LZ4 implementation by Yann Collet.
+ *
+ * LZ4 - Fast LZ compression algorithm
+ * Copyright (C) 2011-2012, Yann Collet.
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following disclaimer
+ * in the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * You can contact the author at :
+ * - LZ4 homepage : http://fastcompression.blogspot.com/p/lz4.html
+ * - LZ4 source repository : http://code.google.com/p/lz4/
+ */
+
+
+/*
+ * LZ4_COMPRESSBOUND()
+ * Provides the maximum size that LZ4 may output in a "worst case" scenario
+ * (input data not compressible)
+ */
+#define LZ4_COMPRESSBOUND(isize) (isize + ((isize)/255) + 16)
+
+/*
+ * lz4_decompress()
+ *	src     : source address of the compressed data
+ *	src_len : is the input size, therefore the compressed size
+ *	dest    : output buffer address of the decompressed data
+ *	dest_len: is the size of the destination buffer
+ *		  (which must be already allocated)
+ *	return  : Success if return 0
+ *		  Error if return (< 0)
+ *	note    : Destination buffer must be already allocated.
+ */
+int lz4_decompress(const char *src, size_t src_len, char *dest,
+		size_t *dest_len);
+#endif
diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c
new file mode 100644
index 000..e8beb6b
--- /dev/null
+++ b/lib/lz4/lz4_decompress.c
@@ -0,0 +1,199 @@
+/*
+ * LZ4 Decompressor for Linux kernel
+ *
+ * Copyright (C) 2013 LG Electronics Co., Ltd. (http://www.lge.com/)
+ *
+ * Based on LZ4 implementation by Yann Collet.
+ *
+ * LZ4 - Fast LZ compression algorithm
+ * Copyright (C) 2011-2012, Yann Collet.
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following disclaimer
+ * in the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, O
[RFC PATCH 4/4] x86: add support for LZ4-compressed kernels
This patch integrates the LZ4 decompression code into the x86 pre-boot
code. It depends on the two patches below:

  lib: add support for LZ4-compressed kernels
  decompressors: add lz4 decompressor module

Signed-off-by: Kyungsik Lee
---
 arch/x86/Kconfig                  | 1 +
 arch/x86/boot/compressed/Makefile | 5 +-
 arch/x86/boot/compressed/misc.c   | 4
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8c185d0..7142bef 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -62,6 +62,7 @@ config X86
 	select HAVE_KERNEL_LZMA
 	select HAVE_KERNEL_XZ
 	select HAVE_KERNEL_LZO
+	select HAVE_KERNEL_LZ4
 	select HAVE_HW_BREAKPOINT
 	select HAVE_MIXED_BREAKPOINTS_REGS
 	select PERF_EVENTS
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 8a84501..c275db5 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -4,7 +4,7 @@
 # create a compressed vmlinux image from the original vmlinux
 #
 
-targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma vmlinux.bin.xz vmlinux.bin.lzo head_$(BITS).o misc.o string.o cmdline.o early_serial_console.o piggy.o
+targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4 head_$(BITS).o misc.o string.o cmdline.o early_serial_console.o piggy.o
 
 KBUILD_CFLAGS := -m$(BITS) -D__KERNEL__ $(LINUX_INCLUDE) -O2
 KBUILD_CFLAGS += -fno-strict-aliasing -fPIC
@@ -64,12 +64,15 @@ $(obj)/vmlinux.bin.xz: $(vmlinux.bin.all-y) FORCE
 	$(call if_changed,xzkern)
 $(obj)/vmlinux.bin.lzo: $(vmlinux.bin.all-y) FORCE
 	$(call if_changed,lzo)
+$(obj)/vmlinux.bin.lz4: $(vmlinux.bin.all-y) FORCE
+	$(call if_changed,lz4)
 
 suffix-$(CONFIG_KERNEL_GZIP)	:= gz
 suffix-$(CONFIG_KERNEL_BZIP2)	:= bz2
 suffix-$(CONFIG_KERNEL_LZMA)	:= lzma
 suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO)	:= lzo
+suffix-$(CONFIG_KERNEL_LZ4)	:= lz4
 
 quiet_cmd_mkpiggy = MKPIGGY $@
       cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 88f7ff6..166a0a8 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -145,6 +145,10 @@ static int lines, cols;
 #include "../../../../lib/decompress_unlzo.c"
 #endif
 
+#ifdef CONFIG_KERNEL_LZ4
+#include "../../../../lib/decompress_unlz4.c"
+#endif
+
 static void scroll(void)
 {
 	int i;
-- 
1.8.0.3
Re: [PATCHv2] ARM: mxs: dt: Add Crystalfontz CFA-10037 device tree support
On Fri, Jan 25, 2013 at 10:00:35AM +0100, Maxime Ripard wrote:

> The CFA-10037 is another expansion board for the CFA-10036 module, with
> only a USB host, an Ethernet device and a lot of gpios.
>
> Signed-off-by: Maxime Ripard

Applied, thanks.
Re: [PATCH v2 1/3] pwm: Add pwm_cansleep() as exported API to users
On Fri, Jan 25, 2013 at 02:44:29PM +0100, Florian Vaussard wrote:

> Calls to some external PWM chips can sleep. To help users,
> add pwm_cansleep() API.
>
> Signed-off-by: Florian Vaussard
> ---
>  drivers/pwm/core.c  | 12
>  include/linux/pwm.h | 10 ++
>  2 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pwm/core.c b/drivers/pwm/core.c
> index 4a13da4..e737f5f 100644
> --- a/drivers/pwm/core.c
> +++ b/drivers/pwm/core.c
> @@ -763,6 +763,18 @@ void devm_pwm_put(struct device *dev, struct pwm_device *pwm)
>  }
>  EXPORT_SYMBOL_GPL(devm_pwm_put);
>
> +/**
> + * pwm_cansleep() - report whether pwm access will sleep

"... whether PWM access..." please.

> + * @pwm: PWM device
> + *
> + * It returns nonzero if accessing the PWM can sleep.
> + */
> +int pwm_cansleep(struct pwm_device *pwm)

I actually liked pwm_can_sleep() better. I find it to be more
consistent with the naming of the other functions. It would furthermore
match the field name.

> +{
> +	return pwm->chip->can_sleep;
> +}
> +EXPORT_SYMBOL_GPL(pwm_cansleep);

Would it make sense to check for NULL pointers here? I guess that
passing NULL into the function could be considered a programming error
and an oops would be okay, but in that case there's no point in making
the function return an int. Also see my next comment.
> + > #ifdef CONFIG_DEBUG_FS > static void pwm_dbg_show(struct pwm_chip *chip, struct seq_file *s) > { > diff --git a/include/linux/pwm.h b/include/linux/pwm.h > index 70655a2..e2cb5c7 100644 > --- a/include/linux/pwm.h > +++ b/include/linux/pwm.h > @@ -146,6 +146,8 @@ struct pwm_ops { > * @base: number of first PWM controlled by this chip > * @npwm: number of PWMs controlled by this chip > * @pwms: array of PWM devices allocated by the framework > + * @can_sleep: flag must be set iff config()/enable()/disable() methods > sleep, > + * as they must while accessing PWM chips over I2C or SPI > */ > struct pwm_chip { > struct device *dev; > @@ -159,6 +161,7 @@ struct pwm_chip { > struct pwm_device * (*of_xlate)(struct pwm_chip *pc, > const struct of_phandle_args *args); > unsigned intof_pwm_n_cells; > + unsigned intcan_sleep:1; What's the reason for making this a bitfield? Couldn't we just use a bool instead? Thierry pgpfBhH2wr1GB.pgp Description: PGP signature
[PATCH] regulator: lp8755: Use LP8755_BUCK_MAX instead of magic number
Signed-off-by: Axel Lin --- drivers/regulator/lp8755.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/regulator/lp8755.c b/drivers/regulator/lp8755.c index 8b1ce0f..f0f6ea0 100644 --- a/drivers/regulator/lp8755.c +++ b/drivers/regulator/lp8755.c @@ -373,7 +373,7 @@ static irqreturn_t lp8755_irq_handler(int irq, void *data) goto err_i2c; /* sent power fault detection event to specific regulator */ - for (icnt = 0; icnt < 6; icnt++) + for (icnt = 0; icnt < LP8755_BUCK_MAX; icnt++) if ((flag0 & (0x4 << icnt)) && (pchip->irqmask & (0x04 << icnt)) && (pchip->rdev[icnt] != NULL)) @@ -508,7 +508,7 @@ err_irq: err_regulator: /* output disable */ - for (icnt = 0; icnt < 0x06; icnt++) + for (icnt = 0; icnt < LP8755_BUCK_MAX; icnt++) lp8755_write(pchip, icnt, 0x00); return ret; @@ -522,7 +522,7 @@ static int lp8755_remove(struct i2c_client *client) for (icnt = 0; icnt < mphase_buck[pchip->mphase].nreg; icnt++) regulator_unregister(pchip->rdev[icnt]); - for (icnt = 0; icnt < 0x06; icnt++) + for (icnt = 0; icnt < LP8755_BUCK_MAX; icnt++) lp8755_write(pchip, icnt, 0x00); if (pchip->irq != 0) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mmc: fix NULL pointer dereference
Check whether host->sdio_irq_thread is NULL before calling wake_up_process() on it. Signed-off-by: Joonyoung Shim --- Currently a kernel panic from a NULL pointer dereference of host->sdio_irq_thread occurs on the trats board using the Samsung SDHCI driver. include/linux/mmc/host.h |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index 61a10c1..2950fea 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -372,7 +372,8 @@ static inline void mmc_signal_sdio_irq(struct mmc_host *host) { host->ops->enable_sdio_irq(host, 0); host->sdio_irq_pending = true; - wake_up_process(host->sdio_irq_thread); + if (host->sdio_irq_thread) + wake_up_process(host->sdio_irq_thread); } #ifdef CONFIG_REGULATOR -- 1.7.9.5
Re: [PATCH 5/6] ARM: regulator: add tps6507x device tree data
On Fri, Jan 25, 2013 at 06:29:49AM +, Vishwanathrao Badarkhe, Manish wrote: > On Thu, Jan 24, 2013 at 17:30:51, Mark Brown wrote: > I too doubt that whether it should be in architecture specific folder, > My code is in reference to below patch: > arm/dts: regulator: Add tps65910 device tree > data(d5d08e2e1672da627d7c9d34a9dc1089c653e23a) > Could you please suggest me if it can be moved somewhere else? We should have somewhere to put this sort of generic stuff, yes. Not sure where, possibly under drivers/of or some non-drivers part of the tree.
Re: [PATCH RESEND] ARM: dts: max77686: Add DTS file for max77686 PMIC
On Fri, Jan 25, 2013 at 03:46:08AM +0900, Dongjin Kim wrote: > --- > arch/arm/boot/dts/max77686.dtsi | 156 > +++ Why is this in arch/arm? This isn't an ARM-specific chip.
[PATCH v2] mm: clean up soft_offline_page()
Currently soft_offline_page() is hard to maintain because it has many return points and goto statements. All of this mess come from get_any_page(). This function should only get page refcount as the name implies, but it does some page isolating actions like SetPageHWPoison() and dequeuing hugepage. This patch corrects it and introduces some internal subroutines to make soft offlining code more readable and maintainable. ChangeLog v2: - receive returned value from __soft_offline_page and soft_offline_huge_page - place __soft_offline_page after soft_offline_page to reduce the diff - rebased onto mmotm-2013-01-23-17-04 - add comment on double checks of PageHWpoison Signed-off-by: Naoya Horiguchi --- mm/memory-failure.c | 154 1 file changed, 83 insertions(+), 71 deletions(-) diff --git mmotm-2013-01-23-17-04.orig/mm/memory-failure.c mmotm-2013-01-23-17-04/mm/memory-failure.c index c95e19a..302625b 100644 --- mmotm-2013-01-23-17-04.orig/mm/memory-failure.c +++ mmotm-2013-01-23-17-04/mm/memory-failure.c @@ -1368,7 +1368,7 @@ static struct page *new_page(struct page *p, unsigned long private, int **x) * that is not free, and 1 for any other page type. * For 1 the page is returned with increased page count, otherwise not. 
*/ -static int get_any_page(struct page *p, unsigned long pfn, int flags) +static int __get_any_page(struct page *p, unsigned long pfn, int flags) { int ret; @@ -1393,11 +1393,9 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags) if (!get_page_unless_zero(compound_head(p))) { if (PageHuge(p)) { pr_info("%s: %#lx free huge page\n", __func__, pfn); - ret = dequeue_hwpoisoned_huge_page(compound_head(p)); + ret = 0; } else if (is_free_buddy_page(p)) { pr_info("%s: %#lx free buddy page\n", __func__, pfn); - /* Set hwpoison bit while page is still isolated */ - SetPageHWPoison(p); ret = 0; } else { pr_info("%s: %#lx: unknown zero refcount page type %lx\n", @@ -1413,42 +1411,62 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags) return ret; } +static int get_any_page(struct page *page, unsigned long pfn, int flags) +{ + int ret = __get_any_page(page, pfn, flags); + + if (ret == 1 && !PageHuge(page) && !PageLRU(page)) { + /* +* Try to free it. +*/ + put_page(page); + shake_page(page, 1); + + /* +* Did it turn free? +*/ + ret = __get_any_page(page, pfn, 0); + if (!PageLRU(page)) { + pr_info("soft_offline: %#lx: unknown non LRU page type %lx\n", + pfn, page->flags); + return -EIO; + } + } + return ret; +} + static int soft_offline_huge_page(struct page *page, int flags) { int ret; unsigned long pfn = page_to_pfn(page); struct page *hpage = compound_head(page); + /* +* This double-check of PageHWPoison is to avoid the race with +* memory_failure(). See also comment in __soft_offline_page(). +*/ + lock_page(hpage); if (PageHWPoison(hpage)) { + unlock_page(hpage); + put_page(hpage); pr_info("soft offline: %#lx hugepage already poisoned\n", pfn); - ret = -EBUSY; - goto out; + return -EBUSY; } - - ret = get_any_page(page, pfn, flags); - if (ret < 0) - goto out; - if (ret == 0) - goto done; + unlock_page(hpage); /* Keep page count to indicate a given hugepage is isolated. 
*/ ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, false, MIGRATE_SYNC); put_page(hpage); - if (ret) { + if (ret) pr_info("soft offline: %#lx: migration failed %d, type %lx\n", pfn, ret, page->flags); - goto out; - } -done: /* keep elevated page count for bad page */ - atomic_long_add(1 << compound_trans_order(hpage), &num_poisoned_pages); - set_page_hwpoison_huge_page(hpage); - dequeue_hwpoisoned_huge_page(hpage); -out: return ret; } +static int __soft_offline_page(struct page *page, int flags); + /** * soft_offline_page - Soft offline a page. * @page: page to offline @@ -1477,62 +1495,60 @@ int soft_offline_page(struct page *page, int flags) unsigned long pfn = page_to_pfn(page); struct page *hpage = compound_trans_head(page); - if (PageHuge(page)) { - ret = soft_offline_huge_page(
Re: [PATCH 1/2] i2c-core: Add gpio based bus arbitration implementation
On Thu, Jan 24, 2013 at 12:39:48PM +0100, Wolfram Sang wrote: > On Thu, Jan 24, 2013 at 07:18:47PM +0800, Mark Brown wrote: > > A read is typically implemented as a write of the register address > > followed by a read of the value, usually with the ability to free the > > bus in between. If two devices attempt to access the register map > > simultaneously this results in the address going wrong. > Could happen. But in what situations will one not use repeated start > here? Especially when designing a multi-master bus? Well, you're depending on the specific drivers doing things that way and it's actually quite rare for the controller drivers in Linux to support I2C_M_NOSTART which discourages this.
Re: [PATCH 11/19] regmap: avoid undefined return from regmap_read_debugfs
On Sat, Jan 26, 2013 at 12:42:26PM +0800, Mark Brown wrote: > On Fri, Jan 25, 2013 at 02:14:28PM +, Arnd Bergmann wrote: > > Gcc warns about the case where regmap_read_debugfs tries > Are you sure about that function name? > > to walk an empty map->debugfs_off_cache list, which results > > in uninitialized variable getting returned. > > Setting this variable to 0 first avoids the warning and > > the potentially undefined value. > This probably won't apply against current code as there's already a > better fix there, in general just picking a value to initialise masks > errors. Resending with corrected list address; to be clear please don't send this.
Re: [PATCH 11/19] regmap: avoid undefined return from regmap_read_debugfs
On Fri, Jan 25, 2013 at 02:14:28PM +, Arnd Bergmann wrote: > Gcc warns about the case where regmap_read_debugfs tries Are you sure about that function name? > to walk an empty map->debugfs_off_cache list, which results > in uninitialized variable getting returned. > Setting this variable to 0 first avoids the warning and > the potentially undefined value. This probably won't apply against current code as there's already a better fix there, in general just picking a value to initialise masks errors.
Re: [PATCH 1/2] media: add support for decoder subdevs along with sensor and others
Hi Sylwester, On Sat, Jan 26, 2013 at 1:24 AM, Sylwester Nawrocki wrote: > Hi Prahakar, > > > On 01/25/2013 08:01 AM, Prabhakar Lad wrote: >> >> From: Manjunath Hadli >> >> A lot of SOCs including Texas Instruments Davinci family mainly use >> video decoders as input devices. Here the initial subdevice node >> from where the input really comes is this decoder, for which support >> is needed as part of the Media Controller infrastructure. This patch >> adds an additional flag to include the decoders along with others, >> such as the sensor and lens. >> >> Signed-off-by: Manjunath Hadli >> Signed-off-by: Lad, Prabhakar >> --- >> include/uapi/linux/media.h |1 + >> 1 files changed, 1 insertions(+), 0 deletions(-) >> >> diff --git a/include/uapi/linux/media.h b/include/uapi/linux/media.h >> index 0ef8833..fa44ed9 100644 >> --- a/include/uapi/linux/media.h >> +++ b/include/uapi/linux/media.h >> @@ -56,6 +56,7 @@ struct media_device_info { >> #define MEDIA_ENT_T_V4L2_SUBDEV_SENSOR(MEDIA_ENT_T_V4L2_SUBDEV + >> 1) >> #define MEDIA_ENT_T_V4L2_SUBDEV_FLASH (MEDIA_ENT_T_V4L2_SUBDEV + 2) >> #define MEDIA_ENT_T_V4L2_SUBDEV_LENS (MEDIA_ENT_T_V4L2_SUBDEV + 3) >> +#define MEDIA_ENT_T_V4L2_SUBDEV_DECODER(MEDIA_ENT_T_V4L2_SUBDEV + >> 4) > > > Such a new entity type needs to be documented in the media DocBook [1]. > It probably also deserves a comment here, as DECODER isn't that obvious > like the other already existing entity types. I heard people referring > to a device that encodes analog (composite) video signal into its digital > representation as an ENCODER. :) > > Thanks for pointing it :), I'll document it and post a v2. 
Regards, --Prabhakar Lad > [1] http://hverkuil.home.xs4all.nl/spec/media.html#media-ioc-enum-entities > > -- > > Regards, > Sylwester
Re: [PATCH v5 7/8] fat (exportfs): rebuild directory-inode if fat_dget() fails
Namjae Jeon writes: > 2013/1/20, OGAWA Hirofumi : >> Namjae Jeon writes: >> >>> We rewrite patch as your suggestion using dummy inode. Would please >>> you review below patch code ? >> >> Looks like good as initial. Clean and shorter. >> >> Next is, we have to think about race. I.e. if real inode was made, what >> happens? Is there no race? > Hi OGAWA. > > Although checking several routines to check hang case you said, I > didn't find anything. > And There is no any race on test result also. Am I missing something ? > Let me know your opinion. Hm, it's read-only. So, there may not be race for now, I'm sure there is race on write path though. Thanks. -- OGAWA Hirofumi
Re: [PATCH] Negative (setpoint-dirty) in bdi_position_ratio()
Dear Fengguang (et al), > There are 260MB reclaimable slab pages in the normal zone, however we > somehow failed to reclaim them. ... Could the problem be that without CONFIG_NUMA, zone_reclaim_mode stays at zero and anyway zone_reclaim() does nothing in include/linux/swap.h ? Though... there is no CONFIG_NUMA nor /proc/sys/vm/zone_reclaim_mode in the Ubuntu non-PAE "plain" HIGHMEM4G kernel, and still it handles the "sleep test" just fine. Where does reclaiming happen (or where is it meant to happen)? Thanks, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney, Australia
Re: [PATCH 1/2]linux-usb:Define a new macro for USB storage match rules
On Fri, Jan 25, 2013 at 07:10:29PM -0800, Matthew Dharm wrote: > I suggest one of two options: > > 1) Setup an alternative mail client. There are many to choose from > which will not damage your patches. I personally like 'mutt' (which > you should be able to install on your linux machine). Others may be > able to recommend ones that work for them; in general, I think you > will find that most e-mail clients that run on Linux will be suitable. The file, Documentation/email_clients.txt will help out here. > 2) If you plan on contributing to the linux kernel in the future, it > may be worth your time to setup a repo on github that Greg can then > directly pull from. All you would need to do is send Greg a "pull > request" indicating the URL of the branch in your repo that he should > pull from. Greg can then pull directly from your repo, bypassing this > issue entirely. No, sorry, I only pull trees from a _very_ few people, patches are what I prefer for almost all stuff. Only subsystem maintainers who I have been working with for many years will I pull trees from. sorry, greg k-h
Re: Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory
On Sat, 2013-01-26 at 14:07 +1100, paul.sz...@sydney.edu.au wrote: > Dear Ben, > > > ... the mm maintainers are probably much better placed ... > > Exactly. Now I wonder: are you one of them? Hah, no. Ben. -- Ben Hutchings Any smoothly functioning technology is indistinguishable from a rigged demo.
Re: [PATCH 1/2]linux-usb:Define a new macro for USB storage match rules
On Fri, Jan 25, 2013 at 6:05 PM, Greg KH wrote: > On Sat, Jan 26, 2013 at 01:39:50AM +, Fangxiaozhi (Franko) wrote: >> >> >> > -Original Message- >> > From: Greg KH [mailto:g...@kroah.com] >> > Sent: Saturday, January 26, 2013 1:45 AM >> > To: Fangxiaozhi (Franko) >> > Cc: Sergei Shtylyov; linux-...@vger.kernel.org; >> > linux-kernel@vger.kernel.org; >> > Xueguiying (Zihan); Linlei (Lei Lin); Yili (Neil); Wangyuhua (Roger, >> > Credit); >> > Huqiao (C); ba...@ti.com; mdharm-...@one-eyed-alien.net; >> > sebast...@breakpoint.cc >> > Subject: Re: [PATCH 1/2]linux-usb:Define a new macro for USB storage match >> > rules >> > >> > On Fri, Jan 25, 2013 at 04:18:34PM +0400, Sergei Shtylyov wrote: >> > > Hello. >> > > >> > > On 25-01-2013 6:44, fangxiaozhi 00110321 wrote: >> > > >> > > >From: fangxiaozhi >> > > >> > > >1. Define a new macro for USB storage match rules: >> > > > matching with Vendor ID and interface descriptors. >> > > >> > > >Signed-off-by: fangxiaozhi >> > > > >> > > > >> > > > diff -uprN linux-3.8-rc4_orig/drivers/usb/storage/usb.c >> > > >linux-3.8-rc4/drivers/usb/storage/usb.c >> > > >--- linux-3.8-rc4_orig/drivers/usb/storage/usb.c 2013-01-22 >> > > >14:12:42.595238727 +0800 >> > > >+++ linux-3.8-rc4/drivers/usb/storage/usb.c 2013-01-22 >> > > >+++ 14:16:01.398250305 +0800 >> > > >@@ -120,6 +120,17 @@ MODULE_PARM_DESC(quirks, "supplemental l >> > > > .useTransport = use_transport, \ >> > > > } >> > > > >> > > >+#define UNUSUAL_VENDOR_INTF(idVendor, cl, sc, pr, \ >> > > >+ vendor_name, product_name, use_protocol, use_transport, \ >> > > >+ init_function, Flags) \ >> > > >+{ \ >> > > >+ .vendorName = vendor_name, \ >> > > >+ .productName = product_name, \ >> > > >+ .useProtocol = use_protocol, \ >> > > >+ .useTransport = use_transport, \ >> > > >+ .initFunction = init_function, \ >> > > >+} >> > > >> > > Shouldn't the field initilaizers be indented with tab, not space? >> > >> > Yes it must. 
fangxiaozhi, please always run your patches through the >> > scripts/checkpatch.pl tool before sending them out (note, you will have to >> > ignore the CamelCase warnings your patch produces, but not the other >> > ones.) >> > >> -What's wrong with it? >> -I have checked the patches with scripts/checkpatch.pl before sending. >> -There is no other warning or error in my patches except CamelCase >> warnings. >> -So what's wrong now? > > Then your email client messed up the patches and put spaces in the code > instead of tabs. Try looking at the message on the mailing list and run > that through checkpatch, it will show you the problems. > > What I received isn't ok, sorry. Fangxiaozhi -- According to the headers of your E-mail, you are using MS Outlook to send your patches. Outlook commonly mangles patches, unfortunately. It is not a very good e-mail client. I suggest one of two options: 1) Setup an alternative mail client. There are many to choose from which will not damage your patches. I personally like 'mutt' (which you should be able to install on your linux machine). Others may be able to recommend ones that work for them; in general, I think you will find that most e-mail clients that run on Linux will be suitable. 2) If you plan on contributing to the linux kernel in the future, it may be worth your time to setup a repo on github that Greg can then directly pull from. All you would need to do is send Greg a "pull request" indicating the URL of the branch in your repo that he should pull from. Greg can then pull directly from your repo, bypassing this issue entirely. Matt -- Matthew Dharm Maintainer, USB Mass Storage driver for Linux -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory
Dear Ben, > ... the mm maintainers are probably much better placed ... Exactly. Now I wonder: are you one of them? Thanks, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney, Australia
Re: [PATCH] extcon: arizona: Use regulated mode for microphone supply when detecting
On 01/25/2013 06:16 PM, Mark Brown wrote: When starting microphone detection some headsets should be exposed to the fully regulated microphone bias in order to ensure that they behave in an optimal fashion. Signed-off-by: Mark Brown --- drivers/extcon/Kconfig |2 +- drivers/extcon/extcon-arizona.c | 62 +++ 2 files changed, 63 insertions(+), 1 deletion(-) Applied it. Thanks, Chanwoo Choi
Re: [PATCH v2] userns: improve uid/gid map collision detection
Aristeu Rozanski writes: > On Thu, Jan 24, 2013 at 04:46:12PM -0800, Andrew Morton wrote: >> eek, a macro! Macros are always bad. >> >> This one is bad because >> >> a) it's a macro >> >> b) it evaluates its args multiple times and hence will cause nasty >>bugs if called with expressions-with-side-effects. >> >> c) it evaluates its args multiple times and if called with >>non-trivial expressions the compiler might not be able to CSE those >>expressions, leading to code bloat. >> >> Add lo, this patch: >> >> --- >> a/kernel/user_namespace.c~userns-improve-uid-gid-map-collision-detection-fix >> +++ a/kernel/user_namespace.c >> @@ -521,7 +521,11 @@ struct seq_operations proc_projid_seq_op >> >> static DEFINE_MUTEX(id_map_mutex); >> >> -#define in_range(b,first,len) ((b)>=(first)&&(b)<(first)+(len)) >> +static bool in_range(u32 b, u32 first, u32 len) >> +{ >> +return b >= first && b < first + len; >> +} >> + >> static inline int extent_collision(struct uid_gid_map *new_map, >> struct uid_gid_extent *extent) >> { >> >> reduces the user_namespace.o text from 4822 bytes to 4727 with >> gcc-4.4.4. This is a remarkably large difference. > > thanks Andrew > > (I see Eric already answered about the config option) Aritsteu after looking at both my version and yours I am going with mine. While my code is a little wordier I have half the number of comparisons your code does, and I took the time to kill the variable introducing a function to test for range collisions makes unnecessary. On Andrews size metric my version seems noticably smaller as well. size $PWD-build/kernel/user_namespace.o textdata bss dec hex filename 4376 144 0452011a8 /home/eric/projects/linux/linux-userns-devel-build/kernel/user_namespace.o Short of something unexpected I plan to push all my code to linux-next sometime tomorrow. 
Eric
[PATCH review 6/6] userns: Allow the userns root to mount tmpfs.
There is no backing store to tmpfs and file creation rules are the same as for any other filesystem so it is semantically safe to allow unprivileged users to mount it. ramfs is safe for the same reasons so allow either flavor of tmpfs to be mounted by a user namespace root user. The memory control group successfully limits how much memory tmpfs can consume on any system that cares about a user namespace root using tmpfs to exhaust memory the memory control group can be deployed. Signed-off-by: "Eric W. Biederman" --- mm/shmem.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 5c90d84..197ca5e 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2766,6 +2766,7 @@ static struct file_system_type shmem_fs_type = { .name = "tmpfs", .mount = shmem_mount, .kill_sb= kill_litter_super, + .fs_flags = FS_USERNS_MOUNT, }; int __init shmem_init(void) @@ -2823,6 +2824,7 @@ static struct file_system_type shmem_fs_type = { .name = "tmpfs", .mount = ramfs_mount, .kill_sb= kill_litter_super, + .fs_flags = FS_USERNS_MOUNT, }; int __init shmem_init(void) -- 1.7.5.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH review 5/6] userns: Allow the userns root to mount ramfs.
There is no backing store to ramfs and file creation rules are the same as for any other filesystem so it is semantically safe to allow unprivileged users to mount it. The memory control group successfully limits how much memory ramfs can consume on any system that cares about a user namespace root using ramfs to exhaust memory the memory control group can be deployed. Signed-off-by: "Eric W. Biederman" --- fs/ramfs/inode.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c index eab8c09..c24f1e1 100644 --- a/fs/ramfs/inode.c +++ b/fs/ramfs/inode.c @@ -260,6 +260,7 @@ static struct file_system_type ramfs_fs_type = { .name = "ramfs", .mount = ramfs_mount, .kill_sb= ramfs_kill_sb, + .fs_flags = FS_USERNS_MOUNT, }; static struct file_system_type rootfs_fs_type = { .name = "rootfs", -- 1.7.5.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH review 4/6] userns: Allow the userns root to mount devpts
- The context in which devpts is mounted has no effect on the creation of ptys as the /dev/ptmx interface has been used by unprivileged users for many years. - Only support unprivileged mounts in combination with the newinstance option to ensure that mounting of /dev/pts in a user namespace will not allow the options of an existing mount of devpts to be modified. - Create /dev/pts/ptmx as the root user in the user namespace that mounts devpts so that it's permissions to be changed. Signed-off-by: "Eric W. Biederman" --- fs/devpts/inode.c | 18 ++ 1 files changed, 18 insertions(+), 0 deletions(-) diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c index 472e6be..073d30b 100644 --- a/fs/devpts/inode.c +++ b/fs/devpts/inode.c @@ -243,6 +243,13 @@ static int mknod_ptmx(struct super_block *sb) struct dentry *root = sb->s_root; struct pts_fs_info *fsi = DEVPTS_SB(sb); struct pts_mount_opts *opts = &fsi->mount_opts; + kuid_t root_uid; + kgid_t root_gid; + + root_uid = make_kuid(current_user_ns(), 0); + root_gid = make_kgid(current_user_ns(), 0); + if (!uid_valid(root_uid) || !gid_valid(root_gid)) + return -EINVAL; mutex_lock(&root->d_inode->i_mutex); @@ -273,6 +280,8 @@ static int mknod_ptmx(struct super_block *sb) mode = S_IFCHR|opts->ptmxmode; init_special_inode(inode, mode, MKDEV(TTYAUX_MAJOR, 2)); + inode->i_uid = root_uid; + inode->i_gid = root_gid; d_add(dentry, inode); @@ -438,6 +447,12 @@ static struct dentry *devpts_mount(struct file_system_type *fs_type, if (error) return ERR_PTR(error); + /* Require newinstance for all user namespace mounts to ensure +* the mount options are not changed. 
+*/ + if ((current_user_ns() != &init_user_ns) && !opts.newinstance) + return ERR_PTR(-EINVAL); + if (opts.newinstance) s = sget(fs_type, NULL, set_anon_super, flags, NULL); else @@ -491,6 +506,9 @@ static struct file_system_type devpts_fs_type = { .name = "devpts", .mount = devpts_mount, .kill_sb= devpts_kill_sb, +#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, +#endif }; /* -- 1.7.5.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH review 3/6] userns: Recommend use of memory control groups.
In the help text describing user namespaces recommend use of memory control groups. In many cases memory control groups are the only mechanism there is to limit how much memory a user who can create user namespaces can use. Signed-off-by: "Eric W. Biederman" --- Documentation/namespaces/resource-control.txt | 10 ++ init/Kconfig |7 +++ 2 files changed, 17 insertions(+), 0 deletions(-) create mode 100644 Documentation/namespaces/resource-control.txt diff --git a/Documentation/namespaces/resource-control.txt b/Documentation/namespaces/resource-control.txt new file mode 100644 index 000..3d8178a --- /dev/null +++ b/Documentation/namespaces/resource-control.txt @@ -0,0 +1,10 @@ +There are a lot of kinds of objects in the kernel that don't have +individual limits or that have limits that are ineffective when a set +of processes is allowed to switch user ids. With user namespaces +enabled in a kernel for people who don't trust their users or their +users programs to play nice this problems becomes more acute. + +Therefore it is recommended that memory control groups be enabled in +kernels that enable user namespaces, and it is further recommended +that userspace configure memory control groups to limit how much +memory users they don't trust to play nice can use. diff --git a/init/Kconfig b/init/Kconfig index 7d30240..c8c58bd 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1035,6 +1035,13 @@ config USER_NS help This allows containers, i.e. vservers, to use user namespaces to provide different user info for different servers. + + When user namespaces are enabled in the kernel it is + recommended that the MEMCG and MEMCG_KMEM options also be + enabled and that user-space use the memory control groups to + limit the amount of memory a memory unprivileged users can + use. + If unsure, say N. 
config PID_NS -- 1.7.5.4
[PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap.
When I initially wrote the code for /proc//uid_map. I was lazy and avoided duplicate mappings by the simple expedient of ensuring the first number in a new extent was greater than any number in the previous extent. Unfortunately that precludes a number of valid mappings, and someone noticed and complained. So use a simple check to ensure that ranges in the mapping extents don't overlap. Signed-off-by: "Eric W. Biederman" --- kernel/user_namespace.c | 45 +++-- 1 files changed, 39 insertions(+), 6 deletions(-) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 24f8ec3..8b65083 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -520,6 +520,42 @@ struct seq_operations proc_projid_seq_operations = { .show = projid_m_show, }; +static bool mappings_overlap(struct uid_gid_map *new_map, struct uid_gid_extent *extent) +{ + u32 upper_first, lower_first, upper_last, lower_last; + unsigned idx; + + upper_first = extent->first; + lower_first = extent->lower_first; + upper_last = upper_first + extent->count - 1; + lower_last = lower_first + extent->count - 1; + + for (idx = 0; idx < new_map->nr_extents; idx++) { + u32 prev_upper_first, prev_lower_first; + u32 prev_upper_last, prev_lower_last; + struct uid_gid_extent *prev; + + prev = &new_map->extent[idx]; + + prev_upper_first = prev->first; + prev_lower_first = prev->lower_first; + prev_upper_last = prev_upper_first + prev->count - 1; + prev_lower_last = prev_lower_first + prev->count - 1; + + /* Does the upper range intersect a previous extent? */ + if ((prev_upper_first <= upper_last) && + (prev_upper_last >= upper_first)) + return true; + + /* Does the lower range intersect a previous extent? 
*/ + if ((prev_lower_first <= lower_last) && + (prev_lower_last >= lower_first)) + return true; + } + return false; +} + + static DEFINE_MUTEX(id_map_mutex); static ssize_t map_write(struct file *file, const char __user *buf, @@ -532,7 +568,7 @@ static ssize_t map_write(struct file *file, const char __user *buf, struct user_namespace *ns = seq->private; struct uid_gid_map new_map; unsigned idx; - struct uid_gid_extent *extent, *last = NULL; + struct uid_gid_extent *extent = NULL; unsigned long page = 0; char *kbuf, *pos, *next_line; ssize_t ret = -EINVAL; @@ -635,14 +671,11 @@ static ssize_t map_write(struct file *file, const char __user *buf, if ((extent->lower_first + extent->count) <= extent->lower_first) goto out; - /* For now only accept extents that are strictly in order */ - if (last && - (((last->first + last->count) > extent->first) || -((last->lower_first + last->count) > extent->lower_first))) + /* Do the ranges in extent overlap any previous extents? */ + if (mappings_overlap(&new_map, extent)) goto out; new_map.nr_extents++; - last = extent; /* Fail if the file contains too many extents */ if ((new_map.nr_extents == UID_GID_MAP_MAX_EXTENTS) && -- 1.7.5.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH review 1/6] userns: Avoid recursion in put_user_ns
When freeing a deeply nested user namespace free_user_ns calls put_user_ns on its parent which may in turn call free_user_ns again. When -fno-optimize-sibling-calls is passed to gcc one stack frame per user namespace is left on the stack, potentially overflowing the kernel stack. CONFIG_FRAME_POINTER forces -fno-optimize-sibling-calls so we can't count on gcc to optimize this code. Remove struct kref and use a plain atomic_t, making the code more flexible and easier to comprehend. Make the loop in free_user_ns explicit to guarantee that the stack does not overflow with CONFIG_FRAME_POINTER enabled. I have tested this fix with a simple program that uses unshare to create a deeply nested user namespace structure and then calls exit. With 1000 nested user namespaces before this change, running my test program causes the kernel to die a horrible death. With 10,000,000 nested user namespaces after this change, my test program runs to completion and causes no harm. Pointed-out-by: Vasily Kulikov Signed-off-by: "Eric W. 
Biederman" --- include/linux/user_namespace.h | 10 +- kernel/user.c |4 +--- kernel/user_namespace.c| 17 + 3 files changed, 15 insertions(+), 16 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index b9bd2e6..4ce0093 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -21,7 +21,7 @@ struct user_namespace { struct uid_gid_map uid_map; struct uid_gid_map gid_map; struct uid_gid_map projid_map; - struct kref kref; + atomic_tcount; struct user_namespace *parent; kuid_t owner; kgid_t group; @@ -35,18 +35,18 @@ extern struct user_namespace init_user_ns; static inline struct user_namespace *get_user_ns(struct user_namespace *ns) { if (ns) - kref_get(&ns->kref); + atomic_inc(&ns->count); return ns; } extern int create_user_ns(struct cred *new); extern int unshare_userns(unsigned long unshare_flags, struct cred **new_cred); -extern void free_user_ns(struct kref *kref); +extern void free_user_ns(struct user_namespace *ns); static inline void put_user_ns(struct user_namespace *ns) { - if (ns) - kref_put(&ns->kref, free_user_ns); + if (ns && atomic_dec_and_test(&ns->count)) + free_user_ns(ns); } struct seq_operations; diff --git a/kernel/user.c b/kernel/user.c index 33acb5e..57ebfd4 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -47,9 +47,7 @@ struct user_namespace init_user_ns = { .count = 4294967295U, }, }, - .kref = { - .refcount = ATOMIC_INIT(3), - }, + .count = ATOMIC_INIT(3), .owner = GLOBAL_ROOT_UID, .group = GLOBAL_ROOT_GID, .proc_inum = PROC_USER_INIT_INO, diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 2b042c4..24f8ec3 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -78,7 +78,7 @@ int create_user_ns(struct cred *new) return ret; } - kref_init(&ns->kref); + atomic_set(&ns->count, 1); /* Leave the new->user_ns reference with the new user namespace. 
*/ ns->parent = parent_ns; ns->owner = owner; @@ -104,15 +104,16 @@ int unshare_userns(unsigned long unshare_flags, struct cred **new_cred) return create_user_ns(cred); } -void free_user_ns(struct kref *kref) +void free_user_ns(struct user_namespace *ns) { - struct user_namespace *parent, *ns = - container_of(kref, struct user_namespace, kref); + struct user_namespace *parent; - parent = ns->parent; - proc_free_inum(ns->proc_inum); - kmem_cache_free(user_ns_cachep, ns); - put_user_ns(parent); + do { + parent = ns->parent; + proc_free_inum(ns->proc_inum); + kmem_cache_free(user_ns_cachep, ns); + ns = parent; + } while (atomic_dec_and_test(&parent->count)); } EXPORT_SYMBOL(free_user_ns); -- 1.7.5.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH review 0/6] miscellaneous user namespace patches
Now that I have done my worst to infect user space with some basic tools for using user namespaces, this is my first round of patches aimed at the 3.9 merge window. This documents that if you care about limiting resources you will want to configure the memory control group when user namespaces are enabled. This enables the user namespace root to mount devpts, ramfs and tmpfs, functionality that is needed for practical uses of the user namespace. This includes my patch to allow more flexibility in the input accepted in uid_map and gid_map. Documentation/namespaces/resource-control.txt | 10 fs/devpts/inode.c | 18 +++ fs/ramfs/inode.c |1 + include/linux/user_namespace.h| 10 ++-- init/Kconfig |7 +++ kernel/user.c |4 +- kernel/user_namespace.c | 62 +++-- mm/shmem.c|2 + 8 files changed, 92 insertions(+), 22 deletions(-) Eric W. Biederman (6): userns: Avoid recursion in put_user_ns userns: Allow any uid or gid mappings that don't overlap. userns: Recommend use of memory control groups. userns: Allow the userns root to mount of devpts userns: Allow the userns root to mount ramfs. userns: Allow the userns root to mount tmpfs.
[PATCH 11/11] ksm: stop hotremove lockdep warning
Complaints are rare, but lockdep still does not understand the way ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and holds it until the ksm_memory_callback(MEM_OFFLINE): that appears to be a problem because notifier callbacks are made under down_read of blocking_notifier_head->rwsem (so first the mutex is taken while holding the rwsem, then later the rwsem is taken while still holding the mutex); but is not in fact a problem because mem_hotplug_mutex is held throughout the dance. There was an attempt to fix this with mutex_lock_nested(); but if that happened to fool lockdep two years ago, apparently it does so no longer. I had hoped to eradicate this issue in extending KSM page migration not to need the ksm_thread_mutex. But then realized that although the page migration itself is safe, we do still need to lock out ksmd and other users of get_ksm_page() while offlining memory - at some point between MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages themselves may vanish, and get_ksm_page()'s accesses to them become a violation. So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE to MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and wait_while_offlining() checks, to achieve the same lockout without being caught by lockdep. This is less elegant for KSM, but it's more important to keep lockdep useful to other users - and I apologize for how long it took to fix. 
Reported-by: Gerald Schaefer Signed-off-by: Hugh Dickins --- mm/ksm.c | 55 +++-- 1 file changed, 41 insertions(+), 14 deletions(-) --- mmotm.orig/mm/ksm.c 2013-01-25 14:37:06.880206290 -0800 +++ mmotm/mm/ksm.c 2013-01-25 14:38:53.984208836 -0800 @@ -226,7 +226,9 @@ static unsigned int ksm_merge_across_nod #define KSM_RUN_STOP 0 #define KSM_RUN_MERGE 1 #define KSM_RUN_UNMERGE2 -static unsigned int ksm_run = KSM_RUN_STOP; +#define KSM_RUN_OFFLINE4 +static unsigned long ksm_run = KSM_RUN_STOP; +static void wait_while_offlining(void); static DECLARE_WAIT_QUEUE_HEAD(ksm_thread_wait); static DEFINE_MUTEX(ksm_thread_mutex); @@ -1700,6 +1702,7 @@ static int ksm_scan_thread(void *nothing while (!kthread_should_stop()) { mutex_lock(&ksm_thread_mutex); + wait_while_offlining(); if (ksmd_should_run()) ksm_do_scan(ksm_thread_pages_to_scan); mutex_unlock(&ksm_thread_mutex); @@ -2056,6 +2059,22 @@ void ksm_migrate_page(struct page *newpa #endif /* CONFIG_MIGRATION */ #ifdef CONFIG_MEMORY_HOTREMOVE +static int just_wait(void *word) +{ + schedule(); + return 0; +} + +static void wait_while_offlining(void) +{ + while (ksm_run & KSM_RUN_OFFLINE) { + mutex_unlock(&ksm_thread_mutex); + wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE), + just_wait, TASK_UNINTERRUPTIBLE); + mutex_lock(&ksm_thread_mutex); + } +} + static void ksm_check_stable_tree(unsigned long start_pfn, unsigned long end_pfn) { @@ -2098,15 +2117,15 @@ static int ksm_memory_callback(struct no switch (action) { case MEM_GOING_OFFLINE: /* -* Keep it very simple for now: just lock out ksmd and -* MADV_UNMERGEABLE while any memory is going offline. -* mutex_lock_nested() is necessary because lockdep was alarmed -* that here we take ksm_thread_mutex inside notifier chain -* mutex, and later take notifier chain mutex inside -* ksm_thread_mutex to unlock it. But that's safe because both -* are inside mem_hotplug_mutex. 
+* Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items() +* and remove_all_stable_nodes() while memory is going offline: +* it is unsafe for them to touch the stable tree at this time. +* But unmerge_ksm_pages(), rmap lookups and other entry points +* which do not need the ksm_thread_mutex are all safe. */ - mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING); + mutex_lock(&ksm_thread_mutex); + ksm_run |= KSM_RUN_OFFLINE; + mutex_unlock(&ksm_thread_mutex); break; case MEM_OFFLINE: @@ -2122,11 +2141,20 @@ static int ksm_memory_callback(struct no /* fallthrough */ case MEM_CANCEL_OFFLINE: + mutex_lock(&ksm_thread_mutex); + ksm_run &= ~KSM_RUN_OFFLINE; mutex_unlock(&ksm_thread_mutex); + + smp_mb(); /* wake_up_bit advises this */ + wake_up_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE)); break; } return NOTIFY_OK; } +#else +static void wait_while_offlining(void) +{ +} #endif /* CONFIG_MEMORY_HOTREMOVE
[PATCH 10/11] mm: remove offlining arg to migrate_pages
No functional change, but the only purpose of the offlining argument to migrate_pages() etc, was to ensure that __unmap_and_move() could migrate a KSM page for memory hotremove (which took ksm_thread_mutex) but not for other callers. Now all cases are safe, remove the arg. Signed-off-by: Hugh Dickins --- include/linux/migrate.h | 14 ++ mm/compaction.c |2 +- mm/memory-failure.c |7 +++ mm/memory_hotplug.c |3 +-- mm/mempolicy.c |8 +++- mm/migrate.c| 35 +-- mm/page_alloc.c |6 ++ 7 files changed, 29 insertions(+), 46 deletions(-) --- mmotm.orig/include/linux/migrate.h 2013-01-24 12:28:38.740127550 -0800 +++ mmotm/include/linux/migrate.h 2013-01-25 14:38:51.468208776 -0800 @@ -40,11 +40,9 @@ extern void putback_movable_pages(struct extern int migrate_page(struct address_space *, struct page *, struct page *, enum migrate_mode); extern int migrate_pages(struct list_head *l, new_page_t x, - unsigned long private, bool offlining, - enum migrate_mode mode, int reason); + unsigned long private, enum migrate_mode mode, int reason); extern int migrate_huge_page(struct page *, new_page_t x, - unsigned long private, bool offlining, - enum migrate_mode mode); + unsigned long private, enum migrate_mode mode); extern int fail_migrate_page(struct address_space *, struct page *, struct page *); @@ -62,11 +60,11 @@ extern int migrate_huge_page_move_mappin static inline void putback_lru_pages(struct list_head *l) {} static inline void putback_movable_pages(struct list_head *l) {} static inline int migrate_pages(struct list_head *l, new_page_t x, - unsigned long private, bool offlining, - enum migrate_mode mode, int reason) { return -ENOSYS; } + unsigned long private, enum migrate_mode mode, int reason) + { return -ENOSYS; } static inline int migrate_huge_page(struct page *page, new_page_t x, - unsigned long private, bool offlining, - enum migrate_mode mode) { return -ENOSYS; } + unsigned long private, enum migrate_mode mode) + { return -ENOSYS; } static inline int migrate_prep(void) { 
return -ENOSYS; } static inline int migrate_prep_local(void) { return -ENOSYS; } --- mmotm.orig/mm/compaction.c 2013-01-24 12:28:38.740127550 -0800 +++ mmotm/mm/compaction.c 2013-01-25 14:38:51.472208776 -0800 @@ -980,7 +980,7 @@ static int compact_zone(struct zone *zon nr_migrate = cc->nr_migratepages; err = migrate_pages(&cc->migratepages, compaction_alloc, - (unsigned long)cc, false, + (unsigned long)cc, cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC, MR_COMPACTION); update_nr_listpages(cc); --- mmotm.orig/mm/memory-failure.c 2013-01-24 12:28:38.740127550 -0800 +++ mmotm/mm/memory-failure.c 2013-01-25 14:38:51.472208776 -0800 @@ -1432,7 +1432,7 @@ static int soft_offline_huge_page(struct goto done; /* Keep page count to indicate a given hugepage is isolated. */ - ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, false, + ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, MIGRATE_SYNC); put_page(hpage); if (ret) { @@ -1564,11 +1564,10 @@ int soft_offline_page(struct page *page, if (!ret) { LIST_HEAD(pagelist); inc_zone_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); + page_is_file_cache(page)); list_add(&page->lru, &pagelist); ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, - false, MIGRATE_SYNC, - MR_MEMORY_FAILURE); + MIGRATE_SYNC, MR_MEMORY_FAILURE); if (ret) { putback_lru_pages(&pagelist); pr_info("soft offline: %#lx: migration failed %d, type %lx\n", --- mmotm.orig/mm/memory_hotplug.c 2013-01-24 12:28:38.740127550 -0800 +++ mmotm/mm/memory_hotplug.c 2013-01-25 14:38:51.472208776 -0800 @@ -1283,8 +1283,7 @@ do_migrate_range(unsigned long start_pfn * migrate_pages returns # of failed pages. */ ret = migrate_pages(&source, alloc_migrate_target, 0, - true, MIGRATE_SYNC, - MR_MEMORY_HOTPLUG); +
[PATCH 9/11] ksm: enable KSM page migration
Migration of KSM pages is now safe: remove the PageKsm restrictions from mempolicy.c and migrate.c. But keep PageKsm out of __unmap_and_move()'s anon_vma contortions, which are irrelevant to KSM: it looks as if that code was preventing hotremove migration of KSM pages, unless they happened to be in swapcache. There is some question as to whether enforcing a NUMA mempolicy migration ought to migrate KSM pages, mapped into entirely unrelated processes; but moving page_mapcount > 1 is only permitted with MPOL_MF_MOVE_ALL anyway, and it seems reasonable to assume that you wouldn't set MADV_MERGEABLE on any area where this is a worry. Signed-off-by: Hugh Dickins --- mm/mempolicy.c |3 +-- mm/migrate.c | 21 +++-- 2 files changed, 4 insertions(+), 20 deletions(-) --- mmotm.orig/mm/mempolicy.c 2013-01-24 12:28:38.848127553 -0800 +++ mmotm/mm/mempolicy.c2013-01-25 14:38:49.596208731 -0800 @@ -496,9 +496,8 @@ static int check_pte_range(struct vm_are /* * vm_normal_page() filters out zero pages, but there might * still be PageReserved pages to skip, perhaps in a VDSO. -* And we cannot move PageKsm pages sensibly or safely yet. */ - if (PageReserved(page) || PageKsm(page)) + if (PageReserved(page)) continue; nid = page_to_nid(page); if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT)) --- mmotm.orig/mm/migrate.c 2013-01-25 14:37:03.832206218 -0800 +++ mmotm/mm/migrate.c 2013-01-25 14:38:49.596208731 -0800 @@ -731,20 +731,6 @@ static int __unmap_and_move(struct page lock_page(page); } - /* -* Only memory hotplug's offline_pages() caller has locked out KSM, -* and can safely migrate a KSM page. The other cases have skipped -* PageKsm along with PageReserved - but it is only now when we have -* the page lock that we can be certain it will not go KSM beneath us -* (KSM will not upgrade a page from PageAnon to PageKsm when it sees -* its pagecount raised, but only here do we take the page lock which -* serializes that). 
-*/ - if (PageKsm(page) && !offlining) { - rc = -EBUSY; - goto unlock; - } - /* charge against new page */ mem_cgroup_prepare_migration(page, newpage, &mem); @@ -771,7 +757,7 @@ static int __unmap_and_move(struct page * File Caches may use write_page() or lock_page() in migration, then, * just care Anon page here. */ - if (PageAnon(page)) { + if (PageAnon(page) && !PageKsm(page)) { /* * Only page_lock_anon_vma_read() understands the subtleties of * getting a hold on an anon_vma from outside one of its mms. @@ -851,7 +837,6 @@ uncharge: mem_cgroup_end_migration(mem, page, newpage, (rc == MIGRATEPAGE_SUCCESS || rc == MIGRATEPAGE_BALLOON_SUCCESS)); -unlock: unlock_page(page); out: return rc; @@ -1156,7 +1141,7 @@ static int do_move_page_to_node_array(st goto set_status; /* Use PageReserved to check for zero page */ - if (PageReserved(page) || PageKsm(page)) + if (PageReserved(page)) goto put_and_set; pp->page = page; @@ -1318,7 +1303,7 @@ static void do_pages_stat_array(struct m err = -ENOENT; /* Use PageReserved to check for zero page */ - if (!page || PageReserved(page) || PageKsm(page)) + if (!page || PageReserved(page)) goto set_status; err = page_to_nid(page); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 8/11] ksm: make !merge_across_nodes migration safe
The new KSM NUMA merge_across_nodes knob introduces a problem, when it's set to non-default 0: if a KSM page is migrated to a different NUMA node, how do we migrate its stable node to the right tree? And what if that collides with an existing stable node? ksm_migrate_page() can do no more than it's already doing, updating stable_node->kpfn: the stable tree itself cannot be manipulated without holding ksm_thread_mutex. So accept that a stable tree may temporarily indicate a page belonging to the wrong NUMA node, leave updating until the next pass of ksmd, just be careful not to merge other pages on to a misplaced page. Note nid of holding tree in stable_node, and recognize that it will not always match nid of kpfn. A misplaced KSM page is discovered, either when ksm_do_scan() next comes around to one of its rmap_items (we now have to go to cmp_and_merge_page even on pages in a stable tree), or when stable_tree_search() arrives at a matching node for another page, and this node page is found misplaced. In each case, move the misplaced stable_node to a list of migrate_nodes (and use the address of migrate_nodes as magic by which to identify them): we don't need them in a tree. If stable_tree_search() finds no match for a page, but it's currently exiled to this list, then slot its stable_node right there into the tree, bringing all of its mappings with it; otherwise they get migrated one by one to the original page of the colliding node. stable_tree_search() is now modelled more like stable_tree_insert(), in order to handle these insertions of migrated nodes. remove_node_from_stable_tree(), remove_all_stable_nodes() and ksm_check_stable_tree() have to handle the migrate_nodes list as well as the stable tree itself. 
Less obviously, we do need to prune the list of stale entries from time to time (scan_get_next_rmap_item() does it once each full scan): whereas stale nodes in the stable tree get naturally pruned as searches try to brush past them, these migrate_nodes may get forgotten and accumulate. Signed-off-by: Hugh Dickins --- mm/ksm.c | 164 +++-- 1 file changed, 134 insertions(+), 30 deletions(-) --- mmotm.orig/mm/ksm.c 2013-01-25 14:37:03.832206218 -0800 +++ mmotm/mm/ksm.c 2013-01-25 14:37:06.880206290 -0800 @@ -122,13 +122,25 @@ struct ksm_scan { /** * struct stable_node - node of the stable rbtree * @node: rb node of this ksm page in the stable tree + * @head: (overlaying parent) &migrate_nodes indicates temporarily on that list + * @list: linked into migrate_nodes, pending placement in the proper node tree * @hlist: hlist head of rmap_items using this ksm page - * @kpfn: page frame number of this ksm page + * @kpfn: page frame number of this ksm page (perhaps temporarily on wrong nid) + * @nid: NUMA node id of stable tree in which linked (may not match kpfn) */ struct stable_node { - struct rb_node node; + union { + struct rb_node node;/* when node of stable tree */ + struct {/* when listed for migration */ + struct list_head *head; + struct list_head list; + }; + }; struct hlist_head hlist; unsigned long kpfn; +#ifdef CONFIG_NUMA + int nid; +#endif }; /** @@ -169,6 +181,9 @@ struct rmap_item { static struct rb_root root_unstable_tree[MAX_NUMNODES]; static struct rb_root root_stable_tree[MAX_NUMNODES]; +/* Recently migrated nodes of stable tree, pending proper placement */ +static LIST_HEAD(migrate_nodes); + #define MM_SLOTS_HASH_BITS 10 static DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS); @@ -311,11 +326,6 @@ static void insert_to_mm_slots_hash(stru hash_add(mm_slots_hash, &mm_slot->link, (unsigned long)mm); } -static inline int in_stable_tree(struct rmap_item *rmap_item) -{ - return rmap_item->address & STABLE_FLAG; -} - /* * ksmd, and 
unmerge_and_remove_all_rmap_items(), must not touch an mm's * page tables after it has passed through ksm_exit() - which, if necessary, @@ -476,7 +486,6 @@ static void remove_node_from_stable_tree { struct rmap_item *rmap_item; struct hlist_node *hlist; - int nid; hlist_for_each_entry(rmap_item, hlist, &stable_node->hlist, hlist) { if (rmap_item->hlist.next) @@ -488,8 +497,11 @@ static void remove_node_from_stable_tree cond_resched(); } - nid = get_kpfn_nid(stable_node->kpfn); - rb_erase(&stable_node->node, &root_stable_tree[nid]); + if (stable_node->head == &migrate_nodes) + list_del(&stable_node->list); + else + rb_erase(&stable_node->node, +&root_stable_tree[NUMA(stable_node->nid)]); free_stable_node(stable_node); } @@ -712,6 +724,7 @@ static int remove_stable_node(struct sta static int remove_all_stable_nodes(void) { struct stable_node *s
Re: [PATCH 1/2]linux-usb:Define a new macro for USB storage match rules
On Sat, Jan 26, 2013 at 01:39:50AM +, Fangxiaozhi (Franko) wrote: > > > > -Original Message- > > From: Greg KH [mailto:g...@kroah.com] > > Sent: Saturday, January 26, 2013 1:45 AM > > To: Fangxiaozhi (Franko) > > Cc: Sergei Shtylyov; linux-...@vger.kernel.org; > > linux-kernel@vger.kernel.org; > > Xueguiying (Zihan); Linlei (Lei Lin); Yili (Neil); Wangyuhua (Roger, > > Credit); > > Huqiao (C); ba...@ti.com; mdharm-...@one-eyed-alien.net; > > sebast...@breakpoint.cc > > Subject: Re: [PATCH 1/2]linux-usb:Define a new macro for USB storage match > > rules > > > > On Fri, Jan 25, 2013 at 04:18:34PM +0400, Sergei Shtylyov wrote: > > > Hello. > > > > > > On 25-01-2013 6:44, fangxiaozhi 00110321 wrote: > > > > > > >From: fangxiaozhi > > > > > > >1. Define a new macro for USB storage match rules: > > > > matching with Vendor ID and interface descriptors. > > > > > > >Signed-off-by: fangxiaozhi > > > > > > > > > > > > diff -uprN linux-3.8-rc4_orig/drivers/usb/storage/usb.c > > > >linux-3.8-rc4/drivers/usb/storage/usb.c > > > >--- linux-3.8-rc4_orig/drivers/usb/storage/usb.c 2013-01-22 > > > >14:12:42.595238727 +0800 > > > >+++ linux-3.8-rc4/drivers/usb/storage/usb.c 2013-01-22 > > > >+++ 14:16:01.398250305 +0800 > > > >@@ -120,6 +120,17 @@ MODULE_PARM_DESC(quirks, "supplemental l > > > > .useTransport = use_transport, \ > > > > } > > > > > > > >+#define UNUSUAL_VENDOR_INTF(idVendor, cl, sc, pr, \ > > > >+ vendor_name, product_name, use_protocol, use_transport, \ > > > >+ init_function, Flags) \ > > > >+{ \ > > > >+ .vendorName = vendor_name, \ > > > >+ .productName = product_name, \ > > > >+ .useProtocol = use_protocol, \ > > > >+ .useTransport = use_transport, \ > > > >+ .initFunction = init_function, \ > > > >+} > > > > > > Shouldn't the field initilaizers be indented with tab, not space? > > > > Yes it must. 
fangxiaozhi, please always run your patches through the > > scripts/checkpatch.pl tool before sending them out (note, you will have to > > ignore the CamelCase warnings your patch produces, but not the other > > ones.) > > > -What's wrong with it? > -I have checked the patches with scripts/checkpatch.pl before sending. > -There is no other warning or error in my patches except CamelCase > warnings. > -So what's wrong now? Then your email client messed up the patches and put spaces in the code instead of tabs. Try looking at the message on the mailing list and run that through checkpatch, it will show you the problems. What I received isn't ok, sorry. greg k-h
[PATCH 7/11] ksm: make KSM page migration possible
KSM page migration is already supported in the case of memory hotremove, which takes the ksm_thread_mutex across all its migrations to keep life simple. But the new KSM NUMA merge_across_nodes knob introduces a problem, when it's set to non-default 0: if a KSM page is migrated to a different NUMA node, how do we migrate its stable node to the right tree? And what if that collides with an existing stable node? So far there's no provision for that, and this patch does not attempt to deal with it either. But how will I test a solution, when I don't know how to hotremove memory? The best answer is to enable KSM page migration in all cases now, and test more common cases. With THP and compaction added since KSM came in, page migration is now mainstream, and it's a shame that a KSM page can frustrate freeing a page block. Without worrying about merge_across_nodes 0 for now, this patch gets KSM page migration working reliably for default merge_across_nodes 1 (but leave the patch enabling it until near the end of the series). It's much simpler than I'd originally imagined, and does not require an additional tier of locking: page migration relies on the page lock, KSM page reclaim relies on the page lock, the page lock is enough for KSM page migration too. Almost all the care has to be in get_ksm_page(): that's the function which worries about when a stable node is stale and should be freed, now it also has to worry about the KSM page being migrated. The only new overhead is an additional put/get/lock/unlock_page when stable_tree_search() arrives at a matching node: to make sure migration respects the raised page count, and so does not migrate the page while we're busy with it here. That's probably avoidable, either by changing internal interfaces from using kpage to stable_node, or by moving the ksm_migrate_page() callsite into a page_freeze_refs() section (even if not swapcache); but this works well, I've no urge to pull it apart now. 
(Descents of the stable tree may pass through nodes whose KSM pages are under migration: being unlocked, the raised page count does not prevent that, nor need it: it's safe to memcmp against either old or new page.) You might worry about mremap, and whether page migration's rmap_walk to remove migration entries will find all the KSM locations where it inserted earlier: that should already be handled, by the satisfyingly heavy hammer of move_vma()'s call to ksm_madvise(,,,MADV_UNMERGEABLE,). Signed-off-by: Hugh Dickins --- mm/ksm.c | 94 ++--- mm/migrate.c |5 ++ 2 files changed, 77 insertions(+), 22 deletions(-) --- mmotm.orig/mm/ksm.c 2013-01-25 14:37:00.768206145 -0800 +++ mmotm/mm/ksm.c 2013-01-25 14:37:03.832206218 -0800 @@ -499,6 +499,7 @@ static void remove_node_from_stable_tree * In which case we can trust the content of the page, and it * returns the gotten page; but if the page has now been zapped, * remove the stale node from the stable tree and return NULL. + * But beware, the stable node's page might be being migrated. * * You would expect the stable_node to hold a reference to the ksm page. * But if it increments the page's count, swapping out has to wait for @@ -509,44 +510,77 @@ static void remove_node_from_stable_tree * pointing back to this stable node. This relies on freeing a PageAnon * page to reset its page->mapping to NULL, and relies on no other use of * a page to put something that might look like our key in page->mapping. - * - * include/linux/pagemap.h page_cache_get_speculative() is a good reference, - * but this is different - made simpler by ksm_thread_mutex being held, but - * interesting for assuming that no other use of the struct page could ever - * put our expected_mapping into page->mapping (or a field of the union which - * coincides with page->mapping). 
- * - * Note: it is possible that get_ksm_page() will return NULL one moment, - * then page the next, if the page is in between page_freeze_refs() and - * page_unfreeze_refs(): this shouldn't be a problem anywhere, the page * is on its way to being freed; but it is an anomaly to bear in mind. */ static struct page *get_ksm_page(struct stable_node *stable_node, bool locked) { struct page *page; void *expected_mapping; + unsigned long kpfn; - page = pfn_to_page(stable_node->kpfn); expected_mapping = (void *)stable_node + (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM); - if (page->mapping != expected_mapping) - goto stale; - if (!get_page_unless_zero(page)) +again: + kpfn = ACCESS_ONCE(stable_node->kpfn); + page = pfn_to_page(kpfn); + + /* +* page is computed from kpfn, so on most architectures reading +* page->mapping is naturally ordered after reading node->kpfn, +* but on Alpha we need to be more careful. +*/ + smp_read_barrier_depends(); + if (ACCESS
[PATCH 6/11] ksm: remove old stable nodes more thoroughly
Switching merge_across_nodes after running KSM is liable to oops on stale nodes still left over from the previous stable tree. It's not something that people will often want to do, but it would be lame to demand a reboot when they're trying to determine which merge_across_nodes setting is best. How can this happen? We only permit switching merge_across_nodes when pages_shared is 0, and usually set run 2 to force that beforehand, which ought to unmerge everything: yet oopses still occur when you then run 1. Three causes: 1. The old stable tree (built according to the inverse merge_across_nodes) has not been fully torn down. A stable node lingers until get_ksm_page() notices that the page it references no longer references it: but the page is not necessarily freed as soon as expected, particularly when swapcache. Fix this with a pass through the old stable tree, applying get_ksm_page() to each of the remaining nodes (most found stale and removed immediately), with forced removal of any left over. Unless the page is still mapped: I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE and EBUSY than BUG. 2. __ksm_enter() has a nice little optimization, to insert the new mm just behind ksmd's cursor, so there's a full pass for it to stabilize (or be removed) before ksmd addresses it. Nice when ksmd is running, but not so nice when we're trying to unmerge all mms: we were missing those mms forked and inserted behind the unmerge cursor. Easily fixed by inserting at the end when KSM_RUN_UNMERGE. 3. It is possible for a KSM page to be faulted back from swapcache into an mm, just after unmerge_and_remove_all_rmap_items() scanned past it. Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private to ksm.c, so dissolve the distinction between ksm_might_need_to_copy() and ksm_does_need_to_copy(), doing it all in the one call into ksm.c. 
A long outstanding, unrelated bugfix sneaks in with that third fix: ksm_does_need_to_copy() would copy from a !PageUptodate page (implying I/O error when read in from swap) to a page which it then marks Uptodate. Fix this case by not copying, letting do_swap_page() discover the error. Signed-off-by: Hugh Dickins --- include/linux/ksm.h | 18 ++--- mm/ksm.c| 83 +++--- mm/memory.c | 19 - 3 files changed, 92 insertions(+), 28 deletions(-) --- mmotm.orig/include/linux/ksm.h 2013-01-25 14:27:58.220193250 -0800 +++ mmotm/include/linux/ksm.h 2013-01-25 14:37:00.764206145 -0800 @@ -16,9 +16,6 @@ struct stable_node; struct mem_cgroup; -struct page *ksm_does_need_to_copy(struct page *page, - struct vm_area_struct *vma, unsigned long address); - #ifdef CONFIG_KSM int ksm_madvise(struct vm_area_struct *vma, unsigned long start, unsigned long end, int advice, unsigned long *vm_flags); @@ -73,15 +70,8 @@ static inline void set_page_stable_node( * We'd like to make this conditional on vma->vm_flags & VM_MERGEABLE, * but what if the vma was unmerged while the page was swapped out? 
*/ -static inline int ksm_might_need_to_copy(struct page *page, - struct vm_area_struct *vma, unsigned long address) -{ - struct anon_vma *anon_vma = page_anon_vma(page); - - return anon_vma && - (anon_vma->root != vma->anon_vma->root || -page->index != linear_page_index(vma, address)); -} +struct page *ksm_might_need_to_copy(struct page *page, + struct vm_area_struct *vma, unsigned long address); int page_referenced_ksm(struct page *page, struct mem_cgroup *memcg, unsigned long *vm_flags); @@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_ return 0; } -static inline int ksm_might_need_to_copy(struct page *page, +static inline struct page *ksm_might_need_to_copy(struct page *page, struct vm_area_struct *vma, unsigned long address) { - return 0; + return page; } static inline int page_referenced_ksm(struct page *page, --- mmotm.orig/mm/ksm.c 2013-01-25 14:36:58.856206099 -0800 +++ mmotm/mm/ksm.c 2013-01-25 14:37:00.768206145 -0800 @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a /* * Only called through the sysfs control interface: */ +static int remove_stable_node(struct stable_node *stable_node) +{ + struct page *page; + int err; + + page = get_ksm_page(stable_node, true); + if (!page) { + /* +* get_ksm_page did remove_node_from_stable_tree itself. +*/ + return 0; + } + + if (WARN_ON_ONCE(page_mapped(page))) + err = -EBUSY; + else { + /* +* This page might be in a pagevec waiting to be freed, +* or it might be PageSwapCache (perhaps under writeback), +* or it might have been rem
[PATCH 5/11] ksm: get_ksm_page locked
In some places where get_ksm_page() is used, we need the page to be locked. When KSM migration is fully enabled, we shall want that to make sure that the page just acquired cannot be migrated beneath us (raised page count is only effective when there is serialization to make sure migration notices). Whereas when navigating through the stable tree, we certainly do not want to lock each node (raised page count is enough to guarantee the memcmps, even if page is migrated to another node). Since we're about to add another use case, add the locked argument to get_ksm_page() now. Hmm, what's that rcu_read_lock() about? Complete misunderstanding, I really got the wrong end of the stick on that! There's a configuration in which page_cache_get_speculative() can do something cheaper than get_page_unless_zero(), relying on its caller's rcu_read_lock() to have disabled preemption for it. There's no need for rcu_read_lock() around get_page_unless_zero() (and mapping checks) here. Cut out that silliness before making this any harder to understand. Signed-off-by: Hugh Dickins --- mm/ksm.c | 23 +-- 1 file changed, 13 insertions(+), 10 deletions(-) --- mmotm.orig/mm/ksm.c 2013-01-25 14:36:53.244205966 -0800 +++ mmotm/mm/ksm.c 2013-01-25 14:36:58.856206099 -0800 @@ -514,15 +514,14 @@ static void remove_node_from_stable_tree * but this is different - made simpler by ksm_thread_mutex being held, but * interesting for assuming that no other use of the struct page could ever * put our expected_mapping into page->mapping (or a field of the union which - * coincides with page->mapping). The RCU calls are not for KSM at all, but - * to keep the page_count protocol described with page_cache_get_speculative. + * coincides with page->mapping). 
* * Note: it is possible that get_ksm_page() will return NULL one moment, * then page the next, if the page is in between page_freeze_refs() and * page_unfreeze_refs(): this shouldn't be a problem anywhere, the page * is on its way to being freed; but it is an anomaly to bear in mind. */ -static struct page *get_ksm_page(struct stable_node *stable_node) +static struct page *get_ksm_page(struct stable_node *stable_node, bool locked) { struct page *page; void *expected_mapping; @@ -530,7 +529,6 @@ static struct page *get_ksm_page(struct page = pfn_to_page(stable_node->kpfn); expected_mapping = (void *)stable_node + (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM); - rcu_read_lock(); if (page->mapping != expected_mapping) goto stale; if (!get_page_unless_zero(page)) @@ -539,10 +537,16 @@ static struct page *get_ksm_page(struct put_page(page); goto stale; } - rcu_read_unlock(); + if (locked) { + lock_page(page); + if (page->mapping != expected_mapping) { + unlock_page(page); + put_page(page); + goto stale; + } + } return page; stale: - rcu_read_unlock(); remove_node_from_stable_tree(stable_node); return NULL; } @@ -558,11 +562,10 @@ static void remove_rmap_item_from_tree(s struct page *page; stable_node = rmap_item->head; - page = get_ksm_page(stable_node); + page = get_ksm_page(stable_node, true); if (!page) goto out; - lock_page(page); hlist_del(&rmap_item->hlist); unlock_page(page); put_page(page); @@ -1042,7 +1045,7 @@ static struct page *stable_tree_search(s cond_resched(); stable_node = rb_entry(node, struct stable_node, node); - tree_page = get_ksm_page(stable_node); + tree_page = get_ksm_page(stable_node, false); if (!tree_page) return NULL; @@ -1086,7 +1089,7 @@ static struct stable_node *stable_tree_i cond_resched(); stable_node = rb_entry(*new, struct stable_node, node); - tree_page = get_ksm_page(stable_node); + tree_page = get_ksm_page(stable_node, false); if (!tree_page) return NULL; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the 
body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/11] ksm: reorganize ksm_check_stable_tree
Memory hotremove's ksm_check_stable_tree() is pitifully inefficient (restarting whenever it finds a stale node to remove), but rearrange so that at least it does not needlessly restart from nid 0 each time. And add a couple of comments: here is why we keep pfn instead of page. Signed-off-by: Hugh Dickins --- mm/ksm.c | 38 ++ 1 file changed, 22 insertions(+), 16 deletions(-) --- mmotm.orig/mm/ksm.c 2013-01-25 14:36:52.152205940 -0800 +++ mmotm/mm/ksm.c 2013-01-25 14:36:53.244205966 -0800 @@ -1830,31 +1830,36 @@ void ksm_migrate_page(struct page *newpa #endif /* CONFIG_MIGRATION */ #ifdef CONFIG_MEMORY_HOTREMOVE -static struct stable_node *ksm_check_stable_tree(unsigned long start_pfn, -unsigned long end_pfn) +static void ksm_check_stable_tree(unsigned long start_pfn, + unsigned long end_pfn) { + struct stable_node *stable_node; struct rb_node *node; int nid; - for (nid = 0; nid < nr_node_ids; nid++) - for (node = rb_first(&root_stable_tree[nid]); node; - node = rb_next(node)) { - struct stable_node *stable_node; - + for (nid = 0; nid < nr_node_ids; nid++) { + node = rb_first(&root_stable_tree[nid]); + while (node) { stable_node = rb_entry(node, struct stable_node, node); if (stable_node->kpfn >= start_pfn && - stable_node->kpfn < end_pfn) - return stable_node; + stable_node->kpfn < end_pfn) { + /* +* Don't get_ksm_page, page has already gone: +* which is why we keep kpfn instead of page* +*/ + remove_node_from_stable_tree(stable_node); + node = rb_first(&root_stable_tree[nid]); + } else + node = rb_next(node); + cond_resched(); } - - return NULL; + } } static int ksm_memory_callback(struct notifier_block *self, unsigned long action, void *arg) { struct memory_notify *mn = arg; - struct stable_node *stable_node; switch (action) { case MEM_GOING_OFFLINE: @@ -1874,11 +1879,12 @@ static int ksm_memory_callback(struct no /* * Most of the work is done by page migration; but there might * be a few stable_nodes left over, still pointing to struct -* pages which have been 
offlined: prune those from the tree. +* pages which have been offlined: prune those from the tree, +* otherwise get_ksm_page() might later try to access a +* non-existent struct page. */ - while ((stable_node = ksm_check_stable_tree(mn->start_pfn, - mn->start_pfn + mn->nr_pages)) != NULL) - remove_node_from_stable_tree(stable_node); + ksm_check_stable_tree(mn->start_pfn, + mn->start_pfn + mn->nr_pages); /* fallthrough */ case MEM_CANCEL_OFFLINE:
[PATCH 3/11] ksm: trivial tidyups
Add NUMA() and DO_NUMA() macros to minimize blight of #ifdef CONFIG_NUMAs (but indeed we don't want to expand struct rmap_item by nid when not NUMA). Add comment, remove "unsigned" from rmap_item->nid, as "int nid" elsewhere. Define ksm_merge_across_nodes 1U when #ifndef NUMA to help optimizing out. Use ?: in get_kpfn_nid(). Adjust a few comments noticed in ongoing work. Leave stable_tree_insert()'s rb_linkage until after the node has been set up, as unstable_tree_search_insert() does: ksm_thread_mutex and page lock make either way safe, but we're going to copy and I prefer this precedent. Signed-off-by: Hugh Dickins --- mm/ksm.c | 48 ++-- 1 file changed, 22 insertions(+), 26 deletions(-) --- mmotm.orig/mm/ksm.c 2013-01-25 14:36:38.608205618 -0800 +++ mmotm/mm/ksm.c 2013-01-25 14:36:52.152205940 -0800 @@ -41,6 +41,14 @@ #include #include "internal.h" +#ifdef CONFIG_NUMA +#define NUMA(x)(x) +#define DO_NUMA(x) (x) +#else +#define NUMA(x)(0) +#define DO_NUMA(x) do { } while (0) +#endif + /* * A few notes about the KSM scanning process, * to make it easier to understand the data structures below: @@ -130,6 +138,7 @@ struct stable_node { * @mm: the memory structure this rmap_item is pointing into * @address: the virtual address this rmap_item tracks (+ flags in low bits) * @oldchecksum: previous checksum of the page at that virtual address + * @nid: NUMA node id of unstable tree in which linked (may not match page) * @node: rb node of this rmap_item in the unstable tree * @head: pointer to stable_node heading this list in the stable tree * @hlist: link into hlist of rmap_items hanging off that stable_node @@ -141,7 +150,7 @@ struct rmap_item { unsigned long address; /* + low bits used for flags below */ unsigned int oldchecksum; /* when unstable */ #ifdef CONFIG_NUMA - unsigned int nid; + int nid; #endif union { struct rb_node node;/* when node of unstable tree */ @@ -192,8 +201,12 @@ static unsigned int ksm_thread_pages_to_ /* Milliseconds ksmd should sleep between 
batches */ static unsigned int ksm_thread_sleep_millisecs = 20; +#ifdef CONFIG_NUMA /* Zeroed when merging across nodes is not allowed */ static unsigned int ksm_merge_across_nodes = 1; +#else +#define ksm_merge_across_nodes 1U +#endif #define KSM_RUN_STOP 0 #define KSM_RUN_MERGE 1 @@ -456,10 +469,7 @@ out: page = NULL; */ static inline int get_kpfn_nid(unsigned long kpfn) { - if (ksm_merge_across_nodes) - return 0; - else - return pfn_to_nid(kpfn); + return ksm_merge_across_nodes ? 0 : pfn_to_nid(kpfn); } static void remove_node_from_stable_tree(struct stable_node *stable_node) @@ -479,7 +489,6 @@ static void remove_node_from_stable_tree } nid = get_kpfn_nid(stable_node->kpfn); - rb_erase(&stable_node->node, &root_stable_tree[nid]); free_stable_node(stable_node); } @@ -578,13 +587,8 @@ static void remove_rmap_item_from_tree(s age = (unsigned char)(ksm_scan.seqnr - rmap_item->address); BUG_ON(age > 1); if (!age) -#ifdef CONFIG_NUMA rb_erase(&rmap_item->node, - &root_unstable_tree[rmap_item->nid]); -#else - rb_erase(&rmap_item->node, &root_unstable_tree[0]); -#endif - +&root_unstable_tree[NUMA(rmap_item->nid)]); ksm_pages_unshared--; rmap_item->address &= PAGE_MASK; } @@ -604,7 +608,7 @@ static void remove_trailing_rmap_items(s } /* - * Though it's very tempting to unmerge in_stable_tree(rmap_item)s rather + * Though it's very tempting to unmerge rmap_items from stable tree rather * than check every pte of a given vma, the locking doesn't quite work for * that - an rmap_item is assigned to the stable tree after inserting ksm * page and upping mmap_sem. Nor does it fit with the way we skip dup'ing @@ -1058,7 +1062,7 @@ static struct page *stable_tree_search(s } /* - * stable_tree_insert - insert rmap_item pointing to new ksm page + * stable_tree_insert - insert stable tree node pointing to new ksm page * into the stable tree. 
* * This function returns the stable tree node just allocated on success, @@ -1108,13 +1112,11 @@ static struct stable_node *stable_tree_i if (!stable_node) return NULL; - rb_link_node(&stable_node->node, parent, new); - rb_insert_color(&stable_node->node, &root_stable_tree[nid]); - INIT_HLIST_HEAD(&stable_node->hlist); - stable_node->kpfn = kpfn; set_page_stable_node(kpage, stable_node); + rb_link_node(&stable_node->node, parent, new); + rb_insert_color(&stable_node->node, &root_stable_tree[nid]); return stable_node; } @
[PATCH 2/11] ksm: add sysfs ABI Documentation
From: Petr Holasek This patch adds sysfs documentation for Kernel Samepage Merging (KSM) including new merge_across_nodes knob. Signed-off-by: Petr Holasek Signed-off-by: Hugh Dickins --- Documentation/ABI/testing/sysfs-kernel-mm-ksm | 52 1 file changed, 52 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-ksm --- /dev/null 1970-01-01 00:00:00.0 + +++ mmotm/Documentation/ABI/testing/sysfs-kernel-mm-ksm 2013-01-25 14:36:50.660205905 -0800 @@ -0,0 +1,52 @@ +What: /sys/kernel/mm/ksm +Date: September 2009 +KernelVersion: 2.6.32 +Contact: Linux memory management mailing list +Description: Interface for Kernel Samepage Merging (KSM) + +What: /sys/kernel/mm/ksm/full_scans +What: /sys/kernel/mm/ksm/pages_shared +What: /sys/kernel/mm/ksm/pages_sharing +What: /sys/kernel/mm/ksm/pages_to_scan +What: /sys/kernel/mm/ksm/pages_unshared +What: /sys/kernel/mm/ksm/pages_volatile +What: /sys/kernel/mm/ksm/run +What: /sys/kernel/mm/ksm/sleep_millisecs +Date: September 2009 +Contact: Linux memory management mailing list +Description: Kernel Samepage Merging daemon sysfs interface + + full_scans: how many times all mergeable areas have been + scanned. + + pages_shared: how many shared pages are being used. + + pages_sharing: how many more sites are sharing them i.e. how + much saved. + + pages_to_scan: how many present pages to scan before ksmd goes + to sleep. + + pages_unshared: how many pages unique but repeatedly checked + for merging. + + pages_volatile: how many pages changing too fast to be placed + in a tree. + + run: write 0 to disable ksm, read 0 while ksm is disabled. + write 1 to run ksm, read 1 while ksm is running. + write 2 to disable ksm and unmerge all its pages. + + sleep_millisecs: how many milliseconds ksm should sleep between + scans. + + See Documentation/vm/ksm.txt for more information. 
+ +What: /sys/kernel/mm/ksm/merge_across_nodes +Date: January 2013 +KernelVersion: 3.9 +Contact: Linux memory management mailing list +Description: Control merging pages across different NUMA nodes. + + When it is set to 0 only pages from the same node are merged, + otherwise pages from all nodes can be merged together (default).
[tip:x86/mm] x86, kvm: Fix kvm's use of __pa() on percpu areas
Commit-ID: 5dfd486c4750c9278c63fa96e6e85bdd2fb58e9d Gitweb: http://git.kernel.org/tip/5dfd486c4750c9278c63fa96e6e85bdd2fb58e9d Author: Dave Hansen AuthorDate: Tue, 22 Jan 2013 13:24:35 -0800 Committer: H. Peter Anvin CommitDate: Fri, 25 Jan 2013 16:34:55 -0800 x86, kvm: Fix kvm's use of __pa() on percpu areas In short, it is illegal to call __pa() on an address holding a percpu variable. This replaces those __pa() calls with slow_virt_to_phys(). All of the cases in this patch are in boot time (or CPU hotplug time at worst) code, so the slow pagetable walking in slow_virt_to_phys() is not expected to have a performance impact. The times when this actually matters are pretty obscure (certain 32-bit NUMA systems), but it _does_ happen. It is important to keep KVM guests working on these systems because the real hardware is getting harder and harder to find. This bug manifested first by me seeing a plain hang at boot after this message: CPU 0 irqstacks, hard=f3018000 soft=f301a000 or, sometimes, it would actually make it out to the console: [0.00] BUG: unable to handle kernel paging request at I eventually traced it down to the KVM async pagefault code. This can be worked around by disabling that code either at compile-time, or on the kernel command-line. The kvm async pagefault code was injecting page faults into the guest which the guest misinterpreted because its "reason" was not being properly sent from the host. The guest passes a physical address of a per-cpu async page fault structure via an MSR to the host. Since __pa() is broken on percpu data, the physical address it sent was basically bogus and the host went scribbling on random data. The guest never saw the real reason for the page fault (it was injected by the host), assumed that the kernel had taken a _real_ page fault, and panic()'d. The behavior varied, though, depending on what got corrupted by the bad write.
Signed-off-by: Dave Hansen Link: http://lkml.kernel.org/r/20130122212435.49056...@kernel.stglabs.ibm.com Acked-by: Rik van Riel Reviewed-by: Marcelo Tosatti Signed-off-by: H. Peter Anvin --- arch/x86/kernel/kvm.c | 9 + arch/x86/kernel/kvmclock.c | 4 ++-- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 9c2bd8b..aa7e58b 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -297,9 +297,9 @@ static void kvm_register_steal_time(void) memset(st, 0, sizeof(*st)); - wrmsrl(MSR_KVM_STEAL_TIME, (__pa(st) | KVM_MSR_ENABLED)); + wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED)); printk(KERN_INFO "kvm-stealtime: cpu %d, msr %lx\n", - cpu, __pa(st)); + cpu, slow_virt_to_phys(st)); } static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED; @@ -324,7 +324,7 @@ void __cpuinit kvm_guest_cpu_init(void) return; if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) { - u64 pa = __pa(&__get_cpu_var(apf_reason)); + u64 pa = slow_virt_to_phys(&__get_cpu_var(apf_reason)); #ifdef CONFIG_PREEMPT pa |= KVM_ASYNC_PF_SEND_ALWAYS; @@ -340,7 +340,8 @@ void __cpuinit kvm_guest_cpu_init(void) /* Size alignment is implied but just to make it explicit. 
*/ BUILD_BUG_ON(__alignof__(kvm_apic_eoi) < 4); __get_cpu_var(kvm_apic_eoi) = 0; - pa = __pa(&__get_cpu_var(kvm_apic_eoi)) | KVM_MSR_ENABLED; + pa = slow_virt_to_phys(&__get_cpu_var(kvm_apic_eoi)) + | KVM_MSR_ENABLED; wrmsrl(MSR_KVM_PV_EOI_EN, pa); } diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 220a360..9f966dc 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -162,8 +162,8 @@ int kvm_register_clock(char *txt) int low, high, ret; struct pvclock_vcpu_time_info *src = &hv_clock[cpu].pvti; - low = (int)__pa(src) | 1; - high = ((u64)__pa(src) >> 32); + low = (int)slow_virt_to_phys(src) | 1; + high = ((u64)slow_virt_to_phys(src) >> 32); ret = native_write_msr_safe(msr_kvm_system_time, low, high); printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n", cpu, high, low, txt);
[tip:x86/mm] x86, mm: Create slow_virt_to_phys()
Commit-ID: d765653445129b7c476758040e3079480775f80a Gitweb: http://git.kernel.org/tip/d765653445129b7c476758040e3079480775f80a Author: Dave Hansen AuthorDate: Tue, 22 Jan 2013 13:24:33 -0800 Committer: H. Peter Anvin CommitDate: Fri, 25 Jan 2013 16:33:23 -0800 x86, mm: Create slow_virt_to_phys() This is necessary because __pa() does not work on some kinds of memory, like vmalloc() or the alloc_remap() areas on 32-bit NUMA systems. We have some functions to do conversions _like_ this in the vmalloc() code (like vmalloc_to_page()), but they do not work on sizes other than 4k pages. We would potentially need to be able to handle all the page sizes that we use for the kernel linear mapping (4k, 2M, 1G). In practice, on 32-bit NUMA systems, the percpu areas get stuck in the alloc_remap() area. Any __pa() call on them will break and basically return garbage. This patch introduces a new function slow_virt_to_phys(), which walks the kernel page tables on x86 and should do precisely the same logical thing as __pa(), but actually work on a wider range of memory. It should work on the normal linear mapping, vmalloc(), kmap(), etc... Signed-off-by: Dave Hansen Link: http://lkml.kernel.org/r/20130122212433.4d1fc...@kernel.stglabs.ibm.com Acked-by: Rik van Riel Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/pgtable_types.h | 1 + arch/x86/mm/pageattr.c | 31 +++ 2 files changed, 32 insertions(+) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 6c297e7..9f82690 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -352,6 +352,7 @@ static inline void update_page_count(int level, unsigned long pages) { } * as a pte too. 
*/ extern pte_t *lookup_address(unsigned long address, unsigned int *level); +extern phys_addr_t slow_virt_to_phys(void *__address); #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 2a5c9ab..6d13d2a 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -364,6 +364,37 @@ pte_t *lookup_address(unsigned long address, unsigned int *level) EXPORT_SYMBOL_GPL(lookup_address); /* + * This is necessary because __pa() does not work on some + * kinds of memory, like vmalloc() or the alloc_remap() + * areas on 32-bit NUMA systems. The percpu areas can + * end up in this kind of memory, for instance. + * + * This could be optimized, but it is only intended to be + * used at initialization time, and keeping it + * unoptimized should increase the testing coverage for + * the more obscure platforms. + */ +phys_addr_t slow_virt_to_phys(void *__virt_addr) +{ + unsigned long virt_addr = (unsigned long)__virt_addr; + phys_addr_t phys_addr; + unsigned long offset; + enum pg_level level; + unsigned long psize; + unsigned long pmask; + pte_t *pte; + + pte = lookup_address(virt_addr, &level); + BUG_ON(!pte); + psize = page_level_size(level); + pmask = page_level_mask(level); + offset = virt_addr & ~pmask; + phys_addr = pte_pfn(*pte) << PAGE_SHIFT; + return (phys_addr | offset); +} +EXPORT_SYMBOL_GPL(slow_virt_to_phys); + +/* * Set the new pmd in all the pgds we know about: */ static void __set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
[PATCH 1/11] ksm: allow trees per NUMA node
From: Petr Holasek

Introduce a new sysfs boolean knob, /sys/kernel/mm/ksm/merge_across_nodes, which controls merging pages across different NUMA nodes. When it is set to zero, only pages from the same node are merged; otherwise pages from all nodes can be merged together (the default behavior).

A typical use case would be many KVM guests on a NUMA machine, where CPUs on more distant nodes would see a significant increase in access latency to the merged KSM page. A sysfs knob was chosen for flexibility, since some users still prefer a higher amount of saved physical memory regardless of access latency.

Every NUMA node has its own stable & unstable tree, for faster searching and inserting. Changing the merge_across_nodes value is possible only when there are no KSM shared pages in the system.

I've tested this patch on NUMA machines with 2, 4 and 8 nodes, and measured the speed of memory access inside KVM guests with memory pinned to one of the nodes, with this benchmark: http://pholasek.fedorapeople.org/alloc_pg.c

Population standard deviations of access times, as a percentage of the average, were the following:

merge_across_nodes=1
2 nodes 1.4%
4 nodes 1.6%
8 nodes 1.7%

merge_across_nodes=0
2 nodes 1%
4 nodes 0.32%
8 nodes 0.018%

RFC: https://lkml.org/lkml/2011/11/30/91 v1: https://lkml.org/lkml/2012/1/23/46 v2: https://lkml.org/lkml/2012/6/29/105 v3: https://lkml.org/lkml/2012/9/14/550 v4: https://lkml.org/lkml/2012/9/23/137 v5: https://lkml.org/lkml/2012/12/10/540 v6: https://lkml.org/lkml/2012/12/23/154 v7: https://lkml.org/lkml/2012/12/27/225

Hugh notes that this patch brings two problems, whose solution needs further support in mm/ksm.c, which follows in subsequent patches:

1) switching merge_across_nodes after running KSM is liable to oops on stale nodes still left over from the previous stable tree;

2) memory hotremove may migrate KSM pages, but there is no provision here for !merge_across_nodes to migrate nodes to the proper tree.
Signed-off-by: Petr Holasek Signed-off-by: Hugh Dickins Acked-by: Rik van Riel --- Documentation/vm/ksm.txt |7 + mm/ksm.c | 151 - 2 files changed, 139 insertions(+), 19 deletions(-) --- mmotm.orig/Documentation/vm/ksm.txt 2013-01-25 14:36:31.724205455 -0800 +++ mmotm/Documentation/vm/ksm.txt 2013-01-25 14:36:38.608205618 -0800 @@ -58,6 +58,13 @@ sleep_millisecs - how many milliseconds e.g. "echo 20 > /sys/kernel/mm/ksm/sleep_millisecs" Default: 20 (chosen for demonstration purposes) +merge_across_nodes - specifies if pages from different numa nodes can be merged. + When set to 0, ksm merges only pages which physically + reside in the memory area of the same NUMA node. This brings + lower latency of access to the shared page. The value can be + changed only when there are no ksm shared pages in the system. + Default: 1 + run - set 0 to stop ksmd from running but keep merged pages, set 1 to run ksmd e.g. "echo 1 > /sys/kernel/mm/ksm/run", set 2 to stop ksmd and unmerge all pages currently merged, --- mmotm.orig/mm/ksm.c 2013-01-25 14:36:31.724205455 -0800 +++ mmotm/mm/ksm.c 2013-01-25 14:36:38.608205618 -0800 @@ -36,6 +36,7 @@ #include #include #include +#include #include #include "internal.h" @@ -139,6 +140,9 @@ struct rmap_item { struct mm_struct *mm; unsigned long address; /* + low bits used for flags below */ unsigned int oldchecksum; /* when unstable */ +#ifdef CONFIG_NUMA + unsigned int nid; +#endif union { struct rb_node node;/* when node of unstable tree */ struct {/* when listed from stable tree */ @@ -153,8 +157,8 @@ struct rmap_item { #define STABLE_FLAG0x200 /* is listed from the stable tree */ /* The stable and unstable tree heads */ -static struct rb_root root_stable_tree = RB_ROOT; -static struct rb_root root_unstable_tree = RB_ROOT; +static struct rb_root root_unstable_tree[MAX_NUMNODES]; +static struct rb_root root_stable_tree[MAX_NUMNODES]; #define MM_SLOTS_HASH_BITS 10 static DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS); @@ -188,6 +192,9 @@ static unsigned int
ksm_thread_pages_to_ /* Milliseconds ksmd should sleep between batches */ static unsigned int ksm_thread_sleep_millisecs = 20; +/* Zeroed when merging across nodes is not allowed */ +static unsigned int ksm_merge_across_nodes = 1; + #define KSM_RUN_STOP 0 #define KSM_RUN_MERGE 1 #define KSM_RUN_UNMERGE2 @@ -441,10 +448,25 @@ out: page = NULL; return page; } +/* + * This helper is used for getting right index into array of tree roots. + * When merge_across_nodes knob is set to 1, there are only two rb-trees for + * stable and unstable pages from all nodes with roots in index 0. Otherwise,
[tip:x86/mm] x86, mm: Use new pagetable helpers in try_preserve_large_page()
Commit-ID: f3c4fbb68e93b10c781c0cc462a9d80770244da6 Gitweb: http://git.kernel.org/tip/f3c4fbb68e93b10c781c0cc462a9d80770244da6 Author: Dave Hansen AuthorDate: Tue, 22 Jan 2013 13:24:32 -0800 Committer: H. Peter Anvin CommitDate: Fri, 25 Jan 2013 16:33:23 -0800 x86, mm: Use new pagetable helpers in try_preserve_large_page() try_preserve_large_page() can be slightly simplified by using the new page_level_*() helpers. This also moves the 'level' over to the new pg_level enum type. Signed-off-by: Dave Hansen Link: http://lkml.kernel.org/r/20130122212432.14f3d...@kernel.stglabs.ibm.com Signed-off-by: H. Peter Anvin --- arch/x86/mm/pageattr.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 40f92f3..2a5c9ab 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -396,7 +396,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address, pte_t new_pte, old_pte, *tmp; pgprot_t old_prot, new_prot, req_prot; int i, do_split = 1; - unsigned int level; + enum pg_level level; if (cpa->force_split) return 1; @@ -412,15 +412,12 @@ try_preserve_large_page(pte_t *kpte, unsigned long address, switch (level) { case PG_LEVEL_2M: - psize = PMD_PAGE_SIZE; - pmask = PMD_PAGE_MASK; - break; #ifdef CONFIG_X86_64 case PG_LEVEL_1G: - psize = PUD_PAGE_SIZE; - pmask = PUD_PAGE_MASK; - break; #endif + psize = page_level_size(level); + pmask = page_level_mask(level); + break; default: do_split = -EINVAL; goto out_unlock;
[PATCH 0/11] ksm: NUMA trees and page migration
Here's a KSM series, based on mmotm 2013-01-23-17-04: starting with Petr's v7 "KSM: numa awareness sysfs knob"; then fixing the two issues we had with that, fully enabling KSM page migration on the way. (A different kind of KSM/NUMA issue which I've certainly not begun to address here: when KSM pages are unmerged, there's usually no sense in preferring to allocate the new pages local to the caller's node.) Petr, I have intentionally changed the titles of yours: partly because your "sysfs knob" understated it, but mainly because I think gmail is liable to assign 1/11 and 2/11 to your earlier December thread, making them vanish from this series. I hope a change of title prevents that. 1 ksm: allow trees per NUMA node 2 ksm: add sysfs ABI Documentation 3 ksm: trivial tidyups 4 ksm: reorganize ksm_check_stable_tree 5 ksm: get_ksm_page locked 6 ksm: remove old stable nodes more thoroughly 7 ksm: make KSM page migration possible 8 ksm: make !merge_across_nodes migration safe 9 mm: enable KSM page migration 10 mm: remove offlining arg to migrate_pages 11 ksm: stop hotremove lockdep warning Documentation/ABI/testing/sysfs-kernel-mm-ksm | 52 + Documentation/vm/ksm.txt |7 include/linux/ksm.h | 18 include/linux/migrate.h | 14 mm/compaction.c |2 mm/ksm.c | 566 +--- mm/memory-failure.c |7 mm/memory.c | 19 mm/memory_hotplug.c |3 mm/mempolicy.c| 11 mm/migrate.c | 61 - mm/page_alloc.c |6 12 files changed, 580 insertions(+), 186 deletions(-) Hugh
[tip:x86/mm] x86, mm: Pagetable level size/shift/mask helpers
Commit-ID: 4cbeb51b860c57ba8b2ae50c4016ee7a41f5fbd5 Gitweb: http://git.kernel.org/tip/4cbeb51b860c57ba8b2ae50c4016ee7a41f5fbd5 Author: Dave Hansen AuthorDate: Tue, 22 Jan 2013 13:24:31 -0800 Committer: H. Peter Anvin CommitDate: Fri, 25 Jan 2013 16:33:22 -0800 x86, mm: Pagetable level size/shift/mask helpers I plan to use lookup_address() to walk the kernel pagetables in a later patch. It returns a "pte" and the level in the pagetables where the "pte" was found. The level is just an enum and needs to be converted to a useful value in order to do address calculations with it. These helpers will be used in at least two places. This also gives the anonymous enum a real name so that no one gets confused about what they should be passing in to these helpers. "PTE_SHIFT" was chosen for naming consistency with the other pagetable levels (PGD/PUD/PMD_SHIFT). Cc: H. Peter Anvin Signed-off-by: Dave Hansen Link: http://lkml.kernel.org/r/20130122212431.405d3...@kernel.stglabs.ibm.com Signed-off-by: H. 
Peter Anvin --- arch/x86/include/asm/pgtable.h | 14 ++ arch/x86/include/asm/pgtable_types.h | 2 +- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 5199db2..bc28e6f 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -390,6 +390,7 @@ pte_t *populate_extra_pte(unsigned long vaddr); #ifndef __ASSEMBLY__ #include +#include static inline int pte_none(pte_t pte) { @@ -781,6 +782,19 @@ static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count) memcpy(dst, src, count * sizeof(pgd_t)); } +#define PTE_SHIFT ilog2(PTRS_PER_PTE) +static inline int page_level_shift(enum pg_level level) +{ + return (PAGE_SHIFT - PTE_SHIFT) + level * PTE_SHIFT; +} +static inline unsigned long page_level_size(enum pg_level level) +{ + return 1UL << page_level_shift(level); +} +static inline unsigned long page_level_mask(enum pg_level level) +{ + return ~(page_level_size(level) - 1); +} #include #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 3c32db8..6c297e7 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -331,7 +331,7 @@ extern void native_pagetable_init(void); struct seq_file; extern void arch_report_meminfo(struct seq_file *m); -enum { +enum pg_level { PG_LEVEL_NONE, PG_LEVEL_4K, PG_LEVEL_2M, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
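The arithmetic behind these helpers can be checked in isolation. Below is a userspace sketch of the same functions, assuming a 4K-page configuration with PAGE_SHIFT=12 and PTRS_PER_PTE=512 (so PTE_SHIFT=9), as on x86-64; the constants are assumptions for illustration, not taken from kernel headers.

```c
#include <assert.h>

/* Userspace model of the x86 page_level_* helpers above. */
#define PAGE_SHIFT 12
#define PTE_SHIFT  9   /* ilog2(PTRS_PER_PTE) with PTRS_PER_PTE == 512 */

enum pg_level { PG_LEVEL_NONE, PG_LEVEL_4K, PG_LEVEL_2M, PG_LEVEL_1G };

/* Each level up multiplies the mapped size by PTRS_PER_PTE. */
static int page_level_shift(enum pg_level level)
{
	return (PAGE_SHIFT - PTE_SHIFT) + level * PTE_SHIFT;
}

static unsigned long page_level_size(enum pg_level level)
{
	return 1UL << page_level_shift(level);
}

static unsigned long page_level_mask(enum pg_level level)
{
	return ~(page_level_size(level) - 1);
}
```

With these definitions, PG_LEVEL_4K maps 4K (shift 12), PG_LEVEL_2M maps 2M (shift 21), and PG_LEVEL_1G maps 1G (shift 30), which is exactly why the later try_preserve_large_page() patch can collapse the per-level switch cases into two helper calls.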
[PATCH 1/1] Drivers: scsi: storvsc: Initialize the sglist
Properly initialize the scatterlist before using it.

Signed-off-by: K. Y. Srinivasan
Cc: sta...@vger.kernel.org
---
 drivers/scsi/storvsc_drv.c | 1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 270b3cf..5ada1d0 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -467,6 +467,7 @@ static struct scatterlist *create_bounce_buffer(struct scatterlist *sgl,
 	if (!bounce_sgl)
 		return NULL;

+	sg_init_table(bounce_sgl, num_pages);
 	for (i = 0; i < num_pages; i++) {
 		page_buf = alloc_page(GFP_ATOMIC);
 		if (!page_buf)
--
1.7.4.1
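Why the missing sg_init_table() call matters: it zeroes the array and marks the last entry as the end of the list, so iterators know where to stop; with an uninitialized table, stale bits can look like chain or end markers. The sketch below is a simplified userspace toy model of that idea — the field layout and marker values are assumptions for illustration, not the real kernel scatterlist internals.

```c
#include <string.h>

/* Toy scatterlist entry: low bits of page_link carry list markers, so
 * uninitialized memory can masquerade as chain/end state. (Assumed
 * layout; the real struct scatterlist differs.) */
struct toy_sg {
	unsigned long page_link;   /* bit 1: end-of-list marker */
	unsigned int offset, length;
};

#define TOY_SG_END 0x2UL

/* Model of sg_init_table(): clear stale state, terminate the list. */
static void toy_sg_init_table(struct toy_sg *sgl, unsigned int nents)
{
	memset(sgl, 0, sizeof(*sgl) * nents);
	sgl[nents - 1].page_link |= TOY_SG_END;
}

/* Count entries by walking until the end marker, as iterators do. */
static unsigned int toy_sg_count(const struct toy_sg *sgl, unsigned int max)
{
	unsigned int n = 0;

	while (n < max) {
		n++;
		if (sgl[n - 1].page_link & TOY_SG_END)
			break;
	}
	return n;
}
```

In the model, a properly initialized 4-entry table stops iteration at exactly 4 entries; without the init, whatever garbage is in page_link decides where (or whether) the walk stops.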
[tip:x86/mm] x86, mm: Make DEBUG_VIRTUAL work earlier in boot
Commit-ID: a25b9316841c5afa226f8f70a457861b35276a92 Gitweb: http://git.kernel.org/tip/a25b9316841c5afa226f8f70a457861b35276a92 Author: Dave Hansen AuthorDate: Tue, 22 Jan 2013 13:24:30 -0800 Committer: H. Peter Anvin CommitDate: Fri, 25 Jan 2013 16:33:22 -0800 x86, mm: Make DEBUG_VIRTUAL work earlier in boot The KVM code has some repeated bugs in it around use of __pa() on per-cpu data. Those data are not in an area on which using __pa() is valid. However, they are also called early enough in boot that __vmalloc_start_set is not set, and thus the CONFIG_DEBUG_VIRTUAL debugging does not catch them. This adds a check to also verify __pa() calls against max_low_pfn, which we can use earler in boot than is_vmalloc_addr(). However, if we are super-early in boot, max_low_pfn=0 and this will trip on every call, so also make sure that max_low_pfn is set before we try to use it. With this patch applied, CONFIG_DEBUG_VIRTUAL will actually catch the bug I was chasing (and fix later in this series). I'd love to find a generic way so that any __pa() call on percpu areas could do a BUG_ON(), but there don't appear to be any nice and easy ways to check if an address is a percpu one. Anybody have ideas on a way to do this? Signed-off-by: Dave Hansen Link: http://lkml.kernel.org/r/20130122212430.f46f8...@kernel.stglabs.ibm.com Signed-off-by: H. 
Peter Anvin --- arch/x86/mm/numa.c | 2 +- arch/x86/mm/pat.c | 4 ++-- arch/x86/mm/physaddr.c | 9 - 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index 2d125be..76604eb 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -219,7 +219,7 @@ static void __init setup_node_data(int nid, u64 start, u64 end) */ nd = alloc_remap(nid, nd_size); if (nd) { - nd_pa = __pa(nd); + nd_pa = __phys_addr_nodebug(nd); remapped = true; } else { nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid); diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c index 0eb572e..2610bd9 100644 --- a/arch/x86/mm/pat.c +++ b/arch/x86/mm/pat.c @@ -560,10 +560,10 @@ int kernel_map_sync_memtype(u64 base, unsigned long size, unsigned long flags) { unsigned long id_sz; - if (base >= __pa(high_memory)) + if (base > __pa(high_memory-1)) return 0; - id_sz = (__pa(high_memory) < base + size) ? + id_sz = (__pa(high_memory-1) <= base + size) ? __pa(high_memory) - base : size; diff --git a/arch/x86/mm/physaddr.c b/arch/x86/mm/physaddr.c index c73fedd..e666cbb 100644 --- a/arch/x86/mm/physaddr.c +++ b/arch/x86/mm/physaddr.c @@ -1,3 +1,4 @@ +#include #include #include #include @@ -68,10 +69,16 @@ EXPORT_SYMBOL(__virt_addr_valid); #ifdef CONFIG_DEBUG_VIRTUAL unsigned long __phys_addr(unsigned long x) { + unsigned long phys_addr = x - PAGE_OFFSET; /* VMALLOC_* aren't constants */ VIRTUAL_BUG_ON(x < PAGE_OFFSET); VIRTUAL_BUG_ON(__vmalloc_start_set && is_vmalloc_addr((void *) x)); - return x - PAGE_OFFSET; + /* max_low_pfn is set early, but not _that_ early */ + if (max_low_pfn) { + VIRTUAL_BUG_ON((phys_addr >> PAGE_SHIFT) > max_low_pfn); + BUG_ON(slow_virt_to_phys((void *)x) != phys_addr); + } + return phys_addr; } EXPORT_SYMBOL(__phys_addr); #endif -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read 
the FAQ at http://www.tux.org/lkml/
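The hardened __phys_addr() logic above can be modeled in userspace: translate first, then validate against max_low_pfn only once boot has set it to a nonzero value. The PAGE_OFFSET and PAGE_SHIFT constants below are illustrative assumptions (a 32-bit-style direct map), and the function returns a failure code where the kernel would trip VIRTUAL_BUG_ON.

```c
/* Userspace model of the __phys_addr() sanity checks in the patch. */
#define PAGE_OFFSET 0xC0000000UL   /* assumed direct-map base */
#define PAGE_SHIFT  12

static unsigned long max_low_pfn;  /* 0 until "boot" sets it */

/* Returns 1 and stores the physical address if the checks pass,
 * 0 where the kernel would hit VIRTUAL_BUG_ON. */
static int phys_addr_checked(unsigned long x, unsigned long *phys)
{
	unsigned long phys_addr = x - PAGE_OFFSET;

	if (x < PAGE_OFFSET)
		return 0;                 /* not a direct-map address */
	/* max_low_pfn is set early, but not _that_ early */
	if (max_low_pfn && (phys_addr >> PAGE_SHIFT) > max_low_pfn)
		return 0;                 /* past lowmem: bogus __pa() use */
	*phys = phys_addr;
	return 1;
}
```

This mirrors the commit's point: super-early in boot the max_low_pfn guard is skipped (it would reject everything while still zero), but once set it catches __pa() calls on addresses outside the direct map.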
[RFC patch v2 7/7] sched: consider runnable load average in effective_load
effective_load() calculates the load change as seen from the root_task_group. It needs to take the runnable average of the changed task into account. Thanks to Morten Rasmussen for the reminder of this.

Signed-off-by: Alex Shi
---
 kernel/sched/fair.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 84bb3f7..8066a61 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2981,7 +2981,8 @@ static void task_waking_fair(struct task_struct *p)

 #ifdef CONFIG_FAIR_GROUP_SCHED
 /*
- * effective_load() calculates the load change as seen from the root_task_group
+ * effective_load() calculates the runnable load average change as seen from
+ * the root_task_group
  *
  * Adding load to a group doesn't make a group heavier, but can cause movement
  * of group shares between cpus. Assuming the shares were perfectly aligned one
@@ -3029,6 +3030,9 @@ static void task_waking_fair(struct task_struct *p)
  * Therefore the effective change in loads on CPU 0 would be 5/56 (3/8 - 2/7)
  * times the weight of the group. The effect on CPU 1 would be -4/56 (4/8 -
  * 4/7) times the weight of the group.
+ *
+ * After getting the effective_load of the load movement, scale it by the
+ * sched entity's runnable average.
  */
 static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 {
@@ -3103,6 +3107,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	struct task_group *tg;
 	unsigned long weight;
 	int balanced;
+	int runnable_avg;

 	idx	  = sd->wake_idx;
 	this_cpu  = smp_processor_id();
@@ -3118,13 +3123,19 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	if (sync) {
 		tg = task_group(current);
 		weight = current->se.load.weight;
+		runnable_avg = current->se.avg.runnable_avg_sum * NICE_0_LOAD
+				/ (current->se.avg.runnable_avg_period + 1);

-		this_load += effective_load(tg, this_cpu, -weight, -weight);
-		load += effective_load(tg, prev_cpu, 0, -weight);
+		this_load += effective_load(tg, this_cpu, -weight, -weight)
+				* runnable_avg >> NICE_0_SHIFT;
+		load += effective_load(tg, prev_cpu, 0, -weight)
+				* runnable_avg >> NICE_0_SHIFT;
 	}

 	tg = task_group(p);
 	weight = p->se.load.weight;
+	runnable_avg = p->se.avg.runnable_avg_sum * NICE_0_LOAD
+			/ (p->se.avg.runnable_avg_period + 1);

 	/*
 	 * In low-load situations, where prev_cpu is idle and this_cpu is idle
@@ -3136,16 +3147,18 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	 * task to be woken on this_cpu.
 	 */
 	if (this_load > 0) {
-		s64 this_eff_load, prev_eff_load;
+		s64 this_eff_load, prev_eff_load, tmp_eff_load;

 		this_eff_load = 100;
 		this_eff_load *= power_of(prev_cpu);
-		this_eff_load *= this_load +
-			effective_load(tg, this_cpu, weight, weight);
+		tmp_eff_load = effective_load(tg, this_cpu, weight, weight)
+				* runnable_avg >> NICE_0_SHIFT;
+		this_eff_load *= this_load + tmp_eff_load;

 		prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
 		prev_eff_load *= power_of(this_cpu);
-		prev_eff_load *= load + effective_load(tg, prev_cpu, 0, weight);
+		prev_eff_load *= load + (effective_load(tg, prev_cpu, 0, weight)
+				* runnable_avg >> NICE_0_SHIFT);

 		balanced = this_eff_load <= prev_eff_load;
 	} else
--
1.7.12
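The scaling used in the wake_affine() hunks above is plain fixed-point arithmetic: runnable_avg is a ratio in units of NICE_0_LOAD (1 << 10 in kernels of this era), and the effective load is multiplied by it and shifted back down. A minimal numeric sketch, assuming those constants:

```c
/* Fixed-point sketch of the scaling in the patch:
 *   runnable_avg = runnable_sum * NICE_0_LOAD / (period + 1)
 *   scaled_load  = (eff_load * runnable_avg) >> NICE_0_SHIFT
 * NICE_0_SHIFT/NICE_0_LOAD = 10/1024 are assumed to match the kernel. */
#define NICE_0_SHIFT 10
#define NICE_0_LOAD  (1L << NICE_0_SHIFT)

static long runnable_avg(long runnable_sum, long period)
{
	/* +1 avoids divide-by-zero for a task with no history yet */
	return runnable_sum * NICE_0_LOAD / (period + 1);
}

static long scale_by_runnable(long eff_load, long avg)
{
	return (eff_load * avg) >> NICE_0_SHIFT;
}
```

A task runnable half the time gets avg ≈ NICE_0_LOAD/2, so its effective load is halved; a fully runnable task passes its load through essentially unscaled, which is the "apple to apple" comparison the series is after.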
[RFC patch v2 1/7] sched: give initial value for runnable avg of sched entities.
We need to initialize se.avg.{decay_count, load_avg_contrib} to zero after a new task is forked. Otherwise, random values in those variables cause a mess when the new task is enqueued:

    enqueue_task_fair
        enqueue_entity
            enqueue_entity_load_avg

Signed-off-by: Alex Shi
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 257002c..66c1718 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1558,6 +1558,8 @@ static void __sched_fork(struct task_struct *p)
 #if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
 	p->se.avg.runnable_avg_period = 0;
 	p->se.avg.runnable_avg_sum = 0;
+	p->se.avg.decay_count = 0;
+	p->se.avg.load_avg_contrib = 0;
 #endif
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
--
1.7.12
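Why the explicit zeroing matters: a forked child's task_struct starts as a copy of the parent, so per-entity averages inherit stale values unless __sched_fork() resets them. A minimal model of that path, with field names mirroring the patch and everything else assumed:

```c
/* Sketch of the fork path: the child is a byte copy of the parent
 * (as dup_task_struct() does), so __sched_fork() must clear the
 * per-entity averages explicitly. */
struct sched_avg_model {
	unsigned long runnable_avg_period, runnable_avg_sum;
	long decay_count;
	unsigned long load_avg_contrib;
};

static void sched_fork_model(const struct sched_avg_model *parent,
			     struct sched_avg_model *child)
{
	*child = *parent;               /* inherited, possibly stale */
	child->runnable_avg_period = 0; /* resets already present */
	child->runnable_avg_sum = 0;
	child->decay_count = 0;         /* the two resets this patch adds */
	child->load_avg_contrib = 0;
}
```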
[RFC patch v2 5/7] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task
These are the base values used in load balancing. Update them from the rq runnable load average, so that load balancing naturally takes the runnable load average into account.

Signed-off-by: Alex Shi
---
 kernel/sched/core.c | 4 ++--
 kernel/sched/fair.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4f4714e..5da13ff 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2539,7 +2539,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 void update_idle_cpu_load(struct rq *this_rq)
 {
 	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
-	unsigned long load = this_rq->load.weight;
+	unsigned long load = (unsigned long)this_rq->cfs.runnable_load_avg;
 	unsigned long pending_updates;

 	/*
@@ -2589,7 +2589,7 @@ static void update_cpu_load_active(struct rq *this_rq)
 	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
-	__update_cpu_load(this_rq, this_rq->load.weight, 1);
+	__update_cpu_load(this_rq, this_rq->cfs.runnable_load_avg, 1);

 	calc_load_account_active(this_rq);
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 017e040..729221b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2905,7 +2905,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 /* Used instead of source_load when we know the type == 0 */
 static unsigned long weighted_cpuload(const int cpu)
 {
-	return cpu_rq(cpu)->load.weight;
+	return (unsigned long)cpu_rq(cpu)->cfs.runnable_load_avg;
 }

 /*
@@ -2952,7 +2952,7 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 	unsigned long nr_running = ACCESS_ONCE(rq->nr_running);

 	if (nr_running)
-		return rq->load.weight / nr_running;
+		return (unsigned long)rq->cfs.runnable_load_avg / nr_running;

 	return 0;
 }
--
1.7.12
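The shape of the changed cpu_avg_load_per_task() is simple enough to check standalone: it divides the CFS runnable load average (rather than the instantaneous rq weight) by the number of running tasks, guarding against an idle rq. A sketch with illustrative values:

```c
/* Model of cpu_avg_load_per_task() after the patch: average runnable
 * load per task, 0 for an idle CPU. Inputs are illustrative. */
static unsigned long avg_load_per_task(unsigned long runnable_load_avg,
				       unsigned long nr_running)
{
	if (nr_running)
		return runnable_load_avg / nr_running;

	return 0;
}
```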
[RFC patch v2 6/7] sched: consider runnable load average in move_tasks
Besides using the runnable load average in the background statistics, move_tasks() is also a key function in load balancing. We need to consider the runnable load average there as well, in order to get an apples-to-apples load comparison.

Signed-off-by: Alex Shi
---
 kernel/sched/fair.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 729221b..84bb3f7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3978,6 +3978,15 @@ static unsigned long task_h_load(struct task_struct *p);

 static const unsigned int sched_nr_migrate_break = 32;

+static unsigned long task_h_load_avg(struct task_struct *p)
+{
+	u32 period = p->se.avg.runnable_avg_period;
+	if (!period)
+		return 0;
+
+	return task_h_load(p) * p->se.avg.runnable_avg_sum / period;
+}
+
 /*
  * move_tasks tries to move up to imbalance weighted load from busiest to
  * this_rq, as part of a balancing operation within domain "sd".
@@ -4013,7 +4022,7 @@ static int move_tasks(struct lb_env *env)
 		if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
 			goto next;

-		load = task_h_load(p);
+		load = task_h_load_avg(p);
 		if (sched_feat(LB_MIN) && load < 16 && !env->sd->nr_balance_failed)
 			goto next;
--
1.7.12
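The new task_h_load_avg() is the hierarchical load scaled by the fraction of time the task was runnable, with a guard for a task that has no history yet. A standalone model of that arithmetic (values illustrative):

```c
/* Model of task_h_load_avg() from the patch: h_load scaled by
 * runnable_sum/period, returning 0 when there is no history. */
typedef unsigned int u32;

static unsigned long task_h_load_avg_model(unsigned long h_load,
					   u32 runnable_sum, u32 period)
{
	if (!period)
		return 0;   /* freshly forked task: no history yet */

	return h_load * runnable_sum / period;
}
```

A task runnable half the time contributes half its hierarchical load to the migration decision, which is exactly the comparison move_tasks() needs once the rest of the series has switched the per-CPU loads to runnable averages.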
[RFC patch v2 2/7] sched: set initial load avg of new forked task
A new task has no runnable sum at the time it first becomes runnable, so its runnable load is zero. If we use runnable load in balancing, a burst of forks will then pile tasks onto just a few idle CPUs. Set the initial load average of a newly forked task to its load weight to resolve this issue.

Signed-off-by: Alex Shi
Reviewed-by: Preeti U Murthy
---
 include/linux/sched.h | 1 +
 kernel/sched/core.c | 2 +-
 kernel/sched/fair.c | 11 ++-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6fc8f45..b8738c0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1069,6 +1069,7 @@ struct sched_domain;
 #else
 #define ENQUEUE_WAKING		0
 #endif
+#define ENQUEUE_NEWTASK		8

 #define DEQUEUE_SLEEP		1

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 66c1718..66ce1f1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1705,7 +1705,7 @@ void wake_up_new_task(struct task_struct *p)
 #endif

 	rq = __task_rq_lock(p);
-	activate_task(rq, p, 0);
+	activate_task(rq, p, ENQUEUE_NEWTASK);
 	p->on_rq = 1;
 	trace_sched_wakeup_new(p, true);
 	check_preempt_curr(rq, p, WF_FORK);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5eea870..1384297 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1503,8 +1503,9 @@ static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
 /* Add the load generated by se into cfs_rq's child load-average */
 static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 						struct sched_entity *se,
-						int wakeup)
+						int flags)
 {
+	int wakeup = flags & ENQUEUE_WAKEUP;
 	/*
 	 * We track migrations using entity decay_count <= 0, on a wake-up
 	 * migration we use a negative decay count to track the remote decays
@@ -1538,6 +1539,12 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 		update_entity_load_avg(se, 0);
 	}

+	/*
+	 * set the initial load avg of a new task to its load weight
+	 * in order to avoid a burst of forks making a few cpus too heavy
+	 */
+	if (flags & ENQUEUE_NEWTASK)
+		se->avg.load_avg_contrib = se->load.weight;
 	cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;
 	/* we force update consideration on load-balancer moves */
 	update_cfs_rq_blocked_load(cfs_rq, !wakeup);
@@ -1701,7 +1708,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * Update run-time statistics of the 'current'.
 	 */
 	update_curr(cfs_rq);
-	enqueue_entity_load_avg(cfs_rq, se, flags & ENQUEUE_WAKEUP);
+	enqueue_entity_load_avg(cfs_rq, se, flags);
 	account_entity_enqueue(cfs_rq, se);
 	update_cfs_shares(cfs_rq);
--
1.7.12
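The flag plumbing in this patch is worth spelling out: ENQUEUE_NEWTASK (value 8, as in the hunk above) rides alongside the existing enqueue flags in one int, and enqueue seeds load_avg_contrib from the weight only for that case. A minimal model, with the surrounding structure assumed:

```c
/* Sketch of the flag handling in enqueue_entity_load_avg(): the new
 * ENQUEUE_NEWTASK bit must not collide with the existing flags, and
 * only a brand-new task gets its contrib seeded from its weight. */
#define ENQUEUE_WAKEUP  1
#define ENQUEUE_WAKING  4
#define ENQUEUE_NEWTASK 8

struct se_model { unsigned long weight, load_avg_contrib; };

static void enqueue_load_avg_model(struct se_model *se, int flags)
{
	if (flags & ENQUEUE_NEWTASK)
		se->load_avg_contrib = se->weight;
	/* a plain wakeup leaves the tracked average alone */
}
```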
[RFC patch v2 3/7] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking"
Remove the CONFIG_FAIR_GROUP_SCHED guard that covers the runnable-load info, so that we can use the runnable load variables unconditionally.

Signed-off-by: Alex Shi
---
 include/linux/sched.h | 8 +---
 kernel/sched/core.c | 7 +--
 kernel/sched/fair.c | 13 ++---
 kernel/sched/sched.h | 9 +
 4 files changed, 5 insertions(+), 32 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b8738c0..e55fa95 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1195,13 +1195,7 @@ struct sched_entity {
 	/* rq "owned" by this entity/group: */
 	struct cfs_rq		*my_q;
 #endif
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
-	/* Per-entity load-tracking */
+#ifdef CONFIG_SMP
 	struct sched_avg	avg;
 #endif
 };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 66ce1f1..dbab4b3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1550,12 +1550,7 @@ static void __sched_fork(struct task_struct *p)
 	p->se.vruntime			= 0;
 	INIT_LIST_HEAD(&p->se.group_node);

-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 	p->se.avg.runnable_avg_period = 0;
 	p->se.avg.runnable_avg_sum = 0;
 	p->se.avg.decay_count = 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1384297..017e040 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1109,8 +1109,7 @@ static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_FAIR_GROUP_SCHED */

-/* Only depends on SMP, FAIR_GROUP_SCHED may be removed when useful in lb */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 /*
  * We choose a half-life close to 1 scheduling period.
  * Note: The tables below are dependent on this value.
@@ -3410,12 +3409,6 @@ unlock:
 }

 /*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
-/*
  * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
  * previous cpu. However, the caller only guarantees p->pi_lock is held; no
@@ -3438,7 +3431,6 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 		atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
 	}
 }
-#endif
 #endif /* CONFIG_SMP */

 static unsigned long
@@ -6130,9 +6122,8 @@ const struct sched_class fair_sched_class = {

 #ifdef CONFIG_SMP
 	.select_task_rq		= select_task_rq_fair,
-#ifdef CONFIG_FAIR_GROUP_SCHED
 	.migrate_task_rq	= migrate_task_rq_fair,
-#endif
+
 	.rq_online		= rq_online_fair,
 	.rq_offline		= rq_offline_fair,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fc88644..ae3511e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -225,12 +225,6 @@ struct cfs_rq {
 #endif

 #ifdef CONFIG_SMP
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
 	/*
 	 * CFS Load tracking
 	 * Under CFS, load is tracked on a per-entity basis and aggregated up.
@@ -240,8 +234,7 @@ struct cfs_rq {
 	u64 runnable_load_avg, blocked_load_avg;
 	atomic64_t decay_counter, removed_load;
 	u64 last_decay;
-#endif /* CONFIG_FAIR_GROUP_SCHED */
-/* These always depend on CONFIG_FAIR_GROUP_SCHED */
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	u32 tg_runnable_contrib;
 	u64 tg_load_contrib;
--
1.7.12
[RFC patch v2 4/7] sched: update cpu load after task_tick.
To pick up the latest runnable info, we need to do this CPU load update after task_tick().

Signed-off-by: Alex Shi
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dbab4b3..4f4714e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2695,8 +2695,8 @@ void scheduler_tick(void)

 	raw_spin_lock(&rq->lock);
 	update_rq_clock(rq);
-	update_cpu_load_active(rq);
 	curr->sched_class->task_tick(rq, curr, 0);
+	update_cpu_load_active(rq);
 	raw_spin_unlock(&rq->lock);

 	perf_event_task_tick();
--
1.7.12
[RFC patch v2] sched: use runnable load avg in cfs balance instead of instant load
This patchset can be used, but it causes the burst-waking benchmark aim9 to drop 5~7% on my 2-socket machine. The reason is that the overly light runnable load of tasks in the early stage after wakeup causes imbalance in balancing. So it is immature, and just a reference for anyone who wants to go further.

V2 changes:
1. Attached the 1~3 patches, which were previously sent as part of power-aware scheduling.
2. Removed the CONFIG_FAIR_GROUP_SCHED mask in the 5th patch.

Thanks for Ingo's comments and the testing provided by Fengguang's kbuild system. It is now an independent patchset based on Linus' tree.

Thanks
Alex

[RFC patch v2 1/7] sched: give initial value for runnable avg of
[RFC patch v2 2/7] sched: set initial load avg of new forked task
[RFC patch v2 3/7] Revert "sched: Introduce temporary
[RFC patch v2 4/7] sched: update cpu load after task_tick.
[RFC patch v2 5/7] sched: compute runnable load avg in cpu_load and
[RFC patch v2 6/7] sched: consider runnable load average in
[RFC patch v2 7/7] sched: consider runnable load average in
Re: [RFC] ACPI scan handlers
On Friday, January 25, 2013 04:07:38 PM Toshi Kani wrote: > On Fri, 2013-01-25 at 23:11 +0100, Rafael J. Wysocki wrote: > > On Friday, January 25, 2013 09:52:21 AM Toshi Kani wrote: > > > On Thu, 2013-01-24 at 01:26 +0100, Rafael J. Wysocki wrote: > : > > > > > > > > I wonder if anyone is seeing any major problems with this at the high > > > > level. > > > > First of all, thanks for the response. :-) > > > > > I agree that the current model is mess. As shown below, it requires > > > that .add() at boot-time only performs acpi dev init, and .add() at > > > hot-add needs both acpi dev init and device on-lining. > > > > I'm not sure what you're talking about, though. > > > > You seem to be confusing ACPI device nodes (i.e. things represented by > > struct > > acpi_device objects) with devices, but they are different things. They are > > just used to store static information extracted from device objects in the > > ACPI namespace and to expose those objects (and possibly some of their > > properties) via sysfs. Device objects in the ACPI namespace are not > > devices, > > however, and they don't even need to represent devices (for example, the > > _SB thing, which is represented by struct acpi_device, is hardly a device). > > > > So the role of struct acpi_device things is analogous to the role of > > struct device_node things in the Device Trees world. In fact, no drivers > > should ever bind to them and in my opinion it was a grievous mistake to > > let them do that. But I'm digressing. > > > > So, when you're saying "acpi dev", I'm not sure if you think about a device > > node > > or a device (possibly) represented by that node. If you mean device node, > > then > > I'm not sure what "acpi dev init" means, because device nodes by definition > > don't require any initialization beyond what acpi_add_single_object() does > > (and they don't require any off-lining beyod what acpi_device_unregister() > > does, for that matter). 
In turn, if you mean "device represented by the > > given > > device node", then you can't even say "ACPI device" about it, because it > > very > > well may be a PCI device, or a USB device, or a SATA device etc. > > Let me clarify my point with the ACPI memory driver as an example since > it is the one that has caused a problem in .remove(). > > acpi_memory_device_add() implements .add() and does two things below. > > 1. Call _CRS and initialize a list of struct acpi_memory_info that is > attached to acpi_device->driver_data. This step is what I described as > "acpi dev init". ACPI drivers perform driver-specific initialization to > ACPI device objects. > > 2. Call add_memory() to add a target memory range to the mm module. > This step is what I described as "on-lining". This step is not > necessary at boot-time since the mm module has already on-lined the > memory ranges at early boot-time. At hot-add, however, it needs to call > add_memory() with the current framework. I see. OK, so that does handle the "struct acpi_device has been registered" event, both on boot and hot-add. The interactions with mm are tricky, I agree, but that's not what I want to address at this point. > Similarly, acpi_memory_device_remove() implements .remove() and does two > things below. > > 1. Call remove_memory() to offline a target memory range. This step, > "off-lining", can fail since the mm module may or may not be able to > delete non-movable ranges. This failure cannot be handled properly and > causes the system to crash at this point. Well, if the system administrator wants to crash the system this way, it's basically up to him. So that should be done by .detach() anyway in that case. > 2. Free up the list of struct acpi_memory_info. This step deletes > driver-specific data from an ACPI device object. OK > > That's part of the whole confusion, by the way. 
> > > > If the device represented by an ACPI device node is on a natively enumerated > > bus, like PCI, then its native bus' init code initializes the device and > > creates a "physical" device object for it, like struct pci_dev, which is > > then > > "glued" to the corresponding struct acpi_device by acpi_bind_one(). Then, > > it > > is clear which is which and there's no confusion. The confusion starts when > > there's no native enumeration and we only have the struct acpi_device thing, > > because then everybody seems to think "oh, there's no physical device object > > now, so this must be something different", but the *only* difference is that > > there is no native bus' init code now and we should still be creating a > > "physical device" object for the device and we should "glue" it to the > > existing struct acpi_device like in the natively enumerated case. > > > > > It then requires .remove() to perform both off-lining and acpi dev > > > delete. .remove() must succeed, but off-lining can fail. > > > > > > acpi dev online > > > ||=| > > > > > >add @ boot > > > --
RE: [PATCH 1/2]linux-usb:Define a new macro for USB storage match rules
> -Original Message- > From: Greg KH [mailto:g...@kroah.com] > Sent: Saturday, January 26, 2013 1:45 AM > To: Fangxiaozhi (Franko) > Cc: Sergei Shtylyov; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; > Xueguiying (Zihan); Linlei (Lei Lin); Yili (Neil); Wangyuhua (Roger, Credit); > Huqiao (C); ba...@ti.com; mdharm-...@one-eyed-alien.net; > sebast...@breakpoint.cc > Subject: Re: [PATCH 1/2]linux-usb:Define a new macro for USB storage match > rules > > On Fri, Jan 25, 2013 at 04:18:34PM +0400, Sergei Shtylyov wrote: > > Hello. > > > > On 25-01-2013 6:44, fangxiaozhi 00110321 wrote: > > > > >From: fangxiaozhi > > > > >1. Define a new macro for USB storage match rules: > > > matching with Vendor ID and interface descriptors. > > > > >Signed-off-by: fangxiaozhi > > > > > > > > > diff -uprN linux-3.8-rc4_orig/drivers/usb/storage/usb.c > > >linux-3.8-rc4/drivers/usb/storage/usb.c > > >--- linux-3.8-rc4_orig/drivers/usb/storage/usb.c 2013-01-22 > > >14:12:42.595238727 +0800 > > >+++ linux-3.8-rc4/drivers/usb/storage/usb.c 2013-01-22 > > >+++ 14:16:01.398250305 +0800 > > >@@ -120,6 +120,17 @@ MODULE_PARM_DESC(quirks, "supplemental l > > > .useTransport = use_transport, \ > > > } > > > > > >+#define UNUSUAL_VENDOR_INTF(idVendor, cl, sc, pr, \ > > >+ vendor_name, product_name, use_protocol, use_transport, \ > > >+ init_function, Flags) \ > > >+{ \ > > >+ .vendorName = vendor_name, \ > > >+ .productName = product_name, \ > > >+ .useProtocol = use_protocol, \ > > >+ .useTransport = use_transport, \ > > >+ .initFunction = init_function, \ > > >+} > > > > Shouldn't the field initilaizers be indented with tab, not space? > > Yes it must. fangxiaozhi, please always run your patches through the > scripts/checkpatch.pl tool before sending them out (note, you will have to > ignore the CamelCase warnings your patch produces, but not the other > ones.) > -What's wrong with it? -I have checked the patches with scripts/checkpatch.pl before sending. 
-There is no other warning or error in my patches except CamelCase warnings. -So what's wrong now? > Please do that on both of these patches and resend them. > > thanks, > > greg k-h
[PATCH 09/14] dlm: use idr_for_each_entry() in recover_idr_clear() error path
Convert recover_idr_clear() to use idr_for_each_entry() instead of idr_for_each(). It's somewhat less efficient this way, but it shouldn't matter in an error path. This is to help with deprecation of idr_remove_all(). Only compile tested.

Signed-off-by: Tejun Heo
Cc: Christine Caulfield
Cc: David Teigland
Cc: cluster-de...@redhat.com
---
This patch depends on an earlier idr patch and I think it would be best to route these together through -mm. Christine, David, can you please ack this?

Thanks.

 fs/dlm/recover.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index aedea28..b2856e7 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -351,23 +351,20 @@ static struct dlm_rsb *recover_idr_find(struct dlm_ls *ls, uint64_t id)
 	return r;
 }

-static int recover_idr_clear_rsb(int id, void *p, void *data)
+static void recover_idr_clear(struct dlm_ls *ls)
 {
-	struct dlm_ls *ls = data;
-	struct dlm_rsb *r = p;
+	struct dlm_rsb *r;
+	int id;

-	r->res_id = 0;
-	r->res_recover_locks_count = 0;
-	ls->ls_recover_list_count--;
+	spin_lock(&ls->ls_recover_idr_lock);

-	dlm_put_rsb(r);
-	return 0;
-}
+	idr_for_each_entry(&ls->ls_recover_idr, r, id) {
+		r->res_id = 0;
+		r->res_recover_locks_count = 0;
+		ls->ls_recover_list_count--;

-static void recover_idr_clear(struct dlm_ls *ls)
-{
-	spin_lock(&ls->ls_recover_idr_lock);
-	idr_for_each(&ls->ls_recover_idr, recover_idr_clear_rsb, ls);
+		dlm_put_rsb(r);
+	}
 	idr_remove_all(&ls->ls_recover_idr);

 	if (ls->ls_recover_list_count != 0) {
--
1.8.1
[PATCH 10/14] dlm: don't use idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. The conversion isn't completely trivial for recover_idr_clear() as it's the only place in kernel which makes legitimate use of idr_remove_all() w/o idr_destroy(). Replace it with idr_remove() call inside idr_for_each_entry() loop. It goes on top so that it matches the operation order in recover_idr_del(). Only compile tested. Signed-off-by: Tejun Heo Cc: Christine Caulfield Cc: David Teigland Cc: cluster-de...@redhat.com --- This patch depends on an earlier idr patch and given the trivial nature of the patch, I think it would be best to route these together through -mm. Please holler if there's any objection. Thanks. fs/dlm/lockspace.c | 1 - fs/dlm/recover.c | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c index 2e99fb0..3ca79d3 100644 --- a/fs/dlm/lockspace.c +++ b/fs/dlm/lockspace.c @@ -796,7 +796,6 @@ static int release_lockspace(struct dlm_ls *ls, int force) */ idr_for_each(&ls->ls_lkbidr, lkb_idr_free, ls); - idr_remove_all(&ls->ls_lkbidr); idr_destroy(&ls->ls_lkbidr); /* diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c index b2856e7..236d108 100644 --- a/fs/dlm/recover.c +++ b/fs/dlm/recover.c @@ -359,13 +359,13 @@ static void recover_idr_clear(struct dlm_ls *ls) spin_lock(&ls->ls_recover_idr_lock); idr_for_each_entry(&ls->ls_recover_idr, r, id) { + idr_remove(&ls->ls_recover_idr, id); r->res_id = 0; r->res_recover_locks_count = 0; ls->ls_recover_list_count--; dlm_put_rsb(r); } - idr_remove_all(&ls->ls_recover_idr); if (ls->ls_recover_list_count != 0) { log_error(ls, "warning: recover_list_count %d", -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 02/14] atm/nicstar: don't use idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo Cc: Chas Williams Cc: net...@vger.kernel.org --- This patch depends on an earlier idr patch and given the trivial nature of the patch, I think it would be best to route these together through -mm. Please holler if there's any objection. Thanks. drivers/atm/nicstar.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/atm/nicstar.c b/drivers/atm/nicstar.c index ed1d2b7..628787e 100644 --- a/drivers/atm/nicstar.c +++ b/drivers/atm/nicstar.c @@ -251,7 +251,6 @@ static void nicstar_remove_one(struct pci_dev *pcidev) if (card->scd2vc[j] != NULL) free_scq(card, card->scd2vc[j]->scq, card->scd2vc[j]->tx_vcc); } - idr_remove_all(&card->idr); idr_destroy(&card->idr); pci_free_consistent(card->pcidev, NS_RSQSIZE + NS_RSQ_ALIGNMENT, card->rsq.org, card->rsq.dma); -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHSET] idr: deprecate idr_remove_all()
Hello, (Andrew, I think this one is best routed through -mm. Please read on) idr is one of the areas with a much higher concentration of bad interface and implementation decisions. This patchset removes one of those oddities - idr_remove_all(). idr needs two steps for destruction - idr_remove_all() followed by idr_destroy(). idr_remove_all() releases all IDs in use but doesn't release buffered idr_layers. idr_destroy() frees buffered idr_layers but doesn't bother with in-use idr_layers. For added fun, calling idr_remove() on all allocated IDs doesn't necessarily free all in-use idr_layers, so idr_for_each_entry() idr_remove(); followed by idr_destroy() may still leak memory. This confuses people. Some correctly use both. Many forget to call idr_remove_all() and others forget idr_destroy() and they all leak memory. Even ida - something tightly coupled w/ idr - forgets to do idr_remove_all() (although it's my fault). This is just a bad interface. While remove_all in itself might not be that bad, there is only one legitimate user of idr_remove_all() which can be converted to idr_remove() relatively easily, so I think it'd be better to deprecate and later unexport it than to keep it around. This patchset contains the following 14 patches. 0001-idr-make-idr_destroy-imply-idr_remove_all.patch 0002-atm-nicstar-don-t-use-idr_remove_all.patch 0003-block-loop-don-t-use-idr_remove_all.patch 0004-firewire-don-t-use-idr_remove_all.patch 0005-drm-don-t-use-idr_remove_all.patch 0006-dm-don-t-use-idr_remove_all.patch 0007-remoteproc-don-t-use-idr_remove_all.patch 0008-rpmsg-don-t-use-idr_remove_all.patch 0009-dlm-use-idr_for_each_entry-in-recover_idr_clear-erro.patch 0010-dlm-don-t-use-idr_remove_all.patch 0011-nfs-idr_destroy-no-longer-needs-idr_remove_all.patch 0012-inotify-don-t-use-idr_remove_all.patch 0013-cgroup-don-t-use-idr_remove_all.patch 0014-idr-deprecate-idr_remove_all.patch 0001 makes idr_destroy() imply idr_remove_all(). 0002-0013 remove uses of idr_remove_all().
0014 marks idr_remove_all() deprecated. The patches are on top of the current linus#master 66e2d3e8c2 and also apply on top of the current -mm. It's available in the following git branch. git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git deprecate-idr_remove_all As most of the changes are trivial and depend on the first patch, I think it would be best to route these together. The only non-trivial changes are 0009 and 0010, which convert idr_for_each() to idr_for_each_entry() and then replace idr_remove_all() with idr_remove() inside the for_each_entry loop. Definitely wanna get acks from dlm people. Andrew, once people agree with the series, can you please route these through -mm? diffstat follows. Thanks. drivers/atm/nicstar.c |1 - drivers/block/loop.c|1 - drivers/firewire/core-cdev.c|1 - drivers/gpu/drm/drm_context.c |2 +- drivers/gpu/drm/drm_crtc.c |1 - drivers/gpu/drm/drm_drv.c |1 - drivers/gpu/drm/drm_gem.c |2 -- drivers/gpu/drm/exynos/exynos_drm_ipp.c |4 drivers/gpu/drm/sis/sis_drv.c |1 - drivers/gpu/drm/via/via_map.c |1 - drivers/md/dm.c |1 - drivers/remoteproc/remoteproc_core.c|1 - drivers/rpmsg/virtio_rpmsg_bus.c|1 - fs/dlm/lockspace.c |1 - fs/dlm/recover.c| 25 +++-- fs/nfs/client.c |1 - fs/notify/inotify/inotify_fsnotify.c|1 - include/linux/idr.h | 14 +- kernel/cgroup.c |4 +--- lib/idr.c | 28 +--- 20 files changed, 39 insertions(+), 53 deletions(-) -- tejun
[PATCH 11/14] nfs: idr_destroy() no longer needs idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop reference to idr_remove_all(). Note that the code wasn't completely correct before because idr_remove() on all entries doesn't necessarily release all idr_layers which could lead to memory leak. Signed-off-by: Tejun Heo Cc: "J. Bruce Fields" Cc: linux-...@vger.kernel.org --- This patch depends on an earlier idr patch and given the trivial nature of the patch, I think it would be best to route these together through -mm. Please holler if there's any objection. Thanks. fs/nfs/client.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/nfs/client.c b/fs/nfs/client.c index 9f3c664..84d8eae 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -197,7 +197,6 @@ error_0: EXPORT_SYMBOL_GPL(nfs_alloc_client); #if IS_ENABLED(CONFIG_NFS_V4) -/* idr_remove_all is not needed as all id's are removed by nfs_put_client */ void nfs_cleanup_cb_ident_idr(struct net *net) { struct nfs_net *nn = net_generic(net, nfs_net_id); -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 13/14] cgroup: don't use idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo Cc: Li Zefan Cc: contain...@lists.linux-foundation.org Cc: cgro...@vger.kernel.org --- This patch depends on an earlier idr patch and given the trivial nature of the patch, I think it would be best to route these together through -mm. Please holler if there's any objection. Thanks. kernel/cgroup.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 4855892..6b18c5c 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -4567,10 +4567,8 @@ void cgroup_unload_subsys(struct cgroup_subsys *ss) offline_css(ss, dummytop); ss->active = 0; - if (ss->use_id) { - idr_remove_all(&ss->idr); + if (ss->use_id) idr_destroy(&ss->idr); - } /* deassign the subsys_id */ subsys[ss->subsys_id] = NULL; -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 14/14] idr: deprecate idr_remove_all()
There was only one legitimate use of idr_remove_all() and many more incorrect uses (or missing calls). Now that idr_destroy() implies idr_remove_all() and all the in-kernel users are updated not to use it, there's no reason to keep it around. Mark it deprecated so that we can later unexport it. idr_remove_all() is made an inline function calling __idr_remove_all() to avoid triggering a deprecation warning on EXPORT_SYMBOL(). Signed-off-by: Tejun Heo --- include/linux/idr.h | 14 +- lib/idr.c | 10 +++--- 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/include/linux/idr.h b/include/linux/idr.h index de7e190..1b932e7 100644 --- a/include/linux/idr.h +++ b/include/linux/idr.h @@ -110,10 +110,22 @@ int idr_for_each(struct idr *idp, void *idr_get_next(struct idr *idp, int *nextid); void *idr_replace(struct idr *idp, void *ptr, int id); void idr_remove(struct idr *idp, int id); -void idr_remove_all(struct idr *idp); void idr_destroy(struct idr *idp); void idr_init(struct idr *idp); +void __idr_remove_all(struct idr *idp);/* don't use */ + +/** + * idr_remove_all - remove all ids from the given idr tree + * @idp: idr handle + * + * If you're trying to destroy @idp, calling idr_destroy() is enough. + * This is going away. Don't use. 
+ */ +static inline void __deprecated idr_remove_all(struct idr *idp) +{ + __idr_remove_all(idp); +} /* * IDA - IDR based id allocator, use when translation from id to diff --git a/lib/idr.c b/lib/idr.c index 1e47832..1408e93 100644 --- a/lib/idr.c +++ b/lib/idr.c @@ -433,11 +433,7 @@ void idr_remove(struct idr *idp, int id) } EXPORT_SYMBOL(idr_remove); -/** - * idr_remove_all - remove all ids from the given idr tree - * @idp: idr handle - */ -void idr_remove_all(struct idr *idp) +void __idr_remove_all(struct idr *idp) { int n, id, max; int bt_mask; @@ -470,7 +466,7 @@ void idr_remove_all(struct idr *idp) } idp->layers = 0; } -EXPORT_SYMBOL(idr_remove_all); +EXPORT_SYMBOL(__idr_remove_all); /** * idr_destroy - release all cached layers within an idr tree @@ -487,7 +483,7 @@ EXPORT_SYMBOL(idr_remove_all); */ void idr_destroy(struct idr *idp) { - idr_remove_all(idp); + __idr_remove_all(idp); while (idp->id_free_cnt) { struct idr_layer *p = get_from_free_list(idp); -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 12/14] inotify: don't use idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo Cc: John McCutchan Cc: Robert Love Cc: Eric Paris --- This patch depends on an earlier idr patch and given the trivial nature of the patch, I think it would be best to route these together through -mm. Please holler if there's any objection. Thanks. fs/notify/inotify/inotify_fsnotify.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index 871569c..4216308 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -197,7 +197,6 @@ static void inotify_free_group_priv(struct fsnotify_group *group) { /* ideally the idr is empty and we won't hit the BUG in the callback */ idr_for_each(&group->inotify_data.idr, idr_callback, group); - idr_remove_all(&group->inotify_data.idr); idr_destroy(&group->inotify_data.idr); atomic_dec(&group->inotify_data.user->inotify_devs); free_uid(group->inotify_data.user); -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 06/14] dm: don't use idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo Cc: Alasdair Kergon Cc: dm-de...@redhat.com --- This patch depends on an earlier idr patch and given the trivial nature of the patch, I think it would be best to route these together through -mm. Please holler if there's any objection. Thanks. drivers/md/dm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index c72e4d5..ea1a6ca 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -318,7 +318,6 @@ static void __exit dm_exit(void) /* * Should be empty by this point. */ - idr_remove_all(&_minor_idr); idr_destroy(&_minor_idr); } -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 08/14] rpmsg: don't use idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo Cc: Ohad Ben-Cohen --- This patch depends on an earlier idr patch and given the trivial nature of the patch, I think it would be best to route these together through -mm. Please holler if there's any objection. Thanks. drivers/rpmsg/virtio_rpmsg_bus.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c index f1e3239..aa334b6 100644 --- a/drivers/rpmsg/virtio_rpmsg_bus.c +++ b/drivers/rpmsg/virtio_rpmsg_bus.c @@ -1036,7 +1036,6 @@ static void rpmsg_remove(struct virtio_device *vdev) if (vrp->ns_ept) __rpmsg_destroy_ept(vrp, vrp->ns_ept); - idr_remove_all(&vrp->endpoints); idr_destroy(&vrp->endpoints); vdev->config->del_vqs(vrp->vdev); -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 05/14] drm: don't use idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. * drm_ctxbitmap_cleanup() was calling idr_remove_all() but forgetting idr_destroy() thus leaking all buffered free idr_layers. Replace it with idr_destroy(). Signed-off-by: Tejun Heo Cc: David Airlie Cc: dri-de...@lists.freedesktop.org Cc: Inki Dae Cc: Joonyoung Shim Cc: Seung-Woo Kim Cc: Kyungmin Park --- This patch depends on an earlier idr patch and given the trivial nature of the patch, I think it would be best to route these together through -mm. Please holler if there's any objection. Thanks. drivers/gpu/drm/drm_context.c | 2 +- drivers/gpu/drm/drm_crtc.c | 1 - drivers/gpu/drm/drm_drv.c | 1 - drivers/gpu/drm/drm_gem.c | 2 -- drivers/gpu/drm/exynos/exynos_drm_ipp.c | 4 drivers/gpu/drm/sis/sis_drv.c | 1 - drivers/gpu/drm/via/via_map.c | 1 - 7 files changed, 1 insertion(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_context.c b/drivers/gpu/drm/drm_context.c index 45adf97..75f62c5 100644 --- a/drivers/gpu/drm/drm_context.c +++ b/drivers/gpu/drm/drm_context.c @@ -118,7 +118,7 @@ int drm_ctxbitmap_init(struct drm_device * dev) void drm_ctxbitmap_cleanup(struct drm_device * dev) { mutex_lock(&dev->struct_mutex); - idr_remove_all(&dev->ctx_idr); + idr_destroy(&dev->ctx_idr); mutex_unlock(&dev->struct_mutex); } diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c index f2d667b..9b39d1f 100644 --- a/drivers/gpu/drm/drm_crtc.c +++ b/drivers/gpu/drm/drm_crtc.c @@ -1102,7 +1102,6 @@ void drm_mode_config_cleanup(struct drm_device *dev) crtc->funcs->destroy(crtc); } - idr_remove_all(&dev->mode_config.crtc_idr); idr_destroy(&dev->mode_config.crtc_idr); } EXPORT_SYMBOL(drm_mode_config_cleanup); diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index be174ca..25f91cd 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -297,7 +297,6 @@ static void __exit drm_core_exit(void) unregister_chrdev(DRM_MAJOR, "drm"); - 
idr_remove_all(&drm_minors_idr); idr_destroy(&drm_minors_idr); } diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 24efae4..e775859 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -561,8 +561,6 @@ drm_gem_release(struct drm_device *dev, struct drm_file *file_private) { idr_for_each(&file_private->object_idr, &drm_gem_object_release_handle, file_private); - - idr_remove_all(&file_private->object_idr); idr_destroy(&file_private->object_idr); } diff --git a/drivers/gpu/drm/exynos/exynos_drm_ipp.c b/drivers/gpu/drm/exynos/exynos_drm_ipp.c index 0bda964..49278f0 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_ipp.c +++ b/drivers/gpu/drm/exynos/exynos_drm_ipp.c @@ -1786,8 +1786,6 @@ err_iommu: drm_iommu_detach_device(drm_dev, ippdrv->dev); err_idr: - idr_remove_all(&ctx->ipp_idr); - idr_remove_all(&ctx->prop_idr); idr_destroy(&ctx->ipp_idr); idr_destroy(&ctx->prop_idr); return ret; @@ -1965,8 +1963,6 @@ static int ipp_remove(struct platform_device *pdev) exynos_drm_subdrv_unregister(&ctx->subdrv); /* remove,destroy ipp idr */ - idr_remove_all(&ctx->ipp_idr); - idr_remove_all(&ctx->prop_idr); idr_destroy(&ctx->ipp_idr); idr_destroy(&ctx->prop_idr); diff --git a/drivers/gpu/drm/sis/sis_drv.c b/drivers/gpu/drm/sis/sis_drv.c index 841065b..5a5325e 100644 --- a/drivers/gpu/drm/sis/sis_drv.c +++ b/drivers/gpu/drm/sis/sis_drv.c @@ -58,7 +58,6 @@ static int sis_driver_unload(struct drm_device *dev) { drm_sis_private_t *dev_priv = dev->dev_private; - idr_remove_all(&dev_priv->object_idr); idr_destroy(&dev_priv->object_idr); kfree(dev_priv); diff --git a/drivers/gpu/drm/via/via_map.c b/drivers/gpu/drm/via/via_map.c index c0f1cc7..d0ab3fb 100644 --- a/drivers/gpu/drm/via/via_map.c +++ b/drivers/gpu/drm/via/via_map.c @@ -120,7 +120,6 @@ int via_driver_unload(struct drm_device *dev) { drm_via_private_t *dev_priv = dev->dev_private; - idr_remove_all(&dev_priv->object_idr); idr_destroy(&dev_priv->object_idr); kfree(dev_priv); -- 1.8.1 -- 
[PATCH 03/14] block/loop: don't use idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo Cc: Jens Axboe --- This patch depends on an earlier idr patch and given the trivial nature of the patch, I think it would be best to route these together through -mm. Please holler if there's any objection. Thanks. drivers/block/loop.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index ae12512..3b9c32b 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1911,7 +1911,6 @@ static void __exit loop_exit(void) range = max_loop ? max_loop << part_shift : 1UL << MINORBITS; idr_for_each(&loop_index_idr, &loop_exit_cb, NULL); - idr_remove_all(&loop_index_idr); idr_destroy(&loop_index_idr); blk_unregister_region(MKDEV(LOOP_MAJOR, 0), range); -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 04/14] firewire: don't use idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo Cc: Stefan Richter Cc: linux1394-de...@lists.sourceforge.net --- This patch depends on an earlier idr patch and given the trivial nature of the patch, I think it would be best to route these together through -mm. Please holler if there's any objection. Thanks. drivers/firewire/core-cdev.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/firewire/core-cdev.c b/drivers/firewire/core-cdev.c index f8d2287..68c3138 100644 --- a/drivers/firewire/core-cdev.c +++ b/drivers/firewire/core-cdev.c @@ -1779,7 +1779,6 @@ static int fw_device_op_release(struct inode *inode, struct file *file) wait_event(client->tx_flush_wait, !has_outbound_transactions(client)); idr_for_each(&client->resource_idr, shutdown_resource, client); - idr_remove_all(&client->resource_idr); idr_destroy(&client->resource_idr); list_for_each_entry_safe(event, next_event, &client->event_list, link) -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 07/14] remoteproc: don't use idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo Cc: Ohad Ben-Cohen --- This patch depends on an earlier idr patch and given the trivial nature of the patch, I think it would be best to route these together through -mm. Please holler if there's any objection. Thanks. drivers/remoteproc/remoteproc_core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c index dd3bfaf..634d367 100644 --- a/drivers/remoteproc/remoteproc_core.c +++ b/drivers/remoteproc/remoteproc_core.c @@ -1180,7 +1180,6 @@ static void rproc_type_release(struct device *dev) rproc_delete_debug_dir(rproc); - idr_remove_all(&rproc->notifyids); idr_destroy(&rproc->notifyids); if (rproc->index >= 0) -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 01/14] idr: make idr_destroy() imply idr_remove_all()
idr is silly in quite a few ways, one of which is how it's supposed to be destroyed - idr_destroy() doesn't release IDs and doesn't even whine if the idr isn't empty. If the caller forgets idr_remove_all(), it simply leaks memory. Even ida gets this wrong and leaks memory on destruction. There is absolutely no reason not to call idr_remove_all() from idr_destroy(). Nobody is abusing idr_destroy() for shrinking the free layer buffer and continuing to use the idr after idr_destroy(), so it's safe to do remove_all from destroy. In the whole kernel, there is only one place where idr_remove_all() is legitimately used without a following idr_destroy(), while there are quite a few places where the caller forgets either idr_remove_all() or idr_destroy(), leaking memory. This patch makes idr_destroy() call idr_remove_all() and updates the function description accordingly. Signed-off-by: Tejun Heo --- lib/idr.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/lib/idr.c b/lib/idr.c index 6482390..1e47832 100644 --- a/lib/idr.c +++ b/lib/idr.c @@ -436,15 +436,6 @@ EXPORT_SYMBOL(idr_remove); /** * idr_remove_all - remove all ids from the given idr tree * @idp: idr handle - * - * idr_destroy() only frees up unused, cached idp_layers, but this - * function will remove all id mappings and leave all idp_layers - * unused. - * - * A typical clean-up sequence for objects stored in an idr tree will - * use idr_for_each() to free all objects, if necessay, then - * idr_remove_all() to remove all ids, and idr_destroy() to free - * up the cached idr_layers. */ void idr_remove_all(struct idr *idp) { @@ -484,9 +475,20 @@ EXPORT_SYMBOL(idr_remove_all); /** * idr_destroy - release all cached layers within an idr tree * @idp: idr handle + * + * Free all id mappings and all idp_layers. After this function, @idp is + * completely unused and can be freed / recycled. The caller is + * responsible for ensuring that no one else accesses @idp during or after + * idr_destroy(). 
+ * + * A typical clean-up sequence for objects stored in an idr tree will use + * idr_for_each() to free all objects, if necessay, then idr_destroy() to + * free up the id mappings and cached idr_layers. */ void idr_destroy(struct idr *idp) { + idr_remove_all(idp); + while (idp->id_free_cnt) { struct idr_layer *p = get_from_free_list(idp); kmem_cache_free(idr_layer_cache, p); -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
On 01/25/2013 05:12 PM, Andrew Morton wrote: > On Fri, 25 Jan 2013 17:42:09 +0800 > Tang Chen wrote: > >> NOTE: Using this way will cause NUMA performance down because the whole node >> will be set as ZONE_MOVABLE, and kernel cannot use memory on it. >> If users don't want to lose NUMA performance, just don't use it. > > I agree with this, but it means that nobody will test any of your new code. > > To get improved testing coverage, can you think of any temporary > testing-only patch which will cause testers to exercise the > memory-hotplug changes? > There is another problem: if ALL the nodes in the system support hotpluggable memory, what happens? -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v10 00/11] PCI, ACPI: pci root bus hotplug support / pci match_driver
On 2013-1-26 8:04, Bjorn Helgaas wrote: > On Tue, Jan 22, 2013 at 3:19 PM, Yinghai Lu wrote: >> On Tue, Jan 22, 2013 at 2:09 PM, Rafael J. Wysocki wrote: >>> On Monday, January 21, 2013 01:20:41 PM Yinghai Lu wrote: It includes 1. preparing patches for pci root bus hotadd/hotremove support 2. move root bus hotadd from acpiphp to pci_root.c 3. add hot-remove support 4. add acpi_hp_work to be shared with acpiphp and root-bus hotplug 5. add match_driver to add pci device to device tree early but not attach driver for hotplug path. based on pci/next + pm/acpi-scan could get from git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-root-bus-hotplug -v9: merges several patches together for easy review, requested by Rafael. -v10: address comments from Rafael. Jiang Liu (2): PCI: Fix a device reference count leakage issue in pci_dev_present() PCI: make PCI device create/destroy logic symmetric Tang Chen (1): PCI, ACPI: debug print for installation of acpi root bridge's notifier Yinghai Lu (8): PCI, acpiphp: Add is_hotplug_bridge detection PCI: Add root bus children dev's res to fail list PCI: Set dev_node early for pci_dev PCI, ACPI, acpiphp: Rename alloc_acpiphp_hp_work() to alloc_acpi_hp_work PCI, acpiphp: Move and enhance hotplug support of pci host bridge PCI, acpiphp: Don't bailout even no slots found yet. PCI: Skip attaching driver in device_add() PCI: Put pci dev to device tree as early as possible >>> >>> OK >>> >>> Please feel free to add >>> >>> Acked-by: Rafael J. Wysocki >>> >>> to all of the patches in this series I haven't acked already. > > I first pulled in > "git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git > acpi-scan" again (to pci/acpi-scan2), added your acks, Rafael, and put > this series on a pci/yinghai-root-bus branch based on pci/acpi-scan2. 
> > I reworked some of the changelogs a bit, but I don't think I made any > code changes except that in [10/11] I just inlined the > pci_bus_attach_device() code rather than making a new function, since > it's small, there's only one caller, and I didn't think we needed any > more pci_* and pci_bus_* functions than we already have. > > Let me know if I messed anything up. Great, so I can rebase my PCI notification-related work onto this branch. I was trying to resolve conflicts between acpi-scan and pci-root-bus-hotplug last night. Thanks! > > Bjorn > > . >
Re: [PATCH]cputime: make bool type for steal ticks
On Sat, 2013-01-26 at 01:45 +0100, Frederic Weisbecker wrote: > > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c [] > > @@ -282,7 +282,7 @@ static __always_inline bool > > steal_account_process_tick(void) [] > > - return st; > > + return !!st; > > I would expect gcc to perform the semantic "!!" cast implicitly. I > just did some basic tests locally and it does. > I prefer to be paranoid and not do any assumption though, unless I'm > told gcc always guarantees this correct implicit cast. I'm queuing > this patch and will send it to Ingo. It's unnecessary. 6.3.1.2p1: "When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1." -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] checkpatch: Fix $Float creation of match variables
commit 74349bccedb ("checkpatch: add support for floating point constants") added an unnecessary match variable that caused tests that used a $Constant or $LvalOrFunc to have one too many matches. This causes problems with usleep_range, min/max and other extended tests. Avoid using match variables in $Float. Avoid using match variables in $Assignment too. Signed-off-by: Joe Perches --- scripts/checkpatch.pl | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 9de3a69..3d0f577 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -230,12 +230,12 @@ our $Inline = qr{inline|__always_inline|noinline}; our $Member= qr{->$Ident|\.$Ident|\[[^]]*\]}; our $Lval = qr{$Ident(?:$Member)*}; -our $Float_hex = qr{(?i:0x[0-9a-f]+p-?[0-9]+[fl]?)}; -our $Float_dec = qr{(?i:((?:[0-9]+\.[0-9]*|[0-9]*\.[0-9]+)(?:e-?[0-9]+)?[fl]?))}; -our $Float_int = qr{(?i:[0-9]+e-?[0-9]+[fl]?)}; +our $Float_hex = qr{(?i)0x[0-9a-f]+p-?[0-9]+[fl]?}; +our $Float_dec = qr{(?i)(?:[0-9]+\.[0-9]*|[0-9]*\.[0-9]+)(?:e-?[0-9]+)?[fl]?}; +our $Float_int = qr{(?i)[0-9]+e-?[0-9]+[fl]?}; our $Float = qr{$Float_hex|$Float_dec|$Float_int}; -our $Constant = qr{(?:$Float|(?i:(?:0x[0-9a-f]+|[0-9]+)[ul]*))}; -our $Assignment= qr{(?:\*\=|/=|%=|\+=|-=|<<=|>>=|&=|\^=|\|=|=)}; +our $Constant = qr{$Float|(?i)(?:0x[0-9a-f]+|[0-9]+)[ul]*}; +our $Assignment= qr{\*\=|/=|%=|\+=|-=|<<=|>>=|&=|\^=|\|=|=}; our $Compare= qr{<=|>=|==|!=|<|>}; our $Operators = qr{ <=|>=|==|!=| -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
On Fri, 25 Jan 2013 17:42:09 +0800 Tang Chen wrote:

> NOTE: Using this way will cause NUMA performance down because the whole node
> will be set as ZONE_MOVABLE, and kernel cannot use memory on it.
> If users don't want to lose NUMA performance, just don't use it.

I agree with this, but it means that nobody will test any of your new code. To get improved testing coverage, can you think of any temporary testing-only patch which will cause testers to exercise the memory-hotplug changes?
Re: [PATCH v2 3/3] timekeeping: Add CONFIG_HAS_PERSISTENT_CLOCK option
On 01/22/2013 11:49 AM, John Stultz wrote:
> On 01/22/2013 11:44 AM, Jason Gunthorpe wrote:
> > On Tue, Jan 15, 2013 at 11:50:18AM -0800, John Stultz wrote:
> > > On 01/15/2013 08:09 AM, Feng Tang wrote:
> > > > Make the persistent clock check a kernel config option, so that
> > > > some platform can explicitely select it, also make
> > > > CONFIG_RTC_HCTOSYS depends on its non-existence, which could
> > > > prevent the persistent clock and RTC code from doing similar
> > > > thing twice during system's init/suspend/resume phases.
> > > >
> > > > If the CONFIG_HAS_PERSISTENT_CLOCK=n, then no change happens for
> > > > kernel which still does the persistent clock check in
> > > > timekeeping_init().
> > > >
> > > > Cc: Thomas Gleixner
> > > > Suggested-by: John Stultz
> > > > Signed-off-by: Feng Tang
> > >
> > > Applied. I also added a dependency for Jason's CONFIG_RTC_SYSTOHC.
> >
> > Sort of an ugly config name, since I gather ARM should always set
> > this to 'n'... CONFIG_USE_ONLY_PERSISTENT_CLOCK ?
>
> (Sigh. I got this seemingly microseconds after I sent the pull request :)
>
> So yea, fair point, there could be some confusion. But
> ONLY_PERSISTENT_CLOCK isn't quite right either, more like
> CONFIG_HAS_PERSISTENT_CLOCK_ALWAYS or something.

Decided upon CONFIG_ALWAYS_USE_PERSISTENT_CLOCK which I think is clear enough. Let me know if you object or have a better idea.

thanks
-john
Re: [tip:x86/asm] x86/xor: Make virtualization friendly
On 01/25/2013 02:15 PM, H. Peter Anvin wrote:
> On 01/25/2013 02:11 PM, H. Peter Anvin wrote:
> > On 01/25/2013 02:43 AM, tip-bot for Jan Beulich wrote:
> > > Commit-ID:  05fbf4d6fc6a3c0c3e63b77979c9311596716d10
> > > Gitweb:     http://git.kernel.org/tip/05fbf4d6fc6a3c0c3e63b77979c9311596716d10
> > > Author:     Jan Beulich
> > > AuthorDate: Fri, 2 Nov 2012 14:21:23 +
> > > Committer:  Ingo Molnar
> > > CommitDate: Fri, 25 Jan 2013 09:23:51 +0100
> > >
> > > x86/xor: Make virtualization friendly
> > >
> > > In virtualized environments, the CR0.TS management needed here can
> > > be a lot slower than anticipated by the original authors of this
> > > code, which particularly means that in such cases forcing the use
> > > of SSE- (or MMX-) based implementations is not desirable - actual
> > > measurements should always be done in that case.
> > >
> > > For consistency, pull into the shared (32- and 64-bit) header not
> > > only the inclusion of the generic code, but also that of the AVX
> > > variants.
> >
> > This patch is wrong and should be dropped. I verified it with the
> > KVM people that they do NOT want this change. It is a Xen-specific
> > problem.
>
> FWIW: I have dropped this patch from tip:x86/asm.

The bottom line, I guess, is that we need something like cpu_has_slow_kernel_fpu or something like that, and set it for specifically affected hypervisors? Do we know if Hyper-V has performance issues with CR0.TS?

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
[PATCH 1/2] PCI: introduce accessor to retrieve PCIe Capabilities Register
Provide an accessor to retrieve the PCI Express device's Capabilities Register.

Signed-off-by: Myron Stowe
---
 include/linux/pci.h |    9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 15472d6..78581e1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1693,6 +1693,15 @@ static inline bool pci_is_pcie(struct pci_dev *dev)
 }
 
 /**
+ * pcie_caps_reg - get the PCIe Capabilities Register
+ * @dev: PCI device
+ */
+static inline u16 pcie_caps_reg(const struct pci_dev *dev)
+{
+	return dev->pcie_flags_reg;
+}
+
+/**
  * pci_pcie_type - get the PCIe device/port type
  * @dev: PCI device
  */
[PATCH 2/2] PCI: Use PCI Express Capability accessors
Use PCI Express Capability access functions to simplify device Capabilities Register usages.

Signed-off-by: Myron Stowe
---
 drivers/pci/access.c            |    4 ++--
 drivers/pci/pcie/portdrv_core.c |    2 +-
 include/linux/pci.h             |    2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 3af0478..5278ac6 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -472,7 +472,7 @@ EXPORT_SYMBOL_GPL(pci_cfg_access_unlock);
 
 static inline int pcie_cap_version(const struct pci_dev *dev)
 {
-	return dev->pcie_flags_reg & PCI_EXP_FLAGS_VERS;
+	return pcie_caps_reg(dev) & PCI_EXP_FLAGS_VERS;
 }
 
 static inline bool pcie_cap_has_devctl(const struct pci_dev *dev)
@@ -497,7 +497,7 @@ static inline bool pcie_cap_has_sltctl(const struct pci_dev *dev)
 	return pcie_cap_version(dev) > 1 ||
 	       type == PCI_EXP_TYPE_ROOT_PORT ||
 	       (type == PCI_EXP_TYPE_DOWNSTREAM &&
-		dev->pcie_flags_reg & PCI_EXP_FLAGS_SLOT);
+		pcie_caps_reg(dev) & PCI_EXP_FLAGS_SLOT);
 }
 
 static inline bool pcie_cap_has_rtctl(const struct pci_dev *dev)
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index b42133a..31063ac 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -272,7 +272,7 @@ static int get_port_device_capability(struct pci_dev *dev)
 	/* Hot-Plug Capable */
 	if ((cap_mask & PCIE_PORT_SERVICE_HP) &&
-	    dev->pcie_flags_reg & PCI_EXP_FLAGS_SLOT) {
+	    pcie_caps_reg(dev) & PCI_EXP_FLAGS_SLOT) {
 		pcie_capability_read_dword(dev, PCI_EXP_SLTCAP, &reg32);
 		if (reg32 & PCI_EXP_SLTCAP_HPC) {
 			services |= PCIE_PORT_SERVICE_HP;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 78581e1..63b3628 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1707,7 +1707,7 @@ static inline u16 pcie_caps_reg(const struct pci_dev *dev)
  */
 static inline int pci_pcie_type(const struct pci_dev *dev)
 {
-	return (dev->pcie_flags_reg & PCI_EXP_FLAGS_TYPE) >> 4;
+	return (pcie_caps_reg(dev) & PCI_EXP_FLAGS_TYPE) >> 4;
 }
 
 void pci_request_acs(void);
[PATCH 0/2] Extend interfaces to access PCIe capabilities registers
This series is a minor extension to Jiang Liu's recent efforts - "[PATCH v3 00/32] provide interfaces to access PCIe capabilities registers" - which adds an additional PCI Express accessor for obtaining a device's Capabilities Register.

Reference: https://lkml.org/lkml/2012/8/1/253

---

Myron Stowe (2):
      PCI: Use PCI Express Capability accessors
      PCI: introduce accessor to retrieve PCIe Capabilities Register

 drivers/pci/access.c            |    4 ++--
 drivers/pci/pcie/portdrv_core.c |    2 +-
 include/linux/pci.h             |   11 ++-
 3 files changed, 13 insertions(+), 4 deletions(-)
Re: [PATCH 19/19] [INCOMPLETE] ARM: make return_address available for ARM_UNWIND
On Friday 25 January 2013, Dave Martin wrote:
> On Fri, Jan 25, 2013 at 11:44:14AM -0500, Steven Rostedt wrote:
> > [ I got an error with linux-arm-ker...@list.infradead.org and had to
> >   remove from CC ]
>
> Blame Arnd :)

Sorry about that, I now posted the entire series again with the right mailing list address.

	Arnd
Re: [PATCH]cputime: make bool type for steal ticks
2012/11/16 liguang :
> Signed-off-by: liguang
> ---
>  kernel/sched/cputime.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 81b763b..d2c24c1 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -282,7 +282,7 @@ static __always_inline bool
> steal_account_process_tick(void)
>  		this_rq()->prev_steal_time += st * TICK_NSEC;
>
>  		account_steal_time(st);
> -		return st;
> +		return !!st;

I would expect gcc to perform the semantic "!!" cast implicitly. I just did some basic tests locally and it does.

I prefer to be paranoid and not do any assumption though, unless I'm told gcc always guarantees this correct implicit cast. I'm queuing this patch and will send it to Ingo.

Thanks!

> 	}
> #endif
> 	return false;
> --
> 1.7.1
Re: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
On Fri, 25 Jan 2013 17:42:09 +0800 Tang Chen wrote:

> We now provide an option for users who don't want to specify physical
> memory address in kernel commandline.
>
> /*
>  * For movablemem_map=acpi:
>  *
>  * SRAT:		|_| |_| |_| |_| ..
>  * node id:		 0   1   1   2
>  * hotpluggable:	 n   y   y   n
>  * movablemem_map:	    |_| |_|
>  *
>  * Using movablemem_map, we can prevent memblock from allocating memory
>  * on ZONE_MOVABLE at boot time.
>  */
>
> So user just specify movablemem_map=acpi, and the kernel will use hotpluggable
> info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE.

Well, as a result of my previous hackery, arch/x86/mm/srat.c now looks rather different. Please check it carefully and runtime test this code when it appears in linux-next?

/*
 * ACPI 3.0 based NUMA setup
 * Copyright 2004 Andi Kleen, SuSE Labs.
 *
 * Reads the ACPI SRAT table to figure out what memory belongs to which CPUs.
 *
 * Called from acpi_numa_init while reading the SRAT and SLIT tables.
 * Assumes all memory regions belonging to a single proximity domain
 * are in one chunk. Holes between them will be included in the node.
*/ #include #include #include #include #include #include #include #include #include #include #include #include #include #include int acpi_numa __initdata; static __init int setup_node(int pxm) { return acpi_map_pxm_to_node(pxm); } static __init void bad_srat(void) { printk(KERN_ERR "SRAT: SRAT not used.\n"); acpi_numa = -1; } static __init inline int srat_disabled(void) { return acpi_numa < 0; } /* Callback for SLIT parsing */ void __init acpi_numa_slit_init(struct acpi_table_slit *slit) { int i, j; for (i = 0; i < slit->locality_count; i++) for (j = 0; j < slit->locality_count; j++) numa_set_distance(pxm_to_node(i), pxm_to_node(j), slit->entry[slit->locality_count * i + j]); } /* Callback for Proximity Domain -> x2APIC mapping */ void __init acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa) { int pxm, node; int apic_id; if (srat_disabled()) return; if (pa->header.length < sizeof(struct acpi_srat_x2apic_cpu_affinity)) { bad_srat(); return; } if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0) return; pxm = pa->proximity_domain; apic_id = pa->apic_id; if (!apic->apic_id_valid(apic_id)) { printk(KERN_INFO "SRAT: PXM %u -> X2APIC 0x%04x ignored\n", pxm, apic_id); return; } node = setup_node(pxm); if (node < 0) { printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm); bad_srat(); return; } if (apic_id >= MAX_LOCAL_APIC) { printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node); return; } set_apicid_to_node(apic_id, node); node_set(node, numa_nodes_parsed); acpi_numa = 1; printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n", pxm, apic_id, node); } /* Callback for Proximity Domain -> LAPIC mapping */ void __init acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa) { int pxm, node; int apic_id; if (srat_disabled()) return; if (pa->header.length != sizeof(struct acpi_srat_cpu_affinity)) { bad_srat(); return; } if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0) return; pxm = 
pa->proximity_domain_lo; if (acpi_srat_revision >= 2) pxm |= *((unsigned int*)pa->proximity_domain_hi) << 8; node = setup_node(pxm); if (node < 0) { printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm); bad_srat(); return; } if (get_uv_system_type() >= UV_X2APIC) apic_id = (pa->apic_id << 8) | pa->local_sapic_eid; else apic_id = pa->apic_id; if (apic_id >= MAX_LOCAL_APIC) { printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node); return; } set_apicid_to_node(apic_id, node); node_set(node, numa_nodes_parsed); acpi_numa = 1; printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n", pxm, apic_id, node); } #ifdef CONFIG_MEMORY_HOTPLUG static inline int save_add_info(void) {re
Re: [PATCH 2/3] acpi, memory-hotplug: Extend movablemem_map ranges to the end of node.
On Fri, 25 Jan 2013 17:42:08 +0800 Tang Chen wrote:

> When implementing movablemem_map boot option, we introduced an array
> movablemem_map.map[] to store the memory ranges to be set as ZONE_MOVABLE.
>
> Since ZONE_MOVABLE is the latst zone of a node, if user didn't specify
> the whole node memory range, we need to extend it to the node end so that
> we can use it to prevent memblock from allocating memory in the ranges
> user didn't specify.
>
> We now implement movablemem_map boot option like this:
> /*
>  * For movablemem_map=nn[KMG]@ss[KMG]:
>  *
>  * SRAT:		|_| |_| |_| |_| ..
>  * node id:		 0   1   1   2
>  * user specified:	    |__| |___|
>  * movablemem_map:	    |___| |_||__| ..
>  *
>  * Using movablemem_map, we can prevent memblock from allocating memory
>  * on ZONE_MOVABLE at boot time.
>  *
>  * NOTE: In this case, SRAT info will be ingored.
>  */

The patch generates a bunch of rejects, partly due to linux-next changes, but I think I fixed everything up OK.

> index 4ddf497..f841d0e 100644
> --- a/arch/x86/mm/srat.c
> +++ b/arch/x86/mm/srat.c
> @@ -141,11 +141,16 @@ static inline int save_add_info(void) {return 1;}
>  static inline int save_add_info(void) {return 0;}
>  #endif
>
> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +extern struct movablemem_map movablemem_map;
> +#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */

Well.

a) we shouldn't put extern declarations in C files - put them in headers
   so we can be assured that all compilation units agree on the type.

b) the ifdefs are unneeded - an unused extern declaration is OK (as long
   as the type itself is always defined!)

c) movablemem_map is already declared in memblock.h.

So I zapped the above three lines.
> @@ -178,9 +185,57 @@ acpi_numa_memory_affinity_init(struct > acpi_srat_mem_affinity *ma) > > node_set(node, numa_nodes_parsed); > > - printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n", > + printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n", > node, pxm, > -(unsigned long long) start, (unsigned long long) end - 1); > +(unsigned long long) start, (unsigned long long) end - 1, > +hotpluggable ? "Hot Pluggable": ""); > + > +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP > + int overlap; > + unsigned long start_pfn, end_pfn; no, we don't put declarations of locals in the middle of C statements like this: arch/x86/mm/srat.c: In function 'acpi_numa_memory_affinity_init': arch/x86/mm/srat.c:185: warning: ISO C90 forbids mixed declarations and code Did your compiler not emit this warning? I fixed this by moving the code into a new function "handle_movablemem". Feel free to suggest a more appropriate name! From: Andrew Morton Subject: acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix clean up code, fix build warning Cc: "Brown, Len" Cc: "H. 
Peter Anvin" Cc: Ingo Molnar Cc: Jiang Liu Cc: Jianguo Wu Cc: KOSAKI Motohiro Cc: Kamezawa Hiroyuki Cc: Lai Jiangshan Cc: Len Brown Cc: Tang Chen Cc: Thomas Gleixner Cc: Wu Jianguo Cc: Yasuaki Ishimatsu Signed-off-by: Andrew Morton --- arch/x86/mm/srat.c | 93 ++- 1 file changed, 49 insertions(+), 44 deletions(-) diff -puN arch/x86/mm/srat.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix arch/x86/mm/srat.c --- a/arch/x86/mm/srat.c~acpi-memory-hotplug-extend-movablemem_map-ranges-to-the-end-of-node-fix +++ a/arch/x86/mm/srat.c @@ -142,50 +142,8 @@ static inline int save_add_info(void) {r #endif #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP -extern struct movablemem_map movablemem_map; -#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ - -/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */ -int __init -acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma) +static void __init handle_movablemem(int node, u64 start, u64 end) { - u64 start, end; - u32 hotpluggable; - int node, pxm; - - if (srat_disabled()) - goto out_err; - if (ma->header.length != sizeof(struct acpi_srat_mem_affinity)) - goto out_err_bad_srat; - if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0) - goto out_err; - hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE; - if (hotpluggable && !save_add_info()) - goto out_err; - - start = ma->base_address; - end = start + ma->length; - pxm = ma->proximity_domain; - if (acpi_srat_revision <= 1) - pxm &= 0xff; - - node = setup_node(pxm); - if (node < 0) { - printk(KERN_ERR "SRAT: Too many proximity domains.\n"); - goto out_err_bad_srat; - } -