Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
On 01/10/2013 11:31 AM, Kamezawa Hiroyuki wrote: > (2013/01/10 16:14), Glauber Costa wrote: >> On 01/10/2013 06:17 AM, Tang Chen wrote: > Note: if the memory provided by the memory device is used by the > kernel, it > can't be offlined. It is not a bug. Right. But how often does this happen in testing? In other words, please provide an overall description of how well memory hot-remove is presently operating. Is it reliable? What is the success rate in real-world situations? >>> >>> We test the hot-remove functionality mostly with movable_online used. >>> And the memory used by kernel is not allowed to be removed. >> >> Can you try doing this using cpusets configured to hardwall? >> It is my understanding that the object allocators will try hard not to >> allocate anything outside the walls defined by cpuset. Which means that >> if you have one process per node, and they are hardwalled, your kernel >> memory will be spread evenly among the machine. With a big enough load, >> they should eventually be present in all blocks. >> > > I'm sorry I couldn't catch your point. > Do you want to confirm whether cpuset can work enough instead of > ZONE_MOVABLE ? > Or Do you want to confirm whether ZONE_MOVABLE will not work if it's > used with cpuset ? > > No, I am not proposing to use cpuset to tackle the problem. I am just wondering if you would still have high success rates with cpusets in use with hardwalls. This is just one example of a workload that would spread kernel memory around quite heavily. So this is just me trying to understand the limitations of the mechanism. >> Another question I have for you: Have you considered calling >> shrink_slab to try to deplete the caches and therefore free at least >> slab memory in the nodes that can't be offlined? Is it relevant? >> > > At this stage, we don't consider to call shrink_slab(). We require > nearly 100% success at offlining memory for removing DIMM. > It's my understanding. > Of course, this is indisputable.
> IMHO, I don't think shrink_slab() can kill all objects in a node even > if they are some caches. We need more study for doing that. > Indeed, shrink_slab can only kill cached objects. They, however, are usually a very big part of kernel memory. I wonder though if in case of failure, it is worth it to try at least one shrink pass before you give up. It is not very different from what is in memory-failure.c, except that we could do better and do a more targeted shrinking (support for that is being worked on). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
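Glauber's suggestion, trying one shrink pass before giving up, amounts to a single retry. The following is a userspace sketch under stated assumptions, not kernel code: struct demo_block and the demo_* helpers are hypothetical, and demo_shrink_caches() stands in for shrink_slab(), which in reality frees only part of the cached objects. It also reflects Kame's objection: memory pinned by the kernel still makes the retry fail.

```c
#include <stdbool.h>

/* Hypothetical model of one memory block being offlined. */
struct demo_block {
	unsigned int pinned_pages;  /* kernel memory that cannot be migrated */
	unsigned int cached_pages;  /* slab cache objects a shrinker could free */
};

/* Offlining succeeds only when nothing unmovable remains in the block. */
static bool demo_try_offline(const struct demo_block *b)
{
	return b->pinned_pages == 0 && b->cached_pages == 0;
}

/* Stand-in for a shrink_slab() pass: drops cached objects only.
 * (Real shrinkers are not guaranteed to free everything.) */
static void demo_shrink_caches(struct demo_block *b)
{
	b->cached_pages = 0;
}

/* Try once; on failure, do a single shrink pass and retry before giving up. */
static bool demo_offline_with_shrink(struct demo_block *b)
{
	if (demo_try_offline(b))
		return true;
	demo_shrink_caches(b);
	return demo_try_offline(b);
}
```

A block holding only cache objects offlines after the shrink pass; a block with pinned kernel pages still fails, which is why ZONE_MOVABLE remains the primary mechanism.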
Re: [PATCH] cpufreq: Simplify __cpufreq_remove_dev()
On Wed, Jan 09, 2013 at 04:50:44PM +0530, Viresh Kumar wrote: > @Shawn: I believe your driver don't require that ugly code anymore (Though i > know there is a situation for that to happen, if we have two cpus, you remove > second one and then add it back. With this cpufreq_add_dev() would call init() > first and then try to match if there are any managed_policies present. But the > issue you pointed out about unregistering the driver would be solved by this > patch.) Yes, I just tried it and it works for me. However, I will have to keep that little bit of ugly code in my patch to avoid depending on your patch. I will send a follow-up to clean it up once your patch hits mainline. Shawn
Re: [patch] bnx2x: NULL dereference on error in debug code
From: Dan Carpenter Date: Tue, 8 Jan 2013 16:42:14 +0300 > "vfop" is NULL here. I've changed the debugging to not use it. > > Signed-off-by: Dan Carpenter Applied, thanks Dan.
[PATCH] MAINTAINERS: fix a status pattern
Change MAINTAINED to Maintained.

Signed-off-by: Zhang Yanfei
---
 MAINTAINERS |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c4f15a1..33d8f95 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -648,7 +648,7 @@ F: arch/arm/
 
 ARM SUB-ARCHITECTURES
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
-S: MAINTAINED
+S: Maintained
 F: arch/arm/mach-*/
 F: arch/arm/plat-*/
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git
--
1.7.1
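The patch fixes the spelling of a status value, which is meant to be one of a fixed set of words. A check for this could be sketched as follows. The list of accepted values is an assumption based on the status descriptions in the MAINTAINERS header, and maintainers_status_ok() is a hypothetical helper, not part of any kernel script.

```c
#include <string.h>

/* Assumed set of status values described in the MAINTAINERS header. */
static const char *const valid_status[] = {
	"Supported", "Maintained", "Odd Fixes", "Orphan", "Obsolete",
};

/* Return 1 if an "S:" value matches one of the documented spellings
 * exactly (case-sensitive), 0 otherwise. */
static int maintainers_status_ok(const char *s)
{
	size_t i;

	for (i = 0; i < sizeof(valid_status) / sizeof(valid_status[0]); i++)
		if (strcmp(s, valid_status[i]) == 0)
			return 1;
	return 0;
}
```

Because the comparison is case-sensitive, "MAINTAINED" is rejected while "Maintained" is accepted.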
[PATCH] MAINTAINERS: fix arch/arm/plat-omap/include/plat/omap_hwmod.h
This file was moved to arch/arm/mach-omap2/omap_hwmod.h by commit 2a296c8 (ARM: OMAP: Make plat/omap_hwmod.h local to mach-omap2).

Signed-off-by: Zhang Yanfei
---
 MAINTAINERS |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 915564e..c4f15a1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5507,8 +5507,7 @@ M: Benoît Cousson
 M: Paul Walmsley
 L: linux-o...@vger.kernel.org
 S: Maintained
-F: arch/arm/mach-omap2/omap_hwmod.c
-F: arch/arm/plat-omap/include/plat/omap_hwmod.h
+F: arch/arm/mach-omap2/omap_hwmod.*
 
 OMAP HWMOD DATA FOR OMAP4-BASED DEVICES
 M: Benoît Cousson
--
1.7.1
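Replacing the two F: lines with a single omap_hwmod.* glob covers both the .c file and the moved .h file. POSIX fnmatch() can illustrate the glob semantics. This is only an approximation: get_maintainer.pl does its own pattern matching rather than calling fnmatch(), and f_pattern_matches() is a hypothetical helper.

```c
#include <fnmatch.h>

/* Return 1 if a path matches a MAINTAINERS-style F: glob, using plain
 * fnmatch() semantics (no flags, so '*' may also match '/'). */
static int f_pattern_matches(const char *pattern, const char *path)
{
	return fnmatch(pattern, path, 0) == 0;
}
```

The pattern matches omap_hwmod.c and omap_hwmod.h, but not omap_hwmod_44xx_data.c, because the literal '.' in the pattern does not match '_'.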
RE: [PATCH V2 3/3] ARM: davinci: da850: add NAND driver entries
On Wed, Jan 09, 2013 at 18:17:46, Nori, Sekhar wrote: > On 1/8/2013 1:50 PM, Kumar, Anil wrote: > > Add NAND driver entries to export NAND functionality on da850 EVM and > > NAND pinctrl node to do pin mux according to pinctrl-single driver. > > Subject line should have DT and da850 evm somewhere. > Otherwise NAND has been supported on DA850 for long. I will update it in V3 of this patch series. > > > > > Signed-off-by: Kumar, Anil > > --- > > :100644 100644 c7609d0... 382a7da... M arch/arm/boot/dts/da850-evm.dts > > :100644 100644 e9c6e82... 16e2ac2... M arch/arm/boot/dts/da850.dtsi > > arch/arm/boot/dts/da850-evm.dts | 19 +++ > > arch/arm/boot/dts/da850.dtsi| 13 + > > 2 files changed, 32 insertions(+), 0 deletions(-) > > > > diff --git a/arch/arm/boot/dts/da850-evm.dts > > b/arch/arm/boot/dts/da850-evm.dts > > index c7609d0..382a7da 100644 > > --- a/arch/arm/boot/dts/da850-evm.dts > > +++ b/arch/arm/boot/dts/da850-evm.dts > > @@ -28,4 +28,23 @@ > > status = "okay"; > > }; > > }; > > + nand_cs3@6200 { > > + status = "okay"; > > + }; > > +}; > > +&pmx_core{ > > + pinctrl-names = "default"; > > + pinctrl-0 = < > > + &nand_cs3_pins > > + >; > > This means that the NAND pins are configured even if NAND is not > probed. Right? Right > This can be moved into the nand_cs3 node to avoid that. > And then when used with Linus Walleij's patch "drivers/pinctrl: grab > default handles from device core" which should be accepted soon, the > pins will be automatically setup when the NAND gets probed. It is better. I will do it in V3 of this patch series. 
> > > + > > + nand_cs3_pins: pinmux_nand_pins { > > + pinctrl-single,bits = < > > + 0x1c 0x0011 0x00ff /* EMA_OE, EMA_WE */ > > + 0x1c 0x0110 0x0ff0 /* EMA_CS[4],EMA_CS[3]*/ > > + 0x24 0x 0x /* EMA_D[0], EMA_D[1], > > EMA_D[2], EMA_D[3], > > + EMA_D[4], EMA_D[5], > > EMA_D[6], EMA_D[7] */ > > + 0x30 0x0110 0x0ff0 /* EMA_A[1], EMA_A[2] */ > > + >; > > + }; > I suspect a number of other boards using NAND on CS3 would have to > repeat these configurations. It will be better to move them into the > da850.dtsi file so it can be reused. > > Since I am new to the DT and pinctrl stuff, I had to actually try these to > make sure what I am suggesting isn't nonsense. I came up with this > patch on top of your series. Please test it at your end Tested. It is working. > and also see if you > are happy with the changes. I do not think it is a good idea to move the NAND pin mux information into da850.dtsi, because this information is EVM specific. If we use this approach, then we must use the same approach for other modules as well, such as ASoC.
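Each pinctrl-single,bits entry above is an (offset, value, mask) triple. As far as I understand the pinctrl-single driver, applying one is a read-modify-write of the register at that offset: clear the masked bits, then OR in the value restricted to the mask. A minimal sketch, with apply_pinmux_bits() as a hypothetical name and the starting register contents made up for illustration:

```c
#include <stdint.h>

/* Apply one (value, mask) pair to the current contents of a pinmux
 * register: bits outside the mask are preserved, bits inside the mask
 * take the corresponding bits of value. */
static uint32_t apply_pinmux_bits(uint32_t reg, uint32_t value, uint32_t mask)
{
	return (reg & ~mask) | (value & mask);
}
```

Since only the masked bits change, two entries at the same offset (such as the two 0x1c lines) can each configure their own pins without clobbering unrelated fields.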
Thanks, Anil > ---8< > diff --git a/arch/arm/boot/dts/da850-evm.dts b/arch/arm/boot/dts/da850-evm.dts > index 382a7da..52abb9d 100644 > --- a/arch/arm/boot/dts/da850-evm.dts > +++ b/arch/arm/boot/dts/da850-evm.dts > @@ -30,21 +30,8 @@ > }; > nand_cs3@6200 { > status = "okay"; > + pinctrl-names = "default"; > + pinctrl-0 = <&nand_cs3_pins>; > }; > }; > -&pmx_core{ > - pinctrl-names = "default"; > - pinctrl-0 = < > - &nand_cs3_pins > - >; > > - nand_cs3_pins: pinmux_nand_pins { > - pinctrl-single,bits = < > - 0x1c 0x0011 0x00ff /* EMA_OE, EMA_WE */ > - 0x1c 0x0110 0x0ff0 /* EMA_CS[4],EMA_CS[3]*/ > - 0x24 0x 0x /* EMA_D[0], EMA_D[1], > EMA_D[2], EMA_D[3], > -EMA_D[4], EMA_D[5], > EMA_D[6], EMA_D[7] */ > - 0x30 0x0110 0x0ff0 /* EMA_A[1], EMA_A[2] */ > - >; > - }; > -}; > diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi > index 6ab9730..c4b21a8 100644 > --- a/arch/arm/boot/dts/da850.dtsi > +++ b/arch/arm/boot/dts/da850.dtsi > @@ -38,7 +38,25 @@ > pinctrl-single,register-width = <32>; > pinctrl-single,function-mask = <0x>; > status = "disabled"; > + > + nand_cs3_pins: pinmux_nand_pins { > + pinctrl-single,bits = < > + /* EMA_OE, EMA_WE */ > + 0x1c 0x0011 0x00ff > + /* EMA_CS[4],EMA_CS[3]*/ > + 0x1c 0x0110 0x0ff0 > + /* > + * EMA_D[0], EMA_D[1], EMA_D[2], > + * EMA_D[3
Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
(2013/01/10 16:14), Glauber Costa wrote: On 01/10/2013 06:17 AM, Tang Chen wrote: Note: if the memory provided by the memory device is used by the kernel, it can't be offlined. It is not a bug. Right. But how often does this happen in testing? In other words, please provide an overall description of how well memory hot-remove is presently operating. Is it reliable? What is the success rate in real-world situations? We test the hot-remove functionality mostly with movable_online used. And the memory used by kernel is not allowed to be removed. Can you try doing this using cpusets configured to hardwall? It is my understanding that the object allocators will try hard not to allocate anything outside the walls defined by cpuset. Which means that if you have one process per node, and they are hardwalled, your kernel memory will be spread evenly among the machine. With a big enough load, they should eventually be present in all blocks. I'm sorry I couldn't catch your point. Do you want to confirm whether cpuset can work enough instead of ZONE_MOVABLE ? Or Do you want to confirm whether ZONE_MOVABLE will not work if it's used with cpuset ? Another question I have for you: Have you considered calling shrink_slab to try to deplete the caches and therefore free at least slab memory in the nodes that can't be offlined? Is it relevant? At this stage, we don't consider to call shrink_slab(). We require nearly 100% success at offlining memory for removing DIMM. It's my understanding. IMHO, I don't think shrink_slab() can kill all objects in a node even if they are some caches. We need more study for doing that. Thanks, -Kame
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote: > On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote: > > On 01/09/2013 03:10 PM, Arnd Bergmann wrote: > > > On Wednesday 09 January 2013, Thierry Reding wrote: > > >> What happens on Tegra is that we need to map 256 MiB of physical memory > > >> to access all the PCIe extended configuration space. However, ioremap() > > >> on such a large region fails if not enough vmalloc() space is available. > > >> > > >> This was observed when somebody tested this on CardHu which has a 1 GiB > > >> of RAM and therefore remapping the full 256 MiB fails. > > ... > > > Have you checked if the hardware supports an alternative config > > > space access mechanism that does not depend on a huge address range? > > > A lot of them provide an index/data register pair somewhere, as the > > > original PC implementation did. > > > > That would be nice, but I've talked to the HW engineers, and there's no > > indication that any alternative mechanism exists. > > It seems to be convention that extended config space is often only > accessible through mmio space, that was true on x86 last I checked > too.. > > You could decrease the size of the mapping to only span the bus > numbers that are configured for use via DT. That won't work, unfortunately. The mapping is such that the bus number is not encoded in the uppermost bits, the extended register number is. So the only thing that we could do is decrease the size of the extended register space for *all* devices. > Are there any concerns about these config registers being accessed > from a context where a new mapping can't be made? Interrupt? Machine > Check? PCI-E Advanced Error Reporting? I haven't checked but I would expect configuration space accesses to not happen in interrupt context. Usually they are limited to enumeration and driver probe. Thierry
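For comparison, the standard PCIe ECAM layout gives each bus 1 MiB of configuration space and puts the bus number in the top bits of the offset. That is why 256 buses need 256 MiB, and why trimming the bus range (Jason's suggestion) would shrink the window on such hardware. The sketch below shows only that standard layout; per Thierry, Tegra instead places the extended register number in the uppermost bits, so the same trick does not help there. ecam_offset() and ecam_window_size() are illustrative names.

```c
#include <stdint.h>

/* Standard ECAM: bus in bits 27:20, device in 19:15, function in 14:12,
 * register (including extended registers up to 0xfff) in 11:0. */
static uint32_t ecam_offset(unsigned int bus, unsigned int dev,
			    unsigned int fn, unsigned int reg)
{
	return ((uint32_t)(bus & 0xff) << 20) | ((dev & 0x1f) << 15) |
	       ((fn & 0x7) << 12) | (reg & 0xfff);
}

/* Bytes of address space needed to cover buses [0, nr_buses). */
static uint64_t ecam_window_size(unsigned int nr_buses)
{
	return (uint64_t)nr_buses << 20;
}
```

With all 256 buses the window is exactly 256 MiB; limiting it to, say, 4 buses would need only 4 MiB under this layout.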
Re: [PATCH 1/2] Add mempressure cgroup
On 01/10/2013 02:06 AM, Anton Vorontsov wrote: > On Wed, Jan 09, 2013 at 01:55:14PM -0800, Tejun Heo wrote: > [...] >>> We can use mempressure w/o memcg, and even then it can (or should :) be >>> useful (for cpuset, for example). >> >> The problem is that you end with, at the very least, duplicate >> hierarchical accounting mechanisms which overlap with each other >> while, most likely, being slightly different. About the same thing >> happened with cpu and cpuacct controllers and we're now trying to >> deprecate the latter. > > Yeah. I started answering your comments about hierarchical accounting, > looked into the memcg code, and realized that *this* is where I need the > memcg stuff. :) > > Thus yes, I guess I'll have to integrate it with memcg, or sort of. > That has been my point since the beginning. To generate per-memcg pressure, you need memcg anyway. So you would have to have two different and orthogonal mechanisms, and therefore double accounting.
Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
On 01/10/2013 06:17 AM, Tang Chen wrote: >>> Note: if the memory provided by the memory device is used by the >>> kernel, it >>> can't be offlined. It is not a bug. >> >> Right. But how often does this happen in testing? In other words, >> please provide an overall description of how well memory hot-remove is >> presently operating. Is it reliable? What is the success rate in >> real-world situations? > > We test the hot-remove functionality mostly with movable_online used. > And the memory used by kernel is not allowed to be removed. Can you try doing this using cpusets configured to hardwall? It is my understanding that the object allocators will try hard not to allocate anything outside the walls defined by cpuset. Which means that if you have one process per node, and they are hardwalled, your kernel memory will be spread evenly among the machine. With a big enough load, they should eventually be present in all blocks. Another question I have for you: Have you considered calling shrink_slab to try to deplete the caches and therefore free at least slab memory in the nodes that can't be offlined? Is it relevant?
Re: [PATCH 1/5] kfifo: remove unnecessary type check
On Wed, Jan 09, 2013 at 04:29:39PM +0100, Stefani Seibold wrote: > Am Mittwoch, den 09.01.2013, 10:35 +0800 schrieb Yuanhan Liu: > > On Tue, Jan 08, 2013 at 10:51:04PM +0100, Stefani Seibold wrote: > > > Am Dienstag, den 08.01.2013, 22:57 +0800 schrieb Yuanhan Liu: > > > > Firstly, this kind of type check doesn't work. It does something similar > > > > as following: > > > > void * __dummy = NULL; > > > > __buf = __dummy; > > > > > > > > __dummy is defined as void *. Thus it will not trigger warnings as > > > > expected. > > > > > > > > Second, we don't need that kind of check. Since the prototype > > > > of __kfifo_out is: > > > > unsigned int __kfifo_out(struct __kfifo *fifo, void *buf, > > > > unsigned int len) > > > > > > > > buf is defined as void *, so we don't need do the type check. Remove it. > > > > > > > > > > Thats wrong. > > > > > > First the type checking will be used in kfifo_put() and kfifo_in() for > > > const types to check if the passed type of the data can converted to the > > > fifo element type. > > > > Hi Stefani, > > > > Yes, I see now. After rechecking the code, I found that this kind of > > type checking only works for those static defined kifo by > > DECLARE/DEFINE_KFIFO. As the ptrtype is the same as the data type: > > > > /* the 4th argument "type" is "ptrtype" */ > > #define STRUCT_KFIFO(type, size) struct __STRUCT_KFIFO(type, size, 0, > > type) > > > > #define DECLARE_KFIFO(fifo, type, size) STRUCT_KFIFO(type, size) fifo > > > > While, for those kfifo dynamically allocated, the type checking will not > > work as expected then as ptrtype is always "void": > > > > struct kfifo __STRUCT_KFIFO_PTR(unsigned char, 0, void); > > > > You should avoid using struct kfifo, as you can read in kfifo.h this is > only for compatibility reason. Well, the fact is struct kfifo is used far more widely than DECLARE/DEFINE_KFIFO; say above 50 vs less than 10. Thanks. 
--yliu > > If you use the macro DECLARE_KFIFO_PTR(), DECLARE_KFIFO() or > DEFINE_KFIFO() instead. > > Have a look at the examples files in the samples/kfifo directory. > > - Stefani >
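The type check Stefani describes works by assigning the caller's buffer to a variable whose type comes from a pointer member of the fifo. The sketch below is a hypothetical miniature of that idea, not the kernel's kfifo macros: DEMO_FIFO and DEMO_FIFO_IN are made up, and __typeof__ is a GCC/Clang extension. Passing, say, a char * to DEMO_FIFO_IN on an int fifo draws an incompatible-pointer warning; if the member were const void *, which is the situation Yuanhan describes for struct kfifo, any object pointer would convert silently.

```c
#include <stddef.h>

/* The const_ptr member is never used at run time; it only records the
 * element type so the input macro can type-check its argument. */
#define DEMO_FIFO(type, size) \
	struct { const type *const_ptr; type buf[size]; size_t in; }

/* Copy up to n elements from src, stopping when the fifo is full.
 * Initializing chk_ from src is where a mismatched pointer type is
 * diagnosed by the compiler. */
#define DEMO_FIFO_IN(fifo, src, n) do {					\
	__typeof__((fifo).const_ptr) chk_ = (src);			\
	size_t i_;							\
	for (i_ = 0; i_ < (size_t)(n) &&				\
		     (fifo).in < sizeof((fifo).buf) / sizeof((fifo).buf[0]); \
	     i_++)							\
		(fifo).buf[(fifo).in++] = chk_[i_];			\
} while (0)
```

The check costs nothing at run time: the extra member only carries a type, which is the same trade-off the real kfifo macros make.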
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wed, Jan 09, 2013 at 10:10:49PM +, Arnd Bergmann wrote: > On Wednesday 09 January 2013, Thierry Reding wrote: > > What happens on Tegra is that we need to map 256 MiB of physical memory > > to access all the PCIe extended configuration space. However, ioremap() > > on such a large region fails if not enough vmalloc() space is available. > > > > This was observed when somebody tested this on CardHu which has a 1 GiB > > of RAM and therefore remapping the full 256 MiB fails. > > Hmm, config space accesses are fairly rare and generally not expected > to be fast, and 256 MB is really a huge waste of virtual address space, > so I agree that just ioremapping the entire space is not a good > solution. > > However, it's not clear that a cache is necessary. Have you measured > a significant performance benefit of this implementation over just > ioremapping and unmapping a single page for every config space access? No, I had not actually implemented it that way because I figured I might just as well implement something generic with the added benefit that most remapping operations would be cached automatically since the PCI enumeration algorithms usually access the configuration space of a single device at a time, so it actually maps to the best case for an LRU-based cache approach. > Even if we actually want a cache, how about a private implementation > that just remembers a single page in LRU? I doubt that there are > more drivers that would benefit from a generalized version that you > provide. I can move the code to the Tegra PCIe driver, but there's quite a bit of code that isn't actually related to the PCI functionality and I'd really like to avoid cluttering the driver with this implementation. Keeping it in a more central location will certainly increase the code's visibility and make it easier for other potential users to find.
Also I just noticed that I hadn't actually added a parameter to the iomap_cache_create() function to specify the maximum number of pages, so currently the code only uses a single page anyway. It should be trivial to change. I guess performance was good enough with a single page that I didn't have a reason to increase the maximum number of pages. Thierry
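Arnd's alternative, a private cache that remembers a single mapped page, could look roughly like this. It is a userspace sketch: demo_remap() only records the page instead of calling iounmap()/ioremap(), and all names and the 4 KiB page size are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_MASK  (~(((uint64_t)1 << DEMO_PAGE_SHIFT) - 1))

/* Single-entry cache of the most recently mapped page. */
struct one_page_map_cache {
	uint64_t cached_page;    /* physical base of the mapped page */
	int valid;               /* 0 until the first mapping is made */
	unsigned int map_calls;  /* number of (simulated) remap operations */
};

/* Simulated remap: a real driver would unmap the old page and map the
 * new one here. */
static void demo_remap(struct one_page_map_cache *c, uint64_t page)
{
	c->cached_page = page;
	c->valid = 1;
	c->map_calls++;
}

/* Return the offset of phys within the mapped page, remapping only when
 * phys falls outside the currently cached page. */
static size_t demo_map_addr(struct one_page_map_cache *c, uint64_t phys)
{
	uint64_t page = phys & DEMO_PAGE_MASK;

	if (!c->valid || c->cached_page != page)
		demo_remap(c, page);
	return (size_t)(phys - page);
}
```

Since enumeration tends to read many registers of one function in a row, consecutive accesses usually hit the same page, so even one entry avoids most remaps.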
Re: linux-next: build warning after merge of the kvm tree
On Thu, Jan 10, 2013 at 01:34:36PM +1100, Stephen Rothwell wrote: > Hi Marcelo, > > After merging the kvm tree, today's linux-next build (x86_64 allmodconfig) > produced this warning: > > In file included from arch/x86/kvm/mmu.c:3482:0: > arch/x86/kvm/paging_tmpl.h: In function 'paging64_walk_addr_generic': > arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' > [-Wunused-variable] > In file included from arch/x86/kvm/mmu.c:3486:0: > arch/x86/kvm/paging_tmpl.h: In function 'paging32_walk_addr_generic': > arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' > [-Wunused-variable] > > Introduced by commit 908e7d7999bc ("KVM: MMU: simplify folding of dirty > bit into accessed_dirty"). > Was fixed yesterday. Thanks. -- Gleb.
Re: linux-next: change contacts for the kvm tree
On Thu, Jan 10, 2013 at 01:36:18PM +1100, Stephen Rothwell wrote: > Hi Marcelo, > > I have removed Avi as a contact for the kvm tree as the address I have > (Avi Kivity ) bounces. Please add Gleb Natapov instead. -- Gleb.
Re: [PATCH 00/14] Rewrite Tegra PCIe driver
On Wed, Jan 09, 2013 at 10:25:17PM +0100, Thomas Petazzoni wrote: > Dear Thierry Reding, > > On Wed, 9 Jan 2013 21:43:00 +0100, Thierry Reding wrote: > > This patch series contains an almost complete rewrite of the Tegra PCIe > > driver. The code is moved to the drivers/pci/host directory and turned > > into a proper platform driver, adding MSI and DT support while at it. > > Other PCI host controller drivers can be added to that directory in an > > attempt to make it easier to factor out common code. > > Thanks! > > I have started basing the Marvell PCIe code on some of your earlier > versions. But apparently in this final version, you no longer have the > emulated Host bridge. Why so? The reason is that with the latest bindings the matching of root ports to device tree nodes works as-is and nothing else indicates that the emulated host bridge is actually required to make any of this work. So in order not to introduce unneeded code I've left it out for now. If somebody decides that we actually need this host bridge (for standards compliance or whatnot) it could easily be added back. However, before the emulated bridge implementation can be merged I think the PCI ID issue needs to be resolved. > For the Marvell PCIe code, I've used your emulated Host bridge, and > added an emulated PCI-to-PCI bridge implementation, in order to get the > following hierarchy: > > + Host Bridge >+ PCI-to-PCI bridge > + PCI Device >+ PCI-to-PCI bridge > + PCI device > > So, I instantiate one unique emulated Host Bridge, and then one > emulated PCI-to-PCI Bridge for each PCIe interface that I have. Oh dear, that's even worse than on Tegra. The Marvell hardware doesn't even expose the root ports as PCI devices on the bus? > The nice thing about that is that I can then read the configuration > space of the PCI-to-PCI bridge to find out how much I/O space and > memory space is needed for the device connected to this interface, and > at which address is has been mapped. 
This greatly helps my "address > decoding" problem, and removes the ad-hoc virtual space allocator I had > written. > > Is there a reason for having given up on this idea? Is there still a > hope for a different PCIe implementation to use this idea? I suppose that in your case it really makes sense because you already need the emulated PCI-to-PCI bridges and therefore adding an emulated host bridge doesn't add much. As I said, for Tegra everything still works without, so I didn't see a reason to add needless code. Thierry
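Thomas's point is that an emulated PCI-to-PCI bridge lets the host driver read back the memory window the PCI core assigned behind each port. Decoding that window from the standard Type 1 header could be sketched as follows. The register layout (Memory Base at offset 0x20, Memory Limit at 0x22, bits 15:4 holding address bits 31:20) follows the PCI-to-PCI bridge specification, while decode_bridge_mem_window() is a hypothetical helper.

```c
#include <stdint.h>

/* Decode a bridge's non-prefetchable memory window from the 16-bit
 * Memory Base and Memory Limit registers. The window is 1 MiB aligned:
 * the base's low 20 bits are 0 and the limit's low 20 bits are all 1s.
 * Returns 1 if the window is enabled, 0 if base > limit (disabled). */
static int decode_bridge_mem_window(uint16_t base, uint16_t limit,
				    uint32_t *start, uint32_t *end)
{
	*start = (uint32_t)(base & 0xfff0) << 16;
	*end = ((uint32_t)(limit & 0xfff0) << 16) | 0xfffff;
	return *start <= *end;
}
```

For example, base 0xe000 and limit 0xe010 describe 0xe0000000 through 0xe01fffff, the kind of per-port range a driver could then program into its address decoding windows.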
Re: PTRACE_SYSCALL && vsyscall (Was: arch_check_bp_in_kernelspace: fix the range check)
Hi Everyone, > On 01/08, Pedro Alves wrote: >> >> On 12/04/2012 05:59 PM, Oleg Nesterov wrote: >> >> > But If we want to allow to trace vsyscall's, hw bp doesn't look very >> > nice imo. HBP_NUM = 4 and you need to setup 3 bp's to trace them all. >> >> Irrespective of the whole syscall tracing issue, allowing HW bkpts in >> the vsyscall just seems like a bug fix to me. > > And I never argued. I sent the patch iirc ;) Exactly, it is a bug and I am still waiting for it to be fixed in the Linux kernel. Fully emulating PTRACE_SYSCALL could also provide a suitable way to fix my problem, and it may also help others by saving them the need to program and waste x86 debug registers, but it doesn't change the fact that my problem is caused by a bug in the first place, which should be fixed in any case. Best Regards, Amnon. > >> > That is why I think PTRACE_SYSCALL should "simply work" somehow. And >> > so far I think that "just report syscall_exit with orig_ax = -1" is >> > the best (and simple) solution. >> >> If you report exit alone, you'll confuse current GDB into mistaking >> it for an enter, > > Sure. That is why I asked Jan. > >> > OK. We can do more. We can report both syscall_enter/exit and we can >> > change orig_ax/ax temporary to "fool" the tracer, so that everything >> > will look as a "normal" syscall. Like vsyscall_seccomp() does. >> > >> > But this needs much more changes. >> >> I'd just like to add, that if any new syscall related option is >> to be added, can we please just go all the way and add >> PTRACE_EVENT_SYSCALL_ENTER|PTRACE_EVENT_SYSCALL_EXIT instead? > > Oh yes, this was suggested many times. > > Oleg. > >
Re: [PATCH V3 1/2] virtio-net: fix the set affinity bug when CPU IDs are not consecutive
Wanlong Gao writes:
> On 01/09/2013 07:31 AM, Rusty Russell wrote:
>> Wanlong Gao writes:
>>> */
>>> static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
>>> {
>>> -	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
>>> -		  smp_processor_id();
>>> +	int txq = 0;
>>> +
>>> +	if (skb_rx_queue_recorded(skb))
>>> +		txq = skb_get_rx_queue(skb);
>>> +	else if ((txq = per_cpu(vq_index, smp_processor_id())) == -1)
>>> +		txq = 0;
>>
>> You should use __get_cpu_var() instead of smp_processor_id() here, ie:
>>
>> 	else if ((txq = __get_cpu_var(vq_index)) == -1)
>>
>> And AFAICT, no reason to initialize txq to 0 to start with.
>>
>> So:
>>
>> 	int txq;
>>
>> 	if (skb_rx_queue_recorded(skb))
>> 		txq = skb_get_rx_queue(skb);
>> 	else {
>> 		txq = __get_cpu_var(vq_index);
>> 		if (txq == -1)
>> 			txq = 0;
>> 	}
>
> Got it, thank you.
>
>>
>> Now, just to confirm, I assume this can happen even if we use vq_index,
>> right, because of races with virtnet_set_channels?
>
> I still can't understand this race, could you explain more? thank you.

I assume that someone can call virtnet_set_channels() while we are inside virtnet_select_queue(), so they reduce dev->real_num_tx_queues, causing virtnet_set_channels to do:

	while (unlikely(txq >= dev->real_num_tx_queues))
		txq -= dev->real_num_tx_queues;

Otherwise, when is this loop called?

Thanks,
Rusty.
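The selection logic in Rusty's suggestion, together with the wrap-around loop he quotes, can be modelled in userspace. In this sketch vq_index[] is a plain array standing in for the per-CPU variable, -1 means no queue is assigned to that CPU, and real_num_tx_queues is assumed to be non-zero (otherwise the loop would never terminate). All demo_* names are hypothetical.

```c
#define DEMO_NR_CPUS 4

/* Stand-in for the per-CPU vq_index; CPU 2 has no queue assigned and
 * CPU 3 holds a stale index from before a queue-count reduction. */
static int vq_index[DEMO_NR_CPUS] = { 0, 1, -1, 5 };

static unsigned int demo_select_queue(int rx_recorded, unsigned int rx_queue,
				      unsigned int cpu,
				      unsigned int real_num_tx_queues)
{
	unsigned int txq;

	if (rx_recorded) {
		txq = rx_queue;
	} else {
		int idx = vq_index[cpu];
		txq = (idx == -1) ? 0 : (unsigned int)idx;
	}
	/* Fold an out-of-range index back into [0, real_num_tx_queues),
	 * e.g. if the number of queues shrank concurrently. */
	while (txq >= real_num_tx_queues)
		txq -= real_num_tx_queues;
	return txq;
}
```

The stale index 5 on a device that now has 4 queues folds back to queue 1, which is the case the loop guards against.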
[PATCH] genirq: Give warning when setup an already-setup non-shared irq
When request_threaded_irq() is called twice in a row for the same IRQ, the following misleading message is printed:

"IRQ handler type mismatch for IRQ 323"

Print a more accurate warning instead, like:

"Trying to setup already-setup non-shared IRQ 323"

Signed-off-by: liu chuansheng
---
 kernel/irq/manage.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e49a288..6802ce1 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -999,6 +999,16 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 	old_ptr = &desc->action;
 	old = *old_ptr;
 	if (old) {
+		/* Give warning when setup an irq which has been setup
+		 * already.
+		 */
+		if (!(old->flags & IRQF_SHARED)) {
+			ret = -EINVAL;
+			WARN(1, "Trying to setup already-setup non-shared IRQ %d\n",
+				irq);
+			goto out_mask;
+		}
+
 		/*
 		 * Can't share interrupts unless both agree to and are
 		 * the same type (level, edge, polarity). So both flag
--
1.7.0.4
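The effect of the new check can be modelled outside the kernel: a second request on an IRQ whose existing action is not shared is rejected with -EINVAL before any type comparison, while requests that both agree to share are chained. This is a simplified sketch, not __setup_irq(): DEMO_IRQF_SHARED and struct demo_action are hypothetical, and the trigger-type comparison the real code performs is omitted.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_IRQF_SHARED 0x1u  /* hypothetical flag bit for this sketch */

struct demo_action {
	unsigned int flags;
	struct demo_action *next;  /* chain of handlers sharing the line */
};

/* Install new on the IRQ whose action list starts at *head. */
static int demo_setup_irq(struct demo_action **head, struct demo_action *new)
{
	struct demo_action **p = head;

	if (*p) {
		/* The patch's check: line already set up and not shared. */
		if (!((*p)->flags & DEMO_IRQF_SHARED))
			return -EINVAL;
		/* Sharing also requires the new request to agree to it. */
		if (!(new->flags & DEMO_IRQF_SHARED))
			return -EINVAL;
		while (*p)
			p = &(*p)->next;
	}
	new->next = NULL;
	*p = new;
	return 0;
}
```

In the real patch, the first rejection also triggers the WARN() with the clearer message, instead of falling through to the generic type-mismatch report.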
Re: [PATCH V4 0/4] input: keyboard: tegra: cleanups and DT supports
Hi Dmitry, On Monday 07 January 2013 10:22 PM, Stephen Warren wrote: On 01/06/2013 04:14 AM, Laxman Dewangan wrote: This patch series: - fix build warning, - use devm_* for allocation, - make column/rows configuration through DT and - remove the rarely used key mapping table. The series, Reviewed-by: Stephen Warren If you are fine with this series, could you please apply it? If there are any further comments, I can address them to get the series closed. Thanks, Laxman
Re: [PATCH RESEND v1 01/16] vfs: introduce some data structures
On Thu, Jan 10, 2013 at 8:48 AM, David Sterba wrote: > On Thu, Dec 20, 2012 at 10:43:20PM +0800, zwu.ker...@gmail.com wrote: >> --- /dev/null >> +++ b/fs/hot_tracking.c >> @@ -0,0 +1,109 @@ >> +/* >> + * fs/hot_tracking.c > > From what I've understood the file name written here is not wanted, so > please drop it (and from .h too) Done. > >> + * >> + * Copyright (C) 2012 IBM Corp. All rights reserved. >> + * Written by Zhi Yong Wu >> + * >> + * This program is free software; you can redistribute it and/or >> + * modify it under the terms of the GNU General Public >> + * License v2 as published by the Free Software Foundation. > > A short description of the hot tracking feature or pointer to the > Documentation/ file would be nice here. ok, Done > >> + */ >> + >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include "hot_tracking.h" >> + >> +/* kmem_cache pointers for slab caches */ > > This comment seems useless to me, it does not help understanding the code, just > says the same what reads in C. There are more such redundant comments in the > series, but I'm not going to point to all of them right now. Removed.
> >> +static struct kmem_cache *hot_inode_item_cachep __read_mostly; >> +static struct kmem_cache *hot_range_item_cachep __read_mostly; >> + > >> --- /dev/null >> +++ b/include/linux/hot_tracking.h >> +/* The common info for both following structures */ >> +struct hot_comm_item { >> + struct rb_node rb_node; /* rbtree index */ >> + struct hot_freq_data hot_freq_data; /* frequency data */ >> + spinlock_t lock; /* protects object data */ >> + struct kref refs; /* prevents kfree */ >> +}; >> + >> +/* An item representing an inode and its access frequency */ >> +struct hot_inode_item { >> + struct hot_comm_item hot_inode; /* node in hot_inode_tree */ >> + struct hot_rb_tree hot_range_tree; /* tree of ranges */ >> + spinlock_t lock; /* protect range tree */ >> + struct hot_rb_tree *hot_inode_tree; >> + u64 i_ino; /* inode number from inode */ >> +}; > > Please align the comments to something like this (or drop them if they seem > redundant): Done > > /* The common info for both following structures */ > struct hot_comm_item { > struct rb_node rb_node; /* rbtree index */ > struct hot_freq_data hot_freq_data; /* frequency data */ > spinlock_t lock; /* protects object data */ > struct kref refs;/* prevents kfree */ > struct list_head n_list; /* list node index */ > }; > > /* An item representing an inode and its access frequency */ > struct hot_inode_item { > struct hot_comm_item hot_inode; /* node in hot_inode_tree */ > struct hot_rb_tree hot_range_tree; /* tree of ranges */ > spinlock_t lock; /* protect range tree */ > struct hot_rb_tree *hot_inode_tree; > u64 i_ino; /* inode number from inode */ > }; > >> +extern void __init hot_cache_init(void); > > this belongs to the private include fs/hot_tracking.h (because this is called > only once by vfs init and not by filesystems), there's > hot_track_init(superblock) for that purpose introduced later. 
Done, moved it to fs/hot_tracking.h. > > > david -- Regards, Zhi Yong Wu
Re: [PATCH v3 0/3] ARM: dts: omap: add dt data for MUSB
On Friday 28 December 2012 12:05 AM, Aaro Koskinen wrote: Hi, On Thu, Sep 20, 2012 at 05:21:15AM +0200, Benoit Cousson wrote: On 09/19/2012 11:32 AM, Kishon Vijay Abraham I wrote: This patch series adds dt data to get MUSB working in omap4 and omap3. Changes from v2: * Changed the subject of all the patches to include "ARM: dts:" * Added reg property and interrupt property for "usb_otg_hs". Previously these were obtained from the ti,hwmods property. * Rebased on git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git devel-dt Changes from v1: Just removed the omap-usb2 dt data and sent that as a separate patch. Kishon Vijay Abraham I (3): ARM: dts: Add twl6030-usb data ARM: dts: Add twl4030-usb data ARM: dts: omap: Add usb_otg and glue data Thanks for the update. I've just pulled the series for 3.7. I wonder what happened to patch #3 (Add usb_otg and glue data) of this series? Why was it dropped? I cannot see it in 3.7 or 3.8-rc1. Benoit? -Kishon
Re: [PATCH v6 04/15] memory-hotplug: remove /sys/firmware/memmap/X sysfs
Hi Andrew, On 01/10/2013 07:19 AM, Andrew Morton wrote: ... + entry = firmware_map_find_entry(start, end - 1, type); + if (!entry) + return -EINVAL; + + firmware_map_remove_entry(entry); ... The above code looks racy. After firmware_map_find_entry() does the spin_unlock() there is nothing to prevent a concurrent firmware_map_remove_entry() from removing the entry, so the kernel ends up calling firmware_map_remove_entry() twice against the same entry. An easy fix for this is to hold the spinlock across the entire lookup/remove operation. This problem is inherent to firmware_map_find_entry() as you have implemented it, so this function simply should not exist in the current form - no caller can use it without being buggy! A simple fix for this is to remove the spin_lock()/spin_unlock() from firmware_map_find_entry() and add locking documentation to firmware_map_find_entry(), explaining that the caller must hold map_entries_lock and must not release that lock until processing of firmware_map_find_entry()'s return value has completed. Thank you for your advice, I'll fix it soon. Since you have already merged the patch set, do I need to resend all of these patches, or just send a fix on top of the current ones? Thanks. :)
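A minimal user-space sketch of the locking rule Andrew describes, assuming a singly linked list and a C11 atomic_flag spinlock in place of map_entries_lock; struct map_entry, find_entry_locked(), add_entry() and remove_entry() are hypothetical names, not the firmware_map code itself. The point it illustrates is that lookup and unlink happen inside one critical section, so a second concurrent remover can only ever see "not found".

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct firmware_map_entry. */
struct map_entry {
	uint64_t start;          /* first byte of the range */
	uint64_t end;            /* last byte of the range (inclusive) */
	int type;
	struct map_entry *next;
};

/* Spinlock stand-in for map_entries_lock, built on a C11 atomic_flag. */
static atomic_flag map_entries_lock = ATOMIC_FLAG_INIT;
static struct map_entry *map_entries;

static void map_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&map_entries_lock,
						 memory_order_acquire))
		; /* spin */
}

static void map_unlock(void)
{
	atomic_flag_clear_explicit(&map_entries_lock, memory_order_release);
}

/* Caller must hold map_entries_lock and must not drop it until it is
 * done with the returned pointer. */
static struct map_entry *find_entry_locked(uint64_t start, uint64_t end,
					   int type)
{
	for (struct map_entry *e = map_entries; e; e = e->next)
		if (e->start == start && e->end == end && e->type == type)
			return e;
	return NULL;
}

static int add_entry(uint64_t start, uint64_t end, int type)
{
	struct map_entry *e = malloc(sizeof(*e));

	if (!e)
		return -ENOMEM;
	e->start = start;
	e->end = end;
	e->type = type;
	map_lock();
	e->next = map_entries;
	map_entries = e;
	map_unlock();
	return 0;
}

/* Lookup and unlink happen in one critical section, so two concurrent
 * callers can never both find, and then both remove, the same entry. */
static int remove_entry(uint64_t start, uint64_t end, int type)
{
	struct map_entry **pp, *e;

	map_lock();
	e = find_entry_locked(start, end - 1, type);
	if (!e) {
		map_unlock();
		return -EINVAL;
	}
	for (pp = &map_entries; *pp != e; pp = &(*pp)->next)
		;
	*pp = e->next;           /* unlink while still holding the lock */
	map_unlock();
	free(e);                 /* no other thread can reach e any more */
	return 0;
}
```

Freeing after the unlock is safe here only because the entry is already unreachable from the list.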
Re: [PATCH v6 02/15] memory-hotplug: check whether all memory blocks are offlined or not when removing memory
Hi Andrew, On 01/10/2013 07:11 AM, Andrew Morton wrote: On Wed, 9 Jan 2013 17:32:26 +0800 Tang Chen wrote: We remove the memory like this:
1. lock memory hotplug
2. offline a memory block
3. unlock memory hotplug
4. repeat 1-3 to offline all memory blocks
5. lock memory hotplug
6. remove memory (TODO)
7. unlock memory hotplug
All memory blocks must be offlined before removing memory. But we don't hold the lock for the whole operation, so we should check whether all memory blocks are offlined before step 6. Otherwise, the kernel may panic. Well, the obvious question is: why don't we hold lock_memory_hotplug() for all of steps 1-4? Please send the reasons for this in a form which I can paste into the changelog. In the changelog form: Offlining a memory block and removing a memory device can be two different operations. Users can just offline some memory blocks without removing the memory device. For this purpose, the kernel holds lock_memory_hotplug() in __offline_pages(). To reuse this code for memory hot-remove, we repeat steps 1-3 to offline all the memory blocks, repeatedly locking and unlocking memory hotplug rather than holding the memory hotplug lock for the whole operation. Actually, I wonder if doing this would fix a race in the current remove_memory() repeat: loop. That code does a find_memory_block_hinted() followed by offline_memory_block(), but afaict find_memory_block_hinted() only does a get_device(). Is the get_device() sufficiently strong to prevent problems if another thread concurrently offlines or otherwise alters this memory_block's state? I think we already have memory_block->state_mutex to protect against concurrent changes to a memory_block's state. The find_memory_block_hinted() here is to find the memory_block corresponding to the memory section we are dealing with. Thanks.
:)
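A minimal sketch of the "re-check before step 6" idea, assuming blocks are modelled as a plain array and an atomic_flag spinlock stands in for lock_memory_hotplug()/unlock_memory_hotplug(); struct mem_block, remove_memory_range() and the -EBUSY return value are hypothetical choices for illustration, not what the patch necessarily uses. The check and the (still TODO) removal sit under the same lock, because the lock was dropped between the per-block offline steps.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum block_state { BLOCK_ONLINE, BLOCK_OFFLINE };

/* Hypothetical stand-in for struct memory_block. */
struct mem_block {
	enum block_state state;
};

/* Stand-in for lock_memory_hotplug()/unlock_memory_hotplug(). */
static atomic_flag hotplug_lock = ATOMIC_FLAG_INIT;

static void lock_hotplug(void)
{
	while (atomic_flag_test_and_set_explicit(&hotplug_lock,
						 memory_order_acquire))
		; /* spin */
}

static void unlock_hotplug(void)
{
	atomic_flag_clear_explicit(&hotplug_lock, memory_order_release);
}

/* Caller must hold the hotplug lock. */
static bool all_blocks_offline(const struct mem_block *blk, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (blk[i].state != BLOCK_OFFLINE)
			return false;
	return true;
}

/* Steps 5-7: re-check under the lock, because it was dropped between
 * offlining the individual blocks in steps 1-4. */
static int remove_memory_range(struct mem_block *blk, size_t n)
{
	int ret = 0;

	lock_hotplug();
	if (!all_blocks_offline(blk, n))
		ret = -EBUSY;   /* some block was onlined again in between */
	/* else: the actual removal (step 6, still TODO) would go here */
	unlock_hotplug();
	return ret;
}
```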
Re: [PATCH v6 04/15] memory-hotplug: remove /sys/firmware/memmap/X sysfs
Hi Andrew, On 01/10/2013 06:49 AM, Andrew Morton wrote: On Wed, 9 Jan 2013 17:32:28 +0800 Tang Chen wrote: When (hot)adding memory into the system, /sys/firmware/memmap/X/{end, start, type} sysfs files are created. But there is no code to remove these files. The patch implements the function to remove them. Note: The code does not free firmware_map_entry which is allocated by bootmem, so the patch introduces a memory leak. But I think the leaked size is very small, and it does not affect the system. Well that's bad. Can we remember the address of that memory and then reuse the storage if/when the memory is re-added? That at least puts an upper bound on the leak. I think we can do this. I'll post a new patch to do so. Thanks. :)
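One possible shape for Andrew's "remember and reuse" suggestion, as a user-space sketch: entries whose storage came from bootmem (and so cannot be freed) are parked on a reuse list and handed out again on the next hot-add, which bounds the leak. struct fw_entry, the from_bootmem flag, release_entry() and get_entry() are hypothetical; calloc()/free() stand in for the kernel allocator.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct firmware_map_entry. */
struct fw_entry {
	uint64_t start, end;
	int type;
	bool from_bootmem;        /* storage can't be handed back to the allocator */
	struct fw_entry *next;
};

/* Removed bootmem entries are parked here instead of being leaked. */
static struct fw_entry *reuse_list;

static void release_entry(struct fw_entry *e)
{
	if (e->from_bootmem) {
		e->next = reuse_list;
		reuse_list = e;
	} else {
		free(e);
	}
}

/* Prefer recycled bootmem storage; fall back to the normal allocator. */
static struct fw_entry *get_entry(void)
{
	struct fw_entry *e = reuse_list;

	if (e) {
		reuse_list = e->next;
		e->start = e->end = 0;
		e->type = 0;
		e->next = NULL;      /* from_bootmem stays true */
		return e;
	}
	return calloc(1, sizeof(*e));   /* from_bootmem == false */
}
```

With this scheme the leak can never exceed the number of entries that were originally allocated from bootmem.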
[PATCH] Fix wrong EOF compare
getc returns "int", so EOF is typically -1, but storing getc's return value directly into a char turns it into 255, so the condition below is always false. This happens on my ARM system (where plain char is unsigned), so the loop never ends and the program segfaults. This patch fixes it.

    *curr = getc(fin); // *curr = 255
    if (*curr == EOF) return -1; // if (255 == -1)

Cc: Mel Gorman
Cc: Andy Whitcroft
Cc: Alexander Nyberg
Signed-off-by: Minchan Kim
---
 Documentation/page_owner.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/page_owner.c b/Documentation/page_owner.c
index f0156e1..b777fb6 100644
--- a/Documentation/page_owner.c
+++ b/Documentation/page_owner.c
@@ -32,12 +32,14 @@ int read_block(char *buf, FILE *fin)
 {
 	int ret = 0;
 	int hit = 0;
+	int value;
 	char *curr = buf;
 	for (;;) {
-		*curr = getc(fin);
-		if (*curr == EOF) return -1;
+		value = getc(fin);
+		if (value == EOF) return -1;
+		*curr = value;
 		ret++;
 		if (*curr == '\n' && hit == 1)
 			return ret - 1;
-- 
1.7.9.5
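The same rule in a small standalone form, as an illustrative sketch rather than the page_owner.c code: keep getc()'s result in an int until it has been compared with EOF, and only then narrow it to char. read_line() is a hypothetical helper; the usage below also shows that a 0xFF data byte is no longer mistaken for EOF.

```c
#include <stdio.h>

/* Read one line into buf (size must be >= 1). Stores at most size-1
 * characters plus a NUL and drops the rest of an over-long line.
 * Returns the number of characters stored, or -1 if EOF came first. */
static int read_line(char *buf, size_t size, FILE *fin)
{
	size_t n = 0;
	int c;   /* int, not char: must hold every byte value AND EOF */

	while ((c = getc(fin)) != EOF) {
		if (c == '\n')
			break;
		if (n + 1 < size)
			buf[n++] = (char)c;   /* narrow only after the EOF check */
	}
	buf[n] = '\0';
	if (c == EOF && n == 0)
		return -1;
	return (int)n;
}
```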
Re: [PATCH] cpufreq: Simplify __cpufreq_remove_dev()
On 9 January 2013 21:09, Viresh Kumar wrote: > I have tried that too, it is also pushed at: > > https://lkml.org/lkml/2012/12/16/5 Bad link :( http://git.linaro.org/gitweb?p=arm/big.LITTLE/mp.git;a=shortlog;h=refs/heads/cpufreq-fixes-v2
Re: [PATCH] cpufreq: Simplify __cpufreq_remove_dev()
On 9 January 2013 21:09, Viresh Kumar wrote: > On 9 January 2013 16:50, Viresh Kumar wrote: >> [Probably need to simplify cpufreq_add_dev() too, but that can be done as >> the next >> step.] > > I have tried that too, it is also pushed at: > > https://lkml.org/lkml/2012/12/16/5 > > [Untested for now, will be doing it tomorrow] > > From: Viresh Kumar > Date: Wed, 9 Jan 2013 21:02:50 +0530 > Subject: [PATCH] cpufreq: Simplify cpufreq_add_dev() > > Currently cpufreq_add_dev() first allocates a policy, calls ->init() and then > checks whether this cpu is already managed. If it is already > managed, it > frees its policy. > > We can save all this if we know in advance whether this cpu is managed or not. > policy->related_cpus contains the list of all valid sibling cpus of > policy->cpu. We can check it to know whether the current cpu is already managed. > > Signed-off-by: Viresh Kumar Tested-by: Viresh Kumar
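A rough sketch of the early check the patch describes: before allocating a policy and calling ->init(), scan the existing policies' related_cpus masks for the incoming CPU. A 64-bit mask replaces the kernel's cpumask here, and struct policy and find_managing_policy() are hypothetical names used only for illustration.

```c
#include <stddef.h>

#define MAX_CPUS 64

/* Hypothetical, heavily simplified stand-in for struct cpufreq_policy. */
struct policy {
	unsigned int cpu;                 /* the managing CPU */
	unsigned long long related_cpus;  /* bit i set => CPU i shares this policy */
};

/* Return the existing policy that already covers @cpu, or NULL if a new
 * policy (and a call to the driver's ->init()) is really needed. */
static struct policy *find_managing_policy(struct policy **policies,
					   size_t n, unsigned int cpu)
{
	if (cpu >= MAX_CPUS)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (policies[i] && ((policies[i]->related_cpus >> cpu) & 1ULL))
			return policies[i];
	return NULL;
}
```

Only when this returns NULL would the allocate-and-init path run, which is the saving the changelog is after.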
Re: [TRIVIAL PATCH 11/26] powerpc: Convert print_symbol to %pSR
On Thu, 2012-12-13 at 11:58 +, Arnd Bergmann wrote: > On Wednesday 12 December 2012, Joe Perches wrote: > > Use the new vsprintf extension to avoid any possible > > message interleaving. > > > > Convert the #ifdef DEBUG block to a single pr_debug. > > > > Signed-off-by: Joe Perches > > nice cleanup! ... which also breaks the build :-( > Acked-by: Arnd Bergmann I'll fix it up locally. Ben.
Re: [PATCH] sched: Get rid of unnecessary checks from select_idle_sibling
On Wed, 09 Jan 2013 16:34:39 +0800, Alex Shi wrote: > On 01/09/2013 03:54 PM, Namhyung Kim wrote: >> Hi Alex, >> >> On Wed, 09 Jan 2013 15:33:40 +0800, Alex Shi wrote: >>> On 01/09/2013 02:50 PM, Namhyung Kim wrote: From: Namhyung Kim AFAICS @target cpu of select_idle_sibling() is always either prev_cpu or this_cpu. So no need to check it again and the conditionals can be consolidated. >> [snip] >>> Uh, we don't know if the target is this_cpu or previous cpu. If we just >>> check the target idle status, we may miss another idle cpu. So this >>> patch changes the logic in this function. >> >> select_idle_sibling() is called only in select_task_rq_fair() if it >> found a suitable affine_sd. The default target is the 'prev_cpu' of the >> task but if wake_affine() returns true it'd be (this) 'cpu'. >> >> I cannot see where the prev_cpu or the cpu is set to another one before >> calling select_idle_sibling. > > The old logic will return directly whenever prev_cpu or this cpu is idle, > but your new logic only has one chance. Sorry, I don't get your point. Could you elaborate on it a bit more? Thanks, Namhyung
Re: [PATCH] sched: Get rid of unnecessary checks from select_idle_sibling
Hi Preeti, On Wed, 09 Jan 2013 13:51:00 +0530, Preeti U. Murthy wrote: > On 01/09/2013 12:20 PM, Namhyung Kim wrote: >> From: Namhyung Kim >> >> AFAICS @target cpu of select_idle_sibling() is always either prev_cpu >> or this_cpu. So no need to check it again and the conditionals can be >> consolidated. [snip] > If NUMA_TTWU_BIAS or NUMA_TTWU_TO is true (it is false by I can't find those bits in the code. I've checked v3.8-rc2, next-20130110, tip/master and tip/numa/core but there's nothing like the above. Which tree are you referring to? > default), cpu/prev_cpu can be changed to be a random node_cpu (the node > that 'this_cpu' is on). In which case even if the node cpu is idle, it > would not be a viable target, looks like. Maybe that is why > select_idle_sibling() makes the check if the target is prev_cpu/this cpu. Looking into tip/numa/core, I can see that there's code added for CONFIG_NUMA_BALANCING. But still, it seems nothing changed on the path from select_task_rq_fair() to select_idle_sibling(), i.e. if select_idle_sibling() is called, the target would be either prev_cpu or this cpu. Am I missing something? Thanks, Namhyung
[PATCH V2] vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and rename them
writeback_inodes_sb(_nr)_if_idle() is re-implemented by replacing down_read() with down_read_trylock() because - If ->s_umount is write locked, then the sb is not idle. That is, writeback_inodes_sb(_nr)_if_idle() needn't wait for the lock. - writeback_inodes_sb(_nr)_if_idle() grabs the s_umount lock when it wants to start writeback, which may cause a deadlock when doing umount. In order to fix the problem, ext4 and btrfs implemented their own writeback functions instead of writeback_inodes_sb(_nr)_if_idle(), but that introduced redundant code; it is better to implement a new writeback_inodes_sb(_nr)_if_idle(). The names of these two functions are cumbersome, so rename them to try_to_writeback_inodes_sb(_nr). This idea came from Christoph Hellwig. Some code is from the patch of Kamal Mostafa. Signed-off-by: Miao Xie --- Changelog v1 -> v2: - do not remove EXPORT_SYMBOL of writeback_inodes_sb_nr() --- fs/btrfs/extent-tree.c| 20 +++- fs/ext4/inode.c | 8 ++-- fs/fs-writeback.c | 44 include/linux/writeback.h | 6 +++--- 4 files changed, 28 insertions(+), 50 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 521e9d4..f31abb1 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3689,20 +3689,6 @@ static int can_overcommit(struct btrfs_root *root, return 0; } -static int writeback_inodes_sb_nr_if_idle_safe(struct super_block *sb, - unsigned long nr_pages, - enum wb_reason reason) -{ - if (!writeback_in_progress(sb->s_bdi) && - down_read_trylock(&sb->s_umount)) { - writeback_inodes_sb_nr(sb, nr_pages, reason); - up_read(&sb->s_umount); - return 1; - } - - return 0; -} - /* * shrink metadata reservation for delalloc */ @@ -3735,9 +3721,9 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, while (delalloc_bytes && loops < 3) { max_reclaim = min(delalloc_bytes, to_reclaim); nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; - writeback_inodes_sb_nr_if_idle_safe(root->fs_info->sb, - nr_pages, - 
WB_REASON_FS_FREE_SPACE); + try_to_writeback_inodes_sb_nr(root->fs_info->sb, + nr_pages, + WB_REASON_FS_FREE_SPACE); /* * We need to wait for the async pages to actually start before diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index cbfe13b..5f6eef7 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2512,12 +2512,8 @@ static int ext4_nonda_switch(struct super_block *sb) /* * Start pushing delalloc when 1/2 of free blocks are dirty. */ - if (dirty_blocks && (free_blocks < 2 * dirty_blocks) && - !writeback_in_progress(sb->s_bdi) && - down_read_trylock(&sb->s_umount)) { - writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE); - up_read(&sb->s_umount); - } + if (dirty_blocks && (free_blocks < 2 * dirty_blocks)) + try_to_writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE); if (2 * free_blocks < 3 * dirty_blocks || free_blocks < (dirty_blocks + EXT4_FREECLUSTERS_WATERMARK)) { diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 310972b..ad3cc46 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1332,47 +1332,43 @@ void writeback_inodes_sb(struct super_block *sb, enum wb_reason reason) EXPORT_SYMBOL(writeback_inodes_sb); /** - * writeback_inodes_sb_if_idle - start writeback if none underway + * try_to_writeback_inodes_sb_nr - try to start writeback if none underway * @sb: the superblock - * @reason: reason why some writeback work was initiated + * @nr: the number of pages to write + * @reason: the reason of writeback * - * Invoke writeback_inodes_sb if no writeback is currently underway. + * Invoke writeback_inodes_sb_nr if no writeback is currently underway. * Returns 1 if writeback was started, 0 if not. 
*/ -int writeback_inodes_sb_if_idle(struct super_block *sb, enum wb_reason reason) +int try_to_writeback_inodes_sb_nr(struct super_block *sb, + unsigned long nr, + enum wb_reason reason) { - if (!writeback_in_progress(sb->s_bdi)) { - down_read(&sb->s_umount); - writeback_inodes_sb(sb, reason); - up_read(&sb->s_umount); + if (writeback_in_progress(sb->s_bdi)) return 1; - } else + + if (!down_read_trylock(&sb->s_umount)) return 0; + + writeback_inodes_sb_nr(sb, nr, reason); +
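To make the trylock semantics concrete, here is a user-space sketch with the same control flow as the new try_to_writeback_inodes_sb_nr(): return 1 if writeback is already running, return 0 without blocking if s_umount is held for write (for example by umount), otherwise do the work under a read hold. The rwsem is modelled with a C11 atomic counter; struct rwsem_sketch, struct sb_sketch and the *_sketch functions are hypothetical stand-ins, not kernel APIs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal rwsem-like lock: -1 = held for write, N >= 0 = N readers. */
struct rwsem_sketch {
	atomic_int count;
};

static bool down_read_trylock_sketch(struct rwsem_sketch *s)
{
	int c = atomic_load(&s->count);

	/* On CAS failure c is reloaded; give up once a writer shows up. */
	while (c >= 0)
		if (atomic_compare_exchange_weak(&s->count, &c, c + 1))
			return true;
	return false;   /* write-locked (e.g. by umount): don't wait */
}

static void up_read_sketch(struct rwsem_sketch *s)
{
	atomic_fetch_sub(&s->count, 1);
}

struct sb_sketch {
	struct rwsem_sketch s_umount;
	bool writeback_running;     /* stands in for writeback_in_progress() */
	unsigned long pages_queued; /* stands in for the real writeback work */
};

/* Same control flow as try_to_writeback_inodes_sb_nr() in the patch. */
static int try_to_writeback_sketch(struct sb_sketch *sb, unsigned long nr)
{
	if (sb->writeback_running)
		return 1;
	if (!down_read_trylock_sketch(&sb->s_umount))
		return 0;
	sb->pages_queued += nr;     /* writeback_inodes_sb_nr() would go here */
	up_read_sketch(&sb->s_umount);
	return 1;
}
```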
Re: [PATCH 1/1] uprobes: Teach handler_chain() to filter out the probed task
* Oleg Nesterov [2012-12-29 18:36:14]: > Currently there are 2 problems with pre-filtering: > > 1. It is not possible to add/remove a task (mm) after uprobe_register() > > 2. A forked child inherits all breakpoints and uprobe_consumer cannot >control this. > > This patch does the first step to improve the filtering. handler_chain() > removes the breakpoints installed by this uprobe from current->mm if all > handlers return UPROBE_HANDLER_REMOVE. > > Note that handler_chain() relies on ->register_rwsem to avoid the race > with uprobe_register/unregister which can add/del a consumer, or even > remove and then insert the new uprobe at the same address. > > Perhaps we will add uprobe_apply_mm(uprobe, mm, is_register) and teach > copy_mm() to do filter(UPROBE_FILTER_FORK), but I think this change makes > sense anyway. > > Note: instead of checking the retcode from uc->handler, we could add > uc->filter(UPROBE_FILTER_BPHIT). But I think it is not optimal to > call 2 hooks in a row. This buys nothing, and if handler/filter do > something nontrivial they will probably do the same work twice.
> > Signed-off-by: Oleg Nesterov Acked-by: Srikar Dronamraju > --- > include/linux/uprobes.h |3 ++ > kernel/events/uprobes.c | 58 ++ > 2 files changed, 51 insertions(+), 10 deletions(-) > > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h > index c2df693..95d0002 100644 > --- a/include/linux/uprobes.h > +++ b/include/linux/uprobes.h > @@ -35,6 +35,9 @@ struct inode; > # include > #endif > > +#define UPROBE_HANDLER_REMOVE1 > +#define UPROBE_HANDLER_MASK 1 > + > enum uprobe_filter_ctx { > UPROBE_FILTER_REGISTER, > UPROBE_FILTER_UNREGISTER, > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index e2ebb6f..59b6e97 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -440,16 +440,6 @@ static struct uprobe *alloc_uprobe(struct inode *inode, > loff_t offset) > return uprobe; > } > > -static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs) > -{ > - struct uprobe_consumer *uc; > - > - down_read(&uprobe->register_rwsem); > - for (uc = uprobe->consumers; uc; uc = uc->next) > - uc->handler(uc, regs); > - up_read(&uprobe->register_rwsem); > -} > - > static void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc) > { > down_write(&uprobe->consumer_rwsem); > @@ -882,6 +872,33 @@ void uprobe_unregister(struct inode *inode, loff_t > offset, struct uprobe_consume > put_uprobe(uprobe); > } > > +static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm) > +{ > + struct vm_area_struct *vma; > + int err = 0; > + > + down_read(&mm->mmap_sem); > + for (vma = mm->mmap; vma; vma = vma->vm_next) { > + unsigned long vaddr; > + loff_t offset; > + > + if (!valid_vma(vma, false) || > + vma->vm_file->f_mapping->host != uprobe->inode) > + continue; > + > + offset = (loff_t)vma->vm_pgoff << PAGE_SHIFT; > + if (uprobe->offset < offset || > + uprobe->offset >= offset + vma->vm_end - vma->vm_start) > + continue; > + > + vaddr = offset_to_vaddr(vma, uprobe->offset); > + err |= remove_breakpoint(uprobe, mm, 
vaddr); > + } > + up_read(&mm->mmap_sem); > + > + return err; > +} > + > static struct rb_node * > find_node_in_range(struct inode *inode, loff_t min, loff_t max) > { > @@ -1435,6 +1452,27 @@ static struct uprobe *find_active_uprobe(unsigned long > bp_vaddr, int *is_swbp) > return uprobe; > } > > +static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs) > +{ > + struct uprobe_consumer *uc; > + int remove = UPROBE_HANDLER_REMOVE; > + > + down_read(&uprobe->register_rwsem); > + for (uc = uprobe->consumers; uc; uc = uc->next) { > + int rc = uc->handler(uc, regs); > + > + WARN(rc & ~UPROBE_HANDLER_MASK, > + "bad rc=0x%x from %pf()\n", rc, uc->handler); > + remove &= rc; > + } > + > + if (remove && uprobe->consumers) { > + WARN_ON(!uprobe_is_active(uprobe)); > + unapply_uprobe(uprobe, current->mm); > + } > + up_read(&uprobe->register_rwsem); > +} > + > /* > * Run handler and ask thread to singlestep. > * Ensure all non-fatal signals cannot interrupt thread while it singlesteps. > -- > 1.5.5.1 > >
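The removal decision in the new handler_chain() is an AND across all consumers' return codes: every handler still runs, but a single consumer returning 0 keeps the breakpoint. A stripped-down sketch of just that reduction, with no locking, no pt_regs and no WARN on stray bits; struct consumer, should_unapply() and the two example handlers are hypothetical.

```c
#include <stddef.h>

#define UPROBE_HANDLER_REMOVE 1

/* Hypothetical, simplified stand-in for struct uprobe_consumer. */
struct consumer {
	int (*handler)(struct consumer *self);
	struct consumer *next;
};

/* Every handler runs; the result is nonzero only if there is at least
 * one consumer and ALL of them returned UPROBE_HANDLER_REMOVE. */
static int should_unapply(struct consumer *head)
{
	int remove = UPROBE_HANDLER_REMOVE;

	for (struct consumer *uc = head; uc; uc = uc->next)
		remove &= uc->handler(uc);
	return remove && head != NULL;
}

/* Example handlers for the usage below. */
static int wants_remove(struct consumer *self) { (void)self; return UPROBE_HANDLER_REMOVE; }
static int wants_keep(struct consumer *self)   { (void)self; return 0; }
```

Starting from UPROBE_HANDLER_REMOVE and AND-ing also discards any bits outside the mask, which is why the patch only needs a WARN for such values.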
Re: USB autosuspend vs. URB submission
On Thu, Jan 10, 2013 at 7:05 AM, Laurent Pinchart wrote: > > I've never heard of such problems with the uvcvideo driver, and I don't see > anything wrong with the code at first sight. The driver only submits URBs when IMO, there is a minor fault in the error handling path of uvc_status_start() inside uvc_v4l2_open(): the 'users' count should be decreased before usb_autopm_put_interface(). In theory, the warning can be triggered when the device is opened just between usb_autopm_put_interface() and atomic_dec(&stream->dev->users), but I don't think that is the cause of the report. > starting the video capture (at that point no URB should be in flight) or in > the URB completion handler (by definition the URB has completed then). > > I've had a quick look at the trace posted at > https://bugzilla.redhat.com/show_bug.cgi?id=879462 but usbmon only shows URBs > that are successfully submitted. I'm not sure what useful information I could > get from the trace. It might be useful to post the relevant dmesg output together with the usbmon trace. Thanks, -- Ming Lei
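A sketch of the error-path ordering Ming suggests, in user space and under stated assumptions: struct dev_sketch, autopm_get()/autopm_put() and open_sketch() are hypothetical stand-ins for stream->dev->users, usb_autopm_get_interface()/usb_autopm_put_interface() and uvc_v4l2_open(), and the surrounding control flow is simplified. The users_seen_at_put field exists only so the usage can observe that 'users' is already back to 0 when the PM reference is dropped.

```c
#include <errno.h>
#include <stdatomic.h>

struct dev_sketch {
	atomic_int users;
	int pm_refs;
	int users_seen_at_put;  /* 'users' observed when the PM ref is dropped */
};

static void autopm_get(struct dev_sketch *d) { d->pm_refs++; }

static void autopm_put(struct dev_sketch *d)
{
	d->users_seen_at_put = atomic_load(&d->users);
	d->pm_refs--;
}

static int status_start_fails(void) { return -EIO; }

/* Error path in the suggested order: undo the 'users' count first, then
 * drop the PM reference, so there is no window where users > 0 while
 * the device is already allowed to autosuspend. */
static int open_sketch(struct dev_sketch *d, int (*status_start)(void))
{
	autopm_get(d);
	if (atomic_fetch_add(&d->users, 1) == 0) {
		int ret = status_start();

		if (ret < 0) {
			atomic_fetch_sub(&d->users, 1);
			autopm_put(d);
			return ret;
		}
	}
	return 0;
}
```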
Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
On 10 January 2013 00:19, Tejun Heo wrote: > On Mon, Jan 07, 2013 at 11:37:22PM +0530, Viresh Kumar wrote: >> We are talking about a core being idle from the scheduler's perspective :) > > But it's not like cpu doesn't consume power if scheduler considers it > idle, right? Can you please explain in detail how this contributes to > saving power? Is it primarily about routing work items to lower power > CPUs? And please don't point to presentation slides. They don't seem > to explain much better and patches and the code should be able to > describe themselves. Here, more so, as the desired behavior change > and the resulting powersave are rather subtle. I understand your concerns. Firstly, when a cpu is idle from the scheduler's perspective, it consumes a lot of power. queue_work_on_any_cpu() queues the work on another cpu only when the current cpu is idle from the scheduler's perspective, and this can only happen when the cpu was actually idle (in a low power state), woke up due to some interrupt/timer and is asked to queue a work item. The idea is to choose another, non-idle cpu at this point, so that the current cpu can immediately go into a deeper idle state. With this, cpus can stay longer in deeper idle states rather than running work items. And in cases where work items are rearmed from the handler, this can cause significant power loss, which could easily be saved by pushing the work to non-idle cpus. The same approach is already taken for deferred timers, which use a similar routine.
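The CPU-selection policy Viresh describes can be sketched as a small pure function: a busy caller keeps the work local, an idle caller hands it to some busy CPU, and if every CPU is idle there is nothing better than staying put. pick_work_cpu() and the cpu_idle[] array are hypothetical; the real patch may well take topology or other factors into account, which this sketch ignores.

```c
#include <stdbool.h>

/* Pick the CPU to queue a work item on. A busy caller keeps it local;
 * an idle caller (woken only by an interrupt/timer) hands it to some
 * busy CPU so it can drop straight back into a deep idle state. */
static int pick_work_cpu(int this_cpu, const bool *cpu_idle, int nr_cpus)
{
	if (!cpu_idle[this_cpu])
		return this_cpu;
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (cpu != this_cpu && !cpu_idle[cpu])
			return cpu;
	return this_cpu;   /* every CPU is idle: nothing better available */
}
```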
Re: [PATCH V3 4/8] memcg: add per cgroup dirty pages accounting
(2013/01/10 13:26), Sha Zhengju wrote: But this method also has its pros and cons (e.g. it needs lock nesting). So I wonder whether the following can deal with these issues all together (CPU-A does "page stat accounting" and CPU-B does "move"):
CPU-A: move_lock_mem_cgroup()
CPU-A: memcg = pc->mem_cgroup
CPU-A: SetPageDirty(page)
CPU-A: move_unlock_mem_cgroup()
CPU-B: move_lock_mem_cgroup()
CPU-B: if (PageDirty) { old_memcg->nr_dirty--; new_memcg->nr_dirty++; }
CPU-B: pc->mem_cgroup = new_memcg
CPU-B: move_unlock_mem_cgroup()
CPU-A: memcg->nr_dirty++
For CPU-A, we save pc->mem_cgroup in a temporary variable just before SetPageDirty inside move_lock and then update the stats if PG_dirty was set successfully. CPU-B may do the "move" first, so that "old_memcg->nr_dirty--" makes old_memcg->nr_dirty incorrect, but CPU-A does "memcg->nr_dirty++" right after it, which corrects the stats. There is a potential problem that old_memcg->nr_dirty may be negative for a very short period, but that is not a big issue IMHO. IMHO, this will work. Please make sure the recorded memcg is not an invalid pointer by the time you update nr_dirty later. (Maybe RCU will protect it.) _If_ this method can handle the "nesting" problem more cleanly and make the implementation simpler, please go ahead. To be honest, I'm not sure what the code will look like until I see the patch. Hmm, why do you write SetPageDirty() here rather than TestSetPageDirty()? Thanks, -Kame
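Kamezawa's TestSetPageDirty() question matters because only the caller that actually flips the page from clean to dirty should bump the counter. A sketch of that test-and-set accounting, with the memcg recorded before the flip as in CPU-A's sequence; the move_lock and the RCU protection of the recorded pointer discussed above are deliberately omitted, and struct memcg_sketch, struct page_sketch and set_page_dirty_accounted() are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for a memcg and a page. */
struct memcg_sketch {
	atomic_long nr_dirty;
};

struct page_sketch {
	atomic_bool dirty;            /* stands in for PG_dirty */
	struct memcg_sketch *memcg;   /* owner; a "move" may change it */
};

/* TestSetPageDirty-style accounting: only the caller that flips the
 * flag from clean to dirty bumps the counter, so racing callers can't
 * count the same page twice. */
static bool set_page_dirty_accounted(struct page_sketch *pg)
{
	struct memcg_sketch *memcg = pg->memcg;  /* recorded before the flip */

	if (atomic_exchange(&pg->dirty, true))
		return false;                     /* already dirty: no accounting */
	atomic_fetch_add(&memcg->nr_dirty, 1);
	return true;
}
```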
[PATCH] serial:ifx6x60:Delete SPI timer when shut down port
When shutting down the SPI port, it's possible that MRDY has been asserted and an SPI timer activated to wait for SRDY to be asserted; in that case, the timer needs to be deleted.

Signed-off-by: Chen Jun
Signed-off-by: channing
---
 drivers/tty/serial/ifx6x60.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/tty/serial/ifx6x60.c b/drivers/tty/serial/ifx6x60.c
index 675d94a..7eed323 100644
--- a/drivers/tty/serial/ifx6x60.c
+++ b/drivers/tty/serial/ifx6x60.c
@@ -637,6 +637,7 @@ static void ifx_port_shutdown(struct tty_port *port)
 	clear_bit(IFX_SPI_STATE_IO_AVAILABLE, &ifx_dev->flags);
 	mrdy_set_low(ifx_dev);
+	del_timer(&ifx_dev->spi_timer);
 	clear_bit(IFX_SPI_STATE_TIMER_PENDING, &ifx_dev->flags);
 	tasklet_kill(&ifx_dev->io_work_tasklet);
 }
-- 
1.7.1
Re: [PATCH V3 4/8] memcg: add per cgroup dirty pages accounting
On Thu, Jan 10, 2013 at 10:16 AM, Kamezawa Hiroyuki wrote: > (2013/01/10 0:02), Sha Zhengju wrote: >> >> On Mon, Jan 7, 2013 at 3:25 PM, Kamezawa Hiroyuki >> wrote: >>> >>> (2013/01/05 13:48), Sha Zhengju wrote: On Wed, Jan 2, 2013 at 6:44 PM, Michal Hocko wrote: > > > On Wed 26-12-12 01:26:07, Sha Zhengju wrote: >> >> >> From: Sha Zhengju >> >> This patch adds memcg routines to count dirty pages, which allows >> memory >> controller >> to maintain an accurate view of the amount of its dirty memory and can >> provide some >> info for users while cgroup's direct reclaim is working. > > > > I guess you meant targeted resp. (hard/soft) limit reclaim here, > right? It is true that this is direct reclaim but it is not clear to me Yes, I meant memcg hard/soft reclaim here, which is triggered directly by allocation and is distinct from background kswapd reclaim (global). > why the usefulness should be limited to the reclaim for users. I would > understand this if the users were in fact in-kernel users. > One of the reasons I'm trying to account the dirty pages is to get a broader overall view of memory usage, because memcg hard/soft reclaim may affect the response time of user applications. Yeah, the beneficiaries can be application administrators or kernel users. :P > [...] >> >> >> To prevent the AB/BA deadlock mentioned by Greg Thelen in the previous version >> (https://lkml.org/lkml/2012/7/30/227), we adjust the lock order: >> ->private_lock --> mapping->tree_lock --> memcg->move_lock. >> So we need to take mapping->tree_lock ahead of TestSetPageDirty in >> __set_page_dirty() >> and __set_page_dirty_nobuffers(). But in order to avoid useless >> spinlock contention, >> a preliminary PageDirty() check is added. > > > > But there is another AA deadlock here I believe.
> page_remove_rmap > mem_cgroup_begin_update_page_stat <<< 1 > set_page_dirty > __set_page_dirty_buffers > __set_page_dirty > mem_cgroup_begin_update_page_stat <<< 2 > move_lock_mem_cgroup > spin_lock_irqsave(&memcg->move_lock, *flags); > > mem_cgroup_begin_update_page_stat is not recursive wrt. locking AFAICS > because we might race with the moving charges: > CPU0 CPU1 > page_remove_rmap > mem_cgroup_can_attach > mem_cgroup_begin_update_page_stat (1) > rcu_read_lock > > mem_cgroup_start_move > > atomic_inc(&memcg_moving) > > atomic_inc(&memcg->moving_account) > synchronize_rcu > __mem_cgroup_begin_update_page_stat > mem_cgroup_stolen <<< TRUE > move_lock_mem_cgroup > [...] > mem_cgroup_begin_update_page_stat (2) > __mem_cgroup_begin_update_page_stat > mem_cgroup_stolen <<< still TRUE > move_lock_mem_cgroup <<< DEADLOCK > [...] > mem_cgroup_end_update_page_stat > rcu_unlock > # wake up from > synchronize_rcu > [...] > mem_cgroup_move_task > > mem_cgroup_move_charge > walk_page_range > > mem_cgroup_move_account > > move_lock_mem_cgroup > > > Maybe I have missed some other locking which would prevent this from > happening but the locking relations are really complicated in this area > so if mem_cgroup_{begin,end}_update_page_stat might be called > recursively then we need a fat comment which justifies that. > Ohhh... good catch! I didn't notice there was a recursive call of mem_cgroup_{begin,end}_update_page_stat in page_remove_rmap(). The mem_cgroup_{begin,end}_update_page_stat() design has depressed me a lot recently, as the lock granularity is a little bigger than I thought. Not only the resource but also some code logic is within the locked range, which may be deadlock prone. The problem will still exist if we try to add accounting for other memcg page stats later, so may I be so bold as to suggest that we dig into the lock again...
But with regard to the current lock implementation, I doubt if we can account MEM_CGROUP_STAT_FILE_{MAPPED, DIRTY} in one breath and just try to get move_lock once at the beginning. IMHO we can make mem_cgroup_{begin,end}_update_page_stat() recursion-aware and what
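A minimal sketch of the "recursion-aware" idea, under stated assumptions: the lock is modelled as a plain flag rather than `memcg->move_lock`, and the nesting depth lives in a hypothetical per-task counter (`stat_depth`). Only the outermost begin takes the lock and only the matching outermost end drops it, so the nested call reached via `set_page_dirty()` inside `page_remove_rmap()` no longer tries to re-acquire a lock the same task already holds. Real kernel code would also have to carry the `spin_lock_irqsave()` flags and the RCU read-side section through the nesting, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for memcg->move_lock: a flag instead of a spinlock. */
struct move_lock {
	bool locked;
};

/* Hypothetical per-task state, e.g. a field that would live in task_struct. */
struct task_state {
	int stat_depth;	/* number of open begin_update_page_stat() calls */
};

/* Non-recursive acquire: returns false where a real spinlock would spin forever. */
static bool move_lock_try(struct move_lock *l)
{
	if (l->locked)
		return false;
	l->locked = true;
	return true;
}

static void move_unlock(struct move_lock *l)
{
	l->locked = false;
}

/* Recursion-aware begin: only the outermost caller takes the lock. */
static bool begin_update_page_stat(struct task_state *t, struct move_lock *l)
{
	if (t->stat_depth > 0) {
		t->stat_depth++;	/* nested call: this task already holds it */
		return true;
	}
	if (!move_lock_try(l))
		return false;		/* held by someone else; real code would spin */
	t->stat_depth = 1;
	return true;
}

/* Recursion-aware end: only the outermost caller releases the lock. */
static void end_update_page_stat(struct task_state *t, struct move_lock *l)
{
	assert(t->stat_depth > 0);
	if (--t->stat_depth == 0)
		move_unlock(l);
}
```

Calling begin twice and end twice on the same task takes the lock exactly once and releases it exactly once; a plain non-recursive acquire in the nested position is the AA deadlock Michal describes.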
Re: [PATCH RFT] regulator: lp8788-buck: Silence build warning
Thanks for the fix! It works! Tested-by: Fengguang Wu -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] sysctl: Enable IA64 "ignore-unaligned-usertrap" to be used cross-arch
On Thursday 10 January 2013 12:25 AM, Tony Luck wrote: > Ok. Queued for 3.9 merge window. It should show up in linux-next in > the next day or two. We'll see if anyone complains about the name > SYSCTL_ARCH_UNALIGN_NO_WARN (every time I touch the Kconfig > bits it seems that there is some other, better way that we do this now). > > -Tony Thanks a bunch. -Vineet
Re: kernel BUG at kernel/sched_rt.c:493!
On Tue, 2013-01-08 at 09:01 -0600, Shawn Bohrer wrote: > On Tue, Jan 08, 2013 at 09:36:05AM -0500, Steven Rostedt wrote: > > > > > > I've also managed to reproduce this on 3.8.0-rc2 so it appears the bug > > > is still present in the latest kernel. > > > > Shawn, > > > > Can you send me your .config file. > > I've attached the 3.8.0-rc2 config that I used to reproduce this in an > 8 core kvm image. Let me know if you need anything else. I tried beating on my little Q6600 with no success. I even tried setting the entire box rt, GUI and all, nada. Hm, maybe re-installing systemd.. -Mike
RE: [PATCH 3/7] pwm: pwm-tiehrpwm: Update the clock handling of pwm-tiehrpwm driver
On Wed, Jan 02, 2013 at 19:08:43, Thierry Reding wrote: > On Wed, Jan 02, 2013 at 06:54:50PM +0530, Philip Avinash wrote: > > The clock framework has changed and it's now better to invoke > > clk_prepare_enable() and clk_disable_unprepare() rather than the > > legacy clk_enable() and clk_disable() calls. This patch converts the > > pwm-tiehrpwm driver to the new framework. > > > > Signed-off-by: Philip Avinash > > Cc: Thierry Reding > > --- > > In 3.8-rc1, common clock framework support was added to AM335x. > > > > drivers/pwm/pwm-tiehrpwm.c |4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/pwm/pwm-tiehrpwm.c b/drivers/pwm/pwm-tiehrpwm.c > > index 72a6dd4..af6f162 100644 > > --- a/drivers/pwm/pwm-tiehrpwm.c > > +++ b/drivers/pwm/pwm-tiehrpwm.c > > @@ -341,7 +341,7 @@ static int ehrpwm_pwm_enable(struct pwm_chip *chip, > > struct pwm_device *pwm) > > configure_polarity(pc, pwm->hwpwm); > > > > /* Enable TBCLK before enabling PWM device */ > > - clk_enable(pc->tbclk); > > + clk_prepare_enable(pc->tbclk); > > I apparently didn't catch this before, but maybe it would be useful to > check the return value here to make sure we only proceed if the clock > can actually be enabled. OK, I will check the return value and return that status from pwm_enable(). I will send it as an individual patch in the next series. I am waiting for reviews/acceptance of the HWMOD and clock tree node patches. Thanks Avinash > > Thierry
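Thierry's suggestion amounts to the usual kernel pattern of propagating a negative errno from `clk_prepare_enable()` instead of ignoring it. The sketch below is a user-space mock: `struct mock_clk`, `mock_clk_prepare_enable()` and `ehrpwm_enable_sketch()` are hypothetical stand-ins (the real `clk_prepare_enable()` takes a `struct clk *` and returns 0 or a negative errno), and the PWM register writes are reduced to a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct clk; fail_with simulates an enable error. */
struct mock_clk {
	int fail_with;		/* 0 for success, or a negative errno */
	bool enabled;
};

/* Mock of clk_prepare_enable(): 0 on success, negative errno on failure. */
static int mock_clk_prepare_enable(struct mock_clk *clk)
{
	if (clk->fail_with)
		return clk->fail_with;
	clk->enabled = true;
	return 0;
}

/* Hypothetical, simplified enable path modelled on ehrpwm_pwm_enable(). */
static int ehrpwm_enable_sketch(struct mock_clk *tbclk, bool *pwm_running)
{
	int ret;

	/* Enable TBCLK before enabling the PWM device, and stop on failure. */
	ret = mock_clk_prepare_enable(tbclk);
	if (ret)
		return ret;

	*pwm_running = true;	/* stands in for the register writes */
	return 0;
}
```

With this shape the caller of the enable hook sees the clock error directly, and the PWM is never started on a clock that failed to come up.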
RE: [PATCH 0/7] HWMOD fixes for AM33xx PWM submodules and device tree nodes.
Hi Paul, On Wed, Jan 02, 2013 at 18:54:47, Philip, Avinash wrote: > This patch series was part of [1] and is being resubmitted on top of 3.8-rc1 to > make it into 3.9. > > In AM33xx, PWM submodules like ECAP, EHRPWM & EQEP are integrated into the > PWM subsystem. All these submodules share resources (clock) & have > a clock gating register in the PWM subsystem. So a parent PWM subsystem > driver is created. To support the PWM subsystem driver, a parent-child > relation is created in the HWMOD entry in patch #2. > > In addition, the EHRPWM module requires explicit clock gating from the control > module. Hence a clock node for tbclk handling is added in patch #4. > > Patch #3 adds common clock migration support for AM33xx. > > Also, device tree nodes are populated to support the parent-child relation > between PWMSS, ECAP & EHRPWM submodules (patch #5, 6 & 7). > > This patch series is based on linux_omap/master [2] and was tested for > backlight with ECAP on AM335x-evm & AM335x-evmsk. Are there any reviews of this patch series? This patch set is required for LCD backlight work based on ECAP PWM on am335x-evm & am335x-evmsk. Thanks Avinash > > 1. https://lkml.org/lkml/2012/11/27/120 > 2. 
> http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap.git;a=commit;h=c4e2e79ff7c9f4fdad7da432c6b449121cc3033e > > Philip Avinash (7): > ARM: OMAP: AM33xx hwmod: Corrects PWM subsystem HWMOD entries > ARM: OMAP: AM33xx hwmod: Add parent-child relationship for PWM > subsystem > pwm: pwm-tiehrpwm: Update the clock handling of pwm-tiehrpwm driver > ARM: AM33XX: clk: Add clock node for EHRPWM TBCLK > ARM: dts: AM33XX: Add PWMSS device tree nodes > ARM: dts: AM33XX: Add PWM backlight DT data to am335x-evm > ARM: dts: AM33XX: Add PWM backlight DT data to am335x-evmsk > > arch/arm/boot/dts/am335x-evm.dts | 23 ++ > arch/arm/boot/dts/am335x-evmsk.dts | 23 ++ > arch/arm/boot/dts/am33xx.dtsi | 84 ++ > arch/arm/mach-omap2/cclock33xx_data.c | 30 +++ > arch/arm/mach-omap2/control.h |8 + > arch/arm/mach-omap2/omap_hwmod_33xx_data.c | 390 > ++-- > drivers/pwm/pwm-tiehrpwm.c |4 +- > 7 files changed, 417 insertions(+), 145 deletions(-) > > -- > 1.7.9.5 >
Re: [PATCH 0/4] gpio: introduce descriptor-based interface
On Wednesday 09 January 2013 18:46:12 Arnd Bergmann wrote: > > The question is, do we want to totally get rid of the integer > > namespace? That would be the ultimate step, but would require another > > way to identify GPIOs (controller_device:offset might be such a way), > > and also to reorganize sysfs nodes. Wouldn't that be considered > > breaking user-space? 'cause we all know what happens to those who > > break user-space. > > > The user interface could eventually be the only part of the kernel that > uses the numbers, but you are right that we cannot change that. That's sad, as it makes it necessary to maintain the global integer namespace (assigning a base GPIO to each controller and making sure the controllers' ranges do not overlap) even if it is not used internally anymore. We could make the assignment of global numbers transparent, but that would potentially change the GPIO numbers in user-space and cause another incompatibility. > > With the current code, a driver should depend on gpiolib being > > compiled if it uses the new interface. It is not even declared if > > gpiolib is not used. > > > > Given that both interfaces are quite close, one could imagine having a > > gpiod wrapper around the integer namespace (the "opaque descriptors" > > would then just be cast integers). This way drivers would only need > > to depend on GENERIC_GPIO. It's a little bit weird to have gpiod > > wrapping around gpio in one case and the opposite in another though - > > I'd rather have these platforms convert to GPIO descriptors internally > > (or even better, to gpiolib), but this is probably asking too much. > > > I think it would be reasonable to force everybody to use gpiolib, > that's much easier than converting everyone to the descriptor based > interface. > > > > I do not know all the details of gpiolib's history, but why would > > anyone want to implement the generic gpio interface and not use > > gpiolib anyways? > > > Only legacy users did this.
Initially there was only the header file, > with the API declared but several different implementations of it. > gpiolib was introduced later to reduce code duplication and allow having > multiple implementations in the same kernel. Does the following sound reasonable?

1) Make sure every target that uses GENERIC_GPIO also implements its drivers using gpiolib, and convert the (hopefully) few that don't to use gpiolib
2) Make GENERIC_GPIO require GPIOLIB or just merge both options into a single one
3) Turn gpio into a full subsystem (like pinctrl)

This should make things less blurry and easier to maintain (fewer header files, only one interface, etc.) GPIO controllers would also be better integrated into the driver model. > > > If we are adding a new way to deal with GPIOs, would it make sense to > > > have that more closely integrated into pinctrl in one form or another? > > > My feeling is that there is already a significant overlap between the > > > two, and while the addition of the gpiod_* functions doesn't > > > necessarily > > > make this worse, it could be a chance to improve the current situation > > > to make the interface more consistent with pinctrl. > > > > > > That may be a chance to introduce deeper changes indeed - what do you > > have in mind exactly? > > > I don't know enough about pinctrl to have a specific idea yet, but maybe > someone else has ideas. I had a deeper look at pinctrl, and indeed I can see the connection between the two. There are already interfaces to link GPIO ranges to pin ranges and have GPIO drivers switch the pin to the correct state when a GPIO is requested (this, btw, should also be updated to not use global GPIO numbers at some point). Maybe some tighter integration that I just don't see yet can be done too.
> Regarding the integration of pinctrl with gpio, > I was thinking in the past that we could make pinctrl provide everything > that gpiolib does, and have a generic gpiolib driver on top of pinctrl > so that platforms don't need to implement both interfaces but only need > to provide a pure pinctrl driver. Not sure if this makes any sense. That would work if all GPIOs were connected to a ball, but how about GPIO expanders that are external to the chip? They have no use for pinctrl AFAICT. On the other hand, maybe we can have one pinctrl-gpio driver for those chips where pinctrl alone can emulate all the functionality of a GPIO controller. Maybe such a driver exists already? But in general, I agree pinctrl should be a source of inspiration for how to design GPIO. In particular, having a per-chip integer namespace instead of a single global one is definitely something to take (and that's how things work in the DT already). Alex.
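To make the cost of the global integer namespace concrete: each controller is assigned a base, a controller-relative offset maps to `base + offset`, and no two controllers' ranges may overlap. The sketch below uses hypothetical names (`struct chip_range`, `gpio_global_number()`, `ranges_overlap()`); it is not gpiolib's data structure, only the arithmetic the thread refers to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-controller range in the global GPIO number space. */
struct chip_range {
	int base;	/* first global number owned by the controller */
	int ngpio;	/* number of lines on the controller */
};

/* Map a controller-relative offset to a global number, or -1 if out of range. */
static int gpio_global_number(const struct chip_range *chip, int offset)
{
	if (offset < 0 || offset >= chip->ngpio)
		return -1;
	return chip->base + offset;
}

/* Half-open ranges [base, base + ngpio) overlap if each starts before the other ends. */
static bool ranges_overlap(const struct chip_range *a, const struct chip_range *b)
{
	return a->base < b->base + b->ngpio && b->base < a->base + a->ngpio;
}
```

With a per-chip namespace (a DT `gpios` property already names a controller phandle plus an offset), only the offset check remains; the base assignment and overlap bookkeeping exist solely to keep the user-visible numbers stable.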
Re: [PATCH v2] fadvise: perform WILLNEED readahead asynchronously
Simon Jeons wrote: > On Tue, 2012-12-25 at 02:22 +, Eric Wong wrote: > > Please add changelog.

Changes since v1:

* separate unbound workqueue for high-priority tasks
* account for inflight readahead to avoid denial-of-service
* limit concurrency for non-high-priority tasks (1 per CPU, same as aio)
* take IO priority of requesting process into account when in workqueue
* process queued readahead in 2M chunks to help ensure fairness between multiple requests with few CPUs/workqueues. Idle tasks get smaller 256K chunks.
* stop readahead for idle tasks on read congestion

Will try to benchmark with Postgres when I get the chance. Any other (Free Software) applications that might benefit from lower FADV_WILLNEED latency?
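The chunking item in the changelog can be pictured as follows: a queued request is consumed in fixed-size pieces, so one large request cannot monopolise a worker, and idle-class tasks get smaller pieces. The 2M and 256K sizes come from the changelog; the helper names and the `idle_task` flag are hypothetical, and the patch's other bookkeeping (inflight accounting, congestion checks, IO priority) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RA_CHUNK_NORMAL	(2UL << 20)	/* 2M for normal tasks */
#define RA_CHUNK_IDLE	(256UL << 10)	/* 256K for idle-class tasks */

/* Hypothetical helper: bytes of a queued request to process in the next iteration. */
static size_t next_readahead_chunk(size_t remaining, bool idle_task)
{
	size_t chunk = idle_task ? RA_CHUNK_IDLE : RA_CHUNK_NORMAL;

	return remaining < chunk ? remaining : chunk;
}

/* Count the iterations (each could be a requeued work item) to finish a request. */
static unsigned int readahead_iterations(size_t len, bool idle_task)
{
	unsigned int n = 0;

	while (len > 0) {
		len -= next_readahead_chunk(len, idle_task);
		n++;
	}
	return n;
}
```

For example, a 5M request from a normal task is served as 2M + 2M + 1M, giving other queued requests a turn between pieces.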
[PATCH v2] cgroup: use new hashtable implementation
Switch cgroup to use the new hashtable implementation. No functional changes.

Signed-off-by: Li Zefan
---

v2: convert a hlist_del() to hash_del().

---
 kernel/cgroup.c | 92 -
 1 file changed, 39 insertions(+), 53 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4855892..a5262d9 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -52,7 +52,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -376,22 +376,18 @@ static int css_set_count;
  * account cgroups in empty hierarchies.
  */
 #define CSS_SET_HASH_BITS	7
-#define CSS_SET_TABLE_SIZE	(1 << CSS_SET_HASH_BITS)
-static struct hlist_head css_set_table[CSS_SET_TABLE_SIZE];
+static DEFINE_HASHTABLE(css_set_table, CSS_SET_HASH_BITS);
 
-static struct hlist_head *css_set_hash(struct cgroup_subsys_state *css[])
+static unsigned long css_set_hash(struct cgroup_subsys_state *css[])
 {
 	int i;
-	int index;
-	unsigned long tmp = 0UL;
+	unsigned long key = 0UL;
 
 	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++)
-		tmp += (unsigned long)css[i];
-	tmp = (tmp >> 16) ^ tmp;
+		key += (unsigned long)css[i];
+	key = (key >> 16) ^ key;
 
-	index = hash_long(tmp, CSS_SET_HASH_BITS);
-
-	return &css_set_table[index];
+	return key;
 }
 
 /* We don't maintain the lists running through each css_set to its
@@ -418,7 +414,7 @@ static void __put_css_set(struct css_set *cg, int taskexit)
 	}
 
 	/* This css_set is dead. unlink it and release cgroup refcounts */
-	hlist_del(&cg->hlist);
+	hash_del(&cg->hlist);
 	css_set_count--;
 
 	list_for_each_entry_safe(link, saved_link, &cg->cg_links,
@@ -550,9 +546,9 @@ static struct css_set *find_existing_css_set(
 {
 	int i;
 	struct cgroupfs_root *root = cgrp->root;
-	struct hlist_head *hhead;
 	struct hlist_node *node;
 	struct css_set *cg;
+	unsigned long key;
 
 	/*
 	 * Build the set of subsystem state objects that we want to see in the
@@ -572,8 +568,8 @@ static struct css_set *find_existing_css_set(
 		}
 	}
 
-	hhead = css_set_hash(template);
-	hlist_for_each_entry(cg, node, hhead, hlist) {
+	key = css_set_hash(template);
+	hash_for_each_possible(css_set_table, cg, node, hlist, key) {
 		if (!compare_css_sets(cg, oldcg, cgrp, template))
 			continue;
 
@@ -657,8 +653,8 @@ static struct css_set *find_css_set(
 
 	struct list_head tmp_cg_links;
 
-	struct hlist_head *hhead;
 	struct cg_cgroup_link *link;
+	unsigned long key;
 
 	/* First see if we already have a cgroup group that matches
 	 * the desired set */
@@ -704,8 +700,8 @@ static struct css_set *find_css_set(
 	css_set_count++;
 
 	/* Add this cgroup group to the hash table */
-	hhead = css_set_hash(res->subsys);
-	hlist_add_head(&res->hlist, hhead);
+	key = css_set_hash(res->subsys);
+	hash_add(css_set_table, &res->hlist, key);
 
 	write_unlock(&css_set_lock);
 
@@ -1597,6 +1593,8 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	struct cgroupfs_root *existing_root;
 	const struct cred *cred;
 	int i;
+	struct hlist_node *node;
+	struct css_set *cg;
 
 	BUG_ON(sb->s_root != NULL);
 
@@ -1650,14 +1648,8 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 		/* Link the top cgroup in this hierarchy into all
 		 * the css_set objects */
 		write_lock(&css_set_lock);
-		for (i = 0; i < CSS_SET_TABLE_SIZE; i++) {
-			struct hlist_head *hhead = &css_set_table[i];
-			struct hlist_node *node;
-			struct css_set *cg;
-
-			hlist_for_each_entry(cg, node, hhead, hlist)
-				link_css_set(&tmp_cg_links, cg, root_cgrp);
-		}
+		hash_for_each(css_set_table, i, node, cg, hlist)
+			link_css_set(&tmp_cg_links, cg, root_cgrp);
 		write_unlock(&css_set_lock);
 
 		free_cg_links(&tmp_cg_links);
@@ -4438,6 +4430,9 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 {
 	struct cgroup_subsys_state *css;
 	int i, ret;
+	struct hlist_node *node, *tmp;
+	struct css_set *cg;
+	unsigned long key;
 
 	/* check name and function validity */
 	if (ss->name == NULL || strlen(ss->name) > MAX_CGROUP_TYPE_NAMELEN ||
@@ -4503,23 +4498,17 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	 * this is all done under the css_set_lock.
 	 */
 	write_lock(&css_set_lock);
-	for (i = 0; i < CSS_SET_TABLE_SIZE; i++) {
-
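For readers new to the `<linux/hashtable.h>` helpers used above: `DEFINE_HASHTABLE(name, bits)` declares `1 << bits` hlist buckets, `hash_add()` reduces the caller's key to `bits` bits itself (via `hash_min()`, i.e. `hash_32()`/`hash_long()`), and `hash_for_each_possible()` walks only the bucket the key maps to, so the caller must still compare entries. That is why `css_set_hash()` now returns the raw folded key instead of a bucket pointer. Below is a user-space approximation with hypothetical names; the bucket hash uses the 64-bit golden-ratio multiplier from current `linux/hash.h`, which is an assumption and not the 3.8 implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSS_SET_HASH_BITS 7
#define TABLE_SIZE (1u << CSS_SET_HASH_BITS)

/* Hypothetical entry: key stands in for css_set_hash(css_set->subsys). */
struct entry {
	unsigned long key;
	struct entry *next;	/* singly linked chain, like an hlist */
};

/* Like DEFINE_HASHTABLE(): a fixed array of chain heads. */
static struct entry *table[TABLE_SIZE];

/* Multiplicative hash down to 'bits' bits (constant assumed from modern linux/hash.h). */
static unsigned int bucket_of(unsigned long key, unsigned int bits)
{
	return (unsigned int)(((uint64_t)key * 0x61C8864680B583EBULL) >> (64 - bits));
}

/* The folding step kept in css_set_hash(): sum the pointers, mix the high bits in. */
static unsigned long fold_key(const void *css[], size_t n)
{
	unsigned long key = 0UL;

	for (size_t i = 0; i < n; i++)
		key += (unsigned long)(uintptr_t)css[i];
	return (key >> 16) ^ key;
}

/* Like hash_add(): push onto the head of the key's bucket. */
static void table_add(struct entry *e)
{
	unsigned int b = bucket_of(e->key, CSS_SET_HASH_BITS);

	e->next = table[b];
	table[b] = e;
}

/* Like hash_for_each_possible() plus the caller's compare: scan one bucket only. */
static struct entry *table_find(unsigned long key)
{
	for (struct entry *e = table[bucket_of(key, CSS_SET_HASH_BITS)]; e; e = e->next)
		if (e->key == key)	/* different keys may share a bucket */
			return e;
	return NULL;
}
```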
Re: [PATCH V3 0/22] sched: simplified fork, enable load average into LB and power awareness scheduling
>> >> 3, power awareness scheduling, patch 13~22, >> The subset implements my previous power aware scheduling proposal: >> https://lkml.org/lkml/2012/8/13/139 >> It defines 2 new power aware policies, balance and powersaving, and then >> tries to spread or pack tasks at each sched group level according to the >> different scheduler policy. That can save much power when the number of tasks in the >> system is no more than the LCPU number. > > Interesting stuff. I have read through your patches, but it is still not > clear to me what metrics you use to determine whether a sched group is > fully utilized or if it can be used for packing more tasks. Is it based on > nr_running or PJT's tracked load or both? How is the threshold defined? Thanks for the review, Morten!

cpu utilisation = rq->util * (rq->nr_running ? rq->nr_running : 1), where rq->util = running time / whole period.

If nr_running == 2 and util == 99%, the potential max 'utilisation' is 99 * 2 = 198, because both tasks may be able to run full time.

group utilisation = sum of all CPUs' utilisation. For example, in a 2-LCPU group where CPU A has nr_running == 0 and CPU B has util 99% with 3 tasks, the group utilisation = A's util + 99 * 3, which is bigger than the threshold 99% * 2.

The above calculation is biased toward performance, and that is our purpose.
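Alex's formula written out with the numbers from his example, utilisation in integer percent. The struct and function names are hypothetical; only the arithmetic (`util * max(nr_running, 1)`, summed over the group and compared against `99 * nr_cpus`) comes from the message. Whether the real comparison is strict is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU snapshot: util = running time / period, in percent. */
struct cpu_stat {
	unsigned int util;
	unsigned int nr_running;
};

/* Potential utilisation: every runnable task might want the CPU full time. */
static unsigned int cpu_potential_util(const struct cpu_stat *c)
{
	return c->util * (c->nr_running ? c->nr_running : 1);
}

/* Group utilisation is the sum over its logical CPUs. */
static unsigned int group_util(const struct cpu_stat *cpus, unsigned int n)
{
	unsigned int sum = 0;

	for (unsigned int i = 0; i < n; i++)
		sum += cpu_potential_util(&cpus[i]);
	return sum;
}

/* Threshold of 99% per logical CPU; above it the group is treated as full. */
static bool group_is_full(const struct cpu_stat *cpus, unsigned int n)
{
	return group_util(cpus, n) > 99 * n;
}
```

In the example, CPU A (idle, util 0) plus CPU B (99% with 3 tasks) gives 0 + 297 = 297, above the 198 threshold, so the group would not be chosen for packing more tasks.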
linux-next: Tree for Jan 10
Hi all, Changes since 20130109: The pm tree gained a conflict against the pci tree. I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" as mentioned in the FAQ on the wiki (see below). You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the final fixups (if any), it is also built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and allyesconfig (minus CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc, sparc64 and arm defconfig. These builds also have CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and CONFIG_DEBUG_INFO disabled when necessary. Below is a summary of the state of the merge. We are up to 214 trees (counting Linus' and 28 trees of patches pending for Linus' tree), more are welcome (even if they are currently empty). Thanks to those who have contributed, and to those who haven't, please do. Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. There is a wiki covering stuff to do with linux-next at http://linux.f-seidel.de/linux-next/pmwiki/ . Thanks to Frank Seidel. 
-- Cheers, Stephen Rothwells...@canb.auug.org.au $ git checkout master $ git reset --hard stable Merging origin/master (5c49985 Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm) Merging fixes/master (d287b87 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs) Merging kbuild-current/rc-fixes (bad9955 menuconfig: Replace CIRCLEQ by list_head-style lists.) Merging arm-current/fixes (d106de3 ARM: 7614/1: mm: fix wrong branch from Cortex-A9 to PJ4b) Merging m68k-current/for-linus (e7e29b4 m68k: Wire up finit_module) Merging powerpc-merge/merge (e6449c9 powerpc: Add missing NULL terminator to avoid boot panic on PPC40x) Merging sparc/master (4e4d78f sparc: Hook up finit_module syscall.) Merging net/master (c9be4a5 net: prevent setting ttl=0 via IP_TTL) Merging sound-current/for-linus (56bde0f ALSA: hdspm - Fix wordclock status on AES32) Merging pci-current/for-linus (56d0da4 PCI/AER: pci_get_domain_bus_and_slot() call missing required pci_dev_put()) Merging wireless/master (5e20a4b b43: Fix firmware loading when driver is built into the kernel) Merging driver-core.current/driver-core-linus (4956964 Merge tag 'driver-core-3.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core) Merging tty.current/tty-linus (d1c3ed6 Linux 3.8-rc2) Merging usb.current/usb-linus (75e1a2a USB: ehci: make debug port in-use detection functional again) Merging staging.current/staging-linus (e16a922 staging: tidspbridge: use prepare/unprepare on dsp clocks) Merging char-misc.current/char-misc-linus (e6028db mei: fix mismatch in mutex unlock-lock in mei_amthif_read()) Merging input-current/for-linus (bec7a4b Input: lm8323 - fix checking PWM interrupt status) Merging md-current/for-linus (a9add5d md/raid5: add blktrace calls) Merging audit-current/for-linus (c158a35 audit: no leading space in audit_log_d_path prefix) Merging crypto-current/master (a2c0911 crypto: caam - Updated SEC-4.0 device tree binding for ERA information.) 
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops) Merging dwmw2/master (084a0ec x86: add CONFIG_X86_MOVBE option) CONFLICT (content): Merge conflict in arch/x86/Kconfig Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to inline functions) Merging irqdomain-current/irqdomain/merge (a0d271c Linux 3.6) Merging devicetree-current/devicetree/merge (ab28698 of: define struct device in of_platform.h if !OF_DEVICE and !OF_ADDRESS) Merging spi-current/spi/merge (d3601e5 spi/sh-hspi: fix return value check in hspi_probe().) Merging gpio-current/gpio/merge (bc1008c gpio/mvebu-gpio: Make mvebu-gpio depend on OF_CONFIG) Merging rr-fixes/fixes (52441fa module: prevent warning when finit_module a 0 sized file) Merging asm-generic/master (fb9de7e xtensa: Use generic asm/mmu.h for nommu) Merging arm/for-next (32887f3 Merge branch 'fixes' into for-next) Merging arm-perf/for-next/perf (1052917 Merge branch 'perf/updat
Re: [PATCH] Only reset e820 once, even with multiple memmap=exactmap params
On Tuesday, January 08, 2013 09:19:18 AM Yinghai Lu wrote: ... > > that exactmap logic still has a problem: > We need to check exactmap at first, aka need to scan the whole command line > to see if exactmap is there at first and reset e820 tables then handle > other memmap opt. > > Also please update your patch after > > tip/x86/mm2 > > I have one patch that process memmap= with "," there. > > http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=9710f581bb4c35589ac046b0cfc0deb7f369fc85 > > We could put exactmap scanning in new parse_memmap_opt. I still do not understand why: Kexec (kexec/firmware_memmap.c) sets up the e820 map from /sys/firmware/memmap/* and passes it via bootloader structures. Then this e820 table is immediately voided by memmap=exactmap, and a new one passed via boot parameters is set up. Is this what happens, if I read it correctly? Could kexec simply pass the memory to use via memmap=X@Y, and then take the original e820 table without its usable entries (those come from the memmap=X@Y above)? That would mean that the kexec kernel takes all the original ACPI, ACPI NVS, reserved and unusable (everything but usable) entries from the original e820 table and identifies the usable memory from the memmap boot parameter. This would be much smarter than trying to pass the mmconf reserved area, and I could imagine other issues showing up if the reserved areas in the kexec kernel do not match the original ones. If this really can be done and memmap=exactmap was only used by kexec, its logic could be redefined from "drop all e820 entries" to "drop all usable e820 entries", and no further adjustments in kexec/kernel would be needed to get mmconf working (and other issues may be avoided before they happen). Besides that, the ACPI reserved area would no longer need to be passed via memmap=X#Y by kexec.
Thomas
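A sketch of Thomas's proposal: instead of discarding the whole firmware table, `exactmap` would discard only the usable (RAM) entries and keep ACPI, ACPI NVS, reserved and unusable ranges; usable memory would then come solely from the `memmap=X@Y` parameters. The type values follow the traditional e820 numbering (1 = RAM, 2 = reserved, 3 = ACPI, 4 = NVS, 5 = unusable); the struct and function names are hypothetical, and this is not the kernel's e820 code (no sanitizing, merging or overlap handling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum e820_type {
	E820_TYPE_RAM = 1,
	E820_TYPE_RESERVED = 2,
	E820_TYPE_ACPI = 3,
	E820_TYPE_NVS = 4,
	E820_TYPE_UNUSABLE = 5,
};

/* Hypothetical, simplified e820 entry. */
struct e820_range {
	uint64_t addr;
	uint64_t size;
	enum e820_type type;
};

/* Proposed "exactmap" semantics: drop only usable RAM, keep everything else in order. */
static size_t drop_usable_entries(struct e820_range *map, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (map[i].type != E820_TYPE_RAM)
			map[kept++] = map[i];
	return kept;
}

/* memmap=size@addr then adds usable RAM back (appended, no merging in this sketch). */
static size_t add_usable(struct e820_range *map, size_t n, size_t cap,
			 uint64_t addr, uint64_t size)
{
	if (n == cap)
		return n;	/* table full: ignored in this sketch */
	map[n].addr = addr;
	map[n].size = size;
	map[n].type = E820_TYPE_RAM;
	return n + 1;
}
```

An mmconf window marked reserved in the firmware table would survive `drop_usable_entries()` untouched, which is the point of the proposal.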
Linux 3.8-rc3
The holidays are over, and things are starting to get back to normal. Except for Greg, I suspect, who is probably still digging his way out of his email hole. Anyway, another week, another -rc. A fairly normal-sized one. Most of the changes are in drivers, with the bulk of them being one-liners all over that just remove the last traces of __dev[init|exit]. But apart from that, there are real updates (GPU drivers stand out - mainly radeon and exynos), and some architecture work (arm, powerpc, mips, microblaze), filesystems (f2fs, gfs, cifs, nfs) and networking (netfilter and sunrpc). And I really hope things calm down. I realize there were some pent-up patches from the holidays, but let's make sure -rc4 really is smaller. Ok? Linus --- Aaro Koskinen (2): ARM: OMAP1: fix USB configuration use-after-release drm/nouveau: fix init with agpgart-uninorth Abhijith Das (1): GFS2: Fix race in gfs2_rs_alloc Adrian Huang (1): ACPI / APEI: Fix the returned value in erst_dbg_read Alex Deucher (10): drm/radeon: add connector table for Mac G4 Silver drm/radeon/r6xx: fix DMA engine for ttm bo transfers drm/radeon: fix typo in evergreen dma fence drm/radeon: add GPU reset flags drm/radeon: switch to a finer grained reset for r6xx/7xx drm/radeon: switch to a finer grained reset for evergreen drm/radeon: switch to a finer grained reset for cayman/TN drm/radeon: switch to a finer grained reset for SI (v2) drm/radeon: split r6xx and r7xx copy_dma functions drm/radeon: fix DMA CS parser for r6xx linear copy packet Alex Elder (1): sctp: fix Kconfig bug in default cookie hmac selection Alexander Aring (1): mac802154: fix NOHZ local_softirq_pending 08 warning Alexander Schremmer (1): ALSA: usb-audio: Add support for Creative BT-D1 via usb sound quirks Andrew Collins (1): netfilter: nf_nat: Also handle non-ESTABLISHED routing changes in MASQUERADE Andrew Lunn (5): ARM: Kirkwood: Fix missing clk for USB device.
Power: gpio-poweroff: Fix documentation and gpio_is_valid ARM: Kirkwood: Fix missing sdio clock ARM: Kirkwood: Use fixed-regulator instead of board gpio call arm: mvebu: Add missing ; for cpu node. Arend van Spriel (1): brcmsmac: add copyright information for Canonical Asim Kadav (1): sound: oss/pas2: Fix possible access out of array Barry Song (2): pinctrl: sirf: add missing DT-binding document pinctrl: sirf: enable GPIO pullup/down configuration from dts Bartlomiej Zolnierkiewicz (1): mm: fix zone_watermark_ok_safe() accounting of isolated pages Bjørn Mork (1): net: qmi_wwan: add Telekom Speedstick LTE II Bob Hockney (1): netfilter: nfnetlink_log: fix mac address for 6in4 tunnels Bob Peterson (2): GFS2: Stop looking for free blocks at end of rgrp GFS2: Reset rd_last_alloc when it reaches the end of the rgrp Borislav Petkov (1): EDAC: Fix EDAC Kconfig menu Carlos Alberto Lopez Perez (1): Documentation/sysctl/kernel.txt: document /proc/sys/shmall Catalin Marinas (3): arm64: Include linux/ptrace.h in asm/compat.h arm64: Keep the ARM64 Kconfig selects sorted arm64: Always select ARM_AMBA and GENERIC_GPIO Cesar Eduardo Barros (18): MAINTAINERS: remove drivers/platform/msm/ MAINTAINERS: remove arch/arm/common/time-acorn.c MAINTAINERS: remove arch/arm/plat-s5p/ MAINTAINERS: fix drivers/rtc/rtc-vt8500.c MAINTAINERS: fix arch/arm/mach-at91/include/mach/at_hdmac.h MAINTAINERS: fix drivers/media/platform/atmel-isi.c MAINTAINERS: adjust for UAPI MAINTAINERS: fix drivers/media/usb/dvb-usb/cxusb* MAINTAINERS: remove drivers/video/epson1355fb.c MAINTAINERS: fix .../plat-mxc/include/mach/imxfb.h MAINTAINERS: fix drivers/ieee802154/ MAINTAINERS: remove firmware/isci/ MAINTAINERS: remove arch/x86/platform/mrst/pmu.* MAINTAINERS: fix Documentation/mei/ MAINTAINERS: remove drivers/mmc/host/imxmmc.* MAINTAINERS: remove arch/*/lib/perf_event*.c MAINTAINERS: remove include/linux/of_pwm.h MAINTAINERS: fix drivers/staging/sm7xx/ Christian Lamparter (2): carl9170: fix -EINVAL bailout 
during init with !CONFIG_MAC80211_MESH p54usb: add USBIDs for two more p54usb devices Christophe TORDEUX (1): Input: sentelic - only report position of first finger as ST coordinates Damien Zammit (1): ALSA: usb-audio: Fix kernel panic of Digidesign Mbox2 quirk Daniel Lezcano (1): cpuidle: fix lock contention in the idle path Daniel Vetter (1): drm/ttm: fix fence locking in ttm_buffer_object_transfer David Henningsson (2): ALSA: hda - Switch "On" and "Off" for "Mute-LED Mode" kcontrol ALSA: hda - add mute LED for HP Pavilion 17 (Realtek codec) Dmitry Torokhov (1): Input: gpio_keys - defer probing if GPIO probing is deferred Eric Dumazet (2):
Re: [PATCH v3 04/22] sched: don't need go to smaller sched domain
On Wed, 2013-01-09 at 17:38 +, Morten Rasmussen wrote: > On Sat, Jan 05, 2013 at 08:37:33AM +, Alex Shi wrote: > > If no task-allowed CPU is found in the parent sched domain, none will be > > found in its child either. So bail out early to save useless checking. > > > > Signed-off-by: Alex Shi > > --- > > kernel/sched/fair.c | 6 ++ > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 3c7b09a..ecfbf8e 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -3378,10 +3378,8 @@ select_task_rq_fair(struct task_struct *p, int > > sd_flag, int wake_flags) > > load_idx = sd->wake_idx; > > > > group = find_idlest_group(sd, p, cpu, load_idx); > > The previous patch changed the behavior of find_idlest_group() to > return the local group if it is suitable. This effectively means that > you remove the recursive search for a suitable idle sched group. You > could as well merge find_idlest_group() and find_idlest_cpu() to avoid > iterating through the cpus of the same sched group twice. find_idlest_* could stop when seeing 0 too, can't get much more idle. -Mike
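Mike's remark, sketched: while scanning for the least-loaded CPU, a load of 0 cannot be beaten, so the scan can stop there. This is a user-space sketch over a plain array of loads; the function name and the `scanned` counter are hypothetical and stand in for `find_idlest_cpu()`'s walk over a group's CPUs (the same early exit would apply to `find_idlest_group()`).

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the least-loaded CPU (n > 0); stop early on a fully idle one. */
static size_t find_idlest_index(const unsigned long *load, size_t n, size_t *scanned)
{
	size_t idlest = 0;

	*scanned = 0;
	for (size_t i = 0; i < n; i++) {
		(*scanned)++;
		if (load[i] < load[idlest])
			idlest = i;
		if (load[idlest] == 0)
			break;	/* cannot get more idle than zero */
	}
	return idlest;
}
```

For loads `{300, 120, 0, 50, 0}` the scan returns index 2 after looking at three CPUs instead of five.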
Re: [PATCH 4/5] x86,smp: keep spinlock delay values per hashed spinlock address
On Tue, Jan 08, 2013 at 05:31:19PM -0500, Rik van Riel wrote: > From: Eric Dumazet > > Eric Dumazet found a regression with the first version of the spinlock > backoff code, in a workload where multiple spinlocks were contended, > each having a different wait time. > > This patch has multiple delay values per cpu, indexed on a hash > of the lock address, to avoid that problem. > > Eric Dumazet wrote: > > I did some tests with your patches with following configuration : > > tc qdisc add dev eth0 root htb r2q 1000 default 3 > (to force a contention on qdisc lock, even with a multi queue net > device) > > and 24 concurrent "netperf -t UDP_STREAM -H other_machine -- -m 128" > > Machine : 2 Intel(R) Xeon(R) CPU X5660 @ 2.80GHz > (24 threads), and a fast NIC (10Gbps) > > Resulting in a 13 % regression (676 Mbits -> 595 Mbits) > > In this workload we have at least two contended spinlocks, with > different delays. (spinlocks are not held for the same duration) > > It clearly defeats your assumption of a single per cpu delay being OK : > Some cpus are spinning too long while the lock was released. > > We might try to use a hash on lock address, and an array of 16 different > delays so that different spinlocks have a chance of not sharing the same > delay. > > With following patch, I get 982 Mbits/s with same bench, so an increase > of 45 % instead of a 13 % regression. 
> > Signed-off-by: Eric Dumazet > Signed-off-by: Rik van Riel > --- Acked-by: Rafael Aquini > arch/x86/kernel/smp.c | 22 +++--- > 1 files changed, 19 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c > index 05f828b..1877890 100644 > --- a/arch/x86/kernel/smp.c > +++ b/arch/x86/kernel/smp.c > @@ -23,6 +23,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -134,12 +135,26 @@ static bool smp_no_nmi_ipi = false; > #define DELAY_FIXED_1 (1<<DELAY_SHIFT) > #define MIN_SPINLOCK_DELAY (1 * DELAY_FIXED_1) > #define MAX_SPINLOCK_DELAY (16000 * DELAY_FIXED_1) > -DEFINE_PER_CPU(unsigned, spinlock_delay) = { MIN_SPINLOCK_DELAY }; > +#define DELAY_HASH_SHIFT 6 > +struct delay_entry { > + u32 hash; > + u32 delay; > +}; > +static DEFINE_PER_CPU(struct delay_entry [1 << DELAY_HASH_SHIFT], > spinlock_delay) = { > + [0 ... (1 << DELAY_HASH_SHIFT) - 1] = { > + .hash = 0, > + .delay = MIN_SPINLOCK_DELAY, > + }, > +}; > + > void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc) > { > __ticket_t head = inc.head, ticket = inc.tail; > __ticket_t waiters_ahead; > - unsigned delay = __this_cpu_read(spinlock_delay); > + u32 hash = hash32_ptr(lock); > + u32 slot = hash_32(hash, DELAY_HASH_SHIFT); > + struct delay_entry *ent = &__get_cpu_var(spinlock_delay[slot]); > + u32 delay = (ent->hash == hash) ? ent->delay : MIN_SPINLOCK_DELAY; > unsigned loops = 1; > > for (;;) { > @@ -175,7 +190,8 @@ void ticket_spin_lock_wait(arch_spinlock_t *lock, struct > __raw_tickets inc) > break; > } > } > - __this_cpu_write(spinlock_delay, delay); > + ent->hash = hash; > + ent->delay = delay; > } > > /*
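The hashed table works like a small direct-mapped cache keyed on the lock address: the slot index comes from a hash of the pointer, and the full 32-bit hash is stored as a tag so that a slot last used by a different lock falls back to the minimum delay instead of inheriting a foreign value. Below is a minimal single-CPU userland sketch under stated assumptions: `fold_ptr()` and `hash_bits()` are hypothetical helpers written in the style of `hash32_ptr()` and `hash_32()`, and the multiplier is assumed, not taken from the patch.

```c
#include <stdint.h>

#define DELAY_HASH_SHIFT   6
#define DELAY_SHIFT        8
#define DELAY_FIXED_1      (1u << DELAY_SHIFT)
#define MIN_SPINLOCK_DELAY (1u * DELAY_FIXED_1)

struct delay_entry {
    uint32_t hash;   /* tag: which lock last wrote this slot */
    uint32_t delay;  /* fixed-point delay learned for that lock */
};

/* One CPU's table; the patch keeps one such array per CPU. */
static struct delay_entry delay_table[1u << DELAY_HASH_SHIFT];

static void delay_table_init(void)
{
    for (unsigned i = 0; i < (1u << DELAY_HASH_SHIFT); i++) {
        delay_table[i].hash = 0;
        delay_table[i].delay = MIN_SPINLOCK_DELAY;
    }
}

/* Hypothetical: fold a pointer to 32 bits (same idea as hash32_ptr()). */
static uint32_t fold_ptr(const void *p)
{
    uint64_t v = (uint64_t)(uintptr_t)p;
    return (uint32_t)(v ^ (v >> 32));
}

/* Hypothetical: multiplicative hash keeping the top 'bits' bits. */
static uint32_t hash_bits(uint32_t v, unsigned bits)
{
    return (v * 0x9e370001u) >> (32 - bits);
}

static uint32_t delay_lookup(const void *lock)
{
    uint32_t h = fold_ptr(lock);
    const struct delay_entry *e = &delay_table[hash_bits(h, DELAY_HASH_SHIFT)];

    /* A slot owned by a different lock tells us nothing: start low. */
    return e->hash == h ? e->delay : MIN_SPINLOCK_DELAY;
}

static void delay_store(const void *lock, uint32_t delay)
{
    uint32_t h = fold_ptr(lock);
    struct delay_entry *e = &delay_table[hash_bits(h, DELAY_HASH_SHIFT)];

    e->hash = h;
    e->delay = delay;
}
```

Two locks that collide on a slot simply evict each other's delay, which costs some re-tuning but never applies one lock's long delay to another.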
Re: [PATCH 3/5] x86,smp: auto tune spinlock backoff delay factor
On Tue, Jan 08, 2013 at 05:30:29PM -0500, Rik van Riel wrote: > Many spinlocks are embedded in data structures; having many CPUs > pounce on the cache line the lock is in will slow down the lock > holder, and can cause system performance to fall off a cliff. > > The paper "Non-scalable locks are dangerous" is a good reference: > > http://pdos.csail.mit.edu/papers/linux:lock.pdf > > In the Linux kernel, spinlocks are optimized for the case of > there not being contention. After all, if there is contention, > the data structure can be improved to reduce or eliminate > lock contention. > > Likewise, the spinlock API should remain simple, and the > common case of the lock not being contended should remain > as fast as ever. > > However, since spinlock contention should be fairly uncommon, > we can add functionality into the spinlock slow path that keeps > system performance from falling off a cliff when there is lock > contention. > > Proportional delay in ticket locks is delaying the time between > checking the ticket based on a delay factor, and the number of > CPUs ahead of us in the queue for this lock. Checking the lock > less often allows the lock holder to continue running, resulting > in better throughput and preventing performance from dropping > off a cliff. > > Proportional spinlock delay with a high delay factor works well > when there is lots contention on a lock. Likewise, a smaller > delay factor works well when a lock is lightly contended. > > Making the code auto-tune the delay factor results in a system > that performs well with both light and heavy lock contention. 
> > Signed-off-by: Rik van Riel > --- > v3: use fixed-point math for the delay calculations, suggested by Michel > Lespinasse > Acked-by: Rafael Aquini > arch/x86/kernel/smp.c | 43 +++ > 1 files changed, 39 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c > index aa743e9..05f828b 100644 > --- a/arch/x86/kernel/smp.c > +++ b/arch/x86/kernel/smp.c > @@ -113,13 +113,34 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1); > static bool smp_no_nmi_ipi = false; > > /* > - * Wait on a congested ticket spinlock. > + * Wait on a congested ticket spinlock. Many spinlocks are embedded in > + * data structures; having many CPUs pounce on the cache line with the > + * spinlock simultaneously can slow down the lock holder, and the system > + * as a whole. > + * > + * To prevent total performance collapse in case of bad spinlock contention, > + * perform proportional backoff. The per-cpu value of delay is automatically > + * tuned to limit the number of times spinning CPUs poll the lock before > + * obtaining it. This limits the amount of cross-CPU traffic required to > obtain > + * a spinlock, and keeps system performance from dropping off a cliff. > + * > + * There is a tradeoff. If we poll too often, the whole system is slowed > + * down. If we sleep too long, the lock will go unused for a period of > + * time. The solution is to go for a fast spin if we are at the head of > + * the queue, to slowly increase the delay if we sleep for too short a > + * time, and to decrease the delay if we slept for too long. 
> */ > +#define DELAY_SHIFT 8 > +#define DELAY_FIXED_1 (1<<DELAY_SHIFT) > +#define MIN_SPINLOCK_DELAY (1 * DELAY_FIXED_1) > +#define MAX_SPINLOCK_DELAY (16000 * DELAY_FIXED_1) > +DEFINE_PER_CPU(unsigned, spinlock_delay) = { MIN_SPINLOCK_DELAY }; > void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc) > { > __ticket_t head = inc.head, ticket = inc.tail; > __ticket_t waiters_ahead; > - unsigned loops; > + unsigned delay = __this_cpu_read(spinlock_delay); > + unsigned loops = 1; > > for (;;) { > waiters_ahead = ticket - head - 1; > @@ -133,14 +154,28 @@ void ticket_spin_lock_wait(arch_spinlock_t *lock, > struct __raw_tickets inc) > } while (ACCESS_ONCE(lock->tickets.head) != ticket); > break; > } > - loops = 50 * waiters_ahead; > + > + /* Aggressively increase delay, to minimize lock accesses. */ > + if (delay < MAX_SPINLOCK_DELAY) > + delay += DELAY_FIXED_1 / 7; > + > + loops = (delay * waiters_ahead) >> DELAY_SHIFT; > while (loops--) > cpu_relax(); > > head = ACCESS_ONCE(lock->tickets.head); > - if (head == ticket) > + if (head == ticket) { > + /* > + * We overslept, and do not know by how. > + * Exponentially decay the value of delay, > + * to get it back to a good value quickly. > + */ > + if (delay >= 2 * DELAY_FIXED_1) > + delay -= max(delay/32, DELAY_FIXED_1); > break; > + } > }
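The auto-tuning rule keeps `delay` in 24.8 fixed point (`DELAY_FIXED_1` is 1.0). Each poll that finds the lock still taken adds `DELAY_FIXED_1 / 7` (36/256, about 0.14). Each poll that finds it free, meaning we overslept, subtracts `max(delay/32, 1.0)` once the delay is at least 2.0. The helpers below are a standalone sketch of just that arithmetic; the function names are hypothetical, and the kernel's `max()` macro is spelled out because it is not available in userland.

```c
#define DELAY_SHIFT        8
#define DELAY_FIXED_1      (1u << DELAY_SHIFT)
#define MIN_SPINLOCK_DELAY (1u * DELAY_FIXED_1)
#define MAX_SPINLOCK_DELAY (16000u * DELAY_FIXED_1)

/* Lock still busy after our backoff: lengthen the delay a little. */
static unsigned delay_grow(unsigned delay)
{
    if (delay < MAX_SPINLOCK_DELAY)
        delay += DELAY_FIXED_1 / 7;  /* +36, i.e. about +0.14 */
    return delay;
}

/* Lock was free when we looked: we overslept, so decay the delay. */
static unsigned delay_shrink(unsigned delay)
{
    if (delay >= 2 * DELAY_FIXED_1) {
        unsigned step = delay / 32;  /* ~3% of the current delay ... */

        if (step < DELAY_FIXED_1)
            step = DELAY_FIXED_1;    /* ... but at least 1.0 */
        delay -= step;
    }
    return delay;
}

/* Number of cpu_relax() iterations: delay * waiters, back to integer. */
static unsigned backoff_loops(unsigned delay, unsigned waiters_ahead)
{
    return (delay * waiters_ahead) >> DELAY_SHIFT;
}
```

For example, a delay of 292 (about 1.14) with 10 waiters ahead yields 2920 >> 8 = 11 spins, and a delay of 64.0 shrinks by 2.0 (512) after one overslept poll.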
Re: [Pv-drivers] [PATCH 0/6] VSOCK for Linux upstreaming
> > Our position is that VSOCK feature set is more complete and that it > > should be possible to use transports other than VMCI for VSOCK > > traffic, should interested parties implement them, > > Implementing other transports requires restructing vsock (and vmci) > first as the current vsock code is not a hypervisor neutral service. I'm going to bite the bullet and spend the next couple of days doing just that: factoring out the VMCI bits and hiding them behind a transport layer. It'll be a bit rough, but it'll be a start. We'll submit another patch series next week with that. I'm hoping that'll get us over this hump, since it should be hypervisor agnostic at that point. It'll be up to you guys to add virtio, though :) And in the meantime, are there any other glaring errors that we need to address? Thanks! - Andy
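"Hiding the VMCI bits behind a transport layer" usually means, in kernel style, a table of function pointers that the core socket code calls without knowing which hypervisor sits underneath. The sketch below shows that shape only. Every name in it (`vsock_transport_ops`, `vsock_register_transport`, the stub backend) is hypothetical and is not the interface that was eventually merged.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport interface: the core calls only these hooks. */
struct vsock_transport_ops {
    const char *name;
    int  (*connect)(unsigned int cid, unsigned int port);
    long (*send)(const void *buf, size_t len);
};

static const struct vsock_transport_ops *active_transport;

static int vsock_register_transport(const struct vsock_transport_ops *ops)
{
    if (!ops || !ops->connect || !ops->send)
        return -EINVAL;
    if (active_transport)
        return -EBUSY;          /* one transport at a time in this sketch */
    active_transport = ops;
    return 0;
}

/* Core-side entry point: hypervisor-neutral, just dispatches. */
static int vsock_connect(unsigned int cid, unsigned int port)
{
    if (!active_transport)
        return -ENODEV;
    return active_transport->connect(cid, port);
}

/* A stub backend standing in for VMCI (or, later, virtio). */
static int stub_connect(unsigned int cid, unsigned int port)
{
    return (cid == 2 && port != 0) ? 0 : -ECONNREFUSED;
}

static long stub_send(const void *buf, size_t len)
{
    (void)buf;
    return (long)len;
}

static const struct vsock_transport_ops stub_transport = {
    .name    = "stub",
    .connect = stub_connect,
    .send    = stub_send,
};
```

A virtio backend would then be one more `vsock_transport_ops` instance, with no changes to the core dispatch code.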
[PATCH] module, fix percpu reserved memory exhaustion
Rusty, There is likely some subtlety of moving the module mutex that I'm unaware of. What I can say is that this patch seems to resolve the problem for me, or at least through 100+ reboots I have not seen the problem (I'm still testing as I write this). I'm more than willing to hear an alternative approach, or test an alternative patch. Thanks, P. 8< In recent Fedora releases (F17 & F18) some users have reported seeing messages similar to [ 15.478121] Pid: 727, comm: systemd-udevd Tainted: GF 3.8.0-rc2+ #1 [ 15.478121] Call Trace: [ 15.478131] [] pcpu_alloc+0xa01/0xa60 [ 15.478137] [] ? printk+0x61/0x63 [ 15.478140] [] __alloc_reserved_percpu+0x13/0x20 [ 15.478145] [] load_module+0x1dc2/0x20b0 [ 15.478150] [] ? do_page_fault+0xe/0x10 [ 15.478152] [] ? page_fault+0x28/0x30 [ 15.478155] [] sys_init_module+0xd7/0x120 [ 15.478159] [] system_call_fastpath+0x16/0x1b [ 15.478160] kvm: Could not allocate 304 bytes percpu data [ 15.478174] PERCPU: allocation failed, size=304 align=32, alloc from reserved chunk failed during system boot. In some cases, users have also reported seeing this message along with a failed load of other modules. As the message indicates, the reserved chunk of percpu memory (where modules allocate their memory) is exhausted. A debug printk inserted in the code shows [ 15.478533] PRARIT size = 304 > chunk->contig_hint = 208 i.e., the reserved chunk of percpu has only 208 bytes of available space. What is happening is systemd is loading an instance of the kvm module for each cpu found (see commit e9bda3b). When the module load occurs the kernel currently allocates the module's percpu data area prior to checking to see if the module is already loaded or is in the process of being loaded. If the module is already loaded, or finishes loading, the module loading code releases the current instance's percpu data.
The problem is that these module loads race and it is possible that all of the percpu reserved area is consumed by repeated loads of the same module which results in the failure of other drivers to load. This patch moves the module percpu allocation after the check for an existing instance of the module. Signed-off-by: Prarit Bhargava Cc: Rusty Russell Cc: Mike Galbraith --- kernel/module.c | 124 ++- 1 file changed, 85 insertions(+), 39 deletions(-) diff --git a/kernel/module.c b/kernel/module.c index 250092c..e7e9b57 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -1929,6 +1929,27 @@ static int verify_export_symbols(struct module *mod) return 0; } +static void simplify_percpu_symbols(struct module *mod, + const struct load_info *info) +{ + Elf_Shdr *symsec = &info->sechdrs[info->index.sym]; + Elf_Sym *sym = (void *)symsec->sh_addr; + unsigned long secbase; + unsigned int i; + + /* +* No need for error checking in this function because +* simplify_symbols has already been called. +*/ + for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) { + /* Divert to percpu allocation if a percpu var. */ + if (sym[i].st_shndx == info->index.pcpu) { + secbase = (unsigned long)mod_percpu(mod); + sym[i].st_value += secbase; + } + } +} + /* Change all symbols so that st_value encodes the pointer directly. */ static int simplify_symbols(struct module *mod, const struct load_info *info) { @@ -1976,12 +1997,11 @@ static int simplify_symbols(struct module *mod, const struct load_info *info) break; default: - /* Divert to percpu allocation if a percpu var. 
*/ - if (sym[i].st_shndx == info->index.pcpu) - secbase = (unsigned long)mod_percpu(mod); - else + /* percpu diverts handled in simplify_percpu_symbols */ + if (sym[i].st_shndx != info->index.pcpu) { secbase = info->sechdrs[sym[i].st_shndx].sh_addr; - sym[i].st_value += secbase; + sym[i].st_value += secbase; + } break; } } @@ -2899,11 +2919,29 @@ int __weak module_frob_arch_sections(Elf_Ehdr *hdr, return 0; } +static int allocate_percpu(struct module *mod, struct load_info *info) +{ + Elf_Shdr *pcpusec; + int err; + + pcpusec = &info->sechdrs[info->index.pcpu]; + if (pcpusec->sh_size) { + /* We have a special allocation for this section. */ + pr_debug("module %s attempting to percpu with size %d\n", +mod->name, pcpusec->sh_size); + err = percpu_modalloc(mod, + pcpuse
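The failure mode is an ordering problem: when every racing `init_module()` call for the same module allocates its percpu area before checking for an existing instance, all of those allocations are live at the same moment, and the reserved chunk can run dry. The sketch below models that peak sequentially. The pool size of 8192 bytes and the count of 32 racing loads are illustrative assumptions; the 304-byte size comes from the kvm message above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the reserved percpu chunk. */
struct reserved_pool {
    size_t free;
};

static bool pool_alloc(struct reserved_pool *p, size_t sz)
{
    if (sz > p->free)
        return false;
    p->free -= sz;
    return true;
}

/*
 * Old order: 'nr' concurrent loads of the same module all allocate
 * before any of them notices the others. Duplicates are only freed
 * later, so at the peak every copy pins its share of the pool.
 * Returns the number of loads that failed to get percpu memory.
 */
static int race_alloc_then_check(struct reserved_pool *p, int nr, size_t sz)
{
    int failed = 0;

    for (int i = 0; i < nr; i++)
        if (!pool_alloc(p, sz))
            failed++;
    return failed;
}

/* Patched order: duplicates bail out before allocating anything. */
static int race_check_then_alloc(struct reserved_pool *p, int nr, size_t sz)
{
    int failed = 0;
    bool loaded = false;

    for (int i = 0; i < nr; i++) {
        if (loaded)
            continue;           /* already loaded: no allocation */
        if (pool_alloc(p, sz))
            loaded = true;
        else
            failed++;
    }
    return failed;
}
```

With these numbers the old order fits only 26 copies (7904 bytes) and leaves 288 bytes, too little for any other 304-byte module, while the patched order uses a single 304-byte allocation.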
linux-next: change contacts for the kvm tree
Hi Marcelo, I have removed Avi as a contact for the kvm tree as the address I have (Avi Kivity ) bounces. -- Cheers, Stephen Rothwell s...@canb.auug.org.au
linux-next: build warning after merge of the kvm tree
Hi Marcelo, After merging the kvm tree, today's linux-next build (x86_64 allmodconfig) produced this warning: In file included from arch/x86/kvm/mmu.c:3482:0: arch/x86/kvm/paging_tmpl.h: In function 'paging64_walk_addr_generic': arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' [-Wunused-variable] In file included from arch/x86/kvm/mmu.c:3486:0: arch/x86/kvm/paging_tmpl.h: In function 'paging32_walk_addr_generic': arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' [-Wunused-variable] Introduced by commit 908e7d7999bc ("KVM: MMU: simplify folding of dirty bit into accessed_dirty"). -- Cheers, Stephen Rothwell s...@canb.auug.org.au
[PATCH RFT] regulator: lp8788-buck: Silence build warning
This driver uses id as an array index, so add bounds checking for id. This patch fixes the build warning below: drivers/regulator/lp8788-buck.c: In function 'lp8788_buck_probe': drivers/regulator/lp8788-buck.c:490:8: warning: array subscript is above array bounds [-Warray-bounds] drivers/regulator/lp8788-buck.c:489:63: warning: array subscript is above array bounds [-Warray-bounds] Reported-by: Fengguang Wu Signed-off-by: Axel Lin --- Hi Fengguang, My toolchain does not show the warning you mentioned; can you test whether this patch indeed silences the build warning? Thanks, Axel drivers/regulator/lp8788-buck.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/regulator/lp8788-buck.c b/drivers/regulator/lp8788-buck.c index 1161949..ef845c7 100644 --- a/drivers/regulator/lp8788-buck.c +++ b/drivers/regulator/lp8788-buck.c @@ -474,7 +474,7 @@ static int lp8788_init_dvs(struct platform_device *pdev, u8 default_dvs_mode[] = { LP8788_BUCK1_DVS_I2C, LP8788_BUCK2_DVS_I2C }; /* no dvs for buck3, 4 */ - if (id == BUCK3 || id == BUCK4) + if (id > BUCK2) return 0; /* no dvs platform data, then dvs will be selected by I2C registers */ @@ -505,6 +505,9 @@ static int lp8788_buck_probe(struct platform_device *pdev) struct regulator_dev *rdev; int ret; + if (id >= LP8788_NUM_BUCKS) + return -EINVAL; + buck = devm_kzalloc(&pdev->dev, sizeof(struct lp8788_buck), GFP_KERNEL); if (!buck) return -ENOMEM; -- 1.7.9.5
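The warning most likely arises because `id == BUCK3 || id == BUCK4` still lets values such as 5 reach a two-element `default_dvs_mode[]` index, whereas `id > BUCK2` lets the compiler prove the index is 0 or 1. The sketch below mirrors both guards with a hypothetical enum and placeholder mode values (11 and 22). It also rejects negative ids, an extra check that is not in the patch and is included only as an assumption about what a fully defensive probe might do.

```c
#include <errno.h>

/* Hypothetical mirror of the driver's buck ids. */
enum buck_id { BUCK1, BUCK2, BUCK3, BUCK4, NUM_BUCKS };

/* Placeholder values: only BUCK1 and BUCK2 have DVS modes. */
static const int default_dvs_mode[] = { 11, 22 };

/* DVS mode for 'id', 0 for bucks without DVS, -EINVAL for a bad id. */
static int dvs_mode_for(int id)
{
    if (id < 0 || id >= NUM_BUCKS)
        return -EINVAL;          /* probe-time guard */
    if (id > BUCK2)
        return 0;                /* no DVS for buck3/4; index stays in range */
    return default_dvs_mode[id];
}
```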
Re: [PATCH 2/5] x86,smp: proportional backoff for ticket spinlocks
On Tue, Jan 08, 2013 at 05:32:41PM -0500, Rik van Riel wrote: > Subject: x86,smp: proportional backoff for ticket spinlocks > > Simple fixed value proportional backoff for ticket spinlocks. > By pounding on the cacheline with the spin lock less often, > bus traffic is reduced. In cases of a data structure with > embedded spinlock, the lock holder has a better chance of > making progress. > > If we are next in line behind the current holder of the > lock, we do a fast spin, so as not to waste any time when > the lock is released. > > The number 50 is likely to be wrong for many setups, and > this patch is mostly to illustrate the concept of proportional > backup. The next patch automatically tunes the delay value. > > Signed-off-by: Rik van Riel > Signed-off-by: Michel Lespinasse > --- Acked-by: Rafael Aquini > arch/x86/kernel/smp.c | 23 --- > 1 files changed, 20 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c > index 20da354..aa743e9 100644 > --- a/arch/x86/kernel/smp.c > +++ b/arch/x86/kernel/smp.c > @@ -117,11 +117,28 @@ static bool smp_no_nmi_ipi = false; > */ > void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc) > { > + __ticket_t head = inc.head, ticket = inc.tail; > + __ticket_t waiters_ahead; > + unsigned loops; > + > for (;;) { > - cpu_relax(); > - inc.head = ACCESS_ONCE(lock->tickets.head); > + waiters_ahead = ticket - head - 1; > + /* > + * We are next after the current lock holder. Check often > + * to avoid wasting time when the lock is released. 
> + */ > + if (!waiters_ahead) { > + do { > + cpu_relax(); > + } while (ACCESS_ONCE(lock->tickets.head) != ticket); > + break; > + } > + loops = 50 * waiters_ahead; > + while (loops--) > + cpu_relax(); > > - if (inc.head == inc.tail) > + head = ACCESS_ONCE(lock->tickets.head); > + if (head == ticket) > break; > } > }
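The fixed backoff relies on `ticket - head - 1` counting the CPUs queued between the current holder and us, and unsigned wrap-around keeps that count correct when the ticket counter overflows. The sketch below assumes 8-bit tickets (the `ticket_t` typedef is an assumption for small-NR_CPUS configurations) and reproduces the patch's 50-spins-per-waiter rule.

```c
#include <stdint.h>

typedef uint8_t ticket_t;  /* assumption: 8-bit tickets */

/* CPUs holding tickets strictly between the lock holder and us. */
static ticket_t waiters_ahead(ticket_t head, ticket_t ticket)
{
    return (ticket_t)(ticket - head - 1);  /* wraps modulo 256 */
}

/*
 * Fixed proportional backoff: 50 cpu_relax() per waiter ahead.
 * When the result is 0 the patch instead spins on the head directly.
 */
static unsigned fixed_backoff_loops(ticket_t head, ticket_t ticket)
{
    return 50u * waiters_ahead(head, ticket);
}
```

For head 250 and our ticket 3, the tickets 251 through 2 lie in between, so `waiters_ahead()` returns 8 even though the counter wrapped.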
Re: [PATCH 0/5] x86,smp: make ticket spinlock proportional backoff w/ auto tuning
On Wed, Jan 09, 2013 at 06:20:35PM +0530, Raghavendra K T wrote: > I ran kernbench on 32 core (mx3850) machine with 3.8-rc2 base. > x base_3.8rc2 > + rik_backoff > N Min MaxMedian AvgStddev > x 8 222.977231.16 227.735 227.388 3.1512986 > + 8218.75 232.347 229.1035 228.25425 4.2730225 > No difference proven at 95.0% confidence I got similar results on smaller systems (1 socket, dual-cores and quad-cores) when running Rik's latest series, no big difference for good nor for worse, but I also think Rik's work is meant to address bigger systems with more cores contending for any given spinlock. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v6 05/15] memory-hotplug: introduce new function arch_remove_memory() for removing page table depends on architecture
Hi Andrew, On 01/10/2013 06:50 AM, Andrew Morton wrote: On Wed, 9 Jan 2013 17:32:29 +0800 Tang Chen wrote: For removing memory, we need to remove page table. But it depends on architecture. So the patch introduce arch_remove_memory() for removing page table. Now it only calls __remove_pages(). Note: __remove_pages() for some archtecuture is not implemented (I don't know how to implement it for s390). Can this break the build for s390? No, I don't think so. The arch_remove_memory() in s390 will only return -EBUSY. Thanks. :)
Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
On Wed, Jan 09, 2013 at 04:26:02PM -0800, Andrew Morton wrote: > On Wed, 9 Jan 2013 15:21:14 +0900 > Minchan Kim wrote: > > > If laptop_mode is enable, VM try to avoid I/O for saving the power. > > But if there isn't reclaimable memory without I/O, we should do I/O > > for preventing unnecessary OOM kill although we sacrifices power. > > > > One of example is that we are out of page cache. Remained one is > > only anonymous pages, for swapping out, we needs may_writepage = 1. > > > > Reported-by: Luigi Semenzato > > Signed-off-by: Minchan Kim > > --- > > mm/vmscan.c |6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 439cc47..624c816 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, > > struct scan_control *sc, > > free = zone_page_state(zone, NR_FREE_PAGES); > > if (unlikely(file + free <= high_wmark_pages(zone))) { > > scan_balance = SCAN_ANON; > > + /* > > +* From now on, we have to swap out > > +* for peventing OOM kill although > > +* we sacrifice power consumption. > > +*/ > > + sc->may_writepage = 1; > > goto out; > > } > > } > > This is pretty ugly. get_scan_count() is, as its name implies, an > idempotent function which inspects the state of things and returns a > result. As such, it has no business going in and altering the state of > the scan_control. > > We have code in both direct reclaim and in kswapd to set may_writepage > if vmscan is getting into trouble. I don't see why adding another > instance is necessary if the existing instances are working correctly. > > > > (Is it correct that __zone_reclaim() ignores laptop_mode?) > > > I have a feeling that laptop mode has bitrotted and these patches are > kinda hacking around as-yet-not-understood failures... Absolutely; this patch is a last guard against unexpected behavior.
As I mentioned in the cover letter, Luigi's problem could be solved by either [1/2] or [2/2], but I wanted to add this as a last resort in case of an unexpected emergency. But you're right: it's not good to hide the problem the way this patch does, so let's drop [2/2]. Also, I absolutely agree that laptop mode has bitrotted; to correct it, we need a volunteer to investigate power saving with long-running experiments. Until then, [1/2] would be a band-aid. -- Kind regards, Minchan Kim
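Andrew's objection is about separation of concerns: a query like `get_scan_count()` should only inspect state and report a decision, and the caller should decide whether to change `scan_control`. The sketch below illustrates that split with hypothetical, heavily reduced types (a two-value `scan_balance` enum and a one-field control struct). It is not a proposed patch.

```c
#include <stdbool.h>

enum scan_balance { SCAN_EQUAL, SCAN_ANON };   /* reduced, hypothetical */

struct scan_control_model {
    bool may_writepage;
};

/* Pure query: looks at the numbers, changes nothing. */
static enum scan_balance decide_balance(unsigned long file,
                                        unsigned long free,
                                        unsigned long high_wmark)
{
    return (file + free <= high_wmark) ? SCAN_ANON : SCAN_EQUAL;
}

/* The caller owns policy: only here may scan_control be modified. */
static void reclaim_step(struct scan_control_model *sc, unsigned long file,
                         unsigned long free, unsigned long high_wmark)
{
    if (decide_balance(file, free, high_wmark) == SCAN_ANON)
        sc->may_writepage = true;  /* only anon pages left: must swap */
}
```

Because `decide_balance()` has no side effects, it can be called repeatedly (or from tracing or debugging code) without silently turning on writeback.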
Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
Hi Andrew, On 01/10/2013 07:33 AM, Andrew Morton wrote: On Wed, 9 Jan 2013 17:32:24 +0800 Tang Chen wrote: This patch-set aims to implement physical memory hot-removing. As you were on th patch delivery path, all of these patches should have your Signed-off-by:. But some were missing it. I fixed this in my copy of the patches. Thank you very much for the help. Next time I'll add it myself. I suspect this patchset adds a significant amount of code which will not be used if CONFIG_MEMORY_HOTPLUG=n. "[PATCH v6 06/15] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap", for example. This is not a good thing, so please go through the patchset (in fact, go through all the memhotplug code) and let's see if we can reduce the bloat for CONFIG_MEMORY_HOTPLUG=n kernels. This needn't be done immediately - it would be OK by me if you were to defer this exercise until all the new memhotplug code is largely in place. But please, let's do it. OK, I'll check this once the page_cgroup problem is solved. Thanks. :)
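The usual kernel idiom for keeping hotplug-only code out of `CONFIG_MEMORY_HOTPLUG=n` builds is to compile the real functions only under the config option and otherwise provide `static inline` stubs. Callers still build, and the compiler discards the stubs entirely. The helper names below (`track_section`, `sections_tracked`) are hypothetical and only show the pattern.

```c
#ifdef CONFIG_MEMORY_HOTPLUG
static unsigned long hotplug_sections_tracked;

static void track_section(unsigned long section_nr)
{
    (void)section_nr;
    hotplug_sections_tracked++;   /* stands in for real bookkeeping */
}

static unsigned long sections_tracked(void)
{
    return hotplug_sections_tracked;
}
#else
/* Hotplug disabled: callers compile unchanged, no code or data emitted. */
static inline void track_section(unsigned long section_nr)
{
    (void)section_nr;
}

static inline unsigned long sections_tracked(void)
{
    return 0;
}
#endif
```

Built without `CONFIG_MEMORY_HOTPLUG`, calls to `track_section()` vanish and `sections_tracked()` folds to the constant 0.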
Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
Hi Andrew, Thank you very much for pushing this forward. :) On 01/10/2013 06:23 AM, Andrew Morton wrote: This does sound like a significant problem. We should assume that mmecg is available and in use. In patch1, we provide a solution which is not good enough: Iterate twice to offline the memory. 1st iterate: offline every non primary memory block. 2nd iterate: offline primary (i.e. first added) memory block. Let's flesh this out a bit. If we online memory8, memory9, memory10 and memory11 then I'd have thought that they would need to offlined in reverse order, which will require four iterations, not two. Is this wrong and if so, why? Well, we may need more than two iterations if memory8, memory9 and memory10 are all in use by the kernel, and 10 depends on 9, 9 depends on 8. So, as you see here, the iteration method is not good enough. But this only happens when the memory is used by the kernel, which cannot be migrated. So if we use a boot option such as movablecore_map, or the movable_online functionality, to mark the memory as movable, the kernel will not use this memory, and node hot-remove is safe. Also, what happens if we wish to offline only memory9? Do we offline memory11 then memory10 then memory9 and then re-online memory10 and memory11? In this case, offlining memory9 could fail if the user does it by hand, for example via sysfs. But this code is in the memory hot-remove path: when we remove a memory device, all of its pages are offlined automatically, and in reverse order. And again, this is not good enough. We will figure out a reasonable way to solve it soon. And a new idea from Wen Congyang is: allocate the memory from the memory block they are describing. Yes. But we are not sure if it is OK to do so because there is not existing API to do so, and we need to move page_cgroup memory allocation from MEM_GOING_ONLINE to MEM_ONLINE. This all sounds solvable - can we proceed in this fashion? Yes, we are working on it now.
And also, it may interfere the hugepage. Please provide full details on this problem. It is not very clear yet; if I find something, I'll share it. Note: if the memory provided by the memory device is used by the kernel, it can't be offlined. It is not a bug. Right. But how often does this happen in testing? In other words, please provide an overall description of how well memory hot-remove is presently operating. Is it reliable? What is the success rate in real-world situations? We test the hot-remove functionality mostly with movable_online used, and memory used by the kernel is not allowed to be removed. We will run some tests of the kernel-memory offline cases and send you the results soon. And since we are trying out some other approaches, I think the problem will be solved soon. Are there precautions which the administrator can take to improve the success rate? The administrator can use the movablecore_map boot option or the movable_online functionality (which is already in the kernel) to mark memory as movable and avoid this problem. What are the remaining problems and are there plans to address them? For now, we will try to allocate page_cgroup from the memory block it describes. All the other parts seem to work well now, and we are still testing. If we hit any problems, we will share them. Thanks. :)
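The "how many iterations" question can be made concrete. Suppose, as Tang describes, that each memory block's metadata (for example its page_cgroup array) lives in the previously onlined block, so block *i* cannot go offline until the block it hosts metadata for is offline. Then repeated forward passes offline only one block per pass. The sketch below models that rule. `pinned_by[]` is a hypothetical representation of those dependencies; a pass with no progress means some block can never be offlined.

```c
#include <stdbool.h>

#define NO_PIN (-1)

/*
 * pinned_by[i]: the block whose metadata lives in block i, or NO_PIN.
 * Block i may go offline only once that block is offline.
 * Returns the number of forward passes needed, or -1 if a pass made
 * no progress (a dependency cycle: those blocks can never go offline).
 */
static int offline_all(const int *pinned_by, bool *offline, int n)
{
    int remaining = n, passes = 0;

    for (int i = 0; i < n; i++)
        offline[i] = false;

    while (remaining > 0) {
        int progress = 0;

        passes++;
        for (int i = 0; i < n; i++) {
            if (offline[i])
                continue;
            if (pinned_by[i] == NO_PIN || offline[pinned_by[i]]) {
                offline[i] = true;
                progress++;
            }
        }
        if (!progress)
            return -1;
        remaining -= progress;
    }
    return passes;
}
```

For memory8 through memory11 chained this way, the model needs four passes, matching Andrew's estimate. If each block's metadata were allocated from the block itself (Wen Congyang's idea), every entry would be `NO_PIN` and one pass would suffice.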
Re: [PATCH 3/3] ARM: Dove: add Cubox sdhci card detect gpio
On Tue, Dec 18, 2012 at 01:24:29AM +0100, Sebastian Hesselbarth wrote: > Card detect for sdhci on Cubox is connected to the wrong pin > (sdio1_cd instead of sdio0_cd). With support for cd-gpios and > pinctrl add the corresponding properties to DT for Cubox. > > Signed-off-by: Sebastian Hesselbarth > --- > Cc: Russell King > Cc: Jason Cooper > Cc: Andrew Lunn > Cc: Sebastian Hesselbarth > Cc: linux-arm-ker...@lists.infradead.org > Cc: linux-kernel@vger.kernel.org > --- > arch/arm/boot/dts/dove-cubox.dts | 14 -- > 1 file changed, 12 insertions(+), 2 deletions(-) Applied to mvebu/fixes thx, Jason.
Re: [PATCH V3 4/8] memcg: add per cgroup dirty pages accounting
(2013/01/10 0:02), Sha Zhengju wrote: On Mon, Jan 7, 2013 at 3:25 PM, Kamezawa Hiroyuki wrote: (2013/01/05 13:48), Sha Zhengju wrote: On Wed, Jan 2, 2013 at 6:44 PM, Michal Hocko wrote: On Wed 26-12-12 01:26:07, Sha Zhengju wrote: From: Sha Zhengju This patch adds memcg routines to count dirty pages, which allows memory controller to maintain an accurate view of the amount of its dirty memory and can provide some info for users while cgroup's direct reclaim is working. I guess you meant targeted resp. (hard/soft) limit reclaim here, right? It is true that this is direct reclaim but it is not clear to me Yes, I meant memcg hard/soft reclaim here which is triggered directly by allocation and is distinct from background kswapd reclaim (global). why the usefulnes should be limitted to the reclaim for users. I would understand this if the users was in fact in-kernel users. One of the reasons I'm trying to accounting the dirty pages is to get a more board overall view of memory usages because memcg hard/soft reclaim may have effect on response time of user application. Yeah, the beneficiary can be application administrator or kernel users. :P [...] To prevent AB/BA deadlock mentioned by Greg Thelen in previous version (https://lkml.org/lkml/2012/7/30/227), we adjust the lock order: ->private_lock --> mapping->tree_lock --> memcg->move_lock. So we need to make mapping->tree_lock ahead of TestSetPageDirty in __set_page_dirty() and __set_page_dirty_nobuffers(). But in order to avoiding useless spinlock contention, a prepare PageDirty() checking is added. But there is another AA deadlock here I believe. page_remove_rmap mem_cgroup_begin_update_page_stat <<< 1 set_page_dirty __set_page_dirty_buffers __set_page_dirty mem_cgroup_begin_update_page_stat <<< 2 move_lock_mem_cgroup spin_lock_irqsave(&memcg->move_lock, *flags); mem_cgroup_begin_update_page_stat is not recursive wrt. 
locking AFAICS because we might race with the moving charges: CPU0CPU1 page_remove_rmap mem_cgroup_can_attach mem_cgroup_begin_update_page_stat (1) rcu_read_lock mem_cgroup_start_move atomic_inc(&memcg_moving) atomic_inc(&memcg->moving_account) synchronize_rcu __mem_cgroup_begin_update_page_stat mem_cgroup_stolen <<< TRUE move_lock_mem_cgroup [...] mem_cgroup_begin_update_page_stat (2) __mem_cgroup_begin_update_page_stat mem_cgroup_stolen <<< still TRUE move_lock_mem_cgroup <<< DEADLOCK [...] mem_cgroup_end_update_page_stat rcu_unlock # wake up from synchronize_rcu [...] mem_cgroup_move_task mem_cgroup_move_charge walk_page_range mem_cgroup_move_account move_lock_mem_cgroup Maybe I have missed some other locking which would prevent this from happening but the locking relations are really complicated in this area so if mem_cgroup_{begin,end}_update_page_stat might be called recursively then we need a fat comment which justifies that. Ohhh...good catching! I didn't notice there is a recursive call of mem_cgroup_{begin,end}_update_page_stat in page_remove_rmap(). The mem_cgroup_{begin,end}_update_page_stat() design has depressed me a lot recently as the lock granularity is a little bigger than I thought. Not only the resource but also some code logic is in the range of locking which may be deadlock prone. The problem still exists if we are trying to add stat account of other memcg page later, may I make bold to suggest that we dig into the lock again... But with regard to the current lock implementation, I doubt if we can we can account MEM_CGROUP_STAT_FILE_{MAPPED, DIRTY} in one breath and just try to get move_lock once in the beginning. IMHO we can make mem_cgroup_{begin,end}_update_page_stat() to recursive aware and what I'm thinking now is changing memcg->move_lock to rw-spinlock from the original spinlock: mem_cgroup_{begin,end}_update_page_stat() try to get the read lock which make it reenterable and memcg moving task side try to get the write spinlock. 
Then the race may be following: CPU0CPU1 page_remove_rmap mem_cgroup_can_attach mem_cgroup_begin_update_page_stat (1) rcu_read_lock mem_cgroup_start_move atomic_inc(&memcg_moving) atomic_inc(&memcg->moving_account)
[PATCH] hardlockup: detect hard lockups without NMIs using secondary cpus
Emulate NMIs on systems where they are not available by using timer interrupts on other cpus. Each cpu will use its softlockup hrtimer to check that the next cpu is processing hrtimer interrupts by verifying that a counter is increasing. This patch is useful on systems where the hardlockup detector is not available due to a lack of NMIs, for example most ARM SoCs. Without this patch any cpu stuck with interrupts disabled can cause a hardware watchdog reset with no debugging information, but with this patch the kernel can detect the lockup and panic, which can result in useful debugging info. Signed-off-by: Colin Cross --- include/linux/nmi.h |5 ++- kernel/watchdog.c | 98 --- lib/Kconfig.debug | 14 +++- 3 files changed, 110 insertions(+), 7 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index db50840..c8f8aa0 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -14,8 +14,11 @@ * may be used to reset the timeout - for code which intentionally * disables interrupts for a long time. This call is stateless. 
  */
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR_NMI)
 #include 
+#endif
+
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void touch_nmi_watchdog(void);
 #else
 static inline void touch_nmi_watchdog(void)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 75a2ab3..94c231e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -44,6 +44,11 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
+#endif
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
+static cpumask_t __read_mostly watchdog_cpus;
+#endif
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
 #endif
 
@@ -179,7 +184,7 @@ void touch_softlockup_watchdog_sync(void)
 	__raw_get_cpu_var(watchdog_touch_ts) = 0;
 }
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 /* watchdog detector functions */
 static int is_hardlockup(void)
 {
@@ -193,6 +198,64 @@ static int is_hardlockup(void)
 }
 #endif
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
+static int is_hardlockup_other_cpu(int cpu)
+{
+	unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
+
+	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
+		return 1;
+
+	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
+	return 0;
+}
+
+static void watchdog_check_hardlockup_other_cpu(void)
+{
+	int cpu;
+	cpumask_t cpus = watchdog_cpus;
+
+	/*
+	 * Test for hardlockups every 3 samples. The sample period is
+	 * watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
+	 * watchdog_thresh (over by 20%).
+	 */
+	if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
+		return;
+
+	/* check for a hardlockup on the next cpu */
+	cpu = cpumask_next(smp_processor_id(), &cpus);
+	if (cpu >= nr_cpu_ids)
+		cpu = cpumask_first(&cpus);
+	if (cpu == smp_processor_id())
+		return;
+
+	smp_rmb();
+
+	if (per_cpu(watchdog_nmi_touch, cpu) == true) {
+		per_cpu(watchdog_nmi_touch, cpu) = false;
+		return;
+	}
+
+	if (is_hardlockup_other_cpu(cpu)) {
+		/* only warn once */
+		if (per_cpu(hard_watchdog_warn, cpu) == true)
+			return;
+
+		if (hardlockup_panic)
+			panic("Watchdog detected hard LOCKUP on cpu %d", cpu);
+		else
+			WARN(1, "Watchdog detected hard LOCKUP on cpu %d", cpu);
+
+		per_cpu(hard_watchdog_warn, cpu) = true;
+	} else {
+		per_cpu(hard_watchdog_warn, cpu) = false;
+	}
+}
+#else
+static inline void watchdog_check_hardlockup_other_cpu(void) { return; }
+#endif
+
 static int is_softlockup(unsigned long touch_ts)
 {
 	unsigned long now = get_timestamp(smp_processor_id());
@@ -204,7 +267,7 @@ static int is_softlockup(unsigned long touch_ts)
 	return 0;
 }
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 
 static struct perf_event_attr wd_hw_attr = {
 	.type		= PERF_TYPE_HARDWARE,
@@ -252,7 +315,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 	__this_cpu_write(hard_watchdog_warn, false);
 	return;
 }
-#endif /* CONFIG_HARDLOCKUP_DETECTOR */
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_NMI */
 
 static void watchdog_interrupt_count(void)
 {
@@ -272,6 +335,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	/* kick the hardlockup detector */
 	watchdog_interrupt_count();
 
+	/* test for hard
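The neighbour check each CPU performs in this patch can be modelled in plain C. This is a simplified sketch, not the patch itself: per-CPU variables become arrays, a bitmask stands in for cpumask_t, and the watchdog_nmi_touch and warn-once handling are left out. NR_CPUS_MODEL, next_cpu_wrap, check_other_cpu and check_interval_ms are hypothetical names. check_interval_ms works through the arithmetic from the patch comment: with a sample period of watchdog_thresh * 2 / 5, three samples add up to 6/5 of the threshold, e.g. 12 s for a 10 s threshold.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 8	/* hypothetical CPU count */

/* Per-CPU counters modelled as plain arrays. */
static unsigned long hrtimer_interrupts[NR_CPUS_MODEL];
static unsigned long hrtimer_interrupts_saved[NR_CPUS_MODEL];

/* Next CPU after @cpu in @mask, wrapping around to the lowest set bit. */
static int next_cpu_wrap(unsigned int mask, int cpu)
{
	for (int i = 1; i <= NR_CPUS_MODEL; i++) {
		int c = (cpu + i) % NR_CPUS_MODEL;

		if (mask & (1u << c))
			return c;
	}
	return cpu;	/* no CPU in @mask */
}

/* Lockup if the counter has not moved since the previous check. */
static bool is_hardlockup_other(int cpu)
{
	unsigned long hrint = hrtimer_interrupts[cpu];

	if (hrtimer_interrupts_saved[cpu] == hrint)
		return true;
	hrtimer_interrupts_saved[cpu] = hrint;
	return false;
}

/* Run from @self's timer; returns the locked-up CPU, or -1. */
static int check_other_cpu(unsigned int mask, int self)
{
	int cpu;

	if (hrtimer_interrupts[self] % 3 != 0)	/* only every 3rd sample */
		return -1;
	cpu = next_cpu_wrap(mask, self);
	if (cpu == self)
		return -1;
	return is_hardlockup_other(cpu) ? cpu : -1;
}

/* 3 samples of (watchdog_thresh * 2 / 5) seconds, in milliseconds. */
static unsigned long check_interval_ms(unsigned long thresh_s)
{
	return 3 * (thresh_s * 1000 * 2 / 5);
}
```

The first check of a neighbour only records its counter; a lockup is reported on a later check if that counter has not advanced in between.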
Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset
Hi Andrew, On Wed, Jan 09, 2013 at 04:18:54PM -0800, Andrew Morton wrote: > On Wed, 9 Jan 2013 15:21:13 +0900 > Minchan Kim wrote: > > > Recently, Luigi reported there are lots of free swap space when > > OOM happens. It's easily reproduced on zram-over-swap, where > > many instance of memory hogs are running and laptop_mode is enabled. > > > > Luigi reported there was no problem when he disabled laptop_mode. > > The problem when I investigate problem is following as. > > > > try_to_free_pages disable may_writepage if laptop_mode is enabled. > > shrink_page_list adds lots of anon pages in swap cache by > > add_to_swap, which makes pages Dirty and rotate them to head of > > inactive LRU without pageout. If it is repeated, inactive anon LRU > > is full of Dirty and SwapCache pages. > > > > In case of that, isolate_lru_pages fails because it try to isolate > > clean page due to may_writepage == 0. > > > > The may_writepage could be 1 only if total_scanned is higher than > > writeback_threshold in do_try_to_free_pages but unfortunately, > > VM can't isolate anon pages from inactive anon lru list by > > above reason and we already reclaimed all file-backed pages. > > So it ends up OOM killing. > > > > This patch prevents to add a page to swap cache unnecessary when > > may_writepage is unset so anoymous lru list isn't full of > > Dirty/Swapcache page. So VM can isolate pages from anon lru list, > > which ends up setting may_writepage to 1 and could swap out > > anon lru pages. When OOM triggers, I confirmed swap space was full. > > > > ... 
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  		if (PageAnon(page) && !PageSwapCache(page)) {
> >  			if (!(sc->gfp_mask & __GFP_IO))
> >  				goto keep_locked;
> > +			if (!sc->may_writepage)
> > +				goto keep_locked;
> >  			if (!add_to_swap(page))
> >  				goto activate_locked;
> >  			may_enter_fs = 1;
>
> I'm not really getting it, and the description is rather hard to follow :(

It seems I don't have a talent for writing descriptions. :(
I hope I get better at it this year. :)

>
> We should be adding anon pages to swapcache even when laptop_mode is
> set. And we should be writing them to swap as well, then reclaiming
> them. The only thing laptop_mode shouild do is make the disk spin up
> less frequently - that doesn't mean "not at all"!

So your rationale seems to be: save power only when the system has enough memory, and otherwise remove may_writepage from the reclaim path? If so, I love it, because I have never seen any numbers on power saved by throttling reclaim (though surely there was a reason to add it), and I'm not sure it still works well after we have tweaked the reclaim code so many times.

>
> So something seems screwed up here and the patch looks like a
> heavy-handed workaround. Why aren't these anon pages getting written
> out in laptop_mode?

I don't know. It has been there a long time and I don't want to screw it up. If we decide to page out in the reclaim path regardless of laptop_mode, the problem becomes easy to fix without an ugly workaround. Remove may_writepage? If that's too aggressive, we can remove it only in the direct reclaim path.

>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org

--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
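The effect of the one-line check in the quoted vmscan.c hunk, on an anonymous page that is not yet in the swap cache, can be summarized as a decision function. This is a hedged model with hypothetical types (scan_control_model, page_model) and a boolean standing in for add_to_swap() succeeding; it covers only the branch shown in the hunk, and everything else just continues through reclaim.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the scan_control fields used in the hunk. */
struct scan_control_model {
	bool gfp_io;		/* sc->gfp_mask & __GFP_IO */
	bool may_writepage;	/* cleared by try_to_free_pages() in laptop_mode */
};

struct page_model {
	bool anon;
	bool swapcache;
};

enum reclaim_action { KEEP_LOCKED, ACTIVATE_LOCKED, CONTINUE_RECLAIM };

static enum reclaim_action anon_branch(const struct scan_control_model *sc,
				       struct page_model *page,
				       bool swap_slot_free)
{
	if (page->anon && !page->swapcache) {
		if (!sc->gfp_io)
			return KEEP_LOCKED;
		if (!sc->may_writepage)	/* new check: don't fill the swap cache */
			return KEEP_LOCKED;
		if (!swap_slot_free)	/* add_to_swap() failed */
			return ACTIVATE_LOCKED;
		page->swapcache = true;	/* now a dirty swap-cache page */
	}
	return CONTINUE_RECLAIM;
}
```

With may_writepage cleared the page stays clean and outside the swap cache, which is what lets later scans isolate it, as the patch description argues.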
Re: [PATCH 1/4] arm: vt8500: Add support for Wondermedia WM8750/WM8850
On Wed, 2013-01-09 at 21:27 +, Arnd Bergmann wrote: > On Wednesday 09 January 2013, Tony Prisk wrote: > > I'm quite happy to send a pull request if that's what you prefer. > > > > Generally people have just taken the patches straight from email, so > > everytime I've done a pull-request I get a reply back saying the patches > > have already been applied. > > > > Is there some 'rule' around pull-requests vs emailed patches? > > Generally, pull requests tend to be less work for us, so I prefer > them. In particular, when you add a tag description or a signed > tag, that gives automatically puts nice text into the merge > changeset. > > > Should patches in pull-requests have Ack'd lines already? > > Yes. > > Arnd This is what I thought - and the reason I haven't sent a pull-request for the patches - I haven't had any Acks :) Regards Tony P
Re: [PATCH v5 01/14] memory-hotplug: try to offline the memory twice to avoid dependence
Hi Glauber, On 01/09/2013 11:09 PM, Glauber Costa wrote: We try to make all page_cgroup allocations local to the node they are describing now. If the memory is the first memory onlined in this node, we will allocate it from the other node. For example, node1 has 4 memory blocks, 8-11, and we online them from 8 to 11: 1. for memory block 8, the page_cgroup allocations are in the other nodes; 2. for memory block 9, the page_cgroup allocations are in memory block 8. So we should offline memory block 9 first. But we don't know in which order the user onlines the memory blocks. I think we can modify memcg like this: allocate the memory from the memory block they are describing. I am not sure it is OK to do so. I don't see a reason why not. I'm not sure, but if we do this, we could introduce a fragment into each memory block (a memory section, 128MB, right?). Is this a problem when we use large pages (such as 1GB pages)? Even if not, will these fragments have any bad effects? Thanks. :) You would have to tweak the lookup function for page_cgroup a bit, but assuming you will always have the pfns and limits, it should be easy to do. I think the only tricky part is that today we have a single node_page_cgroup, and we would of course have to have one per memory block. My assumption is that the number of memory blocks is limited and likely not very big. So even a static array would do. Kamezawa, do you have any input on this?
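The per-memory-block lookup suggested here, one page_cgroup array per block found through a static table, might look roughly like the following. It is a sketch under stated assumptions: 4 KiB pages, 128 MiB blocks (the size mentioned in the thread), a hypothetical MAX_MEM_BLOCKS limit, and a model type in place of the real struct page_cgroup.

```c
#include <stddef.h>

/* Assumed geometry: 128 MiB blocks of 4 KiB pages = 32768 pages per block. */
#define PAGES_PER_BLOCK	((128UL << 20) / 4096)
#define MAX_MEM_BLOCKS	64	/* hypothetical limit on memory blocks */

struct page_cgroup_model {
	unsigned long flags;
};

/* One descriptor array per memory block (ideally allocated from that block). */
static struct page_cgroup_model *block_pc[MAX_MEM_BLOCKS];

/* pfn -> block index -> offset within that block's array. */
static struct page_cgroup_model *lookup_page_cgroup_model(unsigned long pfn)
{
	unsigned long block = pfn / PAGES_PER_BLOCK;

	if (block >= MAX_MEM_BLOCKS || !block_pc[block])
		return NULL;	/* block offline or out of range */
	return &block_pc[block][pfn % PAGES_PER_BLOCK];
}
```

Because each block's descriptors would be self-contained, offlining a block only needs to clear its table slot, with no ordering dependency on other blocks.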
Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table
On Mon, Jan 7, 2013 at 7:55 AM, Borislav Petkov wrote: > Those -vXX version lines need to go under the "---" line. Alternatively, > you might want to add some of them to the commit message with a proper > explanation since they are not that trivial at a first glance, for > example the -v5, -v6, -v8, -v9 with a better explanation. Mostly they are for tracking versions. > >> > > This needs hpa's S-O-B. He will add it later when he puts them in the tip.
Re: [RFC PATCH 1/3] dmaengine: add dma_get_channel_caps()
On Wed, Oct 24, 2012 at 08:43:27AM +0530, Vinod Koul wrote:
> On Tue, 2012-10-23 at 23:49 +0100, Grant Likely wrote:
> > > +enum dmaengine_apis {
> > > +	DMAENGINE_MEMCPY      = 0x0001,
> > > +	DMAENGINE_XOR         = 0x0002,
> > > +	DMAENGINE_XOR_VAL     = 0x0004,
> > > +	DMAENGINE_PQ          = 0x0008,
> > > +	DMAENGINE_PQ_VAL      = 0x0010,
> > > +	DMAENGINE_MEMSET      = 0x0020,
> > > +	DMAENGINE_SLAVE       = 0x0040,
> > > +	DMAENGINE_CYCLIC      = 0x0080,
> > > +	DMAENGINE_INTERLEAVED = 0x0100,
> > > +	DMAENGINE_SG          = 0x0200,
> > > +};
> >
> > Actually, one more comment. Why the new enum? Why can't the
> > dma_transaction_type enum be used directly along with dma_cap_mask_t?
> Some of the capabilities above are not there in dma_caps_t like DMA_SG.
> Also DMA_INTERRUPT and DMA_PRIVATE would not make much sense here.
>
> BUT would help to keep things simpler if have one definition which
> includes all.

All good points. I think we can best eliminate this redundant set of
enums by using dma_cap_mask_t and noting in a comment as follows:

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 0181887..9add5ca 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -391,14 +391,15 @@ enum dmaengine_apis {
  * a) APIs/ops supported
  * b) channel physical capabilities
  *
- * @ops: or'ed api capability
+ * @cap_mask: api/ops capability (DMA_INTERRUPT and DMA_PRIVATE
+ *	      are invalid api/ops and will never be set)
  * @seg_nr: maximum number of SG segments supported on a SG/SLAVE
  *	    channel (0 for no maximum or not a SG/SLAVE channel)
  * @seg_len: maximum length of SG segments supported on a SG/SLAVE
  *	     channel (0 for no maximum or not a SG/SLAVE channel)
  */
 struct dmaengine_chan_caps {
-	enum dmaengine_apis ops;
+	dma_cap_mask_t cap_mask;
 	int seg_nr;
 	int seg_len;
 };

If that sounds good I'll send an updated series with this change.
-Matt
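With a capability bitmask in place of the private enum, a driver fills one mask per channel and clients test individual bits. The sketch below models dma_cap_mask_t as a single unsigned long, with hypothetical enum names and helpers (mask_set, mask_has, fill_chan_caps); it is not the dmaengine API, which has its own dma_cap_set()/dma_has_cap() helpers. It also shows the rule from the proposed comment: DMA_INTERRUPT and DMA_PRIVATE are never reported.

```c
#include <stdbool.h>

/* Hypothetical subset mirroring enum dma_transaction_type. */
enum tx_type_model {
	M_DMA_MEMCPY,
	M_DMA_XOR,
	M_DMA_INTERRUPT,
	M_DMA_SG,
	M_DMA_PRIVATE,
	M_DMA_SLAVE,
	M_DMA_CYCLIC,
	M_DMA_TX_TYPE_END,
};

typedef struct { unsigned long bits; } cap_mask_model_t;

struct chan_caps_model {
	cap_mask_model_t cap_mask;	/* INTERRUPT and PRIVATE never set */
	int seg_nr;			/* 0: no maximum / not SG or SLAVE */
	int seg_len;			/* 0: no maximum / not SG or SLAVE */
};

static void mask_set(enum tx_type_model t, cap_mask_model_t *m)
{
	m->bits |= 1UL << t;
}

static bool mask_has(enum tx_type_model t, const cap_mask_model_t *m)
{
	return (m->bits >> t) & 1UL;
}

/* Report a device's mask as channel caps, dropping the two pseudo-caps. */
static void fill_chan_caps(struct chan_caps_model *caps,
			   cap_mask_model_t dev_mask, int seg_nr, int seg_len)
{
	caps->cap_mask.bits = dev_mask.bits &
		~((1UL << M_DMA_INTERRUPT) | (1UL << M_DMA_PRIVATE));
	caps->seg_nr = seg_nr;
	caps->seg_len = seg_len;
}
```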
Re: [PATCH v7u1 12/31] x86: add get_ramdisk_image/size()
On Mon, Jan 7, 2013 at 7:56 AM, Borislav Petkov wrote: > On Thu, Jan 03, 2013 at 04:48:32PM -0800, Yinghai Lu wrote: >> There are several places to find ramdisk information early for reserving >> and relocating. >> >> Use functions to make code more readable and consistent. >> >> Later will add ext_ramdisk_image/size in those functions to support >> loading ramdisk above 4g. >> >> Signed-off-by: Yinghai Lu >> --- >> arch/x86/kernel/setup.c | 29 + >> 1 file changed, 21 insertions(+), 8 deletions(-) >> >> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c >> index 1b8a8cc..644a123 100644 >> --- a/arch/x86/kernel/setup.c >> +++ b/arch/x86/kernel/setup.c >> @@ -294,12 +294,25 @@ static void __init reserve_brk(void) >> >> #ifdef CONFIG_BLK_DEV_INITRD >> >> +static u64 __init get_ramdisk_image(void) >> +{ >> + u64 ramdisk_image = boot_params.hdr.ramdisk_image; >> + >> + return ramdisk_image; > > just do No, I will insert a line between them.
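Keeping the value in a local variable leaves room for the planned ext_ramdisk_image support. A hedged sketch of what the helper could become, assuming the high 32 bits of the address arrive in a separate 32-bit field; the struct here is a hypothetical stand-in, not the real boot_params layout.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two boot_params fields involved. */
struct boot_params_model {
	uint32_t ramdisk_image;		/* low 32 bits (setup header) */
	uint32_t ext_ramdisk_image;	/* assumed high 32 bits */
};

static uint64_t get_ramdisk_image_model(const struct boot_params_model *bp)
{
	uint64_t ramdisk_image = bp->ramdisk_image;

	/* Widen before shifting so the high half is not lost in 32-bit math. */
	ramdisk_image |= (uint64_t)bp->ext_ramdisk_image << 32;
	return ramdisk_image;
}
```

The line inserted between the declaration and the return is exactly where the high half gets merged in.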
Re: [PATCH] leds-lm355x: support LED trigger functionality
On Wed, Jan 9, 2013 at 5:42 PM, Kim, Milo wrote: >> -Original Message- >> From: Bryan Wu [mailto:coolo...@gmail.com] >> Sent: Thursday, January 10, 2013 10:16 AM >> To: Kim, Milo >> Cc: Jeong, Daniel; gshark.je...@gmail.com; linux-l...@vger.kernel.org; >> linux-kernel@vger.kernel.org >> Subject: Re: [PATCH] leds-lm355x: support LED trigger functionality >> >> On Mon, Jan 7, 2013 at 10:35 PM, Kim, Milo wrote: >> > LM355x family devices provide flash, torch and indicator functions. >> > This patch support LED trigger feature. >> > Using LED trigger APIs(), other driver simply turn on/off the flash, >> torch >> > and indicator. >> > >> > Platform data >> > the name of LED trigger is configurable. >> > >> > Documentation >> > example and detailed description added. >> > >> > Signed-off-by: Milo(Woogyom) Kim >> > --- >> > Documentation/leds/leds-lm3556.txt| 62 >> + >> > drivers/leds/leds-lm355x.c|3 ++ >> > include/linux/platform_data/leds-lm355x.h |8 >> > 3 files changed, 73 insertions(+) >> > >> > diff --git a/Documentation/leds/leds-lm3556.txt >> b/Documentation/leds/leds-lm3556.txt >> > index d9eb91b..73244cd 100644 >> > --- a/Documentation/leds/leds-lm3556.txt >> > +++ b/Documentation/leds/leds-lm3556.txt >> > @@ -83,3 +83,65 @@ and register it in the platform init function >> > Example: >> > board_register_i2c_bus(4, 400, >> > board_i2c_ch4, >> ARRAY_SIZE(board_i2c_ch4)); >> > + >> > +Support LED Triggers >> > + >> > +Flash, torch and indicator can be controlled not only by an user- >> space but also >> > +by other drivers, kernel space. >> > +For example, flash turns on by camera driver internally. >> > +To support this functionality, LED trigger is registered. >> > +The name of LED trigger is configurable in the platform data. >> > + >> > +Example: LED trigger name for flash >> > +#include >> > + >> > +struct lm355x_trigger_name lm3556_trigger_name = { >> > + .flash = "flash", >> > +}; >> > + >> > +struct lm355x_platform_data lm3556_pdata = { >> > + ... 
>> > + .trigger = &lm3556_trigger_name, >> > +}; >> > + >> > +Example: Flash control in simple camera driver >> >> Hi Milo, >> >> Can we share this camera flash trigger with other users? just >> introduce a new trigger in drivers/leds/ledtrig-flash.c, >> drivers/leds/ledtrig-torch.c, drivers/leds/ledtrig-indicator.c. So it >> can shared by plenty of users. >> >> And this trigger can be independent with LM355x and use other LED >> hardware driver underneath. >> >> So I expect such trigger driver instead of a specific trigger in LED >> hardware driver and in other subsystem. > > Good idea, Bryan. Thanks! > Then added name field of lm355x platform data can be gone and driver is replaced with fixed name field. > And how about gathering ledtrig-* drivers in new subdirectory under drivers/leds > ?(eg. drivers/leds/trigger/) > Nice, feel free to submit patches. -Bryan
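The shared-trigger idea decouples the event source (e.g. a camera driver) from the LED hardware: any LED whose default_trigger names the trigger gets attached, and one event fans out to all attached LEDs. A toy userspace model with hypothetical types (led_model, trigger_model) and a fixed attachment limit; in the kernel the corresponding calls are led_trigger_register_simple() and led_trigger_event(), as used in the documentation example in this thread.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEDS_PER_TRIGGER 4	/* hypothetical fixed limit */

struct led_model {
	const char *name;
	const char *default_trigger;	/* like led_classdev.default_trigger */
	int brightness;
};

struct trigger_model {
	const char *name;
	struct led_model *leds[MAX_LEDS_PER_TRIGGER];
	int nr_leds;
};

/* Attach @led if its default_trigger names this trigger; returns 1 if attached. */
static int trigger_try_attach(struct trigger_model *trig, struct led_model *led)
{
	if (!led->default_trigger || strcmp(led->default_trigger, trig->name) != 0)
		return 0;
	if (trig->nr_leds >= MAX_LEDS_PER_TRIGGER)
		return 0;
	trig->leds[trig->nr_leds++] = led;
	return 1;
}

/* Fan one event out to every attached LED, whatever hardware backs it. */
static void trigger_event(struct trigger_model *trig, int brightness)
{
	for (int i = 0; i < trig->nr_leds; i++)
		trig->leds[i]->brightness = brightness;
}
```

The trigger knows nothing about LM355x; the only coupling is the trigger name string.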
RE: [PATCH] ARM: DTS: exynos4210-origen: Add MAX8997 node entry
Thomas Abraham wrote: > > On 4 January 2013 00:23, Tushar Behera wrote: > > The node entry is added as per documentation and legacy board file. > > > > Signed-off-by: Tushar Behera > > --- > > The patch is rebased on v3.8-rc1. > > > > arch/arm/boot/dts/exynos4210-origen.dts | 150 > +++ > > 1 files changed, 150 insertions(+), 0 deletions(-) > > This patch has already been posted. Here is the link: > https://patchwork.kernel.org/patch/1794501/ > OK, I will apply. Thanks. - Kukjin
RE: [PATCH] ARM: DTS: exynos4210-origen: Add MAX8997 node entry
Tushar Behera wrote: > > On 01/04/2013 10:51 PM, Thomas Abraham wrote: > > On 4 January 2013 00:23, Tushar Behera wrote: > >> The node entry is added as per documentation and legacy board file. > >> > >> Signed-off-by: Tushar Behera > >> --- > >> The patch is rebased on v3.8-rc1. > >> > >> arch/arm/boot/dts/exynos4210-origen.dts | 150 > +++ > >> 1 files changed, 150 insertions(+), 0 deletions(-) > > Please ignore this patch. > OK. > > > > This patch has already been posted. Here is the link: > > https://patchwork.kernel.org/patch/1794501/ > > > > Sorry, had missed that patch. I have added a 'Tested-by' to the original > patch. > OK, I will add 'Tested-by' when I apply that. Thanks. - Kukjin
RE: [PATCH] leds-lm355x: support LED trigger functionality
> -Original Message- > From: Bryan Wu [mailto:coolo...@gmail.com] > Sent: Thursday, January 10, 2013 10:16 AM > To: Kim, Milo > Cc: Jeong, Daniel; gshark.je...@gmail.com; linux-l...@vger.kernel.org; > linux-kernel@vger.kernel.org > Subject: Re: [PATCH] leds-lm355x: support LED trigger functionality > > On Mon, Jan 7, 2013 at 10:35 PM, Kim, Milo wrote: > > LM355x family devices provide flash, torch and indicator functions. > > This patch support LED trigger feature. > > Using LED trigger APIs(), other driver simply turn on/off the flash, > torch > > and indicator. > > > > Platform data > > the name of LED trigger is configurable. > > > > Documentation > > example and detailed description added. > > > > Signed-off-by: Milo(Woogyom) Kim > > --- > > Documentation/leds/leds-lm3556.txt| 62 > + > > drivers/leds/leds-lm355x.c|3 ++ > > include/linux/platform_data/leds-lm355x.h |8 > > 3 files changed, 73 insertions(+) > > > > diff --git a/Documentation/leds/leds-lm3556.txt > b/Documentation/leds/leds-lm3556.txt > > index d9eb91b..73244cd 100644 > > --- a/Documentation/leds/leds-lm3556.txt > > +++ b/Documentation/leds/leds-lm3556.txt > > @@ -83,3 +83,65 @@ and register it in the platform init function > > Example: > > board_register_i2c_bus(4, 400, > > board_i2c_ch4, > ARRAY_SIZE(board_i2c_ch4)); > > + > > +Support LED Triggers > > + > > +Flash, torch and indicator can be controlled not only by an user- > space but also > > +by other drivers, kernel space. > > +For example, flash turns on by camera driver internally. > > +To support this functionality, LED trigger is registered. > > +The name of LED trigger is configurable in the platform data. > > + > > +Example: LED trigger name for flash > > +#include > > + > > +struct lm355x_trigger_name lm3556_trigger_name = { > > + .flash = "flash", > > +}; > > + > > +struct lm355x_platform_data lm3556_pdata = { > > + ... 
> > + .trigger = &lm3556_trigger_name, > > +}; > > + > > +Example: Flash control in simple camera driver > > Hi Milo, > > Can we share this camera flash trigger with other users? just > introduce a new trigger in drivers/leds/ledtrig-flash.c, > drivers/leds/ledtrig-torch.c, drivers/leds/ledtrig-indicator.c. So it > can shared by plenty of users. > > And this trigger can be independent with LM355x and use other LED > hardware driver underneath. > > So I expect such trigger driver instead of a specific trigger in LED > hardware driver and in other subsystem. Good idea, Bryan. Thanks! Then the added name field in the lm355x platform data can go away, and the driver can use a fixed name instead. And how about gathering the ledtrig-* drivers in a new subdirectory under drivers/leds (e.g. drivers/leds/trigger/)? Best Regards, Milo
RE: [PATCH] ARM: dts: exynos5250: Set HDMI version to v1.4
Sean Paul wrote: > > On Mon, Jan 7, 2013 at 4:00 PM, Sean Paul wrote: > > Set the HDMI version to v1.4 for all exynos5250 targets. This property > > was previously being inferred by the existence of the hdmi node. > > > > Signed-off-by: Sean Paul > > --- > > The patch adding the property has churned a bit from > devicetree-discuss reviews. I'll upload this again after it's settled. > > Please drop. > OK, thanks for your information. - Kukjin
RE: [PATCH 1/3] pinctrl: exynos: add exynos5250 SoC specific data
Linus Walleij wrote: > > On Thu, Dec 27, 2012 at 5:58 PM, Kukjin Kim > wrote: > > > And I think, would be clear if the config could be changed like following. > > > > 8<-- > > From: Kukjin Kim > > Subject: [PATCH] pinctrl: exynos: change PINCTRL_EXYNOS option > > > > Since pinctrl-exynos can support exynos4 and exynos5 so changed > > the option name to PINCTRL_EXYNOS for more clarity. > > > > Cc: Thomas Abraham > > Cc: Linus Walleij > > Cc: Grant Likely > > Signed-off-by: Kukjin Kim > > Acked-by: Linus Walleij > > Shall I take this into the pinctrl tree? > Yes, please :-) Just note, regarding Samsung pinctrl changes in the following: git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung.git next/pinctrl-exynos Thanks. - Kukjin
Re: [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page
On Sat, Jan 5, 2013 at 5:24 AM, Borislav Petkov wrote: > On Fri, Jan 04, 2013 at 02:04:05PM -0800, Yinghai Lu wrote: >> On Fri, Jan 4, 2013 at 1:01 PM, Borislav Petkov wrote: >> > On Thu, Jan 03, 2013 at 04:48:40PM -0800, Yinghai Lu wrote: >> >> static int init_pgtable(struct kimage *image, unsigned long >> >> start_pgtable) >> >> { >> >> + struct x86_mapping_info info = { >> >> + .alloc_pgt_page = alloc_pgt_page, >> >> + .context= image, >> >> + .pmd_flag = __PAGE_KERNEL_LARGE_EXEC, >> >> + }; >> > >> > This is leaving ->kernel_mapping uninitialized to contain a random, >> > previous stack value. I don't think we want that. >> >> that should be initialized to false by default. > > So make it explicit. You can't possibly rely on what the stack contains > when you allocate that struct there. I should say: that *is* initialized to false by default. please check http://stackoverflow.com/questions/10828294/c-and-c-partial-initialization-of-automatic-structure
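On the narrow language question: when an automatic struct is initialized with a designated-initializer list, every member that is not named is initialized as if the object had static storage duration, so a bool member becomes false (C99/C11 section 6.7.9). A self-contained check, using a hypothetical struct shaped like the one in the patch; writing .kernel_mapping = false explicitly would only be a readability choice.

```c
#include <stdbool.h>

/* Hypothetical struct shaped like x86_mapping_info in the patch. */
struct mapping_info_model {
	void *(*alloc_pgt_page)(void *);
	void *context;
	unsigned long pmd_flag;
	bool kernel_mapping;
};

static void *dummy_alloc(void *ctx)
{
	return ctx;
}

static struct mapping_info_model make_info(void *image)
{
	/*
	 * Automatic object with a designated initializer: kernel_mapping is
	 * not named, so it is implicitly initialized to false, not stack junk.
	 */
	struct mapping_info_model info = {
		.alloc_pgt_page = dummy_alloc,
		.context = image,
		.pmd_flag = 0x1e3,	/* arbitrary example value */
	};
	return info;
}
```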
Re: [PATCH] leds-lm355x: support LED trigger functionality
On Mon, Jan 7, 2013 at 10:35 PM, Kim, Milo wrote: > LM355x family devices provide flash, torch and indicator functions. > This patch support LED trigger feature. > Using LED trigger APIs(), other driver simply turn on/off the flash, torch > and indicator. > > Platform data > the name of LED trigger is configurable. > > Documentation > example and detailed description added. > > Signed-off-by: Milo(Woogyom) Kim > --- > Documentation/leds/leds-lm3556.txt| 62 > + > drivers/leds/leds-lm355x.c|3 ++ > include/linux/platform_data/leds-lm355x.h |8 > 3 files changed, 73 insertions(+) > > diff --git a/Documentation/leds/leds-lm3556.txt > b/Documentation/leds/leds-lm3556.txt > index d9eb91b..73244cd 100644 > --- a/Documentation/leds/leds-lm3556.txt > +++ b/Documentation/leds/leds-lm3556.txt > @@ -83,3 +83,65 @@ and register it in the platform init function > Example: > board_register_i2c_bus(4, 400, > board_i2c_ch4, ARRAY_SIZE(board_i2c_ch4)); > + > +Support LED Triggers > + > +Flash, torch and indicator can be controlled not only by an user-space but > also > +by other drivers, kernel space. > +For example, flash turns on by camera driver internally. > +To support this functionality, LED trigger is registered. > +The name of LED trigger is configurable in the platform data. > + > +Example: LED trigger name for flash > +#include > + > +struct lm355x_trigger_name lm3556_trigger_name = { > + .flash = "flash", > +}; > + > +struct lm355x_platform_data lm3556_pdata = { > + ... > + .trigger = &lm3556_trigger_name, > +}; > + > +Example: Flash control in simple camera driver Hi Milo, Can we share this camera flash trigger with other users? just introduce a new trigger in drivers/leds/ledtrig-flash.c, drivers/leds/ledtrig-torch.c, drivers/leds/ledtrig-indicator.c. So it can shared by plenty of users. And this trigger can be independent with LM355x and use other LED hardware driver underneath. 
So I expect such trigger driver instead of a specific trigger in LED hardware driver and in other subsystem. -Bryan > +#include > + > +#ifdef CONFIG_LEDS_TRIGGERS > +DEFINE_LED_TRIGGER(flash_led_trigger); > +#endif > + > +static int foo_camera_init() > +{ > + ... > + > +#ifdef CONFIG_LEDS_TRIGGERS > + /* should be same name as in lm355x_platform_data */ > + led_trigger_register_simple("flash", &flash_led_trigger); > +#endif > + > + ... > +} > + > +static void foo_camera_exit() > +{ > + ... > + > +#ifdef CONFIG_LEDS_TRIGGERS > + led_trigger_unregister_simple(flash_led_trigger); > +#endif > + > + ... > +} > + > +#ifdef CONFIG_LEDS_TRIGGERS > +static void foo_camera_flash_ctrl(bool on) > +{ > + if (on) > + led_trigger_event(flash_led_trigger, LED_FULL); > + else > + led_trigger_event(flash_led_trigger, LED_OFF); > +} > +#else > +#define foo_camera_flash_ctrl NULL > +#endif > diff --git a/drivers/leds/leds-lm355x.c b/drivers/leds/leds-lm355x.c > index 65d7928..29df4c0 100644 > --- a/drivers/leds/leds-lm355x.c > +++ b/drivers/leds/leds-lm355x.c > @@ -477,6 +477,7 @@ static int lm355x_probe(struct i2c_client *client, > chip->cdev_flash.name = "flash"; > chip->cdev_flash.max_brightness = 16; > chip->cdev_flash.brightness_set = lm355x_strobe_brightness_set; > + chip->cdev_flash.default_trigger = pdata->trigger->flash; > err = led_classdev_register((struct device *) > &client->dev, &chip->cdev_flash); > if (err < 0) > @@ -486,6 +487,7 @@ static int lm355x_probe(struct i2c_client *client, > chip->cdev_torch.name = "torch"; > chip->cdev_torch.max_brightness = 8; > chip->cdev_torch.brightness_set = lm355x_torch_brightness_set; > + chip->cdev_torch.default_trigger = pdata->trigger->torch; > err = led_classdev_register((struct device *) > &client->dev, &chip->cdev_torch); > if (err < 0) > @@ -499,6 +501,7 @@ static int lm355x_probe(struct i2c_client *client, > else > chip->cdev_indicator.max_brightness = 8; > chip->cdev_indicator.brightness_set = lm355x_indicator_brightness_set; 
> + chip->cdev_indicator.default_trigger = pdata->trigger->indicator; > err = led_classdev_register((struct device *) > &client->dev, &chip->cdev_indicator); > if (err < 0) > diff --git a/include/linux/platform_data/leds-lm355x.h > b/include/linux/platform_data/leds-lm355x.h > index b88724b..b64d312 100644 > --- a/include/linux/platform_data/leds-lm355x.h > +++ b/include/linux/platform_data/leds-lm355x.h > @@ -42,6 +42,12 @@ enum lm355x_pmode { > LM355x_PMODE_ENABLE = 0x04, > }; > > +struct lm355x_trigger_name { > + const char *flash; > + const char *torc
RE: sd8688 firmware location
Hi Dan, > > btmrvl_sdio and libertas_sdio both use firmware files sd8688.bin and > > sd8688_helper.bin. In linux-firmware, they're present in libertas/ tree and > > (since 3d32a58b) libertas_sdio perfers loading it from there, while it is > > able > > to fallback to load it from linux-firmware root. btmrvl_sdio, on the other > > hand > > only looks in the root and ends up not being successful. > > > > Obviously, there are two solutions to the problem -- either teach > > btmrvl_sdio > > to look into libertas/, or move the files in linux-firmware tree. I don't > > really have a strong preference, though it probably makes less sense to > > keep in > > in libertas/, since the bluetooth hardware is not really marketed as > > "Libertas." > > > > I'm following up with patches to linux and linux-firmware and I'd be very > > thankful if you could pick one (not both of them). > > So the BT part and the wifi part have different SDIO IDs; are they > actually connected separately to the SDIO bus? Or is the chip only in > one mode at one time or something like that? Is there a problem with > having both libertas and btmrvl loaded at the same time since they're > essentially the same chip? SD8688 is a combo chip with different device IDs for WLAN (function 1) and BT (function 2). WLAN + BT together are connected to the bus with a single SDIO interface. Both the libertas and btmrvl drivers can be loaded at the same time, and the WLAN and BT functions work concurrently. Of course the SDIO bus is shared between WLAN and BT. The firmware image can be downloaded from the host to the device by either the libertas or the btmrvl driver, whichever gets to it first. Regards, Bing > > I don't really mind moving stuff to mrvl/ out of libertas/ for these > devices, but I do want some backwards compat code in libertas for that. > Unless, of course, Marcel was talking about symlinks in the > linux-firmware git tree, which would be fine with me.
The important > point is that simply updating your linux-firmware package or install or > whatever *should not* result in a failed firmware load. > > Dan
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Thu, Jan 10, 2013 at 11:07 AM, Chris Wilson wrote: > On Wed, 9 Jan 2013 16:40:25 -0800, Greg KH wrote: >> On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote: >> > On 01/09/2013 01:44 PM, Dave Kleikamp wrote: >> > > >> > > I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2. >> > > >> > > 00:02.0 VGA compatible controller [0300]: Intel Corporation Core >> > > Processor Integrated Graphics Controller [8086:0046] (rev 02) >> > > >> > > Thinkpad T410 >> > > >> > > Shaggy >> > >> > Daniel's patch: >> > >> > drm/i915: Revert shrinker changes from "Track unbound pages" >> > >> > fixes the problem for me. >> >> After an afternoon of multiple kernel builds and other stressful things, >> it looks like it fixes it for me as well. Chris, this will be going to >> Linus soon, right? > > Daniel will send it on. I hope before he does so, he will clarify the > changelog to note that it is just papering over the issue. If the > conjecture is right, it will not prevent that path from triggering the > hang, nor does it prevent other eviction paths from potentially causing > the same issue. In this case, since the issue was papered over in all kernels up until 3.7, I think repapering is the answer for now. I have a novel idea: maybe someone could spend some time working out what is broken in private on a test box, instead of making everyone who runs 3.7 and 3.8 on ILK deal with it. Of course, I know this won't happen, and I'll be reverting patches from you guys that cause Ironlake flakiness forever. Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] drivers/cpufreq: Warn user when powernow-k8 tries to fall back to acpi-cpufreq and it is unavailable.
This patch addresses bug 51741 (https://bugzilla.kernel.org/show_bug.cgi?id=51741). powernow-k8 falls back to acpi-cpufreq if the CPU is not supported. However, it states that acpi-cpufreq has taken over even if acpi-cpufreq is not compiled in. This patch rewords the warning message to clarify that the CPU is unsupported, and prints a warning message when acpi-cpufreq is not available. Signed-off-by: Aravind Gopalakrishnan --- drivers/cpufreq/powernow-k8.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c index 056faf6..6fa58b4 100644 --- a/drivers/cpufreq/powernow-k8.c +++ b/drivers/cpufreq/powernow-k8.c @@ -1256,7 +1256,15 @@ static int __cpuinit powernowk8_init(void) int rv; if (static_cpu_has(X86_FEATURE_HW_PSTATE)) { - pr_warn(PFX "this CPU is not supported anymore, using acpi-cpufreq instead.\n"); + pr_warn(PFX + "this CPU is not supported anymore, use acpi-cpufreq instead. " + "Look for a message from acpi-cpufreq to ensure it is loaded" + ".\n"); +#if !IS_ENABLED(CONFIG_X86_ACPI_CPUFREQ) + pr_warn(PFX "acpi-cpufreq is disabled. " + "Enable it in the config options to get frequency scaling.\n"); + return -ENODEV; +#endif request_module("acpi-cpufreq"); return -ENODEV; } -- 1.7.10.4
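For a tristate Kconfig option, building the driver as a module (=m) defines CONFIG_<NAME>_MODULE rather than CONFIG_<NAME>, so a plain `#ifndef CONFIG_X86_ACPI_CPUFREQ` test also fires when acpi-cpufreq is built as a module, and would report it as disabled. The kernel's IS_ENABLED() from <linux/kconfig.h> covers both the built-in and the module case. The user-space sketch below is modelled on the 3.x-era kconfig.h macros (renamed, with one extra trailing argument so it stays pedantic-clean) and simulates an =m build to show the difference; the two helper functions are hypothetical.

```c
#include <assert.h>

/* Modelled on IS_ENABLED() from include/linux/kconfig.h (3.x era).
 * config_enabled(X) expands to 1 only if X is defined as 1, else to 0. */
#define ARG_PLACEHOLDER_1 0,
#define config_enabled(cfg) config_enabled_1(cfg)
#define config_enabled_1(value) config_enabled_2(ARG_PLACEHOLDER_##value)
#define config_enabled_2(arg1_or_junk) config_enabled_3(arg1_or_junk 1, 0, 0)
#define config_enabled_3(ignored, val, ...) val
#define IS_ENABLED(option) \
	(config_enabled(option) || config_enabled(option##_MODULE))

/* Simulate CONFIG_X86_ACPI_CPUFREQ=m: Kconfig emits only the _MODULE symbol. */
#define CONFIG_X86_ACPI_CPUFREQ_MODULE 1

/* Returns 1 if an #ifndef-style check would report acpi-cpufreq as disabled. */
static int ifndef_reports_disabled(void)
{
#ifndef CONFIG_X86_ACPI_CPUFREQ
	return 1;	/* fires even though the module is available */
#else
	return 0;
#endif
}

/* Returns 1 if an IS_ENABLED()-style check would report it as disabled. */
static int is_enabled_reports_disabled(void)
{
#if !IS_ENABLED(CONFIG_X86_ACPI_CPUFREQ)
	return 1;
#else
	return 0;	/* built-in or module: keep the request_module() path */
#endif
}
```

In the modular configuration simulated here, the `#ifndef` variant says "disabled" while the IS_ENABLED() variant correctly lets the code reach request_module().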
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Wed, 9 Jan 2013 16:40:25 -0800, Greg KH wrote: > On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote: > > On 01/09/2013 01:44 PM, Dave Kleikamp wrote: > > > > > > I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2. > > > > > > 00:02.0 VGA compatible controller [0300]: Intel Corporation Core > > > Processor Integrated Graphics Controller [8086:0046] (rev 02) > > > > > > Thinkpad T410 > > > > > > Shaggy > > > > Daniel's patch: > > > > drm/i915: Revert shrinker changes from "Track unbound pages" > > > > fixes the problem for me. > > After an afternoon of multiple kernel builds and other stressful things, > it looks like it fixes it for me as well. Chris, this will be going to > Linus soon, right? Daniel will send it on. I hope before he does so, he will clarify the changelog to note that it is just papering over the issue. If the conjecture is right, it will not prevent that path from triggering the hang, nor does it prevent other eviction paths from potentially causing the same issue. -Chris -- Chris Wilson, Intel Open Source Technology Centre
[PATCH 6/6][v3] perf: Document the ABI of perf sysfs entries
[PATCH 6/6][v3] perf: Document the ABI of perf sysfs entries This patchset adds two new sets of files to sysfs: - generic and POWER-specific perf events in /sys/devices/cpu/events/ - perf event config format in /sys/devices/cpu/format/event Document the format of these files, which become part of the ABI. Changelog[v3]: [Greg KH] Include ABI documentation. Signed-off-by: Sukadev Bhattiprolu --- Documentation/ABI/stable/sysfs-devices-cpu-events | 54 + Documentation/ABI/stable/sysfs-devices-cpu-format | 27 ++ 2 files changed, 81 insertions(+), 0 deletions(-) create mode 100644 Documentation/ABI/stable/sysfs-devices-cpu-format diff --git a/Documentation/ABI/stable/sysfs-devices-cpu-events b/Documentation/ABI/stable/sysfs-devices-cpu-events index e69de29..f37d542 100644 --- a/Documentation/ABI/stable/sysfs-devices-cpu-events +++ b/Documentation/ABI/stable/sysfs-devices-cpu-events @@ -0,0 +1,54 @@ +What: /sys/devices/cpu/events/ + /sys/devices/cpu/events/branch-misses + /sys/devices/cpu/events/cache-references + /sys/devices/cpu/events/cache-misses + /sys/devices/cpu/events/stalled-cycles-frontend + /sys/devices/cpu/events/branch-instructions + /sys/devices/cpu/events/stalled-cycles-backend + /sys/devices/cpu/events/instructions + /sys/devices/cpu/events/cpu-cycles + +Date: 2013/01/08 + +Contact: Linux kernel mailing list + +Description: Generic performance monitoring events + + A collection of performance monitoring events that may be + supported by many/most CPUs. These events can be monitored + using the 'perf(1)' tool. + + The contents of each file would look like: + + event=0xNNNN + + where 'N' is a hex digit.
+ + +What: /sys/devices/cpu/events/PM_LD_MISS_L1 + /sys/devices/cpu/events/PM_LD_REF_L1 + /sys/devices/cpu/events/PM_CYC + /sys/devices/cpu/events/PM_BRU_FIN + /sys/devices/cpu/events/PM_GCT_NOSLOT_CYC + /sys/devices/cpu/events/PM_BRU_MPRED + /sys/devices/cpu/events/PM_INST_CMPL + /sys/devices/cpu/events/PM_CMPLU_STALL + +Date: 2013/01/08 + +Contact: Linux kernel mailing list + Linux Powerpc mailing list + +Description: POWER specific performance monitoring events + + A collection of performance monitoring events that may be + supported by the POWER CPU. These events can be monitored + using the 'perf(1)' tool. + + These events may not be supported by other CPUs. + + The contents of each file would look like: + + event=0xNNNN + + where 'N' is a hex digit. diff --git a/Documentation/ABI/stable/sysfs-devices-cpu-format b/Documentation/ABI/stable/sysfs-devices-cpu-format new file mode 100644 index 000..b15cfb2 --- /dev/null +++ b/Documentation/ABI/stable/sysfs-devices-cpu-format @@ -0,0 +1,27 @@ +What: /sys/devices/cpu/format/ + /sys/devices/cpu/format/event + +Date: 2013/01/08 + +Contact: Linux kernel mailing list + +Description: Format of performance monitoring events + + Each CPU/architecture may use a different format to represent + the perf event. The 'event' file describes the configuration + format of the performance monitoring event on the CPU/system. + + The contents of each file would look like: + + config:m-n + + where m and n are the starting and ending bits (inclusive) that are + used to represent the event. + + For example, on POWER, + + $ cat /sys/devices/cpu/format/event + config:0-20 + + meaning that POWER uses bits 0-20 (the first 21 bits) to represent a perf + event. -- 1.7.1
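A tool consuming these files has to combine them: the events file gives the raw code, and the format file says where in perf_event_attr.config that code belongs. Below is a minimal sketch, assuming only the simple single-range "config:m-n" form documented here (format files can also name single bits or other config fields, which this sketch does not handle). Note that config:0-20 spans bits 0 through 20 inclusive, i.e. 21 bits. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse a format spec of the form "config:m-n" into an inclusive bit range.
 * Returns 0 on success, -1 for anything else. */
static int parse_format(const char *spec, unsigned int *lo, unsigned int *hi)
{
	assert(spec != NULL && lo != NULL && hi != NULL);
	if (sscanf(spec, "config:%u-%u", lo, hi) != 2 || *lo > *hi || *hi > 63)
		return -1;
	return 0;
}

/* Parse an events file such as "event=0x400f0\n" into its raw code. */
static int parse_event(const char *text, uint64_t *code)
{
	unsigned long long v;

	if (sscanf(text, "event=0x%llx", &v) != 1)
		return -1;
	*code = v;
	return 0;
}

/* Place code into bits lo..hi (inclusive) of a config word.
 * Fails if the code does not fit in the field. */
static int pack_config(uint64_t code, unsigned int lo, unsigned int hi,
		       uint64_t *config)
{
	unsigned int width = hi - lo + 1;
	uint64_t mask = width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;

	if (code & ~mask)
		return -1;
	*config = code << lo;
	return 0;
}
```

With the POWER format "config:0-20", the cache-misses code 0x400f0 packs unchanged into config, while any code needing bit 21 or above is rejected.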
[PATCH 5/6][v3] perf: Create a sysfs entry for Power event format
[PATCH 5/6][v3] perf: Create a sysfs entry for Power event format Create a sysfs entry, '/sys/bus/event_source/devices/cpu/format/event', which describes the format of perf events on POWER CPUs. The event format is the same for all POWER CPUs (at least Power6 and Power7), so the bulk of this change is in the code common to all POWER CPUs. This code is based on corresponding code in x86. Changelog[v2]: [Jiri Olsa] Use PMU_FORMAT_ATTR() rather than duplicating it. Signed-off-by: Sukadev Bhattiprolu --- arch/powerpc/include/asm/perf_event_server.h |6 ++ arch/powerpc/perf/core-book3s.c | 12 arch/powerpc/perf/power7-pmu.c |1 + 3 files changed, 19 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h index b29fcc6..ee63205 100644 --- a/arch/powerpc/include/asm/perf_event_server.h +++ b/arch/powerpc/include/asm/perf_event_server.h @@ -135,3 +135,9 @@ extern ssize_t power_events_sysfs_show(struct device *dev, #define POWER_EVENT_ATTR(_name, _id) EVENT_ATTR(PM_##_name, _id, _p) #define POWER_EVENT_PTR(_id) EVENT_PTR(_id, _p) + +/* + * Format of a perf event is the same on all POWER cpus. Declare a + * common sysfs attribute group that individual POWER cpus can share.
+ */ +extern struct attribute_group power_pmu_format_group; diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index fa476d5..4ae044b 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -1315,6 +1315,18 @@ ssize_t power_events_sysfs_show(struct device *dev, return sprintf(page, "event=0x%02llx\n", pmu_attr->id); } +PMU_FORMAT_ATTR(event, "config:0-20"); + +static struct attribute *power_pmu_format_attr[] = { + &format_attr_event.attr, + NULL, +}; + +struct attribute_group power_pmu_format_group = { + .name = "format", + .attrs = power_pmu_format_attr, +}; + struct pmu power_pmu = { .pmu_enable = power_pmu_enable, .pmu_disable= power_pmu_disable, diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c index 5627940..5fb3c9b 100644 --- a/arch/powerpc/perf/power7-pmu.c +++ b/arch/powerpc/perf/power7-pmu.c @@ -410,6 +410,7 @@ static struct attribute_group power7_pmu_events_group = { }; static const struct attribute_group *power7_pmu_attr_groups[] = { + &power_pmu_format_group, &power7_pmu_events_group, NULL, }; -- 1.7.1
[PATCH 4/6][v3] perf/POWER7: Make some POWER7 events available in sysfs
[PATCH 4/6][v3] perf/POWER7: Make some POWER7 events available in sysfs Make some POWER7-specific perf events available in sysfs. $ /bin/ls -1 /sys/bus/event_source/devices/cpu/events/ branch-instructions branch-misses cache-misses cache-references cpu-cycles instructions PM_BRU_FIN PM_BRU_MPRED PM_CMPLU_STALL PM_CYC PM_GCT_NOSLOT_CYC PM_INST_CMPL PM_LD_MISS_L1 PM_LD_REF_L1 stalled-cycles-backend stalled-cycles-frontend where the 'PM_*' events are POWER specific and the others are the generic events. This will enable users to specify these events by their symbolic names rather than by their raw codes. perf stat -e 'cpu/PM_CYC/' ... Signed-off-by: Sukadev Bhattiprolu --- arch/powerpc/include/asm/perf_event_server.h |2 ++ arch/powerpc/perf/power7-pmu.c | 18 ++ 2 files changed, 20 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h index 3f21d89..b29fcc6 100644 --- a/arch/powerpc/include/asm/perf_event_server.h +++ b/arch/powerpc/include/asm/perf_event_server.h @@ -133,3 +133,5 @@ extern ssize_t power_events_sysfs_show(struct device *dev, #define GENERIC_EVENT_ATTR(_name, _id) EVENT_ATTR(_name, _id, _g) #define GENERIC_EVENT_PTR(_id) EVENT_PTR(_id, _g) +#define POWER_EVENT_ATTR(_name, _id) EVENT_ATTR(PM_##_name, _id, _p) +#define POWER_EVENT_PTR(_id) EVENT_PTR(_id, _p) diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c index ae5d757..5627940 100644 --- a/arch/powerpc/perf/power7-pmu.c +++ b/arch/powerpc/perf/power7-pmu.c @@ -373,6 +373,15 @@ GENERIC_EVENT_ATTR(cache-misses, LD_MISS_L1); GENERIC_EVENT_ATTR(branch-instructions, BRU_FIN); GENERIC_EVENT_ATTR(branch-misses, BRU_MPRED); +POWER_EVENT_ATTR(CYC, CYC); +POWER_EVENT_ATTR(GCT_NOSLOT_CYC, GCT_NOSLOT_CYC); +POWER_EVENT_ATTR(CMPLU_STALL, CMPLU_STALL); +POWER_EVENT_ATTR(INST_CMPL, INST_CMPL); +POWER_EVENT_ATTR(LD_REF_L1, LD_REF_L1); +POWER_EVENT_ATTR(LD_MISS_L1, LD_MISS_L1); +POWER_EVENT_ATTR(BRU_FIN, BRU_FIN);
+POWER_EVENT_ATTR(BRU_MPRED,BRU_MPRED); + static struct attribute *power7_events_attr[] = { GENERIC_EVENT_PTR(CYC), GENERIC_EVENT_PTR(GCT_NOSLOT_CYC), @@ -382,6 +391,15 @@ static struct attribute *power7_events_attr[] = { GENERIC_EVENT_PTR(LD_MISS_L1), GENERIC_EVENT_PTR(BRU_FIN), GENERIC_EVENT_PTR(BRU_MPRED), + + POWER_EVENT_PTR(CYC), + POWER_EVENT_PTR(GCT_NOSLOT_CYC), + POWER_EVENT_PTR(CMPLU_STALL), + POWER_EVENT_PTR(INST_CMPL), + POWER_EVENT_PTR(LD_REF_L1), + POWER_EVENT_PTR(LD_MISS_L1), + POWER_EVENT_PTR(BRU_FIN), + POWER_EVENT_PTR(BRU_MPRED), NULL }; -- 1.7.1
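The suffix in EVENT_VAR() is what lets one raw code appear under two sysfs names without a variable-name clash. Below is a user-space sketch modelled on these macros; the struct and the lookup helper are simplified, hypothetical stand-ins for the kernel's perf_pmu_events_attr and for perf's own name parsing. It shows that 'cpu-cycles' and 'PM_CYC' resolve to the same code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct perf_pmu_events_attr: sysfs name + raw code. */
struct pmu_event_alias {
	const char *name;
	unsigned long long id;
};

/* Two POWER7 codes from this series. */
#define PME_PM_CYC        0x1e
#define PME_PM_LD_MISS_L1 0x400f0

/* Modelled on EVENT_VAR()/EVENT_ATTR(): the suffix keeps the generic (_g)
 * and POWER-specific (_p) variables for the same code distinct. */
#define EVENT_VAR(_id, _suffix) event_attr_##_id##_suffix
#define EVENT_ATTR(_name, _id, _suffix) \
	static const struct pmu_event_alias EVENT_VAR(_id, _suffix) = \
		{ #_name, PME_PM_##_id }
#define GENERIC_EVENT_ATTR(_name, _id) EVENT_ATTR(_name, _id, _g)
#define POWER_EVENT_ATTR(_name, _id)   EVENT_ATTR(PM_##_name, _id, _p)

GENERIC_EVENT_ATTR(cpu-cycles, CYC);
GENERIC_EVENT_ATTR(cache-misses, LD_MISS_L1);
POWER_EVENT_ATTR(CYC, CYC);
POWER_EVENT_ATTR(LD_MISS_L1, LD_MISS_L1);

static const struct pmu_event_alias *const events[] = {
	&event_attr_CYC_g, &event_attr_LD_MISS_L1_g,
	&event_attr_CYC_p, &event_attr_LD_MISS_L1_p,
	NULL,
};

/* Look an event up by the name a user would type, e.g. -e 'cpu/PM_CYC/'. */
static const struct pmu_event_alias *find_event(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; events[i] != NULL; i++)
		if (strcmp(events[i]->name, name) == 0)
			return events[i];
	return NULL;
}
```

Without the suffix, GENERIC_EVENT_ATTR(cpu-cycles, CYC) and POWER_EVENT_ATTR(CYC, CYC) would both try to define event_attr_CYC.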
[PATCH 3/6][v3] perf/POWER7: Make generic event translations available in sysfs
[PATCH 3/6][v3] perf/POWER7: Make generic event translations available in sysfs Make the generic perf events in POWER7 available via sysfs. $ ls /sys/bus/event_source/devices/cpu/events branch-instructions branch-misses cache-misses cache-references cpu-cycles instructions stalled-cycles-backend stalled-cycles-frontend $ cat /sys/bus/event_source/devices/cpu/events/cache-misses event=0x400f0 This patch is based on commits that implement this functionality on x86. Eg: commit a47473939db20e3961b200eb00acf5fcf084d755 Author: Jiri Olsa Date: Wed Oct 10 14:53:11 2012 +0200 perf/x86: Make hardware event translations available in sysfs Changelog:[v3] [Jiri Olsa] Drop EVENT_ID() macro since it is only used once. Signed-off-by: Sukadev Bhattiprolu --- arch/powerpc/include/asm/perf_event_server.h | 24 ++ arch/powerpc/perf/core-book3s.c | 12 +++ arch/powerpc/perf/power7-pmu.c| 34 + 3 files changed, 70 insertions(+), 0 deletions(-) create mode 100644 Documentation/ABI/stable/sysfs-devices-cpu-events diff --git a/Documentation/ABI/stable/sysfs-devices-cpu-events b/Documentation/ABI/stable/sysfs-devices-cpu-events new file mode 100644 index 000..e69de29 diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h index 9710be3..3f21d89 100644 --- a/arch/powerpc/include/asm/perf_event_server.h +++ b/arch/powerpc/include/asm/perf_event_server.h @@ -11,6 +11,7 @@ #include #include +#include #define MAX_HWEVENTS 8 #define MAX_EVENT_ALTERNATIVES 8 @@ -35,6 +36,7 @@ struct power_pmu { void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]); int (*limited_pmc_event)(u64 event_id); u32 flags; + const struct attribute_group**attr_groups; int n_generic; int *generic_events; int (*cache_events)[PERF_COUNT_HW_CACHE_MAX] @@ -109,3 +111,25 @@ extern unsigned long perf_instruction_pointer(struct pt_regs *regs); * If an event_id is not subject to the constraint expressed by a particular * field, then it will have 0 in both the mask and value for 
that field. */ + +extern ssize_t power_events_sysfs_show(struct device *dev, + struct device_attribute *attr, char *page); + +/* + * EVENT_VAR() is same as PMU_EVENT_VAR with a suffix. + * + * Having a suffix allows us to have aliases in sysfs - eg: the generic + * event 'cpu-cycles' can have two entries in sysfs: 'cpu-cycles' and + * 'PM_CYC' where the latter is the name by which the event is known in + * POWER CPU specification. + */ +#define EVENT_VAR(_id, _suffix) event_attr_##_id##_suffix +#define EVENT_PTR(_id, _suffix) &EVENT_VAR(_id, _suffix) + +#define EVENT_ATTR(_name, _id, _suffix) \ + PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_PM_##_id, \ + power_events_sysfs_show) + +#define GENERIC_EVENT_ATTR(_name, _id) EVENT_ATTR(_name, _id, _g) +#define GENERIC_EVENT_PTR(_id) EVENT_PTR(_id, _g) + diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index aa2465e..fa476d5 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -1305,6 +1305,16 @@ static int power_pmu_event_idx(struct perf_event *event) return event->hw.idx; } +ssize_t power_events_sysfs_show(struct device *dev, + struct device_attribute *attr, char *page) +{ + struct perf_pmu_events_attr *pmu_attr; + + pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr); + + return sprintf(page, "event=0x%02llx\n", pmu_attr->id); +} + struct pmu power_pmu = { .pmu_enable = power_pmu_enable, .pmu_disable = power_pmu_disable, @@ -1537,6 +1547,8 @@ int __cpuinit register_power_pmu(struct power_pmu *pmu) pr_info("%s performance monitor hardware support registered\n", pmu->name); + power_pmu.attr_groups = ppmu->attr_groups; + #ifdef MSR_HV /* * Use FCHV to ignore kernel events if MSR.HV is set.
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c index 44e70d2..ae5d757 100644 --- a/arch/powerpc/perf/power7-pmu.c +++ b/arch/powerpc/perf/power7-pmu.c @@ -363,6 +363,39 @@ static int power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = { }, }; + +GENERIC_EVENT_ATTR(cpu-cycles, CYC); +GENERIC_EVENT_ATTR(stalled-cycles-frontend,GCT_NOSLOT_CYC); +GENERIC_EVENT_ATTR(stalled-cycles-backend, CMPLU_STALL); +GENERIC_EVENT_ATTR(instructions, INST_CMPL); +GENERIC_EVENT_ATTR(cache-references,
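power_events_sysfs_show() receives only the embedded struct device_attribute and uses container_of() to get back to the wrapper that carries the event code. Below is a user-space sketch of that pattern, with container_of() rebuilt from offsetof() and the kernel structures reduced to hypothetical minimal stand-ins. It produces the same "event=0x400f0" text shown for cache-misses in the changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* User-space equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-ins: only the fields this sketch touches. */
struct device_attribute {
	const char *name;
};

struct perf_pmu_events_attr {
	struct device_attribute attr;	/* what sysfs hands to ->show() */
	unsigned long long id;		/* u64 raw event code in the kernel */
};

/* Mirrors power_events_sysfs_show(): recover the wrapper from the embedded
 * attribute, then print the raw code. Returns the number of characters. */
static int events_sysfs_show(struct device_attribute *attr, char *page, size_t len)
{
	struct perf_pmu_events_attr *pmu_attr =
		container_of(attr, struct perf_pmu_events_attr, attr);

	assert(page != NULL);
	return snprintf(page, len, "event=0x%02llx\n", pmu_attr->id);
}
```

The kernel version writes with sprintf() into a PAGE_SIZE buffer; this sketch uses snprintf() only because the caller chooses the buffer size.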
[PATCH 2/6][v3] perf: Make EVENT_ATTR global
[PATCH 2/6][v3] perf: Make EVENT_ATTR global Rename EVENT_ATTR() to PMU_EVENT_ATTR() and make it global so it is available to all architectures. Further to allow architectures flexibility, have PMU_EVENT_ATTR() pass in the variable name as a parameter. Changelog[v3] - [Jiri Olsa] No need to define PMU_EVENT_PTR() Signed-off-by: Sukadev Bhattiprolu --- arch/x86/kernel/cpu/perf_event.c | 13 +++-- include/linux/perf_event.h | 11 +++ 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 4428fd1..59a1238 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -1316,11 +1316,6 @@ static struct attribute_group x86_pmu_format_group = { .attrs = NULL, }; -struct perf_pmu_events_attr { - struct device_attribute attr; - u64 id; -}; - /* * Remove all undefined events (x86_pmu.event_map(id) == 0) * out of events_attr attributes. @@ -1354,11 +1349,9 @@ static ssize_t events_sysfs_show(struct device *dev, struct device_attribute *at #define EVENT_VAR(_id) event_attr_##_id #define EVENT_PTR(_id) &event_attr_##_id.attr.attr -#define EVENT_ATTR(_name, _id) \ -static struct perf_pmu_events_attr EVENT_VAR(_id) = { \ - .attr = __ATTR(_name, 0444, events_sysfs_show, NULL), \ - .id = PERF_COUNT_HW_##_id, \ -}; +#define EVENT_ATTR(_name, _id) \ + PMU_EVENT_ATTR(_name, EVENT_VAR(_id), PERF_COUNT_HW_##_id, \ + events_sysfs_show) EVENT_ATTR(cpu-cycles, CPU_CYCLES ); EVENT_ATTR(instructions, INSTRUCTIONS); diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 6bfb2fa..42adf01 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -817,6 +817,17 @@ do { \ } while (0) +struct perf_pmu_events_attr { + struct device_attribute attr; + u64 id; +}; + +#define PMU_EVENT_ATTR(_name, _var, _id, _show) \ +static struct perf_pmu_events_attr _var = {\ + .attr = __ATTR(_name, 0444, _show, NULL), \ + .id = _id, \ +}; + #define PMU_FORMAT_ATTR(_name, 
_format) \ static ssize_t \ _name##_show(struct device *dev, \ -- 1.7.1
[PATCH 1/6][v3] perf/Power7: Use macros to identify perf events
[PATCH 1/6][v3] perf/Power7: Use macros to identify perf events Define and use macros to identify perf event codes. This makes the code easier to read when these event codes need to be used in more than one place. Signed-off-by: Sukadev Bhattiprolu --- arch/powerpc/perf/power7-pmu.c | 28 1 files changed, 20 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c index 441af08..44e70d2 100644 --- a/arch/powerpc/perf/power7-pmu.c +++ b/arch/powerpc/perf/power7-pmu.c @@ -51,6 +51,18 @@ #define MMCR1_PMCSEL_MSK 0xff /* + * Power7 event codes. + */ +#define PME_PM_CYC 0x1e +#define PME_PM_GCT_NOSLOT_CYC 0x100f8 +#define PME_PM_CMPLU_STALL 0x4000a +#define PME_PM_INST_CMPL 0x2 +#define PME_PM_LD_REF_L1 0xc880 +#define PME_PM_LD_MISS_L1 0x400f0 +#define PME_PM_BRU_FIN 0x10068 +#define PME_PM_BRU_MPRED 0x400f6 + +/* * Layout of constraint bits: * 554433221100 * 3210987654321098765432109876543210987654321098765432109876543210 @@ -296,14 +308,14 @@ static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[]) } static int power7_generic_events[] = { - [PERF_COUNT_HW_CPU_CYCLES] = 0x1e, - [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x100f8, /* GCT_NOSLOT_CYC */ - [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x4000a, /* CMPLU_STALL */ - [PERF_COUNT_HW_INSTRUCTIONS] = 2, - [PERF_COUNT_HW_CACHE_REFERENCES] = 0xc880, /* LD_REF_L1_LSU*/ - [PERF_COUNT_HW_CACHE_MISSES] = 0x400f0, /* LD_MISS_L1 */ - [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x10068, /* BRU_FIN */ - [PERF_COUNT_HW_BRANCH_MISSES] = 0x400f6,/* BR_MPRED */ + [PERF_COUNT_HW_CPU_CYCLES] = PME_PM_CYC, + [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = PME_PM_GCT_NOSLOT_CYC, + [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = PME_PM_CMPLU_STALL, + [PERF_COUNT_HW_INSTRUCTIONS] = PME_PM_INST_CMPL, + [PERF_COUNT_HW_CACHE_REFERENCES] = PME_PM_LD_REF_L1, + [PERF_COUNT_HW_CACHE_MISSES] = PME_PM_LD_MISS_L1, + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PME_PM_BRU_FIN, + [PERF_COUNT_HW_BRANCH_MISSES] =
PME_PM_BRU_MPRED, }; #define C(x) PERF_COUNT_HW_CACHE_##x -- 1.7.1
[PATCH] MAINTAINERS: mm: add additional include files to listing
Add gfp.h, mmzone.h, memory_hotplug.h & vmalloc.h to the "MEMORY MANAGEMENT" section so scripts/get_maintainer.pl can do a better job of making recommendations. Signed-off-by: Cody P Schafer --- MAINTAINERS | 4 1 file changed, 4 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 915564e..e77ef28 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5003,6 +5003,10 @@ L: linux...@kvack.org W: http://www.linux-mm.org S: Maintained F: include/linux/mm.h +F: include/linux/gfp.h +F: include/linux/mmzone.h +F: include/linux/memory_hotplug.h +F: include/linux/vmalloc.h F: mm/ MEMORY RESOURCE CONTROLLER -- 1.8.0.3
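Each F: line is a file pattern, so every new entry widens the set of paths that get attributed to this section. Below is a rough C sketch of that matching under a simplified rule set, which is an assumption: a trailing slash matches everything below a directory, and anything else is a shell glob matched with POSIX fnmatch(). The real get_maintainer.pl is written in Perl and handles more cases; f_entry_matches() is a hypothetical name.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Simplified F: matching. Returns nonzero if path is covered by pattern. */
static int f_entry_matches(const char *pattern, const char *path)
{
	size_t len = strlen(pattern);

	assert(len > 0);
	if (pattern[len - 1] == '/')	/* directory entry such as "mm/" */
		return strncmp(pattern, path, len) == 0;
	/* FNM_PATHNAME: '*' does not cross a '/' */
	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

This also shows why mmzone.h needed its own line: the existing include/linux/mm.h entry is an exact file name, not a prefix, so it does not cover include/linux/mmzone.h.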
Re: [PATCH 1/4] module: add syscall to load module from fd
On Wed, Jan 9, 2013 at 6:29 PM, Lucas De Marchi wrote: > On Sun, Jan 6, 2013 at 4:59 PM, Michael Kerrisk (man-pages) > wrote: >> Hi Rusty, (and Lucas, and Kees) >> >> On Thu, Jan 3, 2013 at 1:12 AM, Rusty Russell wrote: >>> Michael Kerrisk writes: Hi Rusty, >>> >>> Hi Michael, >>> The description here is rather thin. Could you supply a sentence or two for each of MODULE_INIT_IGNORE_MODVERSIONS and MODULE_INIT_IGNORE_VERMAGIC that would be suitable for the manual page? Thanks, >>> >>> There are one or two safety checks built into a module, which are >>> checked to match the kernel on module load. The first is a "vermagic" >>> string containing the kernel version number and prominent features (such >>> as CPU type). If the module was built with CONFIG_MODVERSIONS set, a >>> version hash is recorded for each symbol the module uses based on the >>> types it refers to: in this case, the kernel version number within the >>> "vermagic" string is ignored, as the symbol version hashes are assumed >>> to be sufficiently reliable. >>> >>> Using the MODULE_INIT_IGNORE_VERMAGIC flag indicates that the vermagic >>> is to be ignored, and the MODULE_INIT_IGNORE_MODVERSIONS flag indicates >>> that the version hashes are to be ignored. If the kernel is built to >>> permit such forced loading (ie. CONFIG_MODULE_FORCE_LOAD is set) then >>> loading will continue, otherwise it will fail with ENOEXEC as expected >>> for malformed modules. >>> >>> Hope that is more usable? >> >> Yes, that helps. I did some reworking of that text. Hopefully, I did >> not introduce any errors. >> >> Below is the text that is proposed to document finit_module() in the >> man pages. I'd appreciate any review (Kees, Lucas, Rusty?) >> >> Thanks, >> >> Michael >> >>finit_module() >>The finit_module() system call is like init_module(), but reads >>the module to be loaded from the file descriptor fd. 
It is >>useful when the authenticity of a kernel module can be determined from its location in the file system; in cases where that >>is possible, the overhead of using cryptographically signed >>modules to determine the authenticity of a module can be >>avoided. The param_values argument is as for init_module(). >> >>The flags argument modifies the operation of finit_module(). >>It is a bit mask value created by ORing together zero or more >>of the following flags: >> >>MODULE_INIT_IGNORE_MODVERSIONS >> Ignore symbol version hashes. >> >>MODULE_INIT_IGNORE_VERMAGIC >> Ignore kernel version magic. >> >>There are some safety checks built into a module to ensure that >>it matches the kernel against which it is loaded. These checks >>are recorded when the module is built and verified when the >>module is loaded. First, the module records a "vermagic" >>string containing the kernel version number and prominent features (such as the CPU type). Second, if the module was built >>with the CONFIG_MODVERSIONS configuration option enabled, a >>version hash is recorded for each symbol the module uses. This >>hash is based on the types of the arguments and return value >>for the function named by the symbol. In this case, the kernel >>version number within the "vermagic" string is ignored, as the >>symbol version hashes are assumed to be sufficiently reliable. >> >>Using the MODULE_INIT_IGNORE_VERMAGIC flag indicates that the >>"vermagic" string is to be ignored, and the MODULE_INIT_IGNORE_MODVERSIONS flag indicates that the symbol version hashes are to be ignored. If the kernel is built to permit forced loading (i.e., configured with CONFIG_MODULE_FORCE_LOAD), then loading will continue, otherwise it will >>fail with ENOEXEC as expected for malformed modules. >> ... >>ERRORS >> ... >>The following errors may additionally occur for finit_module(): >> >>EBADF The file referred to by fd is not opened for reading.
>> >>EFBIG The file referred to by fd is too large. >> >>EINVAL flags is invalid. >> >>ENOEXEC >> fd does not refer to an open file. >> >> > > > Looks good to me. Thanks for looking it over, Lucas. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Author of "The Linux Programming Interface"; http://man7.org/tlpi/
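For reference, here is a minimal user-space sketch of calling finit_module() with the flags described above. It invokes the system call through syscall(2) and SYS_finit_module, assumes headers new enough to define that number (it falls back to ENOSYS otherwise), and defines the two flag values from <linux/module.h> if they are missing. load_module_from_path() is a hypothetical helper, and actually loading a module requires CAP_SYS_MODULE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag values as defined in <linux/module.h>. */
#ifndef MODULE_INIT_IGNORE_MODVERSIONS
#define MODULE_INIT_IGNORE_MODVERSIONS 1
#endif
#ifndef MODULE_INIT_IGNORE_VERMAGIC
#define MODULE_INIT_IGNORE_VERMAGIC 2
#endif

/* Open the module file and pass the descriptor to finit_module().
 * Returns 0 on success, -1 with errno set on failure. */
static int load_module_from_path(const char *path, const char *params, int flags)
{
	long ret;
	int fd, saved_errno;

	assert(params != NULL);	/* use "" when there are no parameters */
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;	/* errno from open(), e.g. ENOENT */

#ifdef SYS_finit_module
	ret = syscall(SYS_finit_module, fd, params, flags);
#else
	(void)flags;
	errno = ENOSYS;	/* headers predate finit_module() */
	ret = -1;
#endif
	saved_errno = errno;	/* close() must not clobber the result */
	close(fd);
	errno = saved_errno;
	return ret == 0 ? 0 : -1;
}
```

A forced load that skips the vermagic check would pass MODULE_INIT_IGNORE_VERMAGIC as flags; as the text says, this only succeeds on kernels built with CONFIG_MODULE_FORCE_LOAD.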
Re: [PATCH RESEND v1 06/16] vfs: add temp calculation function
On Thu, Dec 20, 2012 at 10:43:25PM +0800, zwu.ker...@gmail.com wrote: > --- a/fs/hot_tracking.c > +++ b/fs/hot_tracking.c > @@ -25,6 +25,14 @@ > static struct kmem_cache *hot_inode_item_cachep __read_mostly; > static struct kmem_cache *hot_range_item_cachep __read_mostly; > > +static u64 hot_raw_shift(u64 counter, u32 bits, bool dir) > +{ > + if (dir) > + return counter << bits; > + else > + return counter >> bits; > +} I don't understand the purpose of this function; it obscures a simple bitwise shift. > + > /* > * Initialize the inode tree. Should be called for each new inode > * access or other user of the hot_inode interface. > @@ -315,6 +323,72 @@ static void hot_freq_data_update(struct hot_freq_data > *freq_data, bool write) > } > > /* > + * hot_temp_calc() is responsible for distilling the six heat > + * criteria down into a single temperature value for the data, > + * which is an integer between 0 and HEAT_MAX_VALUE. I didn't find HEAT_MAX_VALUE defined anywhere. > + */ > +static u32 hot_temp_calc(struct hot_freq_data *freq_data) > +{ > + u32 result = 0; > + > + struct timespec ckt = current_kernel_time(); > + u64 cur_time = timespec_to_ns(&ckt); > + > + u32 nrr_heat = (u32)hot_raw_shift((u64)freq_data->nr_reads, > + NRR_MULTIPLIER_POWER, true); > + u32 nrw_heat = (u32)hot_raw_shift((u64)freq_data->nr_writes, > + NRW_MULTIPLIER_POWER, true); So many typecasts, some of them unnecessary; combined with hot_raw_shift, this is hard to read and understand. u32 nrr_heat = (u32)((u64)freq_data->nr_reads << NRR_MULTIPLIER_POWER); is not much better without a comment explaining why this does the right thing.
> + > + u64 ltr_heat = > + hot_raw_shift((cur_time - timespec_to_ns(&freq_data->last_read_time)), > + LTR_DIVIDER_POWER, false); > + u64 ltw_heat = > + hot_raw_shift((cur_time - timespec_to_ns(&freq_data->last_write_time)), > + LTW_DIVIDER_POWER, false); > + > + u64 avr_heat = > + hot_raw_shift((((u64) -1) - freq_data->avg_delta_reads), > + AVR_DIVIDER_POWER, false); > + u64 avw_heat = > + hot_raw_shift((((u64) -1) - freq_data->avg_delta_writes), > + AVW_DIVIDER_POWER, false); > + > + /* ltr_heat is now guaranteed to be u32 safe */ > + if (ltr_heat >= hot_raw_shift((u64) 1, 32, true)) > + ltr_heat = 0; > + else > + ltr_heat = hot_raw_shift((u64) 1, 32, true) - ltr_heat; > + > + /* ltw_heat is now guaranteed to be u32 safe */ > + if (ltw_heat >= hot_raw_shift((u64) 1, 32, true)) > + ltw_heat = 0; > + else > + ltw_heat = hot_raw_shift((u64) 1, 32, true) - ltw_heat; > + > + /* avr_heat is now guaranteed to be u32 safe */ > + if (avr_heat >= hot_raw_shift((u64) 1, 32, true)) > + avr_heat = (u32) -1; > + > + /* avw_heat is now guaranteed to be u32 safe */ > + if (avw_heat >= hot_raw_shift((u64) 1, 32, true)) > + avw_heat = (u32) -1; > + > + nrr_heat = (u32)hot_raw_shift((u64)nrr_heat, > + (3 - NRR_COEFF_POWER), false); > + nrw_heat = (u32)hot_raw_shift((u64)nrw_heat, > + (3 - NRW_COEFF_POWER), false); > + ltr_heat = hot_raw_shift(ltr_heat, (3 - LTR_COEFF_POWER), false); > + ltw_heat = hot_raw_shift(ltw_heat, (3 - LTW_COEFF_POWER), false); > + avr_heat = hot_raw_shift(avr_heat, (3 - AVR_COEFF_POWER), false); > + avw_heat = hot_raw_shift(avw_heat, (3 - AVW_COEFF_POWER), false); > + > + result = nrr_heat + nrw_heat + (u32) ltr_heat + > + (u32) ltw_heat + (u32) avr_heat + (u32) avw_heat; Reading through the function up to here, I got so lost in the shifts that I don't see the meaning of the resulting value, or how I can interpret it if I watch it change over time. What are the expected weights of the number and time factors?
There are more details in the documentation, but the big picture is blurred by talk of implementation details. Let's put the implementation details here, and write better user documentation with a few examples in the docs. Is it possible to describe some common access patterns and how they affect the temperature? You've been benchmarking this patchset, so I'm sure you can write up a few examples based on that. > + > + return result; > +} david -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
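For reference, this is what the quoted hot_temp_calc() computes, read directly off the code. Here n_r is nr_reads, Δt_r is the time since last_read_time in nanoseconds, d̄_r is avg_delta_reads, and P, D and C stand for the *_MULTIPLIER_POWER, *_DIVIDER_POWER and *_COEFF_POWER constants, whose values are not in the quoted hunk. The write-side terms (nrw, ltw, avw) are defined symmetrically.

```latex
\begin{aligned}
h_{\mathrm{nrr}} &= \left(n_r \cdot 2^{P_{\mathrm{NRR}}}\right) \bmod 2^{32} \\
h_{\mathrm{ltr}} &= 2^{32} - \min\!\left(\left\lfloor \Delta t_r / 2^{D_{\mathrm{LTR}}} \right\rfloor,\ 2^{32}\right) \\
h_{\mathrm{avr}} &= \min\!\left(\left\lfloor \left(2^{64} - 1 - \bar{d}_r\right) / 2^{D_{\mathrm{AVR}}} \right\rfloor,\ 2^{32} - 1\right) \\
T &= \Bigl( \sum_{x \in \{\mathrm{nrr},\mathrm{nrw},\mathrm{ltr},\mathrm{ltw},\mathrm{avr},\mathrm{avw}\}} \left\lfloor h_x / 2^{\,3 - C_x} \right\rfloor \Bigr) \bmod 2^{32}
\end{aligned}
```

Assuming each C_x is at most 3 (otherwise the shift count 3 - C_x would be negative), a term enters the sum with a weight of roughly 2^(C_x - 3); the relative weight of the count terms versus the time terms additionally depends on P and D.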
Re: [PATCH RESEND v1 05/16] vfs: add hooks to enable hot tracking
On Thu, Dec 20, 2012 at 10:43:24PM +0800, zwu.ker...@gmail.com wrote: > --- a/fs/direct-io.c > +++ b/fs/direct-io.c > @@ -37,6 +37,7 @@ > #include > #include > #include > +#include "hot_tracking.h" > > /* > * How many user pages to map in one call to get_user_pages(). This > determines > @@ -1299,6 +1300,11 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, > struct inode *inode, > prefetch(bdev->bd_queue); > prefetch((char *)bdev->bd_queue + SMP_CACHE_BYTES); > > + /* Hot data tracking */ > + hot_update_freqs(inode, offset, > + iov_length(iov, nr_segs), > + rw & WRITE); hot_update_freqs takes an 'int rw' directly, so you should pass plain 'rw' here and do the 'rw & WRITE' check in hot_freq_data_update itself. > + > return do_blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset, >nr_segs, get_block, end_io, >submit_io, flags); > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -35,6 +35,7 @@ > #include /* __set_page_dirty_buffers */ > #include > #include > +#include > #include > > /* > @@ -1902,13 +1903,24 @@ EXPORT_SYMBOL(generic_writepages); > int do_writepages(struct address_space *mapping, struct writeback_control > *wbc) > { > int ret; > + loff_t start = 0; > + size_t count = 0; > > if (wbc->nr_to_write <= 0) > return 0; > + > + start = mapping->writeback_index << PAGE_CACHE_SHIFT; > + count = wbc->nr_to_write; > + > if (mapping->a_ops->writepages) > ret = mapping->a_ops->writepages(mapping, wbc); > else > ret = generic_writepages(mapping, wbc); > + > + /* Hot data tracking */ > + hot_update_freqs(mapping->host, start, > + (count - wbc->nr_to_write) * PAGE_CACHE_SIZE, 1); I think the frequencies should not be updated in case of error returned from writepages. 
> + > + return ret; > } > > --- a/mm/readahead.c > +++ b/mm/readahead.c > @@ -138,6 +139,12 @@ static int read_pages(struct address_space *mapping, > struct file *filp, > out: > blk_finish_plug(&plug); > > + /* Hot data tracking */ > + hot_update_freqs(mapping->host, > + (loff_t)(list_entry(pages->prev, struct page, lru)->index) > + << PAGE_CACHE_SHIFT, > + (size_t)nr_pages * PAGE_CACHE_SIZE, 0); same comment here > + > return ret; > } david
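One way to act on the do_writepages() comment, shown as a self-contained userspace model. The sketch_* names, the 4096-byte page size and the error value are hypothetical stand-ins; in the real patch the change would amount to guarding the existing hot_update_freqs() call with a check of ret.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096	/* stands in for PAGE_CACHE_SIZE */

/* Hypothetical stand-in for struct writeback_control. */
struct sketch_wbc {
	long nr_to_write;
};

/* Bytes that hot_update_freqs() would have been told about. */
static uint64_t tracked_bytes;

static void sketch_hot_update(size_t len)
{
	tracked_bytes += len;
}

/* Stand-in for ->writepages(): writes `pages` pages, then returns `err`. */
static int sketch_writepages(struct sketch_wbc *wbc, long pages, int err)
{
	wbc->nr_to_write -= pages;
	return err;
}

static int sketch_do_writepages(struct sketch_wbc *wbc, long pages, int err)
{
	long count = wbc->nr_to_write;
	int ret;

	if (wbc->nr_to_write <= 0)
		return 0;

	ret = sketch_writepages(wbc, pages, err);

	/* Only account the I/O if writepages() reported success. */
	if (ret == 0)
		sketch_hot_update((size_t)(count - wbc->nr_to_write) *
				  SKETCH_PAGE_SIZE);
	return ret;
}
```

The same guard would apply to the read_pages() hunk in mm/readahead.c.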
Re: [PATCH RESEND v1 04/16] vfs: add two map arrays
On Thu, Dec 20, 2012 at 10:43:23PM +0800, zwu.ker...@gmail.com wrote:
> --- a/fs/hot_tracking.c
> +++ b/fs/hot_tracking.c
> +/* Free inode and range map info */
> +static void hot_map_exit(struct hot_info *root)
> +{
> +	int i;
> +	for (i = 0; i < HEAT_MAP_SIZE; i++) {
> +		spin_lock(&root->heat_inode_map[i].lock);
> +		hot_map_list_free(&root->heat_inode_map[i].node_list, root);
> +		spin_unlock(&root->heat_inode_map[i].lock);

Please insert an empty line here to improve readability.

> +		spin_lock(&root->heat_range_map[i].lock);
> +		hot_map_list_free(&root->heat_range_map[i].node_list, root);
> +		spin_unlock(&root->heat_range_map[i].lock);
> +	}
> +}
> +
> +/*
>   * Initialize kmem cache for hot_inode_item and hot_range_item.
>   */
>  void __init hot_cache_init(void)
> --- a/include/linux/hot_tracking.h
> +++ b/include/linux/hot_tracking.h
> @@ -71,6 +82,12 @@ struct hot_range_item {
>  struct hot_info {
>  	struct hot_rb_tree hot_inode_tree;
>  	spinlock_t lock; /*protect inode tree */
> +
> +	/* map of inode temperature */
> +	struct hot_map_head heat_inode_map[HEAT_MAP_SIZE];
> +	/* map of range temperature */
> +	struct hot_map_head heat_range_map[HEAT_MAP_SIZE];
> +	unsigned int hot_map_nr;
>  };

The final layout of struct hot_info is

struct hot_info {
        struct hot_rb_tree         hot_inode_tree;       /*     0     8 */
        spinlock_t                 lock;                 /*     8    72 */
        /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
        struct hot_map_head        heat_inode_map[256];  /*    80 24576 */
        /* --- cacheline 385 boundary (24640 bytes) was 16 bytes ago --- */
        struct hot_map_head        heat_range_map[256];  /* 24656 24576 */
        /* --- cacheline 769 boundary (49216 bytes) was 16 bytes ago --- */
        unsigned int               hot_map_nr;           /* 49232     4 */

        /* XXX 4 bytes hole, try to pack */

        struct workqueue_struct *  update_wq;            /* 49240     8 */
        struct delayed_work        update_work;          /* 49248   216 */

        /* XXX last struct has 4 bytes of padding */

        /* --- cacheline 772 boundary (49408 bytes) was 56 bytes ago --- */
        struct hot_type *          hot_type;             /* 49464     8 */
        /* --- cacheline 773 boundary (49472 bytes) --- */
        struct shrinker            hot_shrink;           /* 49472    48 */
        struct dentry *            vol_dentry;           /* 49520     8 */

        /* size: 49528, cachelines: 774, members: 10 */
        /* sum members: 49524, holes: 1, sum holes: 4 */
        /* paddings: 1, sum paddings: 4 */
        /* last cacheline: 56 bytes */
};

That's an order-4 allocation, and the heat_*_map[] arrays themselves need order-3. Also, the structure

struct hot_map_head {
        struct list_head           node_list;            /*     0    16 */
        u8                         temp;                 /*    16     1 */

        /* XXX 7 bytes hole, try to pack */

        spinlock_t                 lock;                 /*    24    72 */
        /* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */

        /* size: 96, cachelines: 2, members: 3 */
        /* sum members: 89, holes: 1, sum holes: 7 */
        /* last cacheline: 32 bytes */
};

is not packed efficiently, and given the number of array items, the wasted space adds up. So this needs to be fixed. Options I see:

1) try to allocate the structure with __GFP_NOWARN and use vmalloc as a fallback
2) allocate the heat_*_map arrays dynamically

An array of 256 pointers takes 2048 bytes, so with 2 of them plus the other struct members, the overall size will go beyond a 4k page. Also, doing a kmalloc for each heat_*_map item could spread them over memory, although hot_info is a long-lived structure and it would make sense to keep the data in one place.

For struct hot_map_head I suggest creating a slab.

david
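The order figures above follow from rounding a size up to a power-of-two number of pages, which is the rounding the kernel's get_order() performs. A small userspace model, assuming 4 KiB pages (sketch_get_order is a hypothetical name), reproduces them from the pahole sizes.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumes 4 KiB pages */

/* Smallest order such that (1 << order) pages can hold `size` bytes. */
static unsigned int sketch_get_order(size_t size)
{
	size_t pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

For example, the 49528-byte struct hot_info needs 13 pages, which rounds up to 16 (order-4), while one 256 * 96 = 24576-byte heat map needs 6 pages, rounded up to 8 (order-3).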
Re: [PATCH RESEND v1 03/16] vfs: add I/O frequency update function
On Thu, Dec 20, 2012 at 10:43:22PM +0800, zwu.ker...@gmail.com wrote: > --- a/fs/hot_tracking.c > +++ b/fs/hot_tracking.c > @@ -164,6 +164,135 @@ static void hot_inode_tree_exit(struct hot_info *root) > spin_unlock(&root->lock); > } > > +struct hot_inode_item > +*hot_inode_item_lookup(struct hot_info *root, u64 ino) > +{ > + struct rb_node **p = &root->hot_inode_tree.map.rb_node; > + struct rb_node *parent = NULL; > + struct hot_comm_item *ci; > + struct hot_inode_item *entry; > + > + /* walk tree to find insertion point */ > + spin_lock(&root->lock); > + while (*p) { > + parent = *p; > + ci = rb_entry(parent, struct hot_comm_item, rb_node); > + entry = container_of(ci, struct hot_inode_item, hot_inode); > + if (ino < entry->i_ino) > + p = &(*p)->rb_left; > + else if (ino > entry->i_ino) > + p = &(*p)->rb_right; Style comment: put { } around all the if/else blocks. > + else { > + spin_unlock(&root->lock); > + kref_get(&entry->hot_inode.refs); Jumping forward in the series, the spin_unlock and kref_get get swapped later, and I think that's the right order. Otherwise there's a small window where the entry does not yet hold the reference and could potentially be freed by a racing kref_put, no?

    spin_unlock(tree)
                            spin_lock(tree)
                            kref_put(E) or via hot_inode_item_put(E)   (1)
    kref_get(E)   (2)

If the reference count at (1) was 1, the entry is freed and (2) touches freed memory. hot_inode_item_put can be called from the filesystem or via seq print of the respective /proc files, so I think there are chances to hit the problem.
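A simplified userspace model of the ordering argued for above: a pthread mutex plays the tree lock, a plain counter protected by that lock plays the kref, and a single slot stands in for the rbtree. All names are hypothetical, and in the kernel the release would go through kref_put(); the point is only that the lookup takes its reference before dropping the lock, so a concurrent final put cannot free the entry in between.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified model: one cached item instead of an rbtree. */
struct sketch_item {
	int refs;		/* protected by tree_lock in this model */
	unsigned long ino;
};

static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_item *tree_slot;

/* Lookup: take the reference *before* dropping the tree lock. */
static struct sketch_item *sketch_lookup(unsigned long ino)
{
	struct sketch_item *e = NULL;

	pthread_mutex_lock(&tree_lock);
	if (tree_slot && tree_slot->ino == ino) {
		e = tree_slot;
		e->refs++;	/* kref_get() equivalent, still under the lock */
	}
	pthread_mutex_unlock(&tree_lock);
	return e;
}

/* Put: dropping the last reference unlinks and frees, under the lock. */
static void sketch_put(struct sketch_item *e)
{
	pthread_mutex_lock(&tree_lock);
	if (--e->refs == 0) {
		if (tree_slot == e)
			tree_slot = NULL;
		free(e);
	}
	pthread_mutex_unlock(&tree_lock);
}
```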
> + return entry; > + } > + } > + spin_unlock(&root->lock); > + > + entry = kmem_cache_zalloc(hot_inode_item_cachep, GFP_NOFS); > + if (!entry) > + return ERR_PTR(-ENOMEM); > + > + spin_lock(&root->lock); > + hot_inode_item_init(entry, ino, &root->hot_inode_tree); > + rb_link_node(&entry->hot_inode.rb_node, parent, p); > + rb_insert_color(&entry->hot_inode.rb_node, > + &root->hot_inode_tree.map); > + spin_unlock(&root->lock); > + > + kref_get(&entry->hot_inode.refs); Similarly here: the entry is inserted into the tree but does not hold a reference yet, and here the order of spin_unlock/kref_get remains unchanged later in the series. > + return entry; > +} > +EXPORT_SYMBOL_GPL(hot_inode_item_lookup); > + > +static struct hot_range_item > +*hot_range_item_lookup(struct hot_inode_item *he, > + loff_t start) > +{ > + struct rb_node **p = &he->hot_range_tree.map.rb_node; > + struct rb_node *parent = NULL; > + struct hot_comm_item *ci; > + struct hot_range_item *entry; > + > + /* walk tree to find insertion point */ > + spin_lock(&he->lock); > + while (*p) { > + parent = *p; > + ci = rb_entry(parent, struct hot_comm_item, rb_node); > + entry = container_of(ci, struct hot_range_item, hot_range); > + if (start < entry->start) > + p = &(*p)->rb_left; > + else if (start > hot_range_end(entry)) > + p = &(*p)->rb_right; if { ...} else if { ...
} > + else { > + spin_unlock(&he->lock); > + kref_get(&entry->hot_range.refs); same here > + return entry; > + } > + } > + spin_unlock(&he->lock); > + > + entry = kmem_cache_zalloc(hot_range_item_cachep, GFP_NOFS); > + if (!entry) > + return ERR_PTR(-ENOMEM); > + > + spin_lock(&he->lock); > + hot_range_item_init(entry, start, he); > + rb_link_node(&entry->hot_range.rb_node, parent, p); > + rb_insert_color(&entry->hot_range.rb_node, > + &he->hot_range_tree.map); > + spin_unlock(&he->lock); > + > + kref_get(&entry->hot_range.refs); and here > + return entry; > +} > + > +/* > + * This function does the actual work of updating > + * the frequency numbers, whatever they turn out to be. Can this function be described a bit better? This comment did not help. > + */ > +static void hot_rw_freq_calc(struct timespec old_atime, > + struct timespec cur_time, u64 *avg) > +{ > + struct timespec delta_ts; > + u64 new_delta; > + > + delta_ts = timespec_sub(cur_time, old_atime); > + new_delta = timespec_to_ns(&delta_ts) >> FREQ_POWER; > + > + *avg = (*avg << FREQ_POWER) - *avg + new_delta; > + *avg = *avg >> FREQ_POWER; > +} > + > +static void hot_freq_data_update(struct hot_freq_data *freq_data, bool write) > +{ > + struct timespec cur_time = current_kernel_time(); > + > + if (write) { > + freq_data->nr_writes += 1; The preferred style is fr
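On the hot_rw_freq_calc() comment: as far as the quoted arithmetic goes, the function keeps an exponentially weighted moving average of the interval between accesses. A userspace restatement, assuming a hypothetical FREQ_POWER of 4 and taking the interval directly in nanoseconds instead of two timespecs:

```c
#include <stdint.h>

#define FREQ_POWER 4	/* hypothetical; the patch's value is not quoted */

/*
 * Same arithmetic as hot_rw_freq_calc():
 *   avg = (avg * (2^F - 1) + (delta >> F)) >> F
 * i.e. an exponentially weighted moving average in which the newest
 * (pre-scaled) interval gets weight 2^-F and the old average 1 - 2^-F.
 */
static void sketch_freq_calc(uint64_t delta_ns, uint64_t *avg)
{
	uint64_t new_delta = delta_ns >> FREQ_POWER;

	*avg = (*avg << FREQ_POWER) - *avg + new_delta;
	*avg >>= FREQ_POWER;
}
```

With a steady 1600 ns interval, starting from 0, the average settles at 85 rather than 1600 >> 4 = 100: the stored value tracks the interval divided by 2^FREQ_POWER (the delta is shifted before blending), and because every step truncates, it stops short of the exact fixed point.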
Re: [PATCH RESEND v1 01/16] vfs: introduce some data structures
On Thu, Dec 20, 2012 at 10:43:20PM +0800, zwu.ker...@gmail.com wrote: > --- /dev/null > +++ b/fs/hot_tracking.c > @@ -0,0 +1,109 @@ > +/* > + * fs/hot_tracking.c From what I've understood, the file name written here is not wanted, so please drop it (from the .h too). > + * > + * Copyright (C) 2012 IBM Corp. All rights reserved. > + * Written by Zhi Yong Wu > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public > + * License v2 as published by the Free Software Foundation. A short description of the hot tracking feature, or a pointer to the Documentation/ file, would be nice here. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include "hot_tracking.h" > + > +/* kmem_cache pointers for slab caches */ This comment seems useless to me; it does not help in understanding the code, it just repeats what the C already says. There are more such redundant comments in the series, but I'm not going to point out all of them right now.
> +static struct kmem_cache *hot_inode_item_cachep __read_mostly; > +static struct kmem_cache *hot_range_item_cachep __read_mostly; > + > --- /dev/null > +++ b/include/linux/hot_tracking.h > +/* The common info for both following structures */ > +struct hot_comm_item { > + struct rb_node rb_node; /* rbtree index */ > + struct hot_freq_data hot_freq_data; /* frequency data */ > + spinlock_t lock; /* protects object data */ > + struct kref refs; /* prevents kfree */ > +}; > + > +/* An item representing an inode and its access frequency */ > +struct hot_inode_item { > + struct hot_comm_item hot_inode; /* node in hot_inode_tree */ > + struct hot_rb_tree hot_range_tree; /* tree of ranges */ > + spinlock_t lock; /* protect range tree */ > + struct hot_rb_tree *hot_inode_tree; > + u64 i_ino; /* inode number from inode */ > +}; Please align the comments to something like this (or drop them if they seem redundant):

/* The common info for both following structures */
struct hot_comm_item {
        struct rb_node          rb_node;        /* rbtree index */
        struct hot_freq_data    hot_freq_data;  /* frequency data */
        spinlock_t              lock;           /* protects object data */
        struct kref             refs;           /* prevents kfree */
        struct list_head        n_list;         /* list node index */
};

/* An item representing an inode and its access frequency */
struct hot_inode_item {
        struct hot_comm_item    hot_inode;      /* node in hot_inode_tree */
        struct hot_rb_tree      hot_range_tree; /* tree of ranges */
        spinlock_t              lock;           /* protect range tree */
        struct hot_rb_tree      *hot_inode_tree;
        u64                     i_ino;          /* inode number from inode */
};

> +extern void __init hot_cache_init(void); This belongs in the private include fs/hot_tracking.h, because it is called only once by the VFS init and not by filesystems; for filesystems there's hot_track_init(superblock), introduced later.
david
Re: [PATCH RESEND v1 02/16] vfs: add init and cleanup functions
On Thu, Dec 20, 2012 at 10:43:21PM +0800, zwu.ker...@gmail.com wrote: > From: Zhi Yong Wu > --- a/fs/hot_tracking.c > +++ b/fs/hot_tracking.c > @@ -107,3 +189,38 @@ err: > kmem_cache_destroy(hot_inode_item_cachep); > } > EXPORT_SYMBOL_GPL(hot_cache_init); > + > +/* > + * Initialize the data structures for hot data tracking. > + */ > +int hot_track_init(struct super_block *sb) > +{ > + struct hot_info *root; > + int ret = -ENOMEM; > + > + root = kzalloc(sizeof(struct hot_info), GFP_NOFS); > + if (!root) { > + printk(KERN_ERR "%s: Failed to malloc memory for " > + "hot_info\n", __func__); > + return ret; > + } > + > + hot_inode_tree_init(root); This function is supposed to be called from the filesystem init; please add a sanity check that would catch multiple initialization attempts. > + > + sb->s_hot_root = root; > + > + printk(KERN_INFO "VFS: Turning on hot data tracking\n"); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(hot_track_init); > + > +void hot_track_exit(struct super_block *sb) > +{ > + struct hot_info *root = sb->s_hot_root; Another sanity check here, to catch the opposite. Why? The option is parsed and enabled by the filesystems, so in case of unexpected bugs, e.g. with remounting or incorrectly handled error paths, the VFS layer should IMHO warn rather than crash. > + > + hot_inode_tree_exit(root); > + sb->s_hot_root = NULL; > + kfree(root); > +} > +EXPORT_SYMBOL_GPL(hot_track_exit); david
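A userspace model of the two sanity checks requested above. The stand-in structs, the sketch_* names and the choice of -EEXIST are assumptions; in the kernel the checks would more naturally be WARN_ON() calls, with fprintf() standing in for them here.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical minimal stand-ins; the real types live in the kernel. */
struct hot_info { int dummy; };
struct super_block { struct hot_info *s_hot_root; };

static int sketch_hot_track_init(struct super_block *sb)
{
	struct hot_info *root;

	/* Catch a second init, e.g. from a remount that re-parses options. */
	if (sb->s_hot_root) {
		fprintf(stderr, "hot tracking already initialized\n");
		return -EEXIST;	/* assumed error code */
	}
	root = calloc(1, sizeof(*root));
	if (!root)
		return -ENOMEM;
	sb->s_hot_root = root;
	return 0;
}

static void sketch_hot_track_exit(struct super_block *sb)
{
	/* Catch an exit without a matching init, e.g. from an error path. */
	if (!sb->s_hot_root) {
		fprintf(stderr, "hot tracking not initialized\n");
		return;
	}
	free(sb->s_hot_root);
	sb->s_hot_root = NULL;
}
```

Either check turns a would-be double allocation or NULL dereference into a warning, which matches the "warn rather than crash" reasoning.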
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote: > On 01/09/2013 01:44 PM, Dave Kleikamp wrote: > > > > I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2. > > > > 00:02.0 VGA compatible controller [0300]: Intel Corporation Core > > Processor Integrated Graphics Controller [8086:0046] (rev 02) > > > > Thinkpad T410 > > > > Shaggy > > Daniel's patch: > > drm/i915: Revert shrinker changes from "Track unbound pages" > > fixes the problem for me. After an afternoon of multiple kernel builds and other stressful things, it looks like it fixes it for me as well. Chris, this will be going to Linus soon, right? thanks, greg k-h
Re: [PATCH 0/8] PCI, ACPI, x86: Reserve fw allocated resource for hot-add root bus
On Wed, Jan 9, 2013 at 1:10 PM, Rafael J. Wysocki wrote: > On Wednesday, January 09, 2013 11:01:39 AM Yinghai Lu wrote: >> On Wed, Jan 9, 2013 at 10:39 AM, Bjorn Helgaas wrote: >> >> the reason why we need to change those codes for x86, we want to make it >> >> support >> >> pci root bus hotplug. So it would be reasonable for us to align other >> >> platform to x86 >> >> changes after pci root bus hotplug change is completely done. >> > >> > OK, I opened https://bugzilla.kernel.org/show_bug.cgi?id=52531 as a >> > way to keep track of this consistency issue and merged >> > pci/yinghai-survey-resources to my -next branch. >> >> Thanks a lot. will send other pci root bus hotplug out. >> >> question: now Rafael's tree has acpi-scan branch and it touches pci-root.c. >> >> so is it ok for me to base patches on your pci/next and his pm/acpi-scan? >> how? >> can you two have some arrangement like you pulling Rafael's branch? > > My acpi-scan branch is not going to be rebased going forward, so it can be > pulled from safely if that helps. I'm happy to do that, but it is outside the scope of my limited git experience. My guess is that I should do this (doing the pull into a branch which I later merge into my -next branch):

$ git checkout -b pci/yinghai-survey-resources+acpi-scan pci/yinghai-survey-resources
$ git pull --no-ff --log git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git acpi-scan
$ vi drivers/acpi/pci_root.c    # resolve conflicts
$ git add drivers/acpi/pci_root.c
$ git commit
$ git checkout next
$ git merge --no-ff --log pci/yinghai-survey-resources+acpi-scan

Is that reasonable? This won't cause issues when both Rafael and I ask Linus to pull from our trees later?

Bjorn