Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-09 Thread Glauber Costa
On 01/10/2013 11:31 AM, Kamezawa Hiroyuki wrote:
> (2013/01/10 16:14), Glauber Costa wrote:
>> On 01/10/2013 06:17 AM, Tang Chen wrote:
>>>>> Note: if the memory provided by the memory device is used by the
>>>>> kernel, it can't be offlined. It is not a bug.
>>>>
>>>> Right.  But how often does this happen in testing?  In other words,
>>>> please provide an overall description of how well memory hot-remove is
>>>> presently operating.  Is it reliable?  What is the success rate in
>>>> real-world situations?
>>>
>>> We test the hot-remove functionality mostly with movable_online used.
>>> And the memory used by kernel is not allowed to be removed.
>>
>> Can you try doing this using cpusets configured to hardwall ?
>> It is my understanding that the object allocators will try hard not to
>> allocate anything outside the walls defined by cpuset. Which means that
>> if you have one process per node, and they are hardwalled, your kernel
>> memory will be spread evenly among the machine. With a big enough load,
>> they should eventually be present in all blocks.
>>
> 
> I'm sorry, I don't quite follow your point.
> Do you want to confirm whether cpuset can work well enough instead of
> ZONE_MOVABLE?
> Or do you want to confirm whether ZONE_MOVABLE will not work if it's
> used with cpuset?
> 
> 
No, I am not proposing to use cpuset to tackle the problem. I am just
wondering whether you would still have high success rates with cpusets
in use with hardwalls. This is just one example of a workload that would
spread kernel memory around quite heavily.

So this is just me trying to understand the limitations of the mechanism.

>> Another question I have for you: Have you considered calling
>> shrink_slab to try to deplete the caches and therefore free at least
>> slab memory in the nodes that can't be offlined? Is it relevant?
>>
> 
> At this stage, we are not considering calling shrink_slab(). My
> understanding is that we require nearly 100% success at offlining
> memory in order to remove a DIMM.
> 
Of course, this is indisputable.

> IMHO, I don't think shrink_slab() can kill all objects in a node, even
> if they are caches. We need more study before doing that.
> 

Indeed, shrink_slab can only kill cached objects. They, however, are
usually a very big part of kernel memory. I wonder, though, whether in
case of failure it is worth trying at least one shrink pass before you
give up.

It is not very different from what is in memory-failure.c, except that
we could do better and do more targeted shrinking (support for that
is being worked on)
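The "one shrink pass before giving up" idea can be sketched as below. This is a user-space toy under stated assumptions: offline_with_one_shrink(), the callback types and the toy_block model are all hypothetical and are not the kernel's offline_pages() or shrink_slab() interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callbacks; neither is a real kernel interface. */
typedef bool (*offline_fn)(void *ctx);  /* returns true if the block went offline */
typedef void (*shrink_fn)(void *ctx);   /* frees reclaimable cached objects */

/*
 * Try to offline a memory block. On failure, run exactly one shrink pass
 * and retry once before giving up.
 */
static bool offline_with_one_shrink(offline_fn try_offline,
                                    shrink_fn shrink_caches, void *ctx)
{
    assert(try_offline && shrink_caches);
    if (try_offline(ctx))
        return true;
    shrink_caches(ctx);        /* one shrink pass over cached objects */
    return try_offline(ctx);   /* second and final attempt */
}

/* Toy model: a block goes offline only when nothing in it is in use. */
struct toy_block {
    int cached_objs;   /* reclaimable (e.g. slab cache) objects */
    int pinned_objs;   /* objects a shrink pass cannot free */
    int attempts;      /* number of offline attempts made */
};

static bool toy_try_offline(void *ctx)
{
    struct toy_block *b = ctx;

    b->attempts++;
    return b->cached_objs == 0 && b->pinned_objs == 0;
}

static void toy_shrink(void *ctx)
{
    struct toy_block *b = ctx;

    b->cached_objs = 0;   /* only cached objects can be dropped */
}
```

A block holding only cached objects goes offline on the second attempt; one that also holds pinned objects still fails, which is the limitation Kamezawa points out.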


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cpufreq: Simplify __cpufreq_remove_dev()

2013-01-09 Thread Shawn Guo
On Wed, Jan 09, 2013 at 04:50:44PM +0530, Viresh Kumar wrote:
> @Shawn: I believe your driver doesn't require that ugly code anymore (though I
> know there is a situation for that to happen: if we have two CPUs, you remove
> the second one and then add it back. With this, cpufreq_add_dev() would call
> init() first and then try to match if there are any managed_policies present.
> But the issue you pointed out about unregistering the driver would be solved
> by this patch.)

Yes, I just played with it and it works for me.  However, I will have to keep
that little bit of ugly code in my patch to avoid a dependency on your patch.
I will send a follow-up to clean that up once your patch hits mainline.

Shawn



Re: [patch] bnx2x: NULL dereference on error in debug code

2013-01-09 Thread David Miller
From: Dan Carpenter 
Date: Tue, 8 Jan 2013 16:42:14 +0300

> "vfop" is NULL here.  I've changed the debugging to not use it.
> 
> Signed-off-by: Dan Carpenter 

Applied, thanks Dan.


[PATCH] MAINTAINERS: fix a status pattern

2013-01-09 Thread Zhang Yanfei
Change MAINTAINED to Maintained.

Signed-off-by: Zhang Yanfei 
---
 MAINTAINERS |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c4f15a1..33d8f95 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -648,7 +648,7 @@ F:  arch/arm/
 
 ARM SUB-ARCHITECTURES
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
-S: MAINTAINED
+S: Maintained
 F: arch/arm/mach-*/
 F: arch/arm/plat-*/
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git
-- 
1.7.1
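For illustration, a hypothetical checker (not an existing kernel script) that accepts only the status keywords documented at the top of MAINTAINERS, compared case-sensitively, would flag the old `S: MAINTAINED` line:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Status keywords documented at the top of MAINTAINERS. */
static const char *const valid_status[] = {
    "Supported", "Maintained", "Odd Fixes", "Orphan", "Obsolete",
};

/* Returns true if an "S:" value matches one keyword exactly (case-sensitive). */
static bool status_is_valid(const char *value)
{
    for (size_t i = 0; i < sizeof(valid_status) / sizeof(valid_status[0]); i++)
        if (strcmp(value, valid_status[i]) == 0)
            return true;
    return false;
}
```

Whether any real tool compares these strings case-sensitively is not claimed here; the point is only that `Maintained` is the documented spelling.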


[PATCH] MAINTAINERS: fix arch/arm/plat-omap/include/plat/omap_hwmod.h

2013-01-09 Thread Zhang Yanfei
This file was moved to arch/arm/mach-omap2/omap_hwmod.h by commit
2a296c8 (ARM: OMAP: Make plat/omap_hwmod.h local to mach-omap2).

Signed-off-by: Zhang Yanfei 
---
 MAINTAINERS |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 915564e..c4f15a1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5507,8 +5507,7 @@ M:Benoît Cousson 
 M: Paul Walmsley 
 L: linux-o...@vger.kernel.org
 S: Maintained
-F: arch/arm/mach-omap2/omap_hwmod.c
-F: arch/arm/plat-omap/include/plat/omap_hwmod.h
+F: arch/arm/mach-omap2/omap_hwmod.*
 
 OMAP HWMOD DATA FOR OMAP4-BASED DEVICES
 M: Benoît Cousson 
-- 
1.7.1
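The new `F:` entry relies on a wildcard. As a rough approximation only, POSIX fnmatch() shows which paths such a pattern covers; get_maintainer.pl uses its own matching logic, so treat this sketch (and the hypothetical helper name covered_by_hwmod_entry()) as illustrative:

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/* The new F: pattern from the patch above. */
static const char hwmod_pattern[] = "arch/arm/mach-omap2/omap_hwmod.*";

/* flags = 0, so '*' may also match '/' (no FNM_PATHNAME). */
static bool covered_by_hwmod_entry(const char *path)
{
    return fnmatch(hwmod_pattern, path, 0) == 0;
}
```

The pattern covers both omap_hwmod.c and the moved omap_hwmod.h, but not files such as omap_hwmod_44xx_data.c, which belong to the separate HWMOD DATA entries.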


RE: [PATCH V2 3/3] ARM: davinci: da850: add NAND driver entries

2013-01-09 Thread Kumar, Anil
On Wed, Jan 09, 2013 at 18:17:46, Nori, Sekhar wrote:
> On 1/8/2013 1:50 PM, Kumar, Anil wrote:
> > Add NAND driver entries to export NAND functionality on da850 EVM and
> > NAND pinctrl node to do pin mux according to pinctrl-single driver.
> 
> Subject line should have DT and da850 evm somewhere.
> Otherwise, NAND has been supported on DA850 for a long time.

I will update it in V3 of this patch series.

> 
> > 
> > Signed-off-by: Kumar, Anil 
> > ---
> > :100644 100644 c7609d0... 382a7da... M  arch/arm/boot/dts/da850-evm.dts
> > :100644 100644 e9c6e82... 16e2ac2... M  arch/arm/boot/dts/da850.dtsi
> >  arch/arm/boot/dts/da850-evm.dts |   19 +++
> >  arch/arm/boot/dts/da850.dtsi|   13 +
> >  2 files changed, 32 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/arm/boot/dts/da850-evm.dts b/arch/arm/boot/dts/da850-evm.dts
> > index c7609d0..382a7da 100644
> > --- a/arch/arm/boot/dts/da850-evm.dts
> > +++ b/arch/arm/boot/dts/da850-evm.dts
> > @@ -28,4 +28,23 @@
> > status = "okay";
> > };
> > };
> > +   nand_cs3@6200 {
> > +   status = "okay";
> > +   };
> > +};
> > +&pmx_core{
> > +   pinctrl-names = "default";
> > +   pinctrl-0 = <
> > +   &nand_cs3_pins
> > +   >;
> 
> This means that the NAND pins are configured even if NAND is not
> probed. Right?

Right 

> This can be moved into the nand_cs3 node to avoid that. 
> And then when used with Linus Walleij's patch "drivers/pinctrl: grab 
> default handles from device core" which should be accepted soon, the 
> pins will be automatically setup when the NAND gets probed.

It is better. I will do it in V3 of this patch series.

> 
> > +
> > +   nand_cs3_pins: pinmux_nand_pins {
> > +   pinctrl-single,bits = <
> > +   0x1c 0x0011  0x00ff /* EMA_OE, EMA_WE */
> > +   0x1c 0x0110  0x0ff0 /* EMA_CS[4],EMA_CS[3]*/
> > +   0x24 0x  0x /* EMA_D[0], EMA_D[1], EMA_D[2], EMA_D[3],
> > +                  EMA_D[4], EMA_D[5], EMA_D[6], EMA_D[7] */
> > +   0x30 0x0110  0x0ff0 /* EMA_A[1], EMA_A[2] */
> > +   >;
> > +   };
> 
> I suspect a number of other boards using NAND on CS3 would have to 
> repeat these configurations. It will be better to move them into the 
> da850.dtsi file so it can be reused.

> 
> Since I am new to the DT and pinctrl stuff, I had to actually try these to 
> make sure what I am suggesting isn't nonsense. I came up with this 
> patch on top of your series. Please test it at your end

Tested. It is working. 

> and also see if you 
> are happy with the changes.

I do not think it is a good idea to move the NAND pin mux information
into da850.dtsi, because this information is EVM-specific.
If we use this approach, then we must use the same approach for
other modules as well, such as ASoC.

Thanks,
Anil
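The `pinctrl-single,bits` entries discussed above are read here as <offset value mask> triples. A minimal sketch of the read-modify-write such an entry implies, assuming that interpretation; struct pcs_bits and pcs_apply() are hypothetical names, and the register values in the checks are arbitrary, not the (partly truncated) da850 values quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* One <offset value mask> triple, as in a pinctrl-single,bits property. */
struct pcs_bits {
    uint32_t offset;  /* register offset within the pinmux block */
    uint32_t value;   /* desired field values */
    uint32_t mask;    /* which bits this entry owns */
};

/* Apply one entry: bits outside the mask keep their current value. */
static uint32_t pcs_apply(uint32_t current, const struct pcs_bits *b)
{
    return (current & ~b->mask) | (b->value & b->mask);
}
```

Because each entry only touches the bits in its mask, two entries for the same offset with disjoint masks (like the two 0x1c lines above) compose without clobbering each other.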

> ---8<
> diff --git a/arch/arm/boot/dts/da850-evm.dts b/arch/arm/boot/dts/da850-evm.dts
> index 382a7da..52abb9d 100644
> --- a/arch/arm/boot/dts/da850-evm.dts
> +++ b/arch/arm/boot/dts/da850-evm.dts
> @@ -30,21 +30,8 @@
>   };
>   nand_cs3@6200 {
>   status = "okay";
> + pinctrl-names = "default";
> + pinctrl-0 = <&nand_cs3_pins>;
>   };
>  };
> -&pmx_core{
> - pinctrl-names = "default";
> - pinctrl-0 = <
> - &nand_cs3_pins
> - >;
>  
> - nand_cs3_pins: pinmux_nand_pins {
> - pinctrl-single,bits = <
> - 0x1c 0x0011  0x00ff /* EMA_OE, EMA_WE */
> - 0x1c 0x0110  0x0ff0 /* EMA_CS[4],EMA_CS[3]*/
> - 0x24 0x  0x /* EMA_D[0], EMA_D[1], EMA_D[2], EMA_D[3],
> -                 EMA_D[4], EMA_D[5], EMA_D[6], EMA_D[7] */
> - 0x30 0x0110  0x0ff0 /* EMA_A[1], EMA_A[2] */
> - >;
> - };
> -};
> diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
> index 6ab9730..c4b21a8 100644
> --- a/arch/arm/boot/dts/da850.dtsi
> +++ b/arch/arm/boot/dts/da850.dtsi
> @@ -38,7 +38,25 @@
>   pinctrl-single,register-width = <32>;
>   pinctrl-single,function-mask = <0x>;
>   status = "disabled";
> +
> + nand_cs3_pins: pinmux_nand_pins {
> + pinctrl-single,bits = <
> + /* EMA_OE, EMA_WE */
> + 0x1c 0x0011  0x00ff
> + /* EMA_CS[4],EMA_CS[3]*/
> + 0x1c 0x0110  0x0ff0
> + /*
> +  * EMA_D[0], EMA_D[1], EMA_D[2],
> +  * EMA_D[3

Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-09 Thread Kamezawa Hiroyuki

(2013/01/10 16:14), Glauber Costa wrote:
> On 01/10/2013 06:17 AM, Tang Chen wrote:
>>>> Note: if the memory provided by the memory device is used by the
>>>> kernel, it can't be offlined. It is not a bug.
>>>
>>> Right.  But how often does this happen in testing?  In other words,
>>> please provide an overall description of how well memory hot-remove is
>>> presently operating.  Is it reliable?  What is the success rate in
>>> real-world situations?
>>
>> We test the hot-remove functionality mostly with movable_online used.
>> And the memory used by kernel is not allowed to be removed.
>
> Can you try doing this using cpusets configured to hardwall ?
> It is my understanding that the object allocators will try hard not to
> allocate anything outside the walls defined by cpuset. Which means that
> if you have one process per node, and they are hardwalled, your kernel
> memory will be spread evenly among the machine. With a big enough load,
> they should eventually be present in all blocks.
>

I'm sorry, I don't quite follow your point.
Do you want to confirm whether cpuset can work well enough instead of ZONE_MOVABLE?
Or do you want to confirm whether ZONE_MOVABLE will not work if it's used with
cpuset?

> Another question I have for you: Have you considered calling
> shrink_slab to try to deplete the caches and therefore free at least
> slab memory in the nodes that can't be offlined? Is it relevant?
>

At this stage, we are not considering calling shrink_slab(). My
understanding is that we require nearly 100% success at offlining
memory in order to remove a DIMM.

IMHO, I don't think shrink_slab() can kill all objects in a node, even
if they are caches. We need more study before doing that.

Thanks,
-Kame




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-09 Thread Thierry Reding
On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > On 01/09/2013 03:10 PM, Arnd Bergmann wrote:
> > > On Wednesday 09 January 2013, Thierry Reding wrote:
> > >> What happens on Tegra is that we need to map 256 MiB of physical memory
> > >> to access all the PCIe extended configuration space. However, ioremap()
> > >> on such a large region fails if not enough vmalloc() space is available.
> > >>
> > >> This was observed when somebody tested this on CardHu, which has 1 GiB
> > >> of RAM, and therefore remapping the full 256 MiB fails.
> > ...
> > > Have you checked if the hardware supports an alternative config
> > > space access mechanism that does not depend on a huge address range?
> > > A lot of them provide an index/data register pair somewhere, as the
> > > original PC implementation did.
> > 
> > That would be nice, but I've talked to the HW engineers, and there's no
> > indication that any alternative mechanism exists.
> 
> It seems to be convention that extended config space is often only
> accessible through mmio space, that was true on x86 last I checked
> too..
> 
> You could decrease the size of the mapping to only span the bus
> numbers that are configured for use via DT.

That won't work, unfortunately. The mapping is such that the bus number
is not encoded in the uppermost bits, the extended register number is.
So the only thing that we could do is decrease the size of the extended
register space for *all* devices.
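To illustrate why limiting the bus range does not shrink the window here, a sketch contrasting the standard ECAM offset layout with a layout in which the extended register bits sit on top. The second function is an assumption chosen to match the description above, not code taken from the Tegra driver; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Standard ECAM layout: bus[27:20] dev[19:15] fn[14:12] reg[11:0]. */
static uint32_t ecam_offset(uint8_t bus, uint8_t devfn, uint16_t reg)
{
    return ((uint32_t)bus << 20) | ((uint32_t)devfn << 12) | (reg & 0xfff);
}

/*
 * Assumed layout with the extended register number on top:
 * reg[11:8] -> bits 27:24, bus -> 23:16, devfn -> 15:8, reg[7:0] -> 7:0.
 */
static uint32_t ext_reg_on_top_offset(uint8_t bus, uint8_t devfn, uint16_t reg)
{
    return ((uint32_t)(reg & 0xf00) << 16) | ((uint32_t)bus << 16) |
           ((uint32_t)devfn << 8) | (reg & 0xff);
}
```

Both layouts span 256 MiB in total. With ECAM, using only buses 0 and 1 needs just the first 2 MiB; with the second layout, even bus 0 reaches offset 0x0f000000 for register 0xf00, so the only way to shrink the mapping is to cut the extended register range for every device.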

> Are there any concerns about these config registers being accessed
> from a context where a new mapping can't be made? Interrupt? Machine
> Check? PCI-E Advanced Error Reporting?

I haven't checked but I would expect configuration space accesses to not
happen in interrupt context. Usually they are limited to enumeration and
driver probe.

Thierry




Re: [PATCH 1/2] Add mempressure cgroup

2013-01-09 Thread Glauber Costa
On 01/10/2013 02:06 AM, Anton Vorontsov wrote:
> On Wed, Jan 09, 2013 at 01:55:14PM -0800, Tejun Heo wrote:
> [...]
>>> We can use mempressure w/o memcg, and even then it can (or should :) be
>>> useful (for cpuset, for example).
>>
>> The problem is that you end up with, at the very least, duplicate
>> hierarchical accounting mechanisms which overlap with each other
>> while, most likely, being slightly different.  About the same thing
>> happened with cpu and cpuacct controllers and we're now trying to
>> deprecate the latter.
> 
> Yeah. I started answering your comments about hierarchical accounting,
> looked into the memcg code, and realized that *this* is where I need the
> memcg stuff. :)
> 
> Thus yes, I guess I'll have to integrate it with memcg, or sort of.
> 

That has been my point since the beginning. To generate per-memcg pressure,
you need memcg anyway. So you would have to have two different, orthogonal
mechanisms and would therefore double-account.




Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-09 Thread Glauber Costa
On 01/10/2013 06:17 AM, Tang Chen wrote:
>>> Note: if the memory provided by the memory device is used by the
>>> kernel, it
>>> can't be offlined. It is not a bug.
>>
>> Right.  But how often does this happen in testing?  In other words,
>> please provide an overall description of how well memory hot-remove is
>> presently operating.  Is it reliable?  What is the success rate in
>> real-world situations?
> 
> We test the hot-remove functionality mostly with movable_online used.
> And the memory used by kernel is not allowed to be removed.

Can you try doing this using cpusets configured to hardwall ?
It is my understanding that the object allocators will try hard not to
allocate anything outside the walls defined by cpuset. Which means that
if you have one process per node, and they are hardwalled, your kernel
memory will be spread evenly among the machine. With a big enough load,
they should eventually be present in all blocks.

Another question I have for you: Have you considered calling
shrink_slab to try to deplete the caches and therefore free at least
slab memory in the nodes that can't be offlined? Is it relevant?


Re: [PATCH 1/5] kfifo: remove unnecessary type check

2013-01-09 Thread Yuanhan Liu
On Wed, Jan 09, 2013 at 04:29:39PM +0100, Stefani Seibold wrote:
> Am Mittwoch, den 09.01.2013, 10:35 +0800 schrieb Yuanhan Liu:
> > On Tue, Jan 08, 2013 at 10:51:04PM +0100, Stefani Seibold wrote:
> > > Am Dienstag, den 08.01.2013, 22:57 +0800 schrieb Yuanhan Liu:
> > > > Firstly, this kind of type check doesn't work. It does something similar
> > > > to the following:
> > > > void * __dummy = NULL;
> > > > __buf = __dummy;
> > > > 
> > > > __dummy is defined as void *. Thus it will not trigger warnings as
> > > > expected.
> > > > 
> > > > Second, we don't need that kind of check, since the prototype
> > > > of __kfifo_out is:
> > > > unsigned int __kfifo_out(struct __kfifo *fifo, void *buf, unsigned int len)
> > > > 
> > > > buf is defined as void *, so we don't need to do the type check. Remove it.
> > > > 
> > > 
> > > That's wrong.
> > > 
> > > First, the type checking is used in kfifo_put() and kfifo_in() for
> > > const types, to check whether the type of the passed data can be
> > > converted to the fifo element type.
> > 
> > Hi Stefani,
> > 
> > Yes, I see now. After rechecking the code, I found that this kind of
> > type checking only works for statically defined kfifos created by
> > DECLARE/DEFINE_KFIFO, as the ptrtype is the same as the data type:
> > 
> > /* the 4th argument "type" is "ptrtype" */
> > #define STRUCT_KFIFO(type, size) struct __STRUCT_KFIFO(type, size, 0, type)
> > 
> > #define DECLARE_KFIFO(fifo, type, size) STRUCT_KFIFO(type, size) fifo
> > 
> > However, for dynamically allocated kfifos, the type checking will not
> > work as expected, as ptrtype is always "void":
> > 
> > struct kfifo __STRUCT_KFIFO_PTR(unsigned char, 0, void);
> > 
> 
> You should avoid using struct kfifo; as you can read in kfifo.h, it is
> only there for compatibility reasons.

Well, the fact is that struct kfifo is used far more widely than
DECLARE/DEFINE_KFIFO: roughly more than 50 users versus fewer than 10.

Thanks.

--yliu
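The point about `ptrtype` can be made concrete with a simplified stand-in. TOY_STRUCT_KFIFO and IS_VOID_PTR are hypothetical macros that keep only the `ptr` member the real type check relies on; C11 `_Generic` reports whether that member is `void *`.

```c
#include <assert.h>

/*
 * Simplified, hypothetical stand-in for the kfifo struct macros: only the
 * 'ptr' member that the type check relies on is kept.
 */
#define TOY_STRUCT_KFIFO(datatype, ptrtype) \
    struct { datatype *type; ptrtype *ptr; unsigned int len; }

/* 1 if the expression has type void *, else 0 (C11 _Generic). */
#define IS_VOID_PTR(p) _Generic((p), void *: 1, default: 0)

/* Static-style fifo: ptrtype equals the element type (as with DECLARE_KFIFO). */
typedef TOY_STRUCT_KFIFO(int, int) toy_static_fifo;

/* Dynamic-style fifo: ptrtype is void (as with struct kfifo). */
typedef TOY_STRUCT_KFIFO(unsigned char, void) toy_dynamic_fifo;
```

When `ptr` is `void *`, a check of the form `__typeof__(fifo->ptr) __buf = buf;` accepts any object pointer silently; when it is `int *`, the compiler can warn about an incompatible pointer type.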
> 
> Use the macros DECLARE_KFIFO_PTR(), DECLARE_KFIFO() or
> DEFINE_KFIFO() instead.
> 
> Have a look at the example files in the samples/kfifo directory.
> 
> - Stefani
> 


Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-09 Thread Thierry Reding
On Wed, Jan 09, 2013 at 10:10:49PM +, Arnd Bergmann wrote:
> On Wednesday 09 January 2013, Thierry Reding wrote:
> > What happens on Tegra is that we need to map 256 MiB of physical memory
> > to access all the PCIe extended configuration space. However, ioremap()
> > on such a large region fails if not enough vmalloc() space is available.
> > 
> > This was observed when somebody tested this on CardHu, which has 1 GiB
> > of RAM, and therefore remapping the full 256 MiB fails.
> 
> Hmm, config space accesses are fairly rare and generally not expected
> to be fast, and 256 MB is really a huge waste of virtual address space,
> so I agree that just ioremapping the entire space is not a good
> solution.
> 
> However, it's not clear that a cache is necessary. Have you measured
> a significant performance benefit of this implementation over just
> ioremapping and unmapping a single page for every config space access?

No, I had not actually implemented it that way, because I figured I might
just as well implement something generic, with the added benefit that
most remapping operations would be cached automatically: the PCI
enumeration algorithms usually access the configuration space of a
single device at a time, which is the best case for an LRU-based cache.

> Even if we actually want a cache, how about a private implementation
> that just remembers a single page in LRU? I doubt that there are
> more drivers that would benefit from a generalized version that you
> provide.

I can move the code to the Tegra PCIe driver, but there's quite a bit of
code that isn't actually related to the PCI functionality and I'd really
like to avoid cluttering the driver with this implementation. Keeping it
in a more central location will certainly increase the code's visibility
and make it easier for other potential users to find.

Also I just noticed that I hadn't actually added a parameter to the
iomap_cache_create() function to specify the maximum number of pages, so
currently the code only uses a single page anyway. It should be trivial
to change. I guess performance was good enough with a single page that I
didn't have a reason to increase the maximum number of pages.

Thierry




Re: linux-next: build warning after merge of the kvm tree

2013-01-09 Thread Gleb Natapov
On Thu, Jan 10, 2013 at 01:34:36PM +1100, Stephen Rothwell wrote:
> Hi Marcelo,
> 
> After merging the kvm tree, today's linux-next build (x86_64 allmodconfig)
> produced this warning:
> 
> In file included from arch/x86/kvm/mmu.c:3482:0:
> arch/x86/kvm/paging_tmpl.h: In function 'paging64_walk_addr_generic':
> arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' 
> [-Wunused-variable]
> In file included from arch/x86/kvm/mmu.c:3486:0:
> arch/x86/kvm/paging_tmpl.h: In function 'paging32_walk_addr_generic':
> arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' 
> [-Wunused-variable]
> 
> Introduced by commit 908e7d7999bc ("KVM: MMU: simplify folding of dirty
> bit into accessed_dirty").
> 
Was fixed yesterday. Thanks.

--
Gleb.


Re: linux-next: change contacts for the kvm tree

2013-01-09 Thread Gleb Natapov
On Thu, Jan 10, 2013 at 01:36:18PM +1100, Stephen Rothwell wrote:
> Hi Marcelo,
> 
> I have removed Avi as a contact for the kvm tree as the address I have
> (Avi Kivity ) bounces. 
Please add Gleb Natapov  instead.

--
Gleb.


Re: [PATCH 00/14] Rewrite Tegra PCIe driver

2013-01-09 Thread Thierry Reding
On Wed, Jan 09, 2013 at 10:25:17PM +0100, Thomas Petazzoni wrote:
> Dear Thierry Reding,
> 
> On Wed,  9 Jan 2013 21:43:00 +0100, Thierry Reding wrote:
> > This patch series contains an almost complete rewrite of the Tegra PCIe
> > driver. The code is moved to the drivers/pci/host directory and turned
> > into a proper platform driver, adding MSI and DT support while at it.
> > Other PCI host controller drivers can be added to that directory in an
> > attempt to make it easier to factor out common code.
> 
> Thanks!
> 
> I have started basing the Marvell PCIe code on some of your earlier
> versions. But apparently in this final version, you no longer have the
> emulated Host bridge. Why so?

The reason is that with the latest bindings the matching of root ports
to device tree nodes works as-is and nothing else indicates that the
emulated host bridge is actually required to make any of this work. So
in order not to introduce unneeded code I've left it out for now. If
somebody decides that we actually need this host bridge (for standards
compliance or whatnot) it could easily be added back.

However, before the emulated bridge implementation can be merged I think
the PCI ID issue needs to be resolved.

> For the Marvell PCIe code, I've used your emulated Host bridge, and
> added an emulated PCI-to-PCI bridge implementation, in order to get the
> following hierarchy:
> 
>  + Host Bridge
>+ PCI-to-PCI bridge
>  + PCI Device
>+ PCI-to-PCI bridge
>  + PCI device
> 
> So, I instantiate one unique emulated Host Bridge, and then one
> emulated PCI-to-PCI Bridge for each PCIe interface that I have.

Oh dear, that's even worse than on Tegra. The Marvell hardware doesn't
even expose the root ports as PCI devices on the bus?

> The nice thing about that is that I can then read the configuration
> space of the PCI-to-PCI bridge to find out how much I/O space and
> memory space is needed for the device connected to this interface, and
> at which address it has been mapped. This greatly helps my "address
> decoding" problem, and removes the ad-hoc virtual space allocator I had
> written.
> 
> Is there a reason for having given up on this idea? Is there still a
> hope for a different PCIe implementation to use this idea?

I suppose that in your case it really makes sense because you already
need the emulated PCI-to-PCI bridges and therefore adding an emulated
host bridge doesn't add much. As I said, for Tegra everything still
works without, so I didn't see a reason to add needless code.

Thierry




Re: PTRACE_SYSCALL && vsyscall (Was: arch_check_bp_in_kernelspace: fix the range check)

2013-01-09 Thread u3557
Hi Everyone,

> On 01/08, Pedro Alves wrote:
>>
>> On 12/04/2012 05:59 PM, Oleg Nesterov wrote:
>>
>> > But if we want to allow tracing vsyscalls, hw bp doesn't look very
>> > nice imo. HBP_NUM = 4 and you need to setup 3 bp's to trace them all.
>>
>> Irrespective of the whole syscall tracing issue, allowing HW bkpts in
>> the vsyscall just seems like a bug fix to me.
>
> And I never argued. I sent the patch iirc ;)

Exactly, it is a bug and I am still waiting for it to be fixed in the
Linux kernel.

Fully emulating PTRACE_SYSCALL could also provide a suitable way to
fix my problem, and it might also help others by sparing them the need
to program (and use up) x86 debug registers, but it doesn't change the
fact that my problem is caused by a bug in the first place, which
should be fixed in any case.

Best Regards,
Amnon.


>
>> > That is why I think PTRACE_SYSCALL should "simply work" somehow. And
>> > so far I think that "just report syscall_exit with orig_ax = -1" is
>> > the best (and simple) solution.
>>
>> If you report exit alone, you'll confuse current GDB into mistaking
>> it for an enter,
>
> Sure. That is why I asked Jan.
>
>> > OK. We can do more. We can report both syscall_enter/exit and we can
>> > change orig_ax/ax temporarily to "fool" the tracer, so that everything
>> > will look as a "normal" syscall. Like vsyscall_seccomp() does.
>> >
>> > But this needs much more changes.
>>
>> I'd just like to add, that if any new syscall related option is
>> to be added, can we please just go all the way and add
>> PTRACE_EVENT_SYSCALL_ENTER|PTRACE_EVENT_SYSCALL_EXIT instead?
>
> Oh yes, this was suggested many times.
>
> Oleg.
>
>




Re: [PATCH V3 1/2] virtio-net: fix the set affinity bug when CPU IDs are not consecutive

2013-01-09 Thread Rusty Russell
Wanlong Gao  writes:
> On 01/09/2013 07:31 AM, Rusty Russell wrote:
>> Wanlong Gao  writes:
>>>   */
>>>  static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
>>>  {
>>> -   int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
>>> - smp_processor_id();
>>> +   int txq = 0;
>>> +
>>> +   if (skb_rx_queue_recorded(skb))
>>> +   txq = skb_get_rx_queue(skb);
>>> +   else if ((txq = per_cpu(vq_index, smp_processor_id())) == -1)
>>> +   txq = 0;
>> 
>> You should use __get_cpu_var() instead of smp_processor_id() here, ie:
>> 
>> else if ((txq = __get_cpu_var(vq_index)) == -1)
>> 
>> And AFAICT, no reason to initialize txq to 0 to start with.
>> 
>> So:
>> 
>> 	int txq;
>> 
>> 	if (skb_rx_queue_recorded(skb))
>> 		txq = skb_get_rx_queue(skb);
>> 	else {
>> 		txq = __get_cpu_var(vq_index);
>> 		if (txq == -1)
>> 			txq = 0;
>> 	}
>
> Got it, thank you.
>
>> 
>> Now, just to confirm, I assume this can happen even if we use vq_index,
>> right, because of races with virtnet_set_channels?
>
> I still don't understand this race; could you explain more? Thank you.

I assume that someone can call virtnet_set_channels() while we are
inside virtnet_select_queue(), reducing dev->real_num_tx_queues and
causing virtnet_select_queue() to do:

	while (unlikely(txq >= dev->real_num_tx_queues))
		txq -= dev->real_num_tx_queues;

Otherwise, when would this loop ever run?

Thanks,
Rusty.
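A user-space model of the selection logic proposed above, including the wrap-around loop being discussed. The per-CPU vq_index is modelled as a plain array (toy_vq_index) and toy_select_queue() is a hypothetical name; in the kernel the value would come from __get_cpu_var(vq_index). Treating the loop as a guard against a concurrent reduction of real_num_tx_queues is Rusty's assumption, not a confirmed race.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

/* Per-CPU queue index modelled as an array; -1 means no queue bound. */
static int toy_vq_index[TOY_NR_CPUS] = { 0, 1, -1, 3 };

/* rx_queue < 0 means "no RX queue recorded for this skb". */
static int toy_select_queue(int rx_queue, int cpu, int real_num_tx_queues)
{
    int txq;

    if (rx_queue >= 0) {
        txq = rx_queue;
    } else {
        txq = toy_vq_index[cpu];
        if (txq == -1)
            txq = 0;
    }

    /* Wrap into range if real_num_tx_queues is smaller than txq. */
    while (txq >= real_num_tx_queues)
        txq -= real_num_tx_queues;

    return txq;
}
```

With four TX queues the per-CPU index is used as-is; if the count drops to two while CPU 3 still maps to queue 3, the loop folds it back to queue 1.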


[PATCH] genirq: Give warning when setup an already-setup non-shared irq

2013-01-09 Thread Chuansheng Liu

When request_threaded_irq() is called twice in a row with the same irq,
the following mismatch message is printed:
"IRQ handler type mismatch for IRQ 323"

Print a more accurate warning instead, like the one below:
"Trying to setup already-setup non-shared IRQ 323"

Signed-off-by: liu chuansheng 
---
 kernel/irq/manage.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e49a288..6802ce1 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -999,6 +999,16 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
old_ptr = &desc->action;
old = *old_ptr;
if (old) {
+   /* Give a warning when setting up an irq which has
+    * already been set up.
+    */
+   if (!(old->flags & IRQF_SHARED)) {
+   ret = -EINVAL;
+   WARN(1, "Trying to setup already-setup non-shared IRQ %d\n",
+   irq);
+   goto out_mask;
+   }
+
/*
 * Can't share interrupts unless both agree to and are
 * the same type (level, edge, polarity). So both flag
-- 
1.7.0.4
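The effect of the new check can be modelled in user space. This is a simplified sketch, not __setup_irq(): TOY_IRQF_SHARED, struct toy_desc and toy_setup_irq() are hypothetical, each line holds at most one action, and the shared path is not modelled beyond reporting success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_IRQF_SHARED 0x1u  /* hypothetical stand-in for IRQF_SHARED */

struct toy_action { unsigned int flags; };

/* Hypothetical model of one IRQ line holding at most one action. */
struct toy_desc { struct toy_action *action; };

/* Reject a second setup of a line whose existing action is not shared. */
static int toy_setup_irq(struct toy_desc *desc, struct toy_action *new_action)
{
    struct toy_action *old = desc->action;

    if (old) {
        if (!(old->flags & TOY_IRQF_SHARED))
            return -EINVAL;   /* the patch also WARN()s at this point */
        return 0;             /* sharing path not modelled here */
    }
    desc->action = new_action;
    return 0;
}
```

A second setup of a non-shared line now fails early with -EINVAL, and the original action stays installed.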





Re: [PATCH V4 0/4] input: keyboard: tegra: cleanups and DT supports

2013-01-09 Thread Laxman Dewangan

Hi Dmitry,

On Monday 07 January 2013 10:22 PM, Stephen Warren wrote:
> On 01/06/2013 04:14 AM, Laxman Dewangan wrote:
>> This patch series:
>>   - fix build warning,
>>   - use devm_* for allocation,
>>   - make column/rows configuration through DT and
>>   - remove the rarely used  key mapping table.
>
> The series,
> Reviewed-by: Stephen Warren 



If you are fine with this series, could you please apply it? If there
are any comments on this series, I can address them to close this out.


Thanks,
Laxman



Re: [PATCH RESEND v1 01/16] vfs: introduce some data structures

2013-01-09 Thread Zhi Yong Wu
On Thu, Jan 10, 2013 at 8:48 AM, David Sterba  wrote:
> On Thu, Dec 20, 2012 at 10:43:20PM +0800, zwu.ker...@gmail.com wrote:
>> --- /dev/null
>> +++ b/fs/hot_tracking.c
>> @@ -0,0 +1,109 @@
>> +/*
>> + * fs/hot_tracking.c
>
> From what I've understood the file name written here is not wanted, so
> please drop it (and from .h too)
Done.
>
>> + *
>> + * Copyright (C) 2012 IBM Corp. All rights reserved.
>> + * Written by Zhi Yong Wu 
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public
>> + * License v2 as published by the Free Software Foundation.
>
> A short description of the hot tracking feature or pointer to the
> Documentation/ file would be nice here.
ok, Done
>
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include "hot_tracking.h"
>> +
>> +/* kmem_cache pointers for slab caches */
>
> This comment seems useless to me; it does not help understanding the code, it
> just restates what the C code says. There are more such redundant comments in
> the series, but I'm not going to point to all of them right now.
Removed.
>
>> +static struct kmem_cache *hot_inode_item_cachep __read_mostly;
>> +static struct kmem_cache *hot_range_item_cachep __read_mostly;
>> +
>
>> --- /dev/null
>> +++ b/include/linux/hot_tracking.h
>> +/* The common info for both following structures */
>> +struct hot_comm_item {
>> + struct rb_node rb_node; /* rbtree index */
>> + struct hot_freq_data hot_freq_data;  /* frequency data */
>> + spinlock_t lock; /* protects object data */
>> + struct kref refs;  /* prevents kfree */
>> +};
>> +
>> +/* An item representing an inode and its access frequency */
>> +struct hot_inode_item {
>> + struct hot_comm_item hot_inode; /* node in hot_inode_tree */
>> + struct hot_rb_tree hot_range_tree; /* tree of ranges */
>> + spinlock_t lock; /* protect range tree */
>> + struct hot_rb_tree *hot_inode_tree;
>> + u64 i_ino; /* inode number from inode */
>> +};
>
> Please align the comments to something like this (or drop them if they seem
> redundant):
Done
>
> /* The common info for both following structures */
> struct hot_comm_item {
> struct rb_node rb_node;  /* rbtree index */
> struct hot_freq_data hot_freq_data;  /* frequency data */
> spinlock_t lock; /* protects object data */
> struct kref refs;/* prevents kfree */
> struct list_head n_list; /* list node index */
> };
>
> /* An item representing an inode and its access frequency */
> struct hot_inode_item {
> struct hot_comm_item hot_inode;  /* node in hot_inode_tree */
> struct hot_rb_tree hot_range_tree;   /* tree of ranges */
> spinlock_t lock; /* protect range tree */
> struct hot_rb_tree *hot_inode_tree;
> u64 i_ino;   /* inode number from inode */
> };
>
>> +extern void __init hot_cache_init(void);
>
> this belongs to the private include fs/hot_tracking.h (because this is called
> only once by vfs init and not by filesystems), there's
> hot_track_init(superblock) for that purpose introduced later.
Done, Move it to fs/hot_tracking.h
>
>
> david
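Editorial sketch: the structures under review embed a common part (struct hot_comm_item, which carries the struct kref) inside struct hot_inode_item, and the outer object is freed only when the last reference goes away. A minimal user-space illustration of that pattern, assuming a plain, non-atomic int counter in place of struct kref and a local model_container_of() macro in place of the kernel's container_of(); all model_* names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Like the kernel's container_of(), minus its type checking. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-ins for the structures under review. */
struct model_comm_item {
	int refs;			/* plays the role of struct kref */
};

struct model_inode_item {
	struct model_comm_item hot_inode;	/* embedded common part */
	unsigned long long i_ino;		/* inode number */
};

static int model_frees;	/* number of objects released so far */

static struct model_inode_item *model_inode_alloc(unsigned long long ino)
{
	struct model_inode_item *he = calloc(1, sizeof(*he));

	if (!he)
		return NULL;
	he->hot_inode.refs = 1;	/* the creator holds the first reference */
	he->i_ino = ino;
	return he;
}

static void model_item_get(struct model_comm_item *ci)
{
	ci->refs++;
}

/* Drop a reference; free the outer object when the last one goes away.
 * Returns 1 if the object was freed. */
static int model_item_put(struct model_comm_item *ci)
{
	assert(ci->refs > 0);
	if (--ci->refs > 0)
		return 0;
	free(model_container_of(ci, struct model_inode_item, hot_inode));
	model_frees++;
	return 1;
}
```

Code that only sees the embedded struct model_comm_item can still release the whole struct model_inode_item, which is why the common part can live in generic helpers.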



-- 
Regards,

Zhi Yong Wu


Re: [PATCH v3 0/3] ARM: dts: omap: add dt data for MUSB

2013-01-09 Thread kishon

On Friday 28 December 2012 12:05 AM, Aaro Koskinen wrote:

Hi,

On Thu, Sep 20, 2012 at 05:21:15AM +0200, Benoit Cousson wrote:

On 09/19/2012 11:32 AM, Kishon Vijay Abraham I wrote:

This patch series adds dt data to get MUSB working in omap4 and omap3

Changes from v2:
* Changed the subject of all the patches to include "ARM: dts:"
* Added reg property and interrupt property for "usb_otg_hs". Previously these
   were obtained from ti,hwmods property.
* Rebased on
   git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git devel-dt

Changes from v1:
Just removed the omap-usb2 dt data and sent that as a separate patch.

Kishon Vijay Abraham I (3):
   ARM: dts: Add twl6030-usb data
   ARM: dts: Add twl4030-usb data
   ARM: dts: omap: Add usb_otg and glue data


Thanks for the update. I've just pulled the series for 3.7.


I wonder what happened to the patch #3 (Add usb_otg and glue data)
of this series? Why was it dropped? I cannot see it in 3.7 or 3.8-rc1.


Benoit?

-Kishon


Re: [PATCH v6 04/15] memory-hotplug: remove /sys/firmware/memmap/X sysfs

2013-01-09 Thread Tang Chen

Hi Andrew,

On 01/10/2013 07:19 AM, Andrew Morton wrote:

...

+   entry = firmware_map_find_entry(start, end - 1, type);
+   if (!entry)
+   return -EINVAL;
+
+   firmware_map_remove_entry(entry);

...



The above code looks racy.  After firmware_map_find_entry() does the
spin_unlock() there is nothing to prevent a concurrent
firmware_map_remove_entry() from removing the entry, so the kernel ends
up calling firmware_map_remove_entry() twice against the same entry.

An easy fix for this is to hold the spinlock across the entire
lookup/remove operation.


This problem is inherent to firmware_map_find_entry() as you have
implemented it, so this function simply should not exist in the current
form - no caller can use it without being buggy!  A simple fix for this
is to remove the spin_lock()/spin_unlock() from
firmware_map_find_entry() and add locking documentation to
firmware_map_find_entry(), explaining that the caller must hold
map_entries_lock and must not release that lock until processing of
firmware_map_find_entry()'s return value has completed.
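Editorial sketch of the locking rule Andrew describes: the lookup helper no longer takes the lock itself, and the caller holds it across both the lookup and the removal. This is a user-space model under stated assumptions: a pthread mutex stands in for the map_entries_lock spinlock, and the model_* names and the simplified entry structure are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified firmware map entry and list. */
struct model_fw_entry {
	unsigned long long start, end;
	struct model_fw_entry *next;
};

static struct model_fw_entry *model_map_entries;
static pthread_mutex_t model_map_lock = PTHREAD_MUTEX_INITIALIZER;

static int model_add_entry(unsigned long long start, unsigned long long end)
{
	struct model_fw_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->start = start;
	e->end = end;
	pthread_mutex_lock(&model_map_lock);
	e->next = model_map_entries;
	model_map_entries = e;
	pthread_mutex_unlock(&model_map_lock);
	return 0;
}

/*
 * Caller must hold model_map_lock and keep holding it until it has
 * finished using the returned link.
 */
static struct model_fw_entry **
model_find_entry_locked(unsigned long long start, unsigned long long end)
{
	struct model_fw_entry **pp;

	assert(start <= end);
	for (pp = &model_map_entries; *pp; pp = &(*pp)->next)
		if ((*pp)->start == start && (*pp)->end == end)
			return pp;
	return NULL;
}

/* Lookup and unlink happen in one critical section, so two callers can
 * never both remove the same entry. Returns 0, or -1 if not found. */
static int model_remove_entry(unsigned long long start, unsigned long long end)
{
	struct model_fw_entry **pp, *victim = NULL;

	pthread_mutex_lock(&model_map_lock);
	pp = model_find_entry_locked(start, end);
	if (pp) {
		victim = *pp;
		*pp = victim->next;
	}
	pthread_mutex_unlock(&model_map_lock);

	if (!victim)
		return -1;
	free(victim);	/* safe: no longer reachable from the list */
	return 0;
}
```

A second removal of the same range simply finds nothing, rather than operating on an entry another thread has already unlinked.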


Thank you for your advice, I'll fix it soon.

Since you have already merged the patch set, do I need to resend all of these
patches, or should I just send a fix on top of the current one?

Thanks. :)







Re: [PATCH v6 02/15] memory-hotplug: check whether all memory blocks are offlined or not when removing memory

2013-01-09 Thread Tang Chen

Hi Andrew,

On 01/10/2013 07:11 AM, Andrew Morton wrote:

On Wed, 9 Jan 2013 17:32:26 +0800
Tang Chen  wrote:


We remove the memory like this:
1. lock memory hotplug
2. offline a memory block
3. unlock memory hotplug
4. repeat 1-3 to offline all memory blocks
5. lock memory hotplug
6. remove memory(TODO)
7. unlock memory hotplug

All memory blocks must be offlined before removing memory. But we don't hold
the lock for the whole operation, so we should check whether all memory blocks
are offlined before step 6. Otherwise, the kernel may panic.
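Editorial sketch of why the re-check matters: because the lock is dropped between offlining blocks, a block could be onlined again before removal starts, so removal re-verifies the state under the lock. A user-space model, assuming a pthread mutex in place of lock_memory_hotplug()/unlock_memory_hotplug() and a plain array of block states; all model_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_ONLINE, MODEL_OFFLINE };

/* Hypothetical memory blocks belonging to one device (all start online). */
#define MODEL_NR_BLOCKS 4
static enum model_state model_blocks[MODEL_NR_BLOCKS];
static pthread_mutex_t model_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Steps 1-3: offline one block under the lock, then drop the lock. */
static void model_offline_block(size_t i)
{
	assert(i < MODEL_NR_BLOCKS);
	pthread_mutex_lock(&model_hotplug_lock);
	model_blocks[i] = MODEL_OFFLINE;
	pthread_mutex_unlock(&model_hotplug_lock);
}

/* Caller holds model_hotplug_lock. */
static bool model_all_offline_locked(void)
{
	for (size_t i = 0; i < MODEL_NR_BLOCKS; i++)
		if (model_blocks[i] != MODEL_OFFLINE)
			return false;
	return true;
}

/* Steps 5-7: re-take the lock and refuse to remove if any block was
 * onlined again while the lock was dropped. Returns 0, or -1 on refusal. */
static int model_remove_memory(void)
{
	int ret = 0;

	pthread_mutex_lock(&model_hotplug_lock);
	if (!model_all_offline_locked())
		ret = -1;
	/* otherwise the actual removal (step 6) would happen here */
	pthread_mutex_unlock(&model_hotplug_lock);
	return ret;
}
```

The check and the removal share one critical section, so nothing can change a block's state between them.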


Well, the obvious question is: why don't we hold lock_memory_hotplug()
for all of steps 1-4?  Please send the reasons for this in a form which
I can paste into the changelog.


In the changelog form:

Offlining a memory block and removing a memory device can be two
different operations. Users can offline some memory blocks without
removing the memory device. For this purpose, the kernel already takes
lock_memory_hotplug() in __offline_pages(). To reuse that code for
memory hot-remove, we repeat steps 1-3 to offline all the memory blocks,
locking and unlocking memory hotplug for each block rather than holding
the memory hotplug lock for the whole operation.




Actually, I wonder if doing this would fix a race in the current
remove_memory() repeat: loop.  That code does a
find_memory_block_hinted() followed by offline_memory_block(), but
afaict find_memory_block_hinted() only does a get_device().  Is the
get_device() sufficiently strong to prevent problems if another thread
concurrently offlines or otherwise alters this memory_block's state?


I think we already have memory_block->state_mutex to protect against
concurrent changes to a memory_block's state.

The find_memory_block_hinted() here is to find the memory_block
corresponding to the memory section we are dealing with.

Thanks. :)







Re: [PATCH v6 04/15] memory-hotplug: remove /sys/firmware/memmap/X sysfs

2013-01-09 Thread Tang Chen

Hi Andrew,

On 01/10/2013 06:49 AM, Andrew Morton wrote:

On Wed, 9 Jan 2013 17:32:28 +0800
Tang Chen  wrote:


When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
sysfs files are created. But there is no code to remove these files. The patch
implements the function to remove them.

Note: The code does not free firmware_map_entry, which is allocated by bootmem,
   so the patch leaks memory. But I think the leak is very small,
   and it does not affect the system.


Well that's bad.  Can we remember the address of that memory and then
reuse the storage if/when the memory is re-added?  That at least puts an upper
bound on the leak.


I think we can do this. I'll post a new patch to do so.

Thanks. :)
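Editorial sketch of the reuse idea Andrew suggests: entries whose storage came from bootmem cannot be freed, so instead of being dropped they are parked on a list and handed out again on the next add, which bounds the leak by the number of such entries. The from_bootmem flag and all model_* names are hypothetical; the follow-up patch may well track this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical entry; from_bootmem marks storage that cannot be freed. */
struct model_map_entry {
	unsigned long long start, end;
	bool from_bootmem;
	struct model_map_entry *next;
};

/* Entries that were removed but could not be freed, kept for reuse. */
static struct model_map_entry *model_reuse_list;

/* Get storage for a new entry, recycling leaked storage first. */
static struct model_map_entry *model_entry_get(void)
{
	struct model_map_entry *e = model_reuse_list;

	if (e) {
		model_reuse_list = e->next;
		e->next = NULL;
		return e;	/* still from_bootmem: it stays unfreeable */
	}
	return calloc(1, sizeof(struct model_map_entry));
}

/* Release an entry on removal. */
static void model_entry_put(struct model_map_entry *e)
{
	assert(e != NULL);
	if (e->from_bootmem) {
		/* Cannot free: remember it so a later re-add reuses it. */
		e->next = model_reuse_list;
		model_reuse_list = e;
	} else {
		free(e);
	}
}
```

With this scheme, repeated remove/re-add cycles keep reusing the same bootmem storage instead of leaking a new entry each time.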








[PATCH] Fix wrong EOF compare

2013-01-09 Thread Minchan Kim
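getc() returns an int so that EOF (typically -1) can be told apart from
every byte value, but storing getc()'s return value directly into a char
on a platform where char is unsigned turns EOF into 255, so the condition
below is always false.

This happens on my ARM system: the loop never ends and the program
eventually segfaults. This patch fixes it.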

*curr = getc(fin); // *curr = 255
if (*curr == EOF) return -1; // if ( 255 == -1)

Cc: Mel Gorman 
Cc: Andy Whitcroft 
Cc: Alexander Nyberg 
Signed-off-by: Minchan Kim 
---
 Documentation/page_owner.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/page_owner.c b/Documentation/page_owner.c
index f0156e1..b777fb6 100644
--- a/Documentation/page_owner.c
+++ b/Documentation/page_owner.c
@@ -32,12 +32,14 @@ int read_block(char *buf, FILE *fin)
 {
 	int ret = 0;
 	int hit = 0;
+	int value;
 	char *curr = buf;
 
 	for (;;) {
-		*curr = getc(fin);
-		if (*curr == EOF) return -1;
+		value = getc(fin);
+		if (value == EOF) return -1;
 
+		*curr = value;
 		ret++;
 		if (*curr == '\n' && hit == 1)
 			return ret - 1;
-- 
1.7.9.5
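Editorial sketch of the same pitfall in isolation: a line reader that keeps getc()'s result in an int until after the EOF comparison, and only then narrows it to char. model_read_line() is a hypothetical helper written for illustration, not the read_block() function from page_owner.c.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one '\n'-terminated line into buf (at most size-1 bytes, always
 * NUL-terminated). The getc() result stays an int so EOF remains
 * distinguishable from every byte value, including 0xff.
 * Returns the number of bytes stored, or -1 at EOF before any data.
 */
static int model_read_line(char *buf, size_t size, FILE *fin)
{
	size_t n = 0;
	int value = 0;

	assert(size > 0);
	while (n + 1 < size) {
		value = getc(fin);
		if (value == EOF)	/* compare before narrowing */
			break;
		buf[n++] = (char)value;
		if (value == '\n')
			break;
	}
	buf[n] = '\0';
	return (n == 0 && value == EOF) ? -1 : (int)n;
}
```

Because (unsigned char)EOF is always non-negative while EOF is a negative int, the original `*curr == EOF` test can never succeed where plain char is unsigned; where char is signed and EOF is -1, the same test would instead stop early on a 0xff byte.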



Re: [PATCH] cpufreq: Simplify __cpufreq_remove_dev()

2013-01-09 Thread Viresh Kumar
On 9 January 2013 21:09, Viresh Kumar  wrote:
> I have tried that too, it is also pushed at:
>
> https://lkml.org/lkml/2012/12/16/5

Bad link :(

http://git.linaro.org/gitweb?p=arm/big.LITTLE/mp.git;a=shortlog;h=refs/heads/cpufreq-fixes-v2


Re: [PATCH] cpufreq: Simplify __cpufreq_remove_dev()

2013-01-09 Thread Viresh Kumar
On 9 January 2013 21:09, Viresh Kumar  wrote:
> On 9 January 2013 16:50, Viresh Kumar  wrote:
>> [Probably need to simplify cpufreq_add_dev() too, but that can be done as
>> the next step.]
>
> I have tried that too, it is also pushed at:
>
> https://lkml.org/lkml/2012/12/16/5
>
> [Untested for now, will be doing it tomorrow]
>
> From: Viresh Kumar 
> Date: Wed, 9 Jan 2013 21:02:50 +0530
> Subject: [PATCH] cpufreq: Simplify cpufreq_add_dev()
>
> Currently cpufreq_add_dev() first allocates a policy, calls ->init() and then
> checks whether this cpu is already managed. If it is, the policy is freed
> again.
>
> We can avoid all this if we know in advance whether this cpu is managed.
> policy->related_cpus contains the list of all valid sibling cpus of
> policy->cpu, so we can check it to find out whether the current cpu is
> already managed.
>
> Signed-off-by: Viresh Kumar 

Tested-by: Viresh Kumar 
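Editorial sketch of the check described in the commit message above: look for an existing policy whose related_cpus already contains the cpu before allocating anything. This is a user-space model under assumptions: an unsigned long bitmask replaces cpumask_t, a hypothetical model_cpu_policy[] array replaces the per-cpu policy pointers, and the locking in cpufreq_add_dev() is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8

/* Hypothetical, reduced policy: one bit per cpu in related_cpus. */
struct model_policy {
	unsigned int cpu;		/* cpu that owns the policy */
	unsigned long related_cpus;	/* bitmask of sibling cpus */
};

/* Per-cpu policy pointer, NULL where no policy has been set up. */
static struct model_policy *model_cpu_policy[MODEL_NR_CPUS];

/*
 * Return the existing policy that already manages 'cpu', or NULL if a
 * new policy must be allocated (and ->init() called) for it.
 */
static struct model_policy *model_find_managing_policy(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	for (unsigned int i = 0; i < MODEL_NR_CPUS; i++) {
		struct model_policy *p = model_cpu_policy[i];

		if (p && (p->related_cpus & (1UL << cpu)))
			return p;
	}
	return NULL;
}
```

Only when this lookup returns NULL does the expensive allocate-and-init path need to run.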


Re: [TRIVIAL PATCH 11/26] powerpc: Convert print_symbol to %pSR

2013-01-09 Thread Benjamin Herrenschmidt
On Thu, 2012-12-13 at 11:58 +, Arnd Bergmann wrote:
> On Wednesday 12 December 2012, Joe Perches wrote:
> > Use the new vsprintf extension to avoid any possible
> > message interleaving.
> > 
> > Convert the #ifdef DEBUG block to a single pr_debug.
> > 
> > Signed-off-by: Joe Perches 
> 
> nice cleanup!

 ... which also breaks the build :-(

> Acked-by: Arnd Bergmann 

I'll fix it up locally.

Ben.



Re: [PATCH] sched: Get rid of unnecessary checks from select_idle_sibling

2013-01-09 Thread Namhyung Kim
On Wed, 09 Jan 2013 16:34:39 +0800, Alex Shi wrote:
> On 01/09/2013 03:54 PM, Namhyung Kim wrote:
>> Hi Alex,
>> 
>> On Wed, 09 Jan 2013 15:33:40 +0800, Alex Shi wrote:
>>> On 01/09/2013 02:50 PM, Namhyung Kim wrote:
 From: Namhyung Kim 

 AFAICS @target cpu of select_idle_sibling() is always either prev_cpu
 or this_cpu.  So no need to check it again and the conditionals can be
 consolidated.
>> [snip]
>>> Uh, we don't know if the target is this_cpu or the previous cpu. If we just
>>> check the target's idle status, we may miss another idle cpu. So this
>>> patch changes the logic in this function.
>> 
>> select_idle_sibling() is called only in select_task_rq_fair() if it
>> found a suitable affine_sd.  The default target is the 'prev_cpu' of the
>> task but if wake_affine() returns true it'd be (this) 'cpu'.
>> 
>> I cannot see where the prev_cpu or the cpu is set to another one before
>> calling select_idle_sibling.
>
> The old logic returns directly whenever prev_cpu or this cpu is idle,
> but your new logic has only one chance.

Sorry, I can't get your point.  Could you elaborate on it a bit more?

Thanks,
Namhyung
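Editorial sketch: the claim that the checks can be consolidated can be tested exhaustively on small inputs. The sketch assumes the pre-patch fast path consisted of the two checks `target == cpu && idle_cpu(cpu)` and `target == prev_cpu && idle_cpu(prev_cpu)`, and replaces idle_cpu() with a boolean array; all model_* names are hypothetical. The two forms agree whenever target is cpu or prev_cpu, and differ only if target can be some other cpu.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pre-patch form: two separate fast-path checks. */
static int model_old_fast_path(int target, int cpu, int prev_cpu,
			       const bool *idle)
{
	if (target == cpu && idle[cpu])
		return cpu;
	if (target == prev_cpu && idle[prev_cpu])
		return prev_cpu;
	return -1;	/* fall through to the sibling search */
}

/* Consolidated form proposed by the patch. */
static int model_new_fast_path(int target, const bool *idle)
{
	if (idle[target])
		return target;
	return -1;
}

/*
 * Compare both forms over every idle pattern and every (cpu, prev_cpu)
 * pair, restricted to target being cpu or prev_cpu. Returns true if
 * they always agree.
 */
static bool model_fast_paths_agree(int nr_cpus)
{
	bool idle[4];

	assert(nr_cpus > 0 && nr_cpus <= 4);
	for (unsigned int mask = 0; mask < (1u << nr_cpus); mask++) {
		for (int c = 0; c < nr_cpus; c++)
			idle[c] = mask & (1u << c);
		for (int cpu = 0; cpu < nr_cpus; cpu++)
			for (int prev = 0; prev < nr_cpus; prev++)
				for (int pick = 0; pick < 2; pick++) {
					int target = pick ? prev : cpu;

					if (model_old_fast_path(target, cpu, prev, idle) !=
					    model_new_fast_path(target, idle))
						return false;
				}
	}
	return true;
}
```

If target can only ever be cpu or prev_cpu, the consolidation is behaviour-preserving; the open question in the thread is whether any caller can pass a different target.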


Re: [PATCH] sched: Get rid of unnecessary checks from select_idle_sibling

2013-01-09 Thread Namhyung Kim
Hi Preeti,

On Wed, 09 Jan 2013 13:51:00 +0530, Preeti U. Murthy wrote:
> On 01/09/2013 12:20 PM, Namhyung Kim wrote:
>> From: Namhyung Kim 
>> 
>> AFAICS @target cpu of select_idle_sibling() is always either prev_cpu
>> or this_cpu.  So no need to check it again and the conditionals can be
>> consolidated.
[snip]
> If NUMA_TTWU_BIAS or NUMA_TTWU_TO is true (it is false by

I can't find those bits in the code.  I've checked v3.8-rc2,
next-20130110, tip/master and tip/numa/core but there's nothing like
the above.  Which tree are you referring to?


> default), cpu/prev_cpu can be changed to be a random node_cpu (the node
> that 'this_cpu' is on). In which case, even if the node cpu is idle, it
> would not be a viable target, it looks like. Maybe that is why
> select_idle_sibling() makes the check if the target is prev_cpu/this cpu.

Looking into tip/numa/core, I can see that there's a code added for
CONFIG_NUMA_BALANCING.  But still, it seems nothing changed on a path
from select_task_rq_fair() to select_idle_sibling() - i.e. if the
select_idle_sibling called, the target would be either prev_cpu or this
cpu.  Am I missing something?

Thanks,
Namhyung


[PATCH V2] vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and rename them

2013-01-09 Thread Miao Xie
writeback_inodes_sb(_nr)_if_idle() is re-implemented by replacing down_read()
with down_read_trylock() because:
- If ->s_umount is write-locked, the sb is not idle, so
  writeback_inodes_sb(_nr)_if_idle() need not wait for the lock.
- writeback_inodes_sb(_nr)_if_idle() grabs the s_umount lock when it wants to
  start writeback, which can cause a deadlock during umount. To avoid this,
  ext4 and btrfs implemented their own writeback functions instead of using
  writeback_inodes_sb(_nr)_if_idle(), but that introduced redundant code; it
  is better to implement a new writeback_inodes_sb(_nr)_if_idle().
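Editorial sketch of the trylock behaviour described above, modelled in user space with a POSIX rwlock standing in for ->s_umount and a flag standing in for writeback_in_progress(). The early return values follow the new try_to_writeback_inodes_sb_nr() in the diff below; returning 1 after the writeback itself is an assumption, since that part of the function is not shown here. struct model_sb and model_try_writeback() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical superblock stand-in. */
struct model_sb {
	pthread_rwlock_t s_umount;
	bool writeback_running;		/* stands in for writeback_in_progress() */
	unsigned long pages_written;
};

/*
 * Never blocks: returns 1 if writeback is already underway, 0 if
 * s_umount is held for writing (e.g. by umount), otherwise performs
 * the (simulated) writeback and returns 1.
 */
static int model_try_writeback(struct model_sb *sb, unsigned long nr)
{
	assert(sb != NULL);
	if (sb->writeback_running)
		return 1;
	if (pthread_rwlock_tryrdlock(&sb->s_umount) != 0)
		return 0;
	sb->pages_written += nr;	/* writeback_inodes_sb_nr() would run here */
	pthread_rwlock_unlock(&sb->s_umount);
	return 1;
}
```

The key difference from a plain read lock is that a writer holding s_umount makes the call return immediately instead of waiting, which is what removes the umount deadlock.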

The names of these two functions are cumbersome, so rename them to
try_to_writeback_inodes_sb(_nr).

This idea came from Christoph Hellwig.
Some code is from the patch of Kamal Mostafa.

Signed-off-by: Miao Xie 
---
Changelog v1 -> v2:
- do not remove EXPORT_SYMBOL of writeback_inodes_sb_nr()
---
 fs/btrfs/extent-tree.c| 20 +++-
 fs/ext4/inode.c   |  8 ++--
 fs/fs-writeback.c | 44 
 include/linux/writeback.h |  6 +++---
 4 files changed, 28 insertions(+), 50 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 521e9d4..f31abb1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3689,20 +3689,6 @@ static int can_overcommit(struct btrfs_root *root,
return 0;
 }
 
-static int writeback_inodes_sb_nr_if_idle_safe(struct super_block *sb,
-  unsigned long nr_pages,
-  enum wb_reason reason)
-{
-   if (!writeback_in_progress(sb->s_bdi) &&
-   down_read_trylock(&sb->s_umount)) {
-   writeback_inodes_sb_nr(sb, nr_pages, reason);
-   up_read(&sb->s_umount);
-   return 1;
-   }
-
-   return 0;
-}
-
 /*
  * shrink metadata reservation for delalloc
  */
@@ -3735,9 +3721,9 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
while (delalloc_bytes && loops < 3) {
max_reclaim = min(delalloc_bytes, to_reclaim);
nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
-   writeback_inodes_sb_nr_if_idle_safe(root->fs_info->sb,
-   nr_pages,
-   WB_REASON_FS_FREE_SPACE);
+   try_to_writeback_inodes_sb_nr(root->fs_info->sb,
+ nr_pages,
+ WB_REASON_FS_FREE_SPACE);
 
/*
 * We need to wait for the async pages to actually start before
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cbfe13b..5f6eef7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2512,12 +2512,8 @@ static int ext4_nonda_switch(struct super_block *sb)
/*
 * Start pushing delalloc when 1/2 of free blocks are dirty.
 */
-   if (dirty_blocks && (free_blocks < 2 * dirty_blocks) &&
-   !writeback_in_progress(sb->s_bdi) &&
-   down_read_trylock(&sb->s_umount)) {
-   writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE);
-   up_read(&sb->s_umount);
-   }
+   if (dirty_blocks && (free_blocks < 2 * dirty_blocks))
+   try_to_writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE);
 
if (2 * free_blocks < 3 * dirty_blocks ||
free_blocks < (dirty_blocks + EXT4_FREECLUSTERS_WATERMARK)) {
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 310972b..ad3cc46 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1332,47 +1332,43 @@ void writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
 EXPORT_SYMBOL(writeback_inodes_sb);
 
 /**
- * writeback_inodes_sb_if_idle -   start writeback if none underway
+ * try_to_writeback_inodes_sb_nr - try to start writeback if none underway
  * @sb: the superblock
- * @reason: reason why some writeback work was initiated
+ * @nr: the number of pages to write
+ * @reason: the reason of writeback
  *
- * Invoke writeback_inodes_sb if no writeback is currently underway.
+ * Invoke writeback_inodes_sb_nr if no writeback is currently underway.
  * Returns 1 if writeback was started, 0 if not.
  */
-int writeback_inodes_sb_if_idle(struct super_block *sb, enum wb_reason reason)
+int try_to_writeback_inodes_sb_nr(struct super_block *sb,
+ unsigned long nr,
+ enum wb_reason reason)
 {
-   if (!writeback_in_progress(sb->s_bdi)) {
-   down_read(&sb->s_umount);
-   writeback_inodes_sb(sb, reason);
-   up_read(&sb->s_umount);
+   if (writeback_in_progress(sb->s_bdi))
return 1;
-   } else
+
+   if (!down_read_trylock(&sb->s_umount))
return 0;
+
+   writeback_inodes_sb_nr(sb, nr, reason);
+

Re: [PATCH 1/1] uprobes: Teach handler_chain() to filter out the probed task

2013-01-09 Thread Srikar Dronamraju
* Oleg Nesterov  [2012-12-29 18:36:14]:

> Currently there are 2 problems with pre-filtering:
> 
> 1. It is not possible to add/remove a task (mm) after uprobe_register()
> 
> 2. A forked child inherits all breakpoints and uprobe_consumer can not
>    control this.
> 
> This patch does the first step to improve the filtering. handler_chain()
> removes the breakpoints installed by this uprobe from current->mm if all
> handlers return UPROBE_HANDLER_REMOVE.
> 
> Note that handler_chain() relies on ->register_rwsem to avoid the race
> with uprobe_register/unregister which can add/del a consumer, or even
> remove and then insert the new uprobe at the same address.
> 
> Perhaps we will add uprobe_apply_mm(uprobe, mm, is_register) and teach
> copy_mm() to do filter(UPROBE_FILTER_FORK), but I think this change makes
> sense anyway.
> 
> Note: instead of checking the retcode from uc->handler, we could add
> uc->filter(UPROBE_FILTER_BPHIT). But I think this is not optimal to
> call 2 hooks in a row. This buys nothing, and if handler/filter do
> something nontrivial they will probably do the same work twice.
> 
> Signed-off-by: Oleg Nesterov 

Acked-by: Srikar Dronamraju 

> ---
>  include/linux/uprobes.h |3 ++
>  kernel/events/uprobes.c |   58 ++
>  2 files changed, 51 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index c2df693..95d0002 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -35,6 +35,9 @@ struct inode;
>  # include 
>  #endif
> 
> +#define UPROBE_HANDLER_REMOVE		1
> +#define UPROBE_HANDLER_MASK		1
> +
>  enum uprobe_filter_ctx {
>   UPROBE_FILTER_REGISTER,
>   UPROBE_FILTER_UNREGISTER,
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index e2ebb6f..59b6e97 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -440,16 +440,6 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
>   return uprobe;
>  }
> 
> -static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
> -{
> - struct uprobe_consumer *uc;
> -
> - down_read(&uprobe->register_rwsem);
> - for (uc = uprobe->consumers; uc; uc = uc->next)
> - uc->handler(uc, regs);
> - up_read(&uprobe->register_rwsem);
> -}
> -
>  static void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
>  {
>   down_write(&uprobe->consumer_rwsem);
> @@ -882,6 +872,33 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
>   put_uprobe(uprobe);
>  }
> 
> +static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> + struct vm_area_struct *vma;
> + int err = 0;
> +
> + down_read(&mm->mmap_sem);
> + for (vma = mm->mmap; vma; vma = vma->vm_next) {
> + unsigned long vaddr;
> + loff_t offset;
> +
> + if (!valid_vma(vma, false) ||
> + vma->vm_file->f_mapping->host != uprobe->inode)
> + continue;
> +
> + offset = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
> + if (uprobe->offset <  offset ||
> + uprobe->offset >= offset + vma->vm_end - vma->vm_start)
> + continue;
> +
> + vaddr = offset_to_vaddr(vma, uprobe->offset);
> + err |= remove_breakpoint(uprobe, mm, vaddr);
> + }
> + up_read(&mm->mmap_sem);
> +
> + return err;
> +}
> +
>  static struct rb_node *
>  find_node_in_range(struct inode *inode, loff_t min, loff_t max)
>  {
> @@ -1435,6 +1452,27 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
>   return uprobe;
>  }
> 
> +static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
> +{
> + struct uprobe_consumer *uc;
> + int remove = UPROBE_HANDLER_REMOVE;
> +
> + down_read(&uprobe->register_rwsem);
> + for (uc = uprobe->consumers; uc; uc = uc->next) {
> + int rc = uc->handler(uc, regs);
> +
> + WARN(rc & ~UPROBE_HANDLER_MASK,
> + "bad rc=0x%x from %pf()\n", rc, uc->handler);
> + remove &= rc;
> + }
> +
> + if (remove && uprobe->consumers) {
> + WARN_ON(!uprobe_is_active(uprobe));
> + unapply_uprobe(uprobe, current->mm);
> + }
> + up_read(&uprobe->register_rwsem);
> +}
> +
>  /*
>   * Run handler and ask thread to singlestep.
>   * Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
> -- 
> 1.5.5.1
> 
> 
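Editorial sketch of how handler_chain() combines the consumers' return codes: remove starts as UPROBE_HANDLER_REMOVE and each rc is ANDed in, so the breakpoint is removed from current->mm only if every handler asks for it and at least one consumer exists. A user-space model; model_should_remove() and the MODEL_* constants are hypothetical mirrors of the names in the patch, and the bad counter stands in for the WARN().

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_HANDLER_REMOVE 1	/* mirrors UPROBE_HANDLER_REMOVE */
#define MODEL_HANDLER_MASK   1	/* mirrors UPROBE_HANDLER_MASK */

/*
 * Combine n handler return codes. Returns nonzero if the breakpoint
 * should be removed; *bad counts return codes with bits outside the
 * mask (the patch WARNs on those but still ANDs them in).
 */
static int model_should_remove(const int *rcs, size_t n, size_t *bad)
{
	int remove = MODEL_HANDLER_REMOVE;

	assert(bad != NULL);
	*bad = 0;
	for (size_t i = 0; i < n; i++) {
		if (rcs[i] & ~MODEL_HANDLER_MASK)
			(*bad)++;
		remove &= rcs[i];
	}
	/* like "remove && uprobe->consumers": no consumers, no removal */
	return remove && n > 0;
}
```

A single handler returning 0 is enough to keep the breakpoint installed for the current mm.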



Re: USB autosuspend vs. URB submission

2013-01-09 Thread Ming Lei
On Thu, Jan 10, 2013 at 7:05 AM, Laurent Pinchart
 wrote:
>
> I've never heard of such problems with the uvcvideo driver, and I don't see
> anything wrong with the code at first sight. The driver only submits URBs when

IMO, there is a minor fault in the error handling path of
uvc_status_start() inside uvc_v4l2_open(), and the 'users' count
should have been decreased before usb_autopm_put_interface().
In theory, the warning can be triggered when the device is
opened just between usb_autopm_put_interface() and
atomic_dec(&stream->dev->users), but I don't think it is
the cause of the report.

> starting the video capture (at that point no URB should be in flight) or in
> the URB completion handler (by definition the URB has completed then).
>
> I've had a quick look at the trace posted at
> https://bugzilla.redhat.com/show_bug.cgi?id=879462 but usbmon only shows URBs
> that are successfully submitted. I'm not sure what useful information I could
> get from the trace.

It might be useful to post the relevant 'dmesg' and the usbmon together.

Thanks,
-- 
Ming Lei


Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-09 Thread Viresh Kumar
On 10 January 2013 00:19, Tejun Heo  wrote:
> On Mon, Jan 07, 2013 at 11:37:22PM +0530, Viresh Kumar wrote:
>> We are talking about a core being idle from the scheduler's perspective :)
>
> But it's not like cpu doesn't consume power if scheduler considers it
> idle, right?  Can you please explain in detail how this contributes to
> saving power?  Is it primarily about routing work items to lower power
> CPUs?  And please don't point to presentation slides.  They don't seem
> to explain much better, and patches and the code should be able to
> describe themselves.  Here, more so, as the desired behavior change
> and the resulting powersave are rather subtle.

I understand your concerns. Firstly, when a cpu is idle from the scheduler's
perspective, it consumes a lot of power.

queue_work_on_any_cpu() would queue the work on another cpu only when the
current cpu is idle from the scheduler's perspective. This can only happen
when the cpu was actually idle (in a low power state), was woken up by some
interrupt/timer and is then asked to queue a work item.

The idea is to choose another, non-idle cpu at this point, so that the current
cpu can immediately go back into a deeper idle state. This way cpus can stay
in deeper idle states for longer, rather than running work items.

And in cases where work items are re-armed from their handlers, this can cause
significant power loss, which could easily be saved by pushing this work to
non-idle cpus.

The same approach is already taken for deferred timers, which use a similar
routine.
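Editorial sketch of the selection policy described above. It assumes a boolean array in place of the scheduler's notion of idleness and simply picks the first non-idle cpu; model_pick_cpu() is hypothetical, is not the proposed queue_work_on_any_cpu(), and ignores concerns such as cpu hotplug that a real implementation would need to handle.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Keep work local unless the local cpu is idle from the scheduler's
 * point of view; in that case prefer any busy cpu so the local one can
 * go back to a deep idle state. Falls back to the local cpu when every
 * cpu is idle. Returns the chosen cpu.
 */
static int model_pick_cpu(int this_cpu, const bool *sched_idle, int nr_cpus)
{
	assert(this_cpu >= 0 && this_cpu < nr_cpus);
	if (!sched_idle[this_cpu])
		return this_cpu;
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (!sched_idle[cpu])
			return cpu;
	return this_cpu;
}
```

A busy local cpu keeps its work (no change from today); only a cpu that was woken briefly from idle hands the work elsewhere.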


Re: [PATCH V3 4/8] memcg: add per cgroup dirty pages accounting

2013-01-09 Thread Kamezawa Hiroyuki

(2013/01/10 13:26), Sha Zhengju wrote:


But this method also has its pros and cons (e.g. it needs lock nesting). So
I wonder whether the following is able to deal with all of these issues
together:
(CPU-A does "page stat accounting" and CPU-B does "move")

        CPU-A                               CPU-B

move_lock_mem_cgroup()
memcg = pc->mem_cgroup
SetPageDirty(page)
move_unlock_mem_cgroup()
                                      move_lock_mem_cgroup()
                                      if (PageDirty) {
                                              old_memcg->nr_dirty --;
                                              new_memcg->nr_dirty ++;
                                      }
                                      pc->mem_cgroup = new_memcg
                                      move_unlock_mem_cgroup()

memcg->nr_dirty ++


For CPU-A, we save pc->mem_cgroup in a temporary variable just before
SetPageDirty, inside move_lock, and then update the stats if PG_dirty was
set successfully. CPU-B may do the "moving" in between, so that
"old_memcg->nr_dirty --" makes old_memcg->nr_dirty incorrect, but CPU-A
then does "memcg->nr_dirty ++" right afterwards, which corrects the stats.
There is a potential problem that old_memcg->nr_dirty may be negative for
a very short period, but that is not a big issue IMHO.
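Editorial sketch: replaying the interleaving above sequentially shows that the counts converge, with old_memcg->nr_dirty dipping to -1 in between. The move lock is represented only by the order of the steps, the structures are hypothetical reductions, RCU protection of the saved memcg pointer is not modelled, and the stats are updated only if the page was newly dirtied, following "if PG_dirty was set successfully".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced structures for the interleaving. */
struct model_memcg { long nr_dirty; };
struct model_page  { bool dirty; struct model_memcg *mem_cgroup; };

/*
 * Replay CPU-A / CPU-B in the order shown. Returns the lowest value
 * old->nr_dirty reached, exposing the transient negative count.
 */
static long model_replay(struct model_page *pg, struct model_memcg *old,
			 struct model_memcg *new_cg)
{
	long lowest = old->nr_dirty;

	assert(pg->mem_cgroup == old);

	/* CPU-A, under move_lock: remember the owner, then dirty the page. */
	struct model_memcg *memcg = pg->mem_cgroup;
	bool newly_dirty = !pg->dirty;	/* test-and-set semantics */
	pg->dirty = true;

	/* CPU-B, under move_lock: move the page and its dirty count. */
	if (pg->dirty) {
		old->nr_dirty--;
		new_cg->nr_dirty++;
	}
	pg->mem_cgroup = new_cg;
	if (old->nr_dirty < lowest)
		lowest = old->nr_dirty;

	/* CPU-A, after move_unlock: account against the saved memcg. */
	if (newly_dirty)
		memcg->nr_dirty++;

	return lowest;
}
```

The final counts match the page's new owner, which is the property the proposal relies on; the saved memcg pointer must stay valid until the last step, as noted in the reply below.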



IMHO, this will work. Please make sure that the recorded memcg is still a
valid pointer when you update nr_dirty later.
(Maybe RCU will protect it.)

_If_ this method can handle the "nesting" problem more cleanly and make the
implementation simpler, please go ahead. To be honest, I'm not sure what the
code will look like until I see the patch. Hmm, why do you write SetPageDirty()
here rather than TestSetPageDirty()?

Thanks,
-Kame






[PATCH] serial:ifx6x60:Delete SPI timer when shut down port

2013-01-09 Thread channing

When shutting down the SPI port, MRDY may already have been asserted and an
SPI timer may be active, waiting for SRDY to be asserted. In that case the
timer needs to be deleted.

Signed-off-by: Chen Jun 
Signed-off-by: channing 
---
 drivers/tty/serial/ifx6x60.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/tty/serial/ifx6x60.c b/drivers/tty/serial/ifx6x60.c
index 675d94a..7eed323 100644
--- a/drivers/tty/serial/ifx6x60.c
+++ b/drivers/tty/serial/ifx6x60.c
@@ -637,6 +637,7 @@ static void ifx_port_shutdown(struct tty_port *port)
 
 	clear_bit(IFX_SPI_STATE_IO_AVAILABLE, &ifx_dev->flags);
 	mrdy_set_low(ifx_dev);
+	del_timer(&ifx_dev->spi_timer);
 	clear_bit(IFX_SPI_STATE_TIMER_PENDING, &ifx_dev->flags);
 	tasklet_kill(&ifx_dev->io_work_tasklet);
 }
-- 
1.7.1





Re: [PATCH V3 4/8] memcg: add per cgroup dirty pages accounting

2013-01-09 Thread Sha Zhengju
On Thu, Jan 10, 2013 at 10:16 AM, Kamezawa Hiroyuki
 wrote:
> (2013/01/10 0:02), Sha Zhengju wrote:
>>
>> On Mon, Jan 7, 2013 at 3:25 PM, Kamezawa Hiroyuki
>>  wrote:
>>>
>>> (2013/01/05 13:48), Sha Zhengju wrote:


 On Wed, Jan 2, 2013 at 6:44 PM, Michal Hocko  wrote:
>
>
> On Wed 26-12-12 01:26:07, Sha Zhengju wrote:
>>
>>
>> From: Sha Zhengju 
>>
>> This patch adds memcg routines to count dirty pages, which allows the memory
>> controller to maintain an accurate view of the amount of its dirty memory
>> and can provide some info for users while cgroup's direct reclaim is working.
>
>
>
> I guess you meant targeted resp. (hard/soft) limit reclaim here,
> right? It is true that this is direct reclaim but it is not clear to me



 Yes, I meant memcg hard/soft reclaim here which is triggered directly
 by allocation and is distinct from background kswapd reclaim (global).

> why the usefulness should be limited to the reclaim for users. I would
> understand this if the users were in fact in-kernel users.
>

 One of the reasons I'm trying to account the dirty pages is to get a
 broader overall view of memory usage, because memcg hard/soft
 reclaim may have an effect on the response time of user applications.
 Yeah, the beneficiary can be the application administrator or kernel users.
 :P

> [...]
>>
>>
>> To prevent the AB/BA deadlock mentioned by Greg Thelen in the previous version
>> (https://lkml.org/lkml/2012/7/30/227), we adjust the lock order:
>> ->private_lock --> mapping->tree_lock --> memcg->move_lock.
>> So we need to take mapping->tree_lock ahead of TestSetPageDirty in
>> __set_page_dirty() and __set_page_dirty_nobuffers(). But in order to avoid
>> useless spinlock contention, a preliminary PageDirty() check is added.
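
The ordering plus the preliminary check can be sketched in userspace as below. This assumes pthread mutexes as stand-ins for mapping->tree_lock and memcg->move_lock (private_lock is omitted) and an atomic flag for PG_dirty; all names are hypothetical. The unlocked pre-check only skips work: the authoritative test-and-set still happens under both locks, which are taken in the fixed order and released in reverse.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the kernel uses spinlocks, not pthread mutexes. */
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t move_lock = PTHREAD_MUTEX_INITIALIZER;

struct page_model {
	atomic_bool dirty;	/* stands in for PG_dirty */
	long *nr_dirty;		/* owning memcg's dirty counter */
};

/* Returns 1 if this call dirtied the page, 0 if it was already dirty. */
static int set_page_dirty_sketch(struct page_model *pg)
{
	int newly_dirty = 0;

	/* Cheap pre-check: already-dirty pages never touch the locks. */
	if (atomic_load(&pg->dirty))
		return 0;

	/* Fixed order: tree_lock before move_lock. */
	pthread_mutex_lock(&tree_lock);
	pthread_mutex_lock(&move_lock);
	if (!atomic_exchange(&pg->dirty, true)) {	/* TestSetPageDirty() */
		(*pg->nr_dirty)++;
		newly_dirty = 1;
	}
	pthread_mutex_unlock(&move_lock);
	pthread_mutex_unlock(&tree_lock);
	return newly_dirty;
}
```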
>
>
>
> But there is another AA deadlock here I believe.
> page_remove_rmap
>   mem_cgroup_begin_update_page_stat          <<< 1
>   set_page_dirty
>     __set_page_dirty_buffers
>       __set_page_dirty
>         mem_cgroup_begin_update_page_stat    <<< 2
>           move_lock_mem_cgroup
>             spin_lock_irqsave(&memcg->move_lock, *flags);
>
> mem_cgroup_begin_update_page_stat is not recursive wrt. locking AFAICS
> because we might race with the moving charges:
>   CPU0                                  CPU1
> page_remove_rmap
>                                         mem_cgroup_can_attach
>   mem_cgroup_begin_update_page_stat (1)
>     rcu_read_lock
>                                           mem_cgroup_start_move
>                                             atomic_inc(&memcg_moving)
>                                             atomic_inc(&memcg->moving_account)
>                                             synchronize_rcu
>     __mem_cgroup_begin_update_page_stat
>       mem_cgroup_stolen <<< TRUE
>       move_lock_mem_cgroup
>   [...]
>         mem_cgroup_begin_update_page_stat (2)
>           __mem_cgroup_begin_update_page_stat
>             mem_cgroup_stolen <<< still TRUE
>             move_lock_mem_cgroup  <<< DEADLOCK
>   [...]
>   mem_cgroup_end_update_page_stat
>     rcu_unlock
>                                           # wake up from synchronize_rcu
>                                         [...]
>                                         mem_cgroup_move_task
>                                           mem_cgroup_move_charge
>                                             walk_page_range
>                                               mem_cgroup_move_account
>                                                 move_lock_mem_cgroup
>
> Maybe I have missed some other locking which would prevent this from
> happening but the locking relations are really complicated in this area
> so if mem_cgroup_{begin,end}_update_page_stat might be called
> recursively then we need a fat comment which justifies that.
>

 Ohhh... good catch! I didn't notice there is a recursive call of
 mem_cgroup_{begin,end}_update_page_stat in page_remove_rmap().
 The mem_cgroup_{begin,end}_update_page_stat() design has depressed
 me a lot recently, as the lock granularity is a little bigger than I
 thought. Not only the resource but also some code logic is within the
 locked range, which may be deadlock prone. The problem will still exist
 if we try to add stat accounting for other memcg pages later, so may I
 be so bold as to suggest that we dig into the lock again...

 But with regard to the current lock implementation, I wonder whether we
 can account MEM_CGROUP_STAT_FILE_{MAPPED, DIRTY} in one go and just
 try to take move_lock once at the beginning. IMHO we can make
 mem_cgroup_{begin,end}_update_page_stat() recursion-aware and what

Re: [PATCH RFT] regulator: lp8788-buck: Silence build warning

2013-01-09 Thread Fengguang Wu

Thanks for the fix! It works!

Tested-by: Fengguang Wu  



Re: [PATCH v2] sysctl: Enable IA64 "ignore-unaligned-usertrap" to be used cross-arch

2013-01-09 Thread Vineet Gupta
On Thursday 10 January 2013 12:25 AM, Tony Luck wrote:
> Ok. Queued for 3.9 merge window.  It should show up in linux-next in
> the next day or two.  We'll see if anyone complains about the name
> SYSCTL_ARCH_UNALIGN_NO_WARN (every time I touch the Kconfig
> bits it seems that there is some other, better way that we do this now).
>
> -Tony

Thanks a bunch.
-Vineet


Re: kernel BUG at kernel/sched_rt.c:493!

2013-01-09 Thread Mike Galbraith
On Tue, 2013-01-08 at 09:01 -0600, Shawn Bohrer wrote: 
> On Tue, Jan 08, 2013 at 09:36:05AM -0500, Steven Rostedt wrote:
> > > 
> > > I've also managed to reproduce this on 3.8.0-rc2 so it appears the bug
> > > is still present in the latest kernel.
> > 
> > Shawn,
> > 
> > Can you send me your .config file.
> 
> I've attached the 3.8.0-rc2 config that I used to reproduce this in an
> 8 core kvm image.  Let me know if you need anything else.

I tried beating on my little Q6600 with no success.  I even tried
setting the entire box rt, GUI and all, nada.

Hm, maybe re-installing systemd..

-Mike



RE: [PATCH 3/7] pwm: pwm-tiehrpwm: Update the clock handling of pwm-tiehrpwm driver

2013-01-09 Thread Philip, Avinash
On Wed, Jan 02, 2013 at 19:08:43, Thierry Reding wrote:
> On Wed, Jan 02, 2013 at 06:54:50PM +0530, Philip Avinash wrote:
> > The clock framework has changed and it's now better to invoke
> > clk_prepare_enable() and clk_disable_unprepare() rather than the
> > legacy clk_enable() and clk_disable() calls. This patch converts the
> > pwm-tiehrpwm driver to the new framework.
> > 
> > Signed-off-by: Philip Avinash 
> > Cc: Thierry Reding 
> > ---
> > In 3.8-rc1, common clock framework support was added to AM335x.
> > 
> >  drivers/pwm/pwm-tiehrpwm.c |4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pwm/pwm-tiehrpwm.c b/drivers/pwm/pwm-tiehrpwm.c
> > index 72a6dd4..af6f162 100644
> > --- a/drivers/pwm/pwm-tiehrpwm.c
> > +++ b/drivers/pwm/pwm-tiehrpwm.c
> > @@ -341,7 +341,7 @@ static int ehrpwm_pwm_enable(struct pwm_chip *chip, struct pwm_device *pwm)
> >  	configure_polarity(pc, pwm->hwpwm);
> >  
> >  	/* Enable TBCLK before enabling PWM device */
> > -	clk_enable(pc->tbclk);
> > +	clk_prepare_enable(pc->tbclk);
> 
> I apparently didn't catch this before, but maybe it would be useful to
> check the return value here to make sure we only proceed if the clock
> can actually be enabled.

OK, I will check the return value and make it the return status of pwm_enable().
I will send it as an individual patch in the next series. I am waiting for
reviews/acceptance of the HWMOD and clock tree node patches.
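
Thierry's suggestion can be illustrated without kernel headers by a minimal userspace sketch. Here `struct fake_clk` and `fake_clk_prepare_enable()` are hypothetical stand-ins for `struct clk` and clk_prepare_enable(), and -EINVAL is just an example error code; only the control flow (bail out before touching the PWM if the clock fails) is the point.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct clk and clk_prepare_enable(). */
struct fake_clk { bool can_enable; bool enabled; };

static int fake_clk_prepare_enable(struct fake_clk *clk)
{
	if (!clk->can_enable)
		return -EINVAL;		/* example error only */
	clk->enabled = true;
	return 0;
}

/* Propagate the clock error instead of enabling the PWM output on top
 * of a clock that is not running. */
static int ehrpwm_pwm_enable_sketch(struct fake_clk *tbclk, bool *pwm_on)
{
	int ret;

	/* Enable TBCLK before enabling PWM device */
	ret = fake_clk_prepare_enable(tbclk);
	if (ret)
		return ret;

	*pwm_on = true;
	return 0;
}
```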

Thanks
Avinash

> 
> Thierry
> 



RE: [PATCH 0/7] HWMOD fixes for AM33xx PWM submodules and device tree nodes.

2013-01-09 Thread Philip, Avinash
Hi Paul,

On Wed, Jan 02, 2013 at 18:54:47, Philip, Avinash wrote:
> This patch series was part of [1] and is being resubmitted on top of 3.8-rc1
> to make it into 3.9.
> 
> In AM33xx, PWM submodules like ECAP, EHRPWM and EQEP are integrated into the
> PWM subsystem. All these submodules share resources (clock) and have
> a clock gating register in the PWM subsystem, so a parent PWM subsystem
> driver is created. To support the PWM subsystem driver, a parent-child
> relation is created in the HWMOD entry in patch #2.
> 
> In addition, the EHRPWM module requires explicit clock gating from the
> control module. Hence patch #4 adds a clock node for tbclk handling.
> 
> Patch #3 supports the common clock migration for AM33xx.
> 
> Also, device tree nodes are populated to support the parent-child relation
> between the PWMSS, ECAP and EHRPWM submodules (patches #5, 6 and 7).
> 
> This patch series is based on linux_omap/master [2] and was tested for
> backlight with ECAP on AM335x-evm and AM335x-evmsk.

Are there any reviews of this patch series?
This patch set is required for the LCD backlight work based on ECAP PWM
on am335x-evm and am335x-evmsk.

Thanks
Avinash

> 
> 1. https://lkml.org/lkml/2012/11/27/120
> 2. 
> http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap.git;a=commit;h=c4e2e79ff7c9f4fdad7da432c6b449121cc3033e
> 
> Philip Avinash (7):
>   ARM: OMAP: AM33xx hwmod: Corrects PWM subsystem HWMOD entries
>   ARM: OMAP: AM33xx hwmod: Add parent-child relationship for PWM
> subsystem
>   pwm: pwm-tiehrpwm: Update the clock handling of pwm-tiehrpwm driver
>   ARM: AM33XX: clk: Add clock node for EHRPWM TBCLK
>   ARM: dts: AM33XX: Add PWMSS device tree nodes
>   ARM: dts: AM33XX: Add PWM backlight DT data to  am335x-evm
>   ARM: dts: AM33XX: Add PWM backlight DT data to am335x-evmsk
> 
>  arch/arm/boot/dts/am335x-evm.dts   |   23 ++
>  arch/arm/boot/dts/am335x-evmsk.dts |   23 ++
>  arch/arm/boot/dts/am33xx.dtsi  |   84 ++
>  arch/arm/mach-omap2/cclock33xx_data.c  |   30 +++
>  arch/arm/mach-omap2/control.h  |8 +
>  arch/arm/mach-omap2/omap_hwmod_33xx_data.c |  390 ++--
>  drivers/pwm/pwm-tiehrpwm.c |4 +-
>  7 files changed, 417 insertions(+), 145 deletions(-)
> 
> -- 
> 1.7.9.5
> 
> 



Re: [PATCH 0/4] gpio: introduce descriptor-based interface

2013-01-09 Thread Alex Courbot
On Wednesday 09 January 2013 18:46:12 Arnd Bergmann wrote:
> > The question is, do we want to totally get rid of the integer
> > namespace? That would be the ultimate step, but would require another
> > way to identify GPIOs (controller_device:offset might be such a way),
> > and also to reorganize sysfs nodes. Wouldn't that be considered
> > breaking user-space? 'cause we all know what happens to those who
> > break user-space.
> 
> 
> The user interface could eventually be the only part of the kernel that
> uses the numbers, but you are right that we cannot change that.

That's sad, as it makes it necessary to maintain the global integer namespace 
(assigning a base GPIO to each controller and making sure the controllers' 
ranges do not overlap) even if it is not used internally anymore. We could make 
global number assignment transparent, but that would potentially change the 
GPIO numbers in user-space and cause another incompatibility.
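
To make concrete what maintaining the global namespace involves, here is a sketch assuming a hypothetical fixed-size table of controllers (it is not gpiolib's code): each controller gets a base, a new range [base, base + ngpio) is rejected if it overlaps an existing one, and a (controller, offset) pair maps to the global number base + offset.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHIPS 8

struct chip_range { unsigned int base, ngpio; };	/* hypothetical */

static struct chip_range chips[MAX_CHIPS];
static size_t nchips;

static bool ranges_overlap(unsigned int a, unsigned int na,
			   unsigned int b, unsigned int nb)
{
	return a < b + nb && b < a + na;
}

/* Registers [base, base + ngpio); returns the chip index or -1 on clash. */
static int register_chip(unsigned int base, unsigned int ngpio)
{
	for (size_t i = 0; i < nchips; i++)
		if (ranges_overlap(base, ngpio, chips[i].base, chips[i].ngpio))
			return -1;
	if (nchips == MAX_CHIPS)
		return -1;
	chips[nchips].base = base;
	chips[nchips].ngpio = ngpio;
	return (int)nchips++;
}

/* (controller, offset) -> global GPIO number, or -1 if out of range. */
static int global_gpio(int chip, unsigned int offset)
{
	if (chip < 0 || (size_t)chip >= nchips || offset >= chips[chip].ngpio)
		return -1;
	return (int)(chips[chip].base + offset);
}
```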

> > With the current code, a driver should depend on gpiolib being
> > compiled if it uses the new interface. It is not even declared if
> > gpiolib is not used.
> > 
> > Given that both interfaces are quite close, one could imagine having a
> > gpiod wrapper around the integer namespace (the "opaque descriptors"
> > would then just be cast integers). This way drivers would only need
> > to depend on GENERIC_GPIO. It's a little bit weird to have gpiod
> > wrapping around gpio in one case and the opposite in another though -
> > I'd rather have these platforms convert to GPIO descriptors internally
> > (or even better, to gpiolib), but this is probably asking too much.
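
A hypothetical sketch of the "cast integers" idea: an incomplete struct type gives drivers an opaque handle, and the legacy GPIO number is encoded in the pointer value (offset by one so that GPIO 0 is not NULL). The names are invented for illustration and are not the gpiod_* API from this series; integer-to-pointer conversion is implementation-defined in C, though it behaves as expected on common platforms.

```c
#include <stddef.h>
#include <stdint.h>

struct gpio_desc_opaque;	/* never defined: drivers only hold pointers */

static struct gpio_desc_opaque *int_to_desc(unsigned int gpio)
{
	return (struct gpio_desc_opaque *)(uintptr_t)((uintptr_t)gpio + 1);
}

static int desc_to_int(const struct gpio_desc_opaque *desc)
{
	if (desc == NULL)
		return -1;
	return (int)((uintptr_t)desc - 1);
}
```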
> 
> 
> I think it would be reasonable to force everybody to use gpiolib,
> that's much easier than converting everyone to the descriptor based
> interface.
> 
> 
> > I do not know all the details of gpiolib's history, but why would
> > anyone want to implement the generic gpio interface and not use
> > gpiolib anyways?
> 
> 
> Only legacy users did this. Initially there was only the header file,
> with the API declared but several different implementations of it.
> gpiolib was introduced later to reduce code duplication and allow having
> multiple implementations in the same kernel.

Does the following sound reasonable?
1) Make sure every target that uses GENERIC_GPIO also implements its drivers 
using gpiolib, convert the (hopefully) few ones that don't to use gpiolib
2) Make GENERIC_GPIO require GPIOLIB or just merge both options into a single 
one
3) Turn gpio into a full subsystem (like pinctrl)

This should make things less blurry and easier to maintain (fewer header files, 
only one interface, etc.). GPIO controllers would also be better integrated 
into the driver model.

> > > If we are adding a new way to deal with GPIOs, would it make sense to
> > > have that more closely integrated into pinctrl in one form or another?
> > > My feeling is that there is already a significant overlap between the
> > > two, and while the addition of the gpiod_* functions doesn't
> > > necessarily
> > > make this worse, it could be a chance to improve the current situation
> > > to make the interface more consistent with pinctrl.
> > 
> > 
> > That may be a chance to introduce deeper changes indeed - what do you
> > have in mind exactly?
> 
> 
> I don't know enough about pinctrl to have a specific idea yet, but maybe
> someone else has ideas.

I had a deeper look at pinctrl, and indeed I can see the connection between 
the two. There are already interfaces to link GPIO ranges to pin ranges and have 
GPIO drivers switch the pin to the correct state when a GPIO is requested 
(this, btw, should also be updated to not use global GPIO numbers at some 
point). Maybe some tighter integration that I just don't see yet can be done 
too.

> Regarding the integration of pinctrl with gpio,
> I was thinking in the past that we could make pinctrl provide everything
> that gpiolib does, and have a generic gpiolib driver on top of pinctrl
> so that platforms don't need to implement both interfaces but only need
> to provide a pure pinctrl driver. Not sure if this makes any sense.

That would work if all GPIOs were connected to a ball, but how about GPIO 
expanders that are external to the chip? They have no use for pinctrl AFAICT. 
On the other hand, maybe we can have one pinctrl-gpio driver for those chips 
where pinctrl alone can emulate all the functionality of a GPIO controller. 
Maybe such a driver exists already?

But in general, I agree pinctrl should be a source of inspiration for how to 
design GPIO. In particular, having a per-chip integer namespace instead of a 
single global one is definitely something to take (and that's how things work 
in the DT already).

Alex.



Re: [PATCH v2] fadvise: perform WILLNEED readahead asynchronously

2013-01-09 Thread Eric Wong
Simon Jeons  wrote:
> On Tue, 2012-12-25 at 02:22 +, Eric Wong wrote:
> 
> Please add changelog.

Changes since v1:

* separate unbound workqueue for high-priority tasks

* account for inflight readahead to avoid denial-of-service

* limit concurrency for non-high-priority tasks (1 per CPU, same as aio)

* take IO priority of requesting process into account when in workqueue.

* process queued readahead in 2M chunks to help ensure fairness between
  multiple requests with few CPUs/workqueues.  Idle tasks get smaller
  256K chunks.

* stop readahead for idle tasks on read congestion
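
A sketch of the chunking step from the list above, assuming "2M" and "256K" mean 2 MiB and 256 KiB and that a queued request is simply split into consecutive chunks with a shorter tail. The helper names are hypothetical; this is not the code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define RA_CHUNK_NORMAL	(2UL << 20)	/* 2 MiB */
#define RA_CHUNK_IDLE	(256UL << 10)	/* 256 KiB for idle-priority tasks */

/* Length of the next chunk to read ahead, given how much of the request
 * remains. A worker would issue this much, then requeue the remainder so
 * that other requests get a turn. */
static size_t next_ra_chunk(size_t remaining, bool idle)
{
	size_t chunk = idle ? RA_CHUNK_IDLE : RA_CHUNK_NORMAL;

	return remaining < chunk ? remaining : chunk;
}

/* Number of work items a request of @len bytes is split into. */
static size_t ra_nr_chunks(size_t len, bool idle)
{
	size_t chunk = idle ? RA_CHUNK_IDLE : RA_CHUNK_NORMAL;

	return (len + chunk - 1) / chunk;
}
```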

Will try to benchmark with Postgres when I get the chance.

Any other (Free Software) applications that might benefit from
lower FADV_WILLNEED latency?


[PATCH v2] cgroup: use new hashtable implementation

2013-01-09 Thread Li Zefan
Switch cgroup to use the new hashtable implementation. No functional changes.

Signed-off-by: Li Zefan 
---

v2: convert a hlist_del() to hash_del().

---
 kernel/cgroup.c | 92 -
 1 file changed, 39 insertions(+), 53 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4855892..a5262d9 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -52,7 +52,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -376,22 +376,18 @@ static int css_set_count;
  * account cgroups in empty hierarchies.
  */
 #define CSS_SET_HASH_BITS  7
-#define CSS_SET_TABLE_SIZE (1 << CSS_SET_HASH_BITS)
-static struct hlist_head css_set_table[CSS_SET_TABLE_SIZE];
+static DEFINE_HASHTABLE(css_set_table, CSS_SET_HASH_BITS);
 
-static struct hlist_head *css_set_hash(struct cgroup_subsys_state *css[])
+static unsigned long css_set_hash(struct cgroup_subsys_state *css[])
 {
int i;
-   int index;
-   unsigned long tmp = 0UL;
+   unsigned long key = 0UL;
 
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++)
-   tmp += (unsigned long)css[i];
-   tmp = (tmp >> 16) ^ tmp;
+   key += (unsigned long)css[i];
+   key = (key >> 16) ^ key;
 
-   index = hash_long(tmp, CSS_SET_HASH_BITS);
-
-   return &css_set_table[index];
+   return key;
 }
 
 /* We don't maintain the lists running through each css_set to its
@@ -418,7 +414,7 @@ static void __put_css_set(struct css_set *cg, int taskexit)
}
 
/* This css_set is dead. unlink it and release cgroup refcounts */
-   hlist_del(&cg->hlist);
+   hash_del(&cg->hlist);
css_set_count--;
 
list_for_each_entry_safe(link, saved_link, &cg->cg_links,
@@ -550,9 +546,9 @@ static struct css_set *find_existing_css_set(
 {
int i;
struct cgroupfs_root *root = cgrp->root;
-   struct hlist_head *hhead;
struct hlist_node *node;
struct css_set *cg;
+   unsigned long key;
 
/*
 * Build the set of subsystem state objects that we want to see in the
@@ -572,8 +568,8 @@ static struct css_set *find_existing_css_set(
}
}
 
-   hhead = css_set_hash(template);
-   hlist_for_each_entry(cg, node, hhead, hlist) {
+   key = css_set_hash(template);
+   hash_for_each_possible(css_set_table, cg, node, hlist, key) {
if (!compare_css_sets(cg, oldcg, cgrp, template))
continue;
 
@@ -657,8 +653,8 @@ static struct css_set *find_css_set(
 
struct list_head tmp_cg_links;
 
-   struct hlist_head *hhead;
struct cg_cgroup_link *link;
+   unsigned long key;
 
/* First see if we already have a cgroup group that matches
 * the desired set */
@@ -704,8 +700,8 @@ static struct css_set *find_css_set(
css_set_count++;
 
/* Add this cgroup group to the hash table */
-   hhead = css_set_hash(res->subsys);
-   hlist_add_head(&res->hlist, hhead);
+   key = css_set_hash(res->subsys);
+   hash_add(css_set_table, &res->hlist, key);
 
write_unlock(&css_set_lock);
 
@@ -1597,6 +1593,8 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
struct cgroupfs_root *existing_root;
const struct cred *cred;
int i;
+   struct hlist_node *node;
+   struct css_set *cg;
 
BUG_ON(sb->s_root != NULL);
 
@@ -1650,14 +1648,8 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
/* Link the top cgroup in this hierarchy into all
 * the css_set objects */
write_lock(&css_set_lock);
-   for (i = 0; i < CSS_SET_TABLE_SIZE; i++) {
-   struct hlist_head *hhead = &css_set_table[i];
-   struct hlist_node *node;
-   struct css_set *cg;
-
-   hlist_for_each_entry(cg, node, hhead, hlist)
-   link_css_set(&tmp_cg_links, cg, root_cgrp);
-   }
+   hash_for_each(css_set_table, i, node, cg, hlist)
+   link_css_set(&tmp_cg_links, cg, root_cgrp);
write_unlock(&css_set_lock);
 
free_cg_links(&tmp_cg_links);
@@ -4438,6 +4430,9 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 {
struct cgroup_subsys_state *css;
int i, ret;
+   struct hlist_node *node, *tmp;
+   struct css_set *cg;
+   unsigned long key;
 
/* check name and function validity */
if (ss->name == NULL || strlen(ss->name) > MAX_CGROUP_TYPE_NAMELEN ||
@@ -4503,23 +4498,17 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 * this is all done under the css_set_lock.
 */
write_lock(&css_set_lock);
-   for (i = 0; i < CSS_SET_TABLE_SIZE; i++) {
-
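
After this patch css_set_hash() only folds the css pointers into a key, and the hashtable helpers pick the bucket. The userspace sketch below reproduces the folding from the diff; NR_SUBSYS is a hypothetical stand-in for CGROUP_SUBSYS_COUNT, and bucket_of() uses multiplicative (Fibonacci) hashing with a 64-bit golden-ratio constant as a stand-in for hash_long(), not necessarily the exact function the kernel uses.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_SUBSYS		4	/* hypothetical; CGROUP_SUBSYS_COUNT */
#define CSS_SET_HASH_BITS	7

/* Same folding as the patched css_set_hash(): sum the pointers, then mix
 * the high bits down. */
static uint64_t css_set_key(void *const css[NR_SUBSYS])
{
	uint64_t key = 0;

	for (int i = 0; i < NR_SUBSYS; i++)
		key += (uint64_t)(uintptr_t)css[i];
	key = (key >> 16) ^ key;
	return key;
}

/* Stand-in for hash_long(key, bits): keep the top @bits bits of a
 * multiplicative hash, giving a bucket index in [0, 1 << bits). */
static unsigned int bucket_of(uint64_t key, unsigned int bits)
{
	return (unsigned int)((key * 0x9E3779B97F4A7C15ULL) >> (64 - bits));
}
```

Because the key is a plain sum, different css arrays can produce the same key; that is why the lookup still calls compare_css_sets() on every entry in the bucket.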

Re: [PATCH V3 0/22] sched: simplified fork, enable load average into LB and power awareness scheduling

2013-01-09 Thread Alex Shi
>>
>> 3, power awareness scheduling, patch 13~22,
>> The subset implement my previous power aware scheduling proposal:
>> https://lkml.org/lkml/2012/8/13/139
>> It defines 2 new power-aware policies, balance and powersaving, and then
>> tries to spread or pack tasks at each sched group level according to the
>> scheduler policy. That can save much power when the number of tasks in the
>> system is no more than the number of LCPUs.
> 
> Interesting stuff. I have read through your patches, but it is still not
> clear to me what metrics you use to determine whether a sched group is
> fully utilized or if it can be used for packing more tasks. Is it based on
> nr_running or PJT's tracked load or both? How is the threshold defined?

Thanks for the review, Morten!

cpu utilisation = rq->util * (rq->nr_running ? rq->nr_running : 1),
where rq->util = running time / whole period.

If nr_running == 2 and util == 99%, the potential max 'utilisation' is
99 * 2 = 198, because both tasks may be able to run full time.

group utils = sum of all CPUs' util.
E.g. in a 2 LCPU group, CPU A has nr_running 0, and CPU B has util 99% and
3 tasks. So the group utils = A's util + 99 * 3, which is bigger than the
threshold = 99% * 2.

The above calculation is biased toward performance, and that is our purpose.
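
The worked example above as arithmetic, assuming utilisation is expressed in percent of the period and that a group counts as full when the sum exceeds 99 per LCPU; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* util is in percent of the period (rq->util in the description). */
static unsigned int cpu_util(unsigned int util, unsigned int nr_running)
{
	return util * (nr_running ? nr_running : 1);
}

/* True if the summed utilisation exceeds the threshold of 99% per LCPU,
 * i.e. the group has no room left for packing more tasks. */
static bool group_is_full(const unsigned int util[], const unsigned int nr[],
			  size_t nr_cpus)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < nr_cpus; i++)
		sum += cpu_util(util[i], nr[i]);
	return sum > 99 * nr_cpus;
}
```

With CPU A idle (util 0) and CPU B at 99% with 3 tasks, the group sum is 297 against a threshold of 198; with only one task on B it is 99, leaving room to pack more.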



linux-next: Tree for Jan 10

2013-01-09 Thread Stephen Rothwell
Hi all,

Changes since 20130109:

The pm tree gained a conflict against the pci tree.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 214 trees (counting Linus' and 28 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to adding
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (5c49985 Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm)
Merging fixes/master (d287b87 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs)
Merging kbuild-current/rc-fixes (bad9955 menuconfig: Replace CIRCLEQ by list_head-style lists.)
Merging arm-current/fixes (d106de3 ARM: 7614/1: mm: fix wrong branch from Cortex-A9 to PJ4b)
Merging m68k-current/for-linus (e7e29b4 m68k: Wire up finit_module)
Merging powerpc-merge/merge (e6449c9 powerpc: Add missing NULL terminator to avoid boot panic on PPC40x)
Merging sparc/master (4e4d78f sparc: Hook up finit_module syscall.)
Merging net/master (c9be4a5 net: prevent setting ttl=0 via IP_TTL)
Merging sound-current/for-linus (56bde0f ALSA: hdspm - Fix wordclock status on AES32)
Merging pci-current/for-linus (56d0da4 PCI/AER: pci_get_domain_bus_and_slot() call missing required pci_dev_put())
Merging wireless/master (5e20a4b b43: Fix firmware loading when driver is built into the kernel)
Merging driver-core.current/driver-core-linus (4956964 Merge tag 'driver-core-3.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core)
Merging tty.current/tty-linus (d1c3ed6 Linux 3.8-rc2)
Merging usb.current/usb-linus (75e1a2a USB: ehci: make debug port in-use detection functional again)
Merging staging.current/staging-linus (e16a922 staging: tidspbridge: use prepare/unprepare on dsp clocks)
Merging char-misc.current/char-misc-linus (e6028db mei: fix mismatch in mutex unlock-lock in mei_amthif_read())
Merging input-current/for-linus (bec7a4b Input: lm8323 - fix checking PWM interrupt status)
Merging md-current/for-linus (a9add5d md/raid5: add blktrace calls)
Merging audit-current/for-linus (c158a35 audit: no leading space in audit_log_d_path prefix)
Merging crypto-current/master (a2c0911 crypto: caam - Updated SEC-4.0 device tree binding for ERA information.)
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops)
Merging dwmw2/master (084a0ec x86: add CONFIG_X86_MOVBE option)
CONFLICT (content): Merge conflict in arch/x86/Kconfig
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to inline functions)
Merging irqdomain-current/irqdomain/merge (a0d271c Linux 3.6)
Merging devicetree-current/devicetree/merge (ab28698 of: define struct device in of_platform.h if !OF_DEVICE and !OF_ADDRESS)
Merging spi-current/spi/merge (d3601e5 spi/sh-hspi: fix return value check in hspi_probe().)
Merging gpio-current/gpio/merge (bc1008c gpio/mvebu-gpio: Make mvebu-gpio depend on OF_CONFIG)
Merging rr-fixes/fixes (52441fa module: prevent warning when finit_module a 0 sized file)
Merging asm-generic/master (fb9de7e xtensa: Use generic asm/mmu.h for nommu)
Merging arm/for-next (32887f3 Merge branch 'fixes' into for-next)
Merging arm-perf/for-next/perf (1052917 Merge branch 'perf/updat

Re: [PATCH] Only reset e820 once, even with multiple memmap=exactmap params

2013-01-09 Thread Thomas Renninger
On Tuesday, January 08, 2013 09:19:18 AM Yinghai Lu wrote:
...
> 
> That exactmap logic still has a problem:
> We need to check for exactmap first, i.e. scan the whole command line
> to see if exactmap is there, reset the e820 table, and then handle the
> other memmap options.
> 
> Also please update your patch after
> 
> tip/x86/mm2
> 
> I have one patch that processes memmap= with "," there.
> 
> http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=9710f581bb4c35589ac046b0cfc0deb7f369fc85
> 
> We could put exactmap scanning in new parse_memmap_opt.
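
A userspace sketch of the first pass Yinghai describes, assuming a plain space-separated command line: it looks for "exactmap" in any memmap= option, including as one item of a comma-separated memmap= value, so the e820 reset can happen before any other memmap= option is applied regardless of order. The function name is hypothetical and this is not the kernel's parser.

```c
#include <stdbool.h>
#include <string.h>

/* First pass over the whole command line: is "exactmap" given in any
 * memmap= option (also as one item of a comma-separated list)? */
static bool cmdline_has_exactmap(const char *cmdline)
{
	const char *p = cmdline;

	while (*p) {
		size_t len;

		while (*p == ' ')
			p++;
		len = strcspn(p, " ");		/* length of this option */
		if (len > 7 && strncmp(p, "memmap=", 7) == 0) {
			const char *v = p + 7, *end = p + len;

			while (v < end) {
				size_t item = strcspn(v, ", ");

				if (item == 8 && strncmp(v, "exactmap", 8) == 0)
					return true;
				v += item + 1;	/* skip the ',' */
			}
		}
		p += len;
	}
	return false;
}
```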

I still do not understand why:

Kexec (kexec/firmware_memmap.c) sets up the e820 map from
/sys/firmware/memmap/* and passes it via bootloader structures.
This e820 table then gets immediately voided by memmap=exactmap,
and a new one passed via boot parameters is set up.
If I read this correctly, this is what happens?

Can kexec simply pass the memory to use via memmap=X@Y,
then take the original e820 table, but not the usable entries (those
come from the memmap=X@Y above)?
That would mean that the kexec kernel takes all the
original ACPI, ACPI NVS, reserved and unusable (everything but usable) entries
from the original e820 table and identifies the usable memory from the
memmap boot param?

This would be much smarter than trying to pass the mmconf reserved
area, and I can imagine other issues showing up if the reserved areas
do not match the original ones in the kexec kernel.

If this really can be done and memmap=exactmap was only used by kexec,
its logic could be redefined from "drop all e820 entries" to
"drop all usable e820 entries", and no further adjustments in kexec/kernel
would be needed to get mmconf working (and other issues may be avoided before
they happen). Besides that, kexec would no longer need to pass the ACPI
reserved area via memmap=X#Y.
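
A sketch of the proposed "drop only usable entries" semantics, assuming a simplified e820 entry and the usual type codes (1 = usable RAM, 2 = reserved, 3 = ACPI data, 4 = ACPI NVS, 5 = unusable). Every non-RAM entry from the firmware map is kept and the usable ranges come from memmap=X@Y; this is not the kernel's e820 code, and sorting/sanitizing of overlaps is left out.

```c
#include <stddef.h>
#include <stdint.h>

enum { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3, E820_NVS = 4,
       E820_UNUSABLE = 5 };

struct e820_entry { uint64_t addr, size; uint32_t type; };

/* Build the new map into @out (capacity @max): all non-RAM entries from the
 * firmware map @fw, followed by the RAM ranges given as memmap=X@Y in @user.
 * Returns the number of entries written, or -1 if @out is too small. */
static int rebuild_map(const struct e820_entry *fw, size_t nfw,
		       const struct e820_entry *user, size_t nuser,
		       struct e820_entry *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < nfw; i++) {
		if (fw[i].type == E820_RAM)
			continue;	/* usable memory comes from memmap= */
		if (n == max)
			return -1;
		out[n++] = fw[i];
	}
	for (size_t i = 0; i < nuser; i++) {
		if (n == max)
			return -1;
		out[n] = user[i];
		out[n++].type = E820_RAM;
	}
	return (int)n;
}
```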

   Thomas


Linux 3.8-rc3

2013-01-09 Thread Linus Torvalds
The holidays are over, and things are starting to get back to normal.
Except for Greg, I suspect, who is probably still digging his way out
of his email hole.

Anyway, another week, another -rc. A fairly normal-sized one. Most of
the changes are in drivers, with the bulk of them being one-liners all
over that just remove the last traces of __dev[init|exit].

But apart from that, there are real updates (GPU drivers stand out -
mainly radeon and exynos), and some architecture work (arm, powerpc,
mips, microblaze), filesystems (f2fs, gfs, cifs, nfs) and networking
(netfilter and sunrpc).

And I really hope things calm down. I realize there were some pent-up
patches from the holidays, but let's make sure -rc4 really is smaller.
Ok?

   Linus

---

Aaro Koskinen (2):
  ARM: OMAP1: fix USB configuration use-after-release
  drm/nouveau: fix init with agpgart-uninorth

Abhijith Das (1):
  GFS2: Fix race in gfs2_rs_alloc

Adrian Huang (1):
  ACPI / APEI: Fix the returned value in erst_dbg_read

Alex Deucher (10):
  drm/radeon: add connector table for Mac G4 Silver
  drm/radeon/r6xx: fix DMA engine for ttm bo transfers
  drm/radeon: fix typo in evergreen dma fence
  drm/radeon: add GPU reset flags
  drm/radeon: switch to a finer grained reset for r6xx/7xx
  drm/radeon: switch to a finer grained reset for evergreen
  drm/radeon: switch to a finer grained reset for cayman/TN
  drm/radeon: switch to a finer grained reset for SI (v2)
  drm/radeon: split r6xx and r7xx copy_dma functions
  drm/radeon: fix DMA CS parser for r6xx linear copy packet

Alex Elder (1):
  sctp: fix Kconfig bug in default cookie hmac selection

Alexander Aring (1):
  mac802154: fix NOHZ local_softirq_pending 08 warning

Alexander Schremmer (1):
  ALSA: usb-audio: Add support for Creative BT-D1 via usb sound quirks

Andrew Collins (1):
  netfilter: nf_nat: Also handle non-ESTABLISHED routing changes
in MASQUERADE

Andrew Lunn (5):
  ARM: Kirkwood: Fix missing clk for USB device.
  Power: gpio-poweroff: Fix documentation and gpio_is_valid
  ARM: Kirkwood: Fix missing sdio clock
  ARM: Kirkwood: Use fixed-regulator instead of board gpio call
  arm: mvebu: Add missing ; for cpu node.

Arend van Spriel (1):
  brcmsmac: add copyright information for Canonical

Asim Kadav (1):
  sound: oss/pas2: Fix possible access out of array

Barry Song (2):
  pinctrl: sirf: add missing DT-binding document
  pinctrl: sirf: enable GPIO pullup/down configuration from dts

Bartlomiej Zolnierkiewicz (1):
  mm: fix zone_watermark_ok_safe() accounting of isolated pages

Bjørn Mork (1):
  net: qmi_wwan: add Telekom Speedstick LTE II

Bob Hockney (1):
  netfilter: nfnetlink_log: fix mac address for 6in4 tunnels

Bob Peterson (2):
  GFS2: Stop looking for free blocks at end of rgrp
  GFS2: Reset rd_last_alloc when it reaches the end of the rgrp

Borislav Petkov (1):
  EDAC: Fix EDAC Kconfig menu

Carlos Alberto Lopez Perez (1):
  Documentation/sysctl/kernel.txt: document /proc/sys/shmall

Catalin Marinas (3):
  arm64: Include linux/ptrace.h in asm/compat.h
  arm64: Keep the ARM64 Kconfig selects sorted
  arm64: Always select ARM_AMBA and GENERIC_GPIO

Cesar Eduardo Barros (18):
  MAINTAINERS: remove drivers/platform/msm/
  MAINTAINERS: remove arch/arm/common/time-acorn.c
  MAINTAINERS: remove arch/arm/plat-s5p/
  MAINTAINERS: fix drivers/rtc/rtc-vt8500.c
  MAINTAINERS: fix arch/arm/mach-at91/include/mach/at_hdmac.h
  MAINTAINERS: fix drivers/media/platform/atmel-isi.c
  MAINTAINERS: adjust for UAPI
  MAINTAINERS: fix drivers/media/usb/dvb-usb/cxusb*
  MAINTAINERS: remove drivers/video/epson1355fb.c
  MAINTAINERS: fix .../plat-mxc/include/mach/imxfb.h
  MAINTAINERS: fix drivers/ieee802154/
  MAINTAINERS: remove firmware/isci/
  MAINTAINERS: remove arch/x86/platform/mrst/pmu.*
  MAINTAINERS: fix Documentation/mei/
  MAINTAINERS: remove drivers/mmc/host/imxmmc.*
  MAINTAINERS: remove arch/*/lib/perf_event*.c
  MAINTAINERS: remove include/linux/of_pwm.h
  MAINTAINERS: fix drivers/staging/sm7xx/

Christian Lamparter (2):
  carl9170: fix -EINVAL bailout during init with !CONFIG_MAC80211_MESH
  p54usb: add USBIDs for two more p54usb devices

Christophe TORDEUX (1):
  Input: sentelic - only report position of first finger as ST coordinates

Damien Zammit (1):
  ALSA: usb-audio: Fix kernel panic of Digidesign Mbox2 quirk

Daniel Lezcano (1):
  cpuidle: fix lock contention in the idle path

Daniel Vetter (1):
  drm/ttm: fix fence locking in ttm_buffer_object_transfer

David Henningsson (2):
  ALSA: hda - Switch "On" and "Off" for "Mute-LED Mode" kcontrol
  ALSA: hda - add mute LED for HP Pavilion 17 (Realtek codec)

Dmitry Torokhov (1):
  Input: gpio_keys - defer probing if GPIO probing is deferred

Eric Dumazet (2):
  

Re: [PATCH v3 04/22] sched: don't need go to smaller sched domain

2013-01-09 Thread Mike Galbraith
On Wed, 2013-01-09 at 17:38 +, Morten Rasmussen wrote: 
> On Sat, Jan 05, 2013 at 08:37:33AM +, Alex Shi wrote:
> > If no CPU allowed for the task can be found in the parent sched domain,
> > none will be found in its child domain either, so bail out early to
> > skip useless checking.
> > 
> > Signed-off-by: Alex Shi 
> > ---
> >  kernel/sched/fair.c | 6 ++
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 3c7b09a..ecfbf8e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3378,10 +3378,8 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
> > load_idx = sd->wake_idx;
> >  
> > group = find_idlest_group(sd, p, cpu, load_idx);
> 
> The previous patch changed the behavior of find_idlest_group() to
> returning the local group if it is suitable. This effectively means that
> you remove the recursive search for a suitable idle sched group. You
> could as well merge find_idlest_group() and find_idlest_cpu() to avoid
> iterating through the cpus of the same sched group twice.

find_idlest_* could stop when seeing 0 too, can't get much more idle.
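Mike's early-exit suggestion can be sketched in isolation. This is a hypothetical, flattened model, not the kernel's find_idlest_group()/find_idlest_cpu(): it assumes a plain array of per-CPU loads and an allowed mask (both invented for illustration), ignores sched groups and load_idx weighting, scans once, and stops as soon as it sees an allowed CPU with zero load, since nothing can be more idle.

```c
#include <limits.h>

/*
 * Hypothetical flattened model: cpu_load[i] is CPU i's load and
 * cpu_allowed[i] is nonzero if the task may run on CPU i.
 * Returns the index of the least-loaded allowed CPU, or -1 if none.
 */
static int find_idlest_cpu_sketch(const unsigned long cpu_load[],
				  const int cpu_allowed[], int nr_cpus)
{
	unsigned long min_load = ULONG_MAX;
	int idlest = -1;

	for (int i = 0; i < nr_cpus; i++) {
		if (!cpu_allowed[i])
			continue;
		if (cpu_load[i] < min_load) {
			min_load = cpu_load[i];
			idlest = i;
			/* A load of 0 cannot be beaten: stop scanning. */
			if (min_load == 0)
				break;
		}
	}
	return idlest;
}
```

Because the comparison is strict, ties keep the lowest-numbered CPU.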

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/5] x86,smp: keep spinlock delay values per hashed spinlock address

2013-01-09 Thread Rafael Aquini
On Tue, Jan 08, 2013 at 05:31:19PM -0500, Rik van Riel wrote:
> From: Eric Dumazet 
> 
> Eric Dumazet found a regression with the first version of the spinlock
> backoff code, in a workload where multiple spinlocks were contended,
> each having a different wait time.
> 
> This patch has multiple delay values per cpu, indexed on a hash
> of the lock address, to avoid that problem.
> 
> Eric Dumazet wrote:
> 
> I did some tests with your patches with following configuration :
> 
> tc qdisc add dev eth0 root htb r2q 1000 default 3
> (to force a contention on qdisc lock, even with a multi queue net
> device)
> 
> and 24 concurrent "netperf -t UDP_STREAM -H other_machine -- -m 128"
> 
> Machine : 2 Intel(R) Xeon(R) CPU X5660  @ 2.80GHz
> (24 threads), and a fast NIC (10Gbps)
> 
> Resulting in a 13 % regression (676 Mbits -> 595 Mbits)
> 
> In this workload we have at least two contended spinlocks, with
> different delays. (spinlocks are not held for the same duration)
> 
> It clearly defeats your assumption of a single per cpu delay being OK :
> Some cpus are spinning too long while the lock was released.
> 
> We might try to use a hash on lock address, and an array of 16 different
> delays so that different spinlocks have a chance of not sharing the same
> delay.
> 
> With following patch, I get 982 Mbits/s with same bench, so an increase
> of 45 % instead of a 13 % regression.
> 
> Signed-off-by: Eric Dumazet 
> Signed-off-by: Rik van Riel 
> ---

Acked-by: Rafael Aquini 


>  arch/x86/kernel/smp.c |   22 +++---
>  1 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index 05f828b..1877890 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -134,12 +135,26 @@ static bool smp_no_nmi_ipi = false;
>  #define DELAY_FIXED_1 (1<<DELAY_SHIFT)
>  #define MIN_SPINLOCK_DELAY (1 * DELAY_FIXED_1)
>  #define MAX_SPINLOCK_DELAY (16000 * DELAY_FIXED_1)
> -DEFINE_PER_CPU(unsigned, spinlock_delay) = { MIN_SPINLOCK_DELAY };
> +#define DELAY_HASH_SHIFT 6
> +struct delay_entry {
> + u32 hash;
> + u32 delay;
> +};
> +static DEFINE_PER_CPU(struct delay_entry [1 << DELAY_HASH_SHIFT], spinlock_delay) = {
> + [0 ... (1 << DELAY_HASH_SHIFT) - 1] = {
> + .hash = 0,
> + .delay = MIN_SPINLOCK_DELAY,
> + },
> +};
> +
>  void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
>  {
>   __ticket_t head = inc.head, ticket = inc.tail;
>   __ticket_t waiters_ahead;
> - unsigned delay = __this_cpu_read(spinlock_delay);
> + u32 hash = hash32_ptr(lock);
> + u32 slot = hash_32(hash, DELAY_HASH_SHIFT);
> + struct delay_entry *ent = &__get_cpu_var(spinlock_delay[slot]);
> + u32 delay = (ent->hash == hash) ? ent->delay : MIN_SPINLOCK_DELAY;
>   unsigned loops = 1;
>  
>   for (;;) {
> @@ -175,7 +190,8 @@ void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
>   break;
>   }
>   }
> - __this_cpu_write(spinlock_delay, delay);
> + ent->hash = hash;
> + ent->delay = delay;
>  }
>  
>  /*
> 
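The per-lock lookup in this patch can be modelled in user space. This is a sketch under stated assumptions: one array stands in for a single CPU's copy of the per-CPU table, the lock address is passed in already reduced to 32 bits (the patch uses hash32_ptr() for that), and slot_of() is a multiplicative hash in the spirit of the kernel's hash_32() whose constant is an assumption, not necessarily the one this kernel used. Note that the patch as posted uses DELAY_HASH_SHIFT 6, i.e. 64 slots, rather than the 16 mentioned in Eric's mail.

```c
#include <stdint.h>

#define DELAY_SHIFT        8
#define DELAY_FIXED_1      (1u << DELAY_SHIFT)
#define MIN_SPINLOCK_DELAY (1 * DELAY_FIXED_1)
#define DELAY_HASH_SHIFT   6
#define DELAY_SLOTS        (1u << DELAY_HASH_SHIFT)

struct delay_entry {
	uint32_t hash;	/* full hash, kept as a tag to detect slot collisions */
	uint32_t delay;	/* fixed-point delay last learned for that lock */
};

/* Stand-in for one CPU's copy of the per-CPU table. */
static struct delay_entry delay_table[DELAY_SLOTS];

/* Mirrors the patch's initializer: every slot starts at the minimum delay. */
static void delay_table_init(void)
{
	for (uint32_t i = 0; i < DELAY_SLOTS; i++) {
		delay_table[i].hash = 0;
		delay_table[i].delay = MIN_SPINLOCK_DELAY;
	}
}

/* Assumed multiplicative hash: keep the top DELAY_HASH_SHIFT bits. */
static uint32_t slot_of(uint32_t hash)
{
	return (uint32_t)(hash * 0x61C88647u) >> (32 - DELAY_HASH_SHIFT);
}

/* A tag mismatch means another lock owns the slot: start from the minimum. */
static uint32_t delay_lookup(uint32_t hash)
{
	const struct delay_entry *ent = &delay_table[slot_of(hash)];

	return ent->hash == hash ? ent->delay : MIN_SPINLOCK_DELAY;
}

/* Called once the lock is acquired, as at the end of the patched loop. */
static void delay_store(uint32_t hash, uint32_t delay)
{
	struct delay_entry *ent = &delay_table[slot_of(hash)];

	ent->hash = hash;
	ent->delay = delay;
}
```

Storing the full hash alongside the delay means two locks that collide in a slot overwrite each other's learned value, but never inherit it.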


Re: [PATCH 3/5] x86,smp: auto tune spinlock backoff delay factor

2013-01-09 Thread Rafael Aquini
On Tue, Jan 08, 2013 at 05:30:29PM -0500, Rik van Riel wrote:
> Many spinlocks are embedded in data structures; having many CPUs
> pounce on the cache line the lock is in will slow down the lock
> holder, and can cause system performance to fall off a cliff.
> 
> The paper "Non-scalable locks are dangerous" is a good reference:
> 
>   http://pdos.csail.mit.edu/papers/linux:lock.pdf
> 
> In the Linux kernel, spinlocks are optimized for the case of
> there not being contention. After all, if there is contention,
> the data structure can be improved to reduce or eliminate
> lock contention.
> 
> Likewise, the spinlock API should remain simple, and the
> common case of the lock not being contended should remain
> as fast as ever.
> 
> However, since spinlock contention should be fairly uncommon,
> we can add functionality into the spinlock slow path that keeps
> system performance from falling off a cliff when there is lock
> contention.
> 
> Proportional delay in ticket locks is delaying the time between
> checking the ticket based on a delay factor, and the number of
> CPUs ahead of us in the queue for this lock. Checking the lock
> less often allows the lock holder to continue running, resulting
> in better throughput and preventing performance from dropping
> off a cliff.
> 
> Proportional spinlock delay with a high delay factor works well
> when there is lots of contention on a lock. Likewise, a smaller
> delay factor works well when a lock is lightly contended.
> 
> Making the code auto-tune the delay factor results in a system
> that performs well with both light and heavy lock contention.
> 
> Signed-off-by: Rik van Riel 
> ---
> v3: use fixed-point math for the delay calculations, suggested by Michel Lespinasse
>

Acked-by: Rafael Aquini 

 
>  arch/x86/kernel/smp.c |   43 +++
>  1 files changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index aa743e9..05f828b 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -113,13 +113,34 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
>  static bool smp_no_nmi_ipi = false;
>  
>  /*
> - * Wait on a congested ticket spinlock.
> + * Wait on a congested ticket spinlock. Many spinlocks are embedded in
> + * data structures; having many CPUs pounce on the cache line with the
> + * spinlock simultaneously can slow down the lock holder, and the system
> + * as a whole.
> + *
> + * To prevent total performance collapse in case of bad spinlock contention,
> + * perform proportional backoff. The per-cpu value of delay is automatically
> + * tuned to limit the number of times spinning CPUs poll the lock before
> + * obtaining it. This limits the amount of cross-CPU traffic required to obtain
> + * a spinlock, and keeps system performance from dropping off a cliff.
> + *
> + * There is a tradeoff. If we poll too often, the whole system is slowed
> + * down. If we sleep too long, the lock will go unused for a period of
> + * time. The solution is to go for a fast spin if we are at the head of
> + * the queue, to slowly increase the delay if we sleep for too short a
> + * time, and to decrease the delay if we slept for too long.
>   */
> +#define DELAY_SHIFT 8
> +#define DELAY_FIXED_1 (1<<DELAY_SHIFT)
> +#define MIN_SPINLOCK_DELAY (1 * DELAY_FIXED_1)
> +#define MAX_SPINLOCK_DELAY (16000 * DELAY_FIXED_1)
> +DEFINE_PER_CPU(unsigned, spinlock_delay) = { MIN_SPINLOCK_DELAY };
>  void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
>  {
>   __ticket_t head = inc.head, ticket = inc.tail;
>   __ticket_t waiters_ahead;
> - unsigned loops;
> + unsigned delay = __this_cpu_read(spinlock_delay);
> + unsigned loops = 1;
>  
>   for (;;) {
>   waiters_ahead = ticket - head - 1;
> @@ -133,14 +154,28 @@ void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
>   } while (ACCESS_ONCE(lock->tickets.head) != ticket);
>   break;
>   }
> - loops = 50 * waiters_ahead;
> +
> + /* Aggressively increase delay, to minimize lock accesses. */
> + if (delay < MAX_SPINLOCK_DELAY)
> + delay += DELAY_FIXED_1 / 7;
> +
> + loops = (delay * waiters_ahead) >> DELAY_SHIFT;
>   while (loops--)
>   cpu_relax();
>  
>   head = ACCESS_ONCE(lock->tickets.head);
> - if (head == ticket)
> + if (head == ticket) {
> + /*
> +  * We overslept, and do not know by how much.
> +  * Exponentially decay the value of delay,
> +  * to get it back to a good value quickly.
> +  */
> + if (delay >= 2 * DELAY_FIXED_1)
> + delay -= max(delay/32, DELAY_FIXED_1);
>   break;
> + }
>   }
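The tuning arithmetic in this patch can be checked in isolation. The helpers below are a hypothetical refactoring of the patch's inline statements into pure functions (the names grow_delay(), spin_loops() and decay_delay() are invented here); the constants and update rules are copied from the diff. With DELAY_SHIFT 8, a delay of DELAY_FIXED_1 (256) means one cpu_relax() per waiter ahead.

```c
#define DELAY_SHIFT        8
#define DELAY_FIXED_1      (1u << DELAY_SHIFT)
#define MIN_SPINLOCK_DELAY (1 * DELAY_FIXED_1)
#define MAX_SPINLOCK_DELAY (16000 * DELAY_FIXED_1)

/* Each time we have to wait again, grow the delay by 1/7 of a unit. */
static unsigned grow_delay(unsigned delay)
{
	if (delay < MAX_SPINLOCK_DELAY)
		delay += DELAY_FIXED_1 / 7;
	return delay;
}

/* Number of cpu_relax() iterations before re-reading the lock head. */
static unsigned spin_loops(unsigned delay, unsigned waiters_ahead)
{
	return (delay * waiters_ahead) >> DELAY_SHIFT;
}

/* After oversleeping, drop by 1/32 of the delay, but at least one unit. */
static unsigned decay_delay(unsigned delay)
{
	if (delay >= 2 * DELAY_FIXED_1) {
		unsigned step = delay / 32;

		delay -= step > DELAY_FIXED_1 ? step : DELAY_FIXED_1;
	}
	return delay;
}
```

Each wait step grows the delay by 36/256 of a unit, while each overshoot removes at least a full unit, so the delay falls back much faster than it rises.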

Re: [Pv-drivers] [PATCH 0/6] VSOCK for Linux upstreaming

2013-01-09 Thread Andy King
> > Our position is that VSOCK feature set is more complete and that it
> > should be possible to use transports other than VMCI for VSOCK
> > traffic, should interested parties implement them,
> 
> Implementing other transports requires restructuring vsock (and vmci)
> first as the current vsock code is not a hypervisor neutral service.

I'm going to bite the bullet and spend the next couple of days doing
just that: factoring out the VMCI bits and hiding them behind a
transport layer.  It'll be a bit rough, but it'll be a start.  We'll
submit another patch series next week with that.  I'm hoping that'll
get us over this hump, since it should be hypervisor agnostic at
that point.  It'll be up to you guys to add virtio, though :)
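The transport split described here amounts to putting the hypervisor-specific operations behind a table of function pointers. The sketch below is purely hypothetical: struct vsock_transport_sketch, its single send() hook, the registration helper and the loopback transport are invented for illustration and do not reflect any actual vsock interface.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport interface; a real one would need many more hooks. */
struct vsock_transport_sketch {
	const char *name;
	int (*send)(unsigned int dst_cid, const void *buf, size_t len);
};

static const struct vsock_transport_sketch *active_transport;

/* Only one transport may be active at a time in this sketch. */
static int transport_register(const struct vsock_transport_sketch *t)
{
	if (active_transport)
		return -EBUSY;
	active_transport = t;
	return 0;
}

/* Hypervisor-neutral entry point: forwards to whichever transport is bound. */
static int sketch_send(unsigned int dst_cid, const void *buf, size_t len)
{
	if (!active_transport)
		return -ENODEV;
	return active_transport->send(dst_cid, buf, len);
}

/* A loopback transport standing in for VMCI or virtio. */
static size_t loopback_bytes;

static int loopback_send(unsigned int dst_cid, const void *buf, size_t len)
{
	(void)dst_cid;
	(void)buf;
	loopback_bytes += len;
	return (int)len;
}

static const struct vsock_transport_sketch loopback = {
	.name = "loopback",
	.send = loopback_send,
};
```

The socket layer only ever calls through the table, so adding virtio would mean supplying another instance of the struct rather than touching the core.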

And in the meantime, are there any other glaring errors that we need
to address?

Thanks!
- Andy


[PATCH] module, fix percpu reserved memory exhaustion

2013-01-09 Thread Prarit Bhargava
Rusty,

There is likely some subtlety of moving the module mutex that I'm unaware of.
What I can say is that this patch seems to resolve the problem for me, or at
least through 100+ reboots I have not seen the problem (I'm still testing
as I write this).

I'm more than willing to hear an alternative approach, or test an alternative
patch.

Thanks,

P.

8<

In recent Fedora releases (F17 & F18) some users have reported seeing
messages similar to

[   15.478121] Pid: 727, comm: systemd-udevd Tainted: GF 3.8.0-rc2+ #1
[   15.478121] Call Trace:
[   15.478131]  [] pcpu_alloc+0xa01/0xa60
[   15.478137]  [] ? printk+0x61/0x63
[   15.478140]  [] __alloc_reserved_percpu+0x13/0x20
[   15.478145]  [] load_module+0x1dc2/0x20b0
[   15.478150]  [] ? do_page_fault+0xe/0x10
[   15.478152]  [] ? page_fault+0x28/0x30
[   15.478155]  [] sys_init_module+0xd7/0x120
[   15.478159]  [] system_call_fastpath+0x16/0x1b
[   15.478160] kvm: Could not allocate 304 bytes percpu data
[   15.478174] PERCPU: allocation failed, size=304 align=32, alloc from reserved chunk failed

during system boot.  In some cases, users have also reported seeing this
message along with a failed load of other modules.

As the message indicates, the reserved chunk of percpu memory (where
modules allocate their memory) is exhausted.  A debug printk inserted in
the code shows

[   15.478533] PRARIT size = 304 > chunk->contig_hint = 208

i.e., the reserved percpu chunk has only 208 contiguous bytes available.

What is happening is that systemd is loading an instance of the kvm module
for each cpu found (see commit e9bda3b).  When the module load occurs, the
kernel currently allocates the module's percpu data area prior to checking
whether the module is already loaded or is in the process of being loaded.
If the module is already loaded, or finishes loading, the module loading
code releases the current instance's percpu data.

The problem is that these module loads race and it is possible that all of
the percpu reserved area is consumed by repeated loads of the same module
which results in the failure of other drivers to load.

This patch moves the module percpu allocation after the check for an
existing instance of the module.
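The effect of the ordering can be illustrated with a toy model of the reserved chunk. Everything here is hypothetical: the pool size, the byte-granular allocator, and the assumption that all racing loads reach the allocation step before any of them reaches the duplicate check. It sketches the failure mode, not kernel/module.c.

```c
#include <stdbool.h>

#define RESERVED_POOL_BYTES 8192 /* hypothetical reserved-chunk size */

static int pool_used;

static bool pool_alloc(int size)
{
	if (pool_used + size > RESERVED_POOL_BYTES)
		return false;
	pool_used += size;
	return true;
}

static void pool_free(int size)
{
	pool_used -= size;
}

/*
 * Model `racers` concurrent loads of the same module, each needing
 * `size` bytes, all in flight at once.  Returns whether an unrelated
 * module needing `other` bytes can allocate during that window.
 */
static bool other_module_fits(int racers, int size, int other,
			      bool check_first)
{
	int held = 0;
	bool ok;

	pool_used = 0;
	for (int i = 0; i < racers; i++) {
		/*
		 * allocate-first: every racer reserves before discovering
		 * the duplicate.  check-first: only the first racer gets
		 * past the "already loading?" check and reserves.
		 */
		if ((!check_first || i == 0) && pool_alloc(size))
			held++;
	}

	ok = pool_alloc(other);
	if (ok)
		pool_free(other);

	/* Release every reservation made during the window. */
	while (held-- > 0)
		pool_free(size);

	return ok;
}
```

With the 304-byte kvm allocation from the log above, 26 in-flight copies leave only 288 bytes of the hypothetical 8 KiB pool, so the next 304-byte request fails when allocation comes first.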

Signed-off-by: Prarit Bhargava 
Cc: Rusty Russell 
Cc: Mike Galbraith 
---
 kernel/module.c |  124 ++-
 1 file changed, 85 insertions(+), 39 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 250092c..e7e9b57 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1929,6 +1929,27 @@ static int verify_export_symbols(struct module *mod)
return 0;
 }
 
+static void simplify_percpu_symbols(struct module *mod,
+   const struct load_info *info)
+{
+   Elf_Shdr *symsec = &info->sechdrs[info->index.sym];
+   Elf_Sym *sym = (void *)symsec->sh_addr;
+   unsigned long secbase;
+   unsigned int i;
+
+   /*
+* No need for error checking in this function because
+* simplify_symbols has already been called.
+*/
+   for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
+   /* Divert to percpu allocation if a percpu var. */
+   if (sym[i].st_shndx == info->index.pcpu) {
+   secbase = (unsigned long)mod_percpu(mod);
+   sym[i].st_value += secbase;
+   }
+   }
+}
+
 /* Change all symbols so that st_value encodes the pointer directly. */
 static int simplify_symbols(struct module *mod, const struct load_info *info)
 {
@@ -1976,12 +1997,11 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
break;
 
default:
-   /* Divert to percpu allocation if a percpu var. */
-   if (sym[i].st_shndx == info->index.pcpu)
-   secbase = (unsigned long)mod_percpu(mod);
-   else
+   /* percpu diverts handled in simplify_percpu_symbols */
+   if (sym[i].st_shndx != info->index.pcpu) {
secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
-   sym[i].st_value += secbase;
+   sym[i].st_value += secbase;
+   }
break;
}
}
@@ -2899,11 +2919,29 @@ int __weak module_frob_arch_sections(Elf_Ehdr *hdr,
return 0;
 }
 
+static int allocate_percpu(struct module *mod, struct load_info *info)
+{
+   Elf_Shdr *pcpusec;
+   int err;
+
+   pcpusec = &info->sechdrs[info->index.pcpu];
+   if (pcpusec->sh_size) {
+   /* We have a special allocation for this section. */
+   pr_debug("module %s attempting percpu allocation of size %lu\n",
+mod->name, (unsigned long)pcpusec->sh_size);
+   err = percpu_modalloc(mod,
+ pcpuse

linux-next: change contacts for the kvm tree

2013-01-09 Thread Stephen Rothwell
Hi Marcelo,

I have removed Avi as a contact for the kvm tree as the address I have
(Avi Kivity ) bounces. 
-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au




linux-next: build warning after merge of the kvm tree

2013-01-09 Thread Stephen Rothwell
Hi Marcelo,

After merging the kvm tree, today's linux-next build (x86_64 allmodconfig)
produced this warning:

In file included from arch/x86/kvm/mmu.c:3482:0:
arch/x86/kvm/paging_tmpl.h: In function 'paging64_walk_addr_generic':
arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' [-Wunused-variable]
In file included from arch/x86/kvm/mmu.c:3486:0:
arch/x86/kvm/paging_tmpl.h: In function 'paging32_walk_addr_generic':
arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' [-Wunused-variable]

Introduced by commit 908e7d7999bc ("KVM: MMU: simplify folding of dirty
bit into accessed_dirty").

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au




[PATCH RFT] regulator: lp8788-buck: Silence build warning

2013-01-09 Thread Axel Lin
This driver uses id as an array index, so add bounds checking for id.

This patch fixes the following build warnings:
drivers/regulator/lp8788-buck.c: In function 'lp8788_buck_probe':
drivers/regulator/lp8788-buck.c:490:8: warning: array subscript is above array bounds [-Warray-bounds]
drivers/regulator/lp8788-buck.c:489:63: warning: array subscript is above array bounds [-Warray-bounds]

Reported-by: Fengguang Wu 
Signed-off-by: Axel Lin 
---
Hi Fengguang,
My toolchain does not show the warnings you mentioned;
can you test whether this patch indeed silences them?

Thanks,
Axel
 drivers/regulator/lp8788-buck.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/regulator/lp8788-buck.c b/drivers/regulator/lp8788-buck.c
index 1161949..ef845c7 100644
--- a/drivers/regulator/lp8788-buck.c
+++ b/drivers/regulator/lp8788-buck.c
@@ -474,7 +474,7 @@ static int lp8788_init_dvs(struct platform_device *pdev,
u8 default_dvs_mode[] = { LP8788_BUCK1_DVS_I2C, LP8788_BUCK2_DVS_I2C };
 
/* no dvs for buck3, 4 */
-   if (id == BUCK3 || id == BUCK4)
+   if (id > BUCK2)
return 0;
 
/* no dvs platform data, then dvs will be selected by I2C registers */
@@ -505,6 +505,9 @@ static int lp8788_buck_probe(struct platform_device *pdev)
struct regulator_dev *rdev;
int ret;
 
+   if (id >= LP8788_NUM_BUCKS)
+   return -EINVAL;
+
buck = devm_kzalloc(&pdev->dev, sizeof(struct lp8788_buck), GFP_KERNEL);
if (!buck)
return -ENOMEM;
-- 
1.7.9.5
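The pattern in this patch, validating an externally supplied id before using it as an array index, can be sketched in isolation. The enum, table values and function name below are hypothetical stand-ins, not the lp8788 driver's definitions. Writing the DVS test as id > BUCK2, rather than listing BUCK3 and BUCK4, makes it evident (to a reader, and presumably to the compiler's -Warray-bounds analysis) that only indices 0 and 1 reach the two-element array. The sketch additionally rejects negative ids.

```c
#include <errno.h>

/* Hypothetical mirror of the driver's layout: four bucks, DVS on two. */
enum buck_id { BUCK1, BUCK2, BUCK3, BUCK4, NUM_BUCKS };

static const int default_dvs_mode[] = { 10, 20 }; /* BUCK1, BUCK2 only */

static int dvs_mode_for(int id)
{
	/* Reject ids that would index past any per-buck table. */
	if (id < 0 || id >= NUM_BUCKS)
		return -EINVAL;

	/* No DVS for BUCK3/BUCK4; this also bounds the index below. */
	if (id > BUCK2)
		return 0;

	return default_dvs_mode[id];
}
```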





Re: [PATCH 2/5] x86,smp: proportional backoff for ticket spinlocks

2013-01-09 Thread Rafael Aquini
On Tue, Jan 08, 2013 at 05:32:41PM -0500, Rik van Riel wrote:
> Subject: x86,smp: proportional backoff for ticket spinlocks
> 
> Simple fixed value proportional backoff for ticket spinlocks.
> By pounding on the cacheline with the spin lock less often,
> bus traffic is reduced. In cases of a data structure with
> embedded spinlock, the lock holder has a better chance of
> making progress.
> 
> If we are next in line behind the current holder of the
> lock, we do a fast spin, so as not to waste any time when
> the lock is released.
> 
> The number 50 is likely to be wrong for many setups, and
> this patch is mostly to illustrate the concept of proportional
> backoff. The next patch automatically tunes the delay value.
> 
> Signed-off-by: Rik van Riel 
> Signed-off-by: Michel Lespinasse 
> ---

Acked-by: Rafael Aquini 


>  arch/x86/kernel/smp.c |   23 ---
>  1 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index 20da354..aa743e9 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -117,11 +117,28 @@ static bool smp_no_nmi_ipi = false;
>   */
>  void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
>  {
> + __ticket_t head = inc.head, ticket = inc.tail;
> + __ticket_t waiters_ahead;
> + unsigned loops;
> +
>   for (;;) {
> - cpu_relax();
> - inc.head = ACCESS_ONCE(lock->tickets.head);
> + waiters_ahead = ticket - head - 1;
> + /*
> +  * We are next after the current lock holder. Check often
> +  * to avoid wasting time when the lock is released.
> +  */
> + if (!waiters_ahead) {
> + do {
> + cpu_relax();
> + } while (ACCESS_ONCE(lock->tickets.head) != ticket);
> + break;
> + }
> + loops = 50 * waiters_ahead;
> + while (loops--)
> + cpu_relax();
>  
> - if (inc.head == inc.tail)
> + head = ACCESS_ONCE(lock->tickets.head);
> + if (head == ticket)
>   break;
>   }
>  }
> 
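The backoff computation in this patch can be pulled out into a small, testable function. This is a hypothetical refactoring (backoff_loops() is an invented name), and it assumes 8-bit tickets; as I understand it, x86 uses a u8 __ticket_t only when NR_CPUS is below 256. Computing waiters_ahead in the ticket type lets the subtraction wrap correctly when the ticket counter overflows.

```c
#include <stdint.h>

typedef uint8_t ticket_t; /* assumption: 8-bit tickets */

/*
 * Returns how many cpu_relax() iterations to wait before re-reading
 * the lock head, or 0 when we are next in line and should fast-spin.
 */
static unsigned backoff_loops(ticket_t head, ticket_t ticket)
{
	/* Truncating to ticket_t makes the count correct across wraparound. */
	ticket_t waiters_ahead = (ticket_t)(ticket - head - 1);

	if (!waiters_ahead)
		return 0;
	return 50u * waiters_ahead; /* the patch's fixed factor of 50 */
}
```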


Re: [PATCH 0/5] x86,smp: make ticket spinlock proportional backoff w/ auto tuning

2013-01-09 Thread Rafael Aquini
On Wed, Jan 09, 2013 at 06:20:35PM +0530, Raghavendra K T wrote:
> I ran kernbench on 32 core (mx3850) machine with 3.8-rc2 base.
> x base_3.8rc2
> + rik_backoff
>     N       Min       Max     Median        Avg     Stddev
> x   8   222.977    231.16    227.735    227.388  3.1512986
> +   8    218.75   232.347   229.1035  228.25425  4.2730225
> No difference proven at 95.0% confidence

I got similar results on smaller systems (1 socket, dual-cores and quad-cores)
when running Rik's latest series: no big difference, for better or for worse,
but I also think Rik's work is meant to address bigger systems with more cores
contending for any given spinlock.
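The "No difference proven" verdict can be reproduced from the summary rows. ministat reports a two-sample Student's t test with pooled variance; the sketch below recomputes that statistic from the N, Avg and Stddev columns. The two-sided 95% critical value for 14 degrees of freedom, about 2.145, is taken from a standard t table rather than computed.

```c
/*
 * Squared two-sample Student's t statistic with pooled variance.
 * Working with t^2 avoids needing sqrt() from libm.
 */
static double pooled_t_squared(int n1, double mean1, double sd1,
			       int n2, double mean2, double sd2)
{
	double pooled_var = ((n1 - 1) * sd1 * sd1 + (n2 - 1) * sd2 * sd2) /
			    (n1 + n2 - 2);
	double diff = mean2 - mean1;

	return diff * diff / (pooled_var * (1.0 / n1 + 1.0 / n2));
}
```

For the kernbench rows this gives t of about 0.46 (t squared about 0.213), well below 2.145, consistent with the reported result.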




Re: [PATCH v6 05/15] memory-hotplug: introduce new function arch_remove_memory() for removing page table depends on architecture

2013-01-09 Thread Tang Chen

Hi Andrew,

On 01/10/2013 06:50 AM, Andrew Morton wrote:

On Wed, 9 Jan 2013 17:32:29 +0800
Tang Chen  wrote:


To remove memory, we need to remove the page tables, but that depends
on the architecture. So this patch introduces arch_remove_memory() for
removing page tables. For now it only calls __remove_pages().

Note: __remove_pages() is not implemented for some architectures
   (I don't know how to implement it for s390).


Can this break the build for s390?


No, I don't think so. The arch_remove_memory() in s390 will only
return -EBUSY.

Thanks. :)









Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache

2013-01-09 Thread Minchan Kim
On Wed, Jan 09, 2013 at 04:26:02PM -0800, Andrew Morton wrote:
> On Wed,  9 Jan 2013 15:21:14 +0900
> Minchan Kim  wrote:
> 
> > If laptop_mode is enabled, the VM tries to avoid I/O to save power.
> > But if there is no memory that can be reclaimed without I/O, we should
> > do I/O to prevent an unnecessary OOM kill, even though we sacrifice power.
> > 
> > One example is when we are out of page cache. The only remaining pages
> > are anonymous pages, and to swap them out we need may_writepage = 1.
> > 
> > Reported-by: Luigi Semenzato 
> > Signed-off-by: Minchan Kim 
> > ---
> >  mm/vmscan.c |6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 439cc47..624c816 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1728,6 +1728,12 @@ static void get_scan_count(struct lruvec *lruvec, 
> > struct scan_control *sc,
> > free = zone_page_state(zone, NR_FREE_PAGES);
> > if (unlikely(file + free <= high_wmark_pages(zone))) {
> > scan_balance = SCAN_ANON;
> > +   /*
> > +* From now on, we have to swap out
> > +* for preventing OOM kill although
> > +* we sacrifice power consumption.
> > +*/
> > +   sc->may_writepage = 1;
> > goto out;
> > }
> > }
> 
> This is pretty ugly.  get_scan_count() is, as its name implies, an
> idempotent function which inspects the state of things and returns a
> result.  As such, it has no business going in and altering the state of
> the scan_control.
> 
> We have code in both direct reclaim and in kswapd to set may_writepage
> if vmscan is getting into trouble.  I don't see why adding another
> instance is necessary if the existing instances are working correctly.
> 
> 
> 
> (Is it correct that __zone_reclaim() ignores laptop_mode?)
> 
> 
> I have a feeling that laptop mode has bitrotted and these patches are
> kinda hacking around as-yet-not-understood failures...

Absolutely; this patch is a last guard against unexpected behavior.
As I mentioned in the cover letter, Luigi's problem could be solved by either
[1/2] or [2/2], but I wanted to add this as a last resort in case of an
unexpected emergency. But you're right. It's not good to hide the problem
the way this patch does, so let's drop [2/2].

Also, I absolutely agree it has bitrotted, so to correct it we need a
volunteer to investigate it with long-running power-saving experiments.
Until then, [1/2] would be a band-aid.
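Andrew's point about keeping get_scan_count() free of side effects can be illustrated with a hypothetical split: a pure function reports what it saw, and the caller decides whether to change reclaim state. The names below (decide_scan(), struct scan_decision, the *_SKETCH enum values) are invented, and the sketch models only the "file + free <= high watermark" test from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's scan-balance choices. */
enum scan_balance_sketch { SCAN_EQUAL_SKETCH, SCAN_ANON_SKETCH };

struct scan_decision {
	enum scan_balance_sketch balance;
	bool wants_writeback;	/* a hint only; nothing is modified here */
};

/*
 * Pure inspector: looks at the counts it is given and reports a
 * decision, leaving any change to reclaim state to the caller.
 */
static struct scan_decision decide_scan(unsigned long nr_file,
					unsigned long nr_free,
					unsigned long high_wmark)
{
	struct scan_decision d = { SCAN_EQUAL_SKETCH, false };

	if (nr_file + nr_free <= high_wmark) {
		/* Too little page cache left: only anon pages can be reclaimed. */
		d.balance = SCAN_ANON_SKETCH;
		/* Reclaiming anon pages means swap I/O. */
		d.wants_writeback = true;
	}
	return d;
}
```

A caller would then set sc->may_writepage itself if it chose to honour the hint, which keeps the side effect visible at the call site.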


-- 
Kind regards,
Minchan Kim


Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-09 Thread Tang Chen

Hi Andrew,

On 01/10/2013 07:33 AM, Andrew Morton wrote:

On Wed, 9 Jan 2013 17:32:24 +0800
Tang Chen  wrote:


This patch-set aims to implement physical memory hot-removing.


As you were on the patch delivery path, all of these patches should have
your Signed-off-by:.  But some were missing it.  I fixed this in my
copy of the patches.


Thank you very much for the help. Next time I'll add it myself.




I suspect this patchset adds a significant amount of code which will
not be used if CONFIG_MEMORY_HOTPLUG=n.  "[PATCH v6 06/15]
memory-hotplug: implement register_page_bootmem_info_section of
sparse-vmemmap", for example.  This is not a good thing, so please go
through the patchset (in fact, go through all the memhotplug code) and
let's see if we can reduce the bloat for CONFIG_MEMORY_HOTPLUG=n
kernels.

This needn't be done immediately - it would be OK by me if you were to
defer this exercise until all the new memhotplug code is largely in
place.  But please, let's do it.
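One common way to keep such code out of CONFIG_MEMORY_HOTPLUG=n kernels is to declare the real function only when the option is set and provide an empty static inline stub otherwise. The sketch below is a generic illustration with a hypothetical function name, hotplug_only_setup(); it is not taken from the patchset.

```c
/* Uncomment to model a CONFIG_MEMORY_HOTPLUG=y build. */
/* #define CONFIG_MEMORY_HOTPLUG 1 */

#ifdef CONFIG_MEMORY_HOTPLUG
/* The real implementation would live in a file built only for hotplug kernels. */
int hotplug_only_setup(unsigned long start_pfn);
#else
/* Empty inline stub: callers compile unchanged and the code disappears. */
static inline int hotplug_only_setup(unsigned long start_pfn)
{
	(void)start_pfn;
	return 0;
}
#endif
```

Because the stub is inline and does nothing, the compiler can drop both the call and any code that only feeds it.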


OK, I'll check it once the page_cgroup problem is solved.

Thanks. :)







--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-09 Thread Tang Chen

Hi Andrew,

Thank you very much for your pushing. :)

On 01/10/2013 06:23 AM, Andrew Morton wrote:


This does sound like a significant problem.  We should assume that
memcg is available and in use.


In patch1, we provide a solution which is not good enough:
iterate twice to offline the memory.
1st iteration: offline every non-primary memory block.
2nd iteration: offline the primary (i.e. first added) memory block.


Let's flesh this out a bit.

If we online memory8, memory9, memory10 and memory11 then I'd have
thought that they would need to be offlined in reverse order, which will
require four iterations, not two.  Is this wrong and if so, why?


Well, we may need more than two iterations if all memory8, memory9,
memory10 are in use by kernel, and 10 depends on 9, 9 depends on 8.

So, as you see here, the iteration method is not good enough.
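The ordering problem can be modelled as repeated passes over the blocks, where a block may be offlined only once no still-online block keeps its metadata in it. This is a hypothetical model: the host[] array (index of the block holding each block's metadata, or -1) is invented for illustration and does not describe how page_cgroup is actually laid out. With a chain 8 <- 9 <- 10 <- 11 scanned in ascending order it needs four passes, matching Andrew's estimate, while a single descending pass suffices.

```c
#include <stdbool.h>

/* True if some other online block keeps its metadata in block b. */
static bool blocked(const int host[], const bool online[], int n, int b)
{
	for (int i = 0; i < n; i++)
		if (online[i] && i != b && host[i] == b)
			return true;
	return false;
}

/*
 * Offline every block, scanning in ascending (or descending) order on
 * each pass.  Returns the number of passes that made progress, or -1
 * if some blocks can never be offlined (e.g. a dependency cycle).
 */
static int offline_all(const int host[], bool online[], int n, bool reverse)
{
	int passes = 0;

	for (;;) {
		bool any_online = false, progress = false;

		for (int k = 0; k < n; k++) {
			int b = reverse ? n - 1 - k : k;

			if (!online[b])
				continue;
			if (blocked(host, online, n, b)) {
				any_online = true;
				continue;
			}
			online[b] = false;
			progress = true;
		}
		if (progress)
			passes++;
		if (!any_online)
			return passes;
		if (!progress)
			return -1;
	}
}
```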

But this only happens when the memory is used by the kernel, and such
memory cannot be migrated. If we use a boot option such as movablecore_map,
or the movable_online functionality, to limit the memory to movable, the
kernel will not use this memory, so node hot-remove is safe.



Also, what happens if we wish to offline only memory9?  Do we offline
memory11 then memory10 then memory9 and then re-online memory10 and
memory11?


In this case, offlining memory9 could fail if the user does it manually,
for example through sysfs.

This code is in the memory hot-remove path. When we remove a memory
device, all of its pages are offlined automatically, and in reverse order.

And again, this is not good enough. We will figure out a reasonable way
to solve it soon.




And a new idea from Wen Congyang  is:
allocate the memory from the memory block they are describing.


Yes.


But we are not sure if it is OK to do so, because there is no existing API
for it, and we would need to move page_cgroup memory allocation from
MEM_GOING_ONLINE to MEM_ONLINE.


This all sounds solvable - can we proceed in this fashion?


Yes, we are in progress now.




And also, it may interfere with hugepages.


Please provide full details on this problem.


It is not very clear yet; if I find something, I'll share it.




Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. It is not a bug.


Right.  But how often does this happen in testing?  In other words,
please provide an overall description of how well memory hot-remove is
presently operating.  Is it reliable?  What is the success rate in
real-world situations?


We test the hot-remove functionality mostly with movable_online used.
And the memory used by kernel is not allowed to be removed.

We will run some tests on offlining memory used by the kernel, and send
you the results soon.

And since we are trying out some other ways, I think the problem will
be solved soon.


Are there precautions which the administrator
can take to improve the success rate?


The administrator can use the movablecore_map boot option or the
movable_online functionality (which is now in the kernel) to limit memory
to movable and avoid this problem.
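For the movable_online route, onlining is driven through each memory block's sysfs state file. The helper below is a minimal sketch, assuming the /sys/devices/system/memory/memoryN/state layout and the "online_movable" value accepted by recent kernels; the function names are hypothetical, and the root directory is a parameter so the code can be exercised without real hardware.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build "<root>/memory<N>/state".  On a live system root would be
 * "/sys/devices/system/memory".  Returns 0, or -1 if buf is too small.
 */
static int memory_block_state_path(char *buf, size_t len,
				   const char *root, int block)
{
	int n = snprintf(buf, len, "%s/memory%d/state", root, block);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a state such as "online_movable" or "offline" to one block.
 * The kernel rejects a bad request in the sysfs store handler; with
 * stdio buffering that failure normally surfaces at fclose(), so its
 * result is checked too.  Returns 0 or -1.
 */
static int set_memory_block_state(const char *root, int block,
				  const char *state)
{
	char path[256];
	FILE *f;
	int ret = 0;

	if (memory_block_state_path(path, sizeof(path), root, block))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(state, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

On a real machine this needs root, and the kernel may still refuse a block it cannot place in ZONE_MOVABLE.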


What are the remaining problems
and are there plans to address them?


For now, we will try to allocate page_cgroup on the memory block that it
describes. All the other parts seem to work well now.

And we are still testing. If we have any problem, we will share.

Thanks. :)




--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] ARM: Dove: add Cubox sdhci card detect gpio

2013-01-09 Thread Jason Cooper
On Tue, Dec 18, 2012 at 01:24:29AM +0100, Sebastian Hesselbarth wrote:
> Card detect for sdhci on Cubox is connected to the wrong pin
> (sdio1_cd instead of sdio0_cd). With support for cd-gpios and
> pinctrl add the corresponding properties to DT for Cubox.
> 
> Signed-off-by: Sebastian Hesselbarth 
> ---
> Cc: Russell King 
> Cc: Jason Cooper 
> Cc: Andrew Lunn 
> Cc: Sebastian Hesselbarth 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/arm/boot/dts/dove-cubox.dts |   14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)

Applied to mvebu/fixes

thx,

Jason.


Re: [PATCH V3 4/8] memcg: add per cgroup dirty pages accounting

2013-01-09 Thread Kamezawa Hiroyuki

(2013/01/10 0:02), Sha Zhengju wrote:

On Mon, Jan 7, 2013 at 3:25 PM, Kamezawa Hiroyuki wrote:

(2013/01/05 13:48), Sha Zhengju wrote:


On Wed, Jan 2, 2013 at 6:44 PM, Michal Hocko  wrote:


On Wed 26-12-12 01:26:07, Sha Zhengju wrote:


From: Sha Zhengju 

This patch adds memcg routines to count dirty pages, which allows the memory
controller to maintain an accurate view of the amount of its dirty memory and
can provide some info for users while cgroup's direct reclaim is working.



I guess you meant targeted resp. (hard/soft) limit reclaim here,
right? It is true that this is direct reclaim but it is not clear to me



Yes, I meant memcg hard/soft reclaim here which is triggered directly
by allocation and is distinct from background kswapd reclaim (global).


why the usefulness should be limited to the reclaim for users. I would
understand this if the users were in fact in-kernel users.



One of the reasons I'm trying to account the dirty pages is to get a
broader overall view of memory usage, because memcg hard/soft
reclaim may affect the response time of user applications.
Yeah, the beneficiary can be the application administrator or kernel users.
:P


[...]


To prevent the AB/BA deadlock mentioned by Greg Thelen in the previous version
(https://lkml.org/lkml/2012/7/30/227), we adjust the lock order:
->private_lock --> mapping->tree_lock --> memcg->move_lock.
So we need to take mapping->tree_lock ahead of TestSetPageDirty in
__set_page_dirty() and __set_page_dirty_nobuffers(). But to avoid useless
spinlock contention, a preliminary PageDirty() check is added.
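The lock ordering and the unlocked pre-check can be sketched in userspace. This is a model under stated assumptions: pthread mutexes stand in for the spinlocks, a C11 atomic_bool stands in for PG_dirty, private_lock is omitted, and all struct and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; only the lock nesting is modelled. */
struct model_memcg {
	pthread_mutex_t move_lock;
	long nr_dirty;                  /* per-memcg dirty page count */
};

struct model_mapping {
	pthread_mutex_t tree_lock;
	struct model_memcg *memcg;
};

struct model_page {
	atomic_bool dirty;
	struct model_mapping *mapping;
};

/*
 * Order: mapping->tree_lock, then memcg->move_lock (never the reverse).
 * The unlocked pre-check skips both locks for pages that are already
 * dirty; the flag is tested again under tree_lock because another
 * thread may have dirtied the page after the pre-check.
 * Returns true if this call made the page dirty.
 */
static bool model_set_page_dirty(struct model_page *page)
{
	struct model_mapping *mapping = page->mapping;
	bool newly_dirty = false;

	if (atomic_load_explicit(&page->dirty, memory_order_relaxed))
		return false;           /* cheap PageDirty()-style check */

	pthread_mutex_lock(&mapping->tree_lock);
	if (!atomic_exchange(&page->dirty, true)) {     /* TestSetPageDirty */
		newly_dirty = true;
		pthread_mutex_lock(&mapping->memcg->move_lock);
		mapping->memcg->nr_dirty++;
		pthread_mutex_unlock(&mapping->memcg->move_lock);
	}
	pthread_mutex_unlock(&mapping->tree_lock);
	return newly_dirty;
}
```

The pre-check is only an optimisation: correctness comes from the exchange under tree_lock, so a stale unlocked read just costs one extra lock round-trip.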



But there is another AA deadlock here I believe.
page_remove_rmap
  mem_cgroup_begin_update_page_stat           <<< 1
  set_page_dirty
    __set_page_dirty_buffers
      __set_page_dirty
        mem_cgroup_begin_update_page_stat     <<< 2
          move_lock_mem_cgroup
            spin_lock_irqsave(&memcg->move_lock, *flags);

mem_cgroup_begin_update_page_stat is not recursive wrt. locking AFAICS
because we might race with the moving charges:
  CPU0                                        CPU1
page_remove_rmap
                                              mem_cgroup_can_attach
  mem_cgroup_begin_update_page_stat (1)
    rcu_read_lock
                                                mem_cgroup_start_move
                                                  atomic_inc(&memcg_moving)
                                                  atomic_inc(&memcg->moving_account)
                                                  synchronize_rcu
    __mem_cgroup_begin_update_page_stat
      mem_cgroup_stolen <<< TRUE
      move_lock_mem_cgroup
  [...]
        mem_cgroup_begin_update_page_stat (2)
          __mem_cgroup_begin_update_page_stat
            mem_cgroup_stolen     <<< still TRUE
            move_lock_mem_cgroup  <<< DEADLOCK
  [...]
  mem_cgroup_end_update_page_stat
    rcu_unlock
                                                  # wake up from synchronize_rcu
                                              [...]
                                              mem_cgroup_move_task
                                                mem_cgroup_move_charge
                                                  walk_page_range
                                                    mem_cgroup_move_account
                                                      move_lock_mem_cgroup


Maybe I have missed some other locking which would prevent this from
happening but the locking relations are really complicated in this area
so if mem_cgroup_{begin,end}_update_page_stat might be called
recursively then we need a fat comment which justifies that.
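Michal's AA case can be reproduced deterministically in a userspace model. The sketch assumes a non-recursive move_lock and a single stolen flag; pthread_mutex_trylock() is used only so that the second, nested acquisition reports failure instead of hanging, whereas the real code would spin forever. Names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: a non-recursive move_lock plus the "stolen" flag. */
static pthread_mutex_t move_lock = PTHREAD_MUTEX_INITIALIZER;
static bool memcg_stolen;          /* true while charges are being moved */

/*
 * Model of begin_update_page_stat(): while a move is in progress the
 * lock must be taken.  Returns false where the real code would
 * deadlock; *locked tells the caller whether to unlock later.
 */
static bool model_begin_update_page_stat(bool *locked)
{
	*locked = false;
	if (!memcg_stolen)
		return true;               /* fast path: no lock needed */
	if (pthread_mutex_trylock(&move_lock) != 0)
		return false;              /* already held, even by us: AA */
	*locked = true;
	return true;
}

static void model_end_update_page_stat(bool locked)
{
	if (locked)
		pthread_mutex_unlock(&move_lock);
}
```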



Ohhh... good catch!  I didn't notice there is a recursive call of
mem_cgroup_{begin,end}_update_page_stat in page_remove_rmap().
The mem_cgroup_{begin,end}_update_page_stat() design has troubled
me a lot recently, as the lock granularity is a little bigger than I
thought. Not only the resource but also some code logic falls within the
locked region, which may be deadlock prone. The problem will still exist
if we try to add accounting for other memcg page stats later, so may I be
so bold as to suggest that we dig into the lock again...

But with regard to the current lock implementation, I doubt whether we can
account MEM_CGROUP_STAT_FILE_{MAPPED, DIRTY} in one go and just take
move_lock once at the beginning. IMHO we can make
mem_cgroup_{begin,end}_update_page_stat() recursion-aware, and what I'm
thinking of now is changing memcg->move_lock from the original spinlock
to an rw-spinlock: mem_cgroup_{begin,end}_update_page_stat() takes the
read lock, which makes it reentrant, and the memcg moving task takes the
write lock. Then the race may look like the following:

  CPU0                                        CPU1
page_remove_rmap
                                              mem_cgroup_can_attach
  mem_cgroup_begin_update_page_stat (1)
    rcu_read_lock
                                                mem_cgroup_start_move
                                                  atomic_inc(&memcg_moving)
                                                  atomic_inc(&memcg->moving_account)
  

[PATCH] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-09 Thread Colin Cross
Emulate NMIs on systems where they are not available by using timer
interrupts on other cpus.  Each cpu will use its softlockup hrtimer
to check that the next cpu is processing hrtimer interrupts by
verifying that a counter is increasing.

This patch is useful on systems where the hardlockup detector is not
available due to a lack of NMIs, for example most ARM SoCs.
Without this patch any cpu stuck with interrupts disabled can
cause a hardware watchdog reset with no debugging information,
but with this patch the kernel can detect the lockup and panic,
which can result in useful debugging info.
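The buddy check can be illustrated with a userspace model in which per-CPU variables become plain arrays. It follows is_hardlockup_other_cpu() and the next-CPU selection in the patch, but the array names and check_buddy() are hypothetical; check_interval_ms() just restates the "every 3 samples" arithmetic from the patch comment (a sample period of watchdog_thresh * 2 / 5, so three samples come to 6/5 of the threshold).

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Per-CPU variables modelled as plain arrays (hypothetical names). */
static unsigned long hrtimer_interrupts[MODEL_NR_CPUS];
static unsigned long hrtimer_interrupts_saved[MODEL_NR_CPUS];
static bool watched_cpus[MODEL_NR_CPUS];

/* Next CPU in the mask after 'self', wrapping to the first; 'self' if alone. */
static int next_watched_cpu(int self)
{
	for (int i = 1; i <= MODEL_NR_CPUS; i++) {
		int c = (self + i) % MODEL_NR_CPUS;

		if (watched_cpus[c])
			return c;
	}
	return self;
}

/* As in is_hardlockup_other_cpu(): no hrtimer ticks since the last look. */
static bool is_hardlockup_other_cpu(int cpu)
{
	unsigned long hrint = hrtimer_interrupts[cpu];

	if (hrtimer_interrupts_saved[cpu] == hrint)
		return true;
	hrtimer_interrupts_saved[cpu] = hrint;
	return false;
}

/* Run on 'self': returns the buddy CPU if it looks locked up, else -1. */
static int check_buddy(int self)
{
	int cpu = next_watched_cpu(self);

	if (cpu == self)
		return -1;
	return is_hardlockup_other_cpu(cpu) ? cpu : -1;
}

/* 3 samples of watchdog_thresh * 2 / 5 seconds, in milliseconds. */
static unsigned int check_interval_ms(unsigned int watchdog_thresh_s)
{
	return 3 * (watchdog_thresh_s * 2 * 1000 / 5);
}
```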

Signed-off-by: Colin Cross 
---
 include/linux/nmi.h |5 ++-
 kernel/watchdog.c   |   98 ---
 lib/Kconfig.debug   |   14 +++-
 3 files changed, 110 insertions(+), 7 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index db50840..c8f8aa0 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -14,8 +14,11 @@
  * may be used to reset the timeout - for code which intentionally
  * disables interrupts for a long time. This call is stateless.
  */
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR_NMI)
 #include 
+#endif
+
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void touch_nmi_watchdog(void);
 #else
 static inline void touch_nmi_watchdog(void)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 75a2ab3..94c231e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -44,6 +44,11 @@
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
+#endif
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
+static cpumask_t __read_mostly watchdog_cpus;
+#endif
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
 #endif
 
@@ -179,7 +184,7 @@ void touch_softlockup_watchdog_sync(void)
__raw_get_cpu_var(watchdog_touch_ts) = 0;
 }
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 /* watchdog detector functions */
 static int is_hardlockup(void)
 {
@@ -193,6 +198,64 @@ static int is_hardlockup(void)
 }
 #endif
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
+static int is_hardlockup_other_cpu(int cpu)
+{
+	unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
+
+	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
+		return 1;
+
+	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
+	return 0;
+}
+
+static void watchdog_check_hardlockup_other_cpu(void)
+{
+	int cpu;
+	cpumask_t cpus = watchdog_cpus;
+
+	/*
+	 * Test for hardlockups every 3 samples.  The sample period is
+	 *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
+	 *  watchdog_thresh (over by 20%).
+	 */
+	if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
+		return;
+
+	/* check for a hardlockup on the next cpu */
+	cpu = cpumask_next(smp_processor_id(), &cpus);
+	if (cpu >= nr_cpu_ids)
+		cpu = cpumask_first(&cpus);
+	if (cpu == smp_processor_id())
+		return;
+
+	smp_rmb();
+
+	if (per_cpu(watchdog_nmi_touch, cpu) == true) {
+		per_cpu(watchdog_nmi_touch, cpu) = false;
+		return;
+	}
+
+	if (is_hardlockup_other_cpu(cpu)) {
+		/* only warn once */
+		if (per_cpu(hard_watchdog_warn, cpu) == true)
+			return;
+
+		if (hardlockup_panic)
+			panic("Watchdog detected hard LOCKUP on cpu %d", cpu);
+		else
+			WARN(1, "Watchdog detected hard LOCKUP on cpu %d", cpu);
+
+		per_cpu(hard_watchdog_warn, cpu) = true;
+	} else {
+		per_cpu(hard_watchdog_warn, cpu) = false;
+	}
+}
+#else
+static inline void watchdog_check_hardlockup_other_cpu(void) { return; }
+#endif
+
 static int is_softlockup(unsigned long touch_ts)
 {
unsigned long now = get_timestamp(smp_processor_id());
@@ -204,7 +267,7 @@ static int is_softlockup(unsigned long touch_ts)
return 0;
 }
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 
 static struct perf_event_attr wd_hw_attr = {
.type   = PERF_TYPE_HARDWARE,
@@ -252,7 +315,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
__this_cpu_write(hard_watchdog_warn, false);
return;
 }
-#endif /* CONFIG_HARDLOCKUP_DETECTOR */
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_NMI */
 
 static void watchdog_interrupt_count(void)
 {
@@ -272,6 +335,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
/* kick the hardlockup detector */
watchdog_interrupt_count();
 
+   /* test for hard

Re: [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset

2013-01-09 Thread Minchan Kim
Hi Andrew,

On Wed, Jan 09, 2013 at 04:18:54PM -0800, Andrew Morton wrote:
> On Wed,  9 Jan 2013 15:21:13 +0900
> Minchan Kim  wrote:
> 
> > Recently, Luigi reported that there is lots of free swap space when
> > OOM happens. It's easily reproduced on zram-over-swap, where
> > many instances of memory hogs are running and laptop_mode is enabled.
> > 
> > Luigi reported there was no problem when he disabled laptop_mode.
> > What I found when investigating the problem is as follows.
> > 
> > try_to_free_pages disables may_writepage if laptop_mode is enabled.
> > shrink_page_list adds lots of anon pages to the swap cache via
> > add_to_swap, which makes the pages Dirty and rotates them to the head
> > of the inactive LRU without pageout. If this is repeated, the inactive
> > anon LRU fills up with Dirty and SwapCache pages.
> > 
> > In that case, isolate_lru_pages fails because it tries to isolate
> > only clean pages when may_writepage == 0.
> > 
> > may_writepage can become 1 only if total_scanned exceeds
> > writeback_threshold in do_try_to_free_pages but, unfortunately,
> > the VM can't isolate anon pages from the inactive anon LRU list for
> > the above reason, and we have already reclaimed all file-backed pages.
> > So it ends up OOM killing.
> > 
> > This patch prevents adding a page to the swap cache unnecessarily when
> > may_writepage is unset, so the anonymous LRU list isn't filled with
> > Dirty/SwapCache pages. The VM can then isolate pages from the anon LRU
> > list, which ends up setting may_writepage to 1 so that anon LRU pages
> > can be swapped out. When OOM triggers, I confirmed swap space was full.
> > 
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -780,6 +780,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > if (PageAnon(page) && !PageSwapCache(page)) {
> > if (!(sc->gfp_mask & __GFP_IO))
> > goto keep_locked;
> > +   if (!sc->may_writepage)
> > +   goto keep_locked;
> > if (!add_to_swap(page))
> > goto activate_locked;
> > may_enter_fs = 1;
> 
> I'm not really getting it, and the description is rather hard to follow :(

It seems I don't have a talent for descriptions. :(
I hope I will do better this year. :)

> 
> We should be adding anon pages to swapcache even when laptop_mode is
> set.  And we should be writing them to swap as well, then reclaiming
> them.  The only thing laptop_mode should do is make the disk spin up
> less frequently - that doesn't mean "not at all"!

So it seems your rationale is: save power only when the system has
enough memory, and therefore remove may_writepage from the reclaim path?

If so, I love it, because I haven't seen any numbers on the power saving
from reclaim throttling (though surely there was a reason to add it),
and I'm not sure it still works well, since we have tweaked the reclaim
code many times over the years.

> 
> So something seems screwed up here and the patch looks like a
> heavy-handed workaround.  Why aren't these anon pages getting written
> out in laptop_mode?

Don't know. It has been there a long time and I don't want to screw it up.
If we decide to page out in the reclaim path regardless of laptop_mode,
the problem becomes easy to solve without an ugly workaround.

Remove may_writepage? If that's too aggressive, we could remove it only
in the direct reclaim path.
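For reference, the branch the quoted hunk changes can be modelled as a pure function. This is a sketch of only that decision for an anonymous page not yet in the swap cache; swap_ok stands in for add_to_swap() succeeding, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcomes of the quoted branch (hypothetical names for the paths). */
enum anon_verdict { KEEP_LOCKED, ACTIVATE_LOCKED, CONTINUE_TO_PAGEOUT };

struct model_scan_control {
	bool gfp_io;          /* sc->gfp_mask & __GFP_IO */
	bool may_writepage;   /* cleared in laptop_mode until enough is scanned */
};

/* Anonymous page not yet in the swap cache, checks in the hunk's order. */
static enum anon_verdict anon_page_verdict(const struct model_scan_control *sc,
					   bool swap_ok)
{
	if (!sc->gfp_io)
		return KEEP_LOCKED;
	if (!sc->may_writepage)       /* the check the patch adds */
		return KEEP_LOCKED;
	if (!swap_ok)                 /* add_to_swap() failed */
		return ACTIVATE_LOCKED;
	return CONTINUE_TO_PAGEOUT;   /* page is now in the swap cache */
}
```

With may_writepage cleared, the patch keeps the page off the swap cache entirely rather than dirtying it and rotating it back onto the inactive list.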


-- 
Kind regards,
Minchan Kim


Re: [PATCH 1/4] arm: vt8500: Add support for Wondermedia WM8750/WM8850

2013-01-09 Thread Tony Prisk
On Wed, 2013-01-09 at 21:27 +, Arnd Bergmann wrote:
> On Wednesday 09 January 2013, Tony Prisk wrote:
> > I'm quite happy to send a pull request if that's what you prefer.
> > 
> > Generally people have just taken the patches straight from email, so
> > every time I've done a pull-request I get a reply back saying the patches
> > have already been applied.
> > 
> > Is there some 'rule' around pull-requests vs emailed patches?
> 
> Generally, pull requests tend to be less work for us, so I prefer
> them. In particular, when you add a tag description or a signed
> tag, that automatically puts nice text into the merge
> changeset.
> 
> > Should patches in pull-requests have Ack'd lines already?
> 
> Yes.
> 
>   Arnd

This is what I thought - and the reason I haven't sent a pull-request
for the patches - I haven't had any Acks :)

Regards
Tony P



Re: [PATCH v5 01/14] memory-hotplug: try to offline the memory twice to avoid dependence

2013-01-09 Thread Tang Chen

Hi Glauber,

On 01/09/2013 11:09 PM, Glauber Costa wrote:


We try to make all page_cgroup allocations local to the node they are describing
now. If the memory is the first memory onlined in this node, we will allocate
it from the other node.

For example, node1 has 4 memory blocks: 8-11, and we online it from 8 to 11
1. memory block 8, page_cgroup allocations are in the other nodes
2. memory block 9, page_cgroup allocations are in memory block 8

So we should offline memory block 9 first. But we don't know in which order
the user onlines the memory blocks.

I think we can modify memcg like this:
allocate the memory from the memory block they are describing

I am not sure it is OK to do so.


I don't see a reason why not.


I'm not sure, but if we do this, we could bring in a fragment for each
memory block (a memory section, 128MB, right?). Is this a problem when
we use large page (such as 1GB page) ?

Even if not, will these fragments make any bad effects ?

Thanks. :)



You would have to tweak a bit the lookup function for page_cgroup, but
assuming you will always have the pfns and limits, it should be easy to do.

I think the only tricky part is that today we have a single
node_page_cgroup, and we would of course have to have one per memory
block. My assumption is that the number of memory blocks is limited and
likely not very big. So even a static array would do.

Kamezawa, do you have any input in here?





Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table

2013-01-09 Thread Yinghai Lu
On Mon, Jan 7, 2013 at 7:55 AM, Borislav Petkov  wrote:
> Those -vXX version lines need to go under the "---" line. Alternatively,
> you might want to add some of them to the commit message with a proper
> explanation since they are not that trivial at a first glance, for
> example the -v5, -v6, -v8, -v9 with a better explanation.

Mostly they are for tracking versions.

>
>>
>
> This needs hpa's S-O-B.

He will add it later when he puts them into tip.


Re: [RFC PATCH 1/3] dmaengine: add dma_get_channel_caps()

2013-01-09 Thread Matt Porter
On Wed, Oct 24, 2012 at 08:43:27AM +0530, Vinod Koul wrote:
> On Tue, 2012-10-23 at 23:49 +0100, Grant Likely wrote:
> > > +enum dmaengine_apis {
> > > +   DMAENGINE_MEMCPY        = 0x0001,
> > > +   DMAENGINE_XOR           = 0x0002,
> > > +   DMAENGINE_XOR_VAL       = 0x0004,
> > > +   DMAENGINE_PQ            = 0x0008,
> > > +   DMAENGINE_PQ_VAL        = 0x0010,
> > > +   DMAENGINE_MEMSET        = 0x0020,
> > > +   DMAENGINE_SLAVE         = 0x0040,
> > > +   DMAENGINE_CYCLIC        = 0x0080,
> > > +   DMAENGINE_INTERLEAVED   = 0x0100,
> > > +   DMAENGINE_SG            = 0x0200,
> > > +};
> > 
> > Actually, one more comment. Why the new enum? Why can't the
> > dma_transaction_type enum be used directly along with dma_cap_mask_t? 
> Some of the capabilities above are not there in dma_caps_t like DMA_SG.
> Also DMA_INTERRUPT and DMA_PRIVATE would not make much sense here.
> 
> BUT it would help keep things simpler if we have one definition which
> includes all.

All good points. I think we can best eliminate this redundant set of 
enums by using dma_cap_mask_t and noting in a comment as follows:

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 0181887..9add5ca 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -391,14 +391,15 @@ enum dmaengine_apis {
  * a) APIs/ops supported
  * b) channel physical capabilities
  *
- * @ops: or'ed api capability
+ * @cap_mask: api/ops capability (DMA_INTERRUPT and DMA_PRIVATE
+ *             are invalid api/ops and will never be set)
  * @seg_nr: maximum number of SG segments supported on a SG/SLAVE
  * channel (0 for no maximum or not a SG/SLAVE channel)
  * @seg_len: maximum length of SG segments supported on a SG/SLAVE
  *  channel (0 for no maximum or not a SG/SLAVE channel)
  */
 struct dmaengine_chan_caps {
-   enum dmaengine_apis ops;
+   dma_cap_mask_t cap_mask;
int seg_nr;
int seg_len;
 };

If that sounds good I'll send an updated series with this change.
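A userspace model of the proposed struct shows how callers would test individual capabilities once ops becomes a dma_cap_mask_t. The enum is a hypothetical subset whose values do not match the kernel's, the helpers only mimic dma_cap_set()/dma_has_cap(), and dropping DMA_INTERRUPT/DMA_PRIVATE when deriving channel caps is one possible way to honour the new comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of dma_transaction_type; values are not the kernel's. */
enum model_tx_type {
	M_DMA_MEMCPY,
	M_DMA_XOR,
	M_DMA_INTERRUPT,
	M_DMA_SG,
	M_DMA_PRIVATE,
	M_DMA_SLAVE,
	M_DMA_CYCLIC,
	M_DMA_TX_TYPE_END
};

/* Stand-in for dma_cap_mask_t: one bit per transaction type. */
typedef struct { unsigned long bits; } model_cap_mask_t;

struct model_chan_caps {
	model_cap_mask_t cap_mask;  /* DMA_INTERRUPT/DMA_PRIVATE never set */
	int seg_nr;                 /* 0: no maximum, or not SG/SLAVE */
	int seg_len;
};

static void model_cap_set(enum model_tx_type t, model_cap_mask_t *m)
{
	m->bits |= 1UL << t;
}

static bool model_has_cap(enum model_tx_type t, const model_cap_mask_t *m)
{
	return (m->bits >> t) & 1UL;
}

/* Derive per-channel caps from a device mask, dropping the non-api bits. */
static struct model_chan_caps derive_chan_caps(model_cap_mask_t dev,
					       int seg_nr, int seg_len)
{
	struct model_chan_caps caps = {
		.cap_mask = dev, .seg_nr = seg_nr, .seg_len = seg_len,
	};

	caps.cap_mask.bits &= ~((1UL << M_DMA_INTERRUPT) | (1UL << M_DMA_PRIVATE));
	return caps;
}
```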

-Matt


Re: [PATCH v7u1 12/31] x86: add get_ramdisk_image/size()

2013-01-09 Thread Yinghai Lu
On Mon, Jan 7, 2013 at 7:56 AM, Borislav Petkov  wrote:
> On Thu, Jan 03, 2013 at 04:48:32PM -0800, Yinghai Lu wrote:
>> There are several places to find ramdisk information early for reserving
>> and relocating.
>>
>> Use functions to make code more readable and consistent.
>>
>> Later will add ext_ramdisk_image/size in those functions to support
>> loading ramdisk above 4g.
>>
>> Signed-off-by: Yinghai Lu 
>> ---
>>  arch/x86/kernel/setup.c |   29 +
>>  1 file changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 1b8a8cc..644a123 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -294,12 +294,25 @@ static void __init reserve_brk(void)
>>
>>  #ifdef CONFIG_BLK_DEV_INITRD
>>
>> +static u64 __init get_ramdisk_image(void)
>> +{
>> + u64 ramdisk_image = boot_params.hdr.ramdisk_image;
>> +
>> + return ramdisk_image;
>
> just do

No, I will insert a line between them.
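For context, the commit message says ext_ramdisk_image/size will later be folded into these helpers to support ramdisks above 4G. The following is a sketch of what the inserted line might do, assuming the extra field carries the high 32 bits of the address; struct model_boot_params is a hypothetical stand-in for boot_params.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two boot_params fields involved. */
struct model_boot_params {
	uint32_t ramdisk_image;      /* hdr.ramdisk_image: low 32 bits */
	uint32_t ext_ramdisk_image;  /* assumed: high 32 bits, for >4G */
};

static uint64_t get_ramdisk_image_model(const struct model_boot_params *bp)
{
	uint64_t ramdisk_image = bp->ramdisk_image;

	/* the extra line that would sit between the two statements */
	ramdisk_image |= (uint64_t)bp->ext_ramdisk_image << 32;
	return ramdisk_image;
}
```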


Re: [PATCH] leds-lm355x: support LED trigger functionality

2013-01-09 Thread Bryan Wu
On Wed, Jan 9, 2013 at 5:42 PM, Kim, Milo  wrote:
>> -Original Message-
>> From: Bryan Wu [mailto:coolo...@gmail.com]
>> Sent: Thursday, January 10, 2013 10:16 AM
>> To: Kim, Milo
>> Cc: Jeong, Daniel; gshark.je...@gmail.com; linux-l...@vger.kernel.org;
>> linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH] leds-lm355x: support LED trigger functionality
>>
>> On Mon, Jan 7, 2013 at 10:35 PM, Kim, Milo  wrote:
>> >  LM355x family devices provide flash, torch and indicator functions.
>> >  This patch supports the LED trigger feature.
>> >  Using the LED trigger APIs, other drivers can simply turn the flash,
>> >  torch and indicator on/off.
>> >
>> >  Platform data
>> >   the name of LED trigger is configurable.
>> >
>> >  Documentation
>> >   example and detailed description added.
>> >
>> > Signed-off-by: Milo(Woogyom) Kim 
>> > ---
>> >  Documentation/leds/leds-lm3556.txt|   62
>> +
>> >  drivers/leds/leds-lm355x.c|3 ++
>> >  include/linux/platform_data/leds-lm355x.h |8 
>> >  3 files changed, 73 insertions(+)
>> >
>> > diff --git a/Documentation/leds/leds-lm3556.txt b/Documentation/leds/leds-lm3556.txt
>> > index d9eb91b..73244cd 100644
>> > --- a/Documentation/leds/leds-lm3556.txt
>> > +++ b/Documentation/leds/leds-lm3556.txt
>> > @@ -83,3 +83,65 @@ and register it in the platform init function
>> >  Example:
>> > board_register_i2c_bus(4, 400,
>> > board_i2c_ch4, ARRAY_SIZE(board_i2c_ch4));
>> > +
>> > +Support LED Triggers
>> > +
>> > +Flash, torch and indicator can be controlled not only from user space
>> > +but also by other drivers in kernel space.
>> > +For example, the flash is turned on internally by a camera driver.
>> > +To support this functionality, an LED trigger is registered.
>> > +The name of the LED trigger is configurable in the platform data.
>> > +
>> > +Example: LED trigger name for flash
>> > +#include 
>> > +
>> > +struct lm355x_trigger_name lm3556_trigger_name = {
>> > +   .flash = "flash",
>> > +};
>> > +
>> > +struct lm355x_platform_data lm3556_pdata = {
>> > +   ...
>> > +   .trigger = &lm3556_trigger_name,
>> > +};
>> > +
>> > +Example: Flash control in simple camera driver
>>
>> Hi Milo,
>>
>> Can we share this camera flash trigger with other users? just
>> introduce a new trigger in drivers/leds/ledtrig-flash.c,
>> drivers/leds/ledtrig-torch.c, drivers/leds/ledtrig-indicator.c. So it
>> can be shared by plenty of users.
>>
>> And this trigger can be independent with LM355x and use other LED
>> hardware driver underneath.
>>
>> So I expect such trigger driver instead of a specific trigger in LED
>> hardware driver and in other subsystem.
>
> Good idea, Bryan. Thanks!
> Then the name field added to the lm355x platform data can go away, and the
> driver can use a fixed name instead.
> And how about gathering the ledtrig-* drivers in a new subdirectory under
> drivers/leds (e.g. drivers/leds/trigger/)?
>

Nice, feel free to submit patches.
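Bryan's suggestion separates the trigger from the LED hardware: one generic "flash" trigger, any number of LEDs attached. The in-kernel pieces for this already exist (led_trigger_register_simple() and led_trigger_event()); the userspace model below only illustrates the fan-out, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_LEDS 4

/* Hypothetical model of a generic trigger shared by several LED drivers. */
struct model_led {
	const char *name;
	int brightness;
};

struct model_trigger {
	const char *name;   /* fixed, e.g. "flash", not per-board platform data */
	struct model_led *leds[MODEL_MAX_LEDS];
	size_t nr_leds;
};

/* Any LED hardware driver can attach; the trigger knows nothing about it. */
static int model_trigger_attach(struct model_trigger *t, struct model_led *led)
{
	if (t->nr_leds == MODEL_MAX_LEDS)
		return -1;
	t->leds[t->nr_leds++] = led;
	return 0;
}

/* Like led_trigger_event(): every attached LED follows the event. */
static void model_trigger_event(struct model_trigger *t, int brightness)
{
	for (size_t i = 0; i < t->nr_leds; i++)
		t->leds[i]->brightness = brightness;
}
```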

-Bryan


RE: [PATCH] ARM: DTS: exynos4210-origen: Add MAX8997 node entry

2013-01-09 Thread Kukjin Kim
Thomas Abraham wrote:
> 
> On 4 January 2013 00:23, Tushar Behera  wrote:
> > The node entry is added as per documentation and legacy board file.
> >
> > Signed-off-by: Tushar Behera 
> > ---
> > The patch is rebased on v3.8-rc1.
> >
> >  arch/arm/boot/dts/exynos4210-origen.dts |  150
> +++
> >  1 files changed, 150 insertions(+), 0 deletions(-)
> 
> This patch has already been posted. Here is the link:
> https://patchwork.kernel.org/patch/1794501/
> 
OK, I will apply.

Thanks.

- Kukjin



RE: [PATCH] ARM: DTS: exynos4210-origen: Add MAX8997 node entry

2013-01-09 Thread Kukjin Kim
Tushar Behera wrote:
> 
> On 01/04/2013 10:51 PM, Thomas Abraham wrote:
> > On 4 January 2013 00:23, Tushar Behera  wrote:
> >> The node entry is added as per documentation and legacy board file.
> >>
> >> Signed-off-by: Tushar Behera 
> >> ---
> >> The patch is rebased on v3.8-rc1.
> >>
> >>  arch/arm/boot/dts/exynos4210-origen.dts |  150
> +++
> >>  1 files changed, 150 insertions(+), 0 deletions(-)
> 
> Please ignore this patch.
> 
OK.

> >
> > This patch has already been posted. Here is the link:
> > https://patchwork.kernel.org/patch/1794501/
> >
> 
> Sorry, had missed that patch. I have added a 'Tested-by' to the original
> patch.
> 
OK, I will add 'Tested-by' when I apply that.

Thanks.

- Kukjin



RE: [PATCH] leds-lm355x: support LED trigger functionality

2013-01-09 Thread Kim, Milo
> -Original Message-
> From: Bryan Wu [mailto:coolo...@gmail.com]
> Sent: Thursday, January 10, 2013 10:16 AM
> To: Kim, Milo
> Cc: Jeong, Daniel; gshark.je...@gmail.com; linux-l...@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] leds-lm355x: support LED trigger functionality
> 
> On Mon, Jan 7, 2013 at 10:35 PM, Kim, Milo  wrote:
> >  LM355x family devices provide flash, torch and indicator functions.
> >  This patch supports the LED trigger feature.
> >  Using the LED trigger APIs, other drivers can simply turn the flash,
> >  torch and indicator on/off.
> >
> >  Platform data
> >   the name of LED trigger is configurable.
> >
> >  Documentation
> >   example and detailed description added.
> >
> > Signed-off-by: Milo(Woogyom) Kim 
> > ---
> >  Documentation/leds/leds-lm3556.txt|   62
> +
> >  drivers/leds/leds-lm355x.c|3 ++
> >  include/linux/platform_data/leds-lm355x.h |8 
> >  3 files changed, 73 insertions(+)
> >
> > diff --git a/Documentation/leds/leds-lm3556.txt b/Documentation/leds/leds-lm3556.txt
> > index d9eb91b..73244cd 100644
> > --- a/Documentation/leds/leds-lm3556.txt
> > +++ b/Documentation/leds/leds-lm3556.txt
> > @@ -83,3 +83,65 @@ and register it in the platform init function
> >  Example:
> > board_register_i2c_bus(4, 400,
> > board_i2c_ch4, ARRAY_SIZE(board_i2c_ch4));
> > +
> > +Support LED Triggers
> > +
> > +Flash, torch and indicator can be controlled not only from user space
> > +but also by other drivers in kernel space.
> > +For example, the flash is turned on internally by a camera driver.
> > +To support this functionality, an LED trigger is registered.
> > +The name of the LED trigger is configurable in the platform data.
> > +
> > +Example: LED trigger name for flash
> > +#include 
> > +
> > +struct lm355x_trigger_name lm3556_trigger_name = {
> > +   .flash = "flash",
> > +};
> > +
> > +struct lm355x_platform_data lm3556_pdata = {
> > +   ...
> > +   .trigger = &lm3556_trigger_name,
> > +};
> > +
> > +Example: Flash control in simple camera driver
> 
> Hi Milo,
> 
> Can we share this camera flash trigger with other users? just
> introduce a new trigger in drivers/leds/ledtrig-flash.c,
> drivers/leds/ledtrig-torch.c, drivers/leds/ledtrig-indicator.c. So it
> can be shared by plenty of users.
> 
> And this trigger can be independent with LM355x and use other LED
> hardware driver underneath.
> 
> So I expect such trigger driver instead of a specific trigger in LED
> hardware driver and in other subsystem.

Good idea, Bryan. Thanks!
Then the name field added to the lm355x platform data can go away, and the
driver can use a fixed name instead.
And how about gathering the ledtrig-* drivers in a new subdirectory under
drivers/leds (e.g. drivers/leds/trigger/)?

Best Regards,
Milo



RE: [PATCH] ARM: dts: exynos5250: Set HDMI version to v1.4

2013-01-09 Thread Kukjin Kim
Sean Paul wrote:
> 
> On Mon, Jan 7, 2013 at 4:00 PM, Sean Paul  wrote:
> > Set the HDMI version to v1.4 for all exynos5250 targets. This property
> > was previously being inferred by the existence of the hdmi node.
> >
> > Signed-off-by: Sean Paul 
> > ---
> 
> The patch adding the property has churned a bit from
> devicetree-discuss reviews. I'll upload this again after it's settled.
> 
> Please drop.
> 
OK, thanks for your information.

- Kukjin



RE: [PATCH 1/3] pinctrl: exynos: add exynos5250 SoC specific data

2013-01-09 Thread Kukjin Kim
Linus Walleij wrote:
> 
> On Thu, Dec 27, 2012 at 5:58 PM, Kukjin Kim 
> wrote:
> 
> > And I think it would be clearer if the config were changed like the
> > following.
> >
> > 8<--
> > From: Kukjin Kim 
> > Subject: [PATCH] pinctrl: exynos: change PINCTRL_EXYNOS option
> >
> > Since pinctrl-exynos supports both exynos4 and exynos5, change
> > the option name to PINCTRL_EXYNOS for clarity.
> >
> > Cc: Thomas Abraham 
> > Cc: Linus Walleij 
> > Cc: Grant Likely 
> > Signed-off-by: Kukjin Kim 
> 
> Acked-by: Linus Walleij 
> 
> Shall I take this into the pinctrl tree?
> 
Yes, please :-)

Just a note: the Samsung pinctrl changes are in the following branch:
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung.git
next/pinctrl-exynos

Thanks.

- Kukjin



Re: [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page

2013-01-09 Thread Yinghai Lu
On Sat, Jan 5, 2013 at 5:24 AM, Borislav Petkov  wrote:
> On Fri, Jan 04, 2013 at 02:04:05PM -0800, Yinghai Lu wrote:
>> On Fri, Jan 4, 2013 at 1:01 PM, Borislav Petkov  wrote:
>> > On Thu, Jan 03, 2013 at 04:48:40PM -0800, Yinghai Lu wrote:
>> >>  static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>> >>  {
>> >> + struct x86_mapping_info info = {
>> >> + .alloc_pgt_page = alloc_pgt_page,
>> >> + .context= image,
>> >> + .pmd_flag   = __PAGE_KERNEL_LARGE_EXEC,
>> >> + };
>> >
>> > This is leaving ->kernel_mapping uninitialized to contain a random,
>> > previous stack value. I don't think we want that.
>>
>> that should be initialized to false by default.
>
> So make it explicit. You can't possibly rely on what the stack contains
> when you allocate that struct there.

I should have said:

that *is* initialized to false by default.

Please check

http://stackoverflow.com/questions/10828294/c-and-c-partial-initialization-of-automatic-structure
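The rule Yinghai points to is standard C: when an automatic struct has an initializer, every member not named in it is initialized the same way as an object with static storage duration, i.e. to zero, false or NULL (C99 section 6.7.8). The guarantee only applies when an initializer is present; a declaration with no initializer at all leaves the members indeterminate. A minimal user-space sketch, assuming a hypothetical, simplified struct standing in for x86_mapping_info (field names follow the patch, types are simplified):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct x86_mapping_info:
 * field names follow the patch, types are simplified. */
struct mapping_info {
	void *(*alloc_pgt_page)(void *context);
	void *context;
	unsigned long pmd_flag;
	bool kernel_mapping;
};

/* Placeholder allocator; it only exists so the pointer can be set. */
static void *dummy_alloc(void *context)
{
	return context;
}

/* Builds the struct on the stack the same way init_pgtable() does:
 * only three members are named in the initializer. */
static struct mapping_info make_info(void *image)
{
	struct mapping_info info = {
		.alloc_pgt_page = dummy_alloc,
		.context        = image,
		.pmd_flag       = 0x1e3UL, /* arbitrary placeholder value */
	};

	/* kernel_mapping was not named, so it is implicitly false. */
	return info;
}
```

Making `.kernel_mapping = false` explicit, as Borislav asks, changes nothing in the generated value; it only documents the intent for readers.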


Re: [PATCH] leds-lm355x: support LED trigger functionality

2013-01-09 Thread Bryan Wu
On Mon, Jan 7, 2013 at 10:35 PM, Kim, Milo  wrote:
>  LM355x family devices provide flash, torch and indicator functions.
>  This patch supports the LED trigger feature.
>  Using the LED trigger APIs, other drivers can simply turn the flash, torch
>  and indicator on and off.
>
>  Platform data
>   the name of the LED trigger is configurable.
>
>  Documentation
>   an example and a detailed description are added.
>
> Signed-off-by: Milo(Woogyom) Kim 
> ---
>  Documentation/leds/leds-lm3556.txt|   62 +
>  drivers/leds/leds-lm355x.c|3 ++
>  include/linux/platform_data/leds-lm355x.h |8 
>  3 files changed, 73 insertions(+)
>
> diff --git a/Documentation/leds/leds-lm3556.txt b/Documentation/leds/leds-lm3556.txt
> index d9eb91b..73244cd 100644
> --- a/Documentation/leds/leds-lm3556.txt
> +++ b/Documentation/leds/leds-lm3556.txt
> @@ -83,3 +83,65 @@ and register it in the platform init function
>  Example:
> board_register_i2c_bus(4, 400,
> board_i2c_ch4, ARRAY_SIZE(board_i2c_ch4));
> +
> +Support LED Triggers
> +
> +Flash, torch and indicator can be controlled not only from user space but
> +also by other drivers in kernel space.
> +For example, the flash can be turned on internally by a camera driver.
> +To support this functionality, an LED trigger is registered.
> +The name of the LED trigger is configurable in the platform data.
> +
> +Example: LED trigger name for flash
> +#include 
> +
> +struct lm355x_trigger_name lm3556_trigger_name = {
> +   .flash = "flash",
> +};
> +
> +struct lm355x_platform_data lm3556_pdata = {
> +   ...
> +   .trigger = &lm3556_trigger_name,
> +};
> +
> +Example: Flash control in simple camera driver

Hi Milo,

Can we share this camera flash trigger with other users? Just
introduce new triggers in drivers/leds/ledtrig-flash.c,
drivers/leds/ledtrig-torch.c and drivers/leds/ledtrig-indicator.c, so
they can be shared by many users.

And these triggers can be independent of LM355x and use any other LED
hardware driver underneath.

So I would prefer such trigger drivers instead of a specific trigger in
the LED hardware driver and in other subsystems.

-Bryan

> +#include 
> +
> +#ifdef CONFIG_LEDS_TRIGGERS
> +DEFINE_LED_TRIGGER(flash_led_trigger);
> +#endif
> +
> +static int foo_camera_init()
> +{
> +   ...
> +
> +#ifdef CONFIG_LEDS_TRIGGERS
> +   /* should be same name as in lm355x_platform_data */
> +   led_trigger_register_simple("flash", &flash_led_trigger);
> +#endif
> +
> +   ...
> +}
> +
> +static void foo_camera_exit()
> +{
> +   ...
> +
> +#ifdef CONFIG_LEDS_TRIGGERS
> +   led_trigger_unregister_simple(flash_led_trigger);
> +#endif
> +
> +   ...
> +}
> +
> +#ifdef CONFIG_LEDS_TRIGGERS
> +static void foo_camera_flash_ctrl(bool on)
> +{
> +   if (on)
> +   led_trigger_event(flash_led_trigger, LED_FULL);
> +   else
> +   led_trigger_event(flash_led_trigger, LED_OFF);
> +}
> +#else
> +#define foo_camera_flash_ctrl  NULL
> +#endif
> diff --git a/drivers/leds/leds-lm355x.c b/drivers/leds/leds-lm355x.c
> index 65d7928..29df4c0 100644
> --- a/drivers/leds/leds-lm355x.c
> +++ b/drivers/leds/leds-lm355x.c
> @@ -477,6 +477,7 @@ static int lm355x_probe(struct i2c_client *client,
> chip->cdev_flash.name = "flash";
> chip->cdev_flash.max_brightness = 16;
> chip->cdev_flash.brightness_set = lm355x_strobe_brightness_set;
> +   chip->cdev_flash.default_trigger = pdata->trigger->flash;
> err = led_classdev_register((struct device *)
> &client->dev, &chip->cdev_flash);
> if (err < 0)
> @@ -486,6 +487,7 @@ static int lm355x_probe(struct i2c_client *client,
> chip->cdev_torch.name = "torch";
> chip->cdev_torch.max_brightness = 8;
> chip->cdev_torch.brightness_set = lm355x_torch_brightness_set;
> +   chip->cdev_torch.default_trigger = pdata->trigger->torch;
> err = led_classdev_register((struct device *)
> &client->dev, &chip->cdev_torch);
> if (err < 0)
> @@ -499,6 +501,7 @@ static int lm355x_probe(struct i2c_client *client,
> else
> chip->cdev_indicator.max_brightness = 8;
> chip->cdev_indicator.brightness_set = lm355x_indicator_brightness_set;
> +   chip->cdev_indicator.default_trigger = pdata->trigger->indicator;
> err = led_classdev_register((struct device *)
> &client->dev, &chip->cdev_indicator);
> if (err < 0)
> diff --git a/include/linux/platform_data/leds-lm355x.h b/include/linux/platform_data/leds-lm355x.h
> index b88724b..b64d312 100644
> --- a/include/linux/platform_data/leds-lm355x.h
> +++ b/include/linux/platform_data/leds-lm355x.h
> @@ -42,6 +42,12 @@ enum lm355x_pmode {
> LM355x_PMODE_ENABLE = 0x04,
>  };
>
> +struct lm355x_trigger_name {
> +   const char *flash;
> +   const char *torc

RE: sd8688 firmware location

2013-01-09 Thread Bing Zhao
Hi Dan,

> > btmrvl_sdio and libertas_sdio both use firmware files sd8688.bin and
> > sd8688_helper.bin. In linux-firmware, they're present in the libertas/ tree,
> > and (since 3d32a58b) libertas_sdio prefers loading them from there, while it
> > is able to fall back to loading them from the linux-firmware root. btmrvl_sdio,
> > on the other hand, only looks in the root and fails to find them.
> >
> > Obviously, there are two solutions to the problem: either teach btmrvl_sdio
> > to look in libertas/, or move the files within the linux-firmware tree. I
> > don't really have a strong preference, though it probably makes less sense
> > to keep them in libertas/, since the Bluetooth hardware is not really
> > marketed as "Libertas."
> >
> > I'm following up with patches to linux and linux-firmware and I'd be very
> > thankful if you could pick one (not both of them).
> 
> So the BT part and the wifi part have different SDIO IDs; are they
> actually connected separately to the SDIO bus?  Or is the chip only in
> one mode at one time or something like that?  Is there a problem with
> having both libertas and btmrvl loaded at the same time since they're
> essentially the same chip?

SD8688 is a combo chip with different device IDs for WLAN (function 1) and BT 
(function 2). WLAN + BT together are connected to the bus with a single SDIO 
interface. Both libertas and btmrvl drivers can be loaded at the same time, and 
WLAN & BT functions work concurrently. Of course the SDIO bus is shared between 
WLAN and BT.

The firmware image can be downloaded from host to device by either the libertas
or the btmrvl driver, whichever of the two gets to download it first.

Regards,
Bing

> 
> I don't really mind moving stuff to mrvl/ out of libertas/ for these
> devices, but I do want some backwards compat code in libertas for that.
> Unless, of course, Marcel was talking about symlinks in the
> linux-firmware git tree, which would be fine with me.  The important
> point is that simply updating your linux-firmware package or install or
> whatever *should not* result in a failed firmware load.
> 
> Dan
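Dan's requirement, that updating linux-firmware must not cause a failed firmware load, amounts to trying several locations in order. A user-space sketch of that lookup order, assuming hypothetical candidate paths and a hypothetical `exists` callback; in the kernel the equivalent check would be whether `request_firmware()` succeeds for a given name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: does a firmware file with this name exist? */
typedef int (*fw_exists_fn)(const char *name);

/* Return the first candidate name that exists, or NULL if none does.
 * Candidates are tried in order, newest location first. */
static const char *pick_firmware(const char *const *candidates, size_t n,
				 fw_exists_fn exists)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (exists(candidates[i]))
			return candidates[i];
	return NULL;
}

/* Example predicate modeling a firmware tree that only ships the
 * file in its root directory. */
static int only_root_copy(const char *name)
{
	return strcmp(name, "sd8688.bin") == 0;
}

/* Hypothetical search order: new directory, old directory, then root. */
static const char *const sd8688_candidates[] = {
	"mrvl/sd8688.bin",
	"libertas/sd8688.bin",
	"sd8688.bin",
};
```

Listing the newest location first and the root last means any of the three layouts resolves to some file, which is the backwards-compatibility property Dan asks for.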


Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)

2013-01-09 Thread Dave Airlie
On Thu, Jan 10, 2013 at 11:07 AM, Chris Wilson  wrote:
> On Wed, 9 Jan 2013 16:40:25 -0800, Greg KH  wrote:
>> On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote:
>> > On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
>> > >
>> > > I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
>> > >
>> > > 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
>> > > Processor Integrated Graphics Controller [8086:0046] (rev 02)
>> > >
>> > > Thinkpad T410
>> > >
>> > > Shaggy
>> >
>> > Daniel's patch:
>> >
>> > drm/i915: Revert shrinker changes from "Track unbound pages"
>> >
>> > fixes the problem for me.
>>
>> After an afternoon of multiple kernel builds and other stressful things,
>> it looks like it fixes it for me as well.  Chris, this will be going to
>> Linus soon, right?
>
> Daniel will send it on. I hope before he does so, he will clarify the
> changelog to note that it is just papering over the issue. If the
> conjecture is right, it will not prevent that path from triggering the
> hang, nor does it prevent other eviction paths from potentially causing
> the same issue.

In this case, since the issue was papered over in all kernels up until
3.7, I think re-papering is the answer for now. I have a novel idea:
maybe someone could spend some time working out what is broken, in
private on a test box, instead of making everyone who runs 3.7 and 3.8
on ILK deal with it. Of course I know this won't happen, and I'll be
reverting patches from you guys that cause Ironlake flakiness
forever.

Dave.


[PATCH] drivers/cpufreq: Warn user when powernow-k8 tries to fall back to acpi-cpufreq and it is unavailable.

2013-01-09 Thread Aravind Gopalakrishnan
This patch is in reference to bug #51741
(https://bugzilla.kernel.org/show_bug.cgi?id=51741).
powernow-k8 falls back to acpi-cpufreq if the CPU is not supported.
However, it states that acpi-cpufreq has taken over even if acpi-cpufreq
is not compiled in. This patch rewords the warning message to clarify
that the CPU is unsupported, and prints an additional warning when
acpi-cpufreq is not available.

Signed-off-by: Aravind Gopalakrishnan 
---
 drivers/cpufreq/powernow-k8.c |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c
index 056faf6..6fa58b4 100644
--- a/drivers/cpufreq/powernow-k8.c
+++ b/drivers/cpufreq/powernow-k8.c
@@ -1256,7 +1256,15 @@ static int __cpuinit powernowk8_init(void)
int rv;
 
if (static_cpu_has(X86_FEATURE_HW_PSTATE)) {
-   pr_warn(PFX "this CPU is not supported anymore, using acpi-cpufreq instead.\n");
+   pr_warn(PFX
+   "this CPU is not supported anymore, use acpi-cpufreq instead. "
+   "Look for a message from acpi-cpufreq to ensure it is loaded"
+   ".\n");
+#if !IS_ENABLED(CONFIG_X86_ACPI_CPUFREQ)
+   pr_warn(PFX "acpi-cpufreq is disabled. "
+   "Enable it in the config options to get frequency scaling.\n");
+   return -ENODEV;
+#endif
request_module("acpi-cpufreq");
return -ENODEV;
}
-- 
1.7.10.4

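Two C details are easy to trip over in a warning like this. Adjacent string literals are concatenated with no separator, so each fragment needs its own trailing space or punctuation. And for a tristate Kconfig option built as a module (=m), only `CONFIG_<NAME>_MODULE` is defined, so `#ifndef CONFIG_<NAME>` is also true in the module case. The user-space sketch below demonstrates both; `CONFIG_DEMO_CPUFREQ` is a hypothetical option, and the `IS_ENABLED()` here is a simplified re-creation of the placeholder trick in the kernel's include/linux/kconfig.h, not the kernel header itself:

```c
#include <assert.h>
#include <string.h>

/* Simulate Kconfig output for a hypothetical tristate option set to =m:
 * only the _MODULE variant is defined. */
#define CONFIG_DEMO_CPUFREQ_MODULE 1

/* Simplified placeholder trick: if the option expands to 1,
 * ARG_PLACEHOLDER_1 becomes "0," and shifts the argument list so that
 * 1 is selected; otherwise 0 is selected. */
#define ARG_PLACEHOLDER_1 0,
#define config_enabled(cfg) config_enabled_(cfg)
#define config_enabled_(value) config_enabled__(ARG_PLACEHOLDER_##value)
#define config_enabled__(arg1_or_junk) config_pick(arg1_or_junk 1, 0, 0)
#define config_pick(ignored, val, ...) val
#define IS_ENABLED(option) \
	(config_enabled(option) || config_enabled(option##_MODULE))

/* Adjacent literals are joined verbatim: no space is inserted. */
static const char joined_bad[] = "use acpi-cpufreq instead" "Look for a message";
static const char joined_good[] = "use acpi-cpufreq instead. " "Look for a message";

/* Returns 1 if a plain #ifndef would report the option as disabled. */
static int ifndef_says_disabled(void)
{
#ifndef CONFIG_DEMO_CPUFREQ
	return 1; /* taken even though the option is built as a module */
#else
	return 0;
#endif
}

/* Returns 1 if IS_ENABLED() reports the option as disabled. */
static int is_enabled_says_disabled(void)
{
	return !IS_ENABLED(CONFIG_DEMO_CPUFREQ);
}
```

With the option built as a module, the `#ifndef` branch is taken while `IS_ENABLED()` correctly reports the option as available, which is why a check guarding `request_module()` would normally use the latter.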



Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)

2013-01-09 Thread Chris Wilson
On Wed, 9 Jan 2013 16:40:25 -0800, Greg KH  wrote:
> On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote:
> > On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
> > > 
> > > I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
> > > 
> > > 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
> > > Processor Integrated Graphics Controller [8086:0046] (rev 02)
> > > 
> > > Thinkpad T410
> > > 
> > > Shaggy
> > 
> > Daniel's patch:
> > 
> > drm/i915: Revert shrinker changes from "Track unbound pages"
> > 
> > fixes the problem for me.
> 
> After an afternoon of multiple kernel builds and other stressful things,
> it looks like it fixes it for me as well.  Chris, this will be going to
> Linus soon, right?

Daniel will send it on. I hope before he does so, he will clarify the
changelog to note that it is just papering over the issue. If the
conjecture is right, it will not prevent that path from triggering the
hang, nor does it prevent other eviction paths from potentially causing
the same issue.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[PATCH 6/6][v3] perf: Document the ABI of perf sysfs entries

2013-01-09 Thread Sukadev Bhattiprolu
[PATCH 6/6][v3] perf: Document the ABI of perf sysfs entries

This patchset adds two new sets of files to sysfs:

- generic and POWER-specific perf events in /sys/devices/cpu/events/
- perf event config format in /sys/devices/cpu/format/event

Document the format of these files, which become part of the ABI.

Changelog[v3]:
[Greg KH] Include ABI documentation.

Signed-off-by: Sukadev Bhattiprolu 
---
 Documentation/ABI/stable/sysfs-devices-cpu-events |   54 +
 Documentation/ABI/stable/sysfs-devices-cpu-format |   27 ++
 2 files changed, 81 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-devices-cpu-format

diff --git a/Documentation/ABI/stable/sysfs-devices-cpu-events b/Documentation/ABI/stable/sysfs-devices-cpu-events
index e69de29..f37d542 100644
--- a/Documentation/ABI/stable/sysfs-devices-cpu-events
+++ b/Documentation/ABI/stable/sysfs-devices-cpu-events
@@ -0,0 +1,54 @@
+What:  /sys/devices/cpu/events/
+   /sys/devices/cpu/events/branch-misses
+   /sys/devices/cpu/events/cache-references
+   /sys/devices/cpu/events/cache-misses
+   /sys/devices/cpu/events/stalled-cycles-frontend
+   /sys/devices/cpu/events/branch-instructions
+   /sys/devices/cpu/events/stalled-cycles-backend
+   /sys/devices/cpu/events/instructions
+   /sys/devices/cpu/events/cpu-cycles
+
+Date:  2013/01/08
+
+Contact:   Linux kernel mailing list 
+
+Description:   Generic performance monitoring events
+
+   A collection of performance monitoring events that may be
+   supported by many/most CPUs. These events can be monitored
+   using the 'perf(1)' tool.
+
+   The contents of each file would look like:
+
+   event=0x
+
+   where 'N' is a hex digit.
+
+
+What:  /sys/devices/cpu/events/PM_LD_MISS_L1
+   /sys/devices/cpu/events/PM_LD_REF_L1
+   /sys/devices/cpu/events/PM_CYC
+   /sys/devices/cpu/events/PM_BRU_FIN
+   /sys/devices/cpu/events/PM_GCT_NOSLOT_CYC
+   /sys/devices/cpu/events/PM_BRU_MPRED
+   /sys/devices/cpu/events/PM_INST_CMPL
+   /sys/devices/cpu/events/PM_CMPLU_STALL
+
+Date:  2013/01/08
+
+Contact:   Linux kernel mailing list 
+   Linux Powerpc mailing list 
+
+Description:   POWER specific performance monitoring events
+
+   A collection of performance monitoring events that may be
+   supported by the POWER CPU. These events can be monitored
+   using the 'perf(1)' tool.
+
+   These events may not be supported by other CPUs.
+
+   The contents of each file would look like:
+
+   event=0x
+
+   where 'N' is a hex digit.
diff --git a/Documentation/ABI/stable/sysfs-devices-cpu-format b/Documentation/ABI/stable/sysfs-devices-cpu-format
new file mode 100644
index 000..b15cfb2
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-devices-cpu-format
@@ -0,0 +1,27 @@
+What:  /sys/devices/cpu/format/
+   /sys/devices/cpu/format/event
+
+Date:  2013/01/08
+
+Contact:   Linux kernel mailing list 
+
+Description:   Format of performance monitoring events
+
+   Each CPU/architecture may use a different format to represent
+   the perf event.  The 'event' file describes the configuration
+   format of the performance monitoring event on the CPU/system.
+
+   The contents of each file would look like:
+
+   config:m-n
+
+   where m and n are the starting and ending bits that are
+   used to represent the event.
+
+   For example, on POWER,
+
+   $ cat /sys/devices/cpu/format/event
+   config:0-20
+
+   meaning that POWER uses bits 0 through 20 (21 bits) to represent
+   a perf event.
-- 
1.7.1

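The `config:m-n` range is inclusive, so `config:0-20` describes a 21-bit field (bits 0 through 20). A user-space sketch of how a tool might turn such a format string into a mask and place an event code into `perf_event_attr.config`; the helper names are hypothetical, and only the single-range `config:lo-hi` form is handled (single bits, `config1`/`config2` and comma-separated lists are not):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "config:lo-hi" and return a mask with bits lo..hi (inclusive)
 * set, or 0 if the string is not in that form. */
static uint64_t format_mask(const char *fmt, unsigned int *lo_out)
{
	unsigned int lo, hi;

	if (sscanf(fmt, "config:%u-%u", &lo, &hi) != 2 || lo > hi || hi > 63)
		return 0;
	if (lo_out)
		*lo_out = lo;
	if (hi - lo == 63)
		return ~(uint64_t)0;
	return (((uint64_t)1 << (hi - lo + 1)) - 1) << lo;
}

/* Place an event code into *config according to fmt. Returns 0 on
 * success, -1 if the format is unsupported or the code does not fit. */
static int encode_event(const char *fmt, uint64_t code, uint64_t *config)
{
	unsigned int lo = 0;
	uint64_t mask = format_mask(fmt, &lo);

	if (!mask || (code & ~(mask >> lo)))
		return -1;
	*config = (*config & ~mask) | (code << lo);
	return 0;
}
```

For example, the POWER7 code 0x400f0 fits in `config:0-20`, while 0x200000 (bit 21) does not.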


[PATCH 5/6][v3] perf: Create a sysfs entry for Power event format

2013-01-09 Thread Sukadev Bhattiprolu
[PATCH 5/6][v3] perf: Create a sysfs entry for Power event format

Create a sysfs entry, '/sys/bus/event_source/devices/cpu/format/event',
which describes the format of perf events on a POWER CPU.

The event format is the same on all POWER CPUs, at least on Power6 and
Power7, so the bulk of this change is in the code common to all POWER
CPUs.

This code is based on corresponding code in x86.

Changelog[v2]: [Jiri Olsa] Use PMU_FORMAT_ATTR() rather than duplicating it.

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/perf_event_server.h |6 ++
 arch/powerpc/perf/core-book3s.c  |   12 
 arch/powerpc/perf/power7-pmu.c   |1 +
 3 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index b29fcc6..ee63205 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -135,3 +135,9 @@ extern ssize_t power_events_sysfs_show(struct device *dev,
 
 #define POWER_EVENT_ATTR(_name, _id)	EVENT_ATTR(PM_##_name, _id, _p)
 #define POWER_EVENT_PTR(_id)	EVENT_PTR(_id, _p)
+
+/*
+ * Format of a perf event is the same on all POWER cpus. Declare a
+ * common sysfs attribute group that individual POWER cpus can share.
+ */
+extern struct attribute_group power_pmu_format_group;
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index fa476d5..4ae044b 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1315,6 +1315,18 @@ ssize_t power_events_sysfs_show(struct device *dev,
return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
 }
 
+PMU_FORMAT_ATTR(event, "config:0-20");
+
+static struct attribute *power_pmu_format_attr[] = {
+   &format_attr_event.attr,
+   NULL,
+};
+
+struct attribute_group power_pmu_format_group = {
+   .name = "format",
+   .attrs = power_pmu_format_attr,
+};
+
 struct pmu power_pmu = {
.pmu_enable = power_pmu_enable,
.pmu_disable= power_pmu_disable,
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 5627940..5fb3c9b 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -410,6 +410,7 @@ static struct attribute_group power7_pmu_events_group = {
 };
 
 static const struct attribute_group *power7_pmu_attr_groups[] = {
+   &power_pmu_format_group,
&power7_pmu_events_group,
NULL,
 };
-- 
1.7.1



[PATCH 4/6][v3] perf/POWER7: Make some POWER7 events available in sysfs

2013-01-09 Thread Sukadev Bhattiprolu
[PATCH 4/6][v3] perf/POWER7: Make some POWER7 events available in sysfs

Make some POWER7-specific perf events available in sysfs.

$ /bin/ls -1 /sys/bus/event_source/devices/cpu/events/
branch-instructions
branch-misses
cache-misses
cache-references
cpu-cycles
instructions
PM_BRU_FIN
PM_BRU_MPRED
PM_CMPLU_STALL
PM_CYC
PM_GCT_NOSLOT_CYC
PM_INST_CMPL
PM_LD_MISS_L1
PM_LD_REF_L1
stalled-cycles-backend
stalled-cycles-frontend

where the 'PM_*' events are POWER specific and the others are the
generic events.

This will enable users to specify these events with their symbolic
names rather than with their raw code.

perf stat -e 'cpu/PM_CYC/' ...

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/perf_event_server.h |2 ++
 arch/powerpc/perf/power7-pmu.c   |   18 ++
 2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 3f21d89..b29fcc6 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -133,3 +133,5 @@ extern ssize_t power_events_sysfs_show(struct device *dev,
 #define GENERIC_EVENT_ATTR(_name, _id)	EVENT_ATTR(_name, _id, _g)
 #define GENERIC_EVENT_PTR(_id)	EVENT_PTR(_id, _g)
 
+#define POWER_EVENT_ATTR(_name, _id)	EVENT_ATTR(PM_##_name, _id, _p)
+#define POWER_EVENT_PTR(_id)	EVENT_PTR(_id, _p)
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index ae5d757..5627940 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -373,6 +373,15 @@ GENERIC_EVENT_ATTR(cache-misses,   LD_MISS_L1);
 GENERIC_EVENT_ATTR(branch-instructions,BRU_FIN);
 GENERIC_EVENT_ATTR(branch-misses,  BRU_MPRED);
 
+POWER_EVENT_ATTR(CYC,  CYC);
+POWER_EVENT_ATTR(GCT_NOSLOT_CYC,   GCT_NOSLOT_CYC);
+POWER_EVENT_ATTR(CMPLU_STALL,  CMPLU_STALL);
+POWER_EVENT_ATTR(INST_CMPL,INST_CMPL);
+POWER_EVENT_ATTR(LD_REF_L1,LD_REF_L1);
+POWER_EVENT_ATTR(LD_MISS_L1,   LD_MISS_L1);
+POWER_EVENT_ATTR(BRU_FIN,  BRU_FIN);
+POWER_EVENT_ATTR(BRU_MPRED,BRU_MPRED);
+
 static struct attribute *power7_events_attr[] = {
GENERIC_EVENT_PTR(CYC),
GENERIC_EVENT_PTR(GCT_NOSLOT_CYC),
@@ -382,6 +391,15 @@ static struct attribute *power7_events_attr[] = {
GENERIC_EVENT_PTR(LD_MISS_L1),
GENERIC_EVENT_PTR(BRU_FIN),
GENERIC_EVENT_PTR(BRU_MPRED),
+
+   POWER_EVENT_PTR(CYC),
+   POWER_EVENT_PTR(GCT_NOSLOT_CYC),
+   POWER_EVENT_PTR(CMPLU_STALL),
+   POWER_EVENT_PTR(INST_CMPL),
+   POWER_EVENT_PTR(LD_REF_L1),
+   POWER_EVENT_PTR(LD_MISS_L1),
+   POWER_EVENT_PTR(BRU_FIN),
+   POWER_EVENT_PTR(BRU_MPRED),
NULL
 };
 
-- 
1.7.1



[PATCH 3/6][v3] perf/POWER7: Make generic event translations available in sysfs

2013-01-09 Thread Sukadev Bhattiprolu
[PATCH 3/6][v3] perf/POWER7: Make generic event translations available in sysfs

Make the generic perf events in POWER7 available via sysfs.

$ ls /sys/bus/event_source/devices/cpu/events
branch-instructions
branch-misses
cache-misses
cache-references
cpu-cycles
instructions
stalled-cycles-backend
stalled-cycles-frontend

$ cat /sys/bus/event_source/devices/cpu/events/cache-misses
event=0x400f0

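The `event=0x400f0` line comes from `power_events_sysfs_show()`, which formats the event code with `"event=0x%02llx\n"`: `%02llx` pads to at least two hex digits and never truncates longer codes. A user-space sketch of what that format produces and how a tool might parse it back; the helper names are hypothetical, and `snprintf()` stands in for the kernel's `sprintf()` only to keep the example bounds-checked:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same format string as power_events_sysfs_show(). */
static int show_event(char *page, size_t len, unsigned long long id)
{
	return snprintf(page, len, "event=0x%02llx\n", id);
}

/* Parse the value back, as a user-space tool reading the file might.
 * Returns 0 on success, -1 on a malformed line. */
static int parse_event(const char *page, unsigned long long *id)
{
	return sscanf(page, "event=0x%llx", id) == 1 ? 0 : -1;
}
```

A short code such as PM_INST_CMPL (0x2) is therefore shown as `event=0x02`, while 0x400f0 keeps all five digits.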
This patch is based on commits that implement this functionality on x86.
Eg:
commit a47473939db20e3961b200eb00acf5fcf084d755
Author: Jiri Olsa 
Date:   Wed Oct 10 14:53:11 2012 +0200

perf/x86: Make hardware event translations available in sysfs

Changelog:[v3]
[Jiri Olsa] Drop EVENT_ID() macro since it is only used once.

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/perf_event_server.h  |   24 ++
 arch/powerpc/perf/core-book3s.c   |   12 +++
 arch/powerpc/perf/power7-pmu.c|   34 +
 3 files changed, 70 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-devices-cpu-events

diff --git a/Documentation/ABI/stable/sysfs-devices-cpu-events b/Documentation/ABI/stable/sysfs-devices-cpu-events
new file mode 100644
index 000..e69de29
diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 9710be3..3f21d89 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -11,6 +11,7 @@
 
 #include 
 #include 
+#include 
 
 #define MAX_HWEVENTS   8
 #define MAX_EVENT_ALTERNATIVES 8
@@ -35,6 +36,7 @@ struct power_pmu {
void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
int (*limited_pmc_event)(u64 event_id);
u32 flags;
+   const struct attribute_group	**attr_groups;
int n_generic;
int *generic_events;
int (*cache_events)[PERF_COUNT_HW_CACHE_MAX]
@@ -109,3 +111,25 @@ extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
  * If an event_id is not subject to the constraint expressed by a particular
  * field, then it will have 0 in both the mask and value for that field.
  */
+
+extern ssize_t power_events_sysfs_show(struct device *dev,
+   struct device_attribute *attr, char *page);
+
+/*
+ * EVENT_VAR() is the same as PMU_EVENT_VAR with a suffix.
+ *
+ * Having a suffix allows us to have aliases in sysfs - eg: the generic
+ * event 'cpu-cycles' can have two entries in sysfs: 'cpu-cycles' and
+ * 'PM_CYC' where the latter is the name by which the event is known in
+ * POWER CPU specification.
+ */
+#define EVENT_VAR(_id, _suffix)	event_attr_##_id##_suffix
+#define EVENT_PTR(_id, _suffix)	&EVENT_VAR(_id, _suffix)
+
+#define EVENT_ATTR(_name, _id, _suffix)					\
+   PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_PM_##_id,\
+   power_events_sysfs_show)
+
+#define GENERIC_EVENT_ATTR(_name, _id)	EVENT_ATTR(_name, _id, _g)
+#define GENERIC_EVENT_PTR(_id)	EVENT_PTR(_id, _g)
+
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index aa2465e..fa476d5 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1305,6 +1305,16 @@ static int power_pmu_event_idx(struct perf_event *event)
return event->hw.idx;
 }
 
+ssize_t power_events_sysfs_show(struct device *dev,
+   struct device_attribute *attr, char *page)
+{
+   struct perf_pmu_events_attr *pmu_attr;
+
+   pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
+
+   return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
+}
+
 struct pmu power_pmu = {
.pmu_enable = power_pmu_enable,
.pmu_disable= power_pmu_disable,
@@ -1537,6 +1547,8 @@ int __cpuinit register_power_pmu(struct power_pmu *pmu)
pr_info("%s performance monitor hardware support registered\n",
pmu->name);
 
+   power_pmu.attr_groups = ppmu->attr_groups;
+
 #ifdef MSR_HV
/*
 * Use FCHV to ignore kernel events if MSR.HV is set.
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 44e70d2..ae5d757 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -363,6 +363,39 @@ static int power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
},
 };
 
+
+GENERIC_EVENT_ATTR(cpu-cycles, CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-frontend,GCT_NOSLOT_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-backend, CMPLU_STALL);
+GENERIC_EVENT_ATTR(instructions,   INST_CMPL);
+GENERIC_EVENT_ATTR(cache-references,   

[PATCH 2/6][v3] perf: Make EVENT_ATTR global

2013-01-09 Thread Sukadev Bhattiprolu
[PATCH 2/6][v3] perf: Make EVENT_ATTR global

Rename EVENT_ATTR() to PMU_EVENT_ATTR() and make it global so it is
available to all architectures.

Further, to give architectures more flexibility, have PMU_EVENT_ATTR()
take the variable name as a parameter.

Changelog[v3]
- [Jiri Olsa] No need to define PMU_EVENT_PTR()

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/x86/kernel/cpu/perf_event.c |   13 +++--
 include/linux/perf_event.h   |   11 +++
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4428fd1..59a1238 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1316,11 +1316,6 @@ static struct attribute_group x86_pmu_format_group = {
.attrs = NULL,
 };
 
-struct perf_pmu_events_attr {
-   struct device_attribute attr;
-   u64 id;
-};
-
 /*
  * Remove all undefined events (x86_pmu.event_map(id) == 0)
  * out of events_attr attributes.
@@ -1354,11 +1349,9 @@ static ssize_t events_sysfs_show(struct device *dev, struct device_attribute *at
 #define EVENT_VAR(_id)  event_attr_##_id
 #define EVENT_PTR(_id) &event_attr_##_id.attr.attr
 
-#define EVENT_ATTR(_name, _id) \
-static struct perf_pmu_events_attr EVENT_VAR(_id) = {  \
-   .attr = __ATTR(_name, 0444, events_sysfs_show, NULL),   \
-   .id   =  PERF_COUNT_HW_##_id,   \
-};
+#define EVENT_ATTR(_name, _id) \
+   PMU_EVENT_ATTR(_name, EVENT_VAR(_id), PERF_COUNT_HW_##_id,  \
+   events_sysfs_show)
 
 EVENT_ATTR(cpu-cycles, CPU_CYCLES  );
 EVENT_ATTR(instructions,   INSTRUCTIONS);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6bfb2fa..42adf01 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -817,6 +817,17 @@ do {								\
 } while (0)
 
 
+struct perf_pmu_events_attr {
+   struct device_attribute attr;
+   u64 id;
+};
+
+#define PMU_EVENT_ATTR(_name, _var, _id, _show)				\
+static struct perf_pmu_events_attr _var = {\
+   .attr = __ATTR(_name, 0444, _show, NULL),   \
+   .id   =  _id,   \
+};
+
 #define PMU_FORMAT_ATTR(_name, _format)					\
 static ssize_t \
 _name##_show(struct device *dev,   \
-- 
1.7.1

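Passing the variable name into `PMU_EVENT_ATTR()` is what lets the POWER patches register the same event code under two sysfs names (the generic `cpu-cycles` and the POWER-specific `PM_CYC`) without a variable-name clash. A user-space model of that pattern; `struct demo_events_attr` and `DEMO_PMU_EVENT_ATTR()` are simplified, hypothetical stand-ins for the kernel's `struct perf_pmu_events_attr` and `PMU_EVENT_ATTR()`, with the `struct device_attribute` reduced to a name string:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct perf_pmu_events_attr. */
struct demo_events_attr {
	const char *name;
	unsigned long long id;
};

/* Same shape as PMU_EVENT_ATTR(): the caller chooses the variable name,
 * and #_name stringizes the sysfs file name, as __ATTR() does. */
#define DEMO_PMU_EVENT_ATTR(_name, _var, _id) \
	static struct demo_events_attr _var = { .name = #_name, .id = (_id) }

/* Suffixed variable names, as in the POWER patches. */
#define EVENT_VAR(_id, _suffix)	event_attr_##_id##_suffix
#define PME_PM_CYC		0x1e

/* One event code, two sysfs names, two distinct variables. */
DEMO_PMU_EVENT_ATTR(cpu-cycles, EVENT_VAR(CYC, _g), PME_PM_CYC);
DEMO_PMU_EVENT_ATTR(PM_CYC, EVENT_VAR(CYC, _p), PME_PM_CYC);

static const struct demo_events_attr *const demo_events[] = {
	&event_attr_CYC_g,
	&event_attr_CYC_p,
};

/* Look up an event by its sysfs name; returns NULL if absent. */
static const struct demo_events_attr *find_event(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_events) / sizeof(demo_events[0]); i++)
		if (strcmp(demo_events[i]->name, name) == 0)
			return demo_events[i];
	return NULL;
}
```

Without the suffix, both lines would try to define a variable named `event_attr_CYC`, which is why the x86-only single-argument form had to be generalized.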


[PATCH 1/6][v3] perf/Power7: Use macros to identify perf events

2013-01-09 Thread Sukadev Bhattiprolu
[PATCH 1/6][v3] perf/Power7: Use macros to identify perf events

Define and use macros to identify perf event codes. This makes the code
easier to read when these event codes need to be used in more than one
place.

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/perf/power7-pmu.c |   28 
 1 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 441af08..44e70d2 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -51,6 +51,18 @@
 #define MMCR1_PMCSEL_MSK   0xff
 
 /*
+ * Power7 event codes.
+ */
+#define PME_PM_CYC		0x1e
+#define PME_PM_GCT_NOSLOT_CYC	0x100f8
+#define PME_PM_CMPLU_STALL	0x4000a
+#define PME_PM_INST_CMPL	0x2
+#define PME_PM_LD_REF_L1	0xc880
+#define PME_PM_LD_MISS_L1	0x400f0
+#define PME_PM_BRU_FIN		0x10068
+#define PME_PM_BRU_MPRED	0x400f6
+
+/*
  * Layout of constraint bits:
  * 554433221100
  * 3210987654321098765432109876543210987654321098765432109876543210
@@ -296,14 +308,14 @@ static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[])
 }
 
 static int power7_generic_events[] = {
-   [PERF_COUNT_HW_CPU_CYCLES] = 0x1e,
-   [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x100f8, /* GCT_NOSLOT_CYC */
-   [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x4000a,  /* CMPLU_STALL */
-   [PERF_COUNT_HW_INSTRUCTIONS] = 2,
-   [PERF_COUNT_HW_CACHE_REFERENCES] = 0xc880,  /* LD_REF_L1_LSU*/
-   [PERF_COUNT_HW_CACHE_MISSES] = 0x400f0, /* LD_MISS_L1   */
-   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x10068,  /* BRU_FIN  */
-   [PERF_COUNT_HW_BRANCH_MISSES] = 0x400f6,/* BR_MPRED */
+   [PERF_COUNT_HW_CPU_CYCLES] = PME_PM_CYC,
+   [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = PME_PM_GCT_NOSLOT_CYC,
+   [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = PME_PM_CMPLU_STALL,
+   [PERF_COUNT_HW_INSTRUCTIONS] = PME_PM_INST_CMPL,
+   [PERF_COUNT_HW_CACHE_REFERENCES] = PME_PM_LD_REF_L1,
+   [PERF_COUNT_HW_CACHE_MISSES] = PME_PM_LD_MISS_L1,
+   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PME_PM_BRU_FIN,
+   [PERF_COUNT_HW_BRANCH_MISSES] = PME_PM_BRU_MPRED,
 };
 
 #define C(x)   PERF_COUNT_HW_CACHE_##x
-- 
1.7.1
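A minimal standalone sketch of the pattern this patch applies, assuming a userspace build. The event code values are the ones the patch defines; `enum generic_event` is a hypothetical stand-in for the kernel's `PERF_COUNT_HW_*` ids, and `generic_event_of()` is a hypothetical second consumer showing how named codes can be reused without repeating the hex values.

```c
/* Power7 event codes, as defined by the patch. */
#define PME_PM_CYC              0x1e
#define PME_PM_GCT_NOSLOT_CYC   0x100f8
#define PME_PM_CMPLU_STALL      0x4000a
#define PME_PM_INST_CMPL        0x2
#define PME_PM_LD_REF_L1        0xc880
#define PME_PM_LD_MISS_L1       0x400f0
#define PME_PM_BRU_FIN          0x10068
#define PME_PM_BRU_MPRED        0x400f6

/* Hypothetical stand-in for the kernel's PERF_COUNT_HW_* generic event ids. */
enum generic_event {
	HW_CPU_CYCLES,
	HW_STALLED_CYCLES_FRONTEND,
	HW_STALLED_CYCLES_BACKEND,
	HW_INSTRUCTIONS,
	HW_CACHE_REFERENCES,
	HW_CACHE_MISSES,
	HW_BRANCH_INSTRUCTIONS,
	HW_BRANCH_MISSES,
	HW_MAX
};

/* First use: generic event -> raw Power7 code, via designated initializers. */
static const int generic_events[HW_MAX] = {
	[HW_CPU_CYCLES]              = PME_PM_CYC,
	[HW_STALLED_CYCLES_FRONTEND] = PME_PM_GCT_NOSLOT_CYC,
	[HW_STALLED_CYCLES_BACKEND]  = PME_PM_CMPLU_STALL,
	[HW_INSTRUCTIONS]            = PME_PM_INST_CMPL,
	[HW_CACHE_REFERENCES]        = PME_PM_LD_REF_L1,
	[HW_CACHE_MISSES]            = PME_PM_LD_MISS_L1,
	[HW_BRANCH_INSTRUCTIONS]     = PME_PM_BRU_FIN,
	[HW_BRANCH_MISSES]           = PME_PM_BRU_MPRED,
};

/* Second use of the same names: map a raw code back to its generic event, or -1. */
static int generic_event_of(int code)
{
	switch (code) {
	case PME_PM_CYC:            return HW_CPU_CYCLES;
	case PME_PM_GCT_NOSLOT_CYC: return HW_STALLED_CYCLES_FRONTEND;
	case PME_PM_CMPLU_STALL:    return HW_STALLED_CYCLES_BACKEND;
	case PME_PM_INST_CMPL:      return HW_INSTRUCTIONS;
	case PME_PM_LD_REF_L1:      return HW_CACHE_REFERENCES;
	case PME_PM_LD_MISS_L1:     return HW_CACHE_MISSES;
	case PME_PM_BRU_FIN:        return HW_BRANCH_INSTRUCTIONS;
	case PME_PM_BRU_MPRED:      return HW_BRANCH_MISSES;
	default:                    return -1;
	}
}
```

Because both the table and the switch refer to the same macros, changing a code in one place updates every user, and a duplicated code would be caught at compile time as a duplicate `case` label.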



[PATCH] MAINTAINERS: mm: add additional include files to listing

2013-01-09 Thread Cody P Schafer
Add gfp.h, mmzone.h, memory_hotplug.h & vmalloc.h to the "MEMORY
MANAGEMENT" section so scripts/get_maintainer.pl can do a better job of
making recommendations.

Signed-off-by: Cody P Schafer 
---
 MAINTAINERS | 4 
 1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 915564e..e77ef28 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5003,6 +5003,10 @@ L:   linux...@kvack.org
 W: http://www.linux-mm.org
 S: Maintained
 F: include/linux/mm.h
+F: include/linux/gfp.h
+F: include/linux/mmzone.h
+F: include/linux/memory_hotplug.h
+F: include/linux/vmalloc.h
 F: mm/
 
 MEMORY RESOURCE CONTROLLER
-- 
1.8.0.3
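To see why the new F: lines help, here is a simplified, hypothetical model of how a section's F: entries cover a path: an entry ending in '/' covers everything below that directory, and any other entry must match exactly. get_maintainer.pl also handles wildcards and other entry types (such as X: exclusions), which this sketch ignores; `f_entry_covers()` and `section_covers()` are illustrative names only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One F: entry: trailing '/' means "this directory and everything below it". */
static bool f_entry_covers(const char *entry, const char *path)
{
	size_t n = strlen(entry);

	if (n > 0 && entry[n - 1] == '/')
		return strncmp(entry, path, n) == 0;
	return strcmp(entry, path) == 0;
}

/* True if any entry in the NULL-terminated list covers path. */
static bool section_covers(const char *const *entries, const char *path)
{
	for (; *entries; entries++)
		if (f_entry_covers(*entries, path))
			return true;
	return false;
}
```

With only the entries visible in the hunk before the patch (`include/linux/mm.h` and `mm/`), a change to `include/linux/gfp.h` would not be attributed to this section; after the patch it would.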



Re: [PATCH 1/4] module: add syscall to load module from fd

2013-01-09 Thread Michael Kerrisk (man-pages)
On Wed, Jan 9, 2013 at 6:29 PM, Lucas De Marchi
 wrote:
> On Sun, Jan 6, 2013 at 4:59 PM, Michael Kerrisk (man-pages)
>  wrote:
>> Hi Rusty, (and Lucas, and Kees)
>>
>> On Thu, Jan 3, 2013 at 1:12 AM, Rusty Russell  wrote:
>>> Michael Kerrisk  writes:
 Hi Rusty,
>>>
>>> Hi Michael,
>>>
 The description here is rather thin. Could you supply a sentence or
 two for each of MODULE_INIT_IGNORE_MODVERSIONS and
 MODULE_INIT_IGNORE_VERMAGIC that would be suitable for the manual
 page?

 Thanks,
>>>
>>> There are one or two safety checks built into a module, which are
>>> checked to match the kernel on module load.  The first is a "vermagic"
>>> string containing the kernel version number and prominent features (such
>>> as CPU type).  If the module was built with CONFIG_MODVERSIONS set, a
>>> version hash is recorded for each symbol the module uses based on the
>>> types it refers to: in this case, the kernel version number within the
>>> "vermagic" string is ignored, as the symbol version hashes are assumed
>>> to be sufficiently reliable.
>>>
>>> Using the MODULE_INIT_IGNORE_VERMAGIC flag indicates that the vermagic
>>> is to be ignored, and the MODULE_INIT_IGNORE_MODVERSIONS flag indicates
>>> that the version hashes are to be ignored.  If the kernel is built to
>>> permit such forced loading (ie. CONFIG_MODULE_FORCE_LOAD is set) then
>>> loading will continue, otherwise it will fail with ENOEXEC as expected
>>> for malformed modules.
>>>
>>> Hope that is more usable?
>>
>> Yes, that helps. I did some reworking of that text. Hopefully, I did
>> not introduce any errors.
>>
>> Below is the text that is proposed to document finit_module() in the
>> man pages. I'd appreciate any review (Kees, Lucas, Rusty?)
>>
>> Thanks,
>>
>> Michael
>>
>>finit_module()
>>The finit_module() system call is like init_module(), but reads
>>the module to be loaded from the file descriptor fd. It is
>>useful when the authenticity of a kernel module can be determined
>>from its location in the file system; in cases where that
>>is possible, the overhead of using cryptographically signed
>>modules to determine the authenticity of a module can be
>>avoided. The param_values argument is as for init_module().
>>
>>The flags argument modifies the operation of finit_module().
>>It is a bit mask value created by ORing together zero or more
>>of the following flags:
>>
>>MODULE_INIT_IGNORE_MODVERSIONS
>>   Ignore symbol version hashes.
>>
>>MODULE_INIT_IGNORE_VERMAGIC
>>   Ignore kernel version magic.
>>
>>There are some safety checks built into a module to ensure that
>>it matches the kernel against which it is loaded. These checks
>>are recorded when the module is built and verified when the
>>module is loaded. First, the module records a "vermagic"
>>string containing the kernel version number and prominent
>>features (such as the CPU type). Second, if the module was built
>>with the CONFIG_MODVERSIONS configuration option enabled, a
>>version hash is recorded for each symbol the module uses. This
>>hash is based on the types of the arguments and return value
>>for the function named by the symbol. In this case, the kernel
>>version number within the "vermagic" string is ignored, as the
>>symbol version hashes are assumed to be sufficiently reliable.
>>
>>Using the MODULE_INIT_IGNORE_VERMAGIC flag indicates that the
>>"vermagic" string is to be ignored, and the
>>MODULE_INIT_IGNORE_MODVERSIONS flag indicates that the symbol
>>version hashes are to be ignored. If the kernel is built to
>>permit forced loading (i.e., configured with
>>CONFIG_MODULE_FORCE_LOAD), then loading will continue, otherwise
>>it will fail with ENOEXEC as expected for malformed modules.
>> ...
>>ERRORS
>> ...
>>The following errors may additionally occur for finit_module():
>>
>>EBADF  The file referred to by fd is not opened for reading.
>>
>>EFBIG  The file referred to by fd is too large.
>>
>>EINVAL flags is invalid.
>>
>>ENOEXEC
>>   fd does not refer to an open file.
>>
>>
>
>
> Looks good to me.

Thanks for looking it over, Lucas.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
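A minimal sketch of calling finit_module() as documented in the draft above, assuming a Linux userspace program and the raw syscall (glibc exposes `SYS_finit_module` through `<sys/syscall.h>` rather than a wrapper). The flag values 1 and 2 are those of the kernel's uapi `<linux/module.h>`; they are defined locally so the sketch builds without it. `load_module_file()` is a hypothetical helper, and its early flag check mirrors only the two flags documented here (later kernels may accept more). Actually loading a module also requires CAP_SYS_MODULE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MODULE_INIT_IGNORE_MODVERSIONS
#define MODULE_INIT_IGNORE_MODVERSIONS 1
#endif
#ifndef MODULE_INIT_IGNORE_VERMAGIC
#define MODULE_INIT_IGNORE_VERMAGIC 2
#endif

#define KNOWN_FLAGS (MODULE_INIT_IGNORE_MODVERSIONS | MODULE_INIT_IGNORE_VERMAGIC)

/* Open a module file and pass its descriptor to finit_module().
 * Returns 0 on success, -1 with errno set on failure. */
static int load_module_file(const char *path, const char *params, int flags)
{
	int fd, ret, saved;

	/* Reject undefined bits up front (the draft lists EINVAL for invalid flags). */
	if (flags & ~KNOWN_FLAGS) {
		errno = EINVAL;
		return -1;
	}

	/* fd must be open for reading, otherwise the kernel reports EBADF. */
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

#ifdef SYS_finit_module
	ret = (int)syscall(SYS_finit_module, fd, params, flags);
#else
	(void)params;
	errno = ENOSYS;
	ret = -1;
#endif
	saved = errno;	/* close() must not clobber the syscall's errno */
	close(fd);
	errno = saved;
	return ret;
}
```

For example, `load_module_file("foo.ko", "", MODULE_INIT_IGNORE_VERMAGIC)` asks the kernel to skip the vermagic check; as the text explains, that only succeeds on a kernel configured with CONFIG_MODULE_FORCE_LOAD.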


Re: [PATCH RESEND v1 06/16] vfs: add temp calculation function

2013-01-09 Thread David Sterba
On Thu, Dec 20, 2012 at 10:43:25PM +0800, zwu.ker...@gmail.com wrote:
> --- a/fs/hot_tracking.c
> +++ b/fs/hot_tracking.c
> @@ -25,6 +25,14 @@
>  static struct kmem_cache *hot_inode_item_cachep __read_mostly;
>  static struct kmem_cache *hot_range_item_cachep __read_mostly;
>  
> +static u64 hot_raw_shift(u64 counter, u32 bits, bool dir)
> +{
> + if (dir)
> + return counter << bits;
> + else
> + return counter >> bits;
> +}

I don't understand the purpose of this function, it obscures a simple
bitwise shift.

> +
>  /*
>   * Initialize the inode tree. Should be called for each new inode
>   * access or other user of the hot_inode interface.
> @@ -315,6 +323,72 @@ static void hot_freq_data_update(struct hot_freq_data *freq_data, bool write)
>  }
>  
>  /*
> + * hot_temp_calc() is responsible for distilling the six heat
> + * criteria down into a single temperature value for the data,
> + * which is an integer between 0 and HEAT_MAX_VALUE.

I didn't find HEAT_MAX_VALUE defined anywhere.

> + */
> +static u32 hot_temp_calc(struct hot_freq_data *freq_data)
> +{
> + u32 result = 0;
> +
> + struct timespec ckt = current_kernel_time();
> + u64 cur_time = timespec_to_ns(&ckt);
> +
> + u32 nrr_heat = (u32)hot_raw_shift((u64)freq_data->nr_reads,
> + NRR_MULTIPLIER_POWER, true);
> + u32 nrw_heat = (u32)hot_raw_shift((u64)freq_data->nr_writes,
> + NRW_MULTIPLIER_POWER, true);

So many typecasts, some of them unnecessary; combined with
hot_raw_shift, this is hard to read and understand.

u32 nrr_heat = (u32)((u64)freq_data->nr_reads << NRR_MULTIPLIER_POWER);

is not much better without a comment explaining why this is doing the right thing.

> +
> + u64 ltr_heat =
> + hot_raw_shift((cur_time - timespec_to_ns(&freq_data->last_read_time)),
> + LTR_DIVIDER_POWER, false);
> + u64 ltw_heat =
> + hot_raw_shift((cur_time - timespec_to_ns(&freq_data->last_write_time)),
> + LTW_DIVIDER_POWER, false);
> +
> + u64 avr_heat =
> + hot_raw_shift((((u64) -1) - freq_data->avg_delta_reads),
> + AVR_DIVIDER_POWER, false);
> + u64 avw_heat =
> + hot_raw_shift((((u64) -1) - freq_data->avg_delta_writes),
> + AVW_DIVIDER_POWER, false);
> +
> + /* ltr_heat is now guaranteed to be u32 safe */
> + if (ltr_heat >= hot_raw_shift((u64) 1, 32, true))
> + ltr_heat = 0;
> + else
> + ltr_heat = hot_raw_shift((u64) 1, 32, true) - ltr_heat;
> +
> + /* ltw_heat is now guaranteed to be u32 safe */
> + if (ltw_heat >= hot_raw_shift((u64) 1, 32, true))
> + ltw_heat = 0;
> + else
> + ltw_heat = hot_raw_shift((u64) 1, 32, true) - ltw_heat;
> +
> + /* avr_heat is now guaranteed to be u32 safe */
> + if (avr_heat >= hot_raw_shift((u64) 1, 32, true))
> + avr_heat = (u32) -1;
> +
> + /* avw_heat is now guaranteed to be u32 safe */
> + if (avw_heat >= hot_raw_shift((u64) 1, 32, true))
> + avw_heat = (u32) -1;
> +
> + nrr_heat = (u32)hot_raw_shift((u64)nrr_heat,
> + (3 - NRR_COEFF_POWER), false);
> + nrw_heat = (u32)hot_raw_shift((u64)nrw_heat,
> + (3 - NRW_COEFF_POWER), false);
> + ltr_heat = hot_raw_shift(ltr_heat, (3 - LTR_COEFF_POWER), false);
> + ltw_heat = hot_raw_shift(ltw_heat, (3 - LTW_COEFF_POWER), false);
> + avr_heat = hot_raw_shift(avr_heat, (3 - AVR_COEFF_POWER), false);
> + avw_heat = hot_raw_shift(avw_heat, (3 - AVW_COEFF_POWER), false);
> +
> + result = nrr_heat + nrw_heat + (u32) ltr_heat +
> + (u32) ltw_heat + (u32) avr_heat + (u32) avw_heat;

Reading through the function up to here, I got so lost in the shifts
that I don't see the meaning of the resulting value, or how to interpret
it if I watch it change over time. What are the expected weights of the
number and time factors? There are more details in the documentation,
but the big picture is blurred by talk of implementation details.

Let's keep the implementation details here and write better user
documentation, with a few examples, in the docs. Is it possible to
describe some common access patterns and how they affect the
temperature?

You've been benchmarking this patchset, I'm sure you can write up a few
examples based on that.

> +
> + return result;
> +}

david
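A minimal sketch of how the same six-term weighted sum could be written with plain shifts and one commented helper per kind of term, along the lines the review asks for. Everything here is an assumption: the constant values and the flattened `struct freq_sample` are hypothetical (the patch's real NRR_/LTR_/AVR_ constants are not quoted in this thread), each term saturates instead of wrapping, and the sum is accumulated in 64 bits rather than returned as a u32 bounded by HEAT_MAX_VALUE.

```c
#include <stdint.h>

/* Hypothetical tuning constants; not the patch's real values. */
#define NR_MULTIPLIER_POWER 20  /* scales raw read/write counts */
#define LT_DIVIDER_POWER    30  /* time since last access, in ~1 s (2^30 ns) units */
#define AV_DIVIDER_POWER    40  /* average gap between accesses, coarse units */
#define COEFF_SHIFT          2  /* every term weighted by 1/4 (i.e. 3 - *_COEFF_POWER) */

/* Hypothetical flattened version of struct hot_freq_data, times in ns. */
struct freq_sample {
	uint32_t nr_reads, nr_writes;
	uint64_t last_read_ns, last_write_ns;
	uint64_t avg_delta_reads, avg_delta_writes;
};

/* Clamp a 64-bit intermediate to the 32-bit range of one term. */
static uint32_t sat_u32(uint64_t v)
{
	return v > UINT32_MAX ? UINT32_MAX : (uint32_t)v;
}

/* More accesses => hotter; saturates instead of wrapping like a (u32) cast. */
static uint32_t count_term(uint32_t nr)
{
	return sat_u32((uint64_t)nr << NR_MULTIPLIER_POWER);
}

/* Recent access => hotter. UINT32_MAX (not 2^32) is the ceiling,
 * so the term fits in 32 bits even when the age is 0. */
static uint32_t recency_term(uint64_t now_ns, uint64_t last_ns)
{
	uint64_t age = (now_ns - last_ns) >> LT_DIVIDER_POWER;

	return age >= UINT32_MAX ? 0 : (uint32_t)(UINT32_MAX - age);
}

/* Short average gap between accesses => hotter. */
static uint32_t interval_term(uint64_t avg_delta)
{
	return sat_u32((UINT64_MAX - avg_delta) >> AV_DIVIDER_POWER);
}

/* Weighted sum, accumulated in 64 bits so six 32-bit terms cannot wrap. */
static uint64_t temperature(const struct freq_sample *f, uint64_t now_ns)
{
	uint64_t sum = 0;

	sum += count_term(f->nr_reads) >> COEFF_SHIFT;
	sum += count_term(f->nr_writes) >> COEFF_SHIFT;
	sum += recency_term(now_ns, f->last_read_ns) >> COEFF_SHIFT;
	sum += recency_term(now_ns, f->last_write_ns) >> COEFF_SHIFT;
	sum += interval_term(f->avg_delta_reads) >> COEFF_SHIFT;
	sum += interval_term(f->avg_delta_writes) >> COEFF_SHIFT;
	return sum;
}
```

Splitting the terms this way makes each one's range and direction explicit, which is the kind of explanation that could go into the documentation alongside example access patterns.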


Re: [PATCH RESEND v1 05/16] vfs: add hooks to enable hot tracking

2013-01-09 Thread David Sterba
On Thu, Dec 20, 2012 at 10:43:24PM +0800, zwu.ker...@gmail.com wrote:
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include "hot_tracking.h"
>  
>  /*
>   * How many user pages to map in one call to get_user_pages().  This determines
> @@ -1299,6 +1300,11 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
>   prefetch(bdev->bd_queue);
>   prefetch((char *)bdev->bd_queue + SMP_CACHE_BYTES);
>  
> + /* Hot data tracking */
> + hot_update_freqs(inode, offset,
> + iov_length(iov, nr_segs),
> + rw & WRITE);

hot_update_freqs takes an 'int rw' directly, so you should pass plain
'rw' here and do the 'rw & WRITE' check in hot_freq_data_update itself.

> +
>   return do_blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
>nr_segs, get_block, end_io,
>submit_io, flags);
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -35,6 +35,7 @@
>  #include  /* __set_page_dirty_buffers */
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  /*
> @@ -1902,13 +1903,24 @@ EXPORT_SYMBOL(generic_writepages);
>  int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
>  {
>   int ret;
> + loff_t start = 0;
> + size_t count = 0;
>  
>   if (wbc->nr_to_write <= 0)
>   return 0;
> +
> + start = mapping->writeback_index << PAGE_CACHE_SHIFT;
> + count = wbc->nr_to_write;
> +
>   if (mapping->a_ops->writepages)
>   ret = mapping->a_ops->writepages(mapping, wbc);
>   else
>   ret = generic_writepages(mapping, wbc);
> +
> + /* Hot data tracking */
> + hot_update_freqs(mapping->host, start,
> + (count - wbc->nr_to_write) * PAGE_CACHE_SIZE, 1);

I think the frequencies should not be updated in case of error returned
from writepages.

> +
>   return ret;
>  }
>  
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -138,6 +139,12 @@ static int read_pages(struct address_space *mapping, struct file *filp,
>  out:
>   blk_finish_plug(&plug);
>  
> + /* Hot data tracking */
> + hot_update_freqs(mapping->host,
> + (loff_t)(list_entry(pages->prev, struct page, lru)->index)
> + << PAGE_CACHE_SHIFT,
> + (size_t)nr_pages * PAGE_CACHE_SIZE, 0);

same comment here

> +
>   return ret;
>  }


david
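A minimal userspace model of the suggestion that frequencies should not be updated when writepages returns an error. The structure mirrors the do_writepages() hunk (budget taken from nr_to_write, pages written computed as the difference), but `struct wb_ctl`, `struct tracker`, `writepages_tracked()`, the demo callbacks and the 4096-byte page size are all hypothetical stand-ins, not kernel APIs.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* hypothetical; stands in for PAGE_CACHE_SIZE */

/* Hypothetical stand-in for struct writeback_control. */
struct wb_ctl {
	long nr_to_write;	/* page budget; the callee decrements it per page written */
};

/* Hypothetical stand-in for the hot-tracking state. */
struct tracker {
	uint64_t bytes_written;
	unsigned int updates;
};

typedef int (*writepages_fn)(struct wb_ctl *wbc);

static void track_write(struct tracker *t, uint64_t len)
{
	t->bytes_written += len;
	t->updates++;
}

/* Shaped like the hunk above, but records nothing when writepages fails
 * (or when it succeeded without writing any page). */
static int writepages_tracked(struct tracker *t, struct wb_ctl *wbc, writepages_fn fn)
{
	long budget;
	int ret;

	if (wbc->nr_to_write <= 0)
		return 0;

	budget = wbc->nr_to_write;
	ret = fn(wbc);
	if (ret == 0 && wbc->nr_to_write < budget)
		track_write(t, (uint64_t)(budget - wbc->nr_to_write) * SKETCH_PAGE_SIZE);
	return ret;
}

/* Demo callbacks: one succeeds after three pages, one fails after one page. */
static int demo_write_three_pages(struct wb_ctl *wbc)
{
	wbc->nr_to_write -= 3;
	return 0;
}

static int demo_fail_after_one_page(struct wb_ctl *wbc)
{
	wbc->nr_to_write -= 1;
	return -EIO;
}
```

The same guard would apply to the read_pages() hook: only account the range once `ret` indicates success.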


Re: [PATCH RESEND v1 04/16] vfs: add two map arrays

2013-01-09 Thread David Sterba
On Thu, Dec 20, 2012 at 10:43:23PM +0800, zwu.ker...@gmail.com wrote:
> --- a/fs/hot_tracking.c
> +++ b/fs/hot_tracking.c
> +/* Free inode and range map info */
> +static void hot_map_exit(struct hot_info *root)
> +{
> + int i;
> + for (i = 0; i < HEAT_MAP_SIZE; i++) {
> + spin_lock(&root->heat_inode_map[i].lock);
> + hot_map_list_free(&root->heat_inode_map[i].node_list, root);
> + spin_unlock(&root->heat_inode_map[i].lock);

please insert an empty line here to improve readability

> + spin_lock(&root->heat_range_map[i].lock);
> + hot_map_list_free(&root->heat_range_map[i].node_list, root);
> + spin_unlock(&root->heat_range_map[i].lock);
> + }
> +}
> +
> +/*
>   * Initialize kmem cache for hot_inode_item and hot_range_item.
>   */
>  void __init hot_cache_init(void)
> --- a/include/linux/hot_tracking.h
> +++ b/include/linux/hot_tracking.h
> @@ -71,6 +82,12 @@ struct hot_range_item {
>  struct hot_info {
>   struct hot_rb_tree hot_inode_tree;
>   spinlock_t lock; /*protect inode tree */
> +
> + /* map of inode temperature */
> + struct hot_map_head heat_inode_map[HEAT_MAP_SIZE];
> + /* map of range temperature */
> + struct hot_map_head heat_range_map[HEAT_MAP_SIZE];
> + unsigned int hot_map_nr;
>  };

Final layout of struct hot_info is

struct hot_info {
	struct hot_rb_tree         hot_inode_tree;       /*     0     8 */
	spinlock_t                 lock;                 /*     8    72 */
	/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
	struct hot_map_head        heat_inode_map[256];  /*    80 24576 */
	/* --- cacheline 385 boundary (24640 bytes) was 16 bytes ago --- */
	struct hot_map_head        heat_range_map[256];  /* 24656 24576 */
	/* --- cacheline 769 boundary (49216 bytes) was 16 bytes ago --- */
	unsigned int               hot_map_nr;           /* 49232     4 */

	/* XXX 4 bytes hole, try to pack */

	struct workqueue_struct *  update_wq;            /* 49240     8 */
	struct delayed_work        update_work;          /* 49248   216 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 772 boundary (49408 bytes) was 56 bytes ago --- */
	struct hot_type *          hot_type;             /* 49464     8 */
	/* --- cacheline 773 boundary (49472 bytes) --- */
	struct shrinker            hot_shrink;           /* 49472    48 */
	struct dentry *            vol_dentry;           /* 49520     8 */

	/* size: 49528, cachelines: 774, members: 10 */
	/* sum members: 49524, holes: 1, sum holes: 4 */
	/* paddings: 1, sum paddings: 4 */
	/* last cacheline: 56 bytes */
};

that's an order-4 allocation and the heat_*_map[] themselves need order-3.
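The order figures can be checked with a small helper that computes the smallest buddy order whose block covers a given size, in the spirit of the kernel's get_order(). `alloc_order()` is a hypothetical name; 4 KiB pages and the pahole sizes (49528 bytes for struct hot_info, 256 * 96 = 24576 bytes per heat_*_map array) are the assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest order such that (page_size << order) >= size. */
static unsigned int alloc_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	assert(page_size != 0);
	/* The bound on order keeps the shift well-defined for absurd sizes. */
	while (order < sizeof(size_t) * 8 - 1 && (page_size << order) < size)
		order++;
	return order;
}
```

alloc_order(49528, 4096) evaluates to 4 (a 16-page block) and alloc_order(24576, 4096) to 3 (8 pages), matching the figures in the review.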

Also the structure

struct hot_map_head {
	struct list_head           node_list;            /*     0    16 */
	u8                         temp;                 /*    16     1 */

	/* XXX 7 bytes hole, try to pack */

	spinlock_t                 lock;                 /*    24    72 */
	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */

	/* size: 96, cachelines: 2, members: 3 */
	/* sum members: 89, holes: 1, sum holes: 7 */
	/* last cacheline: 32 bytes */
};

is not packed efficiently, and given the number of array items, the wasted
space adds up.

So, this needs to be fixed. Options I see:

1) try to allocate the structure with __GFP_NOWARN and use vmalloc as a fallback
2) allocate heat_*_map arrays dynamically

An array of 256 pointers takes 2048 bytes, so with two of them plus the
other struct members, the overall size will go beyond a 4k page. Also,
doing a kmalloc for each heat_*_map item could spread them over memory,
while hot_info is a long-lived structure and it would make sense to keep
its data in one place. For struct hot_map_head I suggest creating a
slab cache.


david


Re: [PATCH RESEND v1 03/16] vfs: add I/O frequency update function

2013-01-09 Thread David Sterba
On Thu, Dec 20, 2012 at 10:43:22PM +0800, zwu.ker...@gmail.com wrote:
> --- a/fs/hot_tracking.c
> +++ b/fs/hot_tracking.c
> @@ -164,6 +164,135 @@ static void hot_inode_tree_exit(struct hot_info *root)
>   spin_unlock(&root->lock);
>  }
>  
> +struct hot_inode_item
> +*hot_inode_item_lookup(struct hot_info *root, u64 ino)
> +{
> + struct rb_node **p = &root->hot_inode_tree.map.rb_node;
> + struct rb_node *parent = NULL;
> + struct hot_comm_item *ci;
> + struct hot_inode_item *entry;
> +
> + /* walk tree to find insertion point */
> + spin_lock(&root->lock);
> + while (*p) {
> + parent = *p;
> + ci = rb_entry(parent, struct hot_comm_item, rb_node);
> + entry = container_of(ci, struct hot_inode_item, hot_inode);
> + if (ino < entry->i_ino)
> + p = &(*p)->rb_left;
> + else if (ino > entry->i_ino)
> + p = &(*p)->rb_right;

style comment: put { } around all the if/else blocks

> + else {
> + spin_unlock(&root->lock);
> + kref_get(&entry->hot_inode.refs);

Jumping forward in the series, the spin_unlock and kref_get get swapped
later, and I think that's the right order. Otherwise there's a small
window where the entry does not hold the reference yet and could
potentially be freed by a racing kref_put, no?


spin_unlock(tree)
                        spin_lock(tree)

                        kref_put(E) or via hot_inode_item_put(E)  (1)
kref_get(E)                                                       (2)


If the reference count at (1) was 1, the entry is freed and (2) touches
freed memory. hot_inode_item_put can be called from the filesystem or via
the seq print of the respective /proc files, so I think there is a real
chance of hitting the problem.
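A minimal userspace model of the ordering suggested here, assuming C11 atomics: `atomic_flag` stands in for spinlock_t, `atomic_int` for struct kref, and a singly linked list for the rbtree. `struct table`, `lookup_get()`, `insert_get()` and `put_item()` are hypothetical names. The reference is taken before the lock is dropped (and, for a new entry, before it is published), and in this sketch the final put also unlinks the entry under the same lock, so a lookup can never find an entry whose count has already reached zero.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct item {
	unsigned long key;
	atomic_int refs;		/* stands in for struct kref */
	struct item *next;
};

struct table {
	atomic_flag lock;		/* stands in for spinlock_t */
	struct item *head;
};

#define TABLE_INIT { ATOMIC_FLAG_INIT, NULL }

static void table_lock(struct table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin */
}

static void table_unlock(struct table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Lookup: take the reference while the lock still pins the item. */
static struct item *lookup_get(struct table *t, unsigned long key)
{
	struct item *it;

	table_lock(t);
	for (it = t->head; it; it = it->next) {
		if (it->key == key) {
			atomic_fetch_add(&it->refs, 1);	/* kref_get() ... */
			break;
		}
	}
	table_unlock(t);			/* ... before spin_unlock() */
	return it;
}

/* Insert: the caller's reference exists before the item becomes visible. */
static struct item *insert_get(struct table *t, unsigned long key)
{
	struct item *it = malloc(sizeof(*it));

	if (!it)
		return NULL;
	it->key = key;
	atomic_init(&it->refs, 2);	/* one for the table, one for the caller */

	table_lock(t);
	it->next = t->head;
	t->head = it;
	table_unlock(t);
	return it;
}

/* Put: the final decrement and the unlink happen under the same lock. */
static void put_item(struct table *t, struct item *it)
{
	table_lock(t);
	if (atomic_fetch_sub(&it->refs, 1) == 1) {
		struct item **pp = &t->head;

		while (*pp != it)
			pp = &(*pp)->next;
		*pp = it->next;
		table_unlock(t);
		free(it);
		return;
	}
	table_unlock(t);
}
```

If the release path ran outside the lock instead, taking the reference under the lock would not be enough on its own; the kernel's kref_get_unless_zero() exists for that situation.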

> + return entry;
> + }
> + }
> + spin_unlock(&root->lock);
> +
> + entry = kmem_cache_zalloc(hot_inode_item_cachep, GFP_NOFS);
> + if (!entry)
> + return ERR_PTR(-ENOMEM);
> +
> + spin_lock(&root->lock);
> + hot_inode_item_init(entry, ino, &root->hot_inode_tree);
> + rb_link_node(&entry->hot_inode.rb_node, parent, p);
> + rb_insert_color(&entry->hot_inode.rb_node,
> + &root->hot_inode_tree.map);
> + spin_unlock(&root->lock);
> +
> + kref_get(&entry->hot_inode.refs);

Similar here, the entry is inserted into the tree but there's no
refcount yet. And the order of spin_unlock/kref_get remains unchanged.

> + return entry;
> +}
> +EXPORT_SYMBOL_GPL(hot_inode_item_lookup);
> +
> +static struct hot_range_item
> +*hot_range_item_lookup(struct hot_inode_item *he,
> + loff_t start)
> +{
> + struct rb_node **p = &he->hot_range_tree.map.rb_node;
> + struct rb_node *parent = NULL;
> + struct hot_comm_item *ci;
> + struct hot_range_item *entry;
> +
> + /* walk tree to find insertion point */
> + spin_lock(&he->lock);
> + while (*p) {
> + parent = *p;
> + ci = rb_entry(parent, struct hot_comm_item, rb_node);
> + entry = container_of(ci, struct hot_range_item, hot_range);
> + if (start < entry->start)
> + p = &(*p)->rb_left;
> + else if (start > hot_range_end(entry))
> + p = &(*p)->rb_right;

if { ...} 
else if { ... }

> + else {
> + spin_unlock(&he->lock);
> + kref_get(&entry->hot_range.refs);

same here

> + return entry;
> + }
> + }
> + spin_unlock(&he->lock);
> +
> + entry = kmem_cache_zalloc(hot_range_item_cachep, GFP_NOFS);
> + if (!entry)
> + return ERR_PTR(-ENOMEM);
> +
> + spin_lock(&he->lock);
> + hot_range_item_init(entry, start, he);
> + rb_link_node(&entry->hot_range.rb_node, parent, p);
> + rb_insert_color(&entry->hot_range.rb_node,
> + &he->hot_range_tree.map);
> + spin_unlock(&he->lock);
> +
> + kref_get(&entry->hot_range.refs);

and here

> + return entry;
> +}
> +
> +/*
> + * This function does the actual work of updating
> + * the frequency numbers, whatever they turn out to be.

Can this function be described a bit better? This comment did not help.

> + */
> +static void hot_rw_freq_calc(struct timespec old_atime,
> + struct timespec cur_time, u64 *avg)
> +{
> + struct timespec delta_ts;
> + u64 new_delta;
> +
> + delta_ts = timespec_sub(cur_time, old_atime);
> + new_delta = timespec_to_ns(&delta_ts) >> FREQ_POWER;
> +
> + *avg = (*avg << FREQ_POWER) - *avg + new_delta;
> + *avg = *avg >> FREQ_POWER;
> +}
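Reading the arithmetic of hot_rw_freq_calc() above, it appears to maintain an exponentially weighted moving average: each new inter-access gap, pre-scaled down by 2^FREQ_POWER, moves the stored average 1/2^FREQ_POWER of the way toward it. A sketch that could back a clearer comment, assuming nanosecond inputs instead of struct timespec; `ewma_update()` and the power value 4 used in the note are hypothetical.

```c
#include <stdint.h>

/*
 * Same arithmetic as hot_rw_freq_calc():
 *
 *   sample = delta_ns >> power
 *   avg'   = (avg * (2^power - 1) + sample) >> power
 *          = floor(avg + (sample - avg) / 2^power)
 *
 * i.e. an exponentially weighted moving average of the scaled gap
 * with smoothing factor 1 / 2^power.
 */
static uint64_t ewma_update(uint64_t avg, uint64_t delta_ns, unsigned int power)
{
	uint64_t sample = delta_ns >> power;

	return ((avg << power) - avg + sample) >> power;
}
```

With power = 4, a steady 256 ns gap gives a sample of 16: starting from 0, the first update yields 1; an average of 16 stays at 16; and an average of 15 also stays put, because (15 * 15 + 16) >> 4 truncates to 15. That rounding bias, and the possibility of `avg << power` overflowing for very large averages, may be worth mentioning in the comment as well.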
> +
> +static void hot_freq_data_update(struct hot_freq_data *freq_data, bool write)
> +{
> + struct timespec cur_time = current_kernel_time();
> +
> + if (write) {
> + freq_data->nr_writes += 1;

The preferred style is

fr

Re: [PATCH RESEND v1 01/16] vfs: introduce some data structures

2013-01-09 Thread David Sterba
On Thu, Dec 20, 2012 at 10:43:20PM +0800, zwu.ker...@gmail.com wrote:
> --- /dev/null
> +++ b/fs/hot_tracking.c
> @@ -0,0 +1,109 @@
> +/*
> + * fs/hot_tracking.c

From what I've understood, the file name written here is not wanted, so
please drop it (from the .h too).

> + *
> + * Copyright (C) 2012 IBM Corp. All rights reserved.
> + * Written by Zhi Yong Wu 
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.

A short description of the hot tracking feature or pointer to the
Documentation/ file would be nice here.

> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "hot_tracking.h"
> +
> +/* kmem_cache pointers for slab caches */

This comment seems useless to me; it does not help in understanding the code
and just repeats what the C code says. There are more such redundant comments
in the series, but I'm not going to point out all of them right now.

> +static struct kmem_cache *hot_inode_item_cachep __read_mostly;
> +static struct kmem_cache *hot_range_item_cachep __read_mostly;
> +

> --- /dev/null
> +++ b/include/linux/hot_tracking.h
> +/* The common info for both following structures */
> +struct hot_comm_item {
> + struct rb_node rb_node; /* rbtree index */
> + struct hot_freq_data hot_freq_data;  /* frequency data */
> + spinlock_t lock; /* protects object data */
> + struct kref refs;  /* prevents kfree */
> +};
> +
> +/* An item representing an inode and its access frequency */
> +struct hot_inode_item {
> + struct hot_comm_item hot_inode; /* node in hot_inode_tree */
> + struct hot_rb_tree hot_range_tree; /* tree of ranges */
> + spinlock_t lock; /* protect range tree */
> + struct hot_rb_tree *hot_inode_tree;
> + u64 i_ino; /* inode number from inode */
> +};

Please align the comments to something like this (or drop them if they seem
redundant):

/* The common info for both following structures */
struct hot_comm_item {
        struct rb_node rb_node;                 /* rbtree index */
        struct hot_freq_data hot_freq_data;     /* frequency data */
        spinlock_t lock;                        /* protects object data */
        struct kref refs;                       /* prevents kfree */
        struct list_head n_list;                /* list node index */
};

/* An item representing an inode and its access frequency */
struct hot_inode_item {
        struct hot_comm_item hot_inode;         /* node in hot_inode_tree */
        struct hot_rb_tree hot_range_tree;      /* tree of ranges */
        spinlock_t lock;                        /* protect range tree */
        struct hot_rb_tree *hot_inode_tree;
        u64 i_ino;                              /* inode number from inode */
};
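Later in the series, lookups go from an rb_node to struct hot_comm_item with rb_entry() and from there to the outer item with container_of(), which is why the common part is embedded rather than pointed to. A minimal userspace sketch of that embedding, assuming trimmed-down hypothetical versions of the structures (only a couple of fields kept) and a local container_of() macro without the kernel's type checking; `comm_to_ino()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace equivalent of the kernel's container_of(), minus its type check. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed-down, hypothetical versions of the structures above. */
struct hot_comm_item {
	uint64_t nr_reads;
	uint64_t nr_writes;
};

struct hot_inode_item {
	struct hot_comm_item hot_inode;		/* embedded common part */
	uint64_t i_ino;
};

/* Code that only sees the common part can still reach the inode item. */
static uint64_t comm_to_ino(struct hot_comm_item *ci)
{
	return container_of(ci, struct hot_inode_item, hot_inode)->i_ino;
}
```

Because the common part is embedded, the outer item is found by subtracting a compile-time offset, with no extra pointer stored per item.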

> +extern void __init hot_cache_init(void);

This belongs in the private include fs/hot_tracking.h, because it is called
only once by VFS init and not by filesystems; hot_track_init(superblock),
introduced later, serves that purpose for filesystems.


david


Re: [PATCH RESEND v1 02/16] vfs: add init and cleanup functions

2013-01-09 Thread David Sterba
On Thu, Dec 20, 2012 at 10:43:21PM +0800, zwu.ker...@gmail.com wrote:
> From: Zhi Yong Wu 
> --- a/fs/hot_tracking.c
> +++ b/fs/hot_tracking.c
> @@ -107,3 +189,38 @@ err:
>   kmem_cache_destroy(hot_inode_item_cachep);
>  }
>  EXPORT_SYMBOL_GPL(hot_cache_init);
> +
> +/*
> + * Initialize the data structures for hot data tracking.
> + */
> +int hot_track_init(struct super_block *sb)
> +{
> + struct hot_info *root;
> + int ret = -ENOMEM;
> +
> + root = kzalloc(sizeof(struct hot_info), GFP_NOFS);
> + if (!root) {
> + printk(KERN_ERR "%s: Failed to malloc memory for "
> + "hot_info\n", __func__);
> + return ret;
> + }
> +
> + hot_inode_tree_init(root);

This function is supposed to be called from the filesystem init, please
add a sanity check that would catch multiple initialization attempts.

> +
> + sb->s_hot_root = root;
> +
> + printk(KERN_INFO "VFS: Turning on hot data tracking\n");
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(hot_track_init);
> +
> +void hot_track_exit(struct super_block *sb)
> +{
> + struct hot_info *root = sb->s_hot_root;

another sanity check to catch the opposite.

Why? The option is parsed and enabled by the filesystems, so in case of
unexpected bugs, e.g. with remounting or incorrectly handled error paths,
the VFS layer should IMHO warn rather than crash.

> +
> + hot_inode_tree_exit(root);
> + sb->s_hot_root = NULL;
> + kfree(root);
> +}
> +EXPORT_SYMBOL_GPL(hot_track_exit);


david
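A minimal userspace model of the two sanity checks requested above, assuming hypothetical `struct sb_model` and `struct hot_info_model` stand-ins for the superblock and hot_info; kernel code would use WARN_ON() where this sketch prints to stderr, and the choice of -EEXIST for a repeated init is an assumption, not something the review specifies.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct hot_info and struct super_block. */
struct hot_info_model { int dummy; };
struct sb_model { struct hot_info_model *s_hot_root; };

static int hot_track_init_checked(struct sb_model *sb)
{
	struct hot_info_model *root;

	if (sb->s_hot_root) {
		/* Second init on the same superblock: warn instead of leaking the old root. */
		fprintf(stderr, "hot_track_init: already initialized\n");
		return -EEXIST;
	}
	root = calloc(1, sizeof(*root));
	if (!root)
		return -ENOMEM;
	sb->s_hot_root = root;
	return 0;
}

static void hot_track_exit_checked(struct sb_model *sb)
{
	struct hot_info_model *root = sb->s_hot_root;

	if (!root) {
		/* Exit without init (e.g. a bad error path): warn, don't dereference NULL. */
		fprintf(stderr, "hot_track_exit: not initialized\n");
		return;
	}
	sb->s_hot_root = NULL;
	free(root);
}
```

Both paths degrade to a warning, which matches the argument that the VFS layer should survive filesystem-side mistakes with remount or error handling.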


Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)

2013-01-09 Thread Greg KH
On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote:
> On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
> > 
> > I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
> > 
> > 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
> > Processor Integrated Graphics Controller [8086:0046] (rev 02)
> > 
> > Thinkpad T410
> > 
> > Shaggy
> 
> Daniel's patch:
> 
> drm/i915: Revert shrinker changes from "Track unbound pages"
> 
> fixes the problem for me.

After an afternoon of multiple kernel builds and other stressful things,
it looks like it fixes it for me as well.  Chris, this will be going to
Linus soon, right?

thanks,

greg k-h


Re: [PATCH 0/8] PCI, ACPI, x86: Reserve fw allocated resource for hot-add root bus

2013-01-09 Thread Bjorn Helgaas
On Wed, Jan 9, 2013 at 1:10 PM, Rafael J. Wysocki  wrote:
> On Wednesday, January 09, 2013 11:01:39 AM Yinghai Lu wrote:
>> On Wed, Jan 9, 2013 at 10:39 AM, Bjorn Helgaas  wrote:
>> >> the reason why we need to change those codes for x86, we want to make it 
>> >> support
>> >> pci root bus hotplug. So it would be reasonable for us to align other
>> >> platform to x86
>> >> changes after pci root bus hotplug change is completely done.
>> >
>> > OK, I opened https://bugzilla.kernel.org/show_bug.cgi?id=52531 as a
>> > way to keep track of this consistency issue and merged
>> > pci/yinghai-survey-resources to my -next branch.
>>
>> Thanks a lot. will send other pci root bus hotplug out.
>>
>> question: now Rafael's tree has acpi-scan branch and it touches pci-root.c.
>>
>> so is it ok for me to base patches on your pci/next and his pm/acpi-scan?
>> how?
>> can you two have some arrangement like you pulling Rafael's branch?
>
> My acpi-scan branch is not going to be rebased going forward, so it can be
> pulled from safely if that helps.

I'm happy to do that, but it is outside the scope of my limited git
experience.  My guess is that I should do this (doing the pull into a
branch which I later merge into my -next branch):

  $ git checkout -b pci/yinghai-survey-resources+acpi-scan pci/yinghai-survey-resources
  $ git pull --no-ff --log git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git acpi-scan
  $ vi drivers/acpi/pci_root.c    # resolve conflicts
  $ git add drivers/acpi/pci_root.c
  $ git commit

  $ git checkout next
  $ git merge --no-ff --log pci/yinghai-survey-resources+acpi-scan

Is that reasonable?  This won't cause issues when both Rafael and I
ask Linus to pull from our trees later?

Bjorn

