date:20130307

Re: [PATCH 08/16] TTY: serial/bfin_uart, unbreak build with KGDB enabled

2013-03-07 Thread Sonic Zhang

Acked-by: Sonic Zhang 

On Thu, Mar 7, 2013 at 8:12 PM, Jiri Slaby  wrote:
> There are no (and never were any) kgdb fields in uart_ops. Setting
> them produces a build error:
> drivers/tty/serial/bfin_uart.c:1054:2: error: unknown field 
> 'kgdboc_port_startup' specified in initializer
> drivers/tty/serial/bfin_uart.c:1054:2: warning: initialization from 
> incompatible pointer type [enabled by default]
> drivers/tty/serial/bfin_uart.c:1054:2: warning: (near initialization for 
> 'bfin_serial_pops.ioctl') [enabled by default]
> drivers/tty/serial/bfin_uart.c:1055:2: error: unknown field 
> 'kgdboc_port_shutdown' specified in initializer
> drivers/tty/serial/bfin_uart.c:1055:2: warning: initialization from 
> incompatible pointer type [enabled by default]
> drivers/tty/serial/bfin_uart.c:1055:2: warning: (near initialization for 
> 'bfin_serial_pops.poll_init') [enabled by default]
>
> Remove them.
>
> Signed-off-by: Jiri Slaby 
> Cc: Sonic Zhang 
> Cc: uclinux-dist-de...@blackfin.uclinux.org
> ---
>  drivers/tty/serial/bfin_uart.c | 23 ---
>  1 file changed, 23 deletions(-)
>
> diff --git a/drivers/tty/serial/bfin_uart.c b/drivers/tty/serial/bfin_uart.c
> index 12dceda..26a3be7 100644
> --- a/drivers/tty/serial/bfin_uart.c
> +++ b/drivers/tty/serial/bfin_uart.c
> @@ -1011,24 +1011,6 @@ static int bfin_serial_poll_get_char(struct uart_port 
> *port)
>  }
>  #endif
>
> -#if defined(CONFIG_KGDB_SERIAL_CONSOLE) || \
> -   defined(CONFIG_KGDB_SERIAL_CONSOLE_MODULE)
> -static void bfin_kgdboc_port_shutdown(struct uart_port *port)
> -{
> -   if (kgdboc_break_enabled) {
> -   kgdboc_break_enabled = 0;
> -   bfin_serial_shutdown(port);
> -   }
> -}
> -
> -static int bfin_kgdboc_port_startup(struct uart_port *port)
> -{
> -   kgdboc_port_line = port->line;
> -   kgdboc_break_enabled = !bfin_serial_startup(port);
> -   return 0;
> -}
> -#endif
> -
>  static struct uart_ops bfin_serial_pops = {
> .tx_empty   = bfin_serial_tx_empty,
> .set_mctrl  = bfin_serial_set_mctrl,
> @@ -1047,11 +1029,6 @@ static struct uart_ops bfin_serial_pops = {
> .request_port   = bfin_serial_request_port,
> .config_port= bfin_serial_config_port,
> .verify_port= bfin_serial_verify_port,
> -#if defined(CONFIG_KGDB_SERIAL_CONSOLE) || \
> -   defined(CONFIG_KGDB_SERIAL_CONSOLE_MODULE)
> -   .kgdboc_port_startup= bfin_kgdboc_port_startup,
> -   .kgdboc_port_shutdown   = bfin_kgdboc_port_shutdown,
> -#endif
>  #ifdef CONFIG_CONSOLE_POLL
> .poll_put_char  = bfin_serial_poll_put_char,
> .poll_get_char  = bfin_serial_poll_get_char,
> --
> 1.8.1.4
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
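
The build error quoted above comes from C designated initializers: only members that actually exist in the struct can be named. The sketch below illustrates this with a cut-down, hypothetical stand-in for struct uart_ops (demo_ops, demo_port and the demo_* functions are assumptions made for illustration, not the real serial core API).

```c
#include <stddef.h>

/* Hypothetical stand-in for a serial port; not the kernel's struct uart_port. */
struct demo_port {
	int line;
};

/* Hypothetical, cut-down ops table. Note that it has no kgdb hooks at all. */
struct demo_ops {
	int  (*startup)(struct demo_port *port);
	void (*shutdown)(struct demo_port *port);
	int  (*poll_get_char)(struct demo_port *port); /* optional hook */
};

static int demo_startup(struct demo_port *port)
{
	return port->line >= 0 ? 0 : -1;
}

static void demo_shutdown(struct demo_port *port)
{
	port->line = -1;
}

/*
 * Only members declared in struct demo_ops may be named here. Adding
 * ".kgdboc_port_startup = ..." would be rejected by the compiler with
 * "unknown field ... specified in initializer", as in the log above.
 */
static const struct demo_ops demo_pops = {
	.startup  = demo_startup,
	.shutdown = demo_shutdown,
	/* .poll_get_char is not named, so it is zero-initialized (NULL). */
};
```

Members that are left out are simply NULL, which is why removing the two kgdboc entries needs no replacement.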

Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

2013-03-07 Thread Takao Indoh


(2013/03/08 16:33), WANG Chao wrote:
> On 03/08/2013 03:27 PM, Yinghai Lu wrote:
>> On Thu, Mar 7, 2013 at 11:20 PM, WANG Chao  wrote:
>>>>
>>>> looks like your system DO have DMAR table, please enable dmar
>>>> remapping in your kernel config.
>>>
>>> I've already got following config:
>>> CONFIG_DMAR_TABLE=y
>>> CONFIG_INTEL_IOMMU=y
>>> CONFIG_IRQ_REMAP=y
>>>
>>> but I don't have intel_iommu=on in kernel cmdline. IIRC, iommu will prevent
>>> 2nd kernel from booting ...
>>
>> Did you put intel_iommu=on on first and second cpu both?
>
> I tried, 2nd kernel didn't boot and kept spitting errors like these:
> [2.106939] DMAR: No ATSR found
> [2.110121] IOMMU 0 0xfed9: using Queued invalidation
> [2.115522] IOMMU 1 0xfed91000: using Queued invalidation
> [2.120919] IOMMU: Setting RMRR:
> [2.124162] IOMMU: Setting identity map for device :00:02.0 [0xab80
> - 0xaf9f]
> [2.133099] IOMMU: Setting identity map for device :00:1d.0 [0xaac95000
> - 0xaacb2fff]
> [2.141305] IOMMU: Setting identity map for device :00:1a.0 [0xaac95000
> - 0xaacb2fff]
> [2.149503] IOMMU: Setting identity map for device :00:14.0 [0xaac95000
> - 0xaacb2fff]
> [2.157690] IOMMU: Prepare 0-16MiB unity mapping for LPC
> [2.163011] IOMMU: Setting identity map for device :00:1f.0 [0x0 -
> 0xff
> [Errors, here we go]
> [2.170932] dmar: DRHD: handling fault status reg 3
> [2.170933] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
> [2.182486] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr
> e000
> [2.182486] DMAR:[fault reason 05] PTE Write access is not set
> [2.195705] dmar: DRHD: handling fault status reg 3
> [2.200570] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr
> ff873000
> [2.200570] DMAR:[fault reason 06] PTE Read access is not set
> [2.213618] dmar: DRHD: handling fault status reg 3
> [..]


This is the problem I'm working on.
https://lkml.org/lkml/2012/11/26/814

Thanks,
Takao Indoh


Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 11:33 PM, WANG Chao  wrote:
> On 03/08/2013 03:27 PM, Yinghai Lu wrote:
>> On Thu, Mar 7, 2013 at 11:20 PM, WANG Chao  wrote:

>>>> looks like your system DO have DMAR table, please enable dmar
>>>> remapping in your kernel config.
>>>
>>> I've already got following config:
>>> CONFIG_DMAR_TABLE=y
>>> CONFIG_INTEL_IOMMU=y
>>> CONFIG_IRQ_REMAP=y
>>>
>>> but I don't have intel_iommu=on in kernel cmdline. IIRC, iommu will prevent
>>> 2nd kernel from booting ...
>>
>> Did you put intel_iommu=on on first and second cpu both?
>
> I tried, 2nd kernel didn't boot and kept spitting errors like these:
> [2.106939] DMAR: No ATSR found
> [2.110121] IOMMU 0 0xfed9: using Queued invalidation
> [2.115522] IOMMU 1 0xfed91000: using Queued invalidation
> [2.120919] IOMMU: Setting RMRR:
> [2.124162] IOMMU: Setting identity map for device :00:02.0 [0xab80
> - 0xaf9f]
> [2.133099] IOMMU: Setting identity map for device :00:1d.0 [0xaac95000
> - 0xaacb2fff]
> [2.141305] IOMMU: Setting identity map for device :00:1a.0 [0xaac95000
> - 0xaacb2fff]
> [2.149503] IOMMU: Setting identity map for device :00:14.0 [0xaac95000
> - 0xaacb2fff]
> [2.157690] IOMMU: Prepare 0-16MiB unity mapping for LPC
> [2.163011] IOMMU: Setting identity map for device :00:1f.0 [0x0 - 
> 0xff
> [Errors, here we go]
> [2.170932] dmar: DRHD: handling fault status reg 3
> [2.170933] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
> [2.182486] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 
> e000
> [2.182486] DMAR:[fault reason 05] PTE Write access is not set
> [2.195705] dmar: DRHD: handling fault status reg 3
> [2.200570] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr 
> ff873000
> [2.200570] DMAR:[fault reason 06] PTE Read access is not set
> [2.213618] dmar: DRHD: handling fault status reg 3

My Nehalem-EX and Westmere-EX systems are working with iommu enabled in the second kernel.

What is 00:02.0 in your system?

Is your kernel an upstream kernel or a Red Hat flavored one?

Thanks

Yinghai

Re: mmap vs fs cache

2013-03-07 Thread Howard Chu


Johannes Weiner wrote:

On Thu, Mar 07, 2013 at 04:43:12PM +0100, Jan Kara wrote:



2 questions:
   why is there data in the FS cache that isn't owned by (the mmap
of) the process that caused it to be paged in in the first place?


The filesystem cache is shared among processes because the filesystem
is also shared among processes.  If another task were to access the
same file, we still should only have one copy of that data in memory.


That's irrelevant to the question. As I already explained, the first 16GB that 
was paged in didn't behave this way. Perhaps "owned" was the wrong word, since 
this is a MAP_SHARED mapping. But the point is that the memory is not being 
accounted in slapd's process size, when it was before, up to 16GB.



It sounds to me like slapd is itself caching all the data it reads.


You're misreading the information then. slapd is doing no caching of its own, 
its RSS and SHR memory size are both the same. All it is using is the mmap, 
nothing else. The RSS == SHR == FS cache, up to 16GB. RSS is always == SHR, 
but above 16GB they grow more slowly than the FS cache.



If that is true, shouldn't it really be using direct IO to prevent
this double buffering of filesystem data in memory?


There is no double buffering.


   is there a tunable knob to discourage the page cache from stealing
from the process?


Try reducing /proc/sys/vm/swappiness, which ranges from 0-100 and
defaults to 60.


I've already tried setting it to 0 with no effect.

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: [PATCH 14/14] x86, mm: Put pagetable on local node ram

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 11:01 PM, Tejun Heo  wrote:
> On Thu, Mar 07, 2013 at 08:58:40PM -0800, Yinghai Lu wrote:
>> If node with ram is hotplugable, local node mem for page table and vmemmap
>> should be on that node ram.
>>
>> This patch is some kind of refreshment of
>> | commit 1411e0ec3123ae4c4ead6bfc9fe3ee5a3ae5c327
>> | Date:   Mon Dec 27 16:48:17 2010 -0800
>> |
>> |x86-64, numa: Put pgtable to local node memory
>> That was reverted before.
>>
>> We have reason to reintroduce it to make memory hotplug work.
>>
>> Split calling of init_mem_mapping into early_initmem_info
>> for nodes after we get numa info there.
>>
>> First node will be low range.
>> Need to rework alloc_low_pages to alloc page table in following order:
>>   BRK, local node, low range
>>
>> Still only load_cr3 one time, otherwise we would break xen 64bit again.
>
> Hmmm... can you please split this patch further?  init_mem_mapping()
> change can be separated, no?

will try to split it out.

> Also, comments are disturbingly missing.
> How are other people reading the code supposed to know what it's
> trying to achieve why and how?  Hmmm... we're also likely to end up
> with smaller mapping for misaligned NUMA configurations (I think my
> test machine is like that).  Is it guaranteed that the top level ends
> up in the first node?  It really needs documentation.

Yes. To really get memory hotplug working, we will need to trim the node
alignment to 1G in memblock and numa_meminfo.

We also need to put the pgd page in the low range (first node) if a 512G
block crosses a node boundary.
For example: if node2 is [256g, 1024g), the pgd for 256g-512g must stay on
node0, and 512g-1024g could stay on node2.
Or we could just put all PGD pages in the low range (first node).

Thanks

Yinghai
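
The 512G rule above can be checked with a little arithmetic. This is only a sketch under stated assumptions: node_range and pgd_block_inside_node() are hypothetical names, and the 512G block size assumes 4-level x86-64 paging, where one top-level entry covers 2^39 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB            (1ULL << 30)
/* Range covered by one top-level entry with 4-level x86-64 paging. */
#define PGD_BLOCK_SIZE (512 * GiB)

/* Hypothetical description of a NUMA node's memory: [start, end). */
struct node_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return true if the 512G-aligned block containing addr lies entirely
 * inside the node. If it does not, the block crosses a node boundary and,
 * following the mail above, its page-table page would stay in the low
 * range (first node).
 */
static bool pgd_block_inside_node(uint64_t addr, const struct node_range *node)
{
	uint64_t block_start = addr & ~(PGD_BLOCK_SIZE - 1);
	uint64_t block_end = block_start + PGD_BLOCK_SIZE;

	return block_start >= node->start && block_end <= node->end;
}
```

With node2 = [256g, 1024g), an address at 300g falls in block [0, 512g), which starts below the node, so the check fails; an address at 600g falls in block [512g, 1024g), which is fully inside node2.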

Re: kswapd craziness round 2

2013-03-07 Thread Zlatko Calusic


On 08.03.2013 07:42, Hillf Danton wrote:
> On Fri, Mar 8, 2013 at 3:37 AM, Jiri Slaby  wrote:
>> On 03/01/2013 03:02 PM, Hillf Danton wrote:
>>> On Fri, Mar 1, 2013 at 1:02 AM, Jiri Slaby  wrote:
>>>>
>>>> Ok, no difference, kswap is still crazy. I'm attaching the output of
>>>> "grep -vw '0' /proc/vmstat" if you see something there.
>>>
>>> Thanks to you for test and data.
>>>
>>> Lets try to restore the deleted nap, then.
>>
>> Oh, it seems to be nice now:
>> root   579  0.0  0.0  0 0 ?SMar04   0:13 [kswapd0]
>
> Double thanks.
>
> But Mel does not like it, probably.
> Lets try nap in another way.
>
> Hillf
>
> --- a/mm/vmscan.c   Thu Feb 21 20:01:02 2013
> +++ b/mm/vmscan.c   Fri Mar  8 14:36:10 2013
> @@ -2793,6 +2793,10 @@ loop_again:
>  * speculatively avoid congestion waits
>  */
> zone_clear_flag(zone, ZONE_CONGESTED);
> +
> +   else if (sc.priority > 2 &&
> +sc.priority < DEF_PRIORITY - 2)
> +   wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
> }
>
> /*
> --



There's another bug in there, which I'm still chasing. Artificial sleeps 
like this just mask the real bug and introduce new problems (on my 4GB 
server kswapd spends all the time in those congestion wait calls). The 
problem is that the bug needs about 5 days of uptime to reveal its ugly 
head. So far I can only tell that it was introduced somewhere between 
3.1 & 3.4.


Also, check shrink_inactive_list(), it already sleeps if really needed:

if (nr_writeback && nr_writeback >=
(nr_taken >> (DEF_PRIORITY - sc->priority)))
wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);

Regards,
--
Zlatko
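
The shrink_inactive_list() check quoted above can be read as a threshold that shrinks as the scan priority drops. Below is a standalone sketch of just that arithmetic; should_wait_for_writeback() is a hypothetical helper name, and DEF_PRIORITY is assumed to be 12 as in the kernel headers.

```c
#include <stdbool.h>

#define DEF_PRIORITY 12 /* assumed default scan priority */

/*
 * Mirror of the quoted condition: throttle only if some pages are under
 * writeback and their count reaches nr_taken >> (DEF_PRIORITY - priority).
 * At priority == DEF_PRIORITY every taken page must be under writeback;
 * each step down in priority halves the threshold.
 */
static bool should_wait_for_writeback(unsigned long nr_writeback,
				      unsigned long nr_taken, int priority)
{
	return nr_writeback &&
	       nr_writeback >= (nr_taken >> (DEF_PRIORITY - priority));
}
```

For example, with nr_taken = 32 the threshold is 32 pages at priority 12 but only 8 pages at priority 10, so the sleep kicks in more readily as reclaim gets more desperate.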


Re: [PATCH 0/2] [GIT PULL][3.9] tracing: Fix in snapshot API

2013-03-07 Thread Hiraku Toyooka


Steven,

(03/08/2013 12:34 AM), Steven Rostedt wrote:

The second patch, returns success on a reset of the buffer even if
the buffer wasn't allocated. Returning -EINVAL is just confusing.


I realized we should update the snapshot documentation together with
this change.
I attached a patch to update the documentation. Could you include this
patch?

Thanks,
Hiraku Toyooka


From: Hiraku Toyooka 
Subject: [PATCH] tracing: update documentation of snapshot utility

Now, "snapshot" file returns success on a reset of snapshot buffer
even if the buffer wasn't allocated, instead of returning EINVAL.
This patch updates snapshot description according to the change.

Signed-off-by: Hiraku Toyooka 
---
 Documentation/trace/ftrace.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 53d6a3c..a372304 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1873,7 +1873,7 @@ feature:

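    status\input  |     0      |     1      |    else    |
    --------------+------------+------------+------------+
-   not allocated |(do nothing)| alloc+swap |   EINVAL   |
+   not allocated |(do nothing)| alloc+swap |(do nothing)|
    --------------+------------+------------+------------+
    allocated     |    free    |    swap    |   clear    |
    --------------+------------+------------+------------+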
--
1.7.11.7
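
The table in the patch above is a small decision table: the action depends on whether the snapshot buffer is allocated and on the value written. A minimal sketch of the updated table follows; the enum and snapshot_write_action() are hypothetical names used only to illustrate the documentation, not the tracing code itself.

```c
/* Hypothetical action names matching the cells of the documentation table. */
enum snap_action {
	SNAP_NOTHING,
	SNAP_FREE,
	SNAP_ALLOC_SWAP,
	SNAP_SWAP,
	SNAP_CLEAR
};

/*
 * Map (buffer state, value written to "snapshot") to an action, following
 * the updated table: writing anything other than 0 or 1 to an unallocated
 * buffer now does nothing instead of failing with EINVAL.
 */
static enum snap_action snapshot_write_action(int allocated, long input)
{
	if (input == 0)
		return allocated ? SNAP_FREE : SNAP_NOTHING;
	if (input == 1)
		return allocated ? SNAP_SWAP : SNAP_ALLOC_SWAP;
	return allocated ? SNAP_CLEAR : SNAP_NOTHING;
}
```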


Re: [PATCH 11/14] x86, acpi, numa: split SLIT handling out

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 11:19 PM, Tejun Heo  wrote:
> On Thu, Mar 7, 2013 at 11:18 PM, Yinghai Lu  wrote:
>> ia64 like to call in this seqence
>> acpi_numa_init()
>> parse srat
>> parse slit
>> then
>> acpi_numa_arch_fixup()
>>
>> in this arch_fixup, it will try to fill dummy distance_matrix.
>>
>> so would to keep acpi_numa_init ...
>
> Can't it just call acpi_numa_init_srat() and then init_slit()?  What
> am I missing?

Yes, but then we need to replace the acpi_numa_init() call
in arch/ia64/kernel/setup.c

with
acpi_numa_init_srat()
acpi_numa_init_slit()
acpi_numa_arch_fixup()

In the current code,
acpi_numa_init() calls acpi_numa_arch_fixup().

Thanks

Yinghai

Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

2013-03-07 Thread WANG Chao

On 03/08/2013 03:27 PM, Yinghai Lu wrote:
> On Thu, Mar 7, 2013 at 11:20 PM, WANG Chao  wrote:
>>>
>>> looks like your system DO have DMAR table, please enable dmar
>>> remapping in your kernel config.
>>
>> I've already got following config:
>> CONFIG_DMAR_TABLE=y
>> CONFIG_INTEL_IOMMU=y
>> CONFIG_IRQ_REMAP=y
>>
>> but I don't have intel_iommu=on in kernel cmdline. IIRC, iommu will prevent
>> 2nd kernel from booting ...
> 
> Did you put intel_iommu=on on first and second cpu both?

I tried, 2nd kernel didn't boot and kept spitting errors like these:
[2.106939] DMAR: No ATSR found
[2.110121] IOMMU 0 0xfed9: using Queued invalidation
[2.115522] IOMMU 1 0xfed91000: using Queued invalidation
[2.120919] IOMMU: Setting RMRR:
[2.124162] IOMMU: Setting identity map for device :00:02.0 [0xab80
- 0xaf9f]
[2.133099] IOMMU: Setting identity map for device :00:1d.0 [0xaac95000
- 0xaacb2fff]
[2.141305] IOMMU: Setting identity map for device :00:1a.0 [0xaac95000
- 0xaacb2fff]
[2.149503] IOMMU: Setting identity map for device :00:14.0 [0xaac95000
- 0xaacb2fff]
[2.157690] IOMMU: Prepare 0-16MiB unity mapping for LPC
[2.163011] IOMMU: Setting identity map for device :00:1f.0 [0x0 - 
0xff
[Errors, here we go]
[2.170932] dmar: DRHD: handling fault status reg 3
[2.170933] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
[2.182486] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 
e000
[2.182486] DMAR:[fault reason 05] PTE Write access is not set
[2.195705] dmar: DRHD: handling fault status reg 3
[2.200570] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr 
ff873000
[2.200570] DMAR:[fault reason 06] PTE Read access is not set
[2.213618] dmar: DRHD: handling fault status reg 3
[..]

Thanks,
WANG Chao
> 
> Thanks
> 
> Yinghai
> 


Re: [PATCH] sched: wakeup buddy

2013-03-07 Thread Michael Wang

On 03/08/2013 02:44 PM, Mike Galbraith wrote:
> On Fri, 2013-03-08 at 10:37 +0800, Michael Wang wrote: 
>> On 03/07/2013 05:43 PM, Mike Galbraith wrote:
>>> On Thu, 2013-03-07 at 09:36 +0100, Peter Zijlstra wrote: 
>>>> On Wed, 2013-03-06 at 15:06 +0800, Michael Wang wrote:
>>>>
>>>>> wake_affine() stuff is trying to bind related tasks closely, but it
>>>>> doesn't
>>>>> work well according to the test on 'perf bench sched pipe' (thanks to
>>>>> Peter).
>>>>
>>>> so sched-pipe is a poor benchmark for this..
>>>>
>>>> Ideally we'd write a new benchmark that has some actual data footprint
>>>> and we'd measure the cost of tasks being apart on the various cache
>>>> metrics and see what affine wakeup does for it.
>>>>
>>>> Before doing something like what you're proposing, I'd have a hard look
>>>> at WF_SYNC, it is possible we should disable/fix select_idle_sibling
>>>> for sync wakeups.
>>>
>>> If nobody beats me to it, I'm going to try tracking shortest round trip
>>> to idle, and use a multiple of that to shut select_idle_sibling() down.
>>> If avg_idle approaches round trip time, there's no win to be had, we're
>>> just wasting cycles.
>>
>> That's great if we have it, I'm a little doubt whether it is possible to
>> find a better way to replace the select_idle_sibling() (look at the way
>> it locates idle cpu...) in some cases, but I'm looking forward it ;-)
> 
> I'm not going to replace it, only stop it from wasting cycles when
> there's very likely nothing to gain.  Save task wakeup time, if delta
> rq->clock - p->last_wakeup < N*shortest_idle or some such very cheap
> metric.  Wake ultra switchers L2 affine if allowed, only go hunting for
> an idle L3 if the thing is on another package.  
> 
> In general, I think things would work better if we'd just rate limit how
> frequently we can wakeup migrate each individual task.  

Doesn't the wakeup buddy already limit the rate? And by turning the knob,
we could change the rate on demand.

> We want
> jabbering tasks to share L3, but we don't really want to trash L2 at an
> awesome rate.

I don't get it... it's a task which has slept for some time. Unless no
other task ran on prev_cpu while it was sleeping, we will trash L2
whatever the new cpu is, won't we?

Regards,
Michael Wang

> 
> -Mike
> 


[PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport

2013-03-07 Thread Xufeng Zhang

The sctp_assoc_lookup_tsn() function searches for the transport a certain
TSN was sent on. If the TSN is not found on the active_path transport, it
goes on to search all the other transports in the peer's
transport_addr_list. However, we should continue to the next entry rather
than break out of the loop when we meet the active_path transport.

Signed-off-by: Xufeng Zhang 
---
 net/sctp/associola.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 43cd0dd..d2709e2 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1079,7 +1079,7 @@ struct sctp_transport *sctp_assoc_lookup_tsn(struct 
sctp_association *asoc,
transports) {
 
if (transport == active)
-   break;
+   continue;
list_for_each_entry(chunk, &transport->transmitted,
transmitted_list) {
if (key == chunk->subh.data_hdr->tsn) {
-- 
1.7.0.2
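
The difference between break and continue here is easy to see on a simplified model. The sketch below uses a plain array instead of the kernel's linked lists; demo_transport and demo_lookup_tsn() are hypothetical names, and the code only illustrates the control flow of the fix, not the real SCTP data structures.

```c
#include <stddef.h>

/* Hypothetical, simplified transport: a list of TSNs sent on it. */
struct demo_transport {
	int id;
	const unsigned *tsns;
	size_t ntsns;
};

/* Return 1 if key was sent on transport t, 0 otherwise. */
static int demo_transport_has_tsn(const struct demo_transport *t, unsigned key)
{
	size_t i;

	for (i = 0; i < t->ntsns; i++)
		if (t->tsns[i] == key)
			return 1;
	return 0;
}

/*
 * Look on the active transport first, then on all the others. The active
 * entry is skipped with "continue"; a "break" at that point would stop the
 * search and never visit the transports listed after the active one.
 */
static const struct demo_transport *
demo_lookup_tsn(const struct demo_transport *list, size_t n,
		const struct demo_transport *active, unsigned key)
{
	size_t i;

	if (demo_transport_has_tsn(active, key))
		return active;

	for (i = 0; i < n; i++) {
		if (&list[i] == active)
			continue; /* already searched above */
		if (demo_transport_has_tsn(&list[i], key))
			return &list[i];
	}
	return NULL;
}
```

With three transports where the active one sits in the middle, a TSN sent on the last transport is only found with continue.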


Re: [PATCH 1/1 v5] pwm_bl: Add support for backlight enable regulator

2013-03-07 Thread Thierry Reding

On Thu, Mar 07, 2013 at 01:16:08PM -0800, Andrew Chew wrote:
> The backlight enable regulator is specified in the device tree node for
> backlight.
> 
> Signed-off-by: Andrew Chew 

This looks good in general. I'd like to see Alex' comment addressed and
perhaps the commit message shouldn't be device tree specific, since the
same can be done with legacy board setup code. It would be nice if it'd
mention something about why the regulator is added.

Thierry


Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 11:25:14PM -0800, Yinghai Lu wrote:
> >> > Why is this table made a stack variable?  What's the benefit of doing
> >> > that?
> >>
> >> so I do need to switch global variables to phys and access it.
> >
> > I can't really understand what your response means.  Can you please
> > elaborate?
> 
> sorry, I missed NOT.
> 
> so I do NOT need to switch global variables from kernel virtual addr
> to phys address and access it
> in 32bit flat mode.

Ah, okay, so the function is called with a completely different
address mode and so you actually want to build the table on stack so
that you don't have to flip the address mode for the global address.

> >> yes, one for 32bit from head_32.S, phys.
> >> one for 64bit from head64.c. with _va.
> >
> > head64.c can't call with phys?  Why not?
> 
> HPA's #PF set up page table only handle kernel low mapping address.
> 
> and after reset_early_page_tables, only kernel high mapping address is
> there. and other low mapping will be supported via #PF handler.

Okay, it now makes sense.  Ah... You'll definitely need a lot of
documentation explaining what's going on.

Thanks.

-- 
tejun

Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 11:20 PM, WANG Chao  wrote:
>>
>> looks like your system DO have DMAR table, please enable dmar
>> remapping in your kernel config.
>
> I've already got following config:
> CONFIG_DMAR_TABLE=y
> CONFIG_INTEL_IOMMU=y
> CONFIG_IRQ_REMAP=y
>
> but I don't have intel_iommu=on in kernel cmdline. IIRC, iommu will prevent
> 2nd kernel from booting ...

Did you put intel_iommu=on on first and second cpu both?

Thanks

Yinghai

Re: [PATCH] leds-ot200: Fix misbehavior caused by wrong bit masks

2013-03-07 Thread Christian Gmeiner

2013/3/5 Bryan Wu :
> On Sun, Mar 3, 2013 at 11:40 PM, Christian Gmeiner
>  wrote:
>> ping
>> --
>> Christian Gmeiner, MSc
>>
>>
>> 2013/2/23 Christian Gmeiner :
>>> 2013/2/15 Bryan Wu :
>>>> On Wed, Feb 13, 2013 at 7:58 AM, Christian Gmeiner
>>>>  wrote:
>>>>> During the development of this driver an in-house register
>>>>> documentation was used. The last weeks some integration tests
>>>>> were done and this problem was found. It turned out that
>>>>> the released register documentation is wrong.
>>>>>
>>>>> The fix is very simple: shift all masks by one.
>>>>>
>>>>> Our customers can control LEDs from userspace via Java,
>>>>> C++ or what every. They have running/working applications where
>>>>> they want to control led_3 but led_2 get's used.
>>>>> I got a bug report in our in-house bug tracker so it would be
>>>>> great to fix this upstream.
>>>>>
>>>>> Signed-off-by: Christian Gmeiner
>>>>
>>>> Thanks, Christian.
>>>>
>>>> And Andrew, are you going to take care of this patch? Or I will merge this.

>>>
>>> Whats the current state of the patch? Hope we can get it into 3.9 :)
>>>
>
> I think this patch is already in linux-next via Andrew's tree (commit
> 023206171f235f93f26c314f76f5405a3077aaba). So it will be merged into
> 3.10 I guess, but not 3.9.
>
> Or I can send out this patch to Linus as a fix and ask Andrew to drop
> this from his tree.
>

I am fine with the current situation and 3.10 is fine too.

thanks
--
Christian Gmeiner, MSc

Re: [PATCH] alarmtimer: add error prints when suspend failed

2013-03-07 Thread Laxman Dewangan


On Friday 08 March 2013 04:46 AM, Greg KH wrote:
> On Fri, Mar 08, 2013 at 12:57:37AM +0530, Laxman Dewangan wrote:
>> The alarmtimer suspend failed when nearest alarm wakeup time is
>> less than 2 sec or rtc timer can not start.
>>
>> In suspend/resume stress testing, we found that sometimes alarmtimer
>> failed to suspend and hence it cancel the suspend ops. Add error prints
>> in suspend failure to provide more info when failure occurs to help
>> debugging.
>>
>> Signed-off-by: Laxman Dewangan
>> ---
>>   kernel/time/alarmtimer.c |6 +-
>>   1 files changed, 5 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
>> index f11d83b..eed5646 100644
>> --- a/kernel/time/alarmtimer.c
>> +++ b/kernel/time/alarmtimer.c
>> @@ -249,6 +249,8 @@ static int alarmtimer_suspend(struct device *dev)
>>
>>   	if (ktime_to_ns(min) < 2 * NSEC_PER_SEC) {
>> __pm_wakeup_event(ws, 2 * MSEC_PER_SEC);
>> +   dev_err(dev,
>> +   "Nearest alarm wakeup time < 2sec, avoiding suspend\n");
>
> What can userspace now do with this information?  How often is this now
> going to spam the syslog and cause confusion?



When we ran the suspend/resume stress test for system stability,
we occasionally got this error (3-4 times in 1000 cycles):

[ 235.508010] dpm_run_callback(): platform_pm_suspend+0x0/0x64 returns -16
[ 235.514999] PM: Device alarmtimer failed to suspend: error -16
[ 235.520958] PM: Some devices failed to suspend


After tracing back the failure case, we found that the possible reason could
be the one above.
In this case, if any function returns an error, it is always better to print
the error so that it is easy to find out the cause of the error and analyse it.


It should not generate spam, as this happens only in some cases.
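
The -16 in the log above is -EBUSY on Linux. A minimal userspace sketch of the check being discussed follows; demo_alarm_suspend_check() is a hypothetical name, and the function only models the "nearest alarm is less than 2 seconds away" test from the quoted hunk, not the real alarmtimer_suspend().

```c
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Refuse to suspend when the nearest alarm would fire in under two
 * seconds. nearest_alarm_ns is the time until that alarm, in nanoseconds.
 * Returns 0 if suspend may proceed, -EBUSY otherwise.
 */
static int demo_alarm_suspend_check(int64_t nearest_alarm_ns)
{
	if (nearest_alarm_ns < 2 * NSEC_PER_SEC)
		return -EBUSY;
	return 0;
}
```

A caller that logs the reason before returning the error, as the patch proposes, makes the bare "error -16" line in the PM core output easier to trace back.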



Re: [PATCH] irqchip: Renesas INTC External IRQ pin driver

2013-03-07 Thread Simon Horman

On Wed, Mar 06, 2013 at 02:05:52PM +0100, Thomas Gleixner wrote:
> On Wed, 6 Mar 2013, Simon Horman wrote:
> 
> > On Wed, Mar 06, 2013 at 11:01:14AM +0100, Thomas Gleixner wrote:
> > > On Wed, 6 Mar 2013, Simon Horman wrote:
> > > > On Mon, Feb 18, 2013 at 11:28:34PM +0900, Magnus Damm wrote:
> > > > > The SoCs using this driver are currently mainly used
> > > > > together with regular platform devices so this driver
> > > > > allows configuration via platform data to support things
> > > > > like static interrupt base address. DT support will
> > > > > be added incrementally in the not so distant future.
> > > > 
> > > > Hi Thomas,
> > > > 
> > > > I'm wondering how you would like to handle merging this driver.
> > > > I can think of three options but I'm sure there are others.
> > > > 
> > > > * You can take the patches (there is a follow-up series) yourself.
> > > > * I can prepare a pull request for you
> > > > * I can prepare a pull request for arm-soc with the shmobile patches 
> > > > that
> > > >   enable the driver on the r8a7779 and sh73a0.
> > > > 
> > > > The last option is possibly the easiest.
> > > 
> > > Correct.
> > > 
> > > > But in that case I'd appreciate an Ack from you on this patch.
> > > 
> > > You want to pick the V2 series, which already has my blessing:
> > > 
> > > https://lkml.org/lkml/2013/2/26/305
> > > 
> > > For merging it through arm-soc you have my ack now :)
> > 
> > Thanks, I have that.
> > 
> > It seems that V2 adds onto this patch rather than replaces it.
> > So could I also get an Ack for this patch too?
> 
> Ah,. right. V2 is an incremental fix for V1. Yes, please add my Ack.

Thanks.

I have added this into a new intc-external-irq branch of
the renesas tree on kernel.org and thus queued it up for v3.10.

I have also added the following patches to that branch:

irqchip: irqc: Add DT support
irqchip: intc-irqpin: Initial DT support
ARM: shmobile: Make r8a7779 INTC irqpin platform data static
ARM: shmobile: Make sh73a0 INTC irqpin platform data static
irqchip: Renesas IRQC driver
irqchip: intc-irqpin: GPL header for platform data
irqchip: intc-irqpin: Make use of devm functions
irqchip: intc-irqpin: Add force comments
irqchip: intc-irqpin: Cache mapped IRQ
irqchip: intc-irqpin: Whitespace fixes
ARM: shmobile: INTC External IRQ pin driver on r8a7779
ARM: shmobile: INTC External IRQ pin driver on sh73a0
ARM: shmobile: irq_pin() for static IRQ pin assignment

The intc-external-irq branch is merged into the next branch and
I expect it to appear in linux-next in the not too distant future.

I have removed the topic/intc-external-irq branch from the renesas tree,
the branch where these patches were being staged.



Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 11:06 PM, Tejun Heo  wrote:
> Hello, Yinghai.
>
> On Thu, Mar 07, 2013 at 10:57:21PM -0800, Yinghai Lu wrote:
>> On Thu, Mar 7, 2013 at 9:50 PM, Tejun Heo  wrote:
>> > On Thu, Mar 07, 2013 at 08:58:30PM -0800, Yinghai Lu wrote:
>> >> We will find acpi tables in initrd during head_32.S in 32bit flat mode.
>> >>
>> >> So need acpi_initrd_override_find could take phys directly.
>> >
>> > The patch description doesn't explain even half of what's going on.
>>
>> hope HPA could understand.
>>
>> Access initrd before relocate_initrd and init_memory mapping.
>
> I really hope the changelogs were better.  Eh well...
>
>> >> -/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
>> >> -static const char * const table_sigs[] = {
>> >> - ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
>> >> - ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
>> >> - ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
>> >> - ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
>> >> - ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
>> >> - ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
>> >> - ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
>> >> - ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
>> >> - ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
>> >
>> > Why is this table made a stack variable?  What's the benefit of doing
>> > that?
>>
>> so I do need to switch global variables to phys and access it.
>
> I can't really understand what your response means.  Can you please
> elaborate?

Sorry, I left out a NOT.

With the table on the stack, I do NOT need to convert global variables from
kernel virtual addresses to physical addresses to access them in 32bit flat
mode.

>
>> > Is it really necessary to make the function take both virtual and
>> > physical addresses?  Can't we just make the function take phys_addr_t
>> > and update everyone to call with physaddr?  Also @is_phys isn't simple
>> > address switch.  It also changes error reporting.  If you're gonna
>> > keep @is_phys, let's at least write up a function comment explaining
>> > what's going on and why we need it.  But, really, if at all possible,
>> > let's change the function to take single type of argument and
>> > predicate error message printing on something else (e.g. early printk
>> > initialized or whatever).
>>
>> yes, one for 32bit from head_32.S, phys.
>> one for 64bit from head64.c. with _va.
>
> head64.c can't call with phys?  Why not?

HPA's #PF-based page table setup only handles kernel low mapping addresses.

After reset_early_page_tables, only the kernel high mapping is there, and
other low mappings are supported via the #PF handler.

>
>> Not sure if I could use early_printk from head_32.S, as Fenghua does
>> not print out
>> from microcode updating early in the same parts.
>
> ISTR it works but it doesn't have to (although it would be much nicer
> if it did).  You can test whether printk is online and skip if it
> isn't online yet.

OK, will give it a try.

Thanks

Yinghai

Re: [PATCH 1/1 v4] pwm_bl: Add support for backlight enable regulator

2013-03-07 Thread Thierry Reding

On Fri, Mar 08, 2013 at 11:21:04AM +0900, Alex Courbot wrote:
> On 03/08/2013 06:07 AM, Andrew Chew wrote:
> >>From: Thierry Reding [mailto:thierry.red...@avionic-design.de]
> >>Sent: Thursday, March 07, 2013 3:27 AM
> >>To: Alex Courbot
> >>Cc: Andrew Chew; linux-kernel@vger.kernel.org
> >>Subject: Re: [PATCH 1/1 v4] pwm_bl: Add support for backlight enable
> >>regulator
> >>
> >>* PGP Signed by an unknown key
> >>
> >>On Thu, Mar 07, 2013 at 07:11:25PM +0900, Alex Courbot wrote:
> >>>On 03/07/2013 04:11 PM, Thierry Reding wrote:
> >>>>>+	bool en_supply_enabled;
> >>>>
> >>>> This boolean can be dropped. As discussed in a previous thread, the
> >>>> pwm-backlight driver shouldn't need to know about any other uses of
> >>>> the regulator.
> >>>
> >>>Sorry for being obstinate - but I'm still not convinced we can get rid
> >>>of it. I checked the regulator code, and as you mentioned in the
> >>>previous version, calls to regulator_enable() and
> >>>regulator_disable() *must* be balanced in this driver.
> >>>
> >>>Without this variable we would call regulator_enable() every time
> >>>pwm_backlight_enable() is called (and same thing when disabling).
> >>>Now imagine the driver is asked to set the following intensities: 5,
> >>>12, then 0. You would have two calls to regulator_enable() but only
> >>>one to regulator_disable(), which would result in the enable GPIO
> >>>remaining active even though it would be shut down. Or I missed
> >>>something obvious.
> >>>
> >>>The regulator must be enabled/disabled on transitions from/to 0, and
> >>>AFAICT there is no way for this driver to detect them.
> >>
> >>Yes, that's true, but I don't think it should be solved for just this one
> >>regulator. Instead if we need to track the enable state we might as well 
> >>track
> >>it for *any* resource so that the PWM isn't enabled/disabled twice either.
> >
> >That makes sense, but I'm confused due to previous comments.  The most
> >obvious way to do this seems to be to have a bool track the enable state.
> >Do you still want me to do away with this bool?  I can satisfy your very
> >last comment by keeping the bool (renaming it to something more generic)
> >and encapsulating the pwm_enable()/pwm_disable() call within.
> 
> I think that's what Thierry meant, yes.

Yes, it is. =)
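
To make the balancing argument above concrete, here is a minimal user-space
sketch (not the pwm_bl driver code) of one generic "enabled" flag guarding a
reference-counted resource. The names fake_regulator, backlight and
backlight_set_brightness() are hypothetical stand-ins; only the idea of
enabling and disabling on transitions from and to 0 comes from the thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted regulator. */
struct fake_regulator {
	int use_count;
};

static void fake_regulator_enable(struct fake_regulator *r)
{
	r->use_count++;
}

static void fake_regulator_disable(struct fake_regulator *r)
{
	r->use_count--;
}

/* Hypothetical backlight state: one flag covers every guarded resource. */
struct backlight {
	struct fake_regulator supply;
	bool enabled;
	int brightness;
};

/*
 * Enable only on a 0 -> non-zero transition and disable only on a
 * non-zero -> 0 transition, so enable/disable calls stay balanced.
 */
static void backlight_set_brightness(struct backlight *bl, int brightness)
{
	if (brightness > 0 && !bl->enabled) {
		fake_regulator_enable(&bl->supply);
		bl->enabled = true;
	} else if (brightness == 0 && bl->enabled) {
		fake_regulator_disable(&bl->supply);
		bl->enabled = false;
	}
	bl->brightness = brightness;
}
```

With the sequence 5, 12, 0 from the example above, the supply is enabled once
and disabled once. In a real driver the same flag could also wrap the
pwm_enable()/pwm_disable() calls.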

> >>I expect that if the changes are split up then the board-setup code changes
> >>need to be done prior to the driver change. Using the lookup tables should
> >>make this easy because they aren't tied to the platform data and can be
> >>added independently. The patches should probably go through the same
> >>subsystem tree to take care of the dependencies.
> >>
> >>Keeping everything in one patch would work too, but it's certainly more
> >>chaotic.
> >
> >Am I supposed to handle those patches?  I'm concerned that I don't have
> >hardware to test properly, but I can give it a shot if it's my 
> >responsibility.
> 
> Yes, if you introduce incompatibilities you have the burden of
> performing the transition without breaking things at any single
> point of the git history. Since this is just about adding a dummy
> regulator, it should go fine even without testing. And in the event
> it does not, that's what linux-next is for.

Right. We'll need an Acked-by from the board/machine maintainers anyway
and if something still breaks we can always fix it after somebody's
actually done the testing.

> Make sure you also update the dts of current device tree users, as
> they will break, too.
> 
> What I don't know is if you should update all users in one big
> patch, or instead provide one patch per platform changed. Maybe
> Thierry can provide some guidance here.

I think it'd be good to split them up into per-architecture and
per-machine. Per-board would probably be too much. That'll allow the
respective maintainers to ack patches that touch their machines or
boards without having them go through all other hunks too.

Thierry



Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

2013-03-07 Thread WANG Chao

On 03/08/2013 02:36 PM, Yinghai Lu wrote:
> On Thu, Mar 7, 2013 at 10:32 PM, Yinghai Lu  wrote:
>> On Thu, Mar 7, 2013 at 10:03 PM, CAI Qian  wrote:
>>> CC'ing kexec ML. Also mentioned that 3.8 has no such issue.
>>>
>>> This message looks suspicious and out of range while 3.8 reservation
>>> looks within the range.
>>>
>>> [0.00] Reserving 128MB of memory at 5216MB for crashkernel
>>> (System RAM: 3977MB)
>>>
>>> Wondering if anything to do with memblock again...
>>
>> that is intended...
>>
>>> - Original Message -
>>>> From: "WANG Chao" 
>>>> To: "LKML" vger.kernel.org>
>>>> Cc: "CAI Qian" 
>>>> Sent: Friday, March 8, 2013 1:54:37 PM
>>>> Subject: 3.9-rc1: crash kernel panic - not syncing: Can not allocate 
>>>> SWIOTLB buffer earlier and can't now provide you
>>>> with the DMA bounce buffer
>>>>
>>>> Hi, All
>>>>
>>>> On 3.9-rc1, I load crash kernel with latest kexec-tools(up to
>>>> 28d413a), but
>>>> 2nd kernel panic at early time:
>>>> [2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB
>>>> buffer earlier and can't now provide you with the DMA bounce buffer
>>>> [2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1
>>
>> You need to add crashkernel_low=64M in first kernel.
>>
>> As your system does not support DMA remapping.
> 
> looks like your system DO have DMAR table, please enable dmar
> remapping in your kernel config.

I've already got following config:
CONFIG_DMAR_TABLE=y
CONFIG_INTEL_IOMMU=y
CONFIG_IRQ_REMAP=y

but I don't have intel_iommu=on in kernel cmdline. IIRC, iommu will prevent
2nd kernel from booting ...

I tested crashkernel=128M and crashkernel_low=64M, seems 2nd-kernel/kexec only
works when two params are used in combination.

Thanks,
WANG Chao

> 
> Yinghai
> 

Re: [PATCH 2/4][V2] time : set broadcast irq affinity

2013-03-07 Thread Thomas Gleixner

On Fri, 8 Mar 2013, Daniel Lezcano wrote:
> On 03/06/2013 10:48 AM, Thomas Gleixner wrote:
> I was wondering if it would be possible to take the 3/4 and 4/4
> otherwise the flag dependency will prevent to send those to the
> maintainer's tree until they gain visibility on it.

I can take them with the ack of arm soc folks.

Thanks,

tglx

Re: [PATCH 11/14] x86, acpi, numa: split SLIT handling out

2013-03-07 Thread Tejun Heo

On Thu, Mar 7, 2013 at 11:18 PM, Yinghai Lu  wrote:
> ia64 likes to call in this sequence:
> acpi_numa_init()
> parse srat
> parse slit
> then
> acpi_numa_arch_fixup()
>
> In this arch_fixup, it will try to fill a dummy distance_matrix.
>
> So I would like to keep acpi_numa_init ...

Can't it just call acpi_numa_init_srat() and then init_slit()?  What
am I missing?

-- 
tejun

Re: [PATCH 11/14] x86, acpi, numa: split SLIT handling out

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 10:46 PM, Tejun Heo  wrote:
> On Thu, Mar 07, 2013 at 08:58:37PM -0800, Yinghai Lu wrote:
>> +void __init acpi_numa_init_only_slit(void)
>> +{
>> + /* SLIT: System Locality Information Table */
>> + acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
>> +}
>> +
>> +static int __init __acpi_numa_init(bool with_slit)
>>  {
>>   int cnt = 0;
>
> Hmmm how about just having the following two functions
>
> acpi_numa_init_srat();
> acpi_numa_init_slit();

ok.


> and update both x86 and ia64 to use the two functions?

ia64 likes to call in this sequence:
acpi_numa_init()
parse srat
parse slit
then
acpi_numa_arch_fixup()

In this arch_fixup, it will try to fill a dummy distance_matrix.

So I would like to keep acpi_numa_init ...

Re: [PATCH RFC 0/2] kvm: Better yield_to candidate using preemption notifiers

2013-03-07 Thread Raghavendra K T


On 03/08/2013 12:40 AM, Marcelo Tosatti wrote:

On Mon, Mar 04, 2013 at 11:31:46PM +0530, Raghavendra K T wrote:

  This patch series further filters better vcpu candidate to yield to
in PLE handler. The main idea is to record the preempted vcpus using
preempt notifiers and iterate only those preempted vcpus in the
handler. Note that the vcpus which were in spinloop during pause loop
exit are already filtered.

Thanks Jiannan, Avi for bringing the idea and Gleb, PeterZ for
precious suggestions during the discussion.
Thanks Srikar for suggesting to avoid rcu lock while checking task state
that has improved overcommit cases.

There are basically two approaches for the implementation.

Method 1: Uses per vcpu preempt flag (this series).

Method 2: We keep a bitmap of preempted vcpus. using this we can easily
iterate over preempted vcpus.

Note that method 2 needs an extra index variable to identify/map bitmap to
vcpu and it also needs static vcpu allocation.

I am also posting Method 2 approach for reference in case it interests.
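
The two methods can be contrasted with a small user-space sketch. This is
not the code from the series: fake_vcpu, next_candidate_flag() and
next_candidate_bitmap() are hypothetical names, and the 32-entry limit is an
assumption made so that a single unsigned int can serve as the bitmap.

```c
#include <stdbool.h>

/* Hypothetical vcpu model for Method 1: one preempted flag per vcpu. */
struct fake_vcpu {
	int id;
	bool preempted;
};

/*
 * Method 1: walk the vcpus after 'start' (wrapping around, skipping
 * 'start' itself) and return the first one whose flag is set, or -1.
 */
static int next_candidate_flag(const struct fake_vcpu *vcpus, int n, int start)
{
	int i;

	for (i = 1; i < n; i++) {
		int idx = (start + i) % n;

		if (vcpus[idx].preempted)
			return idx;
	}
	return -1;
}

/*
 * Method 2: the same search over a bitmap, where bit idx set means
 * vcpu idx is preempted. This needs the extra index-to-vcpu mapping
 * mentioned above. Assumes n <= 32.
 */
static int next_candidate_bitmap(unsigned int preempted_mask, int n, int start)
{
	int i;

	for (i = 1; i < n; i++) {
		int idx = (start + i) % n;

		if (preempted_mask & (1u << idx))
			return idx;
	}
	return -1;
}
```

Both functions return the same candidate for the same state. The sketch tests
bits one at a time for clarity; a real bitmap implementation would normally
skip unset bits with a bit-search helper.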

Result: decent improvement for kernbench and ebizzy.

base = 3.8.0 + undercommit patches
patched = base + preempt patches

Tested on 32 core (no HT) mx3850 machine with 32 vcpu guest 8GB RAM

----+-----------+----------+-----------+----------+-----------+
          kernbench (exec time in sec, lower is better)
----+-----------+----------+-----------+----------+-----------+
          base      stdev     patched     stdev    %improve
----+-----------+----------+-----------+----------+-----------+
1x     47.0383     4.6977    44.2584     1.2899    5.90986
2x     96.0071     7.1873    91.2605     7.3567    4.94401
3x    164.0157    10.3613   156.6750    11.4267    4.47561
4x    212.5768    23.7326   204.4800    13.2908    3.80888
----+-----------+----------+-----------+----------+-----------+
no ple kernbench 1x result for reference: 46.056133

----+-----------+----------+-----------+----------+-----------+
          ebizzy (records/sec, higher is better)
----+-----------+----------+-----------+----------+-----------+
          base      stdev     patched     stdev    %improve
----+-----------+----------+-----------+----------+-----------+
1x   5609.2000    56.9343  6263.7000    64.7097   11.66833
2x   2071.9000   108.4829  2653.5000   181.8395   28.07085
3x   1557.4167   109.7141  1993.5000   166.3176   28.00043
4x   1254.7500    91.2997  1765.5000   237.5410   40.70532
----+-----------+----------+-----------+----------+-----------+
no ple ebizzy 1x result for reference : 7394.9 rec/sec
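
The %improve columns appear to be the relative change against base, with the
sign chosen so that a positive number always means "better". The sketch below
shows that arithmetic; the function names are hypothetical and the formula is
inferred from the posted numbers, not taken from the benchmark scripts.

```c
/* For metrics where lower is better (kernbench exec time). */
static double improve_lower_is_better(double base, double patched)
{
	return (base - patched) / base * 100.0;
}

/* For metrics where higher is better (ebizzy records/sec). */
static double improve_higher_is_better(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

For the 1x rows this gives roughly 5.91 for kernbench (47.0383 to 44.2584)
and roughly 11.67 for ebizzy (5609.2 to 6263.7), matching the tables.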

Please let me know if you have any suggestions and comments.

Raghavendra K T (2):
kvm: Record the preemption status of vcpus using preempt notifiers
kvm: Iterate over only vcpus that are preempted


Reviewed-by: Marcelo Tosatti 



Thank you Marcelo.


Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode

2013-03-07 Thread Andrew Morton

On Thu, 7 Mar 2013 22:57:21 -0800 Yinghai Lu  wrote:

> >
> >> @@ -552,38 +552,47 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
> >>   return sum;
> >>  }
> >>
> >> -/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
> >> -static const char * const table_sigs[] = {
> >> - ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
> >> - ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
> >> - ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
> >> - ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
> >> - ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
> >> - ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
> >> - ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
> >> - ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
> >> - ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
> >
> > Why is this table made a stack variable?  What's the benefit of doing
> > that?
> 
> so I do need to switch global variables to phys and access it.

What Tejun means is that it should be marked "static" within
acpi_initrd_override(), so we don't have to build a copy on the stack
at runtime each time acpi_initrd_override() is called.

While we're there, it should be __initdata also.
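
A minimal user-space sketch of the storage-class point: a function-local
"static const" table is built once at compile time, whereas a plain local
array of pointers has to be filled in on the stack on every call. The
function name is_known_sig() and the shortened signature list are
hypothetical; __initdata is kernel-specific and is omitted here. The sketch
only illustrates the static-versus-stack difference and says nothing about
the 32bit flat mode addressing concern raised elsewhere in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first four bytes of sig match a known signature. */
static int is_known_sig(const char *sig)
{
	/*
	 * static: one copy, initialized at build time. Without "static"
	 * the pointer array would be rebuilt on the stack for each call.
	 */
	static const char * const table_sigs[] = {
		"SRAT", "SLIT", "DSDT", NULL
	};
	size_t i;

	for (i = 0; table_sigs[i] != NULL; i++) {
		if (memcmp(sig, table_sigs[i], 4) == 0)
			return 1;
	}
	return 0;
}
```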

Re: [PATCH 10/14] x86, mm, numa: Move emulation handling down.

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 10:42 PM, Tejun Heo  wrote:
> On Thu, Mar 07, 2013 at 08:58:36PM -0800, Yinghai Lu wrote:
>> -static int __init numa_check_memblks(struct numa_meminfo *mi)
>> +
>> +int __init numa_check_memblks(struct numa_meminfo *mi)
>>  {
>> + nodemask_t tmp_node_map;
>>   unsigned long pfn_align;
>>
>>   /* Account for nodes with cpus and no memory */
>> - node_possible_map = numa_nodes_parsed;
>> - numa_nodemask_from_meminfo(&node_possible_map, mi);
>> - if (WARN_ON(nodes_empty(node_possible_map)))
>> + tmp_node_map = numa_nodes_parsed;
>> + numa_nodemask_from_meminfo(&tmp_node_map, mi);
>> + if (WARN_ON(nodes_empty(tmp_node_map)))
>>   return -EINVAL;
>>
>>   if (!numa_meminfo_cover_memory(mi))
>> @@ -562,6 +564,7 @@ static int __init numa_check_memblks(struct numa_meminfo 
>> *mi)
>>   return -EINVAL;
>>   }
>>
>> + node_possible_map = tmp_node_map;
>
> Hmmm it's kinda nasty to have a side effect like the above for a
> function named numa_check_memblks().  Maybe we can move this to the
> caller or name the function to make it clear that some global state is
> being updated?

OK, will split out the node_possible_map update.
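
One way such a split could look, as a user-space sketch only: the check
computes its result into an output parameter and touches no global state, and
the caller commits the result once the check has passed. All names here
(fake_nodemask_t, fake_check_memblks(), fake_numa_init()) are hypothetical,
and a plain bitmask stands in for nodemask_t.

```c
/* Hypothetical stand-ins: a bitmask instead of nodemask_t. */
typedef unsigned int fake_nodemask_t;

static fake_nodemask_t fake_node_possible_map;

/*
 * Pure check: combines the parsed nodes with the nodes found in the
 * meminfo and fails if the result is empty. No globals are modified.
 */
static int fake_check_memblks(fake_nodemask_t parsed,
			      fake_nodemask_t from_meminfo,
			      fake_nodemask_t *out)
{
	fake_nodemask_t tmp = parsed | from_meminfo;

	if (tmp == 0)
		return -1;

	*out = tmp;
	return 0;
}

/* The caller owns the side effect, so it is visible at the call site. */
static int fake_numa_init(fake_nodemask_t parsed, fake_nodemask_t from_meminfo)
{
	fake_nodemask_t map;
	int ret = fake_check_memblks(parsed, from_meminfo, &map);

	if (ret < 0)
		return ret;

	fake_node_possible_map = map;
	return 0;
}
```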

>
>> @@ -608,8 +611,6 @@ static int __init numa_init(int (*init_func)(void))
>>   if (ret < 0)
>>   return ret;
>>
>> - numa_emulation(&numa_meminfo, numa_distance_cnt);
>> -
>>   ret = numa_check_memblks(&numa_meminfo);
>>   if (ret < 0)
>>   return ret;
>> @@ -669,6 +670,8 @@ void __init x86_numa_init(void)
>>   numa_init(dummy_numa_init);
>>
>>  out:
>> + numa_emulation(&numa_meminfo, numa_distance_cnt);
>> +
>>   for (i = 0; i < mi->nr_blks; i++) {
>>   struct numa_memblk *mb = &mi->blk[i];
>>   memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
>> diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
>> index d47..5a0433d 100644
>> --- a/arch/x86/mm/numa_emulation.c
>> +++ b/arch/x86/mm/numa_emulation.c
>> @@ -348,7 +348,7 @@ void __init numa_emulation(struct numa_meminfo 
>> *numa_meminfo, int numa_dist_cnt)
>>   if (ret < 0)
>>   goto no_emu;
>>
>> - if (numa_cleanup_meminfo(&ei) < 0) {
>> + if (numa_cleanup_meminfo(&ei) < 0 || numa_check_memblks(&ei) < 0) {
>>   pr_warning("NUMA: Warning: constructed meminfo invalid, 
>> disabling emulation\n");
>>   goto no_emu;
>>   }
>
> Given that acpi is the only mechanism which matters in any modern NUMA
> machines, I think the re-ordering should be fine.

Good.

Re: [PATCH v8 0/12] Palmas Updates

2013-03-07 Thread Linus Walleij

On Thu, Mar 7, 2013 at 2:23 PM, Ian Lartey  wrote:

> This patchset adds to the support for the Palmas iseries of PMIC chips.
>
> Some of the patches have previously been submitted individually.
> The DT bindings doc has been added first due to comments that it was
> missing.

Can the patches to the individual subsystems be applied individually
(like can we apply the two GPIO patches to the GPIO tree) or are
the deps such that the whole shebang needs to go in at once and you're
just harvesting ACKs to take it all into MFD or similar?

Yours,
Linus Walleij

Re: [PATCH 09/14] x86, mm, numa: set memblock nid later

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 10:28 PM, Tejun Heo  wrote:
> On Thu, Mar 07, 2013 at 08:58:35PM -0800, Yinghai Lu wrote:
>> Only set memblock nid one time.
>
> Would be awesome if the description explains why we're doing this and
> why we're allowed to do this now.

will add more:

Setting the memblock nid can change the memblock layout, e.g. the array
could be doubled. We do not have the current memblock limit set at that
point, so the new array would be placed under 1M and could use too much
memory there.
For the fallback path, we also avoid having to restore memblock and re-merge.

Once we use the numa_meminfo nid for checking, we don't need to set it again.

Thanks

Yinghai

Re: RFC: Zynq Clock Controller

2013-03-07 Thread Lars-Peter Clausen

On 03/08/2013 12:25 AM, Sören Brinkmann wrote:
> On Thu, Mar 07, 2013 at 11:02:58PM +0100, Lars-Peter Clausen wrote:
>> On 03/07/2013 08:11 PM, Sören Brinkmann wrote:
>>> On Thu, Mar 07, 2013 at 10:36:35AM +0100, Lars-Peter Clausen wrote:
>>>> On 03/06/2013 06:27 PM, Sören Brinkmann wrote:
>>>>> Hi Jan,
>>>>>
>>>>> what a small world. Good to hear from you.
>>>>>
>>>>> On Wed, Mar 06, 2013 at 12:51:21PM +0100, Jan Lübbe wrote:
>>>>>> Hi Sören,
>>>>>>
>>>>>> On Tue, 2013-03-05 at 12:04 -0800, Sören Brinkmann wrote:
>>>>>>> For this reasons, I'd like to propose moving Zynq into the same
>>>>>>> direction. I.e. adding a clock controller with the following DT
>>>>>>> description (details may change but the general idea should become
>>>>>>> clear):
>>>>>>> clkc: clkc {
>>>>>>> 	#clock-cells = <1>;
>>>>>>> 	compatible = "xlnx,ps7-clkc";
>>>>>>> 	ps_clk_frequency = <>;  # board x-tal
>>>>>>> 	# optional props
>>>>>>> 	gem0_emio_clk_freq = <12500>;
>>>>>>> 	gem1_emio_clk_freq = <5000>;
>>>>>>> 	can_mio_clk_freq_xx = <1234>; # this is possible 54 times with xx = 00..53
>>>>>>> };
>>>>
>>>> I definitely prefer the way it is right now in upstream, where we have one
>>>> dt node per clock. It is more descriptive and also more extensible. And you
>>>> also don't have to remember the clock index, and can use the phandle
>>>> directly instead.
>>> The problems I see with that are:
>>>  - with the clock controller we can model the clock tree as we need without 
>>> changing the DT all the time
>>
>> When do we need to change the clock tree? The clock tree is pretty much fixed
>> in hardware. And the DT describes the hardware, so I don't so a problem 
>> there.
>>
>>>  - w/o a clock controller there is no block which has knowledge of the 
>>> whole clock tree (unless we parse the DT), and could conduct reparent 
>>> operations and similar
>>
>> The clk framework builds a representation of the clock tree. Each clock 
>> should
>> be able to handle re-parenting on it's own, without knowing about the other
>> clocks in the tree, the parent is selected by a field in the clocks register.
>> It doesn't even have to know who the parents are, the clk framework will take
>> care of all of this and just say, ok, switch your input clock to X, where X 
>> is
>> a simple integer number The clk framework will also take care of 
>> re-calculating
>> all the updates frequencies after re-parenting.
> Nope, clk_set_parent() takes two 'struct *clk' as argument. So, you have to 
> have those of the clock to change and all its parents. A device driver for 
> example, which gets a clock through clk_get() does not have that information 
> and should not, since it should not have to be aware of the SOC's clock 
> hierarchy, IMHO.
> 

Yes, the global clk_set_parent() function takes two clk structs. But the clock
framework takes care of all the magic of mapping that second clk struct to an
integer number, based on the clk's parent list. So the set_parent callback in
your clk_ops takes as parameters your clk and the index of the parent.

>>
>>>  - once the clock controller is properly defined the clock connection 
>>> should be contained in a dtsi which never changes. So, regarding 
>>> remembering outputs, I don't see a problem. I rather see issues with having 
>>> a pile of clocks described in the DT. I have currently 44 outputs proposed, 
>>> and to model the whole tree a lot of more clocks are used (I started 
>>> modeling everything with the clock primitives and removed custom clock 
>>> implementations, except for the PLLs). Having all those in the DT does not 
>>> really help, IMHO.
>>
>> Well, except if you want to use external clocks for ethernet or CAN. Also you
>> don't need to change the dtsi either if you have a separate node for each 
>> clock.
>>
>> To avoid misunderstandings let me clarify that I don't want to have one node
>> per clock, but rather one node per clock block (or whatever they are called).
>> The combination of input select + divider + output gate. E.g. the UART clock
>> block with it's 3 inputs and two outputs.
>>
>> Also the APER clocks should probably be one node with 24 outputs.
> Okay, so instead of having one block encapsulating all clocks, you want it on 
> a finer granularity. I don't see huge differences why that should be 
> advantageous? It would just mean to create several blocks with their custom 
> DT bindings instead of a single one. Just the abstraction level would be a 
> little different.

The DT is supposed to describe the hardware, not the software. These are the
basic blocks. They are independent of each other and mostly orthogonal. It's the
same IP block instantiated a couple of times next to each other (sometimes with
slightly different parameters). They just happen to have their registers mapped
in the same IO region. The SLCR is just a container which co

Re: BUG: IPv4: Attempt to release TCP socket in state 1

2013-03-07 Thread dormando

> On Wed, 2013-03-06 at 16:41 -0800, dormando wrote:
>
> > Ok... bridge module is loaded but nothing seems to be using it. No
> > bond/tunnels/anything enabled. I couldn't quickly figure out what was
> > causing it to load.
> >
> > We removed the need for macvlan, started machines with a fresh boot, and
> > they still crashed without it, after a few hours.
> >
> > Unfortunately I just saw a machine crash in the same way on 3.6.6 and
> > 3.6.9. I'm working on getting a completely pristine 3.6.6 and 3.6.9
> > tested. Our patches are minor but there were a few, so I'm backing it all
> > out just to be sure.
> >
> > Is there anything in particular which is most interesting? I can post lots
> > and lots and lots of information. Sadly bridge/macvlan weren't part of the
> > problem. .config, sysctls are easiest I guess? When this "hang" happens
> > the machine is still up somewhat, but we lose access to it. Syslog is
> > still writing entries to disk occasionally, so it's possible we could set
> > something up to dump more information.
> >
> > It takes a day or two to cycle this, so it might take a while to get
> > information and test crashes.
>
> Thanks !
>
> Please add a stack trace, it might help :
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 68f6a94..1d4d97e 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -141,8 +141,9 @@ void inet_sock_destruct(struct sock *sk)
>   sk_mem_reclaim(sk);
>
>   if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) {
> - pr_err("Attempt to release TCP socket in state %d %p\n",
> -sk->sk_state, sk);
> + pr_err("Attempt to release TCP socket family %d in state %d %p\n",
> +sk->sk_family, sk->sk_state, sk);
> + WARN_ON_ONCE(1);
>   return;
>   }
>   if (!sock_flag(sk, SOCK_DEAD)) {

Ok. I have a pristine 3.6.6 up and testing now... It definitely looks like
we've been having this crash for quite a while, but much more rarely.
Recent changes in traffic have made it worse. I'll try your patch soon.

It'll take a few days to reproduce. I'll be back (ho ho ho). Please ping
with any ideas you folks might have in the meantime :(

Re: [PATCH v2 1/3] pinctrl: exynos: add exynos5250 SoC specific data

2013-03-07 Thread Linus Walleij

On Thu, Mar 7, 2013 at 12:09 PM, Kukjin Kim  wrote:
> [Me]
>> So shall I merge these three patches to the pinctrl tree or not?
>>
> Hi Linus, this series is already in Samsung tree with your ack.

OK good, thanks!

Yours,
Linus Walleij

Re: [PATCH 03/14] x86, ACPI: store override acpi tables phys addr

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 10:49:04PM -0800, Yinghai Lu wrote:
> >> @@ -654,10 +654,13 @@ void __init acpi_initrd_override_copy(void)
> >>   arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
> >>
> >>   for (no = 0; no < table_nr; no++) {
> >> - size_t size = early_initrd_files[no].size;
> >> + unsigned long size = early_initrd_files[no].size;
> >>
> >>   p = early_ioremap(acpi_tables_addr + total_offset, size);
> >> - memcpy(p, early_initrd_files[no].data, size);
> >> + q = early_ioremap((unsigned long)early_initrd_files[no].data,
> >> +  size);
> >> + memcpy(p, q, size);
> >> + early_iounmap(q, size);
> >
> > Ah, okay, so the loop change in the previous patch was for this, I
> > suppose?  That chunk probably should either be a separate patch or
> > rolled into this one.
> 
> merge two patches?

Hmm... probably better to just move the related chunks from the
previous patch to this one with better explanation on what's going on.

Thanks.

-- 
tejun

Re: [PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 11:02:15PM -0800, Yinghai Lu wrote:
> > Also, does it really have to be called from head_32.S?  No way this
> > can be done after entering C code?  It would be great if you can
> > explain overall design choices in the head message (and important
> > patches).
> 
> have to be with head_32.S and it is with 32bit flat mode, so could access
> 4G blow without setting page table.
> 
> Will try add to more in the change log.

Yes, please.  In the comment too.

-- 
tejun

Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode

2013-03-07 Thread Tejun Heo

Hello, Yinghai.

On Thu, Mar 07, 2013 at 10:57:21PM -0800, Yinghai Lu wrote:
> On Thu, Mar 7, 2013 at 9:50 PM, Tejun Heo  wrote:
> > On Thu, Mar 07, 2013 at 08:58:30PM -0800, Yinghai Lu wrote:
> >> We will find acpi tables in initrd during head_32.S in 32bit flat mode.
> >>
> >> So need acpi_initrd_override_find could take phys directly.
> >
> > The patch description doesn't explain even half of what's going on.
> 
> hope HPA could understand.
> 
> Access initrd before relocate_initrd and init_memory mapping.

I really hope the changelogs were better.  Eh well...

> >> -/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
> >> -static const char * const table_sigs[] = {
> >> - ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
> >> - ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
> >> - ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
> >> - ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
> >> - ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
> >> - ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
> >> - ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
> >> - ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
> >> - ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
> >
> > Why is this table made a stack variable?  What's the benefit of doing
> > that?
> 
> so I do need to switch global variables to phys and access it.

I can't really understand what your response means.  Can you please
elaborate?

> > Is it really necessary to make the function take both virtual and
> > physical addresses?  Can't we just make the function take phys_addr_t
> > and update everyone to call with physaddr?  Also @is_phys isn't simple
> > address switch.  It also changes error reporting.  If you're gonna
> > keep @is_phys, let's at least write up a function comment explaining
> > what's going on and why we need it.  But, really, if at all possible,
> > let's change the function to take single type of argument and
> > predicate error message printing on something else (e.g. early printk
> > initialized or whatever).
> 
> yes, one for 32bit from head_32.S, phys.
> one for 64bit from head64.c. with _va.

head64.c can't call with phys?  Why not?

> Not sure if I could use early_printk from head_32.S, as Fenghua does
> not print out
> from microcode updating early in the same parts.

ISTR it works but it doesn't have to (although it would be much nicer
if it did).  You can test whether printk is online and skip if it
isn't online yet.
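
The suggestion above (one argument type for every caller, with error printing predicated on whether logging is available) can be sketched in userspace C. This is only an illustration under stated assumptions: log_online, report() and check_table() are hypothetical names, not kernel interfaces, and fprintf stands in for printk.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

#define FAKE_HEADER_SIZE 36	/* illustrative minimum table size */

static bool log_online;		/* hypothetical flag: set once the console works */
static int messages_printed;	/* instrumentation for the sketch only */

/* Print only when logging is available; silently skip otherwise. */
static void report(const char *fmt, ...)
{
	va_list ap;

	assert(fmt != NULL);
	if (!log_online)
		return;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	messages_printed++;
}

/* Every caller passes a physical address; no is_phys switch is needed. */
static int check_table(unsigned long long phys, unsigned long size)
{
	if (size < FAKE_HEADER_SIZE) {
		report("table at %#llx too small (%lu bytes)\n", phys, size);
		return -1;
	}
	return 0;
}
```

The check still fails early when logging is offline; only the message is skipped, so the address type and the reporting decision stay independent.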

Thanks.

-- 
tejun

[PATCH] mmc/host:rtsx: Don't execute power up sequence repeatedly

2013-03-07 Thread wei_wang

From: Wei WANG 

For some Realtek card readers, the power up sequence can only be executed
when power has been turned off fully.
So the rtsx host should not start the power up sequence again when set_ios
is called if the power is already on.

Signed-off-by: Wei WANG 
---
 drivers/mmc/host/rtsx_pci_sdmmc.c |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/mmc/host/rtsx_pci_sdmmc.c b/drivers/mmc/host/rtsx_pci_sdmmc.c
index f981f7d..ad13f42 100644
--- a/drivers/mmc/host/rtsx_pci_sdmmc.c
+++ b/drivers/mmc/host/rtsx_pci_sdmmc.c
@@ -57,6 +57,9 @@ struct realtek_pci_sdmmc {
bool eject;
bool initial_mode;
bool ddr_mode;
+   int power_state;
+#define SDMMC_POWER_ON 1
+#define SDMMC_POWER_OFF 0
 };
 
 static inline struct device *sdmmc_dev(struct realtek_pci_sdmmc *host)
@@ -765,6 +768,9 @@ static int sd_power_on(struct realtek_pci_sdmmc *host)
struct rtsx_pcr *pcr = host->pcr;
int err;
 
+   if (host->power_state == SDMMC_POWER_ON)
+   return 0;
+
rtsx_pci_init_cmd(pcr);
rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, CARD_SELECT, 0x07, SD_MOD_SEL);
rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, CARD_SHARE_MODE,
@@ -787,6 +793,7 @@ static int sd_power_on(struct realtek_pci_sdmmc *host)
if (err < 0)
return err;
 
+   host->power_state = SDMMC_POWER_ON;
return 0;
 }
 
@@ -795,6 +802,8 @@ static int sd_power_off(struct realtek_pci_sdmmc *host)
struct rtsx_pcr *pcr = host->pcr;
int err;
 
+   host->power_state = SDMMC_POWER_OFF;
+
rtsx_pci_init_cmd(pcr);
 
rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, CARD_CLK_EN, SD_CLK_EN, 0);
@@ -1260,6 +1269,7 @@ static int rtsx_pci_sdmmc_drv_probe(struct platform_device *pdev)
host->pcr = pcr;
host->mmc = mmc;
host->pdev = pdev;
+   host->power_state = SDMMC_POWER_OFF;
platform_set_drvdata(pdev, host);
pcr->slots[RTSX_SD_CARD].p_dev = pdev;
pcr->slots[RTSX_SD_CARD].card_event = rtsx_pci_sdmmc_card_event;
-- 
1.7.9.5
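
The guard this patch adds can be modelled outside the kernel. The following is a minimal sketch, assuming a stand-in struct (fake_host) and a stand-in for the register writes (run_power_up_sequence); neither exists in the driver, and only the power_state handling mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

enum sdmmc_power { SDMMC_POWER_OFF = 0, SDMMC_POWER_ON = 1 };

/* Hypothetical stand-in for struct realtek_pci_sdmmc. */
struct fake_host {
	enum sdmmc_power power_state;
	int power_up_runs;	/* counts how often the hardware sequence ran */
};

/* Stand-in for the register writes; returns 0 on success, negative on error. */
static int run_power_up_sequence(struct fake_host *host)
{
	host->power_up_runs++;
	return 0;
}

static int sd_power_on(struct fake_host *host)
{
	int err;

	assert(host != NULL);
	/* Already powered: repeating the sequence is what the patch avoids. */
	if (host->power_state == SDMMC_POWER_ON)
		return 0;

	err = run_power_up_sequence(host);
	if (err < 0)
		return err;	/* state stays OFF, so a retry runs the full sequence */

	host->power_state = SDMMC_POWER_ON;
	return 0;
}

static int sd_power_off(struct fake_host *host)
{
	assert(host != NULL);
	/* Mark OFF first, as the patch does, so the next power-on runs in full. */
	host->power_state = SDMMC_POWER_OFF;
	return 0;
}
```

Setting the state to ON only after the sequence succeeds means a failed power-up is not mistaken for a powered card.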


Re: [PATCH 08/14] x86, mm, numa: use numa_meminfo to check node_map_pfn alignment

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 10:26 PM, Tejun Heo  wrote:
> On Thu, Mar 07, 2013 at 08:58:34PM -0800, Yinghai Lu wrote:
>> We could use numa_meminfo directly instead of memblock nid.
>>
>> So we could move down set memblock nid down and only do it one time
>> for successful path
>>
>> Move node_map_pfn_alignment() to arch/x86/mm as no other user for it.
>
> Please don't move and update in the same patch.  It makes it difficult
> to review what's really changing.

ok, will make it two patches, one for moving and one for changing to
numa_meminfo.

Thanks

Yinghai

Re: [PATCH 06/14] x86, mm, numa: Move successful path handling code later

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 10:04 PM, Tejun Heo  wrote:
>>  static int __init numa_register_memblks(struct numa_meminfo *mi)
>
> After this patch, the above name is a bit misleading, I think.

later i changed it to numa_check_memblks()

Thanks

Yinghai

Re: [PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 9:57 PM, Tejun Heo  wrote:
> On Thu, Mar 07, 2013 at 08:58:31PM -0800, Yinghai Lu wrote:
>> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
>> index 73afd11..ca08f0e 100644
>> --- a/arch/x86/kernel/head_32.S
>> +++ b/arch/x86/kernel/head_32.S
>> @@ -149,6 +149,10 @@ ENTRY(startup_32)
>>   call load_ucode_bsp
>>  #endif
>>
>> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
>> + call x86_acpi_override_find
>> +#endif
>
> The function is always defined.  We can probably lose ifdef here?

just mimic microcode updating again.

>
> Also, does it really have to be called from head_32.S?  No way this
> can be done after entering C code?  It would be great if you can
> explain overall design choices in the head message (and important
> patches).

have to be with head_32.S and it is with 32bit flat mode, so could access
below 4G without setting page table.

Will try to add more in the change log.

>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 668e658..d43545a 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -424,6 +424,32 @@ static void __init reserve_initrd(void)
>>  }
>>  #endif /* CONFIG_BLK_DEV_INITRD */
>>
>> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
>> +void __init x86_acpi_override_find(void)
>> +{
>> + unsigned long ramdisk_image, ramdisk_size;
>> + unsigned char *p = NULL;
>> +
>> +#ifdef CONFIG_X86_32
>> + struct boot_params *boot_params_p;
>> +
>> + boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
>> + ramdisk_image = boot_params_p->hdr.ramdisk_image;
>> + ramdisk_size  = boot_params_p->hdr.ramdisk_size;
>> + p = (unsigned char *)ramdisk_image;
>> + acpi_initrd_override_find(p, ramdisk_size, true);
>> +#else
>> + ramdisk_image = get_ramdisk_image();
>> + ramdisk_size  = get_ramdisk_size();
>> + if (ramdisk_image)
>> + p = __va(ramdisk_image);
>> + acpi_initrd_override_find(p, ramdisk_size, false);
>> +#endif
>> +}
>> +#else
>> +void __init x86_acpi_override_find(void) { }
>
> And add a comment here why we're not doing static inline for the dummy
> function?

...

Re: [PATCH 14/14] x86, mm: Put pagetable on local node ram

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 08:58:40PM -0800, Yinghai Lu wrote:
> If node with ram is hotplugable, local node mem for page table and vmemmap
> should be on that node ram.
> 
> This patch is some kind of refreshment of
> | commit 1411e0ec3123ae4c4ead6bfc9fe3ee5a3ae5c327
> | Date:   Mon Dec 27 16:48:17 2010 -0800
> |
> |x86-64, numa: Put pgtable to local node memory
> That was reverted before.
> 
> We have reason to reintroduce it to make memory hotplug work.
> 
> Split calling of init_mem_mapping into early_initmem_info
> for nodes after we get numa info there.
> 
> First node will be low range.
> Need to rework alloc_low_pages to alloc page table in following order:
>   BRK, local node, low range
> 
> Still only load_cr3 one time, otherwise we would break xen 64bit again.

Hmmm... can you please split this patch further?  init_mem_mapping()
change can be separated, no?  Also, comments are disturbingly missing.
How are other people reading the code supposed to know what it's
trying to achieve why and how?  Hmmm... we're also likely to end up
with smaller mapping for misaligned NUMA configurations (I think my
test machine is like that).  Is it guaranteed that the top level ends
up in the first node?  It really needs documentation.

Thanks.

-- 
tejun

Re: [PATCH v3 1/2] Input: atmel_mxt_ts - Support for touchpad variant

2013-03-07 Thread Benson Leung

On Thu, Mar 7, 2013 at 7:43 PM, Benson Leung  wrote:
> +static void mxt_input_button(struct mxt_data *data, struct mxt_message *message)
> +{
> +   struct device *dev = &data->client->dev;
Oops. I missed a warning: unused variable 'dev' here.

> +   struct input_dev *input = data->input_dev;
> +   bool button;
> +   int i;
> +
> +   /* Active-low switch */
> +   for (i = 0; i < MXT_NUM_GPIO; i++) {
> +   if (data->pdata->key_map[i] == KEY_RESERVED)
> +   continue;
> +   button = !(message->message[0] & MXT_GPIO0_MASK << i);
> +   input_report_key(input, data->pdata->key_map[i], button);
> +   }
> +}
> +




--
Benson Leung
Software Engineer, Chrom* OS
ble...@chromium.org

Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 9:50 PM, Tejun Heo  wrote:
> On Thu, Mar 07, 2013 at 08:58:30PM -0800, Yinghai Lu wrote:
>> We will find acpi tables in initrd during head_32.S in 32bit flat mode.
>>
>> So need acpi_initrd_override_find could take phys directly.
>
> The patch description doesn't explain even half of what's going on.

hope HPA could understand.

Access initrd before relocate_initrd and init_memory mapping.

>
>> @@ -552,38 +552,47 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
>>   return sum;
>>  }
>>
>> -/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
>> -static const char * const table_sigs[] = {
>> - ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
>> - ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
>> - ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
>> - ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
>> - ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
>> - ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
>> - ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
>> - ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
>> - ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
>
> Why is this table made a stack variable?  What's the benefit of doing
> that?

so I do need to switch global variables to phys and access it.

>
>>  /* Non-fatal errors: Affected tables/files are ignored */
>>  #define INVALID_TABLE(x, path, name) \
>> - { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
>> + do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
>
> Might as well rename the macro to something which indicates it's just
> printing error message.  Urgh... who thought embedding control flow
> directive like continue inside a macro was a good idea? :(

so I removed it.
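
The difference between the old and new macro forms can be shown with a small userspace sketch. The names REPORT_INVALID, fake_file and count_valid are made up for illustration, the message text is only an example, and fprintf stands in for pr_err. The point is that the macro only prints, while the skip is written out at the call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Print-only helper: no hidden control flow, safe after an unbraced if. */
#define REPORT_INVALID(reason, name) \
	do { fprintf(stderr, "ACPI OVERRIDE: " reason " [%s]\n", name); } while (0)

#define FAKE_HEADER_SIZE 36	/* illustrative minimum table size */

struct fake_file {
	const char *name;
	size_t size;
};

/* Returns how many files pass the check; skipping is visible in the loop. */
static int count_valid(const struct fake_file *files, size_t n)
{
	int valid = 0;
	size_t i;

	assert(files != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (files[i].size < FAKE_HEADER_SIZE) {
			REPORT_INVALID("Table smaller than ACPI header", files[i].name);
			continue;	/* explicit, not buried in the macro */
		}
		valid++;
	}
	return valid;
}
```

With the continue inside the macro body, the same macro could not be used outside a loop, and a reader of the loop would not see why iterations were skipped.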

>
>> -void __init acpi_initrd_override_find(void *data, size_t size)
>> +void __init acpi_initrd_override_find(void *data, size_t size, bool is_phys)
>
> Is it really necessary to make the function take both virtual and
> physical addresses?  Can't we just make the function take phys_addr_t
> and update everyone to call with physaddr?  Also @is_phys isn't simple
> address switch.  It also changes error reporting.  If you're gonna
> keep @is_phys, let's at least write up a function comment explaining
> what's going on and why we need it.  But, really, if at all possible,
> let's change the function to take single type of argument and
> predicate error message printing on something else (e.g. early printk
> initialized or whatever).

yes, one for 32bit from head_32.S, phys.
one for 64bit from head64.c. with _va.

Not sure if I could use early_printk from head_32.S, as Fenghua does
not print out
from microcode updating early in the same parts.

Will check that.

>
>> @@ -654,11 +677,14 @@ void __init acpi_initrd_override_copy(void)
>>   arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>>
>>   for (no = 0; no < table_nr; no++) {
>> + unsigned long phys_addr = (unsigned long)early_initrd_files[no].data;
>
> Can we please use phys_addr_t for physical addresses?

ok.

>
>>   unsigned long size = early_initrd_files[no].size;
>>
>> + q = early_ioremap(phys_addr, size);
>> + pr_info("%4.4s ACPI table found in initrd [%#010lx-%#010lx]\n",
>> + ((struct acpi_table_header *)q)->signature,
>> + phys_addr, phys_addr + size - 1);
>
> Maybe putting pr_info after ioremapping both p and q would be easier
> on the eyes?

ok.

>
>>   p = early_ioremap(acpi_tables_addr + total_offset, size);
>> - q = early_ioremap((unsigned long)early_initrd_files[no].data,
>> -  size);
>>   memcpy(p, q, size);
>>   early_iounmap(q, size);
>>   early_iounmap(p, size);

Thanks

Yinghai

RE: [PATCH 2/2] PCI: fix system hang issue of Marvell SATA host controller

2013-03-07 Thread Xiangliang Yu

Hi, Bjorn

> >> > Fix system hang issue: if first accessed resource file of BAR0 ~
> >> > BAR4, system will hang after executing lspci command
> >>
> >> This needs more explanation.  We've already read the BARs by the time
> >> header quirks are run, so apparently it's not just the mere act of
> >> accessing a BAR that causes a hang.
> >>
> >> We need to know exactly what's going on here.  For example, do BARs
> >> 0-4 exist?  Does the device decode accesses to the regions described
> >> by the BARs?  The PCI core has to know what resources the device uses,
> >> so if the device decodes accesses, we can't just throw away the
> >> start/end information.
> > BARs 0-4 exist and the PCI device has IO space enabled, but if a user
> > accesses the region files with the udevadm info command, the system will hang.
> > Like this: udevadm info --attribute-walk --path=/sys/device/pci-device/000:*.
> > Because the device is just an AHCI host controller, it doesn't need the
> > BAR0 ~ 4 region files.
> > Is my explanation ok for the patch?
> 
> No, I still don't know what causes the hang; I only know that udevadm
> can trigger it.  I don't want to just paper over the problem until we
> know what the root cause is.
> 
> Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s
> BASE_ADDRESS_0"?  "setpci -H1 -s BASE_ADDRESS_0"?
Those commands are OK because they can't find the device after accessing the
IO port.
The root cause is that accessing the IO port makes the chip go bad. So the
point of the patch is to not export the IO access capability.

> 
> >>
> >> > ---
> >> >  drivers/pci/quirks.c |   15 +++
> >> >  1 files changed, 15 insertions(+), 0 deletions(-)
> >> >
> >> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> >> > index 0369fb6..d49f8dc 100644
> >> > --- a/drivers/pci/quirks.c
> >> > +++ b/drivers/pci/quirks.c
> >> > @@ -44,6 +44,21 @@ static void quirk_mmio_always_on(struct pci_dev *dev)
> >> >  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_ANY_ID, PCI_ANY_ID,
> >> > PCI_CLASS_BRIDGE_HOST, 8,
> >> quirk_mmio_always_on);
> >> >
> >> > +/* The BAR0 ~ BAR4 of Marvell 9125 device can't be accessed
> >> > +*  by IO resource file, and need to skip the files
> >> > +*/
> >> > +static void quirk_marvell_mask_bar(struct pci_dev *dev)
> >> > +{
> >> > +   int i;
> >> > +
> >> > +   for (i = 0; i < 5; i++)
> >> > +   if (dev->resource[i].start)
> >> > +   dev->resource[i].start =
> >> > +   dev->resource[i].end = 0;
> >> > +}
> >> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9125,
> >> > +   quirk_marvell_mask_bar);
> >> > +
> >> >  /* The Mellanox Tavor device gives false positive parity errors
> >> >   * Mark this device with a broken_parity_status, to allow
> >> >   * PCI scanning code to "skip" this now blacklisted device.
> >> > --
> >> > 1.7.5.4
> >> >

Re: [PATCH 03/14] x86, ACPI: store override acpi tables phys addr

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 9:36 PM, Tejun Heo  wrote:
> On Thu, Mar 07, 2013 at 08:58:29PM -0800, Yinghai Lu wrote:
>> As later 32bit only find table with phys address during 32bit flat mode
>> in head_32.S.
>>
>> To keep 32bit and 64 bit consistent, use phys_addr for all.
>>
>> Use early_ioremap to access during copying.
>>
>> Signed-off-by: Yinghai Lu 
>> Cc: Thomas Renninger 
>> Cc: Rafael J. Wysocki 
>> Cc: linux-a...@vger.kernel.org
>> ---
>> @@ -654,10 +654,13 @@ void __init acpi_initrd_override_copy(void)
>>   arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>>
>>   for (no = 0; no < table_nr; no++) {
>> - size_t size = early_initrd_files[no].size;
>> + unsigned long size = early_initrd_files[no].size;
>>
>>   p = early_ioremap(acpi_tables_addr + total_offset, size);
>> - memcpy(p, early_initrd_files[no].data, size);
>> + q = early_ioremap((unsigned long)early_initrd_files[no].data,
>> +  size);
>> + memcpy(p, q, size);
>> + early_iounmap(q, size);
>
> Ah, okay, so the loop change in the previous patch was for this, I
> suppose?  That chunk probably should either be a separate patch or
> rolled into this one.

merge two patches?

Re: [PATCH] delay blacklist symbol lookup until we actually need it

2013-03-07 Thread Masami Hiramatsu

(2013/03/07 21:45), oskar.and...@sonymobile.com wrote:
> From: Toby Collett 
> 
> The symbol lookup can take a long time and kprobes is
> initialised very early in boot, so delay symbol lookup
> until the blacklist is first used.

I like this idea, but the implementation may have to be changed
if we split blacklist into common/arch as the previous thread.

I'd like to see the update series of patches including that changes.
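
The idea being reviewed is initialise-on-first-use. A minimal userspace sketch follows, assuming made-up symbol names, a fake lookup_symbol() in place of the real symbol lookup, and no concurrency (the sketch has no locking, so it says nothing about how the kernel code should serialise callers).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct blackpoint {
	const char *name;
	unsigned long start_addr;
	unsigned long range;
};

/* Hypothetical entries; the real list lives in kernel/kprobes.c. */
static struct blackpoint blacklist[] = {
	{ "func_a", 0, 0 },
	{ "longer_func", 0, 0 },
	{ NULL, 0, 0 }		/* terminator */
};

static int blacklist_initialized;
static int init_runs;		/* instrumentation for the sketch only */

/* Stand-in for the slow symbol lookup: a fake address derived from the name. */
static unsigned long lookup_symbol(const char *name)
{
	return 0x1000ul * (unsigned long)strlen(name);
}

static void init_blacklist(void)
{
	struct blackpoint *kb;

	for (kb = blacklist; kb->name != NULL; kb++) {
		kb->start_addr = lookup_symbol(kb->name);
		kb->range = 0x100;	/* made-up function size */
	}
	blacklist_initialized = 1;
	init_runs++;
}

/* Returns 1 if addr falls inside a blacklisted function, 0 otherwise. */
static int in_blacklist(unsigned long addr)
{
	const struct blackpoint *kb;

	if (!blacklist_initialized)
		init_blacklist();	/* pay the lookup cost on first use only */
	assert(blacklist_initialized);

	for (kb = blacklist; kb->name != NULL; kb++) {
		if (addr >= kb->start_addr && addr < kb->start_addr + kb->range)
			return 1;
	}
	return 0;
}
```

Every entry point that reads the list has to go through the same check, which is why the patch adds it in both in_kprobes_functions() and register_kretprobe().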

Thank you,

> 
> Reviewed-by: Radovan Lekanovic 
> Signed-off-by: Toby Collett 
> Signed-off-by: Oskar Andero 
> ---
>  kernel/kprobes.c |   91 +++--
>  1 files changed, 53 insertions(+), 38 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index e35be53..71a6bee 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -68,6 +68,7 @@
>  #endif
>  
>  static int kprobes_initialized;
> +static int kprobe_blacklist_initialized;
>  static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
>  static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
>  
> @@ -102,6 +103,53 @@ static struct kprobe_blackpoint kprobe_blacklist[] = {
>   {NULL}/* Terminator */
>  };
>  
> +/* it can take some time ( > 100ms ) to initialise the
> + * blacklist so we delay this until we actually need it
> + */
> +static void init_kprobe_blacklist(void)
> +{
> + int i;
> + unsigned long offset = 0, size = 0;
> + char *modname, namebuf[128];
> + const char *symbol_name;
> + void *addr;
> + struct kprobe_blackpoint *kb;
> +
> + /*
> +  * Lookup and populate the kprobe_blacklist.
> +  *
> +  * Unlike the kretprobe blacklist, we'll need to determine
> +  * the range of addresses that belong to the said functions,
> +  * since a kprobe need not necessarily be at the beginning
> +  * of a function.
> +  */
> + for (kb = kprobe_blacklist; kb->name != NULL; kb++) {
> + kprobe_lookup_name(kb->name, addr);
> + if (!addr)
> + continue;
> +
> + kb->start_addr = (unsigned long)addr;
> + symbol_name = kallsyms_lookup(kb->start_addr,
> + &size, &offset, &modname, namebuf);
> + if (!symbol_name)
> + kb->range = 0;
> + else
> + kb->range = size;
> + }
> +
> + if (kretprobe_blacklist_size) {
> + /* lookup the function address from its name */
> + for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
> + kprobe_lookup_name(kretprobe_blacklist[i].name,
> +kretprobe_blacklist[i].addr);
> + if (!kretprobe_blacklist[i].addr)
> + printk("kretprobe: lookup failed: %s\n",
> +kretprobe_blacklist[i].name);
> + }
> + }
> + kprobe_blacklist_initialized = 1;
> +}
> +
>  #ifdef __ARCH_WANT_KPROBES_INSN_SLOT
>  /*
>   * kprobe->ainsn.insn points to the copy of the instruction to be
> @@ -1331,6 +1379,9 @@ static int __kprobes in_kprobes_functions(unsigned long addr)
>   if (addr >= (unsigned long)__kprobes_text_start &&
>   addr < (unsigned long)__kprobes_text_end)
>   return -EINVAL;
> +
> + if (!kprobe_blacklist_initialized)
> + init_kprobe_blacklist();
>   /*
>* If there exists a kprobe_blacklist, verify and
>* fail any probe registration in the prohibited area
> @@ -1816,6 +1867,8 @@ int __kprobes register_kretprobe(struct kretprobe *rp)
>   void *addr;
>  
>   if (kretprobe_blacklist_size) {
> + if (!kprobe_blacklist_initialized)
> + init_kprobe_blacklist();
>   addr = kprobe_addr(&rp->kp);
>   if (IS_ERR(addr))
>   return PTR_ERR(addr);
> @@ -2065,11 +2118,6 @@ static struct notifier_block kprobe_module_nb = {
>  static int __init init_kprobes(void)
>  {
>   int i, err = 0;
> - unsigned long offset = 0, size = 0;
> - char *modname, namebuf[128];
> - const char *symbol_name;
> - void *addr;
> - struct kprobe_blackpoint *kb;
>  
>   /* FIXME allocate the probe table, currently defined statically */
>   /* initialize all list heads */
> @@ -2079,39 +2127,6 @@ static int __init init_kprobes(void)
>   raw_spin_lock_init(&(kretprobe_table_locks[i].lock));
>   }
>  
> - /*
> -  * Lookup and populate the kprobe_blacklist.
> -  *
> -  * Unlike the kretprobe blacklist, we'll need to determine
> -  * the range of addresses that belong to the said functions,
> -  * since a kprobe need not necessarily be at the beginning
> -  * of a function.
> -  */
> - for (kb = kprobe_blacklist; kb->name != NULL; kb++) {
> - kprobe_lookup_name(kb->name, addr);
> - if (!addr)
> - continue;
> -
> -

Re: [PATCH 02/14] x86, ACPI: Split find/copy from acpi_initrd_override

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 9:33 PM, Tejun Heo  wrote:
> On Thu, Mar 07, 2013 at 08:58:28PM -0800, Yinghai Lu wrote:
>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>> index c9e36d7..b9d2ff0 100644
>> --- a/drivers/acpi/osl.c
>> +++ b/drivers/acpi/osl.c
>> @@ -539,6 +539,7 @@ acpi_os_predefined_override(const struct acpi_predefined_names *init_val,
>>
>>  static u64 acpi_tables_addr;
>>  static int all_tables_size;
>> +static int table_nr;
>
> Not particularly good choice of name for static variable visible to
> multiple functions.  all_tables_size isn't a stellar choice either but
> no need to continue the tradition.  Maybe acpi_nr_initrd_files?  Also,
> why is this one defined here away from the actual table?

ok, acpi_nr_initrd_files.

will check if it could be killed.

>> -/* Must not increase 10 or needs code modification below */
>> -#define ACPI_OVERRIDE_TABLES 10
>> +#define ACPI_OVERRIDE_TABLES 64
>
> What's up with the silent bumping of table size?

will mention that in change log.

>
>> +static struct cpio_data __initdata early_initrd_files[ACPI_OVERRIDE_TABLES];
>
> acpi_initrd_files[]?  Do we really need the "early" designation
> together with initrd?

just move it out from acpi_initrd_override.

>
>> @@ -647,14 +653,14 @@ void __init acpi_initrd_override(void *data, size_t size)
>>   memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
>>   arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>>
>> - p = early_ioremap(acpi_tables_addr, all_tables_size);
>> -
>>   for (no = 0; no < table_nr; no++) {
>> - memcpy(p + total_offset, early_initrd_files[no].data,
>> -early_initrd_files[no].size);
>> - total_offset += early_initrd_files[no].size;
>> + size_t size = early_initrd_files[no].size;
>> +
>> + p = early_ioremap(acpi_tables_addr + total_offset, size);
>> + memcpy(p, early_initrd_files[no].data, size);
>> + early_iounmap(p, size);
>> + total_offset += size;
>>   }
>> - early_iounmap(p, all_tables_size);
>
> Why is this necessary?  Why no explanation in the description?

actually it is the reason for bumping table_nr to 64.

early_ioremap can only map 256k at a time, so mapping everything at once
puts a limit on the overall size.

If we map the files one by one, we can raise the limit on the number of tables.
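
The argument above can be sketched in userspace: if a single mapping is capped, copying through one mapping per file bounds each file rather than the total. In this sketch map_window() is a hypothetical stand-in for early_ioremap(), and the 256k cap is simply the figure quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WINDOW_LIMIT (256 * 1024)	/* per-mapping cap, figure from the thread */

/* Hypothetical stand-in for early_ioremap(): refuses oversized requests. */
static void *map_window(unsigned char *phys, size_t size)
{
	return size <= WINDOW_LIMIT ? phys : NULL;
}

struct fake_file {
	const unsigned char *data;
	size_t size;
};

/*
 * Copy every file into dest back to back, mapping only one file's worth of
 * the destination at a time. Returns bytes copied, or -1 if a single file
 * exceeds the window.
 */
static long copy_files(unsigned char *dest, const struct fake_file *files, int nr)
{
	size_t total_offset = 0;
	int no;

	assert(dest != NULL && files != NULL);
	for (no = 0; no < nr; no++) {
		size_t size = files[no].size;
		unsigned char *p = map_window(dest + total_offset, size);

		if (!p)
			return -1;
		memcpy(p, files[no].data, size);
		/* the kernel code would unmap the window here */
		total_offset += size;
	}
	return (long)total_offset;
}
```

Two 200k files cannot be copied through one 400k mapping under this cap, but they can be copied one window at a time.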

>
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -79,14 +79,6 @@ typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);
>>  typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
>> const unsigned long end);
>>
>> -#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
>> -void acpi_initrd_override(void *data, size_t size);
>> -#else
>> -static inline void acpi_initrd_override(void *data, size_t size)
>> -{
>> -}
>> -#endif
>> -
>>  char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
>>  void __acpi_unmap_table(char *map, unsigned long size);
>>  int early_acpi_boot_init(void);
>> @@ -485,6 +477,14 @@ static inline bool acpi_driver_match_device(struct device *dev,
>>
>>  #endif   /* !CONFIG_ACPI */
>>
>> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
>> +void acpi_initrd_override_find(void *data, size_t size);
>> +void acpi_initrd_override_copy(void);
>> +#else
>> +static inline void acpi_initrd_override_find(void *data, size_t size) { }
>> +static inline void acpi_initrd_override_copy(void) { }
>> +#endif
>
> I don't get this part either.  Why is it necessary to move the
> prototypes to avoid #ifdefs in setup.c?  Ah, okay, you're bringing it
> outside CONFIG_ACPI so that they're defined regardless of that config
> option.  Can you please add why you're moving the prototype in the
> description?  Having "what" is nice but "why" is much nicer. :)

I think i have that in the log.

more detail is: ACPI_INITRD_TABLE_OVERRIDE depends
on ACPI and BLK_DEV_INITRD.

So could move it out safely.

Thanks

Yinghai

Re: [PATCH 11/14] x86, acpi, numa: split SLIT handling out

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 08:58:37PM -0800, Yinghai Lu wrote:
> +void __init acpi_numa_init_only_slit(void)
> +{
> + /* SLIT: System Locality Information Table */
> + acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
> +}
> +
> +static int __init __acpi_numa_init(bool with_slit)
>  {
>   int cnt = 0;

Hmmm how about just having the following two functions

acpi_numa_init_srat();
acpi_numa_init_slit();

and update both x86 and ia64 to use the two functions?
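
A rough sketch of the shape suggested above, with one entry point per table instead of a bool parameter. All names here are hypothetical stand-ins (parse_srat, parse_slit, arch_numa_init); the real functions would call acpi_table_parse().

```c
#include <assert.h>

static int srat_parsed, slit_parsed;	/* instrumentation for the sketch only */

/* Hypothetical parsers standing in for the table parsing calls. */
static int parse_srat(void) { srat_parsed++; return 0; }
static int parse_slit(void) { slit_parsed++; return 0; }

/* Each entry point does one thing; no flag to decode at the call site. */
static int numa_init_srat(void) { return parse_srat(); }
static void numa_init_slit(void) { parse_slit(); }

/* An architecture that wants both calls both, in the order it needs. */
static int arch_numa_init(void)
{
	int ret = numa_init_srat();

	if (ret < 0)
		return ret;
	numa_init_slit();
	return 0;
}
```

A caller that needs only one table calls only that function, and the call site documents itself without a true/false argument.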

-- 
tejun

Re: [PATCH] sched: wakeup buddy

2013-03-07 Thread Mike Galbraith

On Fri, 2013-03-08 at 10:37 +0800, Michael Wang wrote: 
> On 03/07/2013 05:43 PM, Mike Galbraith wrote:
> > On Thu, 2013-03-07 at 09:36 +0100, Peter Zijlstra wrote: 
> >> On Wed, 2013-03-06 at 15:06 +0800, Michael Wang wrote:
> >>
> >>> wake_affine() stuff is trying to bind related tasks closely, but it
> >>> doesn't work well according to the test on 'perf bench sched pipe'
> >>> (thanks to Peter).
> >>
> >> so sched-pipe is a poor benchmark for this.. 
> >>
> >> Ideally we'd write a new benchmark that has some actual data footprint
> >> and we'd measure the cost of tasks being apart on the various cache
> >> metrics and see what affine wakeup does for it.
> >>
> >> Before doing something like what you're proposing, I'd have a hard look
> >> at WF_SYNC, it is possible we should disable/fix select_idle_sibling
> >> for sync wakeups.
> > 
> > If nobody beats me to it, I'm going to try tracking shortest round trip
> > to idle, and use a multiple of that to shut select_idle_sibling() down.
> > If avg_idle approaches round trip time, there's no win to be had, we're
> > just wasting cycles.
> 
> That's great if we have it, I'm a little doubt whether it is possible to
> find a better way to replace the select_idle_sibling() (look at the way
> it locates idle cpu...) in some cases, but I'm looking forward it ;-)

I'm not going to replace it, only stop it from wasting cycles when
there's very likely nothing to gain.  Save task wakeup time, if delta
rq->clock - p->last_wakeup < N*shortest_idle or some such very cheap
metric.  Wake ultra switchers L2 affine if allowed, only go hunting for
an idle L3 if the thing is on another package.  

In general, I think things would work better if we'd just rate limit how
frequently we can wakeup migrate each individual task.  We want
jabbering tasks to share L3, but we don't really want to trash L2 at an
awesome rate.
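
The per-task rate limit mentioned above could look roughly like this. It is a userspace sketch only: fake_task, may_wakeup_migrate() and the 1 ms interval are assumptions for illustration, not scheduler code or a proposed tunable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task bookkeeping; not a field of task_struct. */
struct fake_task {
	uint64_t last_migrate_ns;	/* time of the last wakeup migration */
};

#define MIGRATE_INTERVAL_NS 1000000ull	/* 1 ms, an arbitrary value for the sketch */

/*
 * Allow a wakeup migration only if enough time has passed since the last
 * one for this task, and record the time when it is allowed.
 */
static bool may_wakeup_migrate(struct fake_task *p, uint64_t now_ns)
{
	assert(p != NULL);
	if (now_ns - p->last_migrate_ns < MIGRATE_INTERVAL_NS)
		return false;	/* too soon: stay put and keep the cache warm */
	p->last_migrate_ns = now_ns;
	return true;
}
```

The check is a single subtraction and compare per wakeup, which fits the "very cheap metric" requirement stated above.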

-Mike


Re: kswapd craziness round 2

2013-03-07 Thread Hillf Danton

On Fri, Mar 8, 2013 at 3:37 AM, Jiri Slaby  wrote:
> On 03/01/2013 03:02 PM, Hillf Danton wrote:
>> On Fri, Mar 1, 2013 at 1:02 AM, Jiri Slaby  wrote:
>>>
>>> Ok, no difference, kswap is still crazy. I'm attaching the output of
>>> "grep -vw '0' /proc/vmstat" if you see something there.
>>>
>> Thanks to you for test and data.
>>
>> Lets try to restore the deleted nap, then.
>
> Oh, it seems to be nice now:
> root       579  0.0  0.0      0     0 ?        S    Mar04   0:13 [kswapd0]
>
Double thanks.

But Mel does not like it, probably.
Lets try nap in another way.

Hillf

--- a/mm/vmscan.c   Thu Feb 21 20:01:02 2013
+++ b/mm/vmscan.c   Fri Mar  8 14:36:10 2013
@@ -2793,6 +2793,10 @@ loop_again:
 * speculatively avoid congestion waits
 */
zone_clear_flag(zone, ZONE_CONGESTED);
+
+   else if (sc.priority > 2 &&
+sc.priority < DEF_PRIORITY - 2)
+   wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
}

/*
--

>>
>> --- a/mm/vmscan.c Thu Feb 21 20:01:02 2013
>> +++ b/mm/vmscan.c Fri Mar  1 21:55:40 2013
>> @@ -2817,6 +2817,10 @@ loop_again:
>>*/
>>   if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX)
>>   break;
>> +
>> + if (sc.priority < DEF_PRIORITY - 2)
>> + congestion_wait(BLK_RW_ASYNC, HZ/10);
>> +
>>   } while (--sc.priority >= 0);
>>
>>  out:
>> --
>>
>
>
> --
> js
> suse labs

Re: [PATCH 10/14] x86, mm, numa: Move emulation handling down.

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 08:58:36PM -0800, Yinghai Lu wrote:
> -static int __init numa_check_memblks(struct numa_meminfo *mi)
> +
> +int __init numa_check_memblks(struct numa_meminfo *mi)
>  {
> + nodemask_t tmp_node_map;
>   unsigned long pfn_align;
>  
>   /* Account for nodes with cpus and no memory */
> - node_possible_map = numa_nodes_parsed;
> - numa_nodemask_from_meminfo(&node_possible_map, mi);
> - if (WARN_ON(nodes_empty(node_possible_map)))
> + tmp_node_map = numa_nodes_parsed;
> + numa_nodemask_from_meminfo(&tmp_node_map, mi);
> + if (WARN_ON(nodes_empty(tmp_node_map)))
>   return -EINVAL;
>  
>   if (!numa_meminfo_cover_memory(mi))
> @@ -562,6 +564,7 @@ static int __init numa_check_memblks(struct numa_meminfo 
> *mi)
>   return -EINVAL;
>   }
>  
> + node_possible_map = tmp_node_map;

Hmmm it's kinda nasty to have a side effect like the above for a
function named numa_check_memblks().  Maybe we can move this to the
caller or name the function to make it clear that some global state is
being updated?
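
A minimal sketch of the "move this to the caller" option, assuming the check can hand the computed mask back through an out parameter. All `_sketch` names are hypothetical stand-ins and the node mask is reduced to a plain bitmask; this is not the actual x86 NUMA code.

```c
#include <errno.h>

/* Stand-in for nodemask_t: one bit per node. Hypothetical, for illustration. */
typedef unsigned long nodemask_sketch_t;

/* Stand-in for the global node_possible_map. */
static nodemask_sketch_t node_possible_map_sketch;

/*
 * Pure check: validates the candidate mask and reports it through *out.
 * It does not touch any global state.
 */
static int numa_check_memblks_sketch(nodemask_sketch_t parsed,
				     nodemask_sketch_t from_meminfo,
				     nodemask_sketch_t *out)
{
	nodemask_sketch_t tmp = parsed | from_meminfo;

	if (tmp == 0)
		return -EINVAL; /* no nodes at all */

	*out = tmp;
	return 0;
}

/* The caller commits the result, so the global update is visible at the call site. */
static int numa_init_sketch(nodemask_sketch_t parsed,
			    nodemask_sketch_t from_meminfo)
{
	nodemask_sketch_t tmp;
	int ret = numa_check_memblks_sketch(parsed, from_meminfo, &tmp);

	if (ret < 0)
		return ret;

	node_possible_map_sketch = tmp;
	return 0;
}
```

With this split the emulation path could call the check on its own meminfo without changing the global map as a side effect.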

> @@ -608,8 +611,6 @@ static int __init numa_init(int (*init_func)(void))
>   if (ret < 0)
>   return ret;
>  
> - numa_emulation(&numa_meminfo, numa_distance_cnt);
> -
>   ret = numa_check_memblks(&numa_meminfo);
>   if (ret < 0)
>   return ret;
> @@ -669,6 +670,8 @@ void __init x86_numa_init(void)
>   numa_init(dummy_numa_init);
>  
>  out:
> + numa_emulation(&numa_meminfo, numa_distance_cnt);
> +
>   for (i = 0; i < mi->nr_blks; i++) {
>   struct numa_memblk *mb = &mi->blk[i];
>   memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
> diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
> index d47..5a0433d 100644
> --- a/arch/x86/mm/numa_emulation.c
> +++ b/arch/x86/mm/numa_emulation.c
> @@ -348,7 +348,7 @@ void __init numa_emulation(struct numa_meminfo 
> *numa_meminfo, int numa_dist_cnt)
>   if (ret < 0)
>   goto no_emu;
>  
> - if (numa_cleanup_meminfo(&ei) < 0) {
> + if (numa_cleanup_meminfo(&ei) < 0 || numa_check_memblks(&ei) < 0) {
>   pr_warning("NUMA: Warning: constructed meminfo invalid, 
> disabling emulation\n");
>   goto no_emu;
>   }

Given that ACPI is the only mechanism which matters on any modern NUMA
machine, I think the re-ordering should be fine.

Thanks.

-- 
tejun

Re: [PATCH] ARM: proc: Add Krait proc info

2013-03-07 Thread Will Deacon

On Wed, Mar 06, 2013 at 05:20:32AM +, Stephen Boyd wrote:
> On 03/05/13 14:03, Stephen Boyd wrote:
> > On 03/05/13 00:34, Will Deacon wrote:
> >> I was looking at this the other day and wondered whether we could set
> >> HWCAP_IDIV in __v7_setup, depending on ID_ISAR0[27:24]. I can't immediately
> >> think why that would be difficult, but similarly there may well be a reason
> >> why we assign it like this.
> >>
> >> Fancy taking a look?
> > Ok I'll take a look. 
> 
> Hmm. I wonder if we did it this way because between version B and C of
> DDI0406 the definition of those bits changed.
> 
> In DDI0406B we have
> 
> 0 - no support
> 1 - support
> 
> and in DDI0406C we have
> 
> 0 - no support
> 1 - support in Thumb
> 2 - support in Thumb and ARM

Well spotted, although I think this is a documentation error. I've checked both
A7 and A15 and they both advertise '2' (although the r0p0 TRM for A7 also gets
this wrong, the CPU does the right thing).
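
As a rough illustration of the decoding discussed above, here is a sketch that maps ID_ISAR0[27:24] to capability bits using the DDI0406C encoding quoted earlier. The `CAP_IDIV_*` bits and the function names are hypothetical and are not the kernel's HWCAP definitions; treating values above 2 as "no support" is an assumption of the sketch.

```c
#include <stdint.h>

/* Hypothetical capability bits for this sketch, not the kernel's HWCAP values. */
#define CAP_IDIV_THUMB (1u << 0)
#define CAP_IDIV_ARM   (1u << 1)

/* Extract ID_ISAR0[27:24], the divide-instructions field. */
static unsigned int isar0_divide_field(uint32_t id_isar0)
{
	return (id_isar0 >> 24) & 0xf;
}

/*
 * Map the field to capability bits using the DDI0406C encoding:
 * 0 = no support, 1 = Thumb only, 2 = Thumb and ARM.
 * Values above 2 are treated as unknown and yield no capabilities.
 */
static unsigned int idiv_caps(uint32_t id_isar0)
{
	switch (isar0_divide_field(id_isar0)) {
	case 1:
		return CAP_IDIV_THUMB;
	case 2:
		return CAP_IDIV_THUMB | CAP_IDIV_ARM;
	default:
		return 0;
	}
}
```

A core that follows the DDI0406B reading would report 1 for full support, which this mapping would interpret as Thumb only; that is the ambiguity the question about the Qualcomm CPUs is trying to settle.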

What about the Qualcomm CPUs?

Will

Re: [PATCH v3 -next 0/5] Add support for LZ4-compressed kernel

2013-03-07 Thread Kyungsik Lee

Hello,

On Tue, Mar 05, 2013 at 03:06:16PM -0800, Andrew Morton wrote:
> On Tue,  5 Mar 2013 20:47:31 +0900 Kyungsik Lee  wrote:
> 
> > This is the third version. In this version, Some codes are fixed
> > and more description and note are added. I would like to thank David Sterba
> > for his review.
> > 
> > The Last patch[5/5] of the patch set is for making x86 and arm default to
> > LZ4-compressed for testing the LZ4 code in the linux-next.
> > It was requested by Andrew Morton in the patch set v2.
> > 
> > Currently, A preliminary version of LZ4 de/compression tool is supported.
> > However, It is expected that we will have a tool with more features
> > once its format is finished.
> 
> What happened to the changelog?  The earlier version at least had some
> rudimentary benchmarking results, but now we don't even have that. 
> 
> Someone should prepare the information explaining why we added this to
> Linux, and I'd prefer that person be you rather than me!  Certainly it
> should include performance measurements - both speed and space.  Also
> it should capture our thinking regarding all the other decompressors,
> explaining why we view it as acceptable to add yet another one.
> 
> Please, put yourself in the position of someone reading these commits
> in 2017 wondering "why did they merge this".  We should tell them.

Sorry for the inconvenience regarding the changelog. Since another patch
set (v4) is not required, here is the information you asked for.
I'm not sure that I have properly captured what we discussed regarding all
the other decompressors.


Benchmark Results (PATCH v3)
Compiler: Linaro ARM gcc 4.6.2

1. ARMv7, 1.5GHz based board
   Kernel: linux 3.4
   Uncompressed Kernel Size: 14MB
        Compressed Size  Decompression Speed
   LZO  6.7MB            20.1MB/s, 25.2MB/s(UA)
   LZ4  7.3MB            29.1MB/s, 45.6MB/s(UA)

2. ARMv7, 1.7GHz based board
   Kernel: linux 3.7
   Uncompressed Kernel Size: 14MB
        Compressed Size  Decompression Speed
   LZO  6.0MB            34.1MB/s, 52.2MB/s(UA)
   LZ4  6.5MB            86.7MB/s
- UA: Unaligned memory Access support
- Latest patch set for LZO applied


This patch set is for adding support for LZ4-compressed Kernel.
LZ4 is a very fast lossless compression algorithm and it also features
an extremely fast decoder [1].

But we already have five decompressors, and the question that arises is:
where do we stop adding new ones? This issue has been discussed and a
conclusion was reached [2].
Russell King said that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (e.g. conventional gzip)
If we have a replacement for one of these, then it should do exactly that:
replace it.

The benchmark shows an 8% increase in image size against a 66% increase
in decompression speed compared to LZO (which has been known as the fastest
decompressor in the kernel). Therefore the "fast but may not be small"
compression title has clearly been taken by LZ4 [3].
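
The two percentages can be reproduced from the benchmark table with a few lines of C. This is a sketch: `struct bench_pair` and `pct_increase()` are hypothetical names, and pairing LZ4's 86.7MB/s with LZO's 52.2MB/s (UA) figure on board 2 is an inference from the arithmetic, not something the table states.

```c
/* One LZO/LZ4 pair taken from the benchmark table. */
struct bench_pair {
	double lzo;
	double lz4;
};

/* Board 2 (ARMv7, 1.7GHz): compressed size in MB. */
static const struct bench_pair board2_size = { 6.0, 6.5 };

/* Board 2 decompression speed in MB/s; the LZO value is the UA figure (assumed pairing). */
static const struct bench_pair board2_speed = { 52.2, 86.7 };

/* Relative increase of LZ4 over LZO, in percent. */
static double pct_increase(struct bench_pair p)
{
	return (p.lz4 - p.lzo) / p.lzo * 100.0;
}
```

With the board 2 numbers this gives roughly 8.3% for size and 66.1% for speed. The board 1 numbers give roughly 9.0% for size and 44.8% (81.0% with UA) for speed.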

[1] http://code.google.com/p/lz4/
[2] http://thread.gmane.org/gmane.linux.kbuild.devel/9157
[3] http://thread.gmane.org/gmane.linux.kbuild.devel/9347


Thanks,
Kyungsik

Re: [PATCH] efivarfs: fix abnormal GUID in variable name by using strcpy to replace null with dash

2013-03-07 Thread joeyli

On Thu, 2013-03-07 at 11:39 +, Matt Fleming wrote:
> > This patch works on a normal UEFI machine, we will test it on HP z220.
> > I will send out it formally after test success.
> 
> Has anyone tried contacting HP to tell them their firmware is doing
> bizarre things?

We will try to contact the HP workstation department.
Another good opportunity is the UEFI Plugfest in Taipei the week after
next; I will discuss this issue with the HP engineers there.


Thanks a lot!
Joey Lee




Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 10:32 PM, Yinghai Lu  wrote:
> On Thu, Mar 7, 2013 at 10:03 PM, CAI Qian  wrote:
>> CC'ing kexec ML. Also mentioned that 3.8 has no such issue.
>>
>> This message looks suspicious and out of range while 3.8 reservation
>> looks within the range.
>>
>> [0.00] Reserving 128MB of memory at 5216MB for crashkernel
>> (System RAM: 3977MB)
>>
>> Wondering if anything to do with memblock again...
>
> that is intended...
>
>> - Original Message -
>>> From: "WANG Chao" 
>>> To: "LKML" vger.kernel.org>
>>> Cc: "CAI Qian" 
>>> Sent: Friday, March 8, 2013 1:54:37 PM
>>> Subject: 3.9-rc1: crash kernel panic - not syncing: Can not allocate 
>>> SWIOTLB buffer earlier and can't now provide you
>>> with the DMA bounce buffer
>>>
>>> Hi, All
>>>
>>> On 3.9-rc1, I load crash kernel with latest kexec-tools(up to
>>> 28d413a), but
>>> 2nd kernel panic at early time:
>>> [2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB
>>> buffer earlier and can't now provide you with the DMA bounce buffer
>>> [2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1
>
> You need to add crashkernel_low=64M in first kernel.
>
> As your system does not support DMA remapping.

Looks like your system does have a DMAR table; please enable DMAR
(DMA remapping) in your kernel config.

Yinghai

Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 10:03 PM, CAI Qian  wrote:
> CC'ing kexec ML. Also mentioned that 3.8 has no such issue.
>
> This message looks suspicious and out of range while 3.8 reservation
> looks within the range.
>
> [0.00] Reserving 128MB of memory at 5216MB for crashkernel
> (System RAM: 3977MB)
>
> Wondering if anything to do with memblock again...

that is intended...

> - Original Message -
>> From: "WANG Chao" 
>> To: "LKML" vger.kernel.org>
>> Cc: "CAI Qian" 
>> Sent: Friday, March 8, 2013 1:54:37 PM
>> Subject: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB 
>> buffer earlier and can't now provide you
>> with the DMA bounce buffer
>>
>> Hi, All
>>
>> On 3.9-rc1, I load crash kernel with latest kexec-tools(up to
>> 28d413a), but
>> 2nd kernel panic at early time:
>> [2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB
>> buffer earlier and can't now provide you with the DMA bounce buffer
>> [2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1

You need to add crashkernel_low=64M to the first kernel's command line,
as your system does not support DMA remapping.

Thanks

Yinghai

Re: [PATCH 09/14] x86, mm, numa: set memblock nid later

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 08:58:35PM -0800, Yinghai Lu wrote:
> Only set memblock nid one time.

Would be awesome if the description explains why we're doing this and
why we're allowed to do this now.

> Also rename numa_register_memblks to numa_check_memblks()
> after move set memblock nid out.

Ah... so, it's getting renamed here.

-- 
tejun

Re: [PATCH 08/14] x86, mm, numa: use numa_meminfo to check node_map_pfn alignment

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 08:58:34PM -0800, Yinghai Lu wrote:
> We could use numa_meminfo directly instead of memblock nid.
> 
> So we could move down set memblock nid down and only do it one time
> for successful path
> 
> Move node_map_pfn_alignment() to arch/x86/mm as no other user for it.

Please don't move and update in the same patch.  It makes it difficult
to review what's really changing.

Thanks.

-- 
tejun

[PATCH] spi: slink-tegra20: move runtime pm calls to transfer_one_message

2013-03-07 Thread Laxman Dewangan

prepare_transfer_hardware() is called in atomic context, and making
synchronous runtime PM calls there can cause a scheduling deadlock.

Therefore, instead of making the runtime PM calls from the prepare/unprepare
transfer hardware callbacks, make them in transfer_one_message().

Signed-off-by: Laxman Dewangan 
---
 drivers/spi/spi-tegra20-slink.c |   25 -
 1 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/drivers/spi/spi-tegra20-slink.c b/drivers/spi/spi-tegra20-slink.c
index b8698b3..a829563 100644
--- a/drivers/spi/spi-tegra20-slink.c
+++ b/drivers/spi/spi-tegra20-slink.c
@@ -858,21 +858,6 @@ static int tegra_slink_setup(struct spi_device *spi)
return 0;
 }
 
-static int tegra_slink_prepare_transfer(struct spi_master *master)
-{
-   struct tegra_slink_data *tspi = spi_master_get_devdata(master);
-
-   return pm_runtime_get_sync(tspi->dev);
-}
-
-static int tegra_slink_unprepare_transfer(struct spi_master *master)
-{
-   struct tegra_slink_data *tspi = spi_master_get_devdata(master);
-
-   pm_runtime_put(tspi->dev);
-   return 0;
-}
-
 static int tegra_slink_transfer_one_message(struct spi_master *master,
struct spi_message *msg)
 {
@@ -885,6 +870,12 @@ static int tegra_slink_transfer_one_message(struct 
spi_master *master,
 
msg->status = 0;
msg->actual_length = 0;
+   ret = pm_runtime_get_sync(tspi->dev);
+   if (ret < 0) {
+   dev_err(tspi->dev, "runtime get failed: %d\n", ret);
+   goto done;
+   }
+
single_xfer = list_is_singular(&msg->transfers);
list_for_each_entry(xfer, &msg->transfers, transfer_list) {
INIT_COMPLETION(tspi->xfer_completion);
@@ -921,6 +912,8 @@ static int tegra_slink_transfer_one_message(struct 
spi_master *master,
 exit:
tegra_slink_writel(tspi, tspi->def_command_reg, SLINK_COMMAND);
tegra_slink_writel(tspi, tspi->def_command2_reg, SLINK_COMMAND2);
+   pm_runtime_put(tspi->dev);
+done:
msg->status = ret;
spi_finalize_current_message(master);
return ret;
@@ -1148,9 +1141,7 @@ static int tegra_slink_probe(struct platform_device *pdev)
/* the spi->mode bits understood by this driver: */
master->mode_bits = SPI_CPOL | SPI_CPHA | SPI_CS_HIGH;
master->setup = tegra_slink_setup;
-   master->prepare_transfer_hardware = tegra_slink_prepare_transfer;
master->transfer_one_message = tegra_slink_transfer_one_message;
-   master->unprepare_transfer_hardware = tegra_slink_unprepare_transfer;
master->num_chipselect = MAX_CHIP_SELECT;
master->bus_num = -1;
 
-- 
1.7.1.1


[PATCH v2 7/7] kmod: remove call_usermodehelper_fns()

2013-03-07 Thread Lucas De Marchi

With this function the caller cannot determine whether the cleanup was
called when it returns -ENOMEM. Nobody is using it anymore, so let's
remove it.

Signed-off-by: Lucas De Marchi 
---
 include/linux/kmod.h | 11 +--
 kernel/kmod.c| 31 +--
 2 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index 7eebcf5..0555cc6 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -67,9 +67,7 @@ struct subprocess_info {
 };
 
 extern int
-call_usermodehelper_fns(char *path, char **argv, char **envp, int wait,
-   int (*init)(struct subprocess_info *info, struct cred 
*new),
-   void (*cleanup)(struct subprocess_info *), void *data);
+call_usermodehelper(char *path, char **argv, char **envp, int wait);
 
 extern struct subprocess_info *
 call_usermodehelper_setup(char *path, char **argv, char **envp, gfp_t gfp_mask,
@@ -79,13 +77,6 @@ call_usermodehelper_setup(char *path, char **argv, char 
**envp, gfp_t gfp_mask,
 extern int
 call_usermodehelper_exec(struct subprocess_info *info, int wait);
 
-static inline int
-call_usermodehelper(char *path, char **argv, char **envp, int wait)
-{
-   return call_usermodehelper_fns(path, argv, envp, wait,
-  NULL, NULL, NULL);
-}
-
 extern struct ctl_table usermodehelper_table[];
 
 enum umh_disable_depth {
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 2fd6222..eebb63c 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -557,8 +557,8 @@ struct subprocess_info *call_usermodehelper_setup(char 
*path, char **argv,
  * call_usermodehelper_exec - start a usermode application
  * @sub_info: information about the subprocessa
  * @wait: wait for the application to finish and return status.
- *when -1 don't wait at all, but you get no useful error back when
- *the program couldn't be exec'ed. This makes it safe to call
+ *when UMH_NO_WAIT don't wait at all, but you get no useful error back
+ *when the program couldn't be exec'ed. This makes it safe to call
  *from interrupt context.
  *
  * Runs a user-space application.  The application is started
@@ -618,29 +618,32 @@ unlock:
 }
 EXPORT_SYMBOL(call_usermodehelper_exec);
 
-/*
- * call_usermodehelper_fns() will not run the caller-provided cleanup function
- * if a memory allocation failure is experienced.  So the caller might need to
- * check the call_usermodehelper_fns() return value: if it is -ENOMEM, perform
- * the necessaary cleanup within the caller.
+/**
+ * call_usermodehelper() - prepare and start a usermode application
+ * @path: path to usermode executable
+ * @argv: arg vector for process
+ * @envp: environment for process
+ * @wait: wait for the application to finish and return status.
+ *when UMH_NO_WAIT don't wait at all, but you get no useful error back
+ *when the program couldn't be exec'ed. This makes it safe to call
+ *from interrupt context.
+ *
+ * This function is the equivalent to use call_usermodehelper_setup() and
+ * call_usermodehelper_exec().
  */
-int call_usermodehelper_fns(
-   char *path, char **argv, char **envp, int wait,
-   int (*init)(struct subprocess_info *info, struct cred *new),
-   void (*cleanup)(struct subprocess_info *), void *data)
+int call_usermodehelper(char *path, char **argv, char **envp, int wait)
 {
struct subprocess_info *info;
gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
 
info = call_usermodehelper_setup(path, argv, envp, gfp_mask,
-init, cleanup, data);
-
+NULL, NULL, NULL);
if (info == NULL)
return -ENOMEM;
 
return call_usermodehelper_exec(info, wait);
 }
-EXPORT_SYMBOL(call_usermodehelper_fns);
+EXPORT_SYMBOL(call_usermodehelper);
 
 static int proc_cap_handler(struct ctl_table *table, int write,
 void __user *buffer, size_t *lenp, loff_t *ppos)
-- 
1.8.1.5


Re: [PATCH 1/1 v5] pwm_bl: Add support for backlight enable regulator

2013-03-07 Thread Alex Courbot


On 03/08/2013 06:16 AM, Andrew Chew wrote:

The backlight enable regulator is specified in the device tree node for
backlight.

Signed-off-by: Andrew Chew 
---
Renamed en_supply to enable_supply to match the corresponding device tree
property.

Removed unnecessary check for pb->enable_supply validity.  This supply
is mandatory, and probe will fail if it is not provided.

Renamed the tracking bool from en_supply_enabled to enabled so that it's
more generic.  Encapsulated pwm_enable() and pwm_disable() calls into the
enabled check so that we never unnecessarily turn on or off the pwm if
it's already been turned on or off.

  .../bindings/video/backlight/pwm-backlight.txt |   14 +
  drivers/video/backlight/pwm_bl.c   |   56 
  2 files changed, 60 insertions(+), 10 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt 
b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
index 1e4fc72..7e2e089 100644
--- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
+++ b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
@@ -10,6 +10,11 @@ Required properties:
last value in the array represents a 100% duty cycle (brightest).
- default-brightness-level: the default brightness level (index into the
array defined by the "brightness-levels" property)
+  - enable-supply: A phandle to the regulator device tree node. This
+  regulator will be turned on and off as the pwm is enabled and disabled.
+  Many backlights are enabled via a GPIO. In this case, we instantiate
+  a fixed regulator and give that to enable-supply. If a regulator
+  is not needed, then provide a dummy fixed regulator.

  Optional properties:
- pwm-names: a list of names for the PWM devices specified in the
@@ -19,10 +24,19 @@ Optional properties:

  Example:

+   bl_en: fixed-regulator {
+compatible = "regulator-fixed";
+regulator-name = "bl-en-supply";
+regulator-boot-on;
+gpio = <&some_gpio>;
+enable-active-high;
+   };
+
backlight {
compatible = "pwm-backlight";
pwms = <&pwm 0 500>;

brightness-levels = <0 4 8 16 32 64 128 255>;
default-brightness-level = <6>;
+   enable-supply = <&bl_en>;
};
diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c
index 069983c..c517d4a 100644
--- a/drivers/video/backlight/pwm_bl.c
+++ b/drivers/video/backlight/pwm_bl.c
@@ -20,10 +20,13 @@
  #include 
  #include 
  #include 
+#include 

  struct pwm_bl_data {
struct pwm_device   *pwm;
struct device   *dev;
+   boolenabled;
+   struct regulator*enable_supply;
unsigned intperiod;
unsigned intlth_brightness;
unsigned int*levels;
@@ -35,6 +38,38 @@ struct pwm_bl_data {
void(*exit)(struct device *);
  };

+static void pwm_backlight_enable(struct backlight_device *bl)
+{
+   struct pwm_bl_data *pb = dev_get_drvdata(&bl->dev);
+
+   /* Bail if we are already enabled. */
+   if (pb->enabled)
+   return;
+
+   pwm_enable(pb->pwm);
+
+   if (regulator_enable(pb->enable_supply) != 0)
+   dev_warn(&bl->dev, "Failed to enable regulator");
+
+   pb->enabled = true;
+}
+
+static void pwm_backlight_disable(struct backlight_device *bl)
+{
+   struct pwm_bl_data *pb = dev_get_drvdata(&bl->dev);
+
+   /* Bail if we are already disabled. */
+   if (!pb->enabled)
+   return;
+
+   if (regulator_disable(pb->enable_supply) != 0)
+   dev_warn(&bl->dev, "Failed to disable regulator");
+
+   pwm_disable(pb->pwm);
+
+   pb->enabled = false;
+}
+
  static int pwm_backlight_update_status(struct backlight_device *bl)
  {
struct pwm_bl_data *pb = dev_get_drvdata(&bl->dev);
@@ -52,7 +87,7 @@ static int pwm_backlight_update_status(struct 
backlight_device *bl)

if (brightness == 0) {
pwm_config(pb->pwm, 0, pb->period);
-   pwm_disable(pb->pwm);
+   pwm_backlight_disable(bl);
} else {
int duty_cycle;

@@ -66,7 +101,7 @@ static int pwm_backlight_update_status(struct 
backlight_device *bl)
duty_cycle = pb->lth_brightness +
 (duty_cycle * (pb->period - pb->lth_brightness) / max);
pwm_config(pb->pwm, duty_cycle, pb->period);
-   pwm_enable(pb->pwm);
+   pwm_backlight_enable(bl);
}

if (pb->notify_after)
@@ -145,12 +180,6 @@ static int pwm_backlight_parse_dt(struct device *dev,
data->max_brightness--;
}

-   /*
-* TODO: Most users of this driver use a number of GPIOs to control
-

Re: [PATCH] delay blacklist symbol lookup until we actually need it

2013-03-07 Thread Ananth N Mavinakayanahalli

On Thu, Mar 07, 2013 at 01:45:18PM +0100, oskar.and...@sonymobile.com wrote:
> From: Toby Collett 
> 
> The symbol lookup can take a long time and kprobes is
> initialised very early in boot, so delay symbol lookup
> until the blacklist is first used.
> 
> Reviewed-by: Radovan Lekanovic 
> Signed-off-by: Toby Collett 
> Signed-off-by: Oskar Andero 

Sounds reasonable.

Acked-by: Ananth N Mavinakayanahalli 


[PATCH v2 3/7] kmod: split call to call_usermodehelper_fns()

2013-03-07 Thread Lucas De Marchi

Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
calling call_usermodehelper_fns(). In case the latter returns -ENOMEM
the cleanup function may not have been called - in this case we would
not free argv and module_name.

Signed-off-by: Lucas De Marchi 
---
 kernel/kmod.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index b39f240..2fd6222 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -77,6 +77,8 @@ static void free_modprobe_argv(struct subprocess_info *info)
 
 static int call_modprobe(char *module_name, int wait)
 {
+   struct subprocess_info *info;
+   gfp_t gfp_mask;
static char *envp[] = {
"HOME=/",
"TERM=linux",
@@ -98,8 +100,17 @@ static int call_modprobe(char *module_name, int wait)
argv[3] = module_name;  /* check free_modprobe_argv() */
argv[4] = NULL;
 
-   return call_usermodehelper_fns(modprobe_path, argv, envp,
-   wait | UMH_KILLABLE, NULL, free_modprobe_argv, NULL);
+   gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
+   info = call_usermodehelper_setup(modprobe_path, argv, envp,
+gfp_mask, NULL, free_modprobe_argv,
+NULL);
+   if (!info)
+   goto free_module_name;
+
+   return call_usermodehelper_exec(info, wait | UMH_KILLABLE);
+
+free_module_name:
+   kfree(module_name);
 free_argv:
kfree(argv);
 out:
-- 
1.8.1.5


[PATCH v2 1/7] kernel/sys.c: Use the simpler call_usermodehelper()

2013-03-07 Thread Lucas De Marchi

Commit "7ff6764 usermodehelper: cleanup/fix __orderly_poweroff() &&
argv_free()" simplified __orderly_poweroff() removing the need to use
call_usermodehelper_fns().

Since we are not passing any callback, it's simpler to use
call_usermodehelper().

Signed-off-by: Lucas De Marchi 
---
 kernel/sys.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 81f5644..bd15276 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2203,8 +2203,7 @@ static int __orderly_poweroff(void)
return -ENOMEM;
}
 
-   ret = call_usermodehelper_fns(argv[0], argv, envp, UMH_WAIT_EXEC,
- NULL, NULL, NULL);
+   ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
argv_free(argv);
 
return ret;
-- 
1.8.1.5


[PATCH v2 2/7] usermodehelper: Export _exec() and _setup() functions

2013-03-07 Thread Lucas De Marchi

call_usermodehelper_setup() + call_usermodehelper_exec() need to be
called instead of call_usermodehelper_fns() when the cleanup function
needs to be called even when an ENOMEM error occurs. With
call_usermodehelper_fns() the caller can't tell whether the cleanup
function was called or not.

Signed-off-by: Lucas De Marchi 
---
 include/linux/kmod.h |  8 
 kernel/kmod.c| 56 +---
 2 files changed, 31 insertions(+), 33 deletions(-)

diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index 5398d58..7eebcf5 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -71,6 +71,14 @@ call_usermodehelper_fns(char *path, char **argv, char 
**envp, int wait,
int (*init)(struct subprocess_info *info, struct cred 
*new),
void (*cleanup)(struct subprocess_info *), void *data);
 
+extern struct subprocess_info *
+call_usermodehelper_setup(char *path, char **argv, char **envp, gfp_t gfp_mask,
+ int (*init)(struct subprocess_info *info, struct cred 
*new),
+ void (*cleanup)(struct subprocess_info *), void 
*data);
+
+extern int
+call_usermodehelper_exec(struct subprocess_info *info, int wait);
+
 static inline int
 call_usermodehelper(char *path, char **argv, char **envp, int wait)
 {
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 56dd349..b39f240 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -502,14 +502,28 @@ static void helper_unlock(void)
  * @argv: arg vector for process
  * @envp: environment for process
  * @gfp_mask: gfp mask for memory allocation
+ * @cleanup: a cleanup function
+ * @init: an init function
+ * @data: arbitrary context sensitive data
  *
  * Returns either %NULL on allocation failure, or a subprocess_info
  * structure.  This should be passed to call_usermodehelper_exec to
  * exec the process and free the structure.
+ *
+ * The init function is used to customize the helper process prior to
+ * exec.  A non-zero return code causes the process to error out, exit,
+ * and return the failure to the calling process
+ *
+ * The cleanup function is just before ethe subprocess_info is about to
+ * be freed.  This can be used for freeing the argv and envp.  The
+ * Function must be runnable in either a process context or the
+ * context in which call_usermodehelper_exec is called.
  */
-static
 struct subprocess_info *call_usermodehelper_setup(char *path, char **argv,
- char **envp, gfp_t gfp_mask)
+   char **envp, gfp_t gfp_mask,
+   int (*init)(struct subprocess_info *info, struct cred *new),
+   void (*cleanup)(struct subprocess_info *info),
+   void *data)
 {
struct subprocess_info *sub_info;
sub_info = kzalloc(sizeof(struct subprocess_info), gfp_mask);
@@ -520,38 +534,15 @@ struct subprocess_info *call_usermodehelper_setup(char 
*path, char **argv,
sub_info->path = path;
sub_info->argv = argv;
sub_info->envp = envp;
+
+   sub_info->cleanup = cleanup;
+   sub_info->init = init;
+   sub_info->data = data;
   out:
return sub_info;
 }
 
 /**
- * call_usermodehelper_setfns - set a cleanup/init function
- * @info: a subprocess_info returned by call_usermodehelper_setup
- * @cleanup: a cleanup function
- * @init: an init function
- * @data: arbitrary context sensitive data
- *
- * The init function is used to customize the helper process prior to
- * exec.  A non-zero return code causes the process to error out, exit,
- * and return the failure to the calling process
- *
- * The cleanup function is just before ethe subprocess_info is about to
- * be freed.  This can be used for freeing the argv and envp.  The
- * Function must be runnable in either a process context or the
- * context in which call_usermodehelper_exec is called.
- */
-static
-void call_usermodehelper_setfns(struct subprocess_info *info,
-   int (*init)(struct subprocess_info *info, struct cred *new),
-   void (*cleanup)(struct subprocess_info *info),
-   void *data)
-{
-   info->cleanup = cleanup;
-   info->init = init;
-   info->data = data;
-}
-
-/**
  * call_usermodehelper_exec - start a usermode application
  * @sub_info: information about the subprocessa
  * @wait: wait for the application to finish and return status.
@@ -563,7 +554,6 @@ void call_usermodehelper_setfns(struct subprocess_info 
*info,
  * asynchronously if wait is not set, and runs as a child of keventd.
  * (ie. it runs with full root capabilities).
  */
-static
 int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
 {
DECLARE_COMPLETION_ONSTACK(done);
@@ -615,6 +605,7 @@ unlock:
helper_unlock();
return retval;
 }
+EXPORT_SYMBOL(call_usermodehelper_exec);
 
 /*
  * call_usermodehelper_fns() will not run the caller-provided cleanu

[PATCH v2 0/7] kmod/usermodehelper changes

2013-03-07 Thread Lucas De Marchi

Changes from v1:
  - Remove call_usermodehelper_fns(), converting all call sites to either
    call_usermodehelper() or call_usermodehelper_setup() +
    call_usermodehelper_exec()
  - Don't check the return code in call_usermodehelper_freeinfo(): now that
    allocating the subprocess_info is separate from executing it, it's safe
    to always call the cleanup.

Lucas De Marchi (7):
  kernel/sys.c: Use the simpler call_usermodehelper()
  usermodehelper: Export _exec() and _setup() functions
  kmod: split call to call_usermodehelper_fns()
  KEYS: split call to call_usermodehelper_fns()
  coredump: remove trailling whitespaces
  Split remaining calls to call_usermodehelper_fns()
  kmod: remove call_usermodehelper_fns()

 fs/coredump.c   |  21 +++---
 include/linux/kmod.h|  17 
 init/do_mounts_initrd.c |  11 -
 kernel/kmod.c   | 100 +++-
 kernel/sys.c|   3 +-
 security/keys/request_key.c |  14 +--
 6 files changed, 96 insertions(+), 70 deletions(-)

-- 
1.8.1.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2 5/7] coredump: remove trailling whitespaces

2013-03-07 Thread Lucas De Marchi

Signed-off-by: Lucas De Marchi 
---
 fs/coredump.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index c647965..7dfb3b0 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -522,7 +522,7 @@ void do_coredump(siginfo_t *siginfo)
 
ispipe = format_corename(&cn, &cprm);
 
-   if (ispipe) {
+   if (ispipe) {
int dump_count;
char **helper_argv;
 
@@ -576,10 +576,10 @@ void do_coredump(siginfo_t *siginfo)
NULL, &cprm);
argv_free(helper_argv);
if (retval) {
-   printk(KERN_INFO "Core dump to %s pipe failed\n",
+   printk(KERN_INFO "Core dump to %s pipe failed\n",
   cn.corename);
goto close_fail;
-   }
+   }
} else {
struct inode *inode;
 
-- 
1.8.1.5


[PATCH v2 4/7] KEYS: split call to call_usermodehelper_fns()

2013-03-07 Thread Lucas De Marchi

Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
calling call_usermodehelper_fns(). If the latter hits an OOM, the cleanup
function may not be called, and we would then miss a call to key_put().

Signed-off-by: Lucas De Marchi 
---
 security/keys/request_key.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index 4bd6bdb..c958f34 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -93,9 +93,17 @@ static void umh_keys_cleanup(struct subprocess_info *info)
 static int call_usermodehelper_keys(char *path, char **argv, char **envp,
struct key *session_keyring, int wait)
 {
-   return call_usermodehelper_fns(path, argv, envp, wait,
-  umh_keys_init, umh_keys_cleanup,
-  key_get(session_keyring));
+   struct subprocess_info *info;
+
+   info = call_usermodehelper_setup(path, argv, envp, GFP_KERNEL,
+ umh_keys_init, umh_keys_cleanup,
+ key_get(session_keyring));
+   if (!info) {
+   key_put(session_keyring);
+   return -ENOMEM;
+   }
+
+   return call_usermodehelper_exec(info, wait);
 }
 
 /*
-- 
1.8.1.5
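
The leak this patch closes can be modelled in user space. The sketch below
is not kernel code: struct key_model, helper_setup(), helper_exec(),
run_helper() and the fail_alloc switch are hypothetical stand-ins for
struct key, call_usermodehelper_setup(), call_usermodehelper_exec(),
call_usermodehelper_keys() and an out-of-memory condition, and -1 stands in
for -ENOMEM. It only illustrates that once allocation is separated from
execution, the caller can see the failure and drop the reference it took.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted struct key. */
struct key_model {
	int refcount;
};

/* Hypothetical stand-in for struct subprocess_info. */
struct helper_info {
	void (*cleanup)(struct helper_info *info);
	void *data;
};

static struct key_model *key_model_get(struct key_model *key)
{
	key->refcount++;
	return key;
}

static void key_model_put(struct key_model *key)
{
	key->refcount--;
}

/* Cleanup callback: drops the reference carried in info->data. */
static void keys_cleanup(struct helper_info *info)
{
	key_model_put(info->data);
}

/* Models the setup step; fail_alloc simulates an allocation failure. */
static struct helper_info *helper_setup(void (*cleanup)(struct helper_info *),
					void *data, int fail_alloc)
{
	struct helper_info *info = fail_alloc ? NULL : calloc(1, sizeof(*info));

	if (!info)
		return NULL;
	info->cleanup = cleanup;
	info->data = data;
	return info;
}

/* Models the exec step: once info exists, cleanup always runs. */
static int helper_exec(struct helper_info *info)
{
	if (info->cleanup)
		info->cleanup(info);
	free(info);
	return 0;
}

/* Mirrors the shape of the patched call_usermodehelper_keys(). */
static int run_helper(struct key_model *session, int fail_alloc)
{
	struct helper_info *info;

	info = helper_setup(keys_cleanup, key_model_get(session), fail_alloc);
	if (!info) {
		key_model_put(session);	/* cleanup never runs: undo the get */
		return -1;
	}
	return helper_exec(info);
}
```

In both the success and the failure path the session key ends up with the
reference count it started with, which is the property the patch is after.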


[PATCH v2 6/7] Split remaining calls to call_usermodehelper_fns()

2013-03-07 Thread Lucas De Marchi

These are the only remaining users of call_usermodehelper_fns(). With this
function the caller cannot determine whether the cleanup was called. Even
though the cleanup pointer is NULL at these call sites, convert them to use
the separate call_usermodehelper_setup() + call_usermodehelper_exec()
functions so we can remove the _fns variant.

Signed-off-by: Lucas De Marchi 
---
 fs/coredump.c   | 15 ---
 init/do_mounts_initrd.c | 11 +--
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 7dfb3b0..468b4f6 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -525,6 +525,7 @@ void do_coredump(siginfo_t *siginfo)
if (ispipe) {
int dump_count;
char **helper_argv;
+   struct subprocess_info *sub_info;
 
if (ispipe < 0) {
printk(KERN_WARNING "format_corename failed\n");
@@ -571,9 +572,17 @@ void do_coredump(siginfo_t *siginfo)
goto fail_dropcount;
}
 
-   retval = call_usermodehelper_fns(helper_argv[0], helper_argv,
-   NULL, UMH_WAIT_EXEC, umh_pipe_setup,
-   NULL, &cprm);
+   sub_info = call_usermodehelper_setup(helper_argv[0],
+   helper_argv, NULL, GFP_KERNEL,
+   umh_pipe_setup, NULL, &cprm);
+   if (!sub_info) {
+   printk(KERN_WARNING "%s failed to allocate memory\n",
+  __func__);
+   argv_free(helper_argv);
+   goto fail_dropcount;
+   }
+
+   retval = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
argv_free(helper_argv);
if (retval) {
printk(KERN_INFO "Core dump to %s pipe failed\n",
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index a32ec1c..747cc66 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -50,6 +50,7 @@ static int init_linuxrc(struct subprocess_info *info, struct 
cred *new)
 
 static void __init handle_initrd(void)
 {
+   struct subprocess_info *info;
static char *argv[] = { "linuxrc", NULL, };
extern char *envp_init[];
int error;
@@ -70,8 +71,14 @@ static void __init handle_initrd(void)
 */
current->flags |= PF_FREEZER_SKIP;
 
-   call_usermodehelper_fns("/linuxrc", argv, envp_init, UMH_WAIT_PROC,
-   init_linuxrc, NULL, NULL);
+   info = call_usermodehelper_setup("/linuxrc", argv, envp_init,
+GFP_KERNEL, init_linuxrc, NULL, NULL);
+   if (!info) {
+   printk(KERN_WARNING "%s failed to allocate memory\n",
+  __func__);
+   return;
+   }
+   call_usermodehelper_exec(info, UMH_WAIT_PROC);
 
current->flags &= ~PF_FREEZER_SKIP;
 
-- 
1.8.1.5


Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-07 Thread H. Peter Anvin

On 03/07/2013 09:28 PM, Tejun Heo wrote:
> On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu  wrote:
>> They are not using memblock_find_in_range(), so 1ULL<< will not help.
>>
>> Really hope i915 drm guys could clean that hacks.
> 
> The code isn't being used.  Just leave it alone.  Maybe add a comment.
>  The change is just making things more confusing.
> 

Indeed, but...

Daniel: can you guys clean this up or can we just remove the #if 0 clause?

-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Re: [PATCH V2] cpufreq: ARM big LITTLE: Add generic cpufreq driver and its DT glue

2013-03-07 Thread Guennadi Liakhovetski

Hi Viresh

On Fri, 8 Mar 2013, Viresh Kumar wrote:

> On 8 March 2013 05:56, Guennadi Liakhovetski  wrote:
> > I like generic drivers :)
> 
> Me too :)
> 
> > cpufreq-cpu0 is yet another such generic
> > (cpufreq) driver. Now, comparing the functionality of the two:
> 
> Great!!
> 
> > we see, that this driver "only" switches CPU clock frequencies. Whereas
> > the cpufreq-cpu0 driver also manipulates a regulator (if available)
> > directly. I understand, power-saving is also an important consideration
> > for big.LITTLE systems. So, I presume, you plan to implement voltage
> > switching in cpufreq notifiers?
> 
> So the platform on which we are currently testing these is ARM TC2 Soc
> and this switching is done by the firmware instead. And so didn't went
> for regulator hookups initially.. Obviously in future regulator hookups would
> find some space in this driver but not required for now.
> 
> > Now, my question is: is this (notifier)
> > actually the preferred method and the cpufreq-cpu0 driver is doing it
> > "wrongly?"
> 
> What notifiers are you talking about? I believe using the regulator framework
> is the right way of doing this. And that would be part of this code later on.

Also in your driver you're doing

cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
...
cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);

So, theoretically you could install such notifiers to adjust CPU voltages 
(using regulators too). But adding regulator calls directly to the driver 
would make it consistent with cpufreq-cpu0.c, so, if this doesn't violate 
any concepts, I think, it would be good to add those when suitable systems 
appear.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
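
The notifier variant discussed in this message can be pictured with a small
user-space model. Nothing below is cpufreq code: transition_notify(),
struct transition, khz_to_uv() and the frequency/voltage values are
hypothetical. The sketch assumes the usual ordering constraint for voltage
scaling, namely that voltage is raised before a frequency increase and
lowered only after a frequency decrease.

```c
/* Hypothetical stand-ins for CPUFREQ_PRECHANGE / CPUFREQ_POSTCHANGE. */
enum phase { PRECHANGE, POSTCHANGE };

/* Hypothetical stand-in for the old/new fields of struct cpufreq_freqs. */
struct transition {
	unsigned int old_khz;
	unsigned int new_khz;
};

/* Made-up operating points: above 600 MHz needs the higher voltage. */
static unsigned int khz_to_uv(unsigned int khz)
{
	return khz > 600000 ? 1100000 : 900000;
}

/*
 * Models a transition notifier that owns the supply voltage (*uv).
 * Scaling up: raise voltage in PRECHANGE, before the clock changes.
 * Scaling down: lower voltage in POSTCHANGE, after the clock changed.
 */
static void transition_notify(const struct transition *t, enum phase phase,
			      unsigned int *uv)
{
	int scaling_up = t->new_khz > t->old_khz;

	if (phase == PRECHANGE && scaling_up)
		*uv = khz_to_uv(t->new_khz);
	else if (phase == POSTCHANGE && !scaling_up)
		*uv = khz_to_uv(t->new_khz);
}
```

Doing the same regulator calls inline in the driver, as cpufreq-cpu0 is
described as doing above, would follow the same ordering without the
notifier indirection.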

Re: [PATCH 06/14] x86, mm, numa: Move successful path handling code later

2013-03-07 Thread Tejun Heo

>  static int __init numa_register_memblks(struct numa_meminfo *mi)

After this patch, the above name is a bit misleading, I think.

> +out:

Maybe register: would fit better?

> + /* Finally register nodes. */
> + for_each_node_mask(nid, node_possible_map) {
> + u64 start = PFN_PHYS(max_pfn);

Thanks.

-- 
tejun

Re: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

2013-03-07 Thread CAI Qian

CC'ing kexec ML. Also mentioned that 3.8 has no such issue.

This message looks suspicious and out of range while 3.8 reservation
looks within the range.

[0.00] Reserving 128MB of memory at 5216MB for crashkernel
(System RAM: 3977MB)

Wondering if anything to do with memblock again...

CAI Qian

- Original Message -
> From: "WANG Chao" 
> To: "LKML" vger.kernel.org>
> Cc: "CAI Qian" 
> Sent: Friday, March 8, 2013 1:54:37 PM
> Subject: 3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB 
> buffer earlier and can't now provide you
> with the DMA bounce buffer
> 
> Hi, All
> 
> On 3.9-rc1, I load crash kernel with latest kexec-tools(up to
> 28d413a), but
> 2nd kernel panic at early time:
> [2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB
> buffer earlier and can't now provide you with the DMA bounce buffer
> [2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1
> [2.965426] Call Trace:
> [2.967866]  [] panic+0xc1/0x1d0
> [2.972644]  []
> swiotlb_tbl_map_single+0x27c/0x280
> [2.978991]  [] map_single+0x19/0x20
> [2.984115]  [] swiotlb_map_page+0x6e/0x160
> [2.989845]  []
> usb_hcd_map_urb_for_dma+0x230/0x4a0
> [2.996268]  [] usb_hcd_submit_urb+0x295/0x8e0
> [3.002258]  [] ? __dequeue_entity+0x2f/0x50
> [3.008076]  [] ? __switch_to+0x13e/0x4a0
> [3.013632]  [] usb_submit_urb+0xff/0x3d0
> [3.019186]  [] ? __schedule+0x3de/0x7e0
> [3.024657]  [] usb_start_wait_urb+0x6a/0x160
> [3.030560]  [] ? __kmalloc+0x55/0x210
> [3.035856]  [] ? usb_alloc_urb+0x1e/0x50
> [3.041411]  [] usb_control_msg+0xde/0x140
> [3.047056]  [] ? hub_port_init+0x310/0xaf0
> [3.052785]  [] ? hub_port_init+0x2eb/0xaf0
> [3.058515]  [] hub_port_init+0x338/0xaf0
> [3.064071]  [] ? update_autosuspend+0x39/0x60
> [3.070062]  [] ?
> pm_runtime_set_autosuspend_delay+0x49/0x70
> [3.077264]  []
> hub_port_connect_change+0x24a/0xaa0
> [3.083684]  [] hub_events+0x2ea/0x910
> [3.088981]  [] ? __schedule+0x3de/0x7e0
> [3.094451]  [] hub_thread+0x35/0x1e0
> [3.099661]  [] ? wake_up_bit+0x40/0x40
> [3.105045]  [] ? hub_events+0x910/0x910
> [3.110514]  [] kthread+0xc0/0xd0
> [3.115378]  [] ?
> kthread_create_on_node+0x120/0x120
> [3.121887]  [] ret_from_fork+0x7c/0xb0
> [3.127271]  [] ?
> kthread_create_on_node+0x120/0x120
> 
> 
> Here's the full log:
> # grep 'Crash' /proc/iomem
>   14600-14dff : Crash kernel
> 
> # dmesg | grep -i reserving
> [0.00] Reserving 128MB of memory at 5216MB for crashkernel
> (System RAM: 3977MB)
> 
> # kexec -p /boot/vmlinuz-3.9.0-rc1+
> --command-line='console=ttyS0,115200n81'
> # echo c > /proc/sysrq-trigger
> 
> [  217.879315] SysRq : Trigger a crash
> [  217.882836] BUG: unable to handle kernel NULL pointer dereference
> at   (null)
> [  217.890674] IP: [] sysrq_handle_crash+0x16/0x20
> [  217.896773] PGD 13df22067 PUD 139726067 PMD 0
> [  217.901244] Oops: 0002 [#1] SMP
> [  217.904491] Modules linked in: lockd sunrpc
> nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE
> ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> iptable_nat nf_nat_ip
> v4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
> xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter
> ip6_tables iptable_filter ip_tables sg coretemp kvm_inte
> l kvm e1000e iTCO_wdt crc32c_intel iTCO_vendor_support ptp
> ghash_clmulni_intel pps_core mei microcode pcspkr i2c_i801 lpc_ich
> mfd_core xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif i915 i2c_al
> go_bit drm_kms_helper drm ahci libahci libata i2c_core video
> dm_mirror dm_region_hash dm_log dm_mod
> [  217.963690] CPU 0
> [  217.965526] Pid: 1206, comm: bash Not tainted 3.9.0-rc1+ #1 Intel
> Corporation 2012 Client Platform/Emerald Lake 2
> [  217.975948] RIP: 0010:[]  []
> sysrq_handle_crash+0x16/0x20
> [  217.984468] RSP: 0018:8801367e9e38  EFLAGS: 00010092
> [  217.989765] RAX: 000f RBX: 819b67c0 RCX:
> 88014e20ffe8
> [  217.996881] RDX:  RSI: 88014e20e3b8 RDI:
> 0063
> [  218.003998] RBP: 8801367e9e38 R08: 81c06280 R09:
> 0419
> [  218.03] R10: 0002 R11: 0418 R12:
> 0063
> [  218.018230] R13: 0286 R14:  R15:
> 0007
> [  218.025346] FS:  7fdd48ace740() GS:88014e20()
> knlGS:
> [  218.033416] CS:  0010 DS:  ES:  CR0: 80050033
> [  218.039147] CR2:  CR3: 00013a67c000 CR4:
> 001407f0
> [  218.046263] DR0:  DR1:  DR2:
> 
> [  218.053379] DR3:  DR6: 0ff0 DR7:
> 0400
> [  218.060496] Process bash (pid: 1206, threadinfo 8801367e8000,
> task 88013e8ae5c0)
> [  218.068564] Stack:
> [  218.070570]  8801367e9e78 813c2147 88

Re: [PATCH] Kprobes blacklist: Conditionally add x86-specific symbols

2013-03-07 Thread Ananth N Mavinakayanahalli

On Fri, Mar 08, 2013 at 01:23:25PM +0900, Masami Hiramatsu wrote:
> (2013/03/07 19:44), oskar.and...@sonymobile.com wrote:
> > From: Bjorn Davidsson 
> > 
> > The kprobes blacklist contains x86-specific symbols.
> > Looking for these in kallsyms takes unnecessary time
> > during startup on non-X86 platform.
> > Added #ifdef CONFIG_X86 around them.
> 
> Right. however, it might be better break that into
> common and arch-specific lists, because there may be
> other arch-specific non-probe-able functions on each
> architecture...

Agreed. CONFIG_ in kernel/* is not the right thing to do IMO.

You are moving the blacklist initialization to later in the next patch,
so how much overhead will it then be?

Ananth
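
The split suggested in the quoted text could look roughly like the
user-space sketch below. The symbol names and the helpers list_contains()
and is_blacklisted() are hypothetical; the point is only that generic code
walks a common list plus a list supplied per architecture, so it needs no
CONFIG_ checks of its own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol names; the real lists live in the kernel sources. */
static const char *const common_blacklist[] = {
	"common_func_a",
	"common_func_b",
	NULL,
};

/* Would be supplied per architecture; may legitimately hold only NULL. */
static const char *const arch_blacklist[] = {
	"arch_only_func",
	NULL,
};

/* Linear search through a NULL-terminated list of symbol names. */
static bool list_contains(const char *const *list, const char *sym)
{
	for (; *list; list++)
		if (strcmp(*list, sym) == 0)
			return true;
	return false;
}

/* Generic code: consult the common list first, then the arch list. */
static bool is_blacklisted(const char *sym)
{
	return list_contains(common_blacklist, sym) ||
	       list_contains(arch_blacklist, sym);
}
```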


Re: [PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 08:58:31PM -0800, Yinghai Lu wrote:
> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
> index 73afd11..ca08f0e 100644
> --- a/arch/x86/kernel/head_32.S
> +++ b/arch/x86/kernel/head_32.S
> @@ -149,6 +149,10 @@ ENTRY(startup_32)
>   call load_ucode_bsp
>  #endif
>  
> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
> + call x86_acpi_override_find
> +#endif

The function is always defined.  We can probably lose the ifdef here?

Also, does it really have to be called from head_32.S?  No way this
can be done after entering C code?  It would be great if you can
explain overall design choices in the head message (and important
patches).

> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 668e658..d43545a 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -424,6 +424,32 @@ static void __init reserve_initrd(void)
>  }
>  #endif /* CONFIG_BLK_DEV_INITRD */
>  
> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
> +void __init x86_acpi_override_find(void)
> +{
> + unsigned long ramdisk_image, ramdisk_size;
> + unsigned char *p = NULL;
> +
> +#ifdef CONFIG_X86_32
> + struct boot_params *boot_params_p;
> +
> + boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
> + ramdisk_image = boot_params_p->hdr.ramdisk_image;
> + ramdisk_size  = boot_params_p->hdr.ramdisk_size;
> + p = (unsigned char *)ramdisk_image;
> + acpi_initrd_override_find(p, ramdisk_size, true);
> +#else
> + ramdisk_image = get_ramdisk_image();
> + ramdisk_size  = get_ramdisk_size();
> + if (ramdisk_image)
> + p = __va(ramdisk_image);
> + acpi_initrd_override_find(p, ramdisk_size, false);
> +#endif
> +}
> +#else
> +void __init x86_acpi_override_find(void) { }

And add a comment here why we're not doing static inline for the dummy
function?

Thanks.

-- 
tejun

[PATCH 3/3] intel_idle: set the state_tables array as __initdata to save mem

2013-03-07 Thread Chuansheng Liu


Currently, in intel_idle.c, there are 5 state_tables arrays, each of size
sizeof(struct cpuidle_state) * CPUIDLE_STATE_MAX.

intel_idle_cpuidle_driver_init() already copies the data into
intel_idle_driver->state[], so there is no need to keep the state_tables[]
arrays around after system init.

Marking them __initdata saves about 3~4k of memory, which also benefits
mobile devices. The global cpuidle_state_table pointer is removed as well.

Signed-off-by: liu chuansheng 
---
 drivers/idle/intel_idle.c |   18 --
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 503b401..cbdc952 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -99,8 +99,6 @@ static int intel_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index);
 static int intel_idle_cpu_init(int cpu);
 
-static struct cpuidle_state *cpuidle_state_table;
-
 /*
  * Set this flag for states where the HW flushes the TLB for us
  * and so we don't need cross-calls to keep it consistent.
@@ -124,7 +122,7 @@ static struct cpuidle_state *cpuidle_state_table;
  * which is also the index into the MWAIT hint array.
  * Thus C0 is a dummy.
  */
-static struct cpuidle_state nehalem_cstates[CPUIDLE_STATE_MAX] = {
+static struct cpuidle_state nehalem_cstates[CPUIDLE_STATE_MAX] __initdata = {
{
.name = "C1-NHM",
.desc = "MWAIT 0x00",
@@ -157,7 +155,7 @@ static struct cpuidle_state 
nehalem_cstates[CPUIDLE_STATE_MAX] = {
.enter = NULL }
 };
 
-static struct cpuidle_state snb_cstates[CPUIDLE_STATE_MAX] = {
+static struct cpuidle_state snb_cstates[CPUIDLE_STATE_MAX] __initdata = {
{
.name = "C1-SNB",
.desc = "MWAIT 0x00",
@@ -197,7 +195,7 @@ static struct cpuidle_state snb_cstates[CPUIDLE_STATE_MAX] 
= {
.enter = NULL }
 };
 
-static struct cpuidle_state ivb_cstates[CPUIDLE_STATE_MAX] = {
+static struct cpuidle_state ivb_cstates[CPUIDLE_STATE_MAX] __initdata = {
{
.name = "C1-IVB",
.desc = "MWAIT 0x00",
@@ -237,7 +235,7 @@ static struct cpuidle_state ivb_cstates[CPUIDLE_STATE_MAX] 
= {
.enter = NULL }
 };
 
-static struct cpuidle_state hsw_cstates[CPUIDLE_STATE_MAX] = {
+static struct cpuidle_state hsw_cstates[CPUIDLE_STATE_MAX] __initdata = {
{
.name = "C1-HSW",
.desc = "MWAIT 0x00",
@@ -277,7 +275,7 @@ static struct cpuidle_state hsw_cstates[CPUIDLE_STATE_MAX] 
= {
.enter = NULL }
 };
 
-static struct cpuidle_state atom_cstates[CPUIDLE_STATE_MAX] = {
+static struct cpuidle_state atom_cstates[CPUIDLE_STATE_MAX] __initdata = {
{
.name = "C1E-ATM",
.desc = "MWAIT 0x00",
@@ -472,7 +470,7 @@ MODULE_DEVICE_TABLE(x86cpu, intel_idle_ids);
 /*
  * intel_idle_probe()
  */
-static int intel_idle_probe(void)
+static int __init intel_idle_probe(void)
 {
unsigned int eax, ebx, ecx;
const struct x86_cpu_id *id;
@@ -504,7 +502,6 @@ static int intel_idle_probe(void)
pr_debug(PREFIX "MWAIT substates: 0x%x\n", mwait_substates);
 
icpu = (const struct idle_cpu *)id->driver_data;
-   cpuidle_state_table = icpu->state_table;
 
if (boot_cpu_has(X86_FEATURE_ARAT)) /* Always Reliable APIC Timer */
lapic_timer_reliable_states = LAPIC_TIMER_ALWAYS_RELIABLE;
@@ -540,10 +537,11 @@ static void intel_idle_cpuidle_devices_uninit(void)
  * intel_idle_cpuidle_driver_init()
  * allocate, initialize cpuidle_states
  */
-static int intel_idle_cpuidle_driver_init(void)
+static int __init intel_idle_cpuidle_driver_init(void)
 {
int cstate;
struct cpuidle_driver *drv = &intel_idle_driver;
+   struct cpuidle_state *cpuidle_state_table = icpu->state_table;
 
drv->state_count = 1;
 
-- 
1.7.0.4




[PATCH 2/3] intel_idle: Removing the redundant calculating for dev->state_count

2013-03-07 Thread Chuansheng Liu


intel_idle_cpu_init() and intel_idle_cpuidle_driver_init() run the same
for(;;) loop.

So in intel_idle_cpu_init(), dev->state_count can be assigned directly from
drv->state_count.

Signed-off-by: liu chuansheng 
---
 drivers/idle/intel_idle.c |   30 ++
 1 files changed, 2 insertions(+), 28 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 17c9cf9..503b401 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -599,38 +599,12 @@ static int intel_idle_cpuidle_driver_init(void)
  */
 static int intel_idle_cpu_init(int cpu)
 {
-   int cstate;
struct cpuidle_device *dev;
+   struct cpuidle_driver *drv = &intel_idle_driver;
 
dev = per_cpu_ptr(intel_idle_cpuidle_devices, cpu);
 
-   dev->state_count = 1;
-
-   for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
-   int num_substates, mwait_hint, mwait_cstate, mwait_substate;
-
-   if (cpuidle_state_table[cstate].enter == NULL)
-   break;
-
-   if (cstate + 1 > max_cstate) {
-   printk(PREFIX "max_cstate %d reached\n", max_cstate);
-   break;
-   }
-
-   mwait_hint = flg2MWAIT(cpuidle_state_table[cstate].flags);
-   mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint);
-   mwait_substate = MWAIT_HINT2SUBSTATE(mwait_hint);
-
-   /* does the state exist in CPUID.MWAIT? */
-   num_substates = (mwait_substates >> ((mwait_cstate + 1) * 4))
-   & MWAIT_SUBSTATE_MASK;
-
-   /* if sub-state in table is not enumerated by CPUID */
-   if ((mwait_substate + 1) > num_substates)
-   continue;
-
-   dev->state_count += 1;
-   }
+   dev->state_count = drv->state_count;
 
dev->cpu = cpu;
 
-- 
1.7.0.4
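
The per-state check that the removed loop repeated is plain bit arithmetic,
reproduced here as a user-space sketch. The expression for num_substates
copies the removed hunk; the macro names, the 4-bit field layout and the
sample value used in the note below are assumptions for illustration and
should be checked against the real headers. The range guard on the shift is
an addition of this sketch, not something the driver does.

```c
#include <stdbool.h>

/* Assumed layout: 4-bit C-state and 4-bit sub-state packed in the hint. */
#define SUBSTATE_SIZE		4
#define SUBSTATE_MASK		0xf
#define CSTATE_MASK		0xf
#define HINT2CSTATE(hint)	(((hint) >> SUBSTATE_SIZE) & CSTATE_MASK)
#define HINT2SUBSTATE(hint)	((hint) & SUBSTATE_MASK)

/*
 * Returns true when the sub-state encoded in mwait_hint is enumerated in
 * mwait_substates, which holds one 4-bit sub-state count per C-state.
 */
static bool state_is_enumerated(unsigned int mwait_substates,
				unsigned int mwait_hint)
{
	unsigned int cstate = HINT2CSTATE(mwait_hint);
	unsigned int substate = HINT2SUBSTATE(mwait_hint);
	unsigned int shift = (cstate + 1) * 4;
	unsigned int num_substates;

	/* Guard added for the sketch: avoid shifting by >= 32 bits. */
	if (shift >= 32)
		return false;

	num_substates = (mwait_substates >> shift) & SUBSTATE_MASK;
	return substate + 1 <= num_substates;
}
```

With a made-up mwait_substates of 0x1120, hint 0x20 selects the nibble at
bit 12, which is 1, so sub-state 0 exists; hint 0x11 selects the nibble at
bit 8, also 1, so sub-state 1 does not. Since the inputs are the same for
every CPU, the driver-level count can simply be reused per device.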




3.9-rc1: crash kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer

2013-03-07 Thread WANG Chao

Hi, All

On 3.9-rc1, I loaded the crash kernel with the latest kexec-tools (up to
28d413a), but the 2nd kernel panics early in boot:
[2.948076] Kernel panic - not syncing: Can not allocate SWIOTLB buffer 
earlier and can't now provide you with the DMA bounce buffer
[2.959958] Pid: 53, comm: khubd Not tainted 3.9.0-rc1+ #1
[2.965426] Call Trace:
[2.967866]  [] panic+0xc1/0x1d0
[2.972644]  [] swiotlb_tbl_map_single+0x27c/0x280
[2.978991]  [] map_single+0x19/0x20
[2.984115]  [] swiotlb_map_page+0x6e/0x160
[2.989845]  [] usb_hcd_map_urb_for_dma+0x230/0x4a0
[2.996268]  [] usb_hcd_submit_urb+0x295/0x8e0
[3.002258]  [] ? __dequeue_entity+0x2f/0x50
[3.008076]  [] ? __switch_to+0x13e/0x4a0
[3.013632]  [] usb_submit_urb+0xff/0x3d0
[3.019186]  [] ? __schedule+0x3de/0x7e0
[3.024657]  [] usb_start_wait_urb+0x6a/0x160
[3.030560]  [] ? __kmalloc+0x55/0x210
[3.035856]  [] ? usb_alloc_urb+0x1e/0x50
[3.041411]  [] usb_control_msg+0xde/0x140
[3.047056]  [] ? hub_port_init+0x310/0xaf0
[3.052785]  [] ? hub_port_init+0x2eb/0xaf0
[3.058515]  [] hub_port_init+0x338/0xaf0
[3.064071]  [] ? update_autosuspend+0x39/0x60
[3.070062]  [] ? 
pm_runtime_set_autosuspend_delay+0x49/0x70
[3.077264]  [] hub_port_connect_change+0x24a/0xaa0
[3.083684]  [] hub_events+0x2ea/0x910
[3.088981]  [] ? __schedule+0x3de/0x7e0
[3.094451]  [] hub_thread+0x35/0x1e0
[3.099661]  [] ? wake_up_bit+0x40/0x40
[3.105045]  [] ? hub_events+0x910/0x910
[3.110514]  [] kthread+0xc0/0xd0
[3.115378]  [] ? kthread_create_on_node+0x120/0x120
[3.121887]  [] ret_from_fork+0x7c/0xb0
[3.127271]  [] ? kthread_create_on_node+0x120/0x120


Here's the full log:
# grep 'Crash' /proc/iomem
  14600-14dff : Crash kernel

# dmesg | grep -i reserving
[0.00] Reserving 128MB of memory at 5216MB for crashkernel (System RAM: 
3977MB)

# kexec -p /boot/vmlinuz-3.9.0-rc1+ --command-line='console=ttyS0,115200n81'
# echo c > /proc/sysrq-trigger

[  217.879315] SysRq : Trigger a crash
[  217.882836] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[  217.890674] IP: [] sysrq_handle_crash+0x16/0x20
[  217.896773] PGD 13df22067 PUD 139726067 PMD 0
[  217.901244] Oops: 0002 [#1] SMP
[  217.904491] Modules linked in: lockd sunrpc nf_conntrack_netbios_ns
nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT
nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat
iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter ip_tables sg coretemp kvm_intel kvm e1000e iTCO_wdt
crc32c_intel iTCO_vendor_support ptp ghash_clmulni_intel pps_core mei
microcode pcspkr i2c_i801 lpc_ich mfd_core xfs libcrc32c sr_mod sd_mod cdrom
crc_t10dif i915 i2c_algo_bit drm_kms_helper drm ahci libahci libata i2c_core
video dm_mirror dm_region_hash dm_log dm_mod
[  217.963690] CPU 0
[  217.965526] Pid: 1206, comm: bash Not tainted 3.9.0-rc1+ #1 Intel 
Corporation 2012 Client Platform/Emerald Lake 2
[  217.975948] RIP: 0010:[]  [] 
sysrq_handle_crash+0x16/0x20
[  217.984468] RSP: 0018:8801367e9e38  EFLAGS: 00010092
[  217.989765] RAX: 000f RBX: 819b67c0 RCX: 88014e20ffe8
[  217.996881] RDX:  RSI: 88014e20e3b8 RDI: 0063
[  218.003998] RBP: 8801367e9e38 R08: 81c06280 R09: 0419
[  218.03] R10: 0002 R11: 0418 R12: 0063
[  218.018230] R13: 0286 R14:  R15: 0007
[  218.025346] FS:  7fdd48ace740() GS:88014e20() 
knlGS:
[  218.033416] CS:  0010 DS:  ES:  CR0: 80050033
[  218.039147] CR2:  CR3: 00013a67c000 CR4: 001407f0
[  218.046263] DR0:  DR1:  DR2: 
[  218.053379] DR3:  DR6: 0ff0 DR7: 0400
[  218.060496] Process bash (pid: 1206, threadinfo 8801367e8000, task 
88013e8ae5c0)
[  218.068564] Stack:
[  218.070570]  8801367e9e78 813c2147 88013e8ae5c0 
0002
[  218.078001]  88013c4f9200 7fdd48ad1000 0002 
8801367e9f50
[  218.085427]  8801367e9ea8 813c21fa 88013c4f9200 
7fdd48ad1000
[  218.092854] Call Trace:
[  218.095298]  [] __handle_sysrq+0x127/0x190
[  218.100947]  [] write_sysrq_trigger+0x4a/0x50
[  218.106854]  [] proc_reg_write+0x75/0xb0
[  218.112329]  [] vfs_write+0xac/0x180
[  218.117456]  [] sys_write+0x52/0xa0
[  218.122499]  [] ? do_page_fault+0xe/0x10
[  218.127977]  [] system_call_fastpath+0x16/0x1b
[  218.133970] Code: 89 ef e8 ee f7 ff ff eb c3 66 2e 0f 1f 84 00 00 00 00 00 
66 90 0f 1f 44 00 00 55 c7 05 64 44 84 00 01 00 00 00 48 89 e5 0f ae f8  04 
25 00 00 00 00 01 5d c3 0f 1f 44 
00 00 55 48 89 e5 53 48
[  218.153653] RIP  [] sysrq_handle_crash+0

[PATCH 1/3] intel_idle: changing the continue to break in intel_idle_cpu_init()

2013-03-07 Thread Chuansheng Liu


According to commit e022e7eb9, the entry with .enter == NULL is the last one
in state_tables[].

So, just like intel_idle_cpuidle_driver_init(), break out of the for(;;) loop
directly when .enter == NULL.

Signed-off-by: liu chuansheng 
---
 drivers/idle/intel_idle.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 5d66750..17c9cf9 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -610,7 +610,7 @@ static int intel_idle_cpu_init(int cpu)
int num_substates, mwait_hint, mwait_cstate, mwait_substate;
 
if (cpuidle_state_table[cstate].enter == NULL)
-   continue;
+   break;
 
if (cstate + 1 > max_cstate) {
printk(PREFIX "max_cstate %d reached\n", max_cstate);
-- 
1.7.0.4
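
Why break is safe here can be shown with a small user-space model. The
names struct state_model, dummy_enter(), table and count_states() are
hypothetical; the only property carried over from the patch description is
that the first entry with .enter == NULL terminates the table, so nothing
valid can follow it.

```c
#include <stddef.h>

#define STATE_MAX 8	/* hypothetical table capacity */

struct state_model {
	const char *name;
	int (*enter)(void);
};

static int dummy_enter(void)
{
	return 0;
}

/* Zero-initialised tail: every slot after the last state has enter == NULL. */
static const struct state_model table[STATE_MAX] = {
	{ .name = "C1", .enter = dummy_enter },
	{ .name = "C3", .enter = dummy_enter },
	{ .name = "C6", .enter = dummy_enter },
	{ .enter = NULL },
};

/*
 * Counts usable states. stop_early selects break instead of continue;
 * *visited reports how many slots the loop had to look at.
 */
static int count_states(const struct state_model *t, int stop_early,
			int *visited)
{
	int i, count = 0;

	*visited = 0;
	for (i = 0; i < STATE_MAX; i++) {
		(*visited)++;
		if (t[i].enter == NULL) {
			if (stop_early)
				break;
			continue;
		}
		count++;
	}
	return count;
}
```

Both variants count the same three states; the break variant just stops
after four slots instead of walking all eight.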




[PATCH 0/3] intel_idle: set the state_tables array into __initdata to save mem

2013-03-07 Thread Chuansheng Liu

As Daniel suggested, I did some cleanup before moving the state_tables arrays
into __initdata.

Thanks for your help reviewing them.

[PATCH 1/3] intel_idle: changing the continue to break in intel_idle_cpu_init()
[PATCH 2/3] intel_idle: Removing the redundant calculating for dev->state_count
[PATCH 3/3] intel_idle: set the state_tables array as __initdata to save mem



Re: [PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 08:58:30PM -0800, Yinghai Lu wrote:
> We will find acpi tables in initrd during head_32.S in 32bit flat mode.
> 
> So need acpi_initrd_override_find could take phys directly.

The patch description doesn't explain even half of what's going on.

> @@ -552,38 +552,47 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
>   return sum;
>  }
>  
> -/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
> -static const char * const table_sigs[] = {
> - ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
> - ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
> - ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
> - ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
> - ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
> - ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
> - ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
> - ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
> - ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };

Why is this table made a stack variable?  What's the benefit of doing
that?

>  /* Non-fatal errors: Affected tables/files are ignored */
>  #define INVALID_TABLE(x, path, name) \
> - { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
> + do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)

Might as well rename the macro to something which indicates it's just
printing error message.  Urgh... who thought embedding control flow
directive like continue inside a macro was a good idea? :(
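
To illustrate the point about control flow hidden inside a macro, here is a small userspace sketch, not the osl.c code. The macro only reports the problem, wrapped in `do { } while (0)` so that it behaves as a single statement, and the caller writes `continue` itself. `REPORT_INVALID`, `MIN_SIZE`, `error_count` and `count_valid()` are hypothetical names chosen for this example.

```c
#include <stdio.h>
#include <stddef.h>

static int error_count;

/* Reporting-only macro: no hidden control flow, and the do/while(0)
 * wrapper makes it safe to use as the body of an unbraced "if". */
#define REPORT_INVALID(msg, name) \
	do { \
		fprintf(stderr, "OVERRIDE: " msg " [%s]\n", name); \
		error_count++; \
	} while (0)

#define MIN_SIZE 36	/* hypothetical minimum size, for illustration only */

/* Return how many entries pass the size check. */
static int count_valid(const size_t *sizes, const char *const *names, int n)
{
	int i, accepted = 0;

	for (i = 0; i < n; i++) {
		if (sizes[i] < MIN_SIZE) {
			REPORT_INVALID("too small", names[i]);
			continue;	/* the skip is visible at the call site */
		}
		accepted++;
	}
	return accepted;
}
```

With this shape a reader of the loop can see every exit path without having to expand the macro.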

> -void __init acpi_initrd_override_find(void *data, size_t size)
> +void __init acpi_initrd_override_find(void *data, size_t size, bool is_phys)

Is it really necessary to make the function take both virtual and
physical addresses?  Can't we just make the function take phys_addr_t
and update everyone to call with physaddr?  Also @is_phys isn't simple
address switch.  It also changes error reporting.  If you're gonna
keep @is_phys, let's at least write up a function comment explaining
what's going on and why we need it.  But, really, if at all possible,
let's change the function to take single type of argument and
predicate error message printing on something else (e.g. early printk
initialized or whatever).

> @@ -654,11 +677,14 @@ void __init acpi_initrd_override_copy(void)
>   arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>  
>   for (no = 0; no < table_nr; no++) {
> + unsigned long phys_addr = (unsigned 
> long)early_initrd_files[no].data;

Can we please use phys_addr_t for physical addresses?
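
As a rough illustration of why the type matters, the sketch below models the case where `unsigned long` is 32 bits wide while physical addresses may need more than 32 bits. That configuration is an assumption made for the example; the thread itself does not spell it out. `narrow_ulong`, `wide_phys` and `survives_narrowing()` are hypothetical names and stand in for the kernel types only loosely.

```c
#include <stdint.h>

typedef uint32_t narrow_ulong;	/* models unsigned long on a 32-bit build */
typedef uint64_t wide_phys;	/* models a 64-bit physical address type */

/* Return 1 if the address is unchanged after a round trip through the
 * narrow type, 0 if the upper bits were lost. */
static int survives_narrowing(wide_phys addr)
{
	narrow_ulong narrowed = (narrow_ulong)addr;

	return (wide_phys)narrowed == addr;
}
```

An address below 4 GiB survives the round trip, while anything at or above 4 GiB is silently truncated, which is the kind of bug a dedicated physical-address type is meant to rule out.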

>   unsigned long size = early_initrd_files[no].size;
>  
> + q = early_ioremap(phys_addr, size);
> + pr_info("%4.4s ACPI table found in initrd [%#010lx-%#010lx]\n",
> + ((struct acpi_table_header *)q)->signature,
> + phys_addr, phys_addr + size - 1);

Maybe putting pr_info after ioremapping both p and q would be easier
on the eyes?

>   p = early_ioremap(acpi_tables_addr + total_offset, size);
> - q = early_ioremap((unsigned long)early_initrd_files[no].data,
> -  size);
>   memcpy(p, q, size);
>   early_iounmap(q, size);
>   early_iounmap(p, size);

Thanks.

-- 
tejun

irq_work: WARNING: at kernel/irq_work.c:98 irq_work_needs_cpu+0x8a/0xb0()

2013-03-07 Thread Sasha Levin

Hi guys,

While fuzzing with trinity inside a KVM tools guest it seemed to have hit the
new warning in kernel/irq_work.c:

[  486.527075] ------------[ cut here ]------------
[  486.527788] WARNING: at kernel/irq_work.c:98 irq_work_needs_cpu+0x8a/0xb0()
[  486.528870] Modules linked in:
[  486.529377] Pid: 0, comm: swapper/2 Tainted: GW
3.9.0-rc1-next-20130307-sasha-00047-g0a7d304-dirty #1037
[  486.530165] Call Trace:
[  486.530165]  [] warn_slowpath_common+0x8c/0xc0
[  486.530165]  [] warn_slowpath_null+0x15/0x20
[  486.530165]  [] irq_work_needs_cpu+0x8a/0xb0
[  486.530165]  [] tick_nohz_stop_sched_tick+0x95/0x2a0
[  486.530165]  [] __tick_nohz_idle_enter+0x189/0x1b0
[  486.530165]  [] tick_nohz_idle_enter+0xa1/0xd0
[  486.530165]  [] cpu_idle+0x77/0x180
[  486.530165]  [] ? setup_APIC_timer+0xc9/0xce
[  486.530165]  [] start_secondary+0xe1/0xe8
[  486.530165] ---[ end trace dd075f5cfc2c4f26 ]---

Obviously this was happening when trinity tried to exercise the shutdown 
syscall.

It was followed by RCU choking and causing a bunch of locked tasks, preventing
shutdown. I guess it's the result of whatever caused the warning above to
happen, but in case it isn't, the relevant parts of the RCU hang are:

[  607.040283] INFO: task init:1 blocked for more than 120 seconds.
[  607.042932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  607.047046] initD 8800ba308000  2736 1  0 0x
[  607.050498]  8800ba311b18 0002 0003 
8800ba308000
[  607.055110]  8800ba31 8800ba310010 8800ba311fd8 
8800ba31
[  607.058208]  8800ba310010 8800ba311fd8 8542c420 
8800ba308000
[  607.060611] Call Trace:
[  607.060847]  [] ? __mutex_lock_common+0x365/0x5d0
[  607.061462]  [] schedule+0x55/0x60
[  607.061948]  [] schedule_preempt_disabled+0x13/0x20
[  607.062590]  [] __mutex_lock_common+0x3a5/0x5d0
[  607.063209]  [] ? rcu_cleanup_dead_cpu+0x52/0x250
[  607.063840]  [] ? free_cpumask_var+0x9/0x10
[  607.064453]  [] ? rcu_cleanup_dead_cpu+0x52/0x250
[  607.065035]  [] mutex_lock_nested+0x40/0x50
[  607.065606]  [] rcu_cleanup_dead_cpu+0x52/0x250
[  607.066230]  [] ? trace_hardirqs_on+0xd/0x10
[  607.066810]  [] rcu_cpu_notify+0x1b4/0x1ef
[  607.067375]  [] notifier_call_chain+0xee/0x130
[  607.067975]  [] __raw_notifier_call_chain+0x9/0x10
[  607.068631]  [] __cpu_notify+0x1b/0x30
[  607.069165]  [] cpu_notify_nofail+0x10/0x30
[  607.069749]  [] _cpu_down+0x185/0x2e0
[  607.070319]  [] disable_nonboot_cpus+0x88/0x1b0
[  607.070937]  [] kernel_restart+0x16/0x60
[  607.071487]  [] SYSC_reboot+0x18c/0x2a0
[  607.072020]  [] ? rcu_cleanup_after_idle+0x23/0xf0
[  607.072635]  [] ? rcu_eqs_exit_common+0x64/0x280
[  607.073251]  [] ? user_exit+0xc5/0x100
[  607.073772]  [] ? trace_hardirqs_on+0xd/0x10
[  607.074352]  [] ? syscall_trace_enter+0x23/0x290
[  607.075054]  [] SyS_reboot+0x9/0x10
[  607.075495]  [] tracesys+0xdd/0xe2
[  607.075967] 4 locks held by init/1:
[  607.076439]  #0:  (reboot_mutex){+.+.+.}, at: [] 
SYSC_reboot+0xe6/0x2a0
[  607.077276]  #1:  (cpu_add_remove_lock){+.+.+.}, at: [] 
cpu_maps_update_begin+0x12/0x20
[  607.078288]  #2:  (cpu_hotplug.lock){+.+.+.}, at: [] 
cpu_hotplug_begin+0x27/0x60
[  607.079260]  #3:  (rcu_preempt_state.onoff_mutex){+.+...}, at: 
[] rcu_cleanup_dead_cpu+0x52/0x250

[  607.187177] rcu_preempt D 8800aa8884a8  513611  2 0x
[  607.187890]  8800ba391c08 0002 8800ba391bb8 
00078117e00a
[  607.188674]  8800ba39 8800ba390010 8800ba391fd8 
8800ba39
[  607.189472]  8800ba390010 8800ba391fd8 8800ba308000 
8800ba388000
[  607.190581] Call Trace:
[  607.190849]  [] schedule+0x55/0x60
[  607.191336]  [] schedule_timeout+0x276/0x2c0
[  607.191904]  [] ? lock_timer_base+0x70/0x70
[  607.192460]  [] schedule_timeout_uninterruptible+0x19/0x20
[  607.193132]  [] rcu_gp_init+0x438/0x490
[  607.193646]  [] ? trace_hardirqs_on+0xd/0x10
[  607.194216]  [] rcu_gp_kthread+0xbc/0x2d0
[  607.194760]  [] ? rcu_gp_init+0x490/0x490
[  607.195298]  [] ? wake_up_bit+0x40/0x40
[  607.195823]  [] ? rcu_gp_init+0x490/0x490
[  607.196364]  [] kthread+0xe2/0xf0
[  607.196842]  [] ? __lock_release+0x1da/0x1f0
[  607.197405]  [] ? __init_kthread_worker+0x70/0x70
[  607.198022]  [] ret_from_fork+0x7c/0xb0
[  607.198559]  [] ? __init_kthread_worker+0x70/0x70

[  609.414891] Showing all locks held in the system:
[  609.415490] 4 locks held by init/1:
[  609.415836]  #0:  (reboot_mutex){+.+.+.}, at: [] 
SYSC_reboot+0xe6/0x2a0
[  609.416708]  #1:  (cpu_add_remove_lock){+.+.+.}, at: [] 
cpu_maps_update_begin+0x12/0x20
[  609.417712]  #2:  (cpu_hotplug.lock){+.+.+.}, at: [] 
cpu_hotplug_begin+0x27/0x60
[  609.418668]  #3:  (rcu_preempt_state.onoff_mutex){+.+...}, at: 
[] rcu_cleanup_dead_cpu+0x52/0x250
[  609.419819] 1 lock held by rcu_preempt/11:
[  609.420277]  #0:  (rcu_pr

Re: [PATCH 03/14] x86, ACPI: store override acpi tables phys addr

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 08:58:29PM -0800, Yinghai Lu wrote:
> As later 32bit only find table with phys address during 32bit flat mode
> in head_32.S.
> 
> To keep 32bit and 64 bit consistent, use phys_addr for all.
> 
> Use early_ioremap to access during copying.
> 
> Signed-off-by: Yinghai Lu 
> Cc: Thomas Renninger 
> Cc: Rafael J. Wysocki 
> Cc: linux-a...@vger.kernel.org
> ---
> @@ -654,10 +654,13 @@ void __init acpi_initrd_override_copy(void)
>   arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>  
>   for (no = 0; no < table_nr; no++) {
> - size_t size = early_initrd_files[no].size;
> + unsigned long size = early_initrd_files[no].size;
>  
>   p = early_ioremap(acpi_tables_addr + total_offset, size);
> - memcpy(p, early_initrd_files[no].data, size);
> + q = early_ioremap((unsigned long)early_initrd_files[no].data,
> +  size);
> + memcpy(p, q, size);
> + early_iounmap(q, size);

Ah, okay, so the loop change in the previous patch was for this, I
suppose?  That chunk probably should either be a separate patch or
rolled into this one.

Thanks.

-- 
tejun

Re: [PATCH 02/14] x86, ACPI: Split find/copy from acpi_initrd_override

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 08:58:28PM -0800, Yinghai Lu wrote:
> To parse srat early, we will need to move acpi table probing early.
> and to keep acpi_initrd_table_override working, we need to move it
> ahead.
> 
> But current that is called after init_mem_mapping and relocate_initrd().
> 
> Copying need to be after memblock is ready, because it need to allocate
> some buffer for acpi tables.
> 
> Finding will be moved into head_32.S and head64.c, just like microcode
> early scanning.
> 
> So split them at first.
> 
> Also move down functions declaration to avoid #ifdef in setup.c
> 
> Signed-off-by: Yinghai 
> Cc: Thomas Renninger 
> Cc: Pekka Enberg 
> Cc: Jacob Shin 
> Cc: Rafael J. Wysocki 
> Cc: linux-a...@vger.kernel.org
...
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index c9e36d7..b9d2ff0 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -539,6 +539,7 @@ acpi_os_predefined_override(const struct 
> acpi_predefined_names *init_val,
>  
>  static u64 acpi_tables_addr;
>  static int all_tables_size;
> +static int table_nr;

Not particularly good choice of name for static variable visible to
multiple functions.  all_tables_size isn't a stellar choice either but
no need to continue the tradition.  Maybe acpi_nr_initrd_files?  Also,
why is this one defined here away from the actual table?

> -/* Must not increase 10 or needs code modification below */
> -#define ACPI_OVERRIDE_TABLES 10
> +#define ACPI_OVERRIDE_TABLES 64

What's up with the silent bumping of table size?

> +static struct cpio_data __initdata early_initrd_files[ACPI_OVERRIDE_TABLES];

acpi_initrd_files[]?  Do we really need the "early" designation
together with initrd?

> @@ -647,14 +653,14 @@ void __init acpi_initrd_override(void *data, size_t 
> size)
>   memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
>   arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
>  
> - p = early_ioremap(acpi_tables_addr, all_tables_size);
> -
>   for (no = 0; no < table_nr; no++) {
> - memcpy(p + total_offset, early_initrd_files[no].data,
> -early_initrd_files[no].size);
> - total_offset += early_initrd_files[no].size;
> + size_t size = early_initrd_files[no].size;
> +
> + p = early_ioremap(acpi_tables_addr + total_offset, size);
> + memcpy(p, early_initrd_files[no].data, size);
> + early_iounmap(p, size);
> + total_offset += size;
>   }
> - early_iounmap(p, all_tables_size);

Why is this necessary?  Why no explanation in the description?

> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -79,14 +79,6 @@ typedef int (*acpi_tbl_table_handler)(struct 
> acpi_table_header *table);
>  typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
> const unsigned long end);
>  
> -#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
> -void acpi_initrd_override(void *data, size_t size);
> -#else
> -static inline void acpi_initrd_override(void *data, size_t size)
> -{
> -}
> -#endif
> -
>  char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
>  void __acpi_unmap_table(char *map, unsigned long size);
>  int early_acpi_boot_init(void);
> @@ -485,6 +477,14 @@ static inline bool acpi_driver_match_device(struct 
> device *dev,
>  
>  #endif   /* !CONFIG_ACPI */
>  
> +#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
> +void acpi_initrd_override_find(void *data, size_t size);
> +void acpi_initrd_override_copy(void);
> +#else
> +static inline void acpi_initrd_override_find(void *data, size_t size) { }
> +static inline void acpi_initrd_override_copy(void) { }
> +#endif

I don't get this part either.  Why is it necessary to move the
prototypes to avoid #ifdefs in setup.c?  Ah, okay, you're bringing it
outside CONFIG_ACPI so that they're defined regardless of that config
option.  Can you please add why you're moving the prototype in the
description?  Having "what" is nice but "why" is much nicer. :)
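
The stub pattern discussed here can be sketched in plain C as follows. This is an illustration under assumptions, not the header from the patch: `FEATURE_OVERRIDE` is a hypothetical switch standing in for a config option, and `override_find()`, `tables_found` and `setup()` are made-up names.

```c
static int tables_found;

#ifdef FEATURE_OVERRIDE
/* Real implementation, built only when the feature is enabled. */
static void override_find(const void *data, unsigned long size)
{
	if (data && size)
		tables_found++;
}
#else
/* Empty stub: callers compile unchanged when the feature is off. */
static inline void override_find(const void *data, unsigned long size)
{
	(void)data;
	(void)size;
}
#endif

/* The caller needs no #ifdef of its own. */
static int setup(const void *initrd, unsigned long len)
{
	override_find(initrd, len);
	return tables_found;
}
```

Because the declaration or the stub is always visible, the call site stays free of conditional compilation, which appears to be the motivation for moving the prototypes.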

Thanks.

-- 
tejun

Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-07 Thread Tejun Heo

On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu  wrote:
> They are not using memblock_find_in_range(), so 1ULL<< will not help.
>
> Really hope i915 drm guys could clean that hacks.

The code isn't being used.  Just leave it alone.  Maybe add a comment.
 The change is just making things more confusing.

-- 
tejun

Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 9:25 PM, Tejun Heo  wrote:
> On Thu, Mar 7, 2013 at 9:22 PM, Yinghai Lu  wrote:
>> On Thu, Mar 7, 2013 at 9:10 PM, Tejun Heo  wrote:
>>> On Thu, Mar 07, 2013 at 08:58:27PM -0800, Yinghai Lu wrote:
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c 
>>>> b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>>> index 69d97cb..7f9380b 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>>> @@ -81,7 +81,7 @@ static unsigned long i915_stolen_to_physical(struct 
>>>> drm_device *dev)
>>>>   base -= dev_priv->mm.gtt->stolen_size;
>>>>   } else {
>>>>   /* Stolen is immediately above Top of Memory */
>>>> - base = max_low_pfn_mapped << PAGE_SHIFT;
>>>> + base = __REMOVED_CRAZY__ << PAGE_SHIFT;
>>>
>>> Huh?
>>
>> Whole function:
>
> Yeah, but can't we still just do 1LLU << 32 like other places? Or at
> least explain what was there before? It's gonna confuse the hell out
> of future readers of the code.

They are not using memblock_find_in_range(), so 1ULL<< will not help.

Really hope i915 drm guys could clean that hacks.

Thanks

Yinghai

Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-07 Thread Tejun Heo

On Thu, Mar 7, 2013 at 9:22 PM, Yinghai Lu  wrote:
> On Thu, Mar 7, 2013 at 9:10 PM, Tejun Heo  wrote:
>> On Thu, Mar 07, 2013 at 08:58:27PM -0800, Yinghai Lu wrote:
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c 
>>> b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> index 69d97cb..7f9380b 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> @@ -81,7 +81,7 @@ static unsigned long i915_stolen_to_physical(struct 
>>> drm_device *dev)
>>>   base -= dev_priv->mm.gtt->stolen_size;
>>>   } else {
>>>   /* Stolen is immediately above Top of Memory */
>>> - base = max_low_pfn_mapped << PAGE_SHIFT;
>>> + base = __REMOVED_CRAZY__ << PAGE_SHIFT;
>>
>> Huh?
>
> Whole function:

Yeah, but can't we still just do 1LLU << 32 like other places? Or at
least explain what was there before? It's gonna confuse the hell out
of future readers of the code.

-- 
tejun

Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-07 Thread Yinghai Lu

On Thu, Mar 7, 2013 at 9:10 PM, Tejun Heo  wrote:
> On Thu, Mar 07, 2013 at 08:58:27PM -0800, Yinghai Lu wrote:
>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c 
>> b/drivers/gpu/drm/i915/i915_gem_stolen.c
>> index 69d97cb..7f9380b 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>> @@ -81,7 +81,7 @@ static unsigned long i915_stolen_to_physical(struct 
>> drm_device *dev)
>>   base -= dev_priv->mm.gtt->stolen_size;
>>   } else {
>>   /* Stolen is immediately above Top of Memory */
>> - base = max_low_pfn_mapped << PAGE_SHIFT;
>> + base = __REMOVED_CRAZY__ << PAGE_SHIFT;
>
> Huh?

Whole function:

static unsigned long i915_stolen_to_physical(struct drm_device *dev)
{
struct drm_i915_private *dev_priv = dev->dev_private;
struct pci_dev *pdev = dev_priv->bridge_dev;
u32 base;

/* On the machines I have tested the Graphics Base of Stolen Memory
 * is unreliable, so on those compute the base by subtracting the
 * stolen memory from the Top of Low Usable DRAM which is where the
 * BIOS places the graphics stolen memory.
 *
 * On gen2, the layout is slightly different with the Graphics Segment
 * immediately following Top of Memory (or Top of Usable DRAM). Note
 * it appears that TOUD is only reported by 865g, so we just use the
 * top of memory as determined by the e820 probe.
 *
 * XXX gen2 requires an unavailable symbol and 945gm fails with
 * its value of TOLUD.
 */
base = 0;
if (INTEL_INFO(dev)->gen >= 6) {
/* Read Base Data of Stolen Memory Register (BDSM) directly.
 * Note that there is also a MCHBAR miror at 0x1080c0 or
 * we could use device 2:0x5c instead.
*/
pci_read_config_dword(pdev, 0xB0, &base);
base &= ~4095; /* lower bits used for locking register */
} else if (INTEL_INFO(dev)->gen > 3 || IS_G33(dev)) {
/* Read Graphics Base of Stolen Memory directly */
pci_read_config_dword(pdev, 0xA4, &base);
#if 0
} else if (IS_GEN3(dev)) {
u8 val;
/* Stolen is immediately below Top of Low Usable DRAM */
pci_read_config_byte(pdev, 0x9c, &val);
base = val >> 3 << 27;
base -= dev_priv->mm.gtt->stolen_size;
} else {
/* Stolen is immediately above Top of Memory */
base = __REMOVED_CRAZY__ << PAGE_SHIFT;
#endif
}

return base;
}

Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-07 Thread Tejun Heo

On Thu, Mar 07, 2013 at 08:58:27PM -0800, Yinghai Lu wrote:
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c 
> b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index 69d97cb..7f9380b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -81,7 +81,7 @@ static unsigned long i915_stolen_to_physical(struct 
> drm_device *dev)
>   base -= dev_priv->mm.gtt->stolen_size;
>   } else {
>   /* Stolen is immediately above Top of Memory */
> - base = max_low_pfn_mapped << PAGE_SHIFT;
> + base = __REMOVED_CRAZY__ << PAGE_SHIFT;

Huh?

-- 
tejun

[PATCH 03/14] x86, ACPI: store override acpi tables phys addr

2013-03-07 Thread Yinghai Lu

Later on, the 32-bit code will find the tables by physical address, while
running in 32-bit flat mode in head_32.S.

To keep 32-bit and 64-bit consistent, use physical addresses for both.

Use early_ioremap to access the tables during copying.

Signed-off-by: Yinghai Lu 
Cc: Thomas Renninger 
Cc: Rafael J. Wysocki 
Cc: linux-a...@vger.kernel.org
---
 drivers/acpi/osl.c |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index b9d2ff0..60317ea 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -616,7 +616,7 @@ void __init acpi_initrd_override_find(void *data, size_t 
size)
table->signature, cpio_path, file.name, table->length);
 
all_tables_size += table->length;
-   early_initrd_files[table_nr].data = file.data;
+   early_initrd_files[table_nr].data = (void *)__pa(file.data);
early_initrd_files[table_nr].size = file.size;
table_nr++;
}
@@ -625,7 +625,7 @@ void __init acpi_initrd_override_find(void *data, size_t 
size)
 void __init acpi_initrd_override_copy(void)
 {
int no, total_offset = 0;
-   char *p;
+   char *p, *q;
 
if (!table_nr)
return;
@@ -654,10 +654,13 @@ void __init acpi_initrd_override_copy(void)
arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
for (no = 0; no < table_nr; no++) {
-   size_t size = early_initrd_files[no].size;
+   unsigned long size = early_initrd_files[no].size;
 
p = early_ioremap(acpi_tables_addr + total_offset, size);
-   memcpy(p, early_initrd_files[no].data, size);
+   q = early_ioremap((unsigned long)early_initrd_files[no].data,
+size);
+   memcpy(p, q, size);
+   early_iounmap(q, size);
early_iounmap(p, size);
total_offset += size;
}
-- 
1.7.10.4


[PATCH 11/14] x86, acpi, numa: split SLIT handling out

2013-03-07 Thread Yinghai Lu

We need to handle SLIT later, as it needs to allocate a buffer.

Also, we only need SRAT info before init_mem_mapping.

x86_acpi_numa_init is split into x86_acpi_numa_init_only_slit and
x86_acpi_numa_init_no_slit.

Signed-off-by: Yinghai Lu 
Cc: Tejun Heo 
Cc: Rafael J. Wysocki 
Cc: linux-a...@vger.kernel.org
---
 arch/x86/include/asm/acpi.h |3 ++-
 arch/x86/mm/numa.c  |   13 -
 arch/x86/mm/srat.c  |8 ++--
 drivers/acpi/numa.c |   22 +++---
 include/linux/acpi.h|2 ++
 5 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index b31bf97..9f171a7 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -178,7 +178,8 @@ static inline void disable_acpi(void) { }
 
 #ifdef CONFIG_ACPI_NUMA
 extern int acpi_numa;
-extern int x86_acpi_numa_init(void);
+int x86_acpi_numa_init_no_slit(void);
+void x86_acpi_numa_init_only_slit(void);
 #endif /* CONFIG_ACPI_NUMA */
 
 #define acpi_unlazy_tlb(x) leave_mm(x)
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index ace0370..23ec6ba 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -640,6 +640,10 @@ static int __init dummy_numa_init(void)
return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+static bool srat_used __initdata;
+#endif
+
 /**
  * x86_numa_init - Initialize NUMA
  *
@@ -658,8 +662,10 @@ void __init x86_numa_init(void)
goto out;
 #endif
 #ifdef CONFIG_ACPI_NUMA
-   if (!numa_init(x86_acpi_numa_init))
+   if (!numa_init(x86_acpi_numa_init_no_slit)) {
+   srat_used = true;
goto out;
+   }
 #endif
 #ifdef CONFIG_AMD_NUMA
if (!numa_init(amd_numa_init))
@@ -670,6 +676,11 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
 
 out:
+#ifdef CONFIG_ACPI_NUMA
+   if (srat_used)
+   x86_acpi_numa_init_only_slit();
+#endif
+
numa_emulation(&numa_meminfo, numa_distance_cnt);
 
for (i = 0; i < mi->nr_blks; i++) {
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index cdd0da9..47a62b2 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -187,11 +187,15 @@ out_err:
 
 void __init acpi_numa_arch_fixup(void) {}
 
-int __init x86_acpi_numa_init(void)
+void __init x86_acpi_numa_init_only_slit(void)
+{
+   acpi_numa_init_only_slit();
+}
+int __init x86_acpi_numa_init_no_slit(void)
 {
int ret;
 
-   ret = acpi_numa_init();
+   ret = acpi_numa_init_no_slit();
if (ret < 0)
return ret;
return srat_disabled() ? -EINVAL : 0;
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 33e609f..2215718 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -282,7 +282,13 @@ acpi_table_parse_srat(enum acpi_srat_type id,
handler, max_entries);
 }
 
-int __init acpi_numa_init(void)
+void __init acpi_numa_init_only_slit(void)
+{
+   /* SLIT: System Locality Information Table */
+   acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
+}
+
+static int __init __acpi_numa_init(bool with_slit)
 {
int cnt = 0;
 
@@ -303,8 +309,8 @@ int __init acpi_numa_init(void)
NR_NODE_MEMBLKS);
}
 
-   /* SLIT: System Locality Information Table */
-   acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
+   if (with_slit)
+   acpi_numa_init_only_slit();
 
acpi_numa_arch_fixup();
 
@@ -315,6 +321,16 @@ int __init acpi_numa_init(void)
return 0;
 }
 
+int __init acpi_numa_init(void)
+{
+   return __acpi_numa_init(true);
+}
+
+int __init acpi_numa_init_no_slit(void)
+{
+   return __acpi_numa_init(false);
+}
+
 int acpi_get_pxm(acpi_handle h)
 {
unsigned long long pxm;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 46a8a89..bfd2852 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -86,6 +86,8 @@ int acpi_boot_init (void);
 void acpi_boot_table_init (void);
 int acpi_mps_check (void);
 int acpi_numa_init (void);
+int acpi_numa_init_no_slit(void);
+void acpi_numa_init_only_slit(void);
 
 int acpi_table_init (void);
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
-- 
1.7.10.4
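
Note on the patch above: the split follows a common shape, where one internal function takes a flag and thin wrappers expose the variants. The sketch below shows only that shape, in userspace C. The function names and the two counters are hypothetical and merely stand in for the SRAT and SLIT parsing steps.

```c
#include <stdbool.h>

static int srat_parsed;	/* counts runs of the "SRAT" step */
static int slit_parsed;	/* counts runs of the "SLIT" step */

/* The part that can be deferred until a buffer can be allocated. */
static void init_only_slit(void)
{
	slit_parsed++;
}

/* Shared body; the flag decides whether the deferred part runs now. */
static int do_init(bool with_slit)
{
	srat_parsed++;
	if (with_slit)
		init_only_slit();
	return 0;
}

/* Thin wrappers keep the existing entry point and add the early one. */
static int init_all(void)     { return do_init(true); }
static int init_no_slit(void) { return do_init(false); }
```

A caller that needs only the early information calls `init_no_slit()` first and `init_only_slit()` later, while existing callers of the combined entry point keep their behaviour.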


[PATCH 06/14] x86, mm, numa: Move successful path handling code later

2013-03-07 Thread Yinghai Lu

We can move the setup_node_data() and numa_init_array() calls out of
numa_init() to make numa_init() smaller.

Those functions only need to be called on the success path, so call
them just once, in x86_numa_init().

This lets us later split the parsing of NUMA info into two stages; the
early one will run before init_mem_mapping.

Signed-off-by: Yinghai Lu 
Cc: Tejun Heo 
---
 arch/x86/mm/numa.c |   68 
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 72fe01e..24c20f0 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -480,7 +480,7 @@ static bool __init numa_meminfo_cover_memory(const struct 
numa_meminfo *mi)
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
unsigned long uninitialized_var(pfn_align);
-   int i, nid;
+   int i;
 
/* Account for nodes with cpus and no memory */
node_possible_map = numa_nodes_parsed;
@@ -509,24 +509,6 @@ static int __init numa_register_memblks(struct 
numa_meminfo *mi)
if (!numa_meminfo_cover_memory(mi))
return -EINVAL;
 
-   /* Finally register nodes. */
-   for_each_node_mask(nid, node_possible_map) {
-   u64 start = PFN_PHYS(max_pfn);
-   u64 end = 0;
-
-   for (i = 0; i < mi->nr_blks; i++) {
-   if (nid != mi->blk[i].nid)
-   continue;
-   start = min(mi->blk[i].start, start);
-   end = max(mi->blk[i].end, end);
-   }
-
-   if (start < end)
-   setup_node_data(nid, start, end);
-   }
-
-   /* Dump memblock with node info and return. */
-   memblock_dump_all();
return 0;
 }
 
@@ -580,15 +562,6 @@ static int __init numa_init(int (*init_func)(void))
if (ret < 0)
return ret;
 
-   for (i = 0; i < nr_cpu_ids; i++) {
-   int nid = early_cpu_to_node(i);
-
-   if (nid == NUMA_NO_NODE)
-   continue;
-   if (!node_online(nid))
-   numa_clear_node(i);
-   }
-   numa_init_array();
return 0;
 }
 
@@ -623,22 +596,55 @@ static int __init dummy_numa_init(void)
  */
 void __init x86_numa_init(void)
 {
+   int i, nid;
+   struct numa_meminfo *mi = &numa_meminfo;
+
if (!numa_off) {
 #ifdef CONFIG_X86_NUMAQ
if (!numa_init(numaq_numa_init))
-   return;
+   goto out;
 #endif
 #ifdef CONFIG_ACPI_NUMA
if (!numa_init(x86_acpi_numa_init))
-   return;
+   goto out;
 #endif
 #ifdef CONFIG_AMD_NUMA
if (!numa_init(amd_numa_init))
-   return;
+   goto out;
 #endif
}
 
numa_init(dummy_numa_init);
+
+out:
+   /* Finally register nodes. */
+   for_each_node_mask(nid, node_possible_map) {
+   u64 start = PFN_PHYS(max_pfn);
+   u64 end = 0;
+
+   for (i = 0; i < mi->nr_blks; i++) {
+   if (nid != mi->blk[i].nid)
+   continue;
+   start = min(mi->blk[i].start, start);
+   end = max(mi->blk[i].end, end);
+   }
+
+   if (start < end)
+   setup_node_data(nid, start, end);
+   }
+
+   /* Dump memblock with node info */
+   memblock_dump_all();
+
+   for (i = 0; i < nr_cpu_ids; i++) {
+   int nid = early_cpu_to_node(i);
+
+   if (nid == NUMA_NO_NODE)
+   continue;
+   if (!node_online(nid))
+   numa_clear_node(i);
+   }
+   numa_init_array();
 }
 
 static __init int find_near_online_node(int node)
-- 
1.7.10.4
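
Note on the patch above: the control flow it introduces (try each method in turn, fall back to a dummy, then run the common post-processing exactly once at a shared label) can be sketched in userspace C as below. All names here are hypothetical; `finalize_calls` stands in for the node registration and array setup that the patch moves after `out:`.

```c
static int finalize_calls;	/* counts runs of the shared tail */

static int fail_init(void)  { return -1; }
static int ok_init(void)    { return 0; }
static int dummy_init(void) { return 0; }

/* Try each method in order. Return the index of the first one that
 * succeeds, or n if the dummy fallback had to be used. The shared
 * post-processing runs exactly once on every path. */
static int numa_setup(int (*const methods[])(void), int n)
{
	int i, chosen = n;

	for (i = 0; i < n; i++) {
		if (!methods[i]()) {
			chosen = i;
			goto out;
		}
	}
	dummy_init();
out:
	finalize_calls++;	/* formerly duplicated inside each attempt */
	return chosen;
}
```

Keeping the tail in one place is what makes it possible to later run the attempts early and the tail at a different point in boot.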


[PATCH 04/14] x86, ACPI: make acpi override finding work with 32bit flat mode

2013-03-07 Thread Yinghai Lu

We will find ACPI tables in the initrd from head_32.S, in 32-bit flat mode.

So acpi_initrd_override_find() needs to be able to take a physical address
directly.

Signed-off-by: Yinghai Lu 
Cc: Thomas Renninger 
Cc: Pekka Enberg 
Cc: Jacob Shin 
Cc: Rafael J. Wysocki 
Cc: linux-a...@vger.kernel.org
---
 arch/x86/kernel/setup.c |2 +-
 drivers/acpi/osl.c  |   84 +++
 include/linux/acpi.h|4 +--
 3 files changed, 58 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e2913e9..668e658 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1093,7 +1093,7 @@ void __init setup_arch(char **cmdline_p)
reserve_initrd();
 
acpi_initrd_override_find((void *)initrd_start,
-   initrd_end - initrd_start);
+   initrd_end - initrd_start, false);
acpi_initrd_override_copy();
 
reserve_crashkernel();
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 60317ea..b375159 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -552,38 +552,47 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
return sum;
 }
 
-/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
-static const char * const table_sigs[] = {
-   ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
-   ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
-   ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
-   ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
-   ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
-   ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
-   ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
-   ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
-   ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
-
 /* Non-fatal errors: Affected tables/files are ignored */
 #define INVALID_TABLE(x, path, name)   \
-   { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
+   do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
 #define ACPI_OVERRIDE_TABLES 64
 static struct cpio_data __initdata early_initrd_files[ACPI_OVERRIDE_TABLES];
 
-void __init acpi_initrd_override_find(void *data, size_t size)
+void __init acpi_initrd_override_find(void *data, size_t size, bool is_phys)
 {
int sig, no;
long offset = 0;
struct acpi_table_header *table;
char cpio_path[32] = "kernel/firmware/acpi/";
struct cpio_data file;
+   struct cpio_data *files = early_initrd_files;
+   int *all_tables_size_p = &all_tables_size;
+   int *table_nr_p = &table_nr;
+
+   /* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
+   char *table_sigs[] = {
+   ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
+   ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
+   ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
+   ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
+   ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
+   ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
+   ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
+   ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
+   ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
 
if (data == NULL || size == 0)
return;
 
+   if (is_phys) {
+   files = (struct cpio_data *)__pa_symbol(early_initrd_files);
+   all_tables_size_p = (int *)__pa_symbol(&all_tables_size);
+   table_nr_p = (int *)__pa_symbol(&table_nr);
+   }
+
for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
file = find_cpio_data(cpio_path, data, size, &offset);
if (!file.data)
@@ -592,9 +601,12 @@ void __init acpi_initrd_override_find(void *data, size_t size)
data += offset;
size -= offset;
 
-   if (file.size < sizeof(struct acpi_table_header))
-   INVALID_TABLE("Table smaller than ACPI header",
+   if (file.size < sizeof(struct acpi_table_header)) {
+   if (!is_phys)
+   INVALID_TABLE("Table smaller than ACPI header",
  cpio_path, file.name);
+   continue;
+   }
 
table = file.data;
 
@@ -602,23 +614,34 @@ void __init acpi_initrd_override_find(void *data, size_t size)
if (!memcmp(table->signature, table_sigs[sig], 4))
break;
 
-   if (!table_sigs[sig])
-   INVALID_TABLE("Unknown sign

[PATCH 10/14] x86, mm, numa: Move emulation handling down.

2013-03-07 Thread Yinghai Lu

NUMA emulation will need to allocate buffers for the new numa_meminfo and
distance matrix, so move it down.

This also changes the behavior:
Before this patch, if the user passes wrong data on the command line, it
falls back to the next NUMA method or disables NUMA.
After this patch, if the user passes wrong data on the command line, it
stays with the NUMA info from earlier probing, such as ACPI SRAT or amd_numa.
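
The "keep the earlier result on failure" behavior depends on validating into a temporary and committing only on success, which is what the tmp_node_map change in the diff below does. A minimal user-space sketch of that pattern follows. The types, MAX_NODES, and the validity test are simplified stand-ins chosen for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 64                  /* assumption: one mask bit per node */

typedef uint64_t nodemask_t;          /* simplified stand-in for the kernel type */

struct memblk {                       /* simplified stand-in for numa_memblk */
	uint64_t start, end;
	int nid;
};

static nodemask_t node_possible_map;  /* global state, changed only on success */

/* Returns 0 and commits the new mask, or -1 and leaves the global untouched. */
static int check_memblks(nodemask_t parsed, const struct memblk *blk, int nr)
{
	nodemask_t tmp = parsed;      /* work on a private copy */
	int i;

	assert(blk != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		/* Hypothetical validity test standing in for the real checks. */
		if (blk[i].nid < 0 || blk[i].nid >= MAX_NODES ||
		    blk[i].start >= blk[i].end)
			return -1;
		tmp |= (nodemask_t)1 << blk[i].nid;
	}
	if (!tmp)
		return -1;

	node_possible_map = tmp;      /* commit only after every check passed */
	return 0;
}
```

Because a failed check never writes the global, a caller such as the emulation path can try a candidate layout and simply discard it when the check fails.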

Signed-off-by: Yinghai Lu 
Cc: Tejun Heo 
Cc: David Rientjes 
---
 arch/x86/mm/numa.c   |   15 +--
 arch/x86/mm/numa_emulation.c |2 +-
 arch/x86/mm/numa_internal.h  |2 ++
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index e875c2b..ace0370 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -537,14 +537,16 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 }
 #endif
 
-static int __init numa_check_memblks(struct numa_meminfo *mi)
+
+int __init numa_check_memblks(struct numa_meminfo *mi)
 {
+   nodemask_t tmp_node_map;
unsigned long pfn_align;
 
/* Account for nodes with cpus and no memory */
-   node_possible_map = numa_nodes_parsed;
-   numa_nodemask_from_meminfo(&node_possible_map, mi);
-   if (WARN_ON(nodes_empty(node_possible_map)))
+   tmp_node_map = numa_nodes_parsed;
+   numa_nodemask_from_meminfo(&tmp_node_map, mi);
+   if (WARN_ON(nodes_empty(tmp_node_map)))
return -EINVAL;
 
if (!numa_meminfo_cover_memory(mi))
@@ -562,6 +564,7 @@ static int __init numa_check_memblks(struct numa_meminfo *mi)
return -EINVAL;
}
 
+   node_possible_map = tmp_node_map;
return 0;
 }
 
@@ -608,8 +611,6 @@ static int __init numa_init(int (*init_func)(void))
if (ret < 0)
return ret;
 
-   numa_emulation(&numa_meminfo, numa_distance_cnt);
-
ret = numa_check_memblks(&numa_meminfo);
if (ret < 0)
return ret;
@@ -669,6 +670,8 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
 
 out:
+   numa_emulation(&numa_meminfo, numa_distance_cnt);
+
for (i = 0; i < mi->nr_blks; i++) {
struct numa_memblk *mb = &mi->blk[i];
memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index d47..5a0433d 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -348,7 +348,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
if (ret < 0)
goto no_emu;
 
-   if (numa_cleanup_meminfo(&ei) < 0) {
+   if (numa_cleanup_meminfo(&ei) < 0 || numa_check_memblks(&ei) < 0) {
pr_warning("NUMA: Warning: constructed meminfo invalid, disabling emulation\n");
goto no_emu;
}
diff --git a/arch/x86/mm/numa_internal.h b/arch/x86/mm/numa_internal.h
index ad86ec9..bb2fbcc 100644
--- a/arch/x86/mm/numa_internal.h
+++ b/arch/x86/mm/numa_internal.h
@@ -21,6 +21,8 @@ void __init numa_reset_distance(void);
 
 void __init x86_numa_init(void);
 
+int __init numa_check_memblks(struct numa_meminfo *mi);
+
 #ifdef CONFIG_NUMA_EMU
 void __init numa_emulation(struct numa_meminfo *numa_meminfo,
   int numa_dist_cnt);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-07 Thread Yinghai Lu

Now that we have the arch_pfn_mapped array, max_low_pfn_mapped should not
be used anymore.

The only user is ACPI_OVERRIDE, and it should not use it, because later
accesses go through early_ioremap(). Change it to try below 4G first and
then above 4G.

The other user is in drm/i915, but it is commented out.

Users should use arch_pfn_mapped or just 1<<(32-PAGE_SHIFT) instead.
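
The "below 4G first, then above 4G" lookup can be sketched in user space as follows. find_in_range() here is a hypothetical first-fit stand-in for memblock_find_in_range(), and it is a simplification: it searches bottom-up and works on a caller-supplied list of free ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* free region, end exclusive */

/* First fit of size bytes inside [lo, hi); returns 0 when nothing fits. */
static uint64_t find_in_range(const struct range *free_r, size_t n,
			      uint64_t lo, uint64_t hi,
			      uint64_t size, uint64_t align)
{
	size_t i;

	assert(align && (align & (align - 1)) == 0);  /* power of two */

	for (i = 0; i < n; i++) {
		uint64_t s = free_r[i].start > lo ? free_r[i].start : lo;
		uint64_t e = free_r[i].end < hi ? free_r[i].end : hi;

		s = (s + align - 1) & ~(align - 1);   /* round up to alignment */
		if (s == 0)
			s = align;                    /* 0 means "not found" */
		if (s < e && e - s >= size)
			return s;
	}
	return 0;
}

/* Prefer memory under 4G; fall back to memory above 4G. */
static uint64_t find_low_then_high(const struct range *free_r, size_t n,
				   uint64_t size, uint64_t align)
{
	uint64_t addr = find_in_range(free_r, n, 0, 1ULL << 32, size, align);

	if (!addr)
		addr = find_in_range(free_r, n, 1ULL << 32, UINT64_MAX,
				     size, align);
	return addr;
}
```

One consequence of searching the two windows separately is that a free block straddling the 4G boundary is only usable if one of its halves is large enough on its own.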

Suggested-by: H. Peter Anvin 
Signed-off-by: Yinghai Lu 
Cc: Thomas Renninger 
Cc: "Rafael J. Wysocki" 
Cc: Daniel Vetter 
Cc: David Airlie 
Cc: Jacob Shin 
Cc: linux-a...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
---
 arch/x86/include/asm/page_types.h  |1 -
 arch/x86/kernel/setup.c|4 +---
 arch/x86/mm/init.c |4 
 drivers/acpi/osl.c |9 ++---
 drivers/gpu/drm/i915/i915_gem_stolen.c |2 +-
 5 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 54c9787..b012b82 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -43,7 +43,6 @@
 
 extern int devmem_is_allowed(unsigned long pagenr);
 
-extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
 static inline phys_addr_t get_max_mapped(void)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 90d8cc9..4dcaae7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -113,13 +113,11 @@
 #include 
 
 /*
- * max_low_pfn_mapped: highest direct mapped pfn under 4GB
- * max_pfn_mapped: highest direct mapped pfn over 4GB
+ * max_pfn_mapped: highest direct mapped pfn
  *
  * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
  * represented by pfn_mapped
  */
-unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
 
 #ifdef CONFIG_DMI
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 59b7fc4..abcc241 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -313,10 +313,6 @@ static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
nr_pfn_mapped = clean_sort_range(pfn_mapped, E820_X_MAX);
 
max_pfn_mapped = max(max_pfn_mapped, end_pfn);
-
-   if (start_pfn < (1UL<<(32-PAGE_SHIFT)))
-   max_low_pfn_mapped = max(max_low_pfn_mapped,
-min(end_pfn, 1UL<<(32-PAGE_SHIFT)));
 }
 
 bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 586e7e9..c9e36d7 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -624,9 +624,12 @@ void __init acpi_initrd_override(void *data, size_t size)
if (table_nr == 0)
return;
 
-   acpi_tables_addr =
-   memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
-  all_tables_size, PAGE_SIZE);
+   /* under 4G at first, then above 4G */
+   acpi_tables_addr = memblock_find_in_range(0, 1ULL<<32,
+   all_tables_size, PAGE_SIZE);
+   if (!acpi_tables_addr)
+   acpi_tables_addr = memblock_find_in_range(1ULL<<32, -1ULL,
+   all_tables_size, PAGE_SIZE);
if (!acpi_tables_addr) {
WARN_ON(1);
return;
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 69d97cb..7f9380b 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -81,7 +81,7 @@ static unsigned long i915_stolen_to_physical(struct drm_device *dev)
base -= dev_priv->mm.gtt->stolen_size;
} else {
/* Stolen is immediately above Top of Memory */
-   base = max_low_pfn_mapped << PAGE_SHIFT;
+   base = __REMOVED_CRAZY__ << PAGE_SHIFT;
 #endif
}
 
-- 
1.7.10.4

[PATCH 08/14] x86, mm, numa: use numa_meminfo to check node_map_pfn alignment

2013-03-07 Thread Yinghai Lu

We can use numa_meminfo directly instead of the memblock nid.

That lets us move setting the memblock nid down and do it only once, on
the successful path.

Move node_map_pfn_alignment() to arch/x86/mm, as it has no other user.
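
As a worked example of the alignment calculation described in the node_map_pfn_alignment() comment in the diff below, here is a user-space port of the loop. It assumes 4 KiB pages; struct blk and lowest_bit() are illustrative stand-ins for struct numa_memblk and __ffs(), and the sketch shifts 1UL where the patch shifts a plain 1.

```c
#include <assert.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

struct blk {            /* simplified stand-in for struct numa_memblk */
	unsigned long long start, end;   /* byte addresses, end exclusive */
	int nid;
};

/* Index of the least significant set bit, like the kernel's __ffs(). */
static unsigned int lowest_bit(unsigned long x)
{
	unsigned int n = 0;

	assert(x != 0);
	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}

static unsigned long pfn_alignment(const struct blk *b, int nr)
{
	unsigned long accl_mask = 0, last_end = 0;
	unsigned long start, end, mask;
	int last_nid = -1;
	int i;

	for (i = 0; i < nr; i++) {
		start = (unsigned long)(b[i].start >> PAGE_SHIFT);
		end = (unsigned long)(b[i].end >> PAGE_SHIFT);
		if (!start || last_nid < 0 || last_nid == b[i].nid) {
			last_nid = b[i].nid;
			last_end = end;
			continue;
		}
		/* Coarsen the mask until it can no longer separate the nodes. */
		mask = ~((1UL << lowest_bit(start)) - 1);
		while (mask && last_end <= (start & (mask << 1)))
			mask <<= 1;
		accl_mask |= mask;
	}
	return ~accl_mask + 1;   /* mask -> number of pages, 0 for one node */
}
```

With two 1 GiB nodes aligned to 1 GiB this yields 1 << (30 - PAGE_SHIFT) pages; if the node boundaries sit at 256 MiB offsets the result drops to 1 << (28 - PAGE_SHIFT), matching the cases named in the comment.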

Signed-off-by: Yinghai Lu 
Cc: Tejun Heo 
---
 arch/x86/mm/numa.c |   76 +---
 include/linux/mm.h |1 -
 mm/page_alloc.c|   50 --
 3 files changed, 67 insertions(+), 60 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 6df5028..b8cc248 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -477,9 +477,69 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
return true;
 }
 
+/**
+ * node_map_pfn_alignment - determine the maximum internode alignment
+ *
+ * This function should be called after node map is populated and sorted.
+ * It calculates the maximum power of two alignment which can distinguish
+ * all the nodes.
+ *
+ * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
+ * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
+ * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
+ * shifted, 1GiB is enough and this function will indicate so.
+ *
+ * This is used to test whether pfn -> nid mapping of the chosen memory
+ * model has fine enough granularity to avoid incorrect mapping for the
+ * populated node map.
+ *
+ * Returns the determined alignment in pfn's.  0 if there is no alignment
+ * requirement (single node).
+ */
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
+{
+   unsigned long accl_mask = 0, last_end = 0;
+   unsigned long start, end, mask;
+   int last_nid = -1;
+   int i, nid;
+
+   for (i = 0; i < mi->nr_blks; i++) {
+   start = mi->blk[i].start >> PAGE_SHIFT;
+   end = mi->blk[i].end >> PAGE_SHIFT;
+   nid = mi->blk[i].nid;
+   if (!start || last_nid < 0 || last_nid == nid) {
+   last_nid = nid;
+   last_end = end;
+   continue;
+   }
+
+   /*
+* Start with a mask granular enough to pin-point to the
+* start pfn and tick off bits one-by-one until it becomes
+* too coarse to separate the current node from the last.
+*/
+   mask = ~((1 << __ffs(start)) - 1);
+   while (mask && last_end <= (start & (mask << 1)))
+   mask <<= 1;
+
+   /* accumulate all internode masks */
+   accl_mask |= mask;
+   }
+
+   /* convert mask to number of pages */
+   return ~accl_mask + 1;
+}
+#else
+static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
+{
+   return 0;
+}
+#endif
+
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
-   unsigned long uninitialized_var(pfn_align);
+   unsigned long pfn_align;
int i;
 
/* Account for nodes with cpus and no memory */
@@ -491,24 +551,22 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
if (!numa_meminfo_cover_memory(mi))
return -EINVAL;
 
-   for (i = 0; i < mi->nr_blks; i++) {
-   struct numa_memblk *mb = &mi->blk[i];
-   memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-   }
-
/*
 * If sections array is gonna be used for pfn -> nid mapping, check
 * whether its granularity is fine enough.
 */
-#ifdef NODE_NOT_IN_PAGE_FLAGS
-   pfn_align = node_map_pfn_alignment();
+   pfn_align = node_map_pfn_alignment(mi);
if (pfn_align && pfn_align < PAGES_PER_SECTION) {
printk(KERN_WARNING "Node alignment %LuMB < min %LuMB, rejecting NUMA config\n",
   PFN_PHYS(pfn_align) >> 20,
   PFN_PHYS(PAGES_PER_SECTION) >> 20);
return -EINVAL;
}
-#endif
+
+   for (i = 0; i < mi->nr_blks; i++) {
+   struct numa_memblk *mb = &mi->blk[i];
+   memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
+   }
 
return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2ae2050..1c79b10 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1323,7 +1323,6 @@ extern void free_initmem(void);
  * CONFIG_HAVE_MEMBLOCK_NODE_MAP.
  */
 extern void free_area_init_nodes(unsigned long *max_zone_pfn);
-unsigned long node_map_pfn_alignment(void);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 580d919..f368db4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4725,56 +4725,6 @@ static inlin

[PATCH 02/14] x86, ACPI: Split find/copy from acpi_initrd_override

2013-03-07 Thread Yinghai Lu

To parse SRAT early, we need to move ACPI table probing earlier, and to
keep acpi_initrd_table_override working, we need to move that earlier as
well.

But currently it is called after init_mem_mapping() and relocate_initrd().

Copying needs to happen after memblock is ready, because it needs to
allocate a buffer for the ACPI tables.

Finding will be moved into head_32.S and head64.c, just like early
microcode scanning.

So split them first.

Also move the function declarations down to avoid #ifdef in setup.c.
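
The find/copy split can be pictured as two phases that share a small table: the first only records where each file lives and how big it is, so it needs no allocator, and the second copies everything once a destination buffer exists. The sketch below is a user-space illustration under that reading. struct found, override_find(), and override_copy() are hypothetical names, and a plain caller-supplied buffer stands in for the memblock allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TABLES 64

struct found {                 /* simplified stand-in for struct cpio_data */
	const void *data;
	size_t size;
};

static struct found found_files[MAX_TABLES];
static int table_nr;
static size_t all_tables_size;

/* Phase 1: record pointers and sizes only; nothing is allocated here. */
static void override_find(const struct found *candidates, int n,
			  size_t min_size)
{
	int i;

	assert(candidates != NULL || n == 0);

	for (i = 0; i < n && table_nr < MAX_TABLES; i++) {
		if (candidates[i].size < min_size)
			continue;          /* too small to hold a header */
		found_files[table_nr++] = candidates[i];
		all_tables_size += candidates[i].size;
	}
}

/* Phase 2: copy the recorded files back to back into dst. */
static size_t override_copy(void *dst, size_t dst_size)
{
	size_t off = 0;
	int no;

	if (!table_nr || dst_size < all_tables_size)
		return 0;

	for (no = 0; no < table_nr; no++) {
		memcpy((char *)dst + off, found_files[no].data,
		       found_files[no].size);
		off += found_files[no].size;
	}
	return off;
}
```

Keeping the shared state in file-scope variables mirrors the patch, where table_nr and early_initrd_files move out of the function so that both halves can see them.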

Signed-off-by: Yinghai 
Cc: Thomas Renninger 
Cc: Pekka Enberg 
Cc: Jacob Shin 
Cc: Rafael J. Wysocki 
Cc: linux-a...@vger.kernel.org
---
 arch/x86/kernel/setup.c |6 +++---
 drivers/acpi/osl.c  |   32 +++-
 include/linux/acpi.h|   16 
 3 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 4dcaae7..e2913e9 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1092,9 +1092,9 @@ void __init setup_arch(char **cmdline_p)
 
reserve_initrd();
 
-#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
-   acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start);
-#endif
+   acpi_initrd_override_find((void *)initrd_start,
+   initrd_end - initrd_start);
+   acpi_initrd_override_copy();
 
reserve_crashkernel();
 
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index c9e36d7..b9d2ff0 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -539,6 +539,7 @@ acpi_os_predefined_override(const struct acpi_predefined_names *init_val,
 
 static u64 acpi_tables_addr;
 static int all_tables_size;
+static int table_nr;
 
 /* Copied from acpica/tbutils.c:acpi_tb_checksum() */
 u8 __init acpi_table_checksum(u8 *buffer, u32 length)
@@ -569,18 +570,16 @@ static const char * const table_sigs[] = {
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
-/* Must not increase 10 or needs code modification below */
-#define ACPI_OVERRIDE_TABLES 10
+#define ACPI_OVERRIDE_TABLES 64
+static struct cpio_data __initdata early_initrd_files[ACPI_OVERRIDE_TABLES];
 
-void __init acpi_initrd_override(void *data, size_t size)
+void __init acpi_initrd_override_find(void *data, size_t size)
 {
-   int sig, no, table_nr = 0, total_offset = 0;
+   int sig, no;
long offset = 0;
struct acpi_table_header *table;
char cpio_path[32] = "kernel/firmware/acpi/";
struct cpio_data file;
-   struct cpio_data early_initrd_files[ACPI_OVERRIDE_TABLES];
-   char *p;
 
if (data == NULL || size == 0)
return;
@@ -621,7 +620,14 @@ void __init acpi_initrd_override(void *data, size_t size)
early_initrd_files[table_nr].size = file.size;
table_nr++;
}
-   if (table_nr == 0)
+}
+
+void __init acpi_initrd_override_copy(void)
+{
+   int no, total_offset = 0;
+   char *p;
+
+   if (!table_nr)
return;
 
/* under 4G at first, then above 4G */
@@ -647,14 +653,14 @@ void __init acpi_initrd_override(void *data, size_t size)
memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
-   p = early_ioremap(acpi_tables_addr, all_tables_size);
-
for (no = 0; no < table_nr; no++) {
-   memcpy(p + total_offset, early_initrd_files[no].data,
-  early_initrd_files[no].size);
-   total_offset += early_initrd_files[no].size;
+   size_t size = early_initrd_files[no].size;
+
+   p = early_ioremap(acpi_tables_addr + total_offset, size);
+   memcpy(p, early_initrd_files[no].data, size);
+   early_iounmap(p, size);
+   total_offset += size;
}
-   early_iounmap(p, all_tables_size);
 }
 #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index bcbdd74..1654a241 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -79,14 +79,6 @@ typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);
 typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
  const unsigned long end);
 
-#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
-void acpi_initrd_override(void *data, size_t size);
-#else
-static inline void acpi_initrd_override(void *data, size_t size)
-{
-}
-#endif
-
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
@@ -485,6 +477,14 @@ static inline bool acpi_driver_match_device(struct device *dev,
 
 #endif /* !CONFIG_ACPI */
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void acpi_initrd_override_find(void *data, size_t size);
+void acpi_initrd_override_copy(void);
+#else
+static inline v

[PATCH 07/14] x86, mm, numa: call numa_meminfo_cover_memory() early

2013-03-07 Thread Yinghai Lu

We do not need the nid in memblock to find absent pages.

So we can call numa_meminfo_cover_memory() earlier, before setting the
memblock nid.

We can also make __absent_pages_in_range() static and use
absent_pages_in_range() directly.

Later we will set the memblock nid only once, on the successful path.
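
To illustrate why the coverage check does not need node ids, here is a user-space sketch of the idea: count the pages each node block really backs, count all RAM pages below max_pfn, and accept the layout when the two agree within a small slack. The 1 MiB slack and the overall shape follow my reading of numa_meminfo_cover_memory(), only part of which is visible in the diff, so treat those details as assumptions. The RAM spans are assumed not to overlap.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12                          /* assumption: 4 KiB pages */
#define SLACK_PAGES (1UL << (20 - PAGE_SHIFT)) /* 1 MiB of tolerated mismatch */

struct span { unsigned long start_pfn, end_pfn; };  /* end exclusive */

/* Pages in [s, e) not backed by any RAM span; no node id is involved. */
static unsigned long absent_pages(const struct span *ram, int nr_ram,
				  unsigned long s, unsigned long e)
{
	unsigned long present = 0;
	int i;

	assert(s <= e);
	for (i = 0; i < nr_ram; i++) {
		unsigned long lo = ram[i].start_pfn > s ? ram[i].start_pfn : s;
		unsigned long hi = ram[i].end_pfn < e ? ram[i].end_pfn : e;

		if (lo < hi)
			present += hi - lo;
	}
	return (e - s) - present;
}

/* True if the node blocks account for (nearly) all RAM below max_pfn. */
static bool cover_memory(const struct span *blk, int nr_blk,
			 const struct span *ram, int nr_ram,
			 unsigned long max_pfn)
{
	unsigned long numaram = 0, ram_pages;
	int i;

	for (i = 0; i < nr_blk; i++) {
		unsigned long s = blk[i].start_pfn, e = blk[i].end_pfn;

		numaram += (e - s) - absent_pages(ram, nr_ram, s, e);
	}
	ram_pages = max_pfn - absent_pages(ram, nr_ram, 0, max_pfn);

	return ram_pages < numaram || ram_pages - numaram < SLACK_PAGES;
}
```

Since the hole accounting only looks at pfn ranges, the check can run before any nid has been written into memblock.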

Signed-off-by: Yinghai Lu 
Cc: Tejun Heo 
---
 arch/x86/mm/numa.c |7 ---
 include/linux/mm.h |2 --
 mm/page_alloc.c|2 +-
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 24c20f0..6df5028 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -460,7 +460,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
u64 s = mi->blk[i].start >> PAGE_SHIFT;
u64 e = mi->blk[i].end >> PAGE_SHIFT;
numaram += e - s;
-   numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
+   numaram -= absent_pages_in_range(s, e);
if ((s64)numaram < 0)
numaram = 0;
}
@@ -488,6 +488,9 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
if (WARN_ON(nodes_empty(node_possible_map)))
return -EINVAL;
 
+   if (!numa_meminfo_cover_memory(mi))
+   return -EINVAL;
+
for (i = 0; i < mi->nr_blks; i++) {
struct numa_memblk *mb = &mi->blk[i];
memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
@@ -506,8 +509,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
return -EINVAL;
}
 #endif
-   if (!numa_meminfo_cover_memory(mi))
-   return -EINVAL;
 
return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7acc9dc..2ae2050 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1324,8 +1324,6 @@ extern void free_initmem(void);
  */
 extern void free_area_init_nodes(unsigned long *max_zone_pfn);
 unsigned long node_map_pfn_alignment(void);
-unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
-   unsigned long end_pfn);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..580d919 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4356,7 +4356,7 @@ static unsigned long __meminit zone_spanned_pages_in_node(int nid,
  * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
  * then all holes in the requested range will be accounted for.
  */
-unsigned long __meminit __absent_pages_in_range(int nid,
+static unsigned long __meminit __absent_pages_in_range(int nid,
unsigned long range_start_pfn,
unsigned long range_end_pfn)
 {
-- 
1.7.10.4

[PATCH 13/14] x86, mm: Parse numa info early

2013-03-07 Thread Yinghai Lu

Parse NUMA info first and store it in numa_meminfo.

Calling early_initmem_init() before init_memory_mapping() makes the NUMA
info available early, and still keeps the numaq, acpi_numa, amd_numa,
dummy fallback sequence.

SLIT and NUMA emulation handling are still left in initmem_init().
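
The fallback sequence mentioned above is a "first probe that succeeds wins" chain whose last entry never fails. A minimal user-space sketch follows. The probe functions and first_successful() are hypothetical names for illustration, and the numa_off handling is reduced to skipping straight to the last entry.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*probe_fn)(void);   /* returns 0 on success, negative on failure */

/* Hypothetical probes for the demonstration. */
static int probe_fails(void) { return -1; }
static int probe_works(void) { return 0; }
static int probe_dummy(void) { return 0; }   /* last resort, never fails */

/* Run probes in order; return the index of the first one that succeeds. */
static int first_successful(const probe_fn *probes, size_t n, int numa_off)
{
	size_t i;

	assert(n > 0);
	/* With NUMA disabled, skip straight to the final (dummy) entry. */
	for (i = numa_off ? n - 1 : 0; i < n; i++)
		if (!probes[i]())
			return (int)i;
	return -1;
}
```

Because the chain only depends on each probe's return value, it can run before init_mem_mapping() as long as the probes themselves need no mapped memory beyond what is already available.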

Signed-off-by: Yinghai Lu 
Cc: Pekka Enberg 
Cc: Jacob Shin 
---
 arch/x86/kernel/setup.c |   24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4f1c63..29a6b94 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1096,13 +1096,21 @@ void __init setup_arch(char **cmdline_p)
trim_platform_memory_ranges();
trim_low_memory_range();
 
+   /*
+* Parse the ACPI tables for possible boot-time SMP configuration.
+*/
+   acpi_initrd_override_copy();
+   acpi_boot_table_init();
+   early_acpi_boot_init();
+   early_initmem_init();
init_mem_mapping();
-
+   memblock.current_limit = get_max_mapped();
early_trap_pf_init();
 
+   reserve_initrd();
+
setup_real_mode();
 
-   memblock.current_limit = get_max_mapped();
dma_contiguous_reserve(0);
 
/*
@@ -1116,24 +1124,12 @@ void __init setup_arch(char **cmdline_p)
/* Allocate bigger log buffer */
setup_log_buf(1);
 
-   reserve_initrd();
-
-   acpi_initrd_override_copy();
-
reserve_crashkernel();
 
vsmp_init();
 
io_delay_init();
 
-   /*
-* Parse the ACPI tables for possible boot-time SMP configuration.
-*/
-   acpi_boot_table_init();
-
-   early_acpi_boot_init();
-
-   early_initmem_init();
initmem_init();
memblock_find_dma_reserve();
 
-- 
1.7.10.4

[PATCH 09/14] x86, mm, numa: set memblock nid later

2013-03-07 Thread Yinghai Lu

Only set the memblock nid once.

Also rename numa_register_memblks() to numa_check_memblks() now that
setting the memblock nid has moved out of it.

Signed-off-by: Yinghai Lu 
Cc: Tejun Heo 
---
 arch/x86/mm/numa.c |   16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index b8cc248..e875c2b 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -537,10 +537,9 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 }
 #endif
 
-static int __init numa_register_memblks(struct numa_meminfo *mi)
+static int __init numa_check_memblks(struct numa_meminfo *mi)
 {
unsigned long pfn_align;
-   int i;
 
/* Account for nodes with cpus and no memory */
node_possible_map = numa_nodes_parsed;
@@ -563,11 +562,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
return -EINVAL;
}
 
-   for (i = 0; i < mi->nr_blks; i++) {
-   struct numa_memblk *mb = &mi->blk[i];
-   memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-   }
-
return 0;
 }
 
@@ -605,7 +599,6 @@ static int __init numa_init(int (*init_func)(void))
nodes_clear(node_possible_map);
nodes_clear(node_online_map);
memset(&numa_meminfo, 0, sizeof(numa_meminfo));
-   WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
numa_reset_distance();
 
ret = init_func();
@@ -617,7 +610,7 @@ static int __init numa_init(int (*init_func)(void))
 
numa_emulation(&numa_meminfo, numa_distance_cnt);
 
-   ret = numa_register_memblks(&numa_meminfo);
+   ret = numa_check_memblks(&numa_meminfo);
if (ret < 0)
return ret;
 
@@ -676,6 +669,11 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
 
 out:
+   for (i = 0; i < mi->nr_blks; i++) {
+   struct numa_memblk *mb = &mi->blk[i];
+   memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
+   }
+
/* Finally register nodes. */
for_each_node_mask(nid, node_possible_map) {
u64 start = PFN_PHYS(max_pfn);
-- 
1.7.10.4

[PATCH 12/14] x86, mm, numa: Add early_initmem_init() stub

2013-03-07 Thread Yinghai Lu

early_initmem_init() will call early_x86_numa_init().

A later patch will call init_mem_mapping() for the nodes in it.

Signed-off-by: Yinghai Lu 
Cc: Tejun Heo 
Cc: Pekka Enberg 
Cc: Jacob Shin 
---
 arch/x86/include/asm/page_types.h |1 +
 arch/x86/kernel/setup.c   |1 +
 arch/x86/mm/init_32.c |3 +++
 arch/x86/mm/init_64.c |3 +++
 arch/x86/mm/numa.c|   23 +++
 5 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index b012b82..d04dd8c 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -55,6 +55,7 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
 extern unsigned long init_memory_mapping(unsigned long start,
 unsigned long end);
 
+void early_initmem_init(void);
 extern void initmem_init(void);
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d43545a..c4f1c63 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1133,6 +1133,7 @@ void __init setup_arch(char **cmdline_p)
 
early_acpi_boot_init();
 
+   early_initmem_init();
initmem_init();
memblock_find_dma_reserve();
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 2d19001..3801962 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -660,6 +660,9 @@ void __init find_low_pfn_range(void)
 }
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
+void __init early_initmem_init(void)
+{
+}
 void __init initmem_init(void)
 {
 #ifdef CONFIG_HIGHMEM
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 474e28f..218a4e5 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -640,6 +640,9 @@ kernel_physical_mapping_init(unsigned long start,
 }
 
 #ifndef CONFIG_NUMA
+void __init early_initmem_init(void)
+{
+}
 void __init initmem_init(void)
 {
memblock_set_node(0, (phys_addr_t)ULLONG_MAX, 0);
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 23ec6ba..643b39a 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -651,31 +651,38 @@ static bool srat_used __initdata;
  * last fallback is dummy single node config encomapssing whole memory and
  * never fails.
  */
-void __init x86_numa_init(void)
+static void __init early_x86_numa_init(void)
 {
-   int i, nid;
-   struct numa_meminfo *mi = &numa_meminfo;
-
if (!numa_off) {
 #ifdef CONFIG_X86_NUMAQ
if (!numa_init(numaq_numa_init))
-   goto out;
+   return;
 #endif
 #ifdef CONFIG_ACPI_NUMA
if (!numa_init(x86_acpi_numa_init_no_slit)) {
srat_used = true;
-   goto out;
+   return;
}
 #endif
 #ifdef CONFIG_AMD_NUMA
if (!numa_init(amd_numa_init))
-   goto out;
+   return;
 #endif
}
 
numa_init(dummy_numa_init);
+}
+
+void __init early_initmem_init(void)
+{
+   early_x86_numa_init();
+}
+
+void __init x86_numa_init(void)
+{
+   int i, nid;
+   struct numa_meminfo *mi = &numa_meminfo;
 
-out:
 #ifdef CONFIG_ACPI_NUMA
if (srat_used)
x86_acpi_numa_init_only_slit();
-- 
1.7.10.4

[PATCH 05/14] x86, ACPI: Find acpi tables in initrd early at head_32.S/head64.c

2013-03-07 Thread Yinghai Lu

head64.c can use the #PF handler to set up page tables and access the
initrd before it is mapped and relocated.

head_32.S can use 32-bit flat mode to access the initrd before it is
mapped and relocated.

Signed-off-by: Yinghai Lu 
Cc: Thomas Renninger 
Cc: Pekka Enberg 
Cc: Jacob Shin 
Cc: Rafael J. Wysocki 
Cc: linux-a...@vger.kernel.org
---
 arch/x86/include/asm/setup.h |2 ++
 arch/x86/kernel/head64.c |2 ++
 arch/x86/kernel/head_32.S|4 
 arch/x86/kernel/setup.c  |   28 ++--
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..b09db26 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -42,6 +42,8 @@ extern void visws_early_detect(void);
 static inline void visws_early_detect(void) { }
 #endif
 
+void x86_acpi_override_find(void);
+
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index c5e403f..a31bc63 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -174,6 +174,8 @@ void __init x86_64_start_kernel(char * real_mode_data)
if (console_loglevel == 10)
early_printk("Kernel alive\n");
 
+   x86_acpi_override_find();
+
clear_page(init_level4_pgt);
/* set init_level4_pgt kernel high mapping*/
init_level4_pgt[511] = early_level4_pgt[511];
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 73afd11..ca08f0e 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -149,6 +149,10 @@ ENTRY(startup_32)
call load_ucode_bsp
 #endif
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+   call x86_acpi_override_find
+#endif
+
 /*
  * Initialize page tables.  This creates a PDE and a set of page
  * tables, which are located immediately beyond __brk_base.  The variable
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 668e658..d43545a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -424,6 +424,32 @@ static void __init reserve_initrd(void)
 }
 #endif /* CONFIG_BLK_DEV_INITRD */
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void __init x86_acpi_override_find(void)
+{
+   unsigned long ramdisk_image, ramdisk_size;
+   unsigned char *p = NULL;
+
+#ifdef CONFIG_X86_32
+   struct boot_params *boot_params_p;
+
+   boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
+   ramdisk_image = boot_params_p->hdr.ramdisk_image;
+   ramdisk_size  = boot_params_p->hdr.ramdisk_size;
+   p = (unsigned char *)ramdisk_image;
+   acpi_initrd_override_find(p, ramdisk_size, true);
+#else
+   ramdisk_image = get_ramdisk_image();
+   ramdisk_size  = get_ramdisk_size();
+   if (ramdisk_image)
+   p = __va(ramdisk_image);
+   acpi_initrd_override_find(p, ramdisk_size, false);
+#endif
+}
+#else
+void __init x86_acpi_override_find(void) { }
+#endif
+
 static void __init parse_setup_data(void)
 {
struct setup_data *data;
@@ -1092,8 +1118,6 @@ void __init setup_arch(char **cmdline_p)
 
reserve_initrd();
 
-   acpi_initrd_override_find((void *)initrd_start,
-   initrd_end - initrd_start, false);
acpi_initrd_override_copy();
 
reserve_crashkernel();
-- 
1.7.10.4
