[PATCH 2/2] :kernel:signal: Fixed coding style issue.

2013-03-04 Thread Alexandru Gheorghiu
Fixed coding style issue by removing trailing whitespaces.

Signed-off-by: Alexandru Gheorghiu 
---
 kernel/signal.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 2ec870a..fd9a953 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2369,7 +2369,7 @@ relock:
 }
 
 /**
- * signal_delivered - 
+ * signal_delivered -
  * @sig:   number of signal being delivered
  * @info:  siginfo_t of signal being delivered
  * @ka:sigaction setting that chose the handler
@@ -3128,7 +3128,7 @@ int do_sigaction(int sig, struct k_sigaction *act, struct 
k_sigaction *oact)
return 0;
 }
 
-static int 
+static int
 do_sigaltstack (const stack_t __user *uss, stack_t __user *uoss, unsigned long 
sp)
 {
stack_t oss;
@@ -3272,7 +3272,7 @@ int __compat_save_altstack(compat_stack_t __user *uss, 
unsigned long sp)
  */
 SYSCALL_DEFINE1(sigpending, old_sigset_t __user *, set)
 {
-   return sys_rt_sigpending((sigset_t __user *)set, sizeof(old_sigset_t)); 
+   return sys_rt_sigpending((sigset_t __user *)set, sizeof(old_sigset_t));
 }
 
 #endif
@@ -3397,7 +3397,7 @@ COMPAT_SYSCALL_DEFINE4(rt_sigaction, int, sig,
ret = do_sigaction(sig, act ? _ka : NULL, oact ? _ka : NULL);
if (!ret && oact) {
sigset_to_compat(, _ka.sa.sa_mask);
-   ret = put_user(ptr_to_compat(old_ka.sa.sa_handler), 
+   ret = put_user(ptr_to_compat(old_ka.sa.sa_handler),
   >sa_handler);
ret |= copy_to_user(>sa_mask, , sizeof(mask));
ret |= __put_user(old_ka.sa.sa_flags, >sa_flags);
@@ -3573,7 +3573,7 @@ SYSCALL_DEFINE2(rt_sigsuspend, sigset_t __user *, 
unewset, size_t, sigsetsize)
return -EFAULT;
return sigsuspend();
 }
- 
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(rt_sigsuspend, compat_sigset_t __user *, unewset, 
compat_size_t, sigsetsize)
 {
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support

2013-03-04 Thread Mike Qiu

On 2013/3/5 10:41, Paul Mundt wrote:

On Tue, Jan 15, 2013 at 03:38:55PM +0800, Mike Qiu wrote:

Adding a function irq_create_mapping_many() which can associate
multiple MSIs to a continuous irq mapping.

This is needed to enable multiple MSI support for pSeries.

+int irq_create_mapping_many(struct irq_domain *domain,
+   irq_hw_number_t hwirq_base, int count)
+{

Other than the other review comments already made, I think you can
simplify this considerably by simply doing what irq_create_strict_mappings() 
does,
and relaxing the irq_base requirements.

In any event, as you are creating a new interface, I don't think you want
to carry around half of the legacy crap that irq_create_mapping() has to
deal with. We made the decision to avoid this with irq_create_strict_mappings()
intentionally, too.

Yes, you are right. I will send out V2 of my patch with this simplification,
and I hope you can review it again.


Thanks

Mike



Re: [PATCH v2 01/20] vmcore: refer to e_phoff member explicitly

2013-03-04 Thread Zhang Yanfei
On 2013-03-02 16:35, HATAYAMA Daisuke wrote:
> Code around /proc/vmcore currently assumes program header table is
> next to ELF header. But future change can break the assumption on
> kexec-tools and the 1st kernel. To avoid worst case, now refer to
> e_phoff member that indicates position of program header table in
> file-offset.

Reviewed-by: Zhang Yanfei 

> 
> Signed-off-by: HATAYAMA Daisuke 
> ---
> 
>  fs/proc/vmcore.c |   40 
>  1 files changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index b870f74..abf4f01 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -221,8 +221,8 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
>   Elf64_Phdr *phdr_ptr;
>  
>   ehdr_ptr = (Elf64_Ehdr *)elfptr;
> - phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
> - size = sizeof(Elf64_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
> + phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
> + size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
>   for (i = 0; i < ehdr_ptr->e_phnum; i++) {
>   size += phdr_ptr->p_memsz;
>   phdr_ptr++;
> @@ -238,8 +238,8 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
>   Elf32_Phdr *phdr_ptr;
>  
>   ehdr_ptr = (Elf32_Ehdr *)elfptr;
> - phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
> - size = sizeof(Elf32_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
> + phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
> + size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
>   for (i = 0; i < ehdr_ptr->e_phnum; i++) {
>   size += phdr_ptr->p_memsz;
>   phdr_ptr++;
> @@ -259,7 +259,7 @@ static int __init merge_note_headers_elf64(char *elfptr, 
> size_t *elfsz,
>   u64 phdr_sz = 0, note_off;
>  
>   ehdr_ptr = (Elf64_Ehdr *)elfptr;
> - phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
> + phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
>   for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>   int j;
>   void *notes_section;
> @@ -305,7 +305,7 @@ static int __init merge_note_headers_elf64(char *elfptr, 
> size_t *elfsz,
>   /* Prepare merged PT_NOTE program header. */
>   phdr.p_type= PT_NOTE;
>   phdr.p_flags   = 0;
> - note_off = sizeof(Elf64_Ehdr) +
> + note_off = ehdr_ptr->e_phoff +
>   (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
>   phdr.p_offset  = note_off;
>   phdr.p_vaddr   = phdr.p_paddr = 0;
> @@ -313,14 +313,14 @@ static int __init merge_note_headers_elf64(char 
> *elfptr, size_t *elfsz,
>   phdr.p_align   = 0;
>  
>   /* Add merged PT_NOTE program header*/
> - tmp = elfptr + sizeof(Elf64_Ehdr);
> + tmp = elfptr + ehdr_ptr->e_phoff;
>   memcpy(tmp, &phdr, sizeof(phdr));
>   tmp += sizeof(phdr);
>  
>   /* Remove unwanted PT_NOTE program headers. */
>   i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
>   *elfsz = *elfsz - i;
> - memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
> + memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf64_Phdr)));
>  
>   /* Modify e_phnum to reflect merged headers. */
>   ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
> @@ -340,7 +340,7 @@ static int __init merge_note_headers_elf32(char *elfptr, 
> size_t *elfsz,
>   u64 phdr_sz = 0, note_off;
>  
>   ehdr_ptr = (Elf32_Ehdr *)elfptr;
> - phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
> + phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
>   for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>   int j;
>   void *notes_section;
> @@ -386,7 +386,7 @@ static int __init merge_note_headers_elf32(char *elfptr, 
> size_t *elfsz,
>   /* Prepare merged PT_NOTE program header. */
>   phdr.p_type= PT_NOTE;
>   phdr.p_flags   = 0;
> - note_off = sizeof(Elf32_Ehdr) +
> + note_off = ehdr_ptr->e_phoff +
>   (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
>   phdr.p_offset  = note_off;
>   phdr.p_vaddr   = phdr.p_paddr = 0;
> @@ -394,14 +394,14 @@ static int __init merge_note_headers_elf32(char 
> *elfptr, size_t *elfsz,
>   phdr.p_align   = 0;
>  
>   /* Add merged PT_NOTE program header*/
> - tmp = elfptr + sizeof(Elf32_Ehdr);
> + tmp = elfptr + ehdr_ptr->e_phoff;
>   memcpy(tmp, &phdr, sizeof(phdr));
>   tmp += sizeof(phdr);
>  
>   /* Remove unwanted PT_NOTE program headers. */
>   i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
>   *elfsz = *elfsz - i;
> - memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf32_Ehdr)-sizeof(Elf32_Phdr)));
> + memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf32_Phdr)));
>  
>   /* Modify e_phnum to reflect merged headers. */
>   ehdr_ptr->e_phnum = 

Re: [PATCH linux-next] cpufreq: conservative: Fix sampling_down_factor functionality

2013-03-04 Thread Viresh Kumar
On 5 March 2013 13:22, Stratos Karafotis  wrote:
> I had the same thoughts, but I saw the comments in the code:
>
> /*
>  * Every sampling_rate, we check, if current idle time is less than 20%
>  * (default), then we try to increase frequency Every sampling_rate *
>  * sampling_down_factor, we check, if current idle time is more than 80%, then
>  * we try to decrease frequency

I misread it here when I looked at this mail for the first time. :)
I strongly believe that we need a full stop (.) before "Every sampling_rate",
otherwise it looks like we check for down_factor while increasing freq :)

>  *
>
> Also checking the code before the commit 
> 8e677ce83bf41ba9c74e5b6d9ee60b07d4e5ed93 you may see that sampling down 
> factor works in this way.
> So, I decided to keep the original functionality (also down_skip was already 
> there unused).

I saw that comment, but I believe the code has never matched it, and it
still does not. Check the initial patch for the conservative governor:
b9170836d1aa4ded7cc1ac1cb8fbc7867061c98c

Even now we aren't checking this 80% thing, right? So in your patch we can
actually fix the code too, with the right logic. And the
documentation too :)
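
For reference, the behaviour that the quoted comment describes (as opposed to what the code currently does) could be sketched roughly as below. This is a stand-alone illustration, not the governor's code: struct cs_state, cs_decide() and the threshold macros are hypothetical names, and only the 20%/80% figures and the role of sampling_down_factor are taken from the comment.

```c
#include <assert.h>

/* Hypothetical thresholds taken from the quoted comment (idle time in %). */
#define UP_IDLE_THRESHOLD   20	/* idle < 20% -> try to increase frequency */
#define DOWN_IDLE_THRESHOLD 80	/* idle > 80% -> try to decrease frequency */

enum cs_action { CS_KEEP, CS_INCREASE, CS_DECREASE };

struct cs_state {
	unsigned int sampling_down_factor;	/* must be >= 1 */
	unsigned int down_skip;			/* samples since last "down" check */
};

/* Called once per sampling_rate with the idle percentage of that sample. */
static enum cs_action cs_decide(struct cs_state *s, unsigned int idle_pct)
{
	assert(s->sampling_down_factor >= 1);

	/* The "up" check runs on every sample. */
	if (idle_pct < UP_IDLE_THRESHOLD) {
		s->down_skip = 0;
		return CS_INCREASE;
	}

	/* The "down" check runs only every sampling_down_factor samples. */
	if (++s->down_skip < s->sampling_down_factor)
		return CS_KEEP;
	s->down_skip = 0;

	if (idle_pct > DOWN_IDLE_THRESHOLD)
		return CS_DECREASE;
	return CS_KEEP;
}
```

With sampling_down_factor set to 2, an idle sample above 80% only leads to a decrease on every second call, while a sample below 20% leads to an increase immediately.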


workqueue panic in 3.4 kernel

2013-03-04 Thread Lei Wen
Hi Tejun,

We hit a panic related to workqueue on a kernel based on Linux 3.4.5.

Panic log as:
[153587.035369] Unable to handle kernel NULL pointer dereference at
virtual address 0004
[153587.043731] pgd = e1e74000
[153587.046691] [0004] *pgd=
[153587.050567] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[153587.056152] Modules linked in: hwmap(O) cidatattydev(O) gs_diag(O)
diag(O) gs_modem(O) ccinetdev(O) cci_datastub(O) citty(O) msocketk(O)
smsmdtv seh(O) cploaddev(O) blcr(O) blcr_imports(O) geu(O) galcore(O)
[153587.076416] CPU: 0Tainted: G   O  (3.4.5+ #1)
[153587.082092] PC is at delayed_work_timer_fn+0x1c/0x28
[153587.087249] LR is at delayed_work_timer_fn+0x18/0x28
[153587.092468] pc : []lr : []psr: 2113
[153587.092468] sp : e1e3bf00  ip : 0001  fp : 000a
[153587.104400] r10: 0001  r9 : 578914dc  r8 : c014c7a0
[153587.109832] r7 : 0101  r6 : bf03d554  r5 :   r4 : bf03d544
[153587.116638] r3 : 0101  r2 : bf03d544  r1 : c1a0b27c  r0 : 
[153587.123352] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
Segment user
[153587.130737] Control: 10c53c7d  Table: 21e7404a  DAC: 0015
[153587.611328] [] (delayed_work_timer_fn+0x1c/0x28) from
[] (run_timer_softirq+0x260/0x384)
[153587.621368] [] (run_timer_softirq+0x260/0x384) from
[] (__do_softirq+0x11c/0x244)
[153587.630828] [] (__do_softirq+0x11c/0x244) from
[] (irq_exit+0x44/0x98)
[153587.639373] [] (irq_exit+0x44/0x98) from []
(handle_IRQ+0x7c/0xb8)
[153587.647583] [] (handle_IRQ+0x7c/0xb8) from []
(gic_handle_irq+0x34/0x58)
[153587.656188] [] (gic_handle_irq+0x34/0x58) from
[] (__irq_usr+0x3c/0x60)

By inspecting memory, we found that work->data had become 0x300 by the time
get_work_cwq() was called in delayed_work_timer_fn(). get_work_cwq() therefore
returned NULL for cwq before the call to __queue_work(), so it is expected that
the kernel panics when it tries to access the wq through cwq->wq.
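
The failure mode described above can be sketched in user space as follows. The layout here is hypothetical: SKETCH_FLAG_CWQ, SKETCH_FLAG_MASK and every sketch_* name are made up for illustration, and the real flag positions depend on the kernel version and configuration. The only point is that a data word whose "holds a cwq pointer" flag is clear (such as 0x300 in this layout) decodes to NULL, so an unguarded cwq->wq access faults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: low two bits are flags, the rest is a pointer. */
#define SKETCH_FLAG_CWQ  ((uintptr_t)1 << 0)	/* data holds a cwq pointer */
#define SKETCH_FLAG_MASK ((uintptr_t)0x3)

struct sketch_cwq { int wq; };

/* Pack a cwq pointer into a data word; the pointer must be 4-byte aligned. */
static uintptr_t sketch_pack(struct sketch_cwq *cwq)
{
	assert(((uintptr_t)cwq & SKETCH_FLAG_MASK) == 0);
	return (uintptr_t)cwq | SKETCH_FLAG_CWQ;
}

/* Decode the data word; NULL if it does not currently hold a cwq pointer. */
static struct sketch_cwq *sketch_get_cwq(uintptr_t data)
{
	if (data & SKETCH_FLAG_CWQ)
		return (struct sketch_cwq *)(data & ~SKETCH_FLAG_MASK);
	return NULL;
}

/* Returns 0 on success, -1 if the work item no longer points at a cwq. */
static int sketch_timer_fn(uintptr_t data, int *wq_out)
{
	struct sketch_cwq *cwq = sketch_get_cwq(data);

	if (!cwq)
		return -1;	/* an unguarded cwq->wq would dereference NULL */
	*wq_out = cwq->wq;
	return 0;
}
```

The NULL check in sketch_timer_fn() is only there to make the sketch safe to run; it is not a proposal for the fix discussed in this mail.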

To fix it, we tried backporting the patches below:
commit 60c057bca22285efefbba033624763a778f243bf
Author: Lai Jiangshan 
Date:   Wed Feb 6 18:04:53 2013 -0800

workqueue: add delayed_work->wq to simplify reentrancy handling

commit 1265057fa02c7bed3b6d9ddc8a2048065a370364
Author: Tejun Heo 
Date:   Wed Aug 8 09:38:42 2012 -0700

workqueue: fix CPU binding of flush_delayed_work[_sync]()

We also added the change below to make sure __cancel_work_timer() cannot
preempt between run_timer_softirq() and delayed_work_timer_fn().
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bf4888c..0e9f77c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2627,7 +2627,7 @@ static bool __cancel_work_timer(struct work_struct *work,
ret = (timer && likely(del_timer(timer)));
if (!ret)
ret = try_to_grab_pending(work);
-   wait_on_work(work);
+   flush_work(work);
} while (unlikely(ret < 0));

clear_work_data(work);

Do you think this fix is enough? And is it OK to call flush_work() directly in
__cancel_work_timer() for this fix?

Thanks,
Lei


Re: [REGRESSION] [3.9-rc1] BUG: ktpacpi_nvramd/446 still has locks held!

2013-03-04 Thread Aaron Lu
On 03/05/2013 03:55 AM, Maciej Rutecki wrote:
> Last known good: 3.8.0
> Bad version: 3.9-rc1
> 
> [6.116492] =
> [6.116614] [ BUG: ktpacpi_nvramd/446 still has locks held! ]
> [6.116737] 3.9.0-rc1 #1 Not tainted
> [6.116821] -
> [6.116900] 1 lock held by ktpacpi_nvramd/446:
> [6.116973]  #0:  (_thread_mutex){+.+...}, at: [] 
> hotkey_kthread+0x1f/0x354 [thinkpad_acpi]
> [6.117193]
> [6.117193] stack backtrace:
> [6.117268] Pid: 446, comm: ktpacpi_nvramd Not tainted 3.9.0-rc1 #1
> [6.117381] Call Trace:
> [6.117445]  [] debug_check_no_locks_held+0x8f/0x93
> [6.117600]  [] set_freezable+0x3e/0x64
> [6.117703] input: ThinkPad Extra Buttons as 
> /devices/platform/thinkpad_acpi/input/input5
> [6.117918]  [] hotkey_kthread+0x31/0x354 [thinkpad_acpi]
> [6.118088]  [] ? issue_volchange.29885+0x54/0x54 
> [thinkpad_acpi]
> [6.118250]  [] kthread+0xac/0xb4
> [6.118356]  [] ? __kthread_parkme+0x60/0x60
> [6.118491]  [] ret_from_fork+0x7c/0xb0
> [6.118614]  [] ? __kthread_parkme+0x60/0x60
> 
> Config:
> http://mrutecki.pl/download/kernel/3.9.0-rc1/config-3.9.0-rc1
> 
> full dmesg:
> http://mrutecki.pl/download/kernel/3.9.0-rc1/dmesg-3.9.0-rc1.txt
> 

Thanks for the report!

Looks like the following commit is related:
commit 6aa9707099c4b25700940eb3d016f16c4434360d
Author: Mandeep Singh Baines   Thu Feb 28 09:03:18 2013

lockdep: check that no locks held at freeze time

And the code to trigger this problem is here:
static int hotkey_kthread(void *data)
{
struct tp_nvram_state s[2];
u32 poll_mask, event_mask;
unsigned int si, so;
unsigned long t;
unsigned int change_detector;
unsigned int poll_freq;
bool was_frozen;

mutex_lock(_thread_mutex);

if (tpacpi_lifecycle == TPACPI_LIFE_EXITING)
goto exit;

set_freezable();
~~
in thinkpad_acpi.c.

I don't know much about the freezer, so I have no idea what the problem is.
Mandeep and Henrique, could you please take a look? Thanks.

-Aaron



Re: [PATCH 20/24] rtc: rtc-sh: use module_platform_driver_probe()

2013-03-04 Thread Paul Mundt
On Mon, Mar 04, 2013 at 05:05:34PM +0900, Jingoo Han wrote:
> This patch uses module_platform_driver_probe() macro which makes
> the code smaller and simpler.
> 
> Signed-off-by: Jingoo Han 

Not sure I see the point, but:

Acked-by: Paul Mundt 


Re: [PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support

2013-03-04 Thread Mike Qiu

On 2013/3/5 10:23, Michael Ellerman wrote:

On Tue, Jan 15, 2013 at 03:38:55PM +0800, Mike Qiu wrote:

Adding a function irq_create_mapping_many() which can associate
multiple MSIs to a continuous irq mapping.

This is needed to enable multiple MSI support for pSeries.

Signed-off-by: Mike Qiu 
---
  include/linux/irq.h   |2 +
  include/linux/irqdomain.h |3 ++
  kernel/irq/irqdomain.c|   61 +
  3 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 60ef45b..e00a7ec 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -592,6 +592,8 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned 
int cnt, int node,
  #define irq_alloc_desc_from(from, node)   \
irq_alloc_descs(-1, from, 1, node)
  
+#define irq_alloc_desc_n(nevc, node)		\

+   irq_alloc_descs(-1, 0, nevc, node)

This has been superseded by irq_alloc_descs_from(), which is the right
way to do it.

Yes, but irq_alloc_descs_from() is only for one IRQ, and if I change the API,
a lot of places that call this function may be affected.



diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 0d5b17b..831dded 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -168,6 +168,9 @@ extern int irq_create_strict_mappings(struct irq_domain 
*domain,
  unsigned int irq_base,
  irq_hw_number_t hwirq_base, int count);
  
+extern int irq_create_mapping_many(struct irq_domain *domain,

+   irq_hw_number_t hwirq_base, int count);
+
  static inline int irq_create_identity_mapping(struct irq_domain *host,
  irq_hw_number_t hwirq)
  {
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 96f3a1d..38648e6 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -636,6 +636,67 @@ int irq_create_strict_mappings(struct irq_domain *domain, 
unsigned int irq_base,
  }
  EXPORT_SYMBOL_GPL(irq_create_strict_mappings);
  
+/**

+ * irq_create_mapping_many - Map a range of hw IRQs to a range of virtual IRQs
+ * @domain: domain owning the interrupt range
+ * @hwirq_base: beginning of continuous hardware IRQ range
+ * @count: Number of interrupts to map

For multiple-MSI the allocated interrupt numbers must be a power-of-2,
and must be naturally aligned. I don't /think/ that's a requirement for
the virtual numbers, but it's probably best that we do it anyway.

So this API needs to specify that it will give you back a power-of-2
block that is naturally aligned - otherwise you can't use it for MSI.
rtas_call will return the number of hardware interrupts, and it should
be a power of 2, so I think this does not need to be specified.
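
The constraint raised in the review (a power-of-2 sized block that is naturally aligned) can be checked with a few bit operations. The helpers below are a hypothetical stand-alone sketch, not part of the proposed patch; is_pow2(), msi_block_ok() and round_up_pow2() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/* True if n is a power of two (1, 2, 4, ...). */
static bool is_pow2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * True if [base, base + count) is a power-of-2 sized block whose first
 * number is a multiple of its size, i.e. the block is naturally aligned.
 */
static bool msi_block_ok(unsigned int base, unsigned int count)
{
	return is_pow2(count) && (base & (count - 1)) == 0;
}

/* Round a requested vector count up to the next power of two. */
static unsigned int round_up_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n >= 1 && n <= (1u << 31));
	while (p < n)
		p <<= 1;
	return p;
}
```

For example, a block of 4 starting at 8 passes the check, while a block of 4 starting at 6 or a block of 3 starting at 8 does not.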

+ * This routine is used for allocating and mapping a range of hardware
+ * irqs to virtual IRQs where the virtual irq numbers are not at pre-defined
+ * locations.

This comment doesn't make sense to me.


+ *
+ * Greater than 0 is returned upon success, while any failure to establish a
+ * static mapping is treated as an error.
+ */
+int irq_create_mapping_many(struct irq_domain *domain,
+   irq_hw_number_t hwirq_base, int count)
+{
+   int ret, irq_base;
+   int virq, i;
+
+   pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq_base);


I'd like to see this whole function rewritten to reduce the duplication
vs irq_create_mapping(). I don't see any reason why this can't be the
core routine, and irq_create_mapping() becomes a caller of it, passing a
count of 1 ?

That's a good suggestion.

+   /* Look for default domain if nececssary */
+   if (!domain)
+   domain = irq_default_domain;
+   if (!domain) {
+   pr_warn("irq_create_mapping called for NULL domain, hwirq=%lx\n"
+   , hwirq_base);
+   WARN_ON(1);
+   return 0;
+   }
+   pr_debug("-> using domain @%p\n", domain);
+
+   /* For IRQ_DOMAIN_MAP_LEGACY, get the first virtual interrupt number */
+   if (domain->revmap_type == IRQ_DOMAIN_MAP_LEGACY)
+   return irq_domain_legacy_revmap(domain, hwirq_base);

The above doesn't work.

Why doesn't it work?

+   /* Check if mapping already exists */
+   for (i = 0; i < count; i++) {
+   virq = irq_find_mapping(domain, hwirq_base+i);
+   if (virq) {
+   pr_debug("existing mapping on virq %d,"
+   " now dispose it first\n", virq);
+   irq_dispose_mapping(virq);

You might have just disposed of someone else's mapping, we shouldn't do
that. It should be an error to the caller.

That's a good question. If the interrupt is used by someone else, why can I
obtain it from the system?
So it may be that someone else forgot to dispose of the mapping, and it will
never be used by others as I have got


Re: [PATCH 1/1 v2] pwm_bl: Add support for backlight enable GPIO

2013-03-04 Thread Thierry Reding
On Mon, Mar 04, 2013 at 08:43:50PM -0800, Andrew Chew wrote:
> The backlight enable GPIO is specified in the device tree node for
> backlight.
> 
> Signed-off-by: Andrew Chew 
> ---
> I decided to go ahead with disabling/enabling the backlight via GPIO as
> needed.  Note that I named the new functions pwm_backlight_enable() and
> pwm_backlight_disable() (instead of something more gpio-specific) because
> I thought it would be convenient to have a generic hook for when someone
> wants to add yet more stuff to be done on enable/disable.
> 
> I tested this by going into /sys/class/backlight/backlight.n and manually
> adjusting the brightness, and checked the gpio state to see that it had
> the appropriate value.
> 
>  .../bindings/video/backlight/pwm-backlight.txt |2 +
>  drivers/video/backlight/pwm_bl.c   |   50 
> ++--
>  include/linux/pwm_backlight.h  |2 +
>  3 files changed, 50 insertions(+), 4 deletions(-)
> 
> diff --git 
> a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt 
> b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
> index 1e4fc72..1ed4f0f 100644
> --- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
> +++ b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
> @@ -14,6 +14,8 @@ Required properties:
>  Optional properties:
>- pwm-names: a list of names for the PWM devices specified in the
> "pwms" property (see PWM binding[0])
> +  - enable-gpio: a GPIO that needs to be used to enable the backlight

According to Documentation/devicetree/bindings/gpio/gpio.txt this should
be called "enable-gpios".

> +  - enable-gpio-active-high: polarity of GPIO is active high (default is low)

There is OF_GPIO_ACTIVE_LOW, which is automatically parsed from the
second cell of a GPIO specifier. It will only work if you request the
GPIO using of_get_named_gpio_flags(), though, so it will only work in
the non-DT case.

I think using a regulator would be more appropriate, since it gives you
more flexibility than a plain GPIO. Does anybody see a problem with
using a regulator instead?

Thierry



[PATCH v2 04/20] vmcore: allocate buffer for ELF headers on page-size alignment

2013-03-04 Thread HATAYAMA Daisuke
Allocate buffer for ELF headers on page-size aligned boundary to
satisfy mmap() requirement. For this, __get_free_pages() is used
instead of kmalloc().

Also, a later patch will decrease the actually used buffer size for ELF
headers, so it's necessary to keep the original buffer size and the actually
used buffer size separately. elfcorebuf_sz_orig keeps the original one
and elfcorebuf_sz the actually used one.
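
To illustrate why the original size has to be remembered: pages are allocated and freed by order, and the order is derived from the size. The sketch below is a user-space approximation; sketch_get_order() and SKETCH_PAGE_SIZE are hypothetical stand-ins (assuming 4 KiB pages), not the kernel's get_order().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size >= 1);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

If a buffer was allocated for 4097 bytes (order 1) and the used size later shrinks to 4000 bytes (order 0), freeing with the order computed from the shrunken size would not match the allocation. Keeping the original size in a separate variable avoids that mismatch.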

Signed-off-by: HATAYAMA Daisuke 
---

 fs/proc/vmcore.c |   30 +-
 1 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index b5c9e33..1b02d01 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -31,6 +31,7 @@ static LIST_HEAD(vmcore_list);
 /* Stores the pointer to the buffer containing kernel elf core headers. */
 static char *elfcorebuf;
 static size_t elfcorebuf_sz;
+static size_t elfcorebuf_sz_orig;
 
 /* Total size of vmcore file. */
 static u64 vmcore_size;
@@ -610,26 +611,31 @@ static int __init parse_crash_elf64_headers(void)
 
/* Read in all elf headers. */
elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf64_Phdr);
-   elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
+   elfcorebuf_sz_orig = elfcorebuf_sz;
+   elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+  get_order(elfcorebuf_sz_orig));
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
if (rc < 0) {
-   kfree(elfcorebuf);
+   free_pages((unsigned long)elfcorebuf,
+  get_order(elfcorebuf_sz_orig));
return rc;
}
 
/* Merge all PT_NOTE headers into one. */
rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
if (rc) {
-   kfree(elfcorebuf);
+   free_pages((unsigned long)elfcorebuf,
+  get_order(elfcorebuf_sz_orig));
return rc;
}
rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
&vmcore_list);
if (rc) {
-   kfree(elfcorebuf);
+   free_pages((unsigned long)elfcorebuf,
+  get_order(elfcorebuf_sz_orig));
return rc;
}
set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
@@ -665,26 +671,31 @@ static int __init parse_crash_elf32_headers(void)
 
/* Read in all elf headers. */
elfcorebuf_sz = ehdr.e_phoff + ehdr.e_phnum * sizeof(Elf32_Phdr);
-   elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
+   elfcorebuf_sz_orig = elfcorebuf_sz;
+   elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+  get_order(elfcorebuf_sz));
if (!elfcorebuf)
return -ENOMEM;
addr = elfcorehdr_addr;
rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
if (rc < 0) {
-   kfree(elfcorebuf);
+   free_pages((unsigned long)elfcorebuf,
+  get_order(elfcorebuf_sz_orig));
return rc;
}
 
/* Merge all PT_NOTE headers into one. */
rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
if (rc) {
-   kfree(elfcorebuf);
+   free_pages((unsigned long)elfcorebuf,
+  get_order(elfcorebuf_sz_orig));
return rc;
}
rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
&vmcore_list);
if (rc) {
-   kfree(elfcorebuf);
+   free_pages((unsigned long)elfcorebuf,
+  get_order(elfcorebuf_sz_orig));
return rc;
}
set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
@@ -766,7 +777,8 @@ void vmcore_cleanup(void)
list_del(&m->list);
kfree(m);
}
-   kfree(elfcorebuf);
+   free_pages((unsigned long)elfcorebuf,
+  get_order(elfcorebuf_sz_orig));
elfcorebuf = NULL;
 }
 EXPORT_SYMBOL_GPL(vmcore_cleanup);



[PATCH v2 02/20] vmcore: rearrange program headers without assuming consecutive PT_NOTE entries

2013-03-04 Thread HATAYAMA Daisuke
Current code assumes all PT_NOTE headers are placed at the beginning
of program header table and they are consecutive. But the assumption
could be broken by future changes on either kexec-tools or the 1st
kernel. This patch removes the assumption and rearranges program
headers so that the following conditions are satisfied:

- PT_NOTE entry is unique at the first entry,

- the order of program headers is unchanged during this
  rearrangement, only their positions are changed in positive
  direction.

- unused part that occurs in the bottom of program headers is filled
  with 0.

Also, this patch adds one exceptional case where the number of PT_NOTE
entries is somehow 0. In that case, the function returns immediately.
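
The rearrangement described above can be sketched on a simplified array as follows. This is an illustration only: struct sk_phdr, its id field and sk_merge_notes() are hypothetical, and the real code operates on Elf64_Phdr/Elf32_Phdr entries inside the header buffer.

```c
#include <assert.h>
#include <string.h>

#define SK_PT_LOAD 1
#define SK_PT_NOTE 4

/* Simplified stand-in for a program header; id only tags entries. */
struct sk_phdr { int type; int id; };

/*
 * Replace all PT_NOTE entries with a single merged one at index 0,
 * keep the relative order of the other entries, zero the unused tail.
 * Returns the new number of entries.
 */
static int sk_merge_notes(struct sk_phdr *ph, int n, struct sk_phdr merged)
{
	int i, j, nr_note = 0, first = -1;

	for (i = 0; i < n; i++) {
		if (ph[i].type == SK_PT_NOTE) {
			if (first < 0)
				first = i;
			nr_note++;
		}
	}
	if (nr_note == 0)
		return n;	/* exceptional case: nothing to merge */

	/* 1st pass: shift entries before the first PT_NOTE down one slot. */
	memmove(ph + 1, ph, first * sizeof(*ph));

	/* 2nd pass: pack the remaining non-PT_NOTE entries after them. */
	for (i = j = first + 1; i < n; i++)
		if (ph[i].type != SK_PT_NOTE)
			ph[j++] = ph[i];

	/* Finally, fill the unused tail with 0. */
	memset(ph + n - (nr_note - 1), 0, (nr_note - 1) * sizeof(*ph));

	ph[0] = merged;
	return n - nr_note + 1;
}
```

For an input of LOAD, NOTE, LOAD, NOTE, LOAD, the result is the merged NOTE followed by the three LOAD entries in their original order and one zeroed slot.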

Signed-off-by: HATAYAMA Daisuke 
---

 fs/proc/vmcore.c |   92 +++---
 1 files changed, 74 insertions(+), 18 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index abf4f01..b5c9e33 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -251,8 +251,7 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
 static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
struct list_head *vc_list)
 {
-   int i, nr_ptnote=0, rc=0;
-   char *tmp;
+   int i, j, nr_ptnote=0, i_ptnote, rc=0;
Elf64_Ehdr *ehdr_ptr;
Elf64_Phdr phdr, *phdr_ptr;
Elf64_Nhdr *nhdr_ptr;
@@ -302,6 +301,39 @@ static int __init merge_note_headers_elf64(char *elfptr, 
size_t *elfsz,
kfree(notes_section);
}
 
+   if (nr_ptnote == 0)
+   goto out;
+
+   phdr_ptr = (Elf64_Phdr *)(elfptr + ehdr_ptr->e_phoff);
+
+   /* Remove unwanted PT_NOTE program headers. */
+
+/* - 1st pass shifts non-PT_NOTE entries until the first
+PT_NOTE entry. */
+   i_ptnote = -1;
+   for (i = 0; i < ehdr_ptr->e_phnum; ++i) {
+   if (phdr_ptr[i].p_type == PT_NOTE) {
+   i_ptnote = i;
+   break;
+   }
+   }
+   BUG_ON(i_ptnote == -1); /* impossible case since nr_ptnote > 0. */
+   memmove(phdr_ptr + 1, phdr_ptr, i_ptnote * sizeof(Elf64_Phdr));
+
+   /* - 2nd pass moves the remaining non-PT_NOTE entries under
+the first PT_NOTE entry. */
+   for (i = j = i_ptnote + 1; i < ehdr_ptr->e_phnum; i++) {
+   if (phdr_ptr[i].p_type != PT_NOTE) {
+   memmove(phdr_ptr + j, phdr_ptr + i,
+   sizeof(Elf64_Phdr));
+   j++;
+   }
+   }
+
+   /* - Finally, fill unused part with 0. */
+   memset(phdr_ptr + ehdr_ptr->e_phnum - (nr_ptnote - 1), 0,
+  (nr_ptnote - 1) * sizeof(Elf64_Phdr));
+
/* Prepare merged PT_NOTE program header. */
phdr.p_type= PT_NOTE;
phdr.p_flags   = 0;
@@ -313,18 +345,14 @@ static int __init merge_note_headers_elf64(char *elfptr, 
size_t *elfsz,
phdr.p_align   = 0;
 
/* Add merged PT_NOTE program header*/
-   tmp = elfptr + ehdr_ptr->e_phoff;
-   memcpy(tmp, &phdr, sizeof(phdr));
-   tmp += sizeof(phdr);
+   memcpy(phdr_ptr, &phdr, sizeof(Elf64_Phdr));
 
-   /* Remove unwanted PT_NOTE program headers. */
-   i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
-   *elfsz = *elfsz - i;
-   memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf64_Phdr)));
+   *elfsz = *elfsz - (nr_ptnote - 1) * sizeof(Elf64_Phdr);
 
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
 
+out:
return 0;
 }
 
@@ -332,8 +360,7 @@ static int __init merge_note_headers_elf64(char *elfptr, 
size_t *elfsz,
 static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
struct list_head *vc_list)
 {
-   int i, nr_ptnote=0, rc=0;
-   char *tmp;
+   int i, j, nr_ptnote=0, i_ptnote, rc=0;
Elf32_Ehdr *ehdr_ptr;
Elf32_Phdr phdr, *phdr_ptr;
Elf32_Nhdr *nhdr_ptr;
@@ -383,6 +410,39 @@ static int __init merge_note_headers_elf32(char *elfptr, 
size_t *elfsz,
kfree(notes_section);
}
 
+   if (nr_ptnote == 0)
+   goto out;
+
+   phdr_ptr = (Elf32_Phdr *)(elfptr + ehdr_ptr->e_phoff);
+
+   /* Remove unwanted PT_NOTE program headers. */
+
+   /* - 1st pass shifts non-PT_NOTE entries until the first
+PT_NOTE entry. */
+   i_ptnote = -1;
+   for (i = 0; i < ehdr_ptr->e_phnum; ++i) {
+   if (phdr_ptr[i].p_type == PT_NOTE) {
+   i_ptnote = i;
+   break;
+   }
+   }
+   BUG_ON(i_ptnote == -1); /* impossible case since nr_ptnote > 0. */
+   memmove(phdr_ptr + 1, phdr_ptr, i_ptnote * sizeof(Elf32_Phdr));
+
+   /* - 2nd pass moves the remaining non-PT_NOTE entries under
+the first 

[PATCH v2 01/20] vmcore: refer to e_phoff member explicitly

2013-03-04 Thread HATAYAMA Daisuke
Code around /proc/vmcore currently assumes program header table is
next to ELF header. But future change can break the assumption on
kexec-tools and the 1st kernel. To avoid worst case, now refer to
e_phoff member that indicates position of program header table in
file-offset.
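
As a user-space illustration of the idea (not the kernel code itself), the program header table can be located through e_phoff like this. The helper names phdr_table() and headers_size() are hypothetical; the types come from the standard <elf.h> header.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Locate the program header table through e_phoff instead of assuming
 * that it directly follows the ELF header. elfptr must be suitably
 * aligned for Elf64_Ehdr.
 */
static Elf64_Phdr *phdr_table(char *elfptr)
{
	Elf64_Ehdr *ehdr = (Elf64_Ehdr *)elfptr;

	/* The table cannot overlap the ELF header itself. */
	assert(ehdr->e_phoff >= sizeof(Elf64_Ehdr));
	return (Elf64_Phdr *)(elfptr + ehdr->e_phoff);
}

/* Bytes from file start to the end of the program header table. */
static size_t headers_size(const Elf64_Ehdr *ehdr)
{
	return ehdr->e_phoff + (size_t)ehdr->e_phnum * sizeof(Elf64_Phdr);
}
```

When e_phoff equals sizeof(Elf64_Ehdr), both helpers give the same results as the old sizeof-based arithmetic; they differ only when something sits between the ELF header and the program header table.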

Signed-off-by: HATAYAMA Daisuke 
---

 fs/proc/vmcore.c |   40 
 1 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index b870f74..abf4f01 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -221,8 +221,8 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
Elf64_Phdr *phdr_ptr;
 
ehdr_ptr = (Elf64_Ehdr *)elfptr;
-   phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
-   size = sizeof(Elf64_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
+   phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
+   size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
for (i = 0; i < ehdr_ptr->e_phnum; i++) {
size += phdr_ptr->p_memsz;
phdr_ptr++;
@@ -238,8 +238,8 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
Elf32_Phdr *phdr_ptr;
 
ehdr_ptr = (Elf32_Ehdr *)elfptr;
-   phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
-   size = sizeof(Elf32_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
+   phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
+   size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
for (i = 0; i < ehdr_ptr->e_phnum; i++) {
size += phdr_ptr->p_memsz;
phdr_ptr++;
@@ -259,7 +259,7 @@ static int __init merge_note_headers_elf64(char *elfptr, 
size_t *elfsz,
u64 phdr_sz = 0, note_off;
 
ehdr_ptr = (Elf64_Ehdr *)elfptr;
-   phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
+   phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
int j;
void *notes_section;
@@ -305,7 +305,7 @@ static int __init merge_note_headers_elf64(char *elfptr, 
size_t *elfsz,
/* Prepare merged PT_NOTE program header. */
phdr.p_type= PT_NOTE;
phdr.p_flags   = 0;
-   note_off = sizeof(Elf64_Ehdr) +
+   note_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
phdr.p_offset  = note_off;
phdr.p_vaddr   = phdr.p_paddr = 0;
@@ -313,14 +313,14 @@ static int __init merge_note_headers_elf64(char *elfptr, 
size_t *elfsz,
phdr.p_align   = 0;
 
/* Add merged PT_NOTE program header*/
-   tmp = elfptr + sizeof(Elf64_Ehdr);
+   tmp = elfptr + ehdr_ptr->e_phoff;
memcpy(tmp, &phdr, sizeof(phdr));
tmp += sizeof(phdr);
 
/* Remove unwanted PT_NOTE program headers. */
i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
*elfsz = *elfsz - i;
-   memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
+   memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf64_Phdr)));
 
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
@@ -340,7 +340,7 @@ static int __init merge_note_headers_elf32(char *elfptr, 
size_t *elfsz,
u64 phdr_sz = 0, note_off;
 
ehdr_ptr = (Elf32_Ehdr *)elfptr;
-   phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
+   phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
int j;
void *notes_section;
@@ -386,7 +386,7 @@ static int __init merge_note_headers_elf32(char *elfptr, 
size_t *elfsz,
/* Prepare merged PT_NOTE program header. */
phdr.p_type= PT_NOTE;
phdr.p_flags   = 0;
-   note_off = sizeof(Elf32_Ehdr) +
+   note_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
phdr.p_offset  = note_off;
phdr.p_vaddr   = phdr.p_paddr = 0;
@@ -394,14 +394,14 @@ static int __init merge_note_headers_elf32(char *elfptr, 
size_t *elfsz,
phdr.p_align   = 0;
 
/* Add merged PT_NOTE program header*/
-   tmp = elfptr + sizeof(Elf32_Ehdr);
+   tmp = elfptr + ehdr_ptr->e_phoff;
memcpy(tmp, &phdr, sizeof(phdr));
tmp += sizeof(phdr);
 
/* Remove unwanted PT_NOTE program headers. */
i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
*elfsz = *elfsz - i;
-   memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf32_Ehdr)-sizeof(Elf32_Phdr)));
+   memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf32_Phdr)));
 
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
@@ -422,10 +422,10 @@ static int __init 
process_ptload_program_headers_elf64(char *elfptr,
struct vmcore *new;
 

Re: [PATCH RESEND 1/3] regulator: core: Add enable_is_inverted flag to indicate set enable_mask bits to disable

2013-03-04 Thread Axel Lin
2013/3/5 Mark Brown :
> On Tue, Mar 05, 2013 at 02:16:00PM +0800, Axel Lin wrote:
>> Add enable_is_inverted flag to indicate set enable_mask bits to disable
>> when using regulator_enable_regmap and friends APIs.
>>
>> Signed-off-by: Axel Lin 
>> Reviewed-by: Haojian Zhuang 
>> ---
>> This patch was sent on https://lkml.org/lkml/2013/2/16/14.
>> This resend is against linux-next tree (20130305).
>
> This doesn't apply against v3.9-rc1...  what are the dependencies?

I think it is because of commit 5838b032fd
"regulator: core: update kernel documentation for regulator_desc".

I found that the patch (sent on https://lkml.org/lkml/2013/2/16/14 ) does not apply
to today's linux-next tree, so I regenerated the patches against linux-next.

Should I resend the previous version of this patch?

Axel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 06/20] vmcore, procfs: introduce a flag to distinguish objects copied in 2nd kernel

2013-03-04 Thread HATAYAMA Daisuke
Part of the dump target memory is copied into the 2nd kernel if it
doesn't satisfy mmap()'s page-size boundary requirement. To
distinguish such a copied object from ordinary old memory, a flag
MEM_TYPE_CURRENT_KERNEL is introduced. If this flag is set, the object
is considered to have been copied into a buffer on the 2nd kernel.
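
The tagged-union idea can be sketched in userspace as follows. This is an illustration only: struct vmcore_obj and vmcore_is_copied() are hypothetical stand-ins, and the list linkage of the real structure is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_TYPE_CURRENT_KERNEL 0x1

/* Userspace stand-in for struct vmcore; list linkage omitted. */
struct vmcore_obj {
	union {
		unsigned long long paddr; /* location in old memory */
		char *buf;                /* buffer copied into the 2nd kernel */
	};
	unsigned long long size;
	long long offset;
	unsigned int flag;                /* selects which union member is valid */
};

/* Returns 1 if the object lives in a buffer owned by the 2nd kernel. */
static int vmcore_is_copied(const struct vmcore_obj *m)
{
	assert(m != NULL);
	return (m->flag & MEM_TYPE_CURRENT_KERNEL) != 0;
}
```

Because paddr and buf share storage, the flag is the only way to tell which member a reader may use.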

Signed-off-by: HATAYAMA Daisuke 
---

 include/linux/proc_fs.h |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 8307f2f..11dd592 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -97,11 +97,17 @@ struct kcore_list {
int type;
 };
 
+#define MEM_TYPE_CURRENT_KERNEL 0x1
+
 struct vmcore {
struct list_head list;
-   unsigned long long paddr;
+   union {
+   unsigned long long paddr;
+   char *buf;
+   };
unsigned long long size;
loff_t offset;
+   unsigned int flag;
 };
 
 #ifdef CONFIG_PROC_FS



Re: [PATCH v12 rebased] kvm: notify host when the guest is panicked

2013-03-04 Thread Gleb Natapov
On Mon, Mar 04, 2013 at 05:43:48PM -0300, Marcelo Tosatti wrote:
> On Mon, Mar 04, 2013 at 07:49:13PM +0200, Gleb Natapov wrote:
> > On Sun, Mar 03, 2013 at 07:29:53PM -0300, Marcelo Tosatti wrote:
> > > On Sun, Mar 03, 2013 at 03:00:22PM +0200, Gleb Natapov wrote:
> > > > On Fri, Mar 01, 2013 at 09:03:12PM -0300, Marcelo Tosatti wrote:
> > > > > On Thu, Feb 28, 2013 at 04:54:25PM +0800, Hu Tao wrote:
> > > > > > > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
> > > > > > > > b/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > > > index 06fdbd9..c15ef33 100644
> > > > > > > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > > > @@ -96,5 +96,7 @@ struct kvm_vcpu_pv_apf_data {
> > > > > > > >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> > > > > > > >  #define KVM_PV_EOI_DISABLED 0x0
> > > > > > > >  
> > > > > > > > +#define KVM_PV_EVENT_PORT  (0x505UL)
> > > > > > > > +
> > > > > > > 
> > > > > > > No need for the ioport to be hard coded. What are the options to
> > > > > > > communicate an address to the guest? An MSR, via ACPI?
> > > > > > 
> > > > > > I'm not quite understanding here. By 'address', you mean an ioport?
> > > > > > how to communicate an address? (I have little knowledge about ACPI)
> > > > > 
> > > > > Yes, the ioport. The address of the ioport should not be fixed (for
> > > > > example future emulated board could use that fixed ioport address,
> > > > > 0x505UL).
> > > > > 
> > > > > One option is to pass the address via an MSR. Yes, that is probably 
> > > > > the
> > > > > best option because there is no dependency on ACPI.
> > > > > 
> > > > Why dependency on ACPI is problematic? ACPI is the standard way on x86
> > > > to enumerate platform devices. Passing it through MSR makes this panic
> > > > device CPU interface which it is not. And since relying on #GP to detect
> > > > valid MSRs is not good interface we will have to guard it by cpuid bit.
> > > > 
> > > > --
> > > > Gleb.
> > > 
> > > KVM guest <-> KVM host interface is not dependent on ACPI, so far. Say,
> > > it's possible to use a Linux guest without ACPI and have KVM paravirt 
> > > fully functional.
> > This is not a KVM guest <-> KVM host interface though. This is yet another
> > device. We could implement a real IPMI device that has crash reporting
> > capability, but decided to go with something simpler. Without ACPI the guest
> > will not be able to power itself down either, but this is not a reason
> > for us to introduce a non-ACPI interface for power down.
> 
> Sure (it's more of an aesthetic/organizational point, I guess).
> 
> Anyway, one problem with ACPI is whether it's initialized early enough
> (which is the whole point of PIO the x86 specific interface).
ACPI is needed pretty early in the boot process.

--
Gleb.


[PATCH v2 13/20] kexec, elf: introduce NT_VMCORE_DEBUGINFO note type

2013-03-04 Thread HATAYAMA Daisuke
This patch introduces NT_VMCORE_DEBUGINFO as the name of the single note
type in the VMCOREINFO name space, which has had no name so far. The name
means that it is the note type in vmcoreinfo that contains the system
kernel's debug information.

Signed-off-by: HATAYAMA Daisuke 
---

 include/uapi/linux/elf.h |4 
 kernel/kexec.c   |4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 8072d35..b869904 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -398,6 +398,10 @@ typedef struct elf64_shdr {
 #define NT_METAG_CBUF  0x500   /* Metag catch buffer registers */
 #define NT_METAG_RPIPE 0x501   /* Metag read pipeline state */
 
+/*
+ * Notes exported from /proc/vmcore, belonging to "VMCOREINFO" name.
+ */
+#define NT_VMCORE_DEBUGINFO    0   /* vmcore system kernel's debuginfo */
 
 /* Note header in a PT_NOTE section */
 typedef struct elf32_note {
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 195de6d..6597b82 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1438,8 +1438,8 @@ static void update_vmcoreinfo_note(void)
 
if (!vmcoreinfo_size)
return;
-   buf = append_elf_note(buf, VMCOREINFO_NOTE_NAME, 0, vmcoreinfo_data,
- vmcoreinfo_size);
+   buf = append_elf_note(buf, VMCOREINFO_NOTE_NAME, NT_VMCORE_DEBUGINFO,
+ vmcoreinfo_data, vmcoreinfo_size);
final_note(buf);
 }
 



[PATCH v2 18/20] vmcore: round-up offset of vmcore object in page-size boundary

2013-03-04 Thread HATAYAMA Daisuke
To satisfy mmap()'s page-size boundary requirement, round up the offset
of each vmcore object to a page-size boundary; each offset is tied to a
user-space virtual address through the mapping created by mmap().
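
The offset assignment can be sketched in plain C as below. The names set_offsets(), round_up_to() and SKETCH_PAGE_SIZE are hypothetical, a 4096-byte page is assumed, and an array stands in for the kernel's linked list.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed page size for illustration */

/* Round x up to the next multiple of y (y > 0). */
static unsigned long long round_up_to(unsigned long long x,
				      unsigned long long y)
{
	assert(y > 0);
	return ((x + y - 1) / y) * y;
}

/*
 * Assign file offsets to n objects, starting right after the ELF
 * header buffer of elfsz bytes. Each object occupies a whole number
 * of pages. Returns the offset just past the last object.
 */
static unsigned long long set_offsets(unsigned long long elfsz,
				      const unsigned long long *sizes,
				      unsigned long long *offsets, size_t n)
{
	unsigned long long off = elfsz;
	size_t i;

	for (i = 0; i < n; i++) {
		offsets[i] = off;
		off += round_up_to(sizes[i], SKETCH_PAGE_SIZE);
	}
	return off;
}
```

Offsets stay page-aligned only if elfsz itself is a multiple of the page size, which this sketch takes as given.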

Signed-off-by: HATAYAMA Daisuke 
---

 fs/proc/vmcore.c |   18 --
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 5582aaa..6660cd5 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -700,7 +700,7 @@ static int __init process_ptload_program_headers_elf32(char 
*elfptr,
 }
 
 /* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf64(char *elfptr,
+static void __init set_vmcore_list_offsets_elf64(char *elfptr, size_t elfsz,
struct list_head *vc_list)
 {
loff_t vmcore_off;
@@ -710,17 +710,16 @@ static void __init set_vmcore_list_offsets_elf64(char 
*elfptr,
ehdr_ptr = (Elf64_Ehdr *)elfptr;
 
/* Skip Elf header and program headers. */
-   vmcore_off = ehdr_ptr->e_phoff +
-   (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
+   vmcore_off = elfsz;
 
list_for_each_entry(m, vc_list, list) {
m->offset = vmcore_off;
-   vmcore_off += m->size;
+   vmcore_off += roundup(m->size, PAGE_SIZE);
}
 }
 
 /* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf32(char *elfptr,
+static void __init set_vmcore_list_offsets_elf32(char *elfptr, size_t elfsz,
struct list_head *vc_list)
 {
loff_t vmcore_off;
@@ -730,12 +729,11 @@ static void __init set_vmcore_list_offsets_elf32(char 
*elfptr,
ehdr_ptr = (Elf32_Ehdr *)elfptr;
 
/* Skip Elf header and program headers. */
-   vmcore_off = ehdr_ptr->e_phoff +
-   (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);
+   vmcore_off = elfsz;
 
list_for_each_entry(m, vc_list, list) {
m->offset = vmcore_off;
-   vmcore_off += m->size;
+   vmcore_off += roundup(m->size, PAGE_SIZE);
}
 }
 
@@ -795,7 +793,7 @@ static int __init parse_crash_elf64_headers(void)
   get_order(elfcorebuf_sz_orig));
return rc;
}
-   set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
+   set_vmcore_list_offsets_elf64(elfcorebuf, elfcorebuf_sz, &vmcore_list);
return 0;
 }
 
@@ -855,7 +853,7 @@ static int __init parse_crash_elf32_headers(void)
   get_order(elfcorebuf_sz_orig));
return rc;
}
-   set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
+   set_vmcore_list_offsets_elf32(elfcorebuf, elfcorebuf_sz, &vmcore_list);
return 0;
 }
 



[PATCH v2 20/20] vmcore: introduce mmap_vmcore()

2013-03-04 Thread HATAYAMA Daisuke
This patch introduces mmap_vmcore().

If flag MEM_TYPE_CURRENT_KERNEL is set, the buffer on the 2nd kernel is
remapped. If it is not set, the corresponding area in old memory is
remapped.

Neither writable nor executable mappings are permitted, even with
mprotect(). A non-writable mapping is also a requirement of
remap_pfn_range() when mapping linear pages onto non-consecutive
physical pages; see is_cow_mapping().

On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
limitation comes from the fact that the third argument of
remap_pfn_range(), pfn, is an unsigned long, which is 32 bits wide on
x86-32.
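
The 16TB figure follows from simple arithmetic, assuming 4 KiB pages (a page shift of 12): a 32-bit pfn can name 2^32 pages, and 2^32 * 2^12 = 2^44 bytes = 16 TiB. A small sketch of that calculation, with the hypothetical helper max_mappable_bytes():

```c
#include <assert.h>
#include <stdint.h>

/* Largest byte range addressable with a pfn of pfn_bits bits and
 * pages of 2^page_shift bytes. */
static uint64_t max_mappable_bytes(unsigned pfn_bits, unsigned page_shift)
{
	assert(pfn_bits + page_shift < 64);
	return (uint64_t)1 << (pfn_bits + page_shift);
}
```

Calling max_mappable_bytes(32, 12) reproduces the limit quoted above.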

Signed-off-by: HATAYAMA Daisuke 
---

 fs/proc/vmcore.c |   72 ++
 1 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 709d21a..9433ef0 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -185,9 +185,81 @@ static ssize_t read_vmcore(struct file *file, char __user 
*buffer,
return acc;
 }
 
+static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
+{
+   size_t size = vma->vm_end - vma->vm_start;
+   u64 start, end, len, tsz;
+   struct vmcore *m;
+
+   if (!support_mmap_vmcore)
+   return -ENODEV;
+
+   start = (u64)vma->vm_pgoff << PAGE_SHIFT;
+   end = start + size;
+
+   if (size > vmcore_size || end > vmcore_size)
+   return -EINVAL;
+
+   if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+   return -EPERM;
+
+   vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+
+   len = 0;
+
+   if (start < elfcorebuf_sz) {
+   u64 pfn;
+
+   tsz = elfcorebuf_sz - start;
+   if (size < tsz)
+   tsz = size;
+   pfn = __pa(elfcorebuf + start) >> PAGE_SHIFT;
+   if (remap_pfn_range(vma, vma->vm_start, pfn, tsz,
+   vma->vm_page_prot))
+   return -EAGAIN;
+   size -= tsz;
+   start += tsz;
+   len += tsz;
+
+   if (size == 0)
+   return 0;
+   }
+
+   list_for_each_entry(m, &vmcore_list, list) {
+   if (start < m->offset + m->size) {
+   u64 pfn = 0;
+
+   tsz = m->offset + m->size - start;
+   if (size < tsz)
+   tsz = size;
+   if (m->flag & MEM_TYPE_CURRENT_KERNEL) {
+   pfn = __pa(m->buf + start - m->offset)
+   >> PAGE_SHIFT;
+   } else {
+   pfn = (m->paddr + (start - m->offset))
+   >> PAGE_SHIFT;
+   }
+   if (remap_pfn_range(vma, vma->vm_start + len, pfn, tsz,
+   vma->vm_page_prot)) {
+   do_munmap(vma->vm_mm, vma->vm_start, len);
+   return -EAGAIN;
+   }
+   size -= tsz;
+   start += tsz;
+   len += tsz;
+
+   if (size == 0)
+   return 0;
+   }
+   }
+
+   return 0;
+}
+
 static const struct file_operations proc_vmcore_operations = {
.read   = read_vmcore,
.llseek = default_llseek,
+   .mmap   = mmap_vmcore,
 };
 
 static struct vmcore* __init get_new_element(void)



[PATCH v2 19/20] vmcore: count holes generated by round-up operation for vmcore size

2013-03-04 Thread HATAYAMA Daisuke
The previous patch changed the offset of each vmcore object by rounding
it up. The vmcore size must account for the resulting holes.

Signed-off-by: HATAYAMA Daisuke 
---

 fs/proc/vmcore.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 6660cd5..709d21a 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -195,7 +195,7 @@ static struct vmcore* __init get_new_element(void)
return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
 }
 
-static u64 __init get_vmcore_size_elf64(char *elfptr)
+static u64 __init get_vmcore_size_elf64(char *elfptr, size_t elfsz)
 {
int i;
u64 size;
@@ -204,15 +204,15 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
 
ehdr_ptr = (Elf64_Ehdr *)elfptr;
phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
-   size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
+   size = elfsz;
for (i = 0; i < ehdr_ptr->e_phnum; i++) {
-   size += phdr_ptr->p_memsz;
+   size += roundup(phdr_ptr->p_memsz, PAGE_SIZE);
phdr_ptr++;
}
return size;
 }
 
-static u64 __init get_vmcore_size_elf32(char *elfptr)
+static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
 {
int i;
u64 size;
@@ -221,9 +221,9 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
 
ehdr_ptr = (Elf32_Ehdr *)elfptr;
phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
-   size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
+   size = elfsz;
for (i = 0; i < ehdr_ptr->e_phnum; i++) {
-   size += phdr_ptr->p_memsz;
+   size += roundup(phdr_ptr->p_memsz, PAGE_SIZE);
phdr_ptr++;
}
return size;
@@ -878,14 +878,14 @@ static int __init parse_crash_elf_headers(void)
return rc;
 
/* Determine vmcore size. */
-   vmcore_size = get_vmcore_size_elf64(elfcorebuf);
+   vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
} else if (e_ident[EI_CLASS] == ELFCLASS32) {
rc = parse_crash_elf32_headers();
if (rc)
return rc;
 
/* Determine vmcore size. */
-   vmcore_size = get_vmcore_size_elf32(elfcorebuf);
+   vmcore_size = get_vmcore_size_elf32(elfcorebuf, elfcorebuf_sz);
} else {
pr_warn("Warning: Core image elf header is not sane\n");
return -EINVAL;



Re: [PATCH LINUX v5] xen: event channel arrays are xen_ulong_t and not unsigned long

2013-03-04 Thread Ian Campbell
On Tue, 2013-03-05 at 06:55 +, Rob Herring wrote:
> Here's a config:

Was it truncated? oldconfig is asking a ton of questions, far more than
I would expect for something like variance between our HEADs etc.

If I just answer the default to all the questions then I get a .config
with Xen on etc, but I still don't see the issue.

Ian.



[PATCH v2 17/20] vmcore: check if vmcore objects satisfy mmap()'s page-size boundary requirement

2013-03-04 Thread HATAYAMA Daisuke
If some vmcore object doesn't satisfy the page-size boundary
requirement, remap_pfn_range() fails to remap it to user-space.

The only objects that possibly don't satisfy the requirement are ELF
note segments. The memory chunks corresponding to PT_LOAD entries are
guaranteed to satisfy the page-size boundary requirement by the copy from
old memory to a buffer in the 2nd kernel done in a later patch.
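
The check amounts to testing the low bits of three values per object. A userspace sketch, assuming a 4096-byte page; struct obj_layout, page_aligned() and all_mmap_capable() are hypothetical names used only here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

struct obj_layout {
	uint64_t offset; /* offset within the vmcore file */
	uint64_t paddr;  /* physical address of the data */
	uint64_t size;   /* length in bytes */
};

/* True if v has no bits set below the page boundary. */
static bool page_aligned(uint64_t v)
{
	return (v & ~SKETCH_PAGE_MASK) == 0;
}

/* True only if every object can be remapped page by page. */
static bool all_mmap_capable(const struct obj_layout *objs, size_t n)
{
	size_t i;

	assert(objs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!page_aligned(objs[i].offset) ||
		    !page_aligned(objs[i].paddr) ||
		    !page_aligned(objs[i].size))
			return false;
	}
	return true;
}
```

A single misaligned object disables the feature for the whole file, which matches the all-or-nothing flag in the patch.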

Signed-off-by: HATAYAMA Daisuke 
---

 fs/proc/vmcore.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index e432946..5582aaa 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -38,6 +38,8 @@ static u64 vmcore_size;
 
 static struct proc_dir_entry *proc_vmcore = NULL;
 
+static bool support_mmap_vmcore;
+
 /*
  * Returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error
  * The called function has to take care of module refcounting.
@@ -897,6 +899,7 @@ static int __init parse_crash_elf_headers(void)
 static int __init vmcore_init(void)
 {
int rc = 0;
+   struct vmcore *m;
 
/* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
if (!(is_vmcore_usable()))
@@ -907,6 +910,25 @@ static int __init vmcore_init(void)
return rc;
}
 
+   /* If some object doesn't satisfy PAGE_SIZE boundary
+* requirement, mmap_vmcore() is not exported to
+* user-space. */
+   support_mmap_vmcore = true;
+   list_for_each_entry(m, &vmcore_list, list) {
+   u64 paddr;
+
+   if (m->flag & MEM_TYPE_CURRENT_KERNEL)
+   paddr = (u64)__pa(m->buf);
+   else
+   paddr = m->paddr;
+
+   if ((m->offset & ~PAGE_MASK) || (paddr & ~PAGE_MASK)
+   || (m->size & ~PAGE_MASK)) {
+   support_mmap_vmcore = false;
+   break;
+   }
+   }
+
proc_vmcore = proc_create("vmcore", S_IRUSR, NULL, &proc_vmcore_operations);
if (proc_vmcore)
proc_vmcore->size = vmcore_size;



[PATCH v2 16/20] vmcore: check NT_VMCORE_PAD as a mark indicating the end of ELF note buffer

2013-03-04 Thread HATAYAMA Daisuke
A modern kernel marks the end of the ELF note buffer with an
NT_VMCORE_PAD type note in order to make the buffer satisfy mmap()'s
page-size boundary requirement. This patch stops reading each buffer
when the note being read is of NT_VMCORE_PAD type.
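
A userspace sketch of the note walk with both end marks is shown below. It is an approximation under stated assumptions: the SKETCH_ macros and function names are hypothetical, the pad type value 7 is the one the series assigns, memcmp() with a name-size check replaces the patch's strncmp(), and the truncation check is an extra safety measure that the patch does not have.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NOTE_NAME     "VMCOREINFO"
#define SKETCH_NT_VMCORE_PAD 7 /* value the series assigns to NT_VMCORE_PAD */

/* Size of one note: header plus 4-byte-aligned name and descriptor. */
static size_t note_size(const Elf64_Nhdr *n)
{
	return sizeof(*n) + ((n->n_namesz + 3) & ~3u) +
	       ((n->n_descsz + 3) & ~3u);
}

/* Return the number of bytes of notes in buf, stopping at either end
 * mark. The pad note itself is counted, as in the patch. */
static size_t real_note_size(const char *buf, size_t max_sz)
{
	size_t off = 0;

	assert(buf != NULL);
	while (off + sizeof(Elf64_Nhdr) <= max_sz) {
		const Elf64_Nhdr *n = (const Elf64_Nhdr *)(buf + off);
		const char *name = (const char *)(n + 1);
		size_t sz;

		if (n->n_namesz == 0)	/* old kernels: empty header */
			break;
		sz = note_size(n);
		if (sz > max_sz - off)	/* truncated note (extra check) */
			break;
		off += sz;
		if (n->n_type == SKETCH_NT_VMCORE_PAD &&
		    n->n_namesz == sizeof(SKETCH_NOTE_NAME) &&
		    memcmp(name, SKETCH_NOTE_NAME,
			   sizeof(SKETCH_NOTE_NAME)) == 0)
			break;		/* new kernels: pad note */
	}
	return off;
}
```

Checking the name as well as the type matters because type numbers are only unique within a name space.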

Signed-off-by: HATAYAMA Daisuke 
---

 fs/proc/vmcore.c |   24 
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index c8899dc..e432946 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -259,12 +259,24 @@ static int __init merge_note_headers_elf64(char *elfptr, 
size_t *elfsz,
}
nhdr_ptr = notes_section;
for (j = 0; j < max_sz; j += sz) {
+   char *name;
+
+   /* Old kernel marks the end of ELF note buffer
+* with empty header. */
if (nhdr_ptr->n_namesz == 0)
break;
sz = sizeof(Elf64_Nhdr) +
((nhdr_ptr->n_namesz + 3) & ~3) +
((nhdr_ptr->n_descsz + 3) & ~3);
real_sz += sz;
+
+   /* Modern kernel marks the end of ELF note
+* buffer with NT_VMCORE_PAD type note. */
+   name = (char *)(nhdr_ptr + 1);
+   if (strncmp(name, VMCOREINFO_NOTE_NAME,
+   sizeof(VMCOREINFO_NOTE_NAME)) == 0
+   && nhdr_ptr->n_type == NT_VMCORE_PAD)
+   break;
nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
}
 
@@ -369,12 +381,24 @@ static int __init merge_note_headers_elf32(char *elfptr, 
size_t *elfsz,
}
nhdr_ptr = notes_section;
for (j = 0; j < max_sz; j += sz) {
+   char *name;
+
+   /* Old kernel marks the end of ELF note buffer
+* with empty header. */
if (nhdr_ptr->n_namesz == 0)
break;
sz = sizeof(Elf32_Nhdr) +
((nhdr_ptr->n_namesz + 3) & ~3) +
((nhdr_ptr->n_descsz + 3) & ~3);
real_sz += sz;
+
+   /* Modern kernel marks the end of ELF note
+* buffer with NT_VMCORE_PAD type note. */
+   name = (char *)(nhdr_ptr + 1);
+   if (strncmp(name, VMCOREINFO_NOTE_NAME,
+   sizeof(VMCOREINFO_NOTE_NAME)) == 0
+   && nhdr_ptr->n_type == NT_VMCORE_PAD)
+   break;
nhdr_ptr = (Elf32_Nhdr*)((char*)nhdr_ptr + sz);
}
 



[PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary

2013-03-04 Thread HATAYAMA Daisuke
Fill both the crash_notes and vmcoreinfo_note buffers with an
NT_VMCORE_PAD note to make them satisfy mmap()'s page-size boundary
requirement.

So far, the end of the note segments has been marked by a zero-filled
ELF note header. Instead, this patch writes an NT_VMCORE_PAD note at the
end of the note segments, extending up to the next page-size boundary.
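
The size of the padding note's descriptor is what brings the buffer up to the boundary. A sketch of that calculation, assuming a 4096-byte page; pad_desc_len() is a hypothetical helper, and pad_hdr_size stands for the note header plus the aligned "VMCOREINFO" name:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for illustration */

/* Descriptor length of the padding note, chosen so that
 * data_len + pad_hdr_size + descriptor ends on a page boundary. */
static size_t pad_desc_len(size_t data_len, size_t pad_hdr_size,
			   size_t buf_len)
{
	size_t used = data_len + pad_hdr_size;
	size_t rounded = (used + SKETCH_PAGE_SIZE - 1) /
			 SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;

	/* Room for the pad header must have been reserved. */
	assert(used <= buf_len);
	return rounded - used;
}
```

For example, 1000 bytes of notes plus a 24-byte pad header leave 3072 bytes of zero-filled descriptor in a 4096-byte buffer.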

Signed-off-by: HATAYAMA Daisuke 
---

 arch/s390/include/asm/kexec.h |7 --
 include/linux/kexec.h |   12 ++-
 kernel/kexec.c|   46 ++---
 3 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 694bcd6..2a531ce 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -41,8 +41,8 @@
 /*
  * Size for s390x ELF notes per CPU
  *
- * Seven notes plus zero note at the end: prstatus, fpregset, timer,
- * tod_cmp, tod_reg, control regs, and prefix
+ * Seven notes plus note with NT_VMCORE_PAD type at the end: prstatus,
+ * fpregset, timer, tod_cmp, tod_reg, control regs, and prefix
  */
 #define KEXEC_NOTE_BYTES \
(ALIGN(sizeof(struct elf_note), 4) * 8 + \
@@ -53,7 +53,8 @@
 ALIGN(sizeof(u64), 4) + \
 ALIGN(sizeof(u32), 4) + \
 ALIGN(sizeof(u64) * 16, 4) + \
-ALIGN(sizeof(u32), 4) \
+ALIGN(sizeof(u32), 4) + \
+VMCOREINFO_NOTE_NAME_BYTES \
)
 
 /* Provide a dummy definition to avoid build failures. */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 5113570..6592935 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -47,14 +47,16 @@
 #define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4)
 #define KEXEC_CORE_NOTE_DESC_BYTES ALIGN(sizeof(struct elf_prstatus), 4)
 /*
- * The per-cpu notes area is a list of notes terminated by a "NULL"
- * note header.  For kdump, the code in vmcore.c runs in the context
- * of the second kernel to combine them into one note.
+ * The per-cpu notes area is a list of notes terminated by a note
+ * header with NT_VMCORE_PAD type. For kdump, the code in vmcore.c
+ * runs in the context of the second kernel to combine them into one
+ * note.
  */
 #ifndef KEXEC_NOTE_BYTES
 #define KEXEC_NOTE_BYTES ( (KEXEC_NOTE_HEAD_BYTES * 2) +   \
KEXEC_CORE_NOTE_NAME_BYTES +\
-   KEXEC_CORE_NOTE_DESC_BYTES )
+   KEXEC_CORE_NOTE_DESC_BYTES +\
+   VMCOREINFO_NOTE_NAME_BYTES)
 #endif
 
 /*
@@ -187,7 +189,7 @@ extern struct kimage *kexec_crash_image;
 #define VMCOREINFO_NOTE_NAME_BYTES ALIGN(sizeof(VMCOREINFO_NOTE_NAME), 4)
 #define VMCOREINFO_NOTE_SIZE   ALIGN(KEXEC_NOTE_HEAD_BYTES*2   \
 +VMCOREINFO_BYTES  \
-+VMCOREINFO_NOTE_NAME_BYTES,   \
++VMCOREINFO_NOTE_NAME_BYTES*2, \
 PAGE_SIZE)
 
 /* Location of a reserved region to hold the crash kernel.
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 6597b82..fbdc0f0 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -40,6 +40,7 @@
 
 /* Per cpu memory for storing cpu states in case of system crash. */
 note_buf_t __percpu *crash_notes;
+static size_t crash_notes_size = ALIGN(sizeof(note_buf_t), PAGE_SIZE);
 
 /* vmcoreinfo stuff */
 static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES];
@@ -1177,6 +1178,7 @@ unlock:
return ret;
 }
 
+/* If @data is NULL, fill @buf with 0 in @data_len bytes. */
 static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
size_t data_len)
 {
@@ -1189,26 +1191,36 @@ static u32 *append_elf_note(u32 *buf, char *name, 
unsigned type, void *data,
buf += (sizeof(note) + 3)/4;
memcpy(buf, name, note.n_namesz);
buf += (note.n_namesz + 3)/4;
-   memcpy(buf, data, note.n_descsz);
+   if (data)
+   memcpy(buf, data, note.n_descsz);
+   else
+   memset(buf, 0, note.n_descsz);
buf += (note.n_descsz + 3)/4;
 
return buf;
 }
 
-static void final_note(u32 *buf)
+static void final_note(u32 *buf, size_t buf_len, size_t data_len)
 {
-   struct elf_note note;
+   size_t used_bytes, pad_hdr_size;
 
-   note.n_namesz = 0;
-   note.n_descsz = 0;
-   note.n_type   = 0;
-   memcpy(buf, &note, sizeof(note));
+   pad_hdr_size = KEXEC_NOTE_HEAD_BYTES + VMCOREINFO_NOTE_NAME_BYTES;
+
+   /*
+* keep space for ELF note header and "VMCOREINFO" name to
+* terminate ELF segment by NT_VMCORE_PAD note.
+*/
+   BUG_ON(data_len + pad_hdr_size > buf_len);
+
+   used_bytes = data_len + pad_hdr_size;
+   append_elf_note(buf, VMCOREINFO_NOTE_NAME, NT_VMCORE_PAD, NULL,
+   roundup(used_bytes, PAGE_SIZE) - used_bytes);
 }
 
 void 

[PATCH v2 14/20] elf: introduce NT_VMCORE_PAD type

2013-03-04 Thread HATAYAMA Daisuke
The NT_VMCORE_PAD type is introduced to make both the crash_notes buffer
and the vmcoreinfo_note buffer satisfy mmap()'s page-size boundary
requirement by filling them with this note type.

The purpose of this type is just to align the buffer on a page-size
boundary; its contents have no meaning and are entirely filled with
zero.

This note type belongs to the "VMCOREINFO" name space, and its type
number in this name space is 7. The numbers from 1 to 5 are not chosen
for the following reasons. For 1 to 4, there are corresponding note
types using the same numbers in the "CORE" name space, and the crash
utility and makedumpfile don't distinguish note types by name space at
all. The remaining number, 5, has for some reason not been used since
the v2.4.0 kernel even though NT_AUXV is defined as 6, which suggests
that something depends on 5 staying unused. So 5 is not chosen either,
to be conservative.

With this change, gdb and binutils work well without any change, but
makedumpfile and the crash utility need changes to distinguish the two
note types in the "VMCOREINFO" name space.

Signed-off-by: HATAYAMA Daisuke 
---

 include/uapi/linux/elf.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index b869904..9753e4c 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -402,6 +402,7 @@ typedef struct elf64_shdr {
  * Notes exported from /proc/vmcore, belonging to "VMCOREINFO" name.
  */
 #define NT_VMCORE_DEBUGINFO    0   /* vmcore system kernel's debuginfo */
+#define NT_VMCORE_PAD  7   /* vmcore padding of note segments */
 
 /* Note header in a PT_NOTE section */
 typedef struct elf32_note {



[PATCH v2 12/20] kexec: allocate vmcoreinfo note buffer on page-size boundary

2013-03-04 Thread HATAYAMA Daisuke
To satisfy mmap()'s page-size boundary requirement, specify the aligned
attribute on the vmcoreinfo_note object to allocate it on a page-size
boundary.
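
In userspace the same effect can be sketched with the GCC/Clang aligned attribute, which is what the kernel's __aligned() macro expands to. The buffer name note_buf, the helper is_page_aligned() and the 4096-byte page size are assumptions for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096

/* Static buffer whose start address and size are both multiples of
 * the page size. */
static uint32_t note_buf[2 * SKETCH_PAGE_SIZE / 4]
	__attribute__((aligned(SKETCH_PAGE_SIZE)));

/* Returns 1 if p points at the start of a page. */
static int is_page_aligned(const void *p)
{
	assert(p != NULL);
	return ((uintptr_t)p % SKETCH_PAGE_SIZE) == 0;
}
```

The attribute only fixes the start address; the array length must separately be a multiple of the page size, which is why the patch also wraps VMCOREINFO_NOTE_SIZE in ALIGN().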

Signed-off-by: HATAYAMA Daisuke 
---

 include/linux/kexec.h |6 --
 kernel/kexec.c|2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d2e6927..5113570 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -185,8 +185,10 @@ extern struct kimage *kexec_crash_image;
 #define VMCOREINFO_BYTES   (4096)
 #define VMCOREINFO_NOTE_NAME   "VMCOREINFO"
 #define VMCOREINFO_NOTE_NAME_BYTES ALIGN(sizeof(VMCOREINFO_NOTE_NAME), 4)
-#define VMCOREINFO_NOTE_SIZE   (KEXEC_NOTE_HEAD_BYTES*2 + VMCOREINFO_BYTES 
\
-   + VMCOREINFO_NOTE_NAME_BYTES)
+#define VMCOREINFO_NOTE_SIZE   ALIGN(KEXEC_NOTE_HEAD_BYTES*2   \
++VMCOREINFO_BYTES  \
++VMCOREINFO_NOTE_NAME_BYTES,   \
+PAGE_SIZE)
 
 /* Location of a reserved region to hold the crash kernel.
  */
diff --git a/kernel/kexec.c b/kernel/kexec.c
index d1f365e..195de6d 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -43,7 +43,7 @@ note_buf_t __percpu *crash_notes;
 
 /* vmcoreinfo stuff */
 static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES];
-u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
+u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4] __aligned(PAGE_SIZE);
 size_t vmcoreinfo_size;
 size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data);
 



[PATCH v2 11/20] vmcore: allocate per-cpu crash_notes objects on page-size boundary

2013-03-04 Thread HATAYAMA Daisuke
To satisfy mmap()'s page-size boundary requirement, allocate per-cpu
crash_notes objects on page-size boundary.

/proc/vmcore on the 2nd kernel checks whether each note object is
allocated on a page-size boundary. If any object does not satisfy the
page-size boundary requirement, /proc/vmcore does not provide the
mmap() interface.
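
The allocation pattern (round the size up to a page multiple and ask
for page alignment) can be sketched in userspace with C11
aligned_alloc(). This is a hedged analogue, not __alloc_percpu();
struct sketch_note_buf is a hypothetical stand-in for note_buf_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u

/* Round x up to the next multiple of y. */
static size_t sketch_roundup(size_t x, size_t y)
{
	return ((x + y - 1) / y) * y;
}

/* Hypothetical stand-in for note_buf_t; the real size differs. */
struct sketch_note_buf { uint32_t words[88]; };

/* Allocate one note buffer that starts and ends on a page boundary. */
static void *sketch_alloc_note(void)
{
	size_t size = sketch_roundup(sizeof(struct sketch_note_buf),
				     SKETCH_PAGE_SIZE);

	/* aligned_alloc() requires size to be a multiple of the alignment. */
	return aligned_alloc(SKETCH_PAGE_SIZE, size);
}

static int sketch_is_page_aligned(const void *p)
{
	return ((uintptr_t)p % SKETCH_PAGE_SIZE) == 0;
}
```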

Signed-off-by: HATAYAMA Daisuke 
---

 kernel/kexec.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index bddd3d7..d1f365e 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1234,7 +1234,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
 static int __init crash_notes_memory_init(void)
 {
/* Allocate memory for saving cpu registers. */
-   crash_notes = alloc_percpu(note_buf_t);
+   crash_notes = __alloc_percpu(roundup(sizeof(note_buf_t), PAGE_SIZE),
+PAGE_SIZE);
if (!crash_notes) {
printk("Kexec: Memory allocation for saving cpu register"
" states failed\n");



[PATCH v2 10/20] vmcore: read buffers for vmcore objects copied from old memory

2013-03-04 Thread HATAYAMA Daisuke
If the MEM_TYPE_CURRENT_KERNEL flag is set, the object has been copied
into a buffer on the 2nd kernel, and read_vmcore() reads from that
buffer. If the flag is not set, read_vmcore() reads old memory as usual.
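
The branch can be pictured with a small userspace sketch. The struct,
the flag value and the oldmem pointer are assumptions made for the
example; in the kernel the two paths are copy_to_user() and
read_from_oldmem().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MEM_TYPE_CURRENT_KERNEL 0x1u  /* assumed flag value */

struct sketch_obj {
	unsigned int flag;
	size_t offset;       /* position of the object in the file */
	size_t size;
	const char *buf;     /* copy held by the 2nd kernel (flag set) */
	const char *oldmem;  /* stand-in for old memory reached via paddr */
};

/* Copy tsz bytes of the object, starting at file position fpos, into dst. */
static void sketch_read_obj(const struct sketch_obj *m, size_t fpos,
			    char *dst, size_t tsz)
{
	const char *src;

	assert(fpos >= m->offset && fpos + tsz <= m->offset + m->size);
	if (m->flag & SKETCH_MEM_TYPE_CURRENT_KERNEL)
		src = m->buf + (fpos - m->offset);     /* copied buffer */
	else
		src = m->oldmem + (fpos - m->offset);  /* old memory */
	memcpy(dst, src, tsz);
}
```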

Signed-off-by: HATAYAMA Daisuke 
---

 fs/proc/vmcore.c |   15 +++
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 09b7f34..c8899dc 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -158,10 +158,17 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
tsz = m->offset + m->size - *fpos;
if (buflen < tsz)
tsz = buflen;
-   start = m->paddr + *fpos - m->offset;
-   tmp = read_from_oldmem(buffer, tsz, &start, 1);
-   if (tmp < 0)
-   return tmp;
+   if (m->flag & MEM_TYPE_CURRENT_KERNEL) {
+   if (copy_to_user(buffer,
+m->buf + *fpos - m->offset,
+tsz))
+   return -EFAULT;
+   } else {
+   start = m->paddr + *fpos - m->offset;
+   tmp = read_from_oldmem(buffer, tsz, &start, 1);
+   if (tmp < 0)
+   return tmp;
+   }
buflen -= tsz;
*fpos += tsz;
buffer += tsz;



[PATCH v2 09/20] vmcore: clean up read_vmcore()

2013-03-04 Thread HATAYAMA Daisuke
Clean up read_vmcore(). The part that handles objects in vmcore_list can
be written in the same way as the part that handles the ELF headers.
This change removes duplicated and complicated code, so it is clearer
what is done there.

This change also leaves map_offset_to_paddr() unused, so remove it.
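
The shape of the simplified loop can be sketched in userspace as
follows. This is an illustration under assumptions: an array stands in
for vmcore_list, memcpy() stands in for read_from_oldmem(), and the
segments are assumed to be sorted and laid out back to back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_seg {
	size_t offset;     /* file offset where the segment starts */
	size_t size;
	const char *data;  /* stand-in for old memory at m->paddr */
};

/*
 * Read up to buflen bytes at *fpos from contiguous, sorted segments.
 * Returns the number of bytes copied and advances *fpos.
 */
static size_t sketch_read(const struct sketch_seg *segs, size_t nsegs,
			  char *buffer, size_t buflen, size_t *fpos)
{
	size_t acc = 0;

	for (size_t i = 0; i < nsegs && buflen > 0; i++) {
		const struct sketch_seg *m = &segs[i];
		size_t tsz;

		if (*fpos >= m->offset + m->size)
			continue;          /* segment ends before fpos */
		assert(*fpos >= m->offset);
		tsz = m->offset + m->size - *fpos;
		if (buflen < tsz)
			tsz = buflen;
		memcpy(buffer, m->data + (*fpos - m->offset), tsz);
		buflen -= tsz;
		*fpos += tsz;
		buffer += tsz;
		acc += tsz;
	}
	return acc;
}
```

A single read may span several segments; the loop stops once the
caller's buffer is full or the last segment is consumed.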

Signed-off-by: HATAYAMA Daisuke 
---

 fs/proc/vmcore.c |   68 --
 1 files changed, 20 insertions(+), 48 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index e4751ce..09b7f34 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -118,27 +118,6 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
return read;
 }
 
-/* Maps vmcore file offset to respective physical address in memroy. */
-static u64 map_offset_to_paddr(loff_t offset, struct list_head *vc_list,
-   struct vmcore **m_ptr)
-{
-   struct vmcore *m;
-   u64 paddr;
-
-   list_for_each_entry(m, vc_list, list) {
-   u64 start, end;
-   start = m->offset;
-   end = m->offset + m->size - 1;
-   if (offset >= start && offset <= end) {
-   paddr = m->paddr + offset - start;
-   *m_ptr = m;
-   return paddr;
-   }
-   }
-   *m_ptr = NULL;
-   return 0;
-}
-
 /* Read from the ELF header and then the crash dump. On error, negative value is
  * returned otherwise number of bytes read are returned.
  */
@@ -147,8 +126,8 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
 {
ssize_t acc = 0, tmp;
size_t tsz;
-   u64 start, nr_bytes;
-   struct vmcore *curr_m = NULL;
+   u64 start;
+   struct vmcore *m;
 
if (buflen == 0 || *fpos >= vmcore_size)
return 0;
@@ -174,33 +153,26 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
return acc;
}
 
-   start = map_offset_to_paddr(*fpos, &vmcore_list, &curr_m);
-   if (!curr_m)
-   return -EINVAL;
-
-   while (buflen) {
-   tsz = min_t(size_t, buflen, PAGE_SIZE - (start & ~PAGE_MASK));
-
-   /* Calculate left bytes in current memory segment. */
-   nr_bytes = (curr_m->size - (start - curr_m->paddr));
-   if (tsz > nr_bytes)
-   tsz = nr_bytes;
-
-   tmp = read_from_oldmem(buffer, tsz, &start, 1);
-   if (tmp < 0)
-   return tmp;
-   buflen -= tsz;
-   *fpos += tsz;
-   buffer += tsz;
-   acc += tsz;
-   if (start >= (curr_m->paddr + curr_m->size)) {
-   if (curr_m->list.next == &vmcore_list)
-   return acc; /*EOF*/
-   curr_m = list_entry(curr_m->list.next,
-   struct vmcore, list);
-   start = curr_m->paddr;
+   list_for_each_entry(m, &vmcore_list, list) {
+   if (*fpos < m->offset + m->size) {
+   tsz = m->offset + m->size - *fpos;
+   if (buflen < tsz)
+   tsz = buflen;
+   start = m->paddr + *fpos - m->offset;
+   tmp = read_from_oldmem(buffer, tsz, &start, 1);
+   if (tmp < 0)
+   return tmp;
+   buflen -= tsz;
+   *fpos += tsz;
+   buffer += tsz;
+   acc += tsz;
+
+   /* leave now if filled buffer already */
+   if (buflen == 0)
+   return acc;
}
}
+
return acc;
 }
 



[PATCH v2 05/20] vmcore: round up buffer size of ELF headers by PAGE_SIZE

2013-03-04 Thread HATAYAMA Daisuke
To satisfy mmap()'s page-size boundary requirement, round up the buffer
size of the ELF headers to a multiple of PAGE_SIZE. The resulting value
becomes the offset of the ELF note segments and is assigned to the
single merged PT_NOTE program header entry.

Also, code that assumed the previous ELF headers' size now uses this
new rounded-up value.
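
The resulting layout arithmetic can be sketched as below. The helper
names are hypothetical, and e_phnum is taken to be the program header
count after the PT_NOTE entries have been merged into one.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Round x up to the next multiple of y. */
static uint64_t sketch_roundup(uint64_t x, uint64_t y)
{
	return ((x + y - 1) / y) * y;
}

/* Offset of the merged note segment: first page boundary after the headers. */
static uint64_t sketch_note_offset(uint64_t e_phoff, uint64_t e_phnum,
				   uint64_t phdr_size)
{
	return sketch_roundup(e_phoff + e_phnum * phdr_size, SKETCH_PAGE_SIZE);
}

/* Offset of the first PT_LOAD data: note offset plus page-rounded note size. */
static uint64_t sketch_first_load_offset(uint64_t note_off, uint64_t note_memsz)
{
	return note_off + sketch_roundup(note_memsz, SKETCH_PAGE_SIZE);
}
```

For example, with e_phoff = 64, three 56-byte program headers and a
5000-byte note segment, the notes start at 4096 and the first PT_LOAD
data starts at 12288.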

Signed-off-by: HATAYAMA Daisuke 
---

 fs/proc/vmcore.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 1b02d01..c511cf4 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -340,7 +340,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
phdr.p_flags   = 0;
note_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
-   phdr.p_offset  = note_off;
+   phdr.p_offset  = roundup(note_off, PAGE_SIZE);
phdr.p_vaddr   = phdr.p_paddr = 0;
phdr.p_filesz  = phdr.p_memsz = phdr_sz;
phdr.p_align   = 0;
@@ -353,6 +353,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
 
+   *elfsz = roundup(*elfsz, PAGE_SIZE);
 out:
return 0;
 }
@@ -449,7 +450,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
phdr.p_flags   = 0;
note_off = ehdr_ptr->e_phoff +
(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
-   phdr.p_offset  = note_off;
+   phdr.p_offset  = roundup(note_off, PAGE_SIZE);
phdr.p_vaddr   = phdr.p_paddr = 0;
phdr.p_filesz  = phdr.p_memsz = phdr_sz;
phdr.p_align   = 0;
@@ -462,6 +463,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
 
+   *elfsz = roundup(*elfsz, PAGE_SIZE);
 out:
return 0;
 }
@@ -482,9 +484,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
 
/* First program header is PT_NOTE header. */
-   vmcore_off = ehdr_ptr->e_phoff +
-   (ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr) +
-   phdr_ptr->p_memsz; /* Note sections */
+   vmcore_off = phdr_ptr->p_offset + roundup(phdr_ptr->p_memsz,
+ PAGE_SIZE);
 
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
if (phdr_ptr->p_type != PT_LOAD)
@@ -519,9 +520,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
 
/* First program header is PT_NOTE header. */
-   vmcore_off = ehdr_ptr->e_phoff +
-   (ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr) +
-   phdr_ptr->p_memsz; /* Note sections */
+   vmcore_off = phdr_ptr->p_offset + roundup(phdr_ptr->p_memsz,
+PAGE_SIZE);
 
for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
if (phdr_ptr->p_type != PT_LOAD)



[PATCH v2 03/20] vmcore, sysfs: export ELF note segment size instead of vmcoreinfo data size

2013-03-04 Thread HATAYAMA Daisuke
The p_memsz member of a program header entry of type PT_NOTE must hold
the size of the corresponding ELF note segment. Currently, vmcoreinfo
exports the size of the data part only. If vmcoreinfo reaches
vmcoreinfo_max_size, then in merge_note_headers_elf{32,64} the empty
ELF note header cannot be found, or a buffer overrun can happen.

Note: kexec-tools assigns PAGE_SIZE to p_memsz for the other ELF note
types. For the same reason, the same issue occurs if the actual ELF
note data exceeds (PAGE_SIZE - 2 * KEXEC_NOTE_HEAD_BYTES).

Signed-off-by: HATAYAMA Daisuke 
---

 kernel/ksysfs.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 6ada93c..97d2763 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -126,7 +126,7 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
 {
return sprintf(buf, "%lx %x\n",
   paddr_vmcoreinfo_note(),
-  (unsigned int)vmcoreinfo_max_size);
+  (unsigned int)sizeof(vmcoreinfo_note));
 }
 KERNEL_ATTR_RO(vmcoreinfo);
 



[PATCH v2 00/20] kdump, vmcore: support mmap() on /proc/vmcore

2013-03-04 Thread HATAYAMA Daisuke
Currently, reads of /proc/vmcore are served by read_oldmem(), which
calls ioremap/iounmap once per page. For example, with 1GB of memory,
ioremap/iounmap is called (1GB / 4KB) = 262144 times. This causes a
large performance degradation.

In particular, the main intended user of this mmap() is makedumpfile,
which not only reads memory from /proc/vmcore but also does other
processing such as filtering, compression and I/O. The page table
updates and the TLB flushes that follow make such processing much
slower; I have yet to write the makedumpfile patch and to confirm how
much it improves.

To address the issue, this patch set implements mmap() on /proc/vmcore
to improve read performance. My simple benchmark shows an improvement
from 200 [MiB/sec] to over 50.0 [GiB/sec].
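
As a rough userspace illustration of the access pattern a consumer
such as makedumpfile could use, the sketch below maps a whole file
once and walks it through the mapping. It is an assumption-laden
example: it operates on an ordinary file (whether /proc/vmcore accepts
mmap() depends on this series), and the function names are made up
for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Number of per-page map/unmap cycles needed to read mem_bytes. */
static uint64_t sketch_page_count(uint64_t mem_bytes, uint64_t page_bytes)
{
	return mem_bytes / page_bytes;
}

/*
 * Sum every byte of a file through one read-only mapping.
 * Returns 0 and stores the sum in *sum on success, -1 on failure.
 */
static int sketch_sum_mmap(const char *path, uint64_t *sum)
{
	struct stat st;
	unsigned char *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);  /* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return -1;

	*sum = 0;
	for (off_t i = 0; i < st.st_size; i++)
		*sum += p[i];
	munmap(p, (size_t)st.st_size);
	return 0;
}
```

sketch_page_count(1ull << 30, 4096) reproduces the 262144 figure
quoted above.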

ChangeLog
=========

v1 => v2)

- Clean up the existing codes: use e_phoff, and remove the assumption
  on PT_NOTE entries.
  => See PATCH 01, 02.

- Fix a potential bug: the ELF header size was not included in the
  exported vmcoreinfo size.
  => See PATCH 03.

- Divide patch modifying read_vmcore() into two: clean-up and primary
  code change.
  => See Patch 9, 10.

- Put ELF note segments in page-size boundary on the 1st kernel
  instead of copying them into the buffer on the 2nd kernel.
  => See Patch 11, 12, 13, 14, 16.

Benchmark
=========

No change is seen from the previous patch series. See the previous
one from here:

  https://lkml.org/lkml/2013/2/14/89

TODO
====

- fix makedumpfile to use mmap() on /proc/vmcore and benchmark it to
  confirm whether we can see enough performance improvement. The idea
  is described here:
  http://lists.infradead.org/pipermail/kexec/2013-February/007982.html

- fix the crash utility and makedumpfile to support the NT_VMCORE_PAD
  note type. Neither tool distinguishes identical note types under
  different note names, which does not conform to the ELF specification;
  currently both read the NT_VMCORE_PAD note type as NT_VMCORE_DEBUGINFO.

Test
====

This patch set is based on v3.9-rc1.

Tested on x86-64 and x86-32, each with 1GB and with over 4GB of memory.

---

HATAYAMA Daisuke (20):
  vmcore: introduce mmap_vmcore()
  vmcore: count holes generated by round-up operation for vmcore size
  vmcore: round-up offset of vmcore object in page-size boundary
  vmcore: check if vmcore objects satify mmap()'s page-size boundary requirement
  vmcore: check NT_VMCORE_PAD as a mark indicating the end of ELF note buffer
  kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary
  elf: introduce NT_VMCORE_PAD type
  kexec, elf: introduce NT_VMCORE_DEBUGINFO note type
  kexec: allocate vmcoreinfo note buffer on page-size boundary
  vmcore: allocate per-cpu crash_notes objects on page-size boundary
  vmcore: read buffers for vmcore objects copied from old memory
  vmcore: clean up read_vmcore()
  vmcore: modify vmcore clean-up function to free buffer on 2nd kernel
  vmcore: copy non page-size aligned head and tail pages in 2nd kernel
  vmcore, procfs: introduce a flag to distinguish objects copied in 2nd kernel
  vmcore: round up buffer size of ELF headers by PAGE_SIZE
  vmcore: allocate buffer for ELF headers on page-size alignment
  vmcore, sysfs: export ELF note segment size instead of vmcoreinfo data size
  vmcore: rearrange program headers without assuming consequtive PT_NOTE entries
  vmcore: refer to e_phoff member explicitly


 arch/s390/include/asm/kexec.h |7 
 fs/proc/vmcore.c  |  577 -
 include/linux/kexec.h |   16 +
 include/linux/proc_fs.h   |8 -
 include/uapi/linux/elf.h  |5 
 kernel/kexec.c|   47 ++-
 kernel/ksysfs.c   |2 
 7 files changed, 505 insertions(+), 157 deletions(-)

-- 
Thanks.
HATAYAMA, Daisuke


Re: [PATCH RESEND 1/3] regulator: core: Add enable_is_inverted flag to indicate set enable_mask bits to disable

2013-03-04 Thread Mark Brown
On Tue, Mar 05, 2013 at 02:16:00PM +0800, Axel Lin wrote:
> Add enable_is_inverted flag to indicate set enable_mask bits to disable
> when using regulator_enable_regmap and friends APIs.
> 
> Signed-off-by: Axel Lin 
> Reviewed-by: Haojian Zhuang 
> ---
> This patch was sent on https://lkml.org/lkml/2013/2/16/14.
> This resend is against linux-next tree (20130305).

This doesn't apply against v3.9-rc1...  what are the dependencies?




Re: [RFC PATCH v4 5/6] uretprobes: invoke return probe handlers

2013-03-04 Thread Ananth N Mavinakayanahalli
On Mon, Mar 04, 2013 at 03:38:12PM +0100, Anton Arapov wrote:
> 
> diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> index c353555..fa9d9de 100644
> --- a/arch/x86/include/asm/uprobes.h
> +++ b/arch/x86/include/asm/uprobes.h
> @@ -56,4 +56,9 @@ extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
>  extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data);
>  extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
>  extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long rp_trampoline_vaddr, struct pt_regs *regs);
> +
> +static inline unsigned long arch_uretprobe_get_sp(struct pt_regs *regs)
> +{
> + return (unsigned long)regs->sp;
> +}

You could use GET_USP() here.



Re: [PATCH 1/1] pwm_bl: Add support for backlight enable GPIO

2013-03-04 Thread Thierry Reding
On Tue, Mar 05, 2013 at 01:59:06PM +0900, Alex Courbot wrote:
> On 03/05/2013 01:48 PM, Andrew Chew wrote:
> >I sent out a new patch that enables/disables the backlight enable gpio.
> >
> >>On 03/05/2013 01:00 PM, Andrew Chew wrote:
> >>>I did come to the same conclusion regarding the platform data breakage.
> >>>I'm expecting that the use of platform data will go away, at least on
> >>>ARM, since we are all aggressively moving what used to be in platform
> >>>data into the device tree.  Do other platforms use this driver?
> >>
> >>I can see at least 29 users of platform_pwm_backlight_data, all ARM with the
> >>exception of one unicore32. I guess at least for the foreseeable future
> >>platform data will remain.
> >
> >I'm not sure how to solve this, then.  Any suggestions?
> 
> In one of my (many) attempts to add power sequencing to
> pwm-backlight, I just added a boolean to the platform data that must
> be explicitly set in order to enable control by GPIO. I.e.
> 
>   bool         use_enable_gpio
>   int          enable_gpio;
>   unsigned int enable_gpio_flags;
> 
> enable_gpio and enable_gpio_flags would then only be considered if
> use_enable_gpio is true. Granted, it's not the best solution here
> but that's the only way to handle this correctly with integer GPIOS,
> and it does not pollute the DT anyway (use_enable_gpio will only be
> set by pwm_backlight_parse_dt() if of_get_named_gpio() returned a
> valid GPIO. Btw, you also want to check if the enable-gpio property
> exists first because otherwise probe() will fails if no GPIO is
> specified).

There are two more options that I can see. The first involves updating
all users to initialize the GPIO to -1 in the platform data, which will
remove the requirement for an extra flag field. Another option would be
to make this feature DT only, so that the GPIO can always be assumed to
be -1 for non-DT and DT without an enable-gpio property.

There's a third alternative, namely using a regulator for this, which
has better lookup support and therefore should be easier to make
optional. And as Alex mentioned it already has support for always-on
functionality and such.
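
To make the first option concrete, here is a minimal sketch of the -1
sentinel approach. The struct is a hypothetical, trimmed-down stand-in
and not the real platform_pwm_backlight_data; the helper names are
made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_GPIO_NONE (-1)  /* sentinel meaning "no enable GPIO" */

/* Hypothetical platform data; field names follow the quoted proposal. */
struct sketch_backlight_pdata {
	int enable_gpio;               /* SKETCH_GPIO_NONE when unused */
	unsigned int enable_gpio_flags;
};

/*
 * Every board file would have to start from this instead of a zeroed
 * struct, because 0 is itself a valid GPIO number.
 */
static struct sketch_backlight_pdata sketch_pdata_default(void)
{
	struct sketch_backlight_pdata p = { SKETCH_GPIO_NONE, 0 };
	return p;
}

/* True only when the board explicitly supplied a GPIO number. */
static bool sketch_has_enable_gpio(const struct sketch_backlight_pdata *p)
{
	return p->enable_gpio >= 0;
}
```

The cost of this option is exactly the churn mentioned above: any user
that leaves the field zero-initialized would silently claim GPIO 0.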

Thierry




Re: For review: pid_namespaces(7) man page

2013-03-04 Thread Michael Kerrisk (man-pages)
On Mon, Mar 4, 2013 at 8:27 PM, Rob Landley  wrote:
> On 03/03/2013 10:03:55 PM, Eric W. Biederman wrote:
>>
>> Rob Landley  writes:
>>
>> > On 03/01/2013 03:57:40 AM, Michael Kerrisk (man-pages) wrote:
>> >> > And yet the glibc guys insist on #define
>> >> GNU_GNU_GNU_ALL_HAIL_STALLMAN in
>> >> > order to access this Linux-specific feature which has nothing
>> >> whatsoever to
>> >> > do with the FSF.
>> >>
>> >> This is a misunderstanding. _GNU_SOURCE is the standard way to expose
>> >> Linux-specific functionality from POSIX header files.
>> >
>> > What standard? The Linux kernel is not, and never was, part of the GNU
>> > project.
>>
>> Is the argument that there should be a _LINUX_SOURCE directive in glibc
>> for this?
>
>
> If you don't #define any feature test macros at all, you get a bunch of
> macros (_BSD_SOURCE, _SVID_SOURCE, _POSIX_SOURCE, _POSIX_C_SOURCE=200809L,
> and so on) defined by default in features.h. If you start defining macros,
> several of the default ones _go_away_, and you start missing things that are
> defined by posix-2008. Yes, defining feature test macros makes definitions
> _vanish_ out of the headers, which means feature test macros can actually
> reduce code portability.

This has nothing to do with reducing portability; have a (careful)
read of feature_test_macros(7).

[...]

> The new musl-libc.org did an _ALL_SOURCE macro that just enables every
> feature test macro they implemented. (That's its definition, it's the
> feature test macro that says feature test macros are a bad idea.)
>
>
>> Although come to think of it I can't imagine how  is a POSIX
>> header.  Last I looked it only had linux specific bits in it.  Which
>> makes needing any kind of #define strange.
>
>
> My objection is that Linux system calls are not part of the GNU project.
> Requiring that macro to get Linux system calls out of bionic, uClibc, klibc,
> musl, olibc, dietlibc, or newlib is _silly_. It's the "GNU/Linux" prefix
> imposed on the source level, and it's a fairly recent development (I've only
> noticed it since 2008 or so).

The macro has been present since at least glibc 2.0 (1997).

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


Re: [RFC PATCH v2 4/4] timekeeping: utilize the suspend-nonstop clocksource to count suspended time

2013-03-04 Thread Feng Tang
On Tue, Mar 05, 2013 at 02:47:27PM +0800, John Stultz wrote:
> On 03/05/2013 02:38 PM, Feng Tang wrote:
> >On Tue, Mar 05, 2013 at 02:27:34PM +0800, John Stultz wrote:
> >
> >>
> >>So this might be ok for an initial implementation, as on the
> >>non-stop-tsc hardware, the TSC is the best clocksource available.
> >>One concern long term is that there may be cases where the non-stop
> >>clocksource is not the most performant clocksource on a system. In
> >>that case, we may want to use a non-stop clocksource that is not the
> >>current timekeeping clocksource. So that may require some extra
> >>clocksource core interfaces to access the non-stop clocksource
> >>instead of using the timekeeper's clocksource, also we'll have to be
> >>sure to use something other then cycle_last in that case, since
> >>we'll need to read the nonstop clocksource at suspend, rather then
> >>trusting that forward_now updates cycle_last as is done here.
> >Yeah, I just realized this when Jason mentioned the counter_32k on
> >OMAP.
> >
> >So for next step, we may add something in timekeeping.c like
> > static struct clocksource *suspend_time_cs;
> >read and save its counter righer before entering and after getting
> >out of suspended state. And create a new struct which includes
> >all time suspend related flags, counters, pointers, make it as a
> >member of struct timekeeper. Comments?
> I'd maybe add it to the clocksource code rather then the timekeeper.
> Have a clocksource_nonstop_clock() accessor which returns a pointer
> to the highest rated clocksource in the clocksource list that has
> the nonstop flag (updating the pointer at registration time, rather
> then scanning the list every time).

Yes, it's more natural to put it in clocksource code if we need to
pick and compare a best non-stop-tsc clocksource.
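
A rough userspace sketch of that idea, caching the best non-stop
clocksource when it is registered instead of scanning the list on each
suspend, might look like this. All names, the flag bit and the ratings
are assumptions for illustration, not existing kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FLAG_NONSTOP 0x1u  /* assumed flag bit */

struct sketch_clocksource {
	const char *name;
	int rating;
	unsigned int flags;
};

/* Cached best non-stop clocksource, updated only at registration time. */
static const struct sketch_clocksource *sketch_nonstop_cs;

/* Remember cs if it is non-stop and rated above the current choice. */
static void sketch_register(const struct sketch_clocksource *cs)
{
	if (!(cs->flags & SKETCH_FLAG_NONSTOP))
		return;
	if (!sketch_nonstop_cs || cs->rating > sketch_nonstop_cs->rating)
		sketch_nonstop_cs = cs;
}

/* Accessor in the spirit of the clocksource_nonstop_clock() suggestion. */
static const struct sketch_clocksource *sketch_nonstop_clock(void)
{
	return sketch_nonstop_cs;
}
```

Unregistration is left out of the sketch; a real implementation would
need to re-select when the cached clocksource goes away.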

Thanks,
Feng




[PATCH RFT] pinctrl: abx500: Fix checking if pin use AlternateFunction register

2013-03-04 Thread Axel Lin
It's pointless to check "af.alt_bit1 == UNUSED" twice.
This looks like a copy-paste bug, I think what we want is to check if *both*
af.alt_bit1 and af.alt_bit2 are UNUSED.

Signed-off-by: Axel Lin 
---
Hi,
I don't have the datasheet and hw.
I'd appreciate if someone can review and test this patch.

Thanks,
Axel
 drivers/pinctrl/pinctrl-abx500.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/pinctrl-abx500.c b/drivers/pinctrl/pinctrl-abx500.c
index e8abc3c..0cf3fa4 100644
--- a/drivers/pinctrl/pinctrl-abx500.c
+++ b/drivers/pinctrl/pinctrl-abx500.c
@@ -422,7 +422,7 @@ static u8 abx500_get_mode(struct pinctrl_dev *pctldev, struct gpio_chip *chip,
}
 
/* check if pin use AlternateFunction register */
-   if ((af.alt_bit1 == UNUSED) && (af.alt_bit1 == UNUSED))
+   if ((af.alt_bit1 == UNUSED) && (af.alt_bit2 == UNUSED))
return mode;
/*
 * if pin GPIOSEL bit is set and pin supports alternate function,
-- 
1.7.9.5





[RFC/PATCH 5/5] media: vb2: use FOLL_DURABLE and __get_user_pages() to avoid CMA migration issues

2013-03-04 Thread Marek Szyprowski
V4L2 devices usually hold additional references to user pages for a very
long period of time, which causes permanent migration failures if the given
page has been allocated from a CMA pageblock. By setting the FOLL_DURABLE
flag, videobuf2 instructs __get_user_pages() to migrate user pages out of
CMA pageblocks before pinning them with an additional reference.

Signed-off-by: Marek Szyprowski 
Signed-off-by: Kyungmin Park 
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 10beaee..70649ab 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -443,9 +443,13 @@ static int vb2_dc_get_user_pages(unsigned long start, struct page **pages,
}
} else {
int n;
+   int flags = FOLL_TOUCH | FOLL_GET | FOLL_FORCE | FOLL_DURABLE;
 
-   n = get_user_pages(current, current->mm, start & PAGE_MASK,
-   n_pages, write, 1, pages, NULL);
+   if (write)
+   flags |= FOLL_WRITE;
+
+   n = __get_user_pages(current, current->mm, start & PAGE_MASK,
+   n_pages, flags, pages, NULL, NULL);
/* negative error means that no page was pinned */
n = max(n, 0);
if (n != n_pages) {
-- 
1.7.9.5



[RFC/PATCH 1/5] mm: introduce migrate_replace_page() for migrating page to the given target

2013-03-04 Thread Marek Szyprowski
Introduce the migrate_replace_page() function, which migrates a single
page to the given target page.

Signed-off-by: Marek Szyprowski 
Signed-off-by: Kyungmin Park 
---
 include/linux/migrate.h |5 
 mm/migrate.c|   59 +++
 2 files changed, 64 insertions(+)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index a405d3dc..3a8a6c1 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -35,6 +35,8 @@ enum migrate_reason {
 
 #ifdef CONFIG_MIGRATION
 
+extern int migrate_replace_page(struct page *oldpage, struct page *newpage);
+
 extern void putback_lru_pages(struct list_head *l);
 extern void putback_movable_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
@@ -57,6 +59,9 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
  struct page *newpage, struct page *page);
 #else
 
+static inline int migrate_replace_page(struct page *oldpage,
+   struct page *newpage) { return -ENOSYS; }
+
 static inline void putback_lru_pages(struct list_head *l) {}
 static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
diff --git a/mm/migrate.c b/mm/migrate.c
index 3bbaf5d..a2a6950 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1067,6 +1067,65 @@ out:
return rc;
 }
 
+/*
+ * migrate_replace_page
+ *
+ * The function takes one single page and a target page (newpage) and
+ * tries to migrate data to the target page. The caller must ensure that
+ * the source page is locked with one additional get_page() call, which
+ * will be freed during the migration. The caller also must release newpage
+ * if migration fails, otherwise the ownership of the newpage is taken.
+ * Source page is released if migration succeeds.
+ *
+ * Return: error code or 0 on success.
+ */
+int migrate_replace_page(struct page *page, struct page *newpage)
+{
+   struct zone *zone = page_zone(page);
+   unsigned long flags;
+   int ret = -EAGAIN;
+   int pass;
+
+   migrate_prep();
+
+   spin_lock_irqsave(&zone->lru_lock, flags);
+
+   if (PageLRU(page) &&
+   __isolate_lru_page(page, ISOLATE_UNEVICTABLE) == 0) {
+   struct lruvec *lruvec = mem_cgroup_page_lruvec(page, zone);
+   del_page_from_lru_list(page, lruvec, page_lru(page));
+   spin_unlock_irqrestore(&zone->lru_lock, flags);
+   } else {
+   spin_unlock_irqrestore(&zone->lru_lock, flags);
+   return -EAGAIN;
+   }
+
+   /* page is now isolated, so release additional reference */
+   put_page(page);
+
+   for (pass = 0; pass < 10 && ret != 0; pass++) {
+   cond_resched();
+
+   if (page_count(page) == 1) {
+   /* page was freed from under us, so we are done */
+   ret = 0;
+   break;
+   }
+   ret = __unmap_and_move(page, newpage, 1, MIGRATE_SYNC);
+   }
+
+   if (ret == 0) {
+   /* take ownership of newpage and add it to lru */
+   putback_lru_page(newpage);
+   } else {
+   /* restore additional reference to the oldpage */
+   get_page(page);
+   }
+
+   putback_lru_page(page);
+   return ret;
+}
+
 #ifdef CONFIG_NUMA
 /*
  * Move a list of individual pages
-- 
1.7.9.5



[RFC/PATCH 3/5] mm: get_user_pages: use NON-MOVABLE pages when FOLL_DURABLE flag is set

2013-03-04 Thread Marek Szyprowski
Ensure that newly allocated pages that are faulted in in FOLL_DURABLE
mode come from non-movable pageblocks, to work around migration failures
with the Contiguous Memory Allocator.

Signed-off-by: Marek Szyprowski 
Signed-off-by: Kyungmin Park 
---
 include/linux/highmem.h |   12 ++--
 include/linux/mm.h  |2 ++
 mm/memory.c |   24 ++--
 3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 7fb31da..cf0b9d8 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -168,7 +168,8 @@ __alloc_zeroed_user_highpage(gfp_t movableflags,
 #endif
 
 /**
- * alloc_zeroed_user_highpage_movable - Allocate a zeroed HIGHMEM page for a VMA that the caller knows can move
+ * alloc_zeroed_user_highpage_movable - Allocate a zeroed HIGHMEM page for
+ * a VMA that the caller knows can move
  * @vma: The VMA the page is to be allocated for
  * @vaddr: The virtual address the page will be inserted into
  *
@@ -177,11 +178,18 @@ __alloc_zeroed_user_highpage(gfp_t movableflags,
  */
 static inline struct page *
 alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
-   unsigned long vaddr)
+  unsigned long vaddr)
 {
return __alloc_zeroed_user_highpage(__GFP_MOVABLE, vma, vaddr);
 }
 
+static inline struct page *
+alloc_zeroed_user_highpage(gfp_t gfp, struct vm_area_struct *vma,
+  unsigned long vaddr)
+{
+   return __alloc_zeroed_user_highpage(gfp, vma, vaddr);
+}
+
 static inline void clear_highpage(struct page *page)
 {
void *kaddr = kmap_atomic(page);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9806e54..c11f58f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -165,6 +165,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_RETRY_NOWAIT    0x10    /* Don't drop mmap_sem and wait when retrying */
 #define FAULT_FLAG_KILLABLE        0x20    /* The fault task is in SIGKILL killable region */
 #define FAULT_FLAG_TRIED   0x40/* second try */
+#define FAULT_FLAG_NO_CMA  0x80/* don't use CMA pages */
 
 /*
  * vm_fault is filled by the the pagefault handler and passed to the vma's
@@ -1633,6 +1634,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
 #define FOLL_HWPOISON  0x100   /* check page is hwpoisoned */
 #define FOLL_NUMA  0x200   /* force NUMA hinting page fault */
 #define FOLL_MIGRATION 0x400   /* wait for page to replace migration entry */
+#define FOLL_DURABLE   0x800   /* get the page reference for a long time */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
void *data);
diff --git a/mm/memory.c b/mm/memory.c
index 42dfd8e..2b9c2dd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1816,6 +1816,9 @@ long __get_user_pages(struct task_struct *tsk, struct 
mm_struct *mm,
int ret;
unsigned int fault_flags = 0;
 
+   if (gup_flags & FOLL_DURABLE)
+   fault_flags = FAULT_FLAG_NO_CMA;
+
/* For mlock, just skip the stack guard page. */
if (foll_flags & FOLL_MLOCK) {
if (stack_guard_page(vma, start))
@@ -2495,7 +2498,7 @@ static inline void cow_user_page(struct page *dst, struct 
page *src, unsigned lo
  */
 static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, pte_t *page_table, pmd_t *pmd,
-   spinlock_t *ptl, pte_t orig_pte)
+   spinlock_t *ptl, pte_t orig_pte, unsigned int flags)
__releases(ptl)
 {
struct page *old_page, *new_page = NULL;
@@ -2505,6 +2508,10 @@ static int do_wp_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
struct page *dirty_page = NULL;
unsigned long mmun_start = 0;   /* For mmu_notifiers */
unsigned long mmun_end = 0; /* For mmu_notifiers */
+   gfp_t gfp = GFP_HIGHUSER_MOVABLE;
+
+   if (IS_ENABLED(CONFIG_CMA) && (flags & FAULT_FLAG_NO_CMA))
+   gfp &= ~__GFP_MOVABLE;
 
old_page = vm_normal_page(vma, address, orig_pte);
if (!old_page) {
@@ -2668,11 +2675,11 @@ gotten:
goto oom;
 
if (is_zero_pfn(pte_pfn(orig_pte))) {
-   new_page = alloc_zeroed_user_highpage_movable(vma, address);
+   new_page = alloc_zeroed_user_highpage(gfp, vma, address);
if (!new_page)
goto oom;
} else {
-   new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+   new_page = alloc_page_vma(gfp, vma, address);
if (!new_page)
goto oom;
cow_user_page(new_page, 

[RFC/PATCH 4/5] mm: get_user_pages: migrate out CMA pages when FOLL_DURABLE flag is set

2013-03-04 Thread Marek Szyprowski
When __get_user_pages() is called with the FOLL_DURABLE flag, ensure that no
page in CMA pageblocks gets locked. This works around the permanent
migration failures caused by get_user_pages() holding a reference to such
pages for a long period of time.
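
The decision described above can be sketched as a small userspace model.
This is only an illustration, not the kernel code: struct page, the
migrate type values and pick_page_to_pin() are hypothetical stand-ins,
and the 'spare' argument replaces the real alloc_page() plus
migrate_replace_page() sequence. Only the FOLL_DURABLE value is taken
from patch 3/5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative migrate types; names follow the patch, values are made up. */
enum migratetype { MT_MOVABLE, MT_UNMOVABLE, MT_CMA, MT_ISOLATE };

#define FOLL_DURABLE 0x800u /* value taken from patch 3/5 */

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
    enum migratetype mt; /* migrate type of the pageblock holding the page */
};

/* Mirrors is_cma_page(): CMA and isolated pageblocks both count. */
static bool is_cma_page(const struct page *page)
{
    return page->mt == MT_CMA || page->mt == MT_ISOLATE;
}

/*
 * Model of the check added to __get_user_pages(). 'spare' is a
 * pre-allocated page outside CMA, or NULL to simulate a failed
 * allocation or a failed migration.
 */
static struct page *pick_page_to_pin(struct page *page, unsigned int gup_flags,
                                     struct page *spare)
{
    if (!(gup_flags & FOLL_DURABLE) || !is_cma_page(page))
        return page;        /* short-lived pin or non-CMA page: keep it */
    if (spare == NULL)
        return page;        /* replacement failed: fall back, do not fail */
    assert(!is_cma_page(spare));
    return spare;           /* pin the replacement outside CMA instead */
}
```

Note that a failed replacement is deliberately not an error in this
model, which matches the comment in the patch saying that migration
errors should not make get_user_pages() fail.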

Signed-off-by: Marek Szyprowski 
Signed-off-by: Kyungmin Park 
---
 mm/internal.h |   12 
 mm/memory.c   |   43 +++
 2 files changed, 55 insertions(+)

diff --git a/mm/internal.h b/mm/internal.h
index 8562de0..a290d04 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -105,6 +105,18 @@ extern void prep_compound_page(struct page *page, unsigned 
long order);
 extern bool is_free_buddy_page(struct page *page);
 #endif
 
+#ifdef CONFIG_CMA
+static inline int is_cma_page(struct page *page)
+{
+   unsigned mt = get_pageblock_migratetype(page);
+   if (mt == MIGRATE_ISOLATE || mt == MIGRATE_CMA)
+   return true;
+   return false;
+}
+#else
+#define is_cma_page(page) 0
+#endif
+
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 
 /*
diff --git a/mm/memory.c b/mm/memory.c
index 2b9c2dd..f81b273 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1650,6 +1650,45 @@ static inline int stack_guard_page(struct vm_area_struct 
*vma, unsigned long add
 }
 
 /**
+ * migrate_replace_cma_page() - migrate page out of CMA page blocks
+ * @page:  source page to be migrated
+ *
+ * Returns either the old page (if migration was not possible) or the pointer
+ * to the newly allocated page (with additional reference taken).
+ *
+ * get_user_pages() might take a reference to a page for a long period of time,
+ * which prevents such a page from being migrated. This is fatal to the
+ * preferred usage pattern of CMA pageblocks. This function replaces the given
+ * user page with a new one allocated from a NON-MOVABLE pageblock, so locking
+ * a CMA page can be avoided.
+ */
+static inline struct page *migrate_replace_cma_page(struct page *page)
+{
+   struct page *newpage = alloc_page(GFP_HIGHUSER);
+
+   if (!newpage)
+   goto out;
+
+   /*
+* Take additional reference to the new page to ensure it won't get
+* freed after migration procedure end.
+*/
+   get_page_foll(newpage);
+
+   if (migrate_replace_page(page, newpage) == 0)
+   return newpage;
+
+   put_page(newpage);
+   __free_page(newpage);
+out:
+   /*
+* Migration errors in case of get_user_pages() might not
+* be fatal to CMA itself, so better don't fail here.
+*/
+   return page;
+}
+
+/**
  * __get_user_pages() - pin user pages in memory
  * @tsk:   task_struct of target task
  * @mm:mm_struct of target mm
@@ -1884,6 +1923,10 @@ long __get_user_pages(struct task_struct *tsk, struct 
mm_struct *mm,
}
if (IS_ERR(page))
return i ? i : PTR_ERR(page);
+
+   if ((gup_flags & FOLL_DURABLE) && is_cma_page(page))
+   page = migrate_replace_cma_page(page);
+
if (pages) {
pages[i] = page;
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC/PATCH 0/5] Contiguous Memory Allocator and get_user_pages()

2013-03-04 Thread Marek Szyprowski
Hello,

Contiguous Memory Allocator is very sensitive to migration failures
of individual pages. A single page that causes a permanent migration
failure can break large contiguous allocations and cause the failure of
a multimedia device driver.

One of the known issues with migration of CMA pages is the problem of
migrating anonymous user pages for which someone else has called
get_user_pages(). This takes a reference to the given user pages to let
the kernel operate directly on the page content. It is usually used to
prevent the page contents from being swapped out and to do direct DMA
to/from userspace.

Solving this issue requires preventing pages that are placed in CMA
regions from being locked for a long time. Our idea is to migrate
anonymous page content before locking the page in get_user_pages(). This
cannot be done automatically, as the get_user_pages() interface is used
very often for various operations, which usually last for a short period
of time (for example, the exec syscall). We have added a new flag
indicating that the given get_user_pages() call will grab pages for a
long time, so that the migration workaround is applied only in such
cases.
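
The flag plumbing behind this idea can be modelled in a few lines of
plain C. This is a sketch under assumptions: FOLL_DURABLE and
FAULT_FLAG_NO_CMA use the values from patch 3/5, but the GFP bit values
and the helper names fault_flags_for() and cow_gfp_for() are made up for
illustration. In the actual patch the gfp adjustment is also guarded by
IS_ENABLED(CONFIG_CMA).

```c
#include <assert.h>

#define FOLL_DURABLE       0x800u  /* from patch 3/5 */
#define FAULT_FLAG_NO_CMA  0x80u   /* from patch 3/5 */

/* Illustrative bit values only; the real masks live in include/linux/gfp.h. */
#define GFP_MOVABLE_BIT           0x08u
#define GFP_HIGHUSER_BITS         0x100u
#define GFP_HIGHUSER_MOVABLE_BITS (GFP_HIGHUSER_BITS | GFP_MOVABLE_BIT)

/* Step 1: a durable pin asks the fault path to stay away from CMA. */
static unsigned int fault_flags_for(unsigned int gup_flags)
{
    return (gup_flags & FOLL_DURABLE) ? FAULT_FLAG_NO_CMA : 0;
}

/* Step 2: the copy-on-write path drops the movable bit when asked to. */
static unsigned int cow_gfp_for(unsigned int fault_flags)
{
    unsigned int gfp = GFP_HIGHUSER_MOVABLE_BITS;

    if (fault_flags & FAULT_FLAG_NO_CMA) {
        gfp &= ~GFP_MOVABLE_BIT;
        assert((gfp & GFP_MOVABLE_BIT) == 0);
    }
    return gfp;
}
```

Callers that only pin pages briefly pass no extra flag and keep getting
movable allocations, so the common short-lived users of
get_user_pages() are unaffected.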

The proposed extension is used by V4L2/VideoBuf2
(drivers/media/v4l2-core/videobuf2-dma-contig.c), but that is not the
only place which might benefit from it; any driver which uses DMA to
userspace with get_user_pages() could. This one is provided to
demonstrate the use case.

I would like to hear some comments on the presented approach. What do
you think about it? Is there a chance to get such a workaround merged
into mainline at some point?

Best regards
Marek Szyprowski
Samsung Poland R&D Center


Patch summary:

Marek Szyprowski (5):
  mm: introduce migrate_replace_page() for migrating page to the given
target
  mm: get_user_pages: use static inline
  mm: get_user_pages: use NON-MOVABLE pages when FOLL_DURABLE flag is
set
  mm: get_user_pages: migrate out CMA pages when FOLL_DURABLE flag is
set
  media: vb2: use FOLL_DURABLE and __get_user_pages() to avoid CMA
migration issues

 drivers/media/v4l2-core/videobuf2-dma-contig.c |8 +-
 include/linux/highmem.h|   12 ++-
 include/linux/migrate.h|5 +
 include/linux/mm.h |   76 -
 mm/internal.h  |   12 +++
 mm/memory.c|  136 +++-
 mm/migrate.c   |   59 ++
 7 files changed, 225 insertions(+), 83 deletions(-)

-- 
1.7.9.5



[RFC/PATCH 2/5] mm: get_user_pages: use static inline

2013-03-04 Thread Marek Szyprowski
__get_user_pages() is already an exported function, so get_user_pages()
can easily be inlined into its callers.
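
What the inlined wrapper does is only a translation of its boolean
arguments into FOLL_* flags. A minimal userspace sketch of that
translation follows; the helper name gup_to_foll_flags() is hypothetical
and the flag values are assumed to match include/linux/mm.h of that
kernel version, so treat them as illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match include/linux/mm.h at the time; illustrative only. */
#define FOLL_WRITE 0x01u
#define FOLL_TOUCH 0x02u
#define FOLL_GET   0x04u
#define FOLL_FORCE 0x10u

/*
 * Builds the flags that the inline get_user_pages() passes on to
 * __get_user_pages(): FOLL_TOUCH always, FOLL_GET only when the caller
 * wants the page pointers back, plus the write/force bits.
 */
static unsigned int gup_to_foll_flags(int write, int force, const void *pages)
{
    unsigned int flags = FOLL_TOUCH;

    if (pages != NULL)
        flags |= FOLL_GET;
    if (write)
        flags |= FOLL_WRITE;
    if (force)
        flags |= FOLL_FORCE;

    return flags;
}
```

Because the wrapper carries no state of its own, moving it into the
header does not change behaviour; it only lets callers that need extra
flags call __get_user_pages() with the same building blocks.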

Signed-off-by: Marek Szyprowski 
Signed-off-by: Kyungmin Park 
---
 include/linux/mm.h |   74 +---
 mm/memory.c|   69 
 2 files changed, 70 insertions(+), 73 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7acc9dc..9806e54 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1019,10 +1019,7 @@ long __get_user_pages(struct task_struct *tsk, struct 
mm_struct *mm,
  unsigned long start, unsigned long nr_pages,
  unsigned int foll_flags, struct page **pages,
  struct vm_area_struct **vmas, int *nonblocking);
-long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
-   unsigned long start, unsigned long nr_pages,
-   int write, int force, struct page **pages,
-   struct vm_area_struct **vmas);
+
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
 struct kvec;
@@ -1642,6 +1639,75 @@ typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, 
unsigned long addr,
 extern int apply_to_page_range(struct mm_struct *mm, unsigned long address,
   unsigned long size, pte_fn_t fn, void *data);
 
+/*
+ * get_user_pages() - pin user pages in memory
+ * @tsk:   the task_struct to use for page fault accounting, or
+ * NULL if faults are not to be recorded.
+ * @mm:mm_struct of target mm
+ * @start: starting user address
+ * @nr_pages:  number of pages from start to pin
+ * @write: whether pages will be written to by the caller
+ * @force: whether to force write access even if user mapping is
+ * readonly. This will result in the page being COWed even
+ * in MAP_SHARED mappings. You do not want this.
+ * @pages: array that receives pointers to the pages pinned.
+ * Should be at least nr_pages long. Or NULL, if caller
+ * only intends to ensure the pages are faulted in.
+ * @vmas:  array of pointers to vmas corresponding to each page.
+ * Or NULL if the caller does not require them.
+ *
+ * Returns number of pages pinned. This may be fewer than the number
+ * requested. If nr_pages is 0 or negative, returns 0. If no pages
+ * were pinned, returns -errno. Each page returned must be released
+ * with a put_page() call when it is finished with. vmas will only
+ * remain valid while mmap_sem is held.
+ *
+ * Must be called with mmap_sem held for read or write.
+ *
+ * get_user_pages walks a process's page tables and takes a reference to
+ * each struct page that each user address corresponds to at a given
+ * instant. That is, it takes the page that would be accessed if a user
+ * thread accesses the given user virtual address at that instant.
+ *
+ * This does not guarantee that the page exists in the user mappings when
+ * get_user_pages returns, and there may even be a completely different
+ * page there in some cases (eg. if mmapped pagecache has been invalidated
+ * and subsequently re faulted). However it does guarantee that the page
+ * won't be freed completely. And mostly callers simply care that the page
+ * contains data that was valid *at some point in time*. Typically, an IO
+ * or similar operation cannot guarantee anything stronger anyway because
+ * locks can't be held over the syscall boundary.
+ *
+ * If write=0, the page must not be written to. If the page is written to,
+ * set_page_dirty (or set_page_dirty_lock, as appropriate) must be called
+ * after the page is finished with, and before put_page is called.
+ *
+ * get_user_pages is typically used for fewer-copy IO operations, to get a
+ * handle on the memory by some means other than accesses via the user virtual
+ * addresses. The pages may be submitted for DMA to devices or accessed via
+ * their kernel linear mapping (via the kmap APIs). Care should be taken to
+ * use the correct cache flushing APIs.
+ *
+ * See also get_user_pages_fast, for performance critical applications.
+ */
+static inline long get_user_pages(struct task_struct *tsk, struct mm_struct 
*mm,
+   unsigned long start, unsigned long nr_pages, int write,
+   int force, struct page **pages,
+   struct vm_area_struct **vmas)
+{
+   int flags = FOLL_TOUCH;
+
+   if (pages)
+   flags |= FOLL_GET;
+   if (write)
+   flags |= FOLL_WRITE;
+   if (force)
+   flags |= FOLL_FORCE;
+
+   return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas,
+   NULL);
+}
+
 #ifdef CONFIG_PROC_FS
 void vm_stat_account(struct mm_struct *, unsigned long, struct file *, long);
 #else
diff --git 

Re: [PATCH LINUX v5] xen: event channel arrays are xen_ulong_t and not unsigned long

2013-03-04 Thread Rob Herring
On 03/04/2013 09:04 PM, Will Deacon wrote:
> Hi guys,
> 
> On Mon, Mar 04, 2013 at 02:45:33AM +, Rob Herring wrote:
>> On 02/20/2013 05:48 AM, Ian Campbell wrote:
>>> On ARM we want these to be the same size on 32- and 64-bit.
>>>
>>> This is an ABI change on ARM. X86 does not change.
>>>
>>> Signed-off-by: Ian Campbell 
>>> Cc: Jan Beulich 
>>> Cc: Keir (Xen.org) 
>>> Cc: Tim Deegan 
>>> Cc: Stefano Stabellini 
>>> Cc: linux-arm-ker...@lists.infradead.org
>>> Cc: xen-de...@lists.xen.org
>>> Cc: Konrad Rzeszutek Wilk 
> 
> [...]
> 
>> I'm seeing some build failures on randconfig builds with this change:
>>
>> /tmp/ccJaIZOW.s: Assembler messages:
>> /tmp/ccJaIZOW.s:831: Error: even register required -- `ldrexd r5,r6,[r4]'
>>
>> This is with ubuntu 12.04 cross compiler (gcc version 4.6.3
>> (Ubuntu/Linaro 4.6.3-1ubuntu5)).
>>
>> This register restriction is on ARM, but not Thumb builds. Comparing
>> this to atomic64_cmpxchg, I don't see how to fix this. Perhaps Will or
>> Nico have thoughts.
> 
> [...]
> 
>>> +   asm volatile("@ xchg_xen_ulong\n"
>>> +   "1: ldrexd  %0, %H0, [%3]\n"
>>> +   "   strexd  %1, %2, %H2, [%3]\n"
>>> +   "   teq %1, #0\n"
>>> +   "   bne 1b"
>>> +   : "=&r" (oldval), "=&r" (tmp)
>>> +   : "r" (val), "r" (ptr)
>>> +   : "memory", "cc");
> 
> I also can't immediately see why GCC would allocate oldval to an odd base
> register. Can you share your .config please?
> 

Here's a config:

CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_FHANDLE=y
CONFIG_IKCONFIG=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
# CONFIG_PROC_PID_CPUSET is not set
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_MEMCG=y
CONFIG_MEMCG_KMEM=y
CONFIG_MEMCG_DEBUG_ASYNC_DESTROY=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_UIDGID_STRICT_TYPE_CHECKS=y
CONFIG_SCHED_AUTOGROUP=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
# CONFIG_PRINTK is not set
# CONFIG_BUG is not set
# CONFIG_BASE_FULL is not set
# CONFIG_EPOLL is not set
# CONFIG_TIMERFD is not set
# CONFIG_SHMEM is not set
# CONFIG_AIO is not set
CONFIG_EMBEDDED=y
# CONFIG_PERF_EVENTS is not set
CONFIG_PROFILING=y
# CONFIG_BLOCK is not set
CONFIG_ARCH_BCM=y
CONFIG_GPIO_PCA953X=y
CONFIG_ARCH_MXC=y
CONFIG_MACH_IMX51_DT=y
CONFIG_MACH_EUKREA_CPUIMX51SD=y
CONFIG_SOC_IMX53=y
CONFIG_ARCH_SOCFPGA=y
CONFIG_ARCH_SUNXI=y
CONFIG_ARCH_VEXPRESS_CA9X4=y
CONFIG_CPU_ICACHE_DISABLE=y
CONFIG_CPU_DCACHE_DISABLE=y
CONFIG_ARM_ERRATA_430973=y
CONFIG_PL310_ERRATA_727915=y
CONFIG_HAVE_ARM_ARCH_TIMER=y
# CONFIG_COMPACTION is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
CONFIG_SECCOMP=y
CONFIG_CC_STACKPROTECTOR=y
CONFIG_XEN=y
# CONFIG_ATAGS is not set
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_FPE_FASTFPE=y
# CONFIG_BINFMT_ELF is not set
CONFIG_PM_AUTOSLEEP=y
CONFIG_PM_RUNTIME=y
CONFIG_PM_DEBUG=y
CONFIG_APM_EMULATION=y
CONFIG_NET=y
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
CONFIG_NETWORK_PHY_TIMESTAMPING=y
CONFIG_NETFILTER=y
CONFIG_NETFILTER_DEBUG=y
CONFIG_DECNET_NF_GRABULATOR=y
CONFIG_ATM=y
CONFIG_BRIDGE=y
CONFIG_BRIDGE_VLAN_FILTERING=y
CONFIG_VLAN_8021Q=y
CONFIG_DECNET=y
CONFIG_DECNET_ROUTER=y
CONFIG_LLC2=y
CONFIG_ATALK=y
CONFIG_DEV_APPLETALK=y
CONFIG_LAPB=y
CONFIG_NET_SCHED=y
CONFIG_NET_SCH_HTB=y
CONFIG_NET_SCH_HFSC=y
CONFIG_NET_SCH_ATM=y
CONFIG_NET_SCH_PRIO=y
CONFIG_NET_SCH_MULTIQ=y
CONFIG_NET_SCH_SFB=y
CONFIG_NET_SCH_TBF=y
CONFIG_NET_SCH_GRED=y
CONFIG_NET_SCH_NETEM=y
CONFIG_NET_SCH_CHOKE=y
CONFIG_NET_SCH_QFQ=y
CONFIG_NET_SCH_CODEL=y
CONFIG_NET_CLS_TCINDEX=y
CONFIG_NET_CLS_U32=y
CONFIG_NET_CLS_RSVP=y
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_SKBEDIT=y
CONFIG_NET_CLS_IND=y
CONFIG_DCB=y
CONFIG_VSOCKETS=y
CONFIG_BT=y
CONFIG_BT_BNEP=y
CONFIG_BT_BNEP_PROTO_FILTER=y
CONFIG_BT_HCIUART=y
CONFIG_BT_HCIUART_BCSP=y
CONFIG_BT_HCIUART_ATH3K=y
CONFIG_BT_HCIUART_LL=y
CONFIG_BT_HCIBPA10X=y
CONFIG_BT_HCIBFUSB=y
# CONFIG_WIRELESS is not set
CONFIG_WIMAX=y
CONFIG_RFKILL_REGULATOR=y
CONFIG_CAIF=y
CONFIG_CAIF_DEBUG=y
CONFIG_DEVTMPFS=y
# CONFIG_STANDALONE is not set
# CONFIG_FW_LOADER_USER_HELPER is not set
CONFIG_DEBUG_DRIVER=y
CONFIG_CONNECTOR=y
# CONFIG_PROC_EVENTS is not set
CONFIG_ATMEL_PWM=y
CONFIG_ENCLOSURE_SERVICES=y
CONFIG_ISL29003=y
CONFIG_SENSORS_BH1770=y
CONFIG_ARM_CHARLCD=y
CONFIG_BMP085_I2C=y
CONFIG_USB_SWITCH_FSA9480=y
CONFIG_EEPROM_AT24=y
CONFIG_TI_ST=y
CONFIG_ALTERA_STAPL=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
CONFIG_IFB=y
CONFIG_MACVLAN=y
CONFIG_NETCONSOLE=y
CONFIG_VIRTIO_NET=y
CONFIG_CAIF_HSI=y
CONFIG_NET_DSA_MV88E6060=y
# CONFIG_ETHERNET is not set
CONFIG_AMD_PHY=y
CONFIG_MARVELL_PHY=y
CONFIG_DAVICOM_PHY=y
CONFIG_QSEMI_PHY=y
CONFIG_VITESSE_PHY=y
CONFIG_BCM87XX_PHY=y
CONFIG_ICPLUS_PHY=y
CONFIG_REALTEK_PHY=y
CONFIG_NATIONAL_PHY=y
CONFIG_LSI_ET1011C_PHY=y
CONFIG_FIXED_PHY=y
CONFIG_PPP=y
CONFIG_PPP_BSDCOMP=y
CONFIG_PPP_DEFLATE=y

Re: [RFC PATCH v2 4/4] timekeeping: utilize the suspend-nonstop clocksource to count suspended time

2013-03-04 Thread John Stultz

On 03/05/2013 02:38 PM, Feng Tang wrote:
> On Tue, Mar 05, 2013 at 02:27:34PM +0800, John Stultz wrote:
>> So this might be ok for an initial implementation, as on the
>> non-stop-tsc hardware, the TSC is the best clocksource available.
>> One concern long term is that there may be cases where the non-stop
>> clocksource is not the most performant clocksource on a system. In
>> that case, we may want to use a non-stop clocksource that is not the
>> current timekeeping clocksource. So that may require some extra
>> clocksource core interfaces to access the non-stop clocksource
>> instead of using the timekeeper's clocksource, also we'll have to be
>> sure to use something other than cycle_last in that case, since
>> we'll need to read the nonstop clocksource at suspend, rather than
>> trusting that forward_now updates cycle_last as is done here.
>
> Yeah, I just realized this when Jason mentioned the counter_32k on
> OMAP.
>
> So for next step, we may add something in timekeeping.c like
> static struct clocksource *suspend_time_cs;
> read and save its counter right before entering and after getting
> out of suspended state. And create a new struct which includes
> all time suspend related flags, counters, pointers, make it as a
> member of struct timekeeper. Comments?

I'd maybe add it to the clocksource code rather than the timekeeper.
Have a clocksource_nonstop_clock() accessor which returns a pointer to
the highest rated clocksource in the clocksource list that has the
nonstop flag (updating the pointer at registration time, rather than
scanning the list every time).

That way if we want, we can later define clocksource_nonstop_clock() as
null, and let the compiler optimize out the extra complexity in the
resume path if the arch doesn't support nonstop clocksource.
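
A rough userspace model of the accessor suggested above, to make the
"update at registration time" idea concrete. Everything except the name
clocksource_nonstop_clock() is an assumption for illustration: the
struct layout, the flag name and value, and model_register_clocksource()
are hypothetical and not the clocksource core API.

```c
#include <assert.h>
#include <stddef.h>

#define CLOCK_SOURCE_SUSPEND_NONSTOP 0x1u /* hypothetical flag name and value */

/* Hypothetical, heavily reduced stand-in for struct clocksource. */
struct clocksource {
    const char *name;
    int rating;             /* higher rating means a better clocksource */
    unsigned int flags;
};

/* Cached pointer to the best nonstop clocksource seen so far. */
static struct clocksource *nonstop_cs;

/* Called once per clocksource; keeps the cache up to date. */
static void model_register_clocksource(struct clocksource *cs)
{
    assert(cs != NULL);

    if (!(cs->flags & CLOCK_SOURCE_SUSPEND_NONSTOP))
        return;
    if (nonstop_cs == NULL || cs->rating > nonstop_cs->rating)
        nonstop_cs = cs;
}

/* O(1) lookup for the suspend/resume path; NULL if none is registered. */
static struct clocksource *clocksource_nonstop_clock(void)
{
    return nonstop_cs;
}
```

With this shape the resume path never walks the list, and an
architecture without such hardware simply sees a NULL return. The model
ignores unregistration and locking, which a real implementation would
have to handle.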


thanks
-john


Re: For review: pid_namespaces(7) man page

2013-03-04 Thread Eric W. Biederman
"Michael Kerrisk (man-pages)"  writes:

> Eric,
>
> On Mon, Mar 4, 2013 at 6:52 PM, Eric W. Biederman
>  wrote:
>> "Michael Kerrisk (man-pages)"  writes:
>>
>>> On Fri, Mar 1, 2013 at 4:35 PM, Eric W. Biederman wrote:
>>>> "Michael Kerrisk (man-pages)"  writes:
>>>>
>>>>> Hi Rob,
>>>>>
>>>>> On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley wrote:
>>>>>> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
>>>>> [...]
>>>>>>> Because the above unshare(2) and setns(2) calls only change the
>>>>>>> PID namespace for created children, the clone(2) calls neces‐
>>>>>>> sarily put the new thread in a different PID namespace from the
>>>>>>> calling thread.
>>>>>>
>>>>>>
>>>>>> Um, no they don't. They fail. That's the point.
>>>>>
>>>>> (Good catch.)
>>>>>
>>>>>> They _would_ put the new
>>>>>> thread in a different PID namespace, which breaks the definition
>>>>>> of threads.
>>>>>>
>>>>>> How about:
>>>>>>
>>>>>> The above unshare(2) and setns(2) calls change the PID namespace of
>>>>>> children created by subsequent clone(2) calls, which is incompatible
>>>>>> with CLONE_VM.
>>>>>
>>>>> I decided on:
>>>>>
>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>> namespace for created children but not for the calling process,
>>>>> while clone(2) CLONE_VM specifies the creation of a new thread
>>>>> in the same process.
>>>>
>>>> Can we make that "for all new tasks created" instead of "created
>>>> children"
>>>>
>>>> Otherwise someone might expect CLONE_THREAD would work, as
>>>> CLONE_THREAD creates a thread and not a child...
>>>
>>> The term "task" is kernel-space talk that rarely appears in man pages,
>>> so I am reluctant to use it.
>>
>> With respect to clone and in this case I am not certain we can properly
>> describe what happens without talking about tasks. But it is worth
>> a try.
>>
>>
>>> How about this:
>>>
>>> The point here is that unshare(2) and setns(2) change the PID
>>> namespace for processes subsequently created by the caller, but
>>> not for the calling process, while clone(2) CLONE_VM specifies
>>> the creation of a new thread in the same process.
>>
>> Hmm. How about this.
>>
>> The point here is that unshare(2) and setns(2) change the PID
>> namespace that will be used by in all subsequent calls to clone
>> and fork by the caller, but not for the calling process, and
>> that all threads in a process must share the same PID
>> namespace. Which makes a subsequent clone(2) CLONE_VM
>> specify the creation of a new thread in the a different PID
>> namespace but in the same process which is impossible.
>
> I did a little tidying:
>
> The point here is that unshare(2) and setns(2) change the
> PID namespace that will be used in all subsequent calls
> to clone(2) and fork(2), but do not change the PID names‐
> pace of the calling process. Because a subsequent
> clone(2) CLONE_VM would imply the creation of a new
> thread in a different PID namespace, the operation is not
> permitted.
>
> Okay?

That seems reasonable.

CLONE_THREAD might be better to talk about.  The check is CLONE_VM
because it is easier and CLONE_THREAD implies CLONE_VM.

> Having asked that, I realize that I'm still not quite comfortable with
> this text. I think the problem is really one of terminology. At the
> start of this passage in the page, there is the sentence:
>
> Every thread in a process must be in the 
> same PID namespace.
>
> Can you define "thread" in this context?

Most definitely a thread group created with CLONE_THREAD.  It is pretty
ugly in just the old fashioned CLONE_VM case too, but that might be
legal.

In a few cases I think the implementation overshoots and tests for VM
sharing instead of thread group membership because VM sharing is easier
to test for, and we already have tests for that.

Eric



Re: [RFC PATCH v2 4/4] timekeeping: utilize the suspend-nonstop clocksource to count suspended time

2013-03-04 Thread Feng Tang
On Tue, Mar 05, 2013 at 02:27:34PM +0800, John Stultz wrote:
> On 03/05/2013 10:27 AM, Feng Tang wrote:
> >There are some new processors whose TSC clocksource won't stop during
> >suspend. Currently, after system resumes, kernel will use persistent
> >clock or RTC to compensate the sleep time, but for those new types of
> >clocksources, we could skip the special compensation from external
> >sources, and just use current clocksource for time recounting.
> >
> >This can solve some time drift bugs caused by some not-so-accurate or
> >error-prone RTC devices.
> >
> >The current way to count suspended time is to first try to use the persistent
> >clock, and then try the rtc if persistent clock can't be used. This
> >patch will change the trying order to:
> > suspend-nonstop clocksource -> persistent clock -> rtc
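
The preference order quoted above, together with the cycle-to-time
conversion it relies on, can be sketched in userspace C. This is a
simplified model under assumptions: the names pick_sleep_source() and
nonstop_sleep_ns() are hypothetical, and the conversion uses the common
(cycles * mult) >> shift form without any protection against overflow
on very long suspends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sleep_source { SRC_NONSTOP_CS, SRC_PERSISTENT_CLOCK, SRC_RTC };

/* Less preferred sources are used only if no better one is usable. */
static enum sleep_source pick_sleep_source(bool cs_nonstop, bool have_persistent)
{
    if (cs_nonstop)
        return SRC_NONSTOP_CS;
    if (have_persistent)
        return SRC_PERSISTENT_CLOCK;
    return SRC_RTC;
}

/* Fixed-point cycles to nanoseconds; may overflow for huge cycle counts. */
static int64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (int64_t)((cycles * mult) >> shift);
}

/*
 * Suspended time as seen by a counter that kept running. The mask
 * handles counters narrower than 64 bits that wrapped once while the
 * system was asleep.
 */
static int64_t nonstop_sleep_ns(uint64_t cycle_last, uint64_t cycle_now,
                                uint64_t mask, uint32_t mult, uint32_t shift)
{
    uint64_t delta = (cycle_now - cycle_last) & mask;

    return cyc2ns(delta, mult, shift);
}
```

A slow counter can wrap more than once during a long suspend, in which
case the masked delta is wrong; that limitation is not handled here.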
> 
> Thanks for sending out another iteration of this code. Jason's
> feedback has been good, but I think this is starting to shape up
> nicely.

Thanks :)

> >Signed-off-by: Feng Tang 
> >---
> >  kernel/time/timekeeping.c |   57 
> > ++--
> >  1 files changed, 49 insertions(+), 8 deletions(-)
> >
> >diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> >index 9a0bc98..15cc086 100644
> >--- a/kernel/time/timekeeping.c
> >+++ b/kernel/time/timekeeping.c
> >@@ -788,22 +788,63 @@ void timekeeping_inject_sleeptime(struct timespec 
> >*delta)
> >  static void timekeeping_resume(void)
> >  {
> > struct timekeeper *tk = &timekeeper;
> >+struct clocksource *clock = tk->clock;
> > unsigned long flags;
> >-struct timespec ts;
> >+struct timespec ts_new, ts_delta;
> >+cycle_t cycle_now, cycle_delta;
> >+s64 nsec;
> >-read_persistent_clock(&ts);
> >+ts_delta.tv_sec = 0;
> >+read_persistent_clock(&ts_new);
> > clockevents_resume();
> > clocksource_resume();
> > write_seqlock_irqsave(&tk->lock, flags);
> >-if (timespec_compare(&ts, &timekeeping_suspend_time) > 0) {
> >-ts = timespec_sub(ts, timekeeping_suspend_time);
> >-__timekeeping_inject_sleeptime(tk, &ts);
> >-}
> >-/* re-base the last cycle value */
> >-tk->clock->cycle_last = tk->clock->read(tk->clock);
> >+/*
> >+ * After system resumes, we need to calculate the suspended time and
> >+ * compensate it for the OS time. There are 3 sources that could be
> >+ * used: Nonstop clocksource during suspend, persistent clock and rtc
> >+ * device.
> >+ *
> >+ * One specific platform may have 1 or 2 or all of them, and the
> >+ * preference will be:
> >+ *  suspend-nonstop clocksource > persistent clock > rtc
> >+ * The less preferred source will only be tried if there is no better
> >+ * usable source. The rtc part is handled separately in rtc core code.
> >+ */
> >+cycle_now = clock->read(clock);
> 
> So this might be ok for an initial implementation, as on the
> non-stop-tsc hardware, the TSC is the best clocksource available.
> One concern long term is that there may be cases where the non-stop
> clocksource is not the most performant clocksource on a system. In
> that case, we may want to use a non-stop clocksource that is not the
> current timekeeping clocksource. So that may require some extra
> clocksource core interfaces to access the non-stop clocksource
> instead of using the timekeeper's clocksource, also we'll have to be
> sure to use something other than cycle_last in that case, since
> we'll need to read the nonstop clocksource at suspend, rather than
> trusting that forward_now updates cycle_last as is done here.

Yeah, I just realized this when Jason mentioned the counter_32k on
OMAP.

So for next step, we may add something in timekeeping.c like
static struct clocksource *suspend_time_cs;
read and save its counter right before entering and after getting
out of suspended state. And create a new struct which includes
all time suspend related flags, counters, pointers, make it as a
member of struct timekeeper. Comments?

Thanks,
Feng




Re: [PATCH V2 01/14] MIPS: Build uasm-generated code only once to avoid CPU Hotplug problem

2013-03-04 Thread Huacai Chen
I'm sorry, this is the only patch, please ignore [01/14].

On Tue, Mar 5, 2013 at 12:37 PM, Huacai Chen  wrote:
> Currently, clear_page()/copy_page() are generated by Micro-assembler
> dynamically. But they are unavailable until uasm_resolve_relocs() has
> finished because jump labels are illegal before that. Since these
> functions are shared by every CPU, we call build_clear_page()/
> build_copy_page() only once at boot time. Without this patch, programs
> will get random memory corruption (segmentation fault, bus error, etc.)
> during CPU hotplug (e.g. one CPU is using clear_page() while another is
> generating it in cpu_cache_init()).
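
The run-once guard used by the patch can be illustrated with C11
atomics in userspace. This is only a sketch of the pattern:
build_clear_page_model() and build_count are hypothetical, and
atomic_exchange() from <stdatomic.h> stands in for the kernel's
atomic_xchg().

```c
#include <assert.h>
#include <stdatomic.h>

/* Counts how often the expensive "code generation" step really ran. */
static int build_count;

static void build_clear_page_model(void)
{
    static atomic_int run_once = 0;

    /*
     * atomic_exchange() stores 1 and returns the previous value, so only
     * the very first caller sees 0 and falls through to the generator.
     */
    if (atomic_exchange(&run_once, 1))
        return;

    build_count++;  /* stands in for emitting clear_page() via uasm */
}
```

Calling build_clear_page_model() any number of times leaves build_count
at 1, which is the property that keeps a hot-plugged CPU from rewriting
a routine another CPU may be executing. Note that a caller who loses
the exchange returns at once, without waiting for the first caller to
finish generating the code.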
>
> For similar reasons we modify build_tlb_refill_handler()'s invocation.
>
> V2:
> 1, Rework the code so that CPU#0 can be taken online/offline.
> 2, Introduce cpu_has_local_ebase feature since some types of MIPS CPU
> need a per-CPU tlb_refill_handler().
>
> Signed-off-by: Huacai Chen 
> Signed-off-by: Hongbing Hu 
> ---
>  arch/mips/include/asm/cpu-features.h   |3 +++
>  .../asm/mach-loongson/cpu-feature-overrides.h  |1 +
>  arch/mips/mm/page.c|   10 ++
>  arch/mips/mm/tlbex.c   |   10 --
>  4 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/include/asm/cpu-features.h 
> b/arch/mips/include/asm/cpu-features.h
> index c507b93..1204408 100644
> --- a/arch/mips/include/asm/cpu-features.h
> +++ b/arch/mips/include/asm/cpu-features.h
> @@ -110,6 +110,9 @@
>  #ifndef cpu_has_pindexed_dcache
>  #define cpu_has_pindexed_dcache(cpu_data[0].dcache.flags & 
> MIPS_CACHE_PINDEX)
>  #endif
> +#ifndef cpu_has_local_ebase
> +#define cpu_has_local_ebase1
> +#endif
>
>  /*
>   * I-Cache snoops remote store.  This only matters on SMP.  Some 
> multiprocessors
> diff --git a/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h 
> b/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
> index 1a05d85..8eec8e2 100644
> --- a/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
> +++ b/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
> @@ -57,5 +57,6 @@
>  #define cpu_has_vint   0
>  #define cpu_has_vtag_icache0
>  #define cpu_has_watch  1
> +#define cpu_has_local_ebase0
>
>  #endif /* __ASM_MACH_LOONGSON_CPU_FEATURE_OVERRIDES_H */
> diff --git a/arch/mips/mm/page.c b/arch/mips/mm/page.c
> index 8e666c5..6a39c01 100644
> --- a/arch/mips/mm/page.c
> +++ b/arch/mips/mm/page.c
> @@ -247,6 +247,11 @@ void __cpuinit build_clear_page(void)
> struct uasm_label *l = labels;
> struct uasm_reloc *r = relocs;
> int i;
> +   static atomic_t run_once = ATOMIC_INIT(0);
> +
> +   if (atomic_xchg(&run_once, 1)) {
> +   return;
> +   }
>
> memset(labels, 0, sizeof(labels));
> memset(relocs, 0, sizeof(relocs));
> @@ -389,6 +394,11 @@ void __cpuinit build_copy_page(void)
> struct uasm_label *l = labels;
> struct uasm_reloc *r = relocs;
> int i;
> +   static atomic_t run_once = ATOMIC_INIT(0);
> +
> +   if (atomic_xchg(&run_once, 1)) {
> +   return;
> +   }
>
> memset(labels, 0, sizeof(labels));
> memset(relocs, 0, sizeof(relocs));
> diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
> index 1c8ac49..4a8b294 100644
> --- a/arch/mips/mm/tlbex.c
> +++ b/arch/mips/mm/tlbex.c
> @@ -2161,8 +2161,11 @@ void __cpuinit build_tlb_refill_handler(void)
> case CPU_TX3922:
> case CPU_TX3927:
>  #ifndef CONFIG_MIPS_PGD_C0_CONTEXT
> -   build_r3000_tlb_refill_handler();
> +   if (cpu_has_local_ebase)
> +   build_r3000_tlb_refill_handler();
> if (!run_once) {
> +   if (!cpu_has_local_ebase)
> +   build_r3000_tlb_refill_handler();
> build_r3000_tlb_load_handler();
> build_r3000_tlb_store_handler();
> build_r3000_tlb_modify_handler();
> @@ -2191,9 +2194,12 @@ void __cpuinit build_tlb_refill_handler(void)
> build_r4000_tlb_load_handler();
> build_r4000_tlb_store_handler();
> build_r4000_tlb_modify_handler();
> +   if (!cpu_has_local_ebase)
> +   build_r4000_tlb_refill_handler();
> run_once++;
> }
> -   build_r4000_tlb_refill_handler();
> +   if (cpu_has_local_ebase)
> +   build_r4000_tlb_refill_handler();
> }
>  }
>
> --
> 1.7.7.3
>
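
The run_once guard that the quoted patch adds to build_clear_page() and
build_copy_page() can be sketched in user space with C11 atomics. This is
only an illustration: build_once() and build_count are made-up names, and
atomic_exchange() stands in for the kernel's atomic_xchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter standing in for the expensive one-time work. */
static int build_count;

/* Returns true only for the first caller; later callers bail out early. */
static bool build_once(void)
{
	static atomic_int run_once;	/* zero-initialized, like ATOMIC_INIT(0) */

	/* atomic_exchange() stores 1 and returns the previous value. */
	if (atomic_exchange(&run_once, 1))
		return false;

	build_count++;
	return true;
}
```

Because the exchange is a single atomic step, two CPUs racing into the
function cannot both see the old value 0.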


Re: [PATCH 1/1] kernel/pid_namespace.c: Fixing a lack of cleanup (Probable resources leak).

2013-03-04 Thread Eric W. Biederman
"Raphael S.Carvalho"  writes:

> Starting point: create_pid_namespace()
>
> Suppose create_pid_cachep() was executed successfully, thus:
> pcache was allocated by kmalloc().
> cachep received a cache created by kmem_cache_create().
> and pcache->list was added to the list pid_caches_lh.
>
> So what would happen if proc_alloc_inum() returns an error?
> The resources allocated by create_pid_namespace() would be deallocated!
> How about those resources allocated by create_pid_cachep()?
> By knowing that, I created this patch in order to fix that!

pid caches are not per namespace.  They are per pid namespace depth
and shared among many pid namespaces, so in general a leak is fine.

We definitely can't do what you are doing.  There are no checks that
another pid namespace doesn't have pids allocated from the pid cache
you are freeing nor any checks to see that the pid cache was allocated
uniquely per pid namespace.

Under the right circumstances you might be able to free pid caches
but it is hard to figure out what those circumstances are and I don't
expect it is worth the trouble.
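
A rough user-space model of that sharing, for illustration only: the
struct layout, get_pid_cache() and the singly linked list are simplified
stand-ins, not the kernel code. The lookup-before-create step mirrors
what create_pid_cachep() does with pid_caches_lh.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct pid_cache. */
struct pid_cache {
	int nr_ids;			/* namespace depth this cache serves */
	struct pid_cache *next;
};

static struct pid_cache *pid_caches;	/* models pid_caches_lh */

/* Look up the cache for a depth, creating it only on first use. */
static struct pid_cache *get_pid_cache(int nr_ids)
{
	struct pid_cache *pc;

	for (pc = pid_caches; pc; pc = pc->next)
		if (pc->nr_ids == nr_ids)
			return pc;	/* shared with earlier namespaces */

	pc = malloc(sizeof(*pc));
	if (!pc)
		return NULL;
	pc->nr_ids = nr_ids;
	pc->next = pid_caches;
	pid_caches = pc;
	return pc;
}
```

Two namespaces created at the same depth get the same pointer back, so an
error path in one of them has no way of knowing whether it is safe to
destroy the cache.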

Eric


> Signed-off-by: Raphael S.Carvalho 
> ---
>  kernel/pid_namespace.c |   16 +++-
>  1 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index c1c3dc1..d94e4b6 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -37,7 +37,7 @@ static struct kmem_cache *pid_ns_cachep;
>   * @nr_ids: the number of numerical ids this pid will have to carry
>   */
>  
> -static struct kmem_cache *create_pid_cachep(int nr_ids)
> +static struct pid_cache *create_pid_cachep(int nr_ids)
>  {
>   struct pid_cache *pcache;
>   struct kmem_cache *cachep;
> @@ -63,7 +63,7 @@ static struct kmem_cache *create_pid_cachep(int nr_ids)
>   list_add(&pcache->list, &pid_caches_lh);
>  out:
>   mutex_unlock(&pid_caches_mutex);
> - return pcache->cachep;
> + return pcache;
>  
>  err_cachep:
>   kfree(pcache);
> @@ -85,6 +85,7 @@ static struct pid_namespace *create_pid_namespace(struct 
> user_namespace *user_ns
>   struct pid_namespace *parent_pid_ns)
>  {
>   struct pid_namespace *ns;
> + struct pid_cache *pcache;
>   unsigned int level = parent_pid_ns->level + 1;
>   int i;
>   int err;
> @@ -103,15 +104,16 @@ static struct pid_namespace 
> *create_pid_namespace(struct user_namespace *user_ns
>   if (!ns->pidmap[0].page)
>   goto out_free;
>  
> - ns->pid_cachep = create_pid_cachep(level + 1);
> - if (ns->pid_cachep == NULL)
> + pcache = create_pid_cachep(level + 1);
> + if (pcache == NULL)
>   goto out_free_map;
>  
>   err = proc_alloc_inum(&ns->proc_inum);
>   if (err)
> - goto out_free_map;
> + goto out_free_cachep;
>  
>   kref_init(&ns->kref);
> + ns->pid_cachep = pcache->cachep;
>   ns->level = level;
>   ns->parent = get_pid_ns(parent_pid_ns);
>   ns->user_ns = get_user_ns(user_ns);
> @@ -126,6 +128,10 @@ static struct pid_namespace *create_pid_namespace(struct 
> user_namespace *user_ns
>  
>   return ns;
>  
> +out_free_cachep:
> + kmem_cache_destroy(pcache->cachep);
> + list_del(&pcache->list);
> + kfree(pcache);
>  out_free_map:
>   kfree(ns->pidmap[0].page);
>  out_free:


[git pull] Please pull powerpc.git merge branch

2013-03-04 Thread Benjamin Herrenschmidt
Hi Linus !

Here are a few powerpc bits & fixes for rc1. A couple of str*cpy fixes,
some fixes in handling the FSCR register on Power8 (controls the
enabling of processor features), a 32-bit build fix and a couple more
nits.

Cheers,
Ben.

The following changes since commit 6dbe51c251a327e012439c4772097a13df43c5b8:

  Linux 3.9-rc1 (2013-03-03 15:11:05 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

for you to fetch changes up to 54c9b2253d34e8998e4bff9ac2d7a3ba0b861d52:

  powerpc: Set DSCR bit in FSCR setup (2013-03-05 16:56:30 +1100)


Akinobu Mita (1):
  powerpc: Remove unused BITOP_LE_SWIZZLE macro

Chen Gang (2):
  powerpc/pseries/hvcserver: Fix strncpy buffer limit in location code
  drivers/tty/hvc: Use strlcpy instead of strncpy

Michael Neuling (4):
  powerpc: Avoid link stack corruption in MMU on syscall entry path
  powerpc: Fix setting FSCR for HV=0 and on secondary CPUs
  powerpc: Add DSCR FSCR register bit definition
  powerpc: Set DSCR bit in FSCR setup

Tony Breeds (2):
  powerpc: Fix compile of sha1-powerpc-asm.S on 32-bit
  powerpc: Wireup the kcmp syscall to sys_ni

 arch/powerpc/crypto/sha1-powerpc-asm.S |4 ++--
 arch/powerpc/include/asm/bitops.h  |2 --
 arch/powerpc/include/asm/reg.h |3 ++-
 arch/powerpc/include/asm/systbl.h  |1 +
 arch/powerpc/include/asm/unistd.h  |2 +-
 arch/powerpc/include/uapi/asm/unistd.h |1 +
 arch/powerpc/kernel/cpu_setup_power.S  |5 +++--
 arch/powerpc/kernel/exceptions-64s.S   |4 ++--
 arch/powerpc/platforms/pseries/hvcserver.c |5 +++--
 drivers/tty/hvc/hvcs.c |9 ++---
 10 files changed, 17 insertions(+), 19 deletions(-)




Re: [RFC PATCH v2 0/4] Add support for S3 non-stop TSC support.

2013-03-04 Thread Feng Tang
On Mon, Mar 04, 2013 at 09:32:03PM -0700, Jason Gunthorpe wrote:
> On Tue, Mar 05, 2013 at 11:53:02AM +0800, Feng Tang wrote:

> > > You may want to also CC the maintainers of all the ARM subsystems that
> > > use read_persistent_clock and check with them to ensure this new
> > > interface will let them migrate their implementations as well.
> > 
> > Maybe I didn't get it well, my patches didn't change the
> > read_persistent_clock(), but inject a new way of counting suspended
> > time. It should have no functional changes to existing platforms.
> 
> Right, your patches are fine stand alone.
> 
> The ARM case of plat-omap/counter_32k.c would ideally be converted to
> use your new API though, that is what I meant about involving them.

I see now. Yes, the counter_32k could be converted to a clocksource
with SUSPEND_NONSTOP flag set, and no need for it to use the
read_persistent_clock any more.

> 
> I'm not sure about mach-tegra/timer.c though - it seems to be using a
> counter as well but somehow sharing registers with the RTC?

I just searched the 3.9-rc1 code, seems the file has been moved to
drivers/clocksource/tegra20_timer.c, and its persistent clock seems
to also be based on a RTC like device.

Thanks,
Feng



Re: [RFC PATCH v2 4/4] timekeeping: utilize the suspend-nonstop clocksource to count suspended time

2013-03-04 Thread John Stultz

On 03/05/2013 10:27 AM, Feng Tang wrote:

There are some new processors whose TSC clocksource won't stop during
suspend. Currently, after system resumes, kernel will use persistent
clock or RTC to compensate the sleep time, but for those new types of
clocksources, we could skip the special compensation from external
sources, and just use current clocksource for time recounting.

This can solve some time drift bugs caused by some not-so-accurate or
error-prone RTC devices.

The current way to count suspended time is first try to use the persistent
clock, and then try the rtc if persistent clock can't be used. This
patch will change the trying order to:
suspend-nonstop clocksource -> persistent clock -> rtc
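
The preference order above can be written down as a small selection
function. This is a sketch under assumptions: struct resume_info and
pick_sleep_source() are invented for illustration and do not appear in
the patch.

```c
#include <stdbool.h>

enum sleep_source {
	SRC_NONSTOP_CLOCKSOURCE,
	SRC_PERSISTENT_CLOCK,
	SRC_RTC,
};

/* Hypothetical summary of what the platform offers at resume time. */
struct resume_info {
	bool clocksource_nonstop;	/* current clocksource kept counting */
	bool persistent_clock_valid;	/* persistent clock moved forward */
};

/* A less preferred source is used only if no better one is usable. */
static enum sleep_source pick_sleep_source(const struct resume_info *ri)
{
	if (ri->clocksource_nonstop)
		return SRC_NONSTOP_CLOCKSOURCE;
	if (ri->persistent_clock_valid)
		return SRC_PERSISTENT_CLOCK;
	return SRC_RTC;		/* left to the rtc core code */
}
```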


Thanks for sending out another iteration of this code. Jason's feedback 
has been good, but I think this is starting to shape up nicely.


More below


Signed-off-by: Feng Tang 
---
  kernel/time/timekeeping.c |   57 ++--
  1 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9a0bc98..15cc086 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -788,22 +788,63 @@ void timekeeping_inject_sleeptime(struct timespec *delta)
  static void timekeeping_resume(void)
  {
struct timekeeper *tk = &timekeeper;
+   struct clocksource *clock = tk->clock;
unsigned long flags;
-   struct timespec ts;
+   struct timespec ts_new, ts_delta;
+   cycle_t cycle_now, cycle_delta;
+   s64 nsec;
  
-	read_persistent_clock(&ts);

+   ts_delta.tv_sec = 0;
+   read_persistent_clock(&ts_new);
  
  	clockevents_resume();

clocksource_resume();
  
  	write_seqlock_irqsave(&tk->lock, flags);
  
-	if (timespec_compare(&ts, &timekeeping_suspend_time) > 0) {

-   ts = timespec_sub(ts, timekeeping_suspend_time);
-   __timekeeping_inject_sleeptime(tk, &ts);
-   }
-   /* re-base the last cycle value */
-   tk->clock->cycle_last = tk->clock->read(tk->clock);
+   /*
+* After system resumes, we need to calculate the suspended time and
+* compensate it for the OS time. There are 3 sources that could be
+* used: Nonstop clocksource during suspend, persistent clock and rtc
+* device.
+*
+* One specific platform may have 1 or 2 or all of them, and the
+* preference will be:
+*  suspend-nonstop clocksource > persistent clock > rtc
+* The less preferred source will only be tried if there is no better
+* usable source. The rtc part is handled separately in rtc core code.
+*/
+   cycle_now = clock->read(clock);


So this might be ok for an initial implementation, as on the 
non-stop-tsc hardware, the TSC is the best clocksource available. One 
concern long term is that there may be cases where the non-stop 
clocksource is not the most performant clocksource on a system. In that 
case, we may want to use a non-stop clocksource that is not the current 
timekeeping clocksource. So that may require some extra clocksource core 
interfaces to access the non-stop clocksource instead of using the 
timekeeper's clocksource. We'll also have to be sure to use something
other than cycle_last in that case, since we'll need to read the nonstop
clocksource at suspend, rather than trusting that forward_now updates
cycle_last as is done here.
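
A minimal sketch of that idea, assuming a hypothetical struct
nonstop_counter that is not a kernel structure: the counter is sampled
explicitly at suspend, and the resume path works from that sample rather
than from cycle_last.

```c
#include <stdint.h>

/* Hypothetical free-running counter description (not a kernel type). */
struct nonstop_counter {
	uint64_t mask;		/* valid counter bits, e.g. 0xffffffff */
	uint32_t mult;		/* cycles -> ns multiplier */
	uint32_t shift;		/* cycles -> ns shift */
	uint64_t cycles_at_suspend;
};

/* Record the counter value when the system goes down. */
static void counter_suspend(struct nonstop_counter *c, uint64_t now)
{
	c->cycles_at_suspend = now & c->mask;
}

/* Nanoseconds slept; masking keeps the delta right across one wrap. */
static uint64_t counter_resume_ns(const struct nonstop_counter *c, uint64_t now)
{
	uint64_t delta = (now - c->cycles_at_suspend) & c->mask;

	/* Can overflow for long sleeps; see the 128-bit discussion in 0/4. */
	return (delta * c->mult) >> c->shift;
}
```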


thanks
-john



Re: For review: pid_namespaces(7) man page

2013-03-04 Thread Michael Kerrisk (man-pages)
[Resending, since my mobile device turned things into HTML]

Eric,

On Mon, Mar 4, 2013 at 6:52 PM, Eric W. Biederman  wrote:
> "Michael Kerrisk (man-pages)"  writes:
>
>> On Fri, Mar 1, 2013 at 4:35 PM, Eric W. Biederman  
>> wrote:
>>> "Michael Kerrisk (man-pages)"  writes:
>>>
 Hi Rob,

 On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley  wrote:
> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
[...]
>> Because the above unshare(2) and setns(2) calls only change the
>> PID namespace for created children, the clone(2) calls
>> necessarily put the new thread in a different PID namespace from the
>> calling thread.
>
>
> Um, no they don't. They fail. That's the point.

 (Good catch.)

> They _would_ put the new
> thread in a different PID namespace, which breaks the definition of 
> threads.
>
> How about:
>
> The above unshare(2) and setns(2) calls change the PID namespace of
> children created by subsequent clone(2) calls, which is incompatible
> with CLONE_VM.

 I decided on:

 The point here is that unshare(2) and setns(2) change the PID
 namespace for created children but not for the calling process,
 while clone(2) CLONE_VM specifies the creation of a new thread
 in the same process.
>>>
>>> Can we make that "for all new tasks created" instead of "created
>>> children"
>>>
>>> Otherwise someone might expect CLONE_THREAD would work, as
>>> CLONE_THREAD creates a thread and not a child...
>>
>> The term "task" is kernel-space talk that rarely appears in man pages,
>> so I am reluctant to use it.
>
> With respect to clone and in this case I am not certain we can properly
> describe what happens without talking about tasks. But it is worth
> a try.
>
>
>> How about this:
>>
>> The point here is that unshare(2) and setns(2) change the PID
>> namespace for processes subsequently created by the caller, but
>> not for the calling process, while clone(2) CLONE_VM specifies
>> the creation of a new thread in the same process.
>
> Hmm. How about this.
>
> The point here is that unshare(2) and setns(2) change the PID
> namespace that will be used in all subsequent calls to clone
> and fork by the caller, but not for the calling process, and
> that all threads in a process must share the same PID
> namespace. Which makes a subsequent clone(2) CLONE_VM
> specify the creation of a new thread in a different PID
> namespace but in the same process which is impossible.

I did a little tidying:

The point here is that unshare(2) and setns(2) change the
PID namespace that will be used in all subsequent calls
to clone(2) and fork(2), but do not change the PID
namespace of the calling process. Because a subsequent
clone(2) CLONE_VM would imply the creation of a new
thread in a different PID namespace, the operation is not
permitted.

Okay?

Having asked that, I realize that I'm still not quite comfortable with
this text. I think the problem is really one of terminology. At the
start of this passage in the page, there is the sentence:

Every thread in a process must be in the
same PID namespace.

Can you define "thread" in this context?

Thanks,

Michael


Re: [RFC PATCH v2 0/4] Add support for S3 non-stop TSC support.

2013-03-04 Thread John Stultz

On 03/05/2013 12:32 PM, Jason Gunthorpe wrote:

On Tue, Mar 05, 2013 at 11:53:02AM +0800, Feng Tang wrote:


// Drops some small precision along the way but is simple..
static inline u64 cyclecounter_cyc2ns_128(const struct cyclecounter *cc,
   cycle_t cycles)
{
 u64 max = U64_MAX/cc->mult;
 u64 num = cycles/max;
 u64 result = num * ((max * cc->mult) >> cc->shift);
 return result + cyclecounter_cyc2ns(cc, cycles - num*max);
}

Your way is surely more accurate, if maintainers are ok with adding
the new API, I will use it.

Okay, give it a good look though, I only wrote it out in email, never
tested it :)

Probably want to use clocksource instead of cyclecounter, but I think 
Jason's approach sounds ok. I might suggest that you initially make the 
function static to the timekeeping code, just so we don't get unexpected 
users.
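
For reference, a user-space sketch of the chunked conversion discussed
above. It is an illustration, not the proposed kernel function: it takes
mult and shift directly instead of a clocksource, and the names cyc2ns()
and cyc2ns_chunked() are made up here. The remainder is what is left
after removing num chunks of max cycles each.

```c
#include <assert.h>
#include <stdint.h>

/* Plain conversion; only safe while cycles * mult fits in 64 bits. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/*
 * Chunked conversion: split cycles into num chunks of max cycles each,
 * where max is the largest count whose product with mult cannot
 * overflow. Each chunk loses the fraction dropped by the shift, so the
 * result can come out slightly low.
 */
static uint64_t cyc2ns_chunked(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	assert(mult != 0);

	uint64_t max = UINT64_MAX / mult;
	uint64_t num = cycles / max;
	uint64_t result = num * cyc2ns(max, mult, shift);

	return result + cyc2ns(cycles - num * max, mult, shift);
}
```

With mult = 1 << 24 and shift = 24 (one cycle per nanosecond), the plain
version wraps for 1 << 50 cycles while the chunked one does not.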


thanks
-john


[PATCH RESEND 3/3] regulator: max8649: Use enable_is_inverted flag with regulator_enable_regmap and friends APIs

2013-03-04 Thread Axel Lin
Signed-off-by: Axel Lin 
Reviewed-by: Haojian Zhuang 
---
 drivers/regulator/max8649.c |   39 ++-
 1 file changed, 6 insertions(+), 33 deletions(-)

diff --git a/drivers/regulator/max8649.c b/drivers/regulator/max8649.c
index 3ca1438..fdb67ff 100644
--- a/drivers/regulator/max8649.c
+++ b/drivers/regulator/max8649.c
@@ -60,36 +60,6 @@ struct max8649_regulator_info {
unsignedramp_down:1;
 };
 
-/* EN_PD means pulldown on EN input */
-static int max8649_enable(struct regulator_dev *rdev)
-{
-   struct max8649_regulator_info *info = rdev_get_drvdata(rdev);
-   return regmap_update_bits(info->regmap, MAX8649_CONTROL, MAX8649_EN_PD, 
0);
-}
-
-/*
- * Applied internal pulldown resistor on EN input pin.
- * If pulldown EN pin outside, it would be better.
- */
-static int max8649_disable(struct regulator_dev *rdev)
-{
-   struct max8649_regulator_info *info = rdev_get_drvdata(rdev);
-   return regmap_update_bits(info->regmap, MAX8649_CONTROL, MAX8649_EN_PD,
-   MAX8649_EN_PD);
-}
-
-static int max8649_is_enabled(struct regulator_dev *rdev)
-{
-   struct max8649_regulator_info *info = rdev_get_drvdata(rdev);
-   unsigned int val;
-   int ret;
-
-   ret = regmap_read(info->regmap, MAX8649_CONTROL, &val);
-   if (ret != 0)
-   return ret;
-   return !((unsigned char)val & MAX8649_EN_PD);
-}
-
 static int max8649_enable_time(struct regulator_dev *rdev)
 {
struct max8649_regulator_info *info = rdev_get_drvdata(rdev);
@@ -151,9 +121,9 @@ static struct regulator_ops max8649_dcdc_ops = {
.get_voltage_sel = regulator_get_voltage_sel_regmap,
.list_voltage   = regulator_list_voltage_linear,
.map_voltage= regulator_map_voltage_linear,
-   .enable = max8649_enable,
-   .disable= max8649_disable,
-   .is_enabled = max8649_is_enabled,
+   .enable = regulator_enable_regmap,
+   .disable= regulator_disable_regmap,
+   .is_enabled = regulator_is_enabled_regmap,
.enable_time= max8649_enable_time,
.set_mode   = max8649_set_mode,
.get_mode   = max8649_get_mode,
@@ -169,6 +139,9 @@ static struct regulator_desc dcdc_desc = {
.vsel_mask  = MAX8649_VOL_MASK,
.min_uV = MAX8649_DCDC_VMIN,
.uV_step= MAX8649_DCDC_STEP,
+   .enable_reg = MAX8649_CONTROL,
+   .enable_mask= MAX8649_EN_PD,
+   .enable_is_inverted = true,
 };
 
 static struct regmap_config max8649_regmap_config = {
-- 
1.7.9.5





[PATCH RESEND 2/3] regulator: 88pm8607: Use enable_is_inverted flag with regulator_enable_regmap and friends APIs

2013-03-04 Thread Axel Lin
Signed-off-by: Axel Lin 
Reviewed-by: Haojian Zhuang 
---
 drivers/regulator/88pm8607.c |   36 
 1 file changed, 4 insertions(+), 32 deletions(-)

diff --git a/drivers/regulator/88pm8607.c b/drivers/regulator/88pm8607.c
index c79ab84..493948a 100644
--- a/drivers/regulator/88pm8607.c
+++ b/drivers/regulator/88pm8607.c
@@ -220,35 +220,6 @@ static int pm8607_list_voltage(struct regulator_dev *rdev, 
unsigned index)
return ret;
 }
 
-static int pm8606_preg_enable(struct regulator_dev *rdev)
-{
-   struct pm8607_regulator_info *info = rdev_get_drvdata(rdev);
-
-   return pm860x_set_bits(info->i2c, rdev->desc->enable_reg,
-  1 << rdev->desc->enable_mask, 0);
-}
-
-static int pm8606_preg_disable(struct regulator_dev *rdev)
-{
-   struct pm8607_regulator_info *info = rdev_get_drvdata(rdev);
-
-   return pm860x_set_bits(info->i2c, rdev->desc->enable_reg,
-  1 << rdev->desc->enable_mask,
-  1 << rdev->desc->enable_mask);
-}
-
-static int pm8606_preg_is_enabled(struct regulator_dev *rdev)
-{
-   struct pm8607_regulator_info *info = rdev_get_drvdata(rdev);
-   int ret;
-
-   ret = pm860x_reg_read(info->i2c, rdev->desc->enable_reg);
-   if (ret < 0)
-   return ret;
-
-   return !((unsigned char)ret & (1 << rdev->desc->enable_mask));
-}
-
 static struct regulator_ops pm8607_regulator_ops = {
.list_voltage   = pm8607_list_voltage,
.set_voltage_sel = regulator_set_voltage_sel_regmap,
@@ -259,9 +230,9 @@ static struct regulator_ops pm8607_regulator_ops = {
 };
 
 static struct regulator_ops pm8606_preg_ops = {
-   .enable = pm8606_preg_enable,
-   .disable= pm8606_preg_disable,
-   .is_enabled = pm8606_preg_is_enabled,
+   .enable = regulator_enable_regmap,
+   .disable= regulator_disable_regmap,
+   .is_enabled = regulator_is_enabled_regmap,
 };
 
 #define PM8606_PREG(ereg, ebit)
\
@@ -274,6 +245,7 @@ static struct regulator_ops pm8606_preg_ops = {
.owner  = THIS_MODULE,  \
.enable_reg = PM8606_##ereg,\
.enable_mask = (ebit),  \
+   .enable_is_inverted = true, \
},  \
 }
 
-- 
1.7.9.5



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH RESEND 1/3] regulator: core: Add enable_is_inverted flag to indicate set enable_mask bits to disable

2013-03-04 Thread Axel Lin
Add an enable_is_inverted flag to indicate that setting the enable_mask bits
disables the regulator when using regulator_enable_regmap and friends APIs.

Signed-off-by: Axel Lin 
Reviewed-by: Haojian Zhuang 
---
This patch was sent on https://lkml.org/lkml/2013/2/16/14.
This resend is against linux-next tree (20130305).
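
The effect of the flag can be shown without regmap. In this sketch
struct desc is a reduced, hypothetical descriptor and enable_bits() and
is_enabled() are illustrative helpers; only the field names follow the
patch.

```c
#include <stdbool.h>

/* Hypothetical reduced descriptor; field names follow the patch. */
struct desc {
	unsigned int enable_mask;
	bool enable_is_inverted;
};

/* Value to write under enable_mask to switch the regulator on or off. */
static unsigned int enable_bits(const struct desc *d, bool on)
{
	/* Normal: set bits to enable. Inverted: clear bits to enable. */
	return (on != d->enable_is_inverted) ? d->enable_mask : 0;
}

/* Decode a register value back into an enabled/disabled state. */
static bool is_enabled(const struct desc *d, unsigned int val)
{
	bool bits_set = (val & d->enable_mask) != 0;

	return bits_set != d->enable_is_inverted;
}
```

A driver whose hardware bit means "disable" then only needs to set the
flag in its descriptor instead of carrying its own enable, disable and
is_enabled callbacks.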

 drivers/regulator/core.c |   24 
 include/linux/regulator/driver.h |3 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index cea0217..d420c8f 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1905,7 +1905,10 @@ int regulator_is_enabled_regmap(struct regulator_dev 
*rdev)
if (ret != 0)
return ret;
 
-   return (val & rdev->desc->enable_mask) != 0;
+   if (rdev->desc->enable_is_inverted)
+   return (val & rdev->desc->enable_mask) == 0;
+   else
+   return (val & rdev->desc->enable_mask) != 0;
 }
 EXPORT_SYMBOL_GPL(regulator_is_enabled_regmap);
 
@@ -1920,9 +1923,15 @@ EXPORT_SYMBOL_GPL(regulator_is_enabled_regmap);
  */
 int regulator_enable_regmap(struct regulator_dev *rdev)
 {
+   unsigned int val;
+
+   if (rdev->desc->enable_is_inverted)
+   val = 0;
+   else
+   val = rdev->desc->enable_mask;
+
return regmap_update_bits(rdev->regmap, rdev->desc->enable_reg,
- rdev->desc->enable_mask,
- rdev->desc->enable_mask);
+ rdev->desc->enable_mask, val);
 }
 EXPORT_SYMBOL_GPL(regulator_enable_regmap);
 
@@ -1937,8 +1946,15 @@ EXPORT_SYMBOL_GPL(regulator_enable_regmap);
  */
 int regulator_disable_regmap(struct regulator_dev *rdev)
 {
+   unsigned int val;
+
+   if (rdev->desc->enable_is_inverted)
+   val = rdev->desc->enable_mask;
+   else
+   val = 0;
+
return regmap_update_bits(rdev->regmap, rdev->desc->enable_reg,
- rdev->desc->enable_mask, 0);
+ rdev->desc->enable_mask, val);
 }
 EXPORT_SYMBOL_GPL(regulator_disable_regmap);
 
diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h
index a741bb6..5e2d13d 100644
--- a/include/linux/regulator/driver.h
+++ b/include/linux/regulator/driver.h
@@ -200,6 +200,8 @@ enum regulator_type {
  *output when using regulator_set_voltage_sel_regmap
  * @enable_reg: Register for control when using regmap enable/disable ops
  * @enable_mask: Mask for control when using regmap enable/disable ops
+ * @enable_is_inverted: A flag to indicate set enable_mask bits to disable
+ *  when using regulator_enable_regmap and friends APIs.
  * @bypass_reg: Register for control when using regmap set_bypass
  * @bypass_mask: Mask for control when using regmap set_bypass
  *
@@ -229,6 +231,7 @@ struct regulator_desc {
unsigned int apply_bit;
unsigned int enable_reg;
unsigned int enable_mask;
+   bool enable_is_inverted;
unsigned int bypass_reg;
unsigned int bypass_mask;
 
-- 
1.7.9.5





[PATCH 02/12] perf annotate: Add a comment on the symbol__parse_objdump_line()

2013-03-04 Thread Namhyung Kim
From: Namhyung Kim 

The symbol__parse_objdump_line() parses result of the objdump run but
it's hard to follow if one doesn't know the output format of the
objdump.  Add a head comment on the function to help her.

Signed-off-by: Namhyung Kim 
---
 tools/perf/util/annotate.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 7eac5f0895ee..fa347b169e27 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -689,6 +689,26 @@ static int disasm_line__print(struct disasm_line *dl, 
struct symbol *sym, u64 st
return 0;
 }
 
+/*
+ * symbol__parse_objdump_line() parses objdump output (with -d --no-show-raw)
+ * which looks like following
+ *
+ *  00415500 <_init>:
+ *    415500:       sub    $0x8,%rsp
+ *    415504:       mov    0x2f5ad5(%rip),%rax        # 70afe0 <_DYNAMIC+0x2f8>
+ *    41550b:       test   %rax,%rax
+ *    41550e:       je     415515 <_init+0x15>
+ *    415510:       callq  416e70 <__gmon_start__@plt>
+ *    415515:       add    $0x8,%rsp
+ *    415519:       retq
+ *
+ * it will be parsed and saved into struct disasm_line as
+ *   
+ *  <offset>       <name>  <ops.raw>
+ * The offset will be a relative offset from the start of the symbol and -1
+ * means that it's not a disassembly line so should be treated differently.
+ * The ops.raw part will be parsed further according to type of the instruction.
+ */
 static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
  FILE *file, size_t privsize)
 {
-- 
1.7.11.7
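
As a rough illustration of the format described in the new comment, one
objdump line can be split like this. The sketch is independent of the
perf sources: struct parsed_line and parse_objdump_line() are
hypothetical and much simpler than struct disasm_line and the real
parser.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduced form of struct disasm_line. */
struct parsed_line {
	int64_t offset;		/* relative to symbol start, -1 if not code */
	char name[32];		/* mnemonic, e.g. "sub" */
	char ops[128];		/* raw operand text, e.g. "$0x8,%rsp" */
};

static void parse_objdump_line(const char *line, uint64_t sym_start,
			       struct parsed_line *out)
{
	char *end;
	unsigned long long addr;

	memset(out, 0, sizeof(*out));
	out->offset = -1;

	/* A disassembly line starts with "<hex address>:". */
	addr = strtoull(line, &end, 16);
	if (end == line || *end != ':')
		return;		/* e.g. the "00415500 <_init>:" header */

	out->offset = (int64_t)(addr - sym_start);
	/* %31s takes the mnemonic, %127[^\n] the rest of the line. */
	sscanf(end + 1, " %31s %127[^\n]", out->name, out->ops);
}
```

Lines without operands, such as retq, simply leave ops empty.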



[PATCH 01/12] perf annotate: Pass evsel instead of evidx on annotation functions

2013-03-04 Thread Namhyung Kim
From: Namhyung Kim 

Pass evsel instead of evidx.  This is a preparation for supporting
event group view in annotation and no functional change is intended.

Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-annotate.c | 16 +---
 tools/perf/builtin-top.c  |  2 +-
 tools/perf/ui/browsers/annotate.c | 30 +-
 tools/perf/ui/browsers/hists.c|  2 +-
 tools/perf/ui/gtk/annotate.c  | 10 ++
 tools/perf/util/annotate.c| 36 +++-
 tools/perf/util/annotate.h| 36 +++-
 tools/perf/util/hist.h|  5 +++--
 8 files changed, 75 insertions(+), 62 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 2e6961ea3184..2f015a99481b 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -109,14 +109,16 @@ static int process_sample_event(struct perf_tool *tool,
return 0;
 }
 
-static int hist_entry__tty_annotate(struct hist_entry *he, int evidx,
+static int hist_entry__tty_annotate(struct hist_entry *he,
+   struct perf_evsel *evsel,
struct perf_annotate *ann)
 {
-   return symbol__tty_annotate(he->ms.sym, he->ms.map, evidx,
+   return symbol__tty_annotate(he->ms.sym, he->ms.map, evsel,
ann->print_line, ann->full_paths, 0, 0);
 }
 
-static void hists__find_annotations(struct hists *self, int evidx,
+static void hists__find_annotations(struct hists *self,
+   struct perf_evsel *evsel,
struct perf_annotate *ann)
 {
struct rb_node *nd = rb_first(&self->entries), *next;
@@ -142,14 +144,14 @@ find_next:
if (use_browser == 2) {
int ret;
 
-   ret = hist_entry__gtk_annotate(he, evidx, NULL);
+   ret = hist_entry__gtk_annotate(he, evsel, NULL);
if (!ret || !ann->skip_missing)
return;
 
/* skip missing symbols */
nd = rb_next(nd);
} else if (use_browser == 1) {
-   key = hist_entry__tui_annotate(he, evidx, NULL);
+   key = hist_entry__tui_annotate(he, evsel, NULL);
switch (key) {
case -1:
if (!ann->skip_missing)
@@ -168,7 +170,7 @@ find_next:
if (next != NULL)
nd = next;
} else {
-   hist_entry__tty_annotate(he, evidx, ann);
+   hist_entry__tty_annotate(he, evsel, ann);
nd = rb_next(nd);
/*
 * Since we have a hist_entry per IP for the same
@@ -230,7 +232,7 @@ static int __cmd_annotate(struct perf_annotate *ann)
total_nr_samples += nr_samples;
hists__collapse_resort(hists);
hists__output_resort(hists);
-   hists__find_annotations(hists, pos->idx, ann);
+   hists__find_annotations(hists, pos, ann);
}
}
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 72f6eb7b4173..1dcce3229efa 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -231,7 +231,7 @@ static void perf_top__show_details(struct perf_top *top)
printf("Showing %s for %s\n", perf_evsel__name(top->sym_evsel), 
symbol->name);
printf("  Events  Pcnt (>=%d%%)\n", top->sym_pcnt_filter);
 
-   more = symbol__annotate_printf(symbol, he->ms.map, top->sym_evsel->idx,
+   more = symbol__annotate_printf(symbol, he->ms.map, top->sym_evsel,
   0, top->sym_pcnt_filter, 
top->print_entries, 4);
if (top->zero)
symbol__annotate_zero_histogram(symbol, top->sym_evsel->idx);
diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 7dca1555c610..67798472384b 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -8,6 +8,7 @@
 #include "../../util/hist.h"
 #include "../../util/sort.h"
 #include "../../util/symbol.h"
+#include "../../util/evsel.h"
 #include 
 #include 
 
@@ -331,7 +332,7 @@ static void annotate_browser__set_rb_top(struct 
annotate_browser *browser,
 }
 
 static void annotate_browser__calc_percent(struct annotate_browser *browser,
-  int evidx)
+  struct perf_evsel *evsel)
 {
struct map_symbol *ms = browser->b.priv;
struct symbol *sym = ms->sym;
@@ -344,7 +345,7 @@ static void annotate_browser__calc_percent(struct 
annotate_browser *browser,
 

[PATCH 04/12] perf annotate: Cleanup disasm__calc_percent()

2013-03-04 Thread Namhyung Kim
From: Namhyung Kim 

The loop end condition is calculated from next disasm_line or the
symbol size if it's the last disasm_line.  But it doesn't need to be
calculated at every iteration.  Moving it out of the function can
simplify code a bit.  Also the src_line doesn't need to be checked in
every time.

Signed-off-by: Namhyung Kim 
---
 tools/perf/util/annotate.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index a91d7b186081..ae71325d3dc7 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -603,29 +603,28 @@ struct disasm_line *disasm__get_next_ip_line(struct 
list_head *head, struct disa
return NULL;
 }
 
-static double disasm__calc_percent(struct disasm_line *next,
-  struct annotation *notes, int evidx,
-  s64 offset, u64 len, const char **path)
+static double disasm__calc_percent(struct annotation *notes, int evidx,
+  s64 offset, s64 end, const char **path)
 {
struct source_line *src_line = notes->src->lines;
struct sym_hist *h = annotation__histogram(notes, evidx);
unsigned int hits = 0;
double percent = 0.0;
 
-   while (offset < (s64)len &&
-  (next == NULL || offset < next->offset)) {
-   if (src_line) {
+   if (src_line) {
+   while (offset < end) {
if (*path == NULL)
*path = src_line[offset].path;
-   percent += src_line[offset].percent;
-   } else
-   hits += h->addr[offset];
 
-   ++offset;
-   }
+   percent += src_line[offset++].percent;
+   }
+   } else {
+   while (offset < end)
+   hits += h->addr[offset++];
 
-   if (src_line == NULL && h->sum)
-   percent = 100.0 * hits / h->sum;
+   if (h->sum)
+   percent = 100.0 * hits / h->sum;
+   }
 
return percent;
 }
@@ -648,8 +647,9 @@ static int disasm_line__print(struct disasm_line *dl, 
struct symbol *sym, u64 st
 
next = disasm__get_next_ip_line(&notes->src->source, dl);
 
-   percent = disasm__calc_percent(next, notes, evsel->idx,
-  offset, len, &path);
+   percent = disasm__calc_percent(notes, evsel->idx, offset,
+  next ? next->offset : (s64) len,
+  &path);
if (percent < min_pcnt)
return -1;
 
-- 
1.7.11.7
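
The shape of the cleanup, with the end of the range worked out once by
the caller, can be modelled in a few lines. struct hist, insn_end() and
calc_percent() are simplified stand-ins for illustration and cover only
the histogram branch, not the src_line one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced histogram: one hit counter per byte offset. */
struct hist {
	uint64_t sum;		/* total hits in the symbol */
	const uint64_t *addr;	/* hits per offset */
};

/*
 * The caller decides where the instruction ends, once, before the loop.
 * Pass NULL for the last instruction, which runs to the symbol's end.
 */
static int64_t insn_end(const int64_t *next_offset, uint64_t sym_len)
{
	return next_offset ? *next_offset : (int64_t)sym_len;
}

/* Share of all hits that fall inside [offset, end). */
static double calc_percent(const struct hist *h, int64_t offset, int64_t end)
{
	uint64_t hits = 0;

	while (offset < end)
		hits += h->addr[offset++];

	return h->sum ? 100.0 * hits / h->sum : 0.0;
}
```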



[PATCH 03/12] perf annotate: Factor out disasm__calc_percent()

2013-03-04 Thread Namhyung Kim
From: Namhyung Kim 

Factor out the calculation of a symbol's histogram percentage into
disasm__calc_percent().  It will be used by later changes.
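
As a rough illustration of the histogram branch of the factored-out
helper, the sketch below sums the hits in a range of offsets and turns
them into a percentage of the total.  struct mock_hist and
mock_calc_percent() are names made up for this example; they only
mimic the h->addr[] / h->sum usage visible in the diff and are not the
perf definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sym_hist: total samples plus per-offset hits. */
struct mock_hist {
	unsigned long sum;
	const unsigned long *addr;
};

/*
 * Sum the hits in [offset, end) and return them as a percentage of
 * h->sum.  Returns 0.0 when the histogram holds no samples.
 */
static double mock_calc_percent(const struct mock_hist *h, long offset, long end)
{
	unsigned long hits = 0;

	while (offset < end)
		hits += h->addr[offset++];

	return h->sum ? 100.0 * hits / h->sum : 0.0;
}
```

A caller would pass the offset of the current instruction and, as the
upper bound, the offset of the next instruction or the symbol length.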

Signed-off-by: Namhyung Kim 
---
 tools/perf/util/annotate.c | 49 --
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index fa347b169e27..a91d7b186081 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -603,6 +603,33 @@ struct disasm_line *disasm__get_next_ip_line(struct 
list_head *head, struct disa
return NULL;
 }
 
+static double disasm__calc_percent(struct disasm_line *next,
+  struct annotation *notes, int evidx,
+  s64 offset, u64 len, const char **path)
+{
+   struct source_line *src_line = notes->src->lines;
+   struct sym_hist *h = annotation__histogram(notes, evidx);
+   unsigned int hits = 0;
+   double percent = 0.0;
+
+   while (offset < (s64)len &&
+  (next == NULL || offset < next->offset)) {
+   if (src_line) {
+   if (*path == NULL)
+   *path = src_line[offset].path;
+   percent += src_line[offset].percent;
+   } else
+   hits += h->addr[offset];
+
+   ++offset;
+   }
+
+   if (src_line == NULL && h->sum)
+   percent = 100.0 * hits / h->sum;
+
+   return percent;
+}
+
 static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 
start,
  struct perf_evsel *evsel, u64 len, int min_pcnt, int 
printed,
  int max_lines, struct disasm_line *queue)
@@ -612,33 +639,17 @@ static int disasm_line__print(struct disasm_line *dl, 
struct symbol *sym, u64 st
 
if (dl->offset != -1) {
const char *path = NULL;
-   unsigned int hits = 0;
-   double percent = 0.0;
+   double percent;
const char *color;
struct annotation *notes = symbol__annotation(sym);
-   struct source_line *src_line = notes->src->lines;
-   struct sym_hist *h = annotation__histogram(notes, evsel->idx);
s64 offset = dl->offset;
const u64 addr = start + offset;
struct disasm_line *next;
 
 next = disasm__get_next_ip_line(&notes->src->source, dl);
 
-   while (offset < (s64)len &&
-  (next == NULL || offset < next->offset)) {
-   if (src_line) {
-   if (path == NULL)
-   path = src_line[offset].path;
-   percent += src_line[offset].percent;
-   } else
-   hits += h->addr[offset];
-
-   ++offset;
-   }
-
-   if (src_line == NULL && h->sum)
-   percent = 100.0 * hits / h->sum;
-
+   percent = disasm__calc_percent(next, notes, evsel->idx,
+  offset, len, &path);
if (percent < min_pcnt)
return -1;
 
-- 
1.7.11.7



[PATCH 06/12] perf evsel: Introduce perf_evsel__is_group_event() helper

2013-03-04 Thread Namhyung Kim
From: Namhyung Kim 

The perf_evsel__is_group_event() helper checks whether the given evsel
needs event group view support.  Note that it differs from the existing
perf_evsel__is_group_leader(), which only checks whether the given
evsel is a leader or a standalone (i.e. non-group) event, regardless of
whether the event group feature is enabled.
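
To make the difference concrete, here is a minimal sketch of the two
checks.  The evsel.h hunk is not shown in this mail, so the body below
is only inferred from the conditions that the patch replaces
(symbol_conf.event_group, the leader check and nr_members > 1).  The
mock_* names and the trimmed-down struct are invented for the example.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for struct perf_evsel. */
struct mock_evsel {
	struct mock_evsel *leader;	/* points to itself for a leader or standalone event */
	int nr_members;			/* group size, meaningful on the leader */
};

/* Stands in for symbol_conf.event_group (assumption for this sketch). */
static bool mock_event_group;

/* True for a group leader and for a standalone event alike. */
static bool mock_is_group_leader(const struct mock_evsel *evsel)
{
	return evsel->leader == evsel;
}

/* True only when group view is enabled and evsel leads a real group. */
static bool mock_is_group_event(const struct mock_evsel *evsel)
{
	if (!mock_event_group)
		return false;

	return mock_is_group_leader(evsel) && evsel->nr_members > 1;
}
```

With this split, a standalone event passes the leader check but is
never treated as a group event, and nothing is treated as a group
event while the group view is disabled.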

Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-report.c|  2 +-
 tools/perf/ui/browsers/hists.c |  4 ++--
 tools/perf/ui/gtk/hists.c  |  7 ++-
 tools/perf/ui/hist.c   |  7 ++-
 tools/perf/util/annotate.c |  9 +++--
 tools/perf/util/evsel.h| 24 
 6 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 96b5a7fee4bb..3f4a79ba5ada 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -314,7 +314,7 @@ static size_t hists__fprintf_nr_sample_events(struct hists 
*self,
char buf[512];
size_t size = sizeof(buf);
 
-   if (symbol_conf.event_group && evsel->nr_members > 1) {
+   if (perf_evsel__is_group_event(evsel)) {
struct perf_evsel *pos;
 
perf_evsel__group_desc(evsel, buf, size);
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 0e125e1543dc..a5843fd6ab51 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -1193,7 +1193,7 @@ static int hists__browser_title(struct hists *hists, char 
*bf, size_t size,
char buf[512];
size_t buflen = sizeof(buf);
 
-   if (symbol_conf.event_group && evsel->nr_members > 1) {
+   if (perf_evsel__is_group_event(evsel)) {
struct perf_evsel *pos;
 
perf_evsel__group_desc(evsel, buf, buflen);
@@ -1709,7 +1709,7 @@ static void perf_evsel_menu__write(struct ui_browser 
*browser,
ui_browser__set_color(browser, current_entry ? HE_COLORSET_SELECTED :
   HE_COLORSET_NORMAL);
 
-   if (symbol_conf.event_group && evsel->nr_members > 1) {
+   if (perf_evsel__is_group_event(evsel)) {
struct perf_evsel *pos;
 
ev_name = perf_evsel__group_name(evsel);
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 1e764a8ad259..6f259b3d14e2 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -32,21 +32,18 @@ static int __hpp__color_fmt(struct perf_hpp *hpp, struct 
hist_entry *he,
int ret;
double percent = 0.0;
struct hists *hists = he->hists;
+   struct perf_evsel *evsel = hists_to_evsel(hists);
 
if (hists->stats.total_period)
percent = 100.0 * get_field(he) / hists->stats.total_period;
 
ret = __percent_color_snprintf(hpp->buf, hpp->size, percent);
 
-   if (symbol_conf.event_group) {
+   if (perf_evsel__is_group_event(evsel)) {
int prev_idx, idx_delta;
-   struct perf_evsel *evsel = hists_to_evsel(hists);
struct hist_entry *pair;
int nr_members = evsel->nr_members;
 
-   if (nr_members <= 1)
-   return ret;
-
prev_idx = perf_evsel__group_idx(evsel);
 
 list_for_each_entry(pair, &he->pairs.head, pairs.node) {
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index d671e63aa351..4bf91b09d62d 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -16,6 +16,7 @@ static int __hpp__fmt(struct perf_hpp *hpp, struct hist_entry 
*he,
 {
int ret;
struct hists *hists = he->hists;
+   struct perf_evsel *evsel = hists_to_evsel(hists);
 
if (fmt_percent) {
double percent = 0.0;
@@ -28,15 +29,11 @@ static int __hpp__fmt(struct perf_hpp *hpp, struct 
hist_entry *he,
} else
ret = print_fn(hpp->buf, hpp->size, fmt, get_field(he));
 
-   if (symbol_conf.event_group) {
+   if (perf_evsel__is_group_event(evsel)) {
int prev_idx, idx_delta;
-   struct perf_evsel *evsel = hists_to_evsel(hists);
struct hist_entry *pair;
int nr_members = evsel->nr_members;
 
-   if (nr_members <= 1)
-   return ret;
-
prev_idx = perf_evsel__group_idx(evsel);
 
 list_for_each_entry(pair, &he->pairs.head, pairs.node) {
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 0955cff5b0ef..f080cc40f00b 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -649,9 +649,7 @@ static int disasm_line__print(struct disasm_line *dl, 
struct symbol *sym, u64 st
 
 next = disasm__get_next_ip_line(&notes->src->source, dl);
 
-   if (symbol_conf.event_group &&
-   perf_evsel__is_group_leader(evsel) &&
-   evsel->nr_members > 1) {
+   if 

[PATCH 08/12] perf annotate: Support event group view for --print-line

2013-03-04 Thread Namhyung Kim
Dynamically allocate source_line_percent according to the number of
group members and save nr_pcnt in struct source_line.  This way we
can handle multiple events in a general manner.

However, since the size of struct source_line is no longer fixed,
code that iterates over the source_line array must take its actual
size into account.

  $ perf annotate --group --stdio --print-line

  Sorted summary for file /lib/ld-2.11.1.so
  ----------------------------------------------
     33.33    0.00 /build/buildd/eglibc-2.11.1/elf/rtld.c:381
     33.33    0.00 /build/buildd/eglibc-2.11.1/elf/dynamic-link.h:128
     33.33    0.00 /build/buildd/eglibc-2.11.1/elf/do-rel.h:105
      0.00   75.00 /build/buildd/eglibc-2.11.1/elf/dynamic-link.h:137
      0.00   25.00 /build/buildd/eglibc-2.11.1/elf/dynamic-link.h:187
  ...
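
The point about iterating over variable-size entries can be sketched
as follows: the array can no longer be indexed as lines[i], so each
entry is located by a byte offset computed from the per-entry size.
This is only an illustration under stated assumptions: the mock_*
types and helpers are invented here, and the sketch uses a C99
flexible array member where the patch keeps a p[1] tail and subtracts
one element from the size.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for source_line_percent / source_line. */
struct mock_percent {
	double percent;
	double percent_sum;
};

struct mock_source_line {
	const char *path;
	int nr_pcnt;
	struct mock_percent p[];	/* nr_pcnt entries follow the header */
};

/* Size in bytes of one entry that carries nr_pcnt percent slots. */
static size_t mock_line_size(int nr_pcnt)
{
	return sizeof(struct mock_source_line) +
	       sizeof(struct mock_percent) * (size_t)nr_pcnt;
}

/* Locate entry idx by byte offset instead of array indexing. */
static struct mock_source_line *mock_line_at(void *lines, int nr_pcnt, size_t idx)
{
	return (struct mock_source_line *)((char *)lines +
					   mock_line_size(nr_pcnt) * idx);
}

/* Allocate a zeroed array of nr_lines entries and record nr_pcnt in each. */
static void *mock_lines_alloc(size_t nr_lines, int nr_pcnt)
{
	void *lines = calloc(nr_lines, mock_line_size(nr_pcnt));
	size_t i;

	if (!lines)
		return NULL;

	for (i = 0; i < nr_lines; i++)
		mock_line_at(lines, nr_pcnt, i)->nr_pcnt = nr_pcnt;

	return lines;
}
```

Every walker of the array has to go through something like
mock_line_at(); plain pointer arithmetic on the struct type would use
the header size only and land in the middle of an entry.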

Signed-off-by: Namhyung Kim 
---
 tools/perf/util/annotate.c | 130 +
 tools/perf/util/annotate.h |   1 +
 2 files changed, 98 insertions(+), 33 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index ebf2596d7e2e..05e34df5d041 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -607,18 +607,26 @@ static double disasm__calc_percent(struct annotation 
*notes, int evidx,
   s64 offset, s64 end, const char **path)
 {
struct source_line *src_line = notes->src->lines;
-   struct sym_hist *h = annotation__histogram(notes, evidx);
-   unsigned int hits = 0;
double percent = 0.0;
 
if (src_line) {
+   size_t sizeof_src_line = sizeof(*src_line) +
+   sizeof(src_line->p) * (src_line->nr_pcnt - 1);
+
while (offset < end) {
+   src_line = (void *)notes->src->lines +
+   (sizeof_src_line * offset);
+
if (*path == NULL)
-   *path = src_line[offset].path;
+   *path = src_line->path;
 
-   percent += src_line[offset++].p[0].percent;
+   percent += src_line->p[evidx].percent;
+   offset++;
}
} else {
+   struct sym_hist *h = annotation__histogram(notes, evidx);
+   unsigned int hits = 0;
+
while (offset < end)
hits += h->addr[offset++];
 
@@ -658,9 +666,10 @@ static int disasm_line__print(struct disasm_line *dl, 
struct symbol *sym, u64 st
 
for (i = 0; i < nr_percent; i++) {
percent = disasm__calc_percent(notes,
-   evsel->idx + i, offset,
-   next ? next->offset : (s64) len,
-   &path);
+   notes->src->lines ? i : evsel->idx + i,
+   offset,
+   next ? next->offset : (s64) len,
+   &path);
 
ppercents[i] = percent;
if (percent > max_percent)
@@ -921,7 +930,7 @@ static void insert_source_line(struct rb_root *root, struct 
source_line *src_lin
struct source_line *iter;
 struct rb_node **p = &root->rb_node;
struct rb_node *parent = NULL;
-   int ret;
+   int i, ret;
 
while (*p != NULL) {
parent = *p;
@@ -929,7 +938,8 @@ static void insert_source_line(struct rb_root *root, struct 
source_line *src_lin
 
ret = strcmp(iter->path, src_line->path);
if (ret == 0) {
-   iter->p[0].percent_sum += src_line->p[0].percent;
+   for (i = 0; i < src_line->nr_pcnt; i++)
+   iter->p[i].percent_sum += src_line->p[i].percent;
return;
}
 
@@ -939,12 +949,26 @@ static void insert_source_line(struct rb_root *root, 
struct source_line *src_lin
p = &(*p)->rb_right;
}
 
-   src_line->p[0].percent_sum = src_line->p[0].percent;
+   for (i = 0; i < src_line->nr_pcnt; i++)
+   src_line->p[i].percent_sum = src_line->p[i].percent;
 
 rb_link_node(&src_line->node, parent, p);
 rb_insert_color(&src_line->node, root);
 }
 
+static int cmp_source_line(struct source_line *a, struct source_line *b)
+{
+   int i;
+
+   for (i = 0; i < a->nr_pcnt; i++) {
+   if (a->p[i].percent_sum == b->p[i].percent_sum)
+   continue;
+   return a->p[i].percent_sum > b->p[i].percent_sum;
+   }
+
+   return 0;
+}
+
 static void __resort_source_line(struct rb_root *root, struct source_line 
*src_line)
 {
struct source_line *iter;
@@ -955,7 +979,7 @@ static void __resort_source_line(struct rb_root *root, 
struct source_line *src_l

[PATCH 10/12] perf annotate browser: Use disasm__calc_percent()

2013-03-04 Thread Namhyung Kim
The disasm_line__calc_percent() function used by the annotate browser
code almost duplicates disasm__calc_percent().  Get rid of the code
duplication.

Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/browsers/annotate.c | 50 +++
 tools/perf/util/annotate.c|  4 ++--
 tools/perf/util/annotate.h|  4 
 3 files changed, 20 insertions(+), 38 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 62369f0b6608..8b16926dd56e 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -240,40 +240,6 @@ static unsigned int annotate_browser__refresh(struct 
ui_browser *browser)
return ret;
 }
 
-static double disasm_line__calc_percent(struct disasm_line *dl, struct symbol 
*sym, int evidx)
-{
-   double percent = 0.0;
-
-   if (dl->offset != -1) {
-   int len = sym->end - sym->start;
-   unsigned int hits = 0;
-   struct annotation *notes = symbol__annotation(sym);
-   struct source_line *src_line = notes->src->lines;
-   struct sym_hist *h = annotation__histogram(notes, evidx);
-   s64 offset = dl->offset;
-   struct disasm_line *next;
-
-   next = disasm__get_next_ip_line(&notes->src->source, dl);
-   while (offset < (s64)len &&
-  (next == NULL || offset < next->offset)) {
-   if (src_line) {
-   percent += src_line[offset].p[0].percent;
-   } else
-   hits += h->addr[offset];
-
-   ++offset;
-   }
-   /*
-* If the percentage wasn't already calculated in
-* symbol__get_source_line, do it now:
-*/
-   if (src_line == NULL && h->sum)
-   percent = 100.0 * hits / h->sum;
-   }
-
-   return percent;
-}
-
 static void disasm_rb_tree__insert(struct rb_root *root, struct 
browser_disasm_line *bdl)
 {
 struct rb_node **p = &root->rb_node;
@@ -337,7 +303,8 @@ static void annotate_browser__calc_percent(struct 
annotate_browser *browser,
struct map_symbol *ms = browser->b.priv;
struct symbol *sym = ms->sym;
struct annotation *notes = symbol__annotation(sym);
-   struct disasm_line *pos;
+   struct disasm_line *pos, *next;
+   s64 len = symbol__size(sym);
 
browser->entries = RB_ROOT;
 
@@ -345,7 +312,18 @@ static void annotate_browser__calc_percent(struct 
annotate_browser *browser,
 
 list_for_each_entry(pos, &notes->src->source, node) {
 struct browser_disasm_line *bpos = disasm_line__browser(pos);
-   bpos->percent[0] = disasm_line__calc_percent(pos, sym, evsel->idx);
+   const char *path = NULL;
+
+   if (pos->offset == -1) {
+   RB_CLEAR_NODE(&bpos->rb_node);
+   continue;
+   }
+
+   next = disasm__get_next_ip_line(&notes->src->source, pos);
+   bpos->percent[0] = disasm__calc_percent(notes, evsel->idx,
+   pos->offset, next ? next->offset : len,
+   &path);
+
 if (bpos->percent[0] < 0.01) {
 RB_CLEAR_NODE(&bpos->rb_node);
 continue;
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 05e34df5d041..d102716c43a1 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -603,8 +603,8 @@ struct disasm_line *disasm__get_next_ip_line(struct 
list_head *head, struct disa
return NULL;
 }
 
-static double disasm__calc_percent(struct annotation *notes, int evidx,
-  s64 offset, s64 end, const char **path)
+double disasm__calc_percent(struct annotation *notes, int evidx, s64 offset,
+   s64 end, const char **path)
 {
struct source_line *src_line = notes->src->lines;
double percent = 0.0;
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 68f851e6c685..6f3c16f01ab4 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -50,6 +50,8 @@ bool ins__is_jump(const struct ins *ins);
 bool ins__is_call(const struct ins *ins);
 int ins__scnprintf(struct ins *ins, char *bf, size_t size, struct ins_operands 
*ops);
 
+struct annotation;
+
 struct disasm_line {
struct list_headnode;
s64 offset;
@@ -68,6 +70,8 @@ void disasm_line__free(struct disasm_line *dl);
 struct disasm_line *disasm__get_next_ip_line(struct list_head *head, struct 
disasm_line *pos);
 int disasm_line__scnprintf(struct disasm_line *dl, char *bf, size_t size, bool 
raw);
 size_t disasm__fprintf(struct list_head *head, FILE *fp);
+double disasm__calc_percent(struct annotation *notes, int evidx, s64 offset,
+  

[PATCH 12/12] perf annotate/gtk: Support event group view on GTK

2013-03-04 Thread Namhyung Kim
From: Namhyung Kim 

Add support for event group view to GTK annotation browser.

Cc: Pekka Enberg 
Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/gtk/annotate.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c
index 6e2fc7e3f093..f538794615db 100644
--- a/tools/perf/ui/gtk/annotate.c
+++ b/tools/perf/ui/gtk/annotate.c
@@ -33,7 +33,7 @@ static int perf_gtk__get_percent(char *buf, size_t size, 
struct symbol *sym,
return 0;
 
symhist = annotation__histogram(symbol__annotation(sym), evidx);
-   if (!symhist->addr[dl->offset])
+   if (!symbol_conf.event_group && !symhist->addr[dl->offset])
return 0;
 
percent = 100.0 * symhist->addr[dl->offset] / symhist->sum;
@@ -119,10 +119,24 @@ static int perf_gtk__annotate_symbol(GtkWidget *window, 
struct symbol *sym,
 
 list_for_each_entry(pos, &notes->src->source, node) {
 GtkTreeIter iter;
+   int ret = 0;
 
 gtk_list_store_append(store, &iter);
 
-   if (perf_gtk__get_percent(s, sizeof(s), sym, pos, evsel->idx))
+   if (perf_evsel__is_group_event(evsel)) {
+   for (i = 0; i < evsel->nr_members; i++) {
+   ret += perf_gtk__get_percent(s + ret,
+sizeof(s) - ret,
+sym, pos,
+evsel->idx + i);
+   ret += scnprintf(s + ret, sizeof(s) - ret, " ");
+   }
+   } else {
+   ret = perf_gtk__get_percent(s, sizeof(s), sym, pos,
+   evsel->idx);
+   }
+
+   if (ret)
 gtk_list_store_set(store, &iter, ANN_COL__PERCENT, s, -1);
 if (perf_gtk__get_offset(s, sizeof(s), sym, map, pos))
 gtk_list_store_set(store, &iter, ANN_COL__OFFSET, s, -1);
-- 
1.7.11.7



[PATCH 07/12] perf annotate: Factor out struct source_line_percent

2013-03-04 Thread Namhyung Kim
From: Namhyung Kim 

The source_line_percent struct holds the percentage values of the
symbol histogram.  This is in preparation for the event group view
change.

Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/browsers/annotate.c |  2 +-
 tools/perf/util/annotate.c| 14 +++---
 tools/perf/util/annotate.h|  8 ++--
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 67798472384b..cfae57f90146 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -257,7 +257,7 @@ static double disasm_line__calc_percent(struct disasm_line 
*dl, struct symbol *s
while (offset < (s64)len &&
   (next == NULL || offset < next->offset)) {
if (src_line) {
-   percent += src_line[offset].percent;
+   percent += src_line[offset].p[0].percent;
} else
hits += h->addr[offset];
 
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index f080cc40f00b..ebf2596d7e2e 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -616,7 +616,7 @@ static double disasm__calc_percent(struct annotation 
*notes, int evidx,
if (*path == NULL)
*path = src_line[offset].path;
 
-   percent += src_line[offset++].percent;
+   percent += src_line[offset++].p[0].percent;
}
} else {
while (offset < end)
@@ -929,7 +929,7 @@ static void insert_source_line(struct rb_root *root, struct 
source_line *src_lin
 
ret = strcmp(iter->path, src_line->path);
if (ret == 0) {
-   iter->percent_sum += src_line->percent;
+   iter->p[0].percent_sum += src_line->p[0].percent;
return;
}
 
@@ -939,7 +939,7 @@ static void insert_source_line(struct rb_root *root, struct 
source_line *src_lin
p = &(*p)->rb_right;
}
 
-   src_line->percent_sum = src_line->percent;
+   src_line->p[0].percent_sum = src_line->p[0].percent;
 
 rb_link_node(&src_line->node, parent, p);
 rb_insert_color(&src_line->node, root);
@@ -955,7 +955,7 @@ static void __resort_source_line(struct rb_root *root, 
struct source_line *src_l
parent = *p;
iter = rb_entry(parent, struct source_line, node);
 
-   if (src_line->percent_sum > iter->percent_sum)
+   if (src_line->p[0].percent_sum > iter->p[0].percent_sum)
p = &(*p)->rb_left;
else
p = &(*p)->rb_right;
@@ -1025,8 +1025,8 @@ static int symbol__get_source_line(struct symbol *sym, 
struct map *map,
u64 offset;
FILE *fp;
 
-   src_line[i].percent = 100.0 * h->addr[i] / h->sum;
-   if (src_line[i].percent <= 0.5)
+   src_line[i].p[0].percent = 100.0 * h->addr[i] / h->sum;
+   if (src_line[i].p[0].percent <= 0.5)
continue;
 
offset = start + i;
@@ -1073,7 +1073,7 @@ static void print_summary(struct rb_root *root, const 
char *filename)
char *path;
 
src_line = rb_entry(node, struct source_line, node);
-   percent = src_line->percent_sum;
+   percent = src_line->p[0].percent_sum;
color = get_percent_color(percent);
path = src_line->path;
 
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 376395475663..bb2e3f998983 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -74,11 +74,15 @@ struct sym_hist {
u64 addr[0];
 };
 
-struct source_line {
-   struct rb_node  node;
+struct source_line_percent {
double  percent;
double  percent_sum;
+};
+
+struct source_line {
+   struct rb_node  node;
char*path;
+   struct source_line_percent p[1];
 };
 
 /** struct annotated_source - symbols with hits have this attached as in 
sannotation
-- 
1.7.11.7



[PATCH 05/12] perf annotate: Add basic support to event group view

2013-03-04 Thread Namhyung Kim
From: Namhyung Kim 

Add a --group option to enable event grouping.  When enabled, the
information of all group members is shown together with the leader,
so non-leader events are skipped.

Only --stdio output is supported for now.  Later patches will extend
this with additional features.
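
The skipping of non-leader events can be pictured with the small
sketch below.  It is not the code from this patch: struct mock_evsel,
mock_pick_events() and the event names in the usage note are
assumptions made for illustration, and the leader test is written out
as a plain pointer comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down event selector. */
struct mock_evsel {
	const char *name;
	struct mock_evsel *leader;	/* self for a leader or standalone event */
};

/*
 * Collect the names of the events that would be annotated.  With the
 * group view enabled, only leaders (and standalone events) are kept,
 * since members are shown next to their leader.  Returns the number
 * of names stored in out[].
 */
static size_t mock_pick_events(struct mock_evsel *evsels, size_t nr,
			       bool event_group, const char **out)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++) {
		struct mock_evsel *pos = &evsels[i];

		if (event_group && pos->leader != pos)
			continue;	/* member: reported with its leader */

		out[n++] = pos->name;
	}

	return n;
}
```

For a made-up list of three events where the second one is a member
of the first one's group, the group view keeps two entries and the
default view keeps all three.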

 $ perf annotate --group --stdio
 ...
  Percent                 |  Source code & Disassembly of libpthread-2.15.so
 ------------------------------------------------
                          :
                          :
                          :
                          :  Disassembly of section .text:
                          :
                          :  00387dc0aa50 <__pthread_mutex_unlock_usercnt>:
     8.08    2.40    5.29 :  387dc0aa50:   mov    %rdi,%rdx
     0.00    0.00    0.00 :  387dc0aa53:   mov    0x10(%rdi),%edi
     0.00    0.00    0.00 :  387dc0aa56:   mov    %edi,%eax
     0.00    0.80    0.00 :  387dc0aa58:   and    $0x7f,%eax
     3.03    2.40    3.53 :  387dc0aa5b:   test   $0x7c,%dil
     0.00    0.00    0.00 :  387dc0aa5f:   jne    387dc0aaa9 <__pthread_mutex_unlock_use
     0.00    0.00    0.00 :  387dc0aa61:   test   %eax,%eax
     0.00    0.00    0.00 :  387dc0aa63:   jne    387dc0aa85 <__pthread_mutex_unlock_use
     0.00    0.00    0.00 :  387dc0aa65:   and    $0x80,%edi
     0.00    0.00    0.00 :  387dc0aa6b:   test   %esi,%esi
     3.03    5.60    7.06 :  387dc0aa6d:   movl   $0x0,0x8(%rdx)
     0.00    0.00    0.59 :  387dc0aa74:   je     387dc0aa7a <__pthread_mutex_unlock_use
     0.00    0.00    0.00 :  387dc0aa76:   subl   $0x1,0xc(%rdx)
     2.02    5.60    1.18 :  387dc0aa7a:   mov    %edi,%esi
     0.00    0.00    0.00 :  387dc0aa7c:   lock decl (%rdx)
    83.84   83.20   82.35 :  387dc0aa7f:   jne    387dc0aada <_L_unlock_586>
     0.00    0.00    0.00 :  387dc0aa81:   nop
     0.00    0.00    0.00 :  387dc0aa82:   xor    %eax,%eax
     0.00    0.00    0.00 :  387dc0aa84:   retq
 ...

Signed-off-by: Namhyung Kim 
---
 tools/perf/Documentation/perf-annotate.txt |  3 ++
 tools/perf/builtin-annotate.c  |  7 
 tools/perf/util/annotate.c | 64 +-
 3 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/tools/perf/Documentation/perf-annotate.txt 
b/tools/perf/Documentation/perf-annotate.txt
index 5ad07ef417f0..e9cd39a92dc2 100644
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@@ -93,6 +93,9 @@ OPTIONS
 --skip-missing::
Skip symbols that cannot be annotated.
 
+--group::
+   Show event group information together
+
 SEE ALSO
 
 linkperf:perf-record[1], linkperf:perf-report[1]
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 2f015a99481b..ae36f3cb5410 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -232,6 +232,11 @@ static int __cmd_annotate(struct perf_annotate *ann)
total_nr_samples += nr_samples;
hists__collapse_resort(hists);
hists__output_resort(hists);
+
+   if (symbol_conf.event_group &&
+   !perf_evsel__is_group_leader(pos))
+   continue;
+
hists__find_annotations(hists, pos, ann);
}
}
@@ -314,6 +319,8 @@ int cmd_annotate(int argc, const char **argv, const char 
*prefix __maybe_unused)
   "Specify disassembler style (e.g. -M intel for intel 
syntax)"),
 OPT_STRING(0, "objdump", &objdump_path, "path",
    "objdump binary to use for disassembly and annotations"),
+   OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
+   "Show event group information together"),
OPT_END()
};
 
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index ae71325d3dc7..0955cff5b0ef 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -638,7 +638,9 @@ static int disasm_line__print(struct disasm_line *dl, 
struct symbol *sym, u64 st
 
if (dl->offset != -1) {
const char *path = NULL;
-   double percent;
+   double percent, max_percent = 0.0;
+   double *ppercents = &percent;
+   int i, nr_percent = 1;
const char *color;
struct annotation *notes = symbol__annotation(sym);
s64 offset = dl->offset;
@@ -647,10 +649,27 @@ static int disasm_line__print(struct disasm_line *dl, 
struct symbol *sym, u64 st
 
 next = disasm__get_next_ip_line(&notes->src->source, dl);
 
-   percent = disasm__calc_percent(notes, evsel->idx, offset,
-  next ? next->offset : (s64) len,
-  &path);
-   

[PATCH 11/12] perf annotate browser: Support event group view on TUI

2013-03-04 Thread Namhyung Kim
Dynamically allocate browser_disasm_line according to the number of
group members and save nr_pcnt in the struct.  This way we can
handle multiple events in a general manner.
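
As a sketch of the idea, the snippet below allocates one entry with
room for nr_pcnt percentages and formats them as fixed-width columns,
so the width of the percent area becomes 7 * nr_pcnt as in the diff.
The mock_* names are invented for this example, snprintf() stands in
for the TUI output call, and a C99 flexible array member is used in
place of the percent[1] tail.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct browser_disasm_line. */
struct mock_bdl {
	int nr_pcnt;
	double percent[];	/* one slot per group member */
};

/* Allocate a zeroed entry with room for nr_pcnt percentages. */
static struct mock_bdl *mock_bdl_new(int nr_pcnt)
{
	struct mock_bdl *bdl;

	bdl = calloc(1, sizeof(*bdl) + sizeof(double) * (size_t)nr_pcnt);
	if (bdl)
		bdl->nr_pcnt = nr_pcnt;

	return bdl;
}

/* Width of the percent area: one "%6.2f " column per event. */
static int mock_pcnt_width(const struct mock_bdl *bdl)
{
	return 7 * bdl->nr_pcnt;
}

/*
 * Format every percentage as a "%6.2f " column into buf.
 * Returns the total length of the columns that fit completely.
 */
static int mock_bdl_format(const struct mock_bdl *bdl, char *buf, size_t size)
{
	int i, printed = 0;

	if (size)
		buf[0] = '\0';

	for (i = 0; i < bdl->nr_pcnt; i++) {
		size_t left = size - (size_t)printed;
		int n = snprintf(buf + printed, left, "%6.2f ", bdl->percent[i]);

		if (n < 0 || (size_t)n >= left)
			break;	/* stop on error or truncation */

		printed += n;
	}

	return printed;
}
```

Code that used the constant 7 for the percent column then derives its
offsets from mock_pcnt_width() instead.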

Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/browsers/annotate.c | 67 +--
 1 file changed, 57 insertions(+), 10 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 8b16926dd56e..95d5998fe57e 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -17,6 +17,7 @@ struct browser_disasm_line {
u32 idx;
int idx_asm;
int jump_sources;
+   int nr_pcnt;
double  percent[1];
 };
 
@@ -95,14 +96,24 @@ static void annotate_browser__write(struct ui_browser 
*browser, void *entry, int
 (!current_entry || (browser->use_navkeypressed &&
 !browser->navkeypressed)));
int width = browser->width, printed;
+   int i, pcnt_width = 7 * bdl->nr_pcnt;
+   double percent_max = 0.0;
char bf[256];
 
-   if (dl->offset != -1 && bdl->percent[0] != 0.0) {
-   ui_browser__set_percent_color(browser, bdl->percent[0], 
current_entry);
-   slsmg_printf("%6.2f ", bdl->percent[0]);
+   for (i = 0; i < bdl->nr_pcnt; i++) {
+   if (bdl->percent[i] > percent_max)
+   percent_max = bdl->percent[i];
+   }
+
+   if (dl->offset != -1 && percent_max != 0.0) {
+   for (i = 0; i < bdl->nr_pcnt; i++) {
+   ui_browser__set_percent_color(browser, bdl->percent[i],
+ current_entry);
+   slsmg_printf("%6.2f ", bdl->percent[i]);
+   }
} else {
ui_browser__set_percent_color(browser, 0, current_entry);
-   slsmg_write_nstring(" ", 7);
+   slsmg_write_nstring(" ", pcnt_width);
}
 
SLsmg_write_char(' ');
@@ -112,12 +123,12 @@ static void annotate_browser__write(struct ui_browser 
*browser, void *entry, int
width += 1;
 
if (!*dl->line)
-   slsmg_write_nstring(" ", width - 7);
+   slsmg_write_nstring(" ", width - pcnt_width);
else if (dl->offset == -1) {
printed = scnprintf(bf, sizeof(bf), "%*s  ",
ab->addr_width, " ");
slsmg_write_nstring(bf, printed);
-   slsmg_write_nstring(dl->line, width - printed - 6);
+   slsmg_write_nstring(dl->line, width - printed - pcnt_width + 1);
} else {
u64 addr = dl->offset;
int color = -1;
@@ -176,7 +187,7 @@ static void annotate_browser__write(struct ui_browser 
*browser, void *entry, int
}
 
disasm_line__scnprintf(dl, bf, sizeof(bf), 
!annotate_browser__opts.use_offset);
-   slsmg_write_nstring(bf, width - 10 - printed);
+   slsmg_write_nstring(bf, width - pcnt_width - 3 - printed);
}
 
if (current_entry)
@@ -201,6 +212,7 @@ static void annotate_browser__draw_current_jump(struct 
ui_browser *browser)
unsigned int from, to;
struct map_symbol *ms = ab->b.priv;
struct symbol *sym = ms->sym;
+   u8 pcnt_width = 7;
 
/* PLT symbols contain external offsets */
if (strstr(sym->name, "@plt"))
@@ -224,22 +236,48 @@ static void annotate_browser__draw_current_jump(struct 
ui_browser *browser)
to = (u64)btarget->idx;
}
 
+   pcnt_width *= bcursor->nr_pcnt;
+
ui_browser__set_color(browser, HE_COLORSET_CODE);
-   __ui_browser__line_arrow(browser, 9 + ab->addr_width, from, to);
+   __ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width,
+from, to);
 }
 
 static unsigned int annotate_browser__refresh(struct ui_browser *browser)
 {
+   struct annotate_browser *ab;
+   struct disasm_line *cursor;
+   struct browser_disasm_line *bcursor;
int ret = ui_browser__list_head_refresh(browser);
+   int pcnt_width;
+
+   ab = container_of(browser, struct annotate_browser, b);
+   cursor = ab->offsets[0];
+   bcursor = disasm_line__browser(cursor);
+
+   pcnt_width = 7 * bcursor->nr_pcnt;
 
if (annotate_browser__opts.jump_arrows)
annotate_browser__draw_current_jump(browser);
 
ui_browser__set_color(browser, HE_COLORSET_NORMAL);
-   __ui_browser__vline(browser, 7, 0, browser->height - 1);
+   __ui_browser__vline(browser, pcnt_width, 0, browser->height - 1);
return ret;
 }
 
+static int disasm__cmp(struct browser_disasm_line *a,
+  struct browser_disasm_line *b)
+{
+   int i;
+
+   for (i = 0; i < a->nr_pcnt; i++) {
+   

[PATCH 00/12] perf annotate: Add support for event group view (v2)

2013-03-04 Thread Namhyung Kim
Hi all,

This patchset implements event group view on perf annotate.  It's
basically a rebased version, and the major difference from the prior
version is the GTK annotation browser support.

Here is an example:

 $ perf annotate --group --stdio

  Percent                 |  Source code & Disassembly of libpthread-2.15.so
 ------------------------------------------------
                          :
                          :
                          :
                          :  Disassembly of section .text:
                          :
                          :  00387dc0aa50 <__pthread_mutex_unlock_usercnt>:
  crtstuff.c:0
     8.08    2.40    5.29 :  387dc0aa50:   mov    %rdi,%rdx
     0.00    0.00    0.00 :  387dc0aa53:   mov    0x10(%rdi),%edi
     0.00    0.00    0.00 :  387dc0aa56:   mov    %edi,%eax
  crtstuff.c:0
     0.00    0.80    0.00 :  387dc0aa58:   and    $0x7f,%eax
     3.03    2.40    3.53 :  387dc0aa5b:   test   $0x7c,%dil
     0.00    0.00    0.00 :  387dc0aa5f:   jne    387dc0aaa9 <__pthread_mutex_unlock_use
     0.00    0.00    0.00 :  387dc0aa61:   test   %eax,%eax
     0.00    0.00    0.00 :  387dc0aa63:   jne    387dc0aa85 <__pthread_mutex_unlock_use
     0.00    0.00    0.00 :  387dc0aa65:   and    $0x80,%edi
     0.00    0.00    0.00 :  387dc0aa6b:   test   %esi,%esi
  crtstuff.c:0
     3.03    5.60    7.06 :  387dc0aa6d:   movl   $0x0,0x8(%rdx)
  crtstuff.c:0
     0.00    0.00    0.59 :  387dc0aa74:   je     387dc0aa7a <__pthread_mutex_unlock_use
     0.00    0.00    0.00 :  387dc0aa76:   subl   $0x1,0xc(%rdx)
  crtstuff.c:0
     2.02    5.60    1.18 :  387dc0aa7a:   mov    %edi,%esi
     0.00    0.00    0.00 :  387dc0aa7c:   lock decl (%rdx)
    83.84   83.20   82.35 :  387dc0aa7f:   jne    387dc0aada <_L_unlock_586>
     0.00    0.00    0.00 :  387dc0aa81:   nop
     0.00    0.00    0.00 :  387dc0aa82:   xor    %eax,%eax
     0.00    0.00    0.00 :  387dc0aa84:   retq
 ...


You can access it via the perf/annotate-group-v2 branch of my tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Any comments are welcome, thanks
Namhyung


Namhyung Kim (12):
  perf annotate: Pass evsel instead of evidx on annotation functions
  perf annotate: Add a comment on the symbol__parse_objdump_line()
  perf annotate: Factor out disasm__calc_percent()
  perf annotate: Cleanup disasm__calc_percent()
  perf annotate: Add basic support to event group view
  perf evsel: Introduce perf_evsel__is_group_event() helper
  perf annotate: Factor out struct source_line_percent
  perf annotate: Support event group view for --print-line
  perf annotate browser: Make browser_disasm_line->percent an array
  perf annotate browser: Use disasm__calc_percent()
  perf annotate browser: Support event group view on TUI
  perf annotate/gtk: Support event group view on GTK

 tools/perf/Documentation/perf-annotate.txt |   3 +
 tools/perf/builtin-annotate.c  |  23 ++-
 tools/perf/builtin-report.c|   2 +-
 tools/perf/builtin-top.c   |   2 +-
 tools/perf/ui/browsers/annotate.c  | 139 +--
 tools/perf/ui/browsers/hists.c |   6 +-
 tools/perf/ui/gtk/annotate.c   |  26 ++-
 tools/perf/ui/gtk/hists.c  |   7 +-
 tools/perf/ui/hist.c   |   7 +-
 tools/perf/util/annotate.c | 262 ++---
 tools/perf/util/annotate.h |  49 +++---
 tools/perf/util/evsel.h|  24 +++
 tools/perf/util/hist.h |   5 +-
 13 files changed, 389 insertions(+), 166 deletions(-)

-- 
1.7.11.7



[PATCH 09/12] perf annotate browser: Make browser_disasm_line->percent an array

2013-03-04 Thread Namhyung Kim
Make percent field of struct browser_disasm_line an array and
move it to the last.  This is a preparation of event group view
feature.

Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/browsers/annotate.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index cfae57f90146..62369f0b6608 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -14,10 +14,10 @@
 
 struct browser_disasm_line {
struct rb_node  rb_node;
-   double  percent;
u32 idx;
int idx_asm;
int jump_sources;
+   double  percent[1];
 };
 
 static struct annotate_browser_opt {
@@ -97,9 +97,9 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
int width = browser->width, printed;
char bf[256];
 
-   if (dl->offset != -1 && bdl->percent != 0.0) {
-   ui_browser__set_percent_color(browser, bdl->percent, current_entry);
-   slsmg_printf("%6.2f ", bdl->percent);
+   if (dl->offset != -1 && bdl->percent[0] != 0.0) {
+   ui_browser__set_percent_color(browser, bdl->percent[0], current_entry);
+   slsmg_printf("%6.2f ", bdl->percent[0]);
} else {
ui_browser__set_percent_color(browser, 0, current_entry);
slsmg_write_nstring(" ", 7);
@@ -283,7 +283,7 @@ static void disasm_rb_tree__insert(struct rb_root *root, struct browser_disasm_l
while (*p != NULL) {
parent = *p;
l = rb_entry(parent, struct browser_disasm_line, rb_node);
-   if (bdl->percent < l->percent)
+   if (bdl->percent[0] < l->percent[0])
p = &(*p)->rb_left;
else
p = &(*p)->rb_right;
@@ -345,8 +345,8 @@ static void annotate_browser__calc_percent(struct annotate_browser *browser,
 
list_for_each_entry(pos, &notes->src->source, node) {
struct browser_disasm_line *bpos = disasm_line__browser(pos);
-   bpos->percent = disasm_line__calc_percent(pos, sym, evsel->idx);
-   if (bpos->percent < 0.01) {
+   bpos->percent[0] = disasm_line__calc_percent(pos, sym, evsel->idx);
+   if (bpos->percent[0] < 0.01) {
RB_CLEAR_NODE(&bpos->rb_node);
continue;
}
-- 
1.7.11.7



Re: [PATCH 1/1] kernel/pid_namespace.c: Fixing a lack of cleanup (Probable resources leak).

2013-03-04 Thread Cyrill Gorcunov
On Tue, Mar 05, 2013 at 02:04:45AM -0300, Raphael S Carvalho wrote:
> >
> > Actually, I noticed this problem and I think it is not a bug, since
> > the pid_cache is created for all pid namespaces that have the same
> > level. Even if this pid namespace fails to be created, the pid_cache
> > will not be leaked: other pid namespaces at the same level will use
> > the pid_cache and need not allocate it again. In other words, the
> > pid_cache for each pid namespace level is only created once.
> >
> > I also think this patch adds a bug, because another pid namespace's
> > pid_cachep may point to the same pid_cache, which would be freed at
> > the out_free_cachep label.
> >

Yup, drop this patch.

> 
> Yeah, I found the snippet of code which searches for the pcache with
> the same level.
>     list_for_each_entry(pcache, &pid_caches_lh, list)
>             if (pcache->nr_ids == nr_ids)
>                     goto out;


Re: [PATCH RFC] MIPS: Build uasm-generated code only once to avoid CPU Hotplug problem

2013-03-04 Thread Huacai Chen
I'm sorry, this is the only patch, please ignore [01/14].

On Mon, Mar 4, 2013 at 8:56 PM, Huacai Chen  wrote:
> Currently, clear_page()/copy_page() are generated by Micro-assembler
> dynamically. But they are unavailable until uasm_resolve_relocs() has
> finished because jump labels are illegal before that. Since these
> functions are shared by every CPU, we only call build_clear_page()/
> build_copy_page() on CPU#0 at boot time. Without this patch, programs
> will get random memory corruption (segmentation fault, bus error, etc.)
> while CPU Hotplug (e.g. one CPU is using clear_page() while another is
> generating it in cpu_cache_init()).
>
> For similar reasons we modify build_tlb_refill_handler()'s invocation.
>
> Signed-off-by: Huacai Chen 
> Signed-off-by: Hongbing Hu 
> ---
>  arch/mips/mm/c-octeon.c|6 --
>  arch/mips/mm/c-r3k.c   |6 --
>  arch/mips/mm/c-r4k.c   |6 --
>  arch/mips/mm/c-tx39.c  |6 --
>  arch/mips/mm/tlb-r3k.c |3 ++-
>  arch/mips/mm/tlb-r4k.c |3 ++-
>  arch/mips/mm/tlb-r8k.c |3 ++-
>  arch/mips/sgi-ip27/ip27-init.c |3 ++-
>  8 files changed, 24 insertions(+), 12 deletions(-)
>
> diff --git a/arch/mips/mm/c-octeon.c b/arch/mips/mm/c-octeon.c
> index 6ec04da..181a1bc 100644
> --- a/arch/mips/mm/c-octeon.c
> +++ b/arch/mips/mm/c-octeon.c
> @@ -280,8 +280,10 @@ void __cpuinit octeon_cache_init(void)
>
> __flush_kernel_vmap_range   = octeon_flush_kernel_vmap_range;
>
> -   build_clear_page();
> -   build_copy_page();
> +   if (smp_processor_id() == 0) {
> +   build_clear_page();
> +   build_copy_page();
> +   }
>
> board_cache_error_setup = octeon_cache_error_setup;
>  }
> diff --git a/arch/mips/mm/c-r3k.c b/arch/mips/mm/c-r3k.c
> index 031c4c2..b7b0cfd 100644
> --- a/arch/mips/mm/c-r3k.c
> +++ b/arch/mips/mm/c-r3k.c
> @@ -342,6 +342,8 @@ void __cpuinit r3k_cache_init(void)
> printk("Primary data cache %ldkB, linesize %ld bytes.\n",
> dcache_size >> 10, dcache_lsize);
>
> -   build_clear_page();
> -   build_copy_page();
> +   if (smp_processor_id() == 0) {
> +   build_clear_page();
> +   build_copy_page();
> +   }
>  }
> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
> index 5f9171d..3f671d6 100644
> --- a/arch/mips/mm/c-r4k.c
> +++ b/arch/mips/mm/c-r4k.c
> @@ -1548,8 +1548,10 @@ void __cpuinit r4k_cache_init(void)
> }
>  #endif
>
> -   build_clear_page();
> -   build_copy_page();
> +   if (smp_processor_id() == 0) {
> +   build_clear_page();
> +   build_copy_page();
> +   }
>  #if !defined(CONFIG_MIPS_CMP)
> local_r4k___flush_cache_all(NULL);
>  #endif
> diff --git a/arch/mips/mm/c-tx39.c b/arch/mips/mm/c-tx39.c
> index 87d23ca..1e42e12 100644
> --- a/arch/mips/mm/c-tx39.c
> +++ b/arch/mips/mm/c-tx39.c
> @@ -434,7 +434,9 @@ void __cpuinit tx39_cache_init(void)
> printk("Primary data cache %ldkB, linesize %d bytes\n",
> dcache_size >> 10, current_cpu_data.dcache.linesz);
>
> -   build_clear_page();
> -   build_copy_page();
> +   if (smp_processor_id() == 0) {
> +   build_clear_page();
> +   build_copy_page();
> +   }
> tx39h_flush_icache_all();
>  }
> diff --git a/arch/mips/mm/tlb-r3k.c b/arch/mips/mm/tlb-r3k.c
> index a63d1ed..86b4a79 100644
> --- a/arch/mips/mm/tlb-r3k.c
> +++ b/arch/mips/mm/tlb-r3k.c
> @@ -280,5 +280,6 @@ void __cpuinit tlb_init(void)
>  {
> local_flush_tlb_all();
>
> -   build_tlb_refill_handler();
> +   if (smp_processor_id() == 0)
> +   build_tlb_refill_handler();
>  }
> diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
> index 0113330..db19624 100644
> --- a/arch/mips/mm/tlb-r4k.c
> +++ b/arch/mips/mm/tlb-r4k.c
> @@ -439,5 +439,6 @@ void __cpuinit tlb_init(void)
> printk("Ignoring invalid argument ntlb=%d\n", ntlb);
> }
>
> -   build_tlb_refill_handler();
> +   if (smp_processor_id() == 0)
> +   build_tlb_refill_handler();
>  }
> diff --git a/arch/mips/mm/tlb-r8k.c b/arch/mips/mm/tlb-r8k.c
> index 91c2499..43fe634 100644
> --- a/arch/mips/mm/tlb-r8k.c
> +++ b/arch/mips/mm/tlb-r8k.c
> @@ -244,5 +244,6 @@ void __cpuinit tlb_init(void)
>
> local_flush_tlb_all();
>
> -   build_tlb_refill_handler();
> +   if (smp_processor_id() == 0)
> +   build_tlb_refill_handler();
>  }
> diff --git a/arch/mips/sgi-ip27/ip27-init.c b/arch/mips/sgi-ip27/ip27-init.c
> index 923c080..62c41ab 100644
> --- a/arch/mips/sgi-ip27/ip27-init.c
> +++ b/arch/mips/sgi-ip27/ip27-init.c
> @@ -84,7 +84,8 @@ static void __cpuinit per_hub_init(cnodeid_t cnode)
>
> memcpy((void *)(CKSEG0 + 0x100), _vec2_generic, 0x80);
> memcpy((void *)(CKSEG0 + 0x180), _vec3_generic, 0x80);
> -   

Re: BUG: IPv4: Attempt to release TCP socket in state 1

2013-03-04 Thread dormando


On Mon, 4 Mar 2013, Eric Dumazet wrote:

> On Tue, 2013-03-05 at 11:47 +0800, Cong Wang wrote:
> > (Cc'ing the right netdev mailing list...)
> >
> > On 03/05/2013 08:01 AM, dormando wrote:
> > > Hi!
> > >
> > > I have a (core lockup?) with 3.7.6+ and 3.8.2 which appears to be under
> > > ixgbe. The machine appears to still be up but network stays in a severely
> > > hobbled state. Either lagging or not responding to the network at all.
> > >
> > > On a new box the hang happens within 8-24 hours of giving it production
> > > network traffic. On an older machine (6 cores instead of 8, etc) it can
> > > run for a week or more before hanging.
> > >
> > > The hang from 3.7 might be slightly different than 3.8. They seem to be
> > > mostly the same aside from 3.8 hanging in the GRO path. Don't see anything
> > > obvious in 3.9-rc1 that would fix it, and haven't tried 3.9-rc1.
> > >
> > > I've not yet figured out how to reproduce outside of production (as
> > > always, sigh). This doesn't seem to happen with 3.6.6, but we have
> > > different and less frequent kernel panics there.
> > >
>
> Dormando, do you use any kind of special setup, external modules,
> or netfilter? (iptables-save output would help)
>
> Is it a pristine kernel, or a modified one ?
>

(Sigh. sorry for the misfire, thanks for fixing cc).

No 3rd party modules. There's a tiny patch for controlling initcwnd from
userspace and another one for the extra_free_kbytes tunable that I brought
up in another thread. We've had the initcwnd patch in for a long time
without trouble. The extra_free_kbytes tunable isn't even being used yet,
so all that's doing is adding a 0 somewhere.

Only two iptables rules loaded: global NOTRACK rules for PREROUTING/OUTPUT
in raw.

Kernel's as close to pristine as I can make it. We had the 10g patch in
but I've dropped it.


Re: [ 052/153] idr: idr_for_each_entry() macro

2013-03-04 Thread Ben Hutchings
On Mon, 2013-03-04 at 22:05 +0100, Philipp Reisner wrote:
> Sure, here it is:
> --
> 
> From: Philipp Reisner 
> 
> commit 9749f30f1a387070e6e8351f35aeb829eacc3ab6 upstream.
> 
> Inspired by the list_for_each_entry() macro
> 
> Signed-off-by: Ben Hutchings 
> Signed-off-by: Philipp Reisner 

Thanks.

Ben.

> ---
>  include/linux/idr.h |   11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/linux/idr.h b/include/linux/idr.h
> index 255491c..52a9da2 100644
> --- a/include/linux/idr.h
> +++ b/include/linux/idr.h
> @@ -152,4 +152,15 @@ void ida_simple_remove(struct ida *ida, unsigned int id);
>  
>  void __init idr_init_cache(void);
>  
> +/**
> + * idr_for_each_entry - iterate over an idr's elements of a given type
> + * @idp: idr handle
> + * @entry:   the type * to use as cursor
> + * @id:  id entry's key
> + */
> +#define idr_for_each_entry(idp, entry, id) \
> +   for (id = 0, entry = (typeof(entry))idr_get_next((idp), &(id)); \
> +entry != NULL; \
> +++id, entry = (typeof(entry))idr_get_next((idp), &(id)))
> +
>  #endif /* __IDR_H__ */
> 
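
For readers unfamiliar with the macro quoted above, here is a minimal userspace sketch of how an iterator of this shape is used. It is not the kernel implementation: toy_idr, toy_idr_get_next() and toy_idr_for_each_entry() are hypothetical stand-ins (a fixed array instead of a real idr), and __typeof__ is used so the sketch builds with GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IDR_SIZE 8

/* Hypothetical array-backed stand-in for struct idr. */
struct toy_idr {
	void *slot[TOY_IDR_SIZE];
};

/*
 * Return the first non-NULL entry with an id >= *nextid and store its id
 * in *nextid, or return NULL when no such entry exists.
 */
static void *toy_idr_get_next(struct toy_idr *idp, int *nextid)
{
	int id;

	assert(idp != NULL && nextid != NULL);
	for (id = *nextid; id >= 0 && id < TOY_IDR_SIZE; id++) {
		if (idp->slot[id] != NULL) {
			*nextid = id;
			return idp->slot[id];
		}
	}
	return NULL;
}

/* Same shape as the macro in the patch, on top of the toy lookup. */
#define toy_idr_for_each_entry(idp, entry, id) \
	for (id = 0, entry = (__typeof__(entry))toy_idr_get_next((idp), &(id)); \
	     entry != NULL; \
	     ++id, entry = (__typeof__(entry))toy_idr_get_next((idp), &(id)))

struct toy_item {
	int weight;
};

/* Walk every populated slot and add up the weights. */
static int toy_sum_weights(struct toy_idr *idp)
{
	struct toy_item *item;
	int id, sum = 0;

	toy_idr_for_each_entry(idp, item, id)
		sum += item->weight;
	return sum;
}
```

The cursor variable carries its own type, so the cast from void * happens inside the macro and the loop body can use the entry directly.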
> 

-- 
Ben Hutchings
Always try to do things in chronological order;
it's less confusing that way.




Re: [PATCH 2/2] dmi_scan: Refactor dmi_scan_machine(), {smbios,dmi}_present()

2013-03-04 Thread Ben Hutchings
On Mon, 2013-03-04 at 15:09 -0500, tmhik...@gmail.com wrote:
>   Forgive me for bothering you about this again, I know that -stable
> has a policy of only accepting patches once Linus has accepted them
> upstream, but I'm curious what's going on here. Is there something I could
> help with to move this along? I haven't seen any discussion about it since
> Feb 16.
> 
>   Of course this assumes that I have not missed the patch going in...
> Tim McGrath

Andrew, did you see these dmi_scan patches?  Should I repost them?

Ben.

-- 
Ben Hutchings
Always try to do things in chronological order;
it's less confusing that way.




Re: [PATCH linux-next] cpufreq: conservative: Fix sampling_down_factor functionality

2013-03-04 Thread Stratos Karafotis
Hi Viresh,

On 03/05/2013 02:23 AM, Viresh Kumar wrote:
> Interesting. Because it was removed earlier and nobody complained :)
> 
> I got following from Documentation:
> 
> sampling_down_factor: this parameter controls the rate at which the
> kernel makes a decision on when to decrease the frequency while running
> at top speed. When set to 1 (the default) decisions to reevaluate load
> are made at the same interval regardless of current clock speed. But
> when set to greater than 1 (e.g. 100) it acts as a multiplier for the
> scheduling interval for reevaluating load when the CPU is at its top
> speed due to high load. This improves performance by reducing the overhead
> of load evaluation and helping the CPU stay at its top speed when truly
> busy, rather than shifting back and forth in speed. This tunable has no
> effect on behavior at lower speeds/lower CPU loads.
> 
> And I believe we are supposed to check whether we are at the top speed
> or not. On top of that, I believe the code should work like this:
> 
> When setting the speed to top speed, set the timer to
> delay * sampling_down_factor, so that we actually don't reevaluate the
> load. What do you say?
> 

I had the same thoughts, but I saw the comments in the code:

/*
 * Every sampling_rate, we check, if current idle time is less than 20%
 * (default), then we try to increase frequency. Every sampling_rate *
 * sampling_down_factor, we check, if current idle time is more than 80%, then
 * we try to decrease frequency
 *

Also, if you check the code before commit
8e677ce83bf41ba9c74e5b6d9ee60b07d4e5ed93, you can see that
sampling_down_factor worked this way. So I decided to keep the original
functionality (down_skip was also already there, unused).
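
To make the behaviour described in that comment concrete, here is a minimal userspace sketch, not the governor code itself. The toy_gov structure, the toy_check_load() helper and the threshold values are assumptions for illustration; only the role of down_skip and sampling_down_factor follows the comment: an increase is considered on every sample, a decrease only on every sampling_down_factor-th sample.

```c
#include <assert.h>

/* Hypothetical tunables and state; not the kernel's structures. */
struct toy_gov {
	unsigned int sampling_down_factor; /* must be >= 1 */
	unsigned int down_skip;            /* samples since the last decrease check */
	unsigned int up_threshold;         /* load percentage, e.g. 80 */
	unsigned int down_threshold;       /* load percentage, e.g. 20 */
};

enum toy_action { TOY_KEEP, TOY_UP, TOY_DOWN };

/* Called once per sampling_rate with the measured load in percent. */
static enum toy_action toy_check_load(struct toy_gov *g, unsigned int load)
{
	assert(g->sampling_down_factor >= 1);

	/* Increases are evaluated on every sample. */
	if (load > g->up_threshold) {
		g->down_skip = 0;
		return TOY_UP;
	}

	/* Decreases are evaluated only every sampling_down_factor samples. */
	if (++g->down_skip < g->sampling_down_factor)
		return TOY_KEEP;
	g->down_skip = 0;

	if (load < g->down_threshold)
		return TOY_DOWN;
	return TOY_KEEP;
}
```

With sampling_down_factor set to 1 the skip test never holds, so increases and decreases are both evaluated on every sample.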

Regards,
Stratos



Re: [PATCH 1/1 v2] pwm_bl: Add support for backlight enable GPIO

2013-03-04 Thread Alex Courbot

On 03/05/2013 01:43 PM, Andrew Chew wrote:

The backlight enable GPIO is specified in the device tree node for
backlight.

Signed-off-by: Andrew Chew 
---
I decided to go ahead with disabling/enabling the backlight via GPIO as
needed.  Note that I named the new functions pwm_backlight_enable() and
pwm_backlight_disable() (instead of something more gpio-specific) because
I thought it would be convenient to have a generic hook for when someone
wants to add yet more stuff to be done on enable/disable.

I tested this by going into /sys/class/backlight/backlight.n and manually
adjusting the brightness, and checked the gpio state to see that it had
the appropriate value.

  .../bindings/video/backlight/pwm-backlight.txt |2 +
  drivers/video/backlight/pwm_bl.c   |   50 ++--
  include/linux/pwm_backlight.h  |2 +
  3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
index 1e4fc72..1ed4f0f 100644
--- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
+++ b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
@@ -14,6 +14,8 @@ Required properties:
  Optional properties:
- pwm-names: a list of names for the PWM devices specified in the
 "pwms" property (see PWM binding[0])
+  - enable-gpio: a GPIO that needs to be used to enable the backlight
+  - enable-gpio-active-high: polarity of GPIO is active high (default is low)

  [0]: Documentation/devicetree/bindings/pwm/pwm.txt

diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c
index 069983c..7dd426e 100644
--- a/drivers/video/backlight/pwm_bl.c
+++ b/drivers/video/backlight/pwm_bl.c
@@ -20,10 +20,13 @@
  #include 
  #include 
  #include 
+#include 

  struct pwm_bl_data {
struct pwm_device   *pwm;
struct device   *dev;
+   int enable_gpio;
+   unsigned intenable_gpio_flags;
unsigned intperiod;
unsigned intlth_brightness;
unsigned int*levels;
@@ -35,6 +38,20 @@ struct pwm_bl_data {
void(*exit)(struct device *);
  };

+static void pwm_backlight_enable(struct pwm_bl_data *pb)
+{
+   if (gpio_is_valid(pb->enable_gpio))
+   gpio_set_value(pb->enable_gpio,
+  pb->enable_gpio_flags & GPIOF_INIT_HIGH ? 1 : 0);
+}
+
+static void pwm_backlight_disable(struct pwm_bl_data *pb)
+{
+   if (gpio_is_valid(pb->enable_gpio))
+   gpio_set_value(pb->enable_gpio,
+  pb->enable_gpio_flags & GPIOF_INIT_HIGH ? 0 : 1);
+}
+
  static int pwm_backlight_update_status(struct backlight_device *bl)
  {
struct pwm_bl_data *pb = dev_get_drvdata(&bl->dev);
@@ -53,6 +70,7 @@ static int pwm_backlight_update_status(struct backlight_device *bl)
if (brightness == 0) {
pwm_config(pb->pwm, 0, pb->period);
pwm_disable(pb->pwm);
+   pwm_backlight_disable(pb);


Maybe move the call to pwm_backlight_disable() before pwm_disable() to 
have a correctly nested power sequence?


Actually, every call to pwm_enable()/pwm_disable() sits next to a call to
pwm_backlight_enable()/pwm_backlight_disable(), so you might as well call
pwm_enable()/pwm_disable() from within pwm_backlight_enable()/
pwm_backlight_disable() to make the code more bullet-proof.
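
To illustrate the nesting suggested here, a minimal userspace sketch follows. It is not driver code: the toy_* names and the log buffer are made up, and the steps only record their names. The point is that both steps live in one helper and that disable runs them in the reverse order of enable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Records the order in which the (pretend) hardware steps run. */
static char toy_log[128];

static void toy_step(const char *name)
{
	size_t used = strlen(toy_log);

	assert(used < sizeof(toy_log));
	/* Bounded append; silently truncates once the log is full. */
	snprintf(toy_log + used, sizeof(toy_log) - used, "%s;", name);
}

/* Power up: start the PWM first, then assert the enable GPIO. */
static void toy_backlight_enable(void)
{
	toy_step("pwm_on");
	toy_step("gpio_on");
}

/* Power down: exact reverse, GPIO first and PWM last. */
static void toy_backlight_disable(void)
{
	toy_step("gpio_off");
	toy_step("pwm_off");
}
```

Callers then only ever call the two helpers, so a PWM call can no longer be added without its matching GPIO step.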



} else {
int duty_cycle;

@@ -67,6 +85,7 @@ static int pwm_backlight_update_status(struct backlight_device *bl)
 (duty_cycle * (pb->period - pb->lth_brightness) / max);
pwm_config(pb->pwm, duty_cycle, pb->period);
pwm_enable(pb->pwm);
+   pwm_backlight_enable(pb);
}

if (pb->notify_after)
@@ -146,10 +165,15 @@ static int pwm_backlight_parse_dt(struct device *dev,
}

/*
-* TODO: Most users of this driver use a number of GPIOs to control
-*   backlight power. Support for specifying these needs to be
-*   added.
+* If "enable-gpio" is present, use that GPIO to enable the backlight.
+* The presence (or not) of "enable-gpio-active-high" will determine
+* the value of the GPIO.
 */
+   data->enable_gpio = of_get_named_gpio(node, "enable-gpio", 0);
+   if (of_property_read_bool(node, "enable-gpio-active-high"))
+   data->enable_gpio_flags = GPIOF_OUT_INIT_HIGH;
+   else
+   data->enable_gpio_flags = GPIOF_OUT_INIT_LOW;

return 0;
  }
@@ -207,12 +231,23 @@ static int pwm_backlight_probe(struct platform_device *pdev)
} else
max = data->max_brightness;

+   pb->enable_gpio = data->enable_gpio;
+   pb->enable_gpio_flags = data->enable_gpio_flags;
pb->notify = data->notify;

Re: [PATCH 1/1] pwm_bl: Add support for backlight enable GPIO

2013-03-04 Thread Alex Courbot

On 03/05/2013 01:59 PM, Alex Courbot wrote:

Btw, you also want to check if the enable-gpio property exists first,
because otherwise probe() will fail if no GPIO is specified.


That's actually not true - I overlooked the fact that probe() checks for 
the GPIO validity before requesting it. My bad.


Alex.




Re: BUG: IPv4: Attempt to release TCP socket in state 1

2013-03-04 Thread Eric Dumazet
On Tue, 2013-03-05 at 11:47 +0800, Cong Wang wrote:
> (Cc'ing the right netdev mailing list...)
> 
> On 03/05/2013 08:01 AM, dormando wrote:
> > Hi!
> >
> > I have a (core lockup?) with 3.7.6+ and 3.8.2 which appears to be under
> > ixgbe. The machine appears to still be up but network stays in a severely
> > hobbled state. Either lagging or not responding to the network at all.
> >
> > On a new box the hang happens within 8-24 hours of giving it production
> > network traffic. On an older machine (6 cores instead of 8, etc) it can
> > run for a week or more before hanging.
> >
> > The hang from 3.7 might be slightly different than 3.8. They seem to be
> > mostly the same aside from 3.8 hanging in the GRO path. Don't see anything
> > obvious in 3.9-rc1 that would fix it, and haven't tried 3.9-rc1.
> >
> > I've not yet figured out how to reproduce outside of production (as
> > always, sigh). This doesn't seem to happen with 3.6.6, but we have
> > different and less frequent kernel panics there.
> >

Dormando, do you use any kind of special setup, external modules,
or netfilter? (iptables-save output would help)

Is it a pristine kernel, or a modified one ?





Re: [PATCH 1/1] kernel/pid_namespace.c: Fixing a lack of cleanup (Probable resources leak).

2013-03-04 Thread Raphael S Carvalho
On Tue, Mar 5, 2013 at 12:51 AM, Gao feng  wrote:
> On 2013/03/05 11:26, Eric W. Biederman wrote:
>> From: Raphael S.Carvalho 
>>
>> Starting point: create_pid_namespace()
>>
>> Suppose create_pid_cachep() was executed successfully, thus:
>> pcache was allocated by kmalloc().
>> cachep received a cache created by kmem_cache_create().
>> and pcache->list was added to the list pid_caches_lh.
>>
>> So what would happen if proc_alloc_inum() returns an error?
>> The resources allocated by create_pid_namespace() would be deallocated!
>> How about those resources allocated by create_pid_cachep()?
>> By knowing that, I created this patch in order to fix that!
>>
>> Signed-off-by: Raphael S.Carvalho 
>> ---
>
> Actually, I noticed this problem and I think it is not a bug, since
> the pid_cache is created for all pid namespaces that have the same
> level. Even if this pid namespace fails to be created, the pid_cache
> will not be leaked: other pid namespaces at the same level will use
> the pid_cache and need not allocate it again. In other words, the
> pid_cache for each pid namespace level is only created once.
>
> I also think this patch adds a bug, because another pid namespace's
> pid_cachep may point to the same pid_cache, which would be freed at
> the out_free_cachep label.
>

Yeah, I found the snippet of code which searches for the pcache with
the same level.
	list_for_each_entry(pcache, &pid_caches_lh, list)
		if (pcache->nr_ids == nr_ids)
			goto out;
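
A minimal userspace sketch of that lookup-or-create pattern, to show why the cache is shared rather than leaked. The toy_* names and the singly linked list are assumptions for illustration and do not mirror the kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pid_cache. */
struct toy_pid_cache {
	int nr_ids;                  /* namespace level + 1 */
	struct toy_pid_cache *next;  /* simple singly linked list */
};

static struct toy_pid_cache *toy_caches; /* plays the role of pid_caches_lh */
static int toy_caches_created;

/* Return the cache for nr_ids, creating it only on the first request. */
static struct toy_pid_cache *toy_get_cache(int nr_ids)
{
	struct toy_pid_cache *pc;

	assert(nr_ids > 0);

	/* Same level already seen: reuse the existing cache. */
	for (pc = toy_caches; pc != NULL; pc = pc->next)
		if (pc->nr_ids == nr_ids)
			return pc;

	pc = malloc(sizeof(*pc));
	if (pc == NULL)
		return NULL;
	pc->nr_ids = nr_ids;
	pc->next = toy_caches;
	toy_caches = pc;
	toy_caches_created++;
	return pc;
}
```

Because the cache stays on the global list, a namespace that fails later in its setup leaves the cache behind for the next namespace at that level, and freeing it on that error path would pull it out from under any other namespace already using it.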

Regards,
Raphael S.Carvalho


Re: [PATCH 1/1] pwm_bl: Add support for backlight enable GPIO

2013-03-04 Thread Alex Courbot

On 03/05/2013 01:48 PM, Andrew Chew wrote:

I sent out a new patch that enables/disables the backlight enable gpio.


On 03/05/2013 01:00 PM, Andrew Chew wrote:

I did come to the same conclusion regarding the platform data breakage.
I'm expecting that the use of platform data will go away, at least on
ARM, since we are all aggressively moving what used to be in platform
data into the device tree.  Do other platforms use this driver?


I can see at least 29 users of platform_pwm_backlight_data, all ARM with the
exception of one unicore32. I guess at least for the foreseeable future
platform data will remain.


I'm not sure how to solve this, then.  Any suggestions?


In one of my (many) attempts to add power sequencing to pwm-backlight, I 
just added a boolean to the platform data that must be explicitly set in 
order to enable control by GPIO. I.e.


	bool		use_enable_gpio;
	int		enable_gpio;
	unsigned int	enable_gpio_flags;

enable_gpio and enable_gpio_flags would then only be considered if
use_enable_gpio is true. Granted, it's not the best solution here, but
that's the only way to handle this correctly with integer GPIOs, and it
does not pollute the DT anyway (use_enable_gpio will only be set by
pwm_backlight_parse_dt() if of_get_named_gpio() returned a valid GPIO).
Btw, you also want to check if the enable-gpio property exists first,
because otherwise probe() will fail if no GPIO is specified.
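
A minimal userspace sketch of the opt-in flag, under the assumption that the concern is existing board files: they zero-initialize the platform data, and 0 is itself a valid integer GPIO number. The toy_* names and the TOY_ACTIVE_HIGH flag are hypothetical, not the driver's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ACTIVE_HIGH 0x1u /* hypothetical polarity flag */

/* Hypothetical subset of the platform data discussed above. */
struct toy_pwm_bl_pdata {
	bool use_enable_gpio;           /* must be set explicitly to opt in */
	int enable_gpio;
	unsigned int enable_gpio_flags;
};

/*
 * Return the level to drive on the enable GPIO for the requested state,
 * or -1 when GPIO control was not requested at all.
 */
static int toy_enable_level(const struct toy_pwm_bl_pdata *pd, bool on)
{
	bool active_high;

	assert(pd != NULL);
	if (!pd->use_enable_gpio)
		return -1;

	active_high = (pd->enable_gpio_flags & TOY_ACTIVE_HIGH) != 0;
	return on == active_high ? 1 : 0;
}
```

A zero-initialized structure therefore never touches GPIO 0, while a board that sets use_enable_gpio gets the polarity it asked for.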



Yes, actually I am doing the GPIO rework. If you are not in too much of a
hurry you might want to wait for it to happen (should not be too long now
that the core has been reworked). At the same time, GPIO descriptors will
also enable the power sequences, so if you wait even longer (or help me
with it), this patch might not even be needed at all. Of course if you
want to support this *now*, this is still the shortest path.


Sadly, I do need this now, and I'd rather do it as cleanly as possible rather
than maintaining a hack.  The project I am working on is very pedantic.


Well, if you can get this right and make the GPIO optional, I think this 
is a reasonable feature to have in pwm-backlight, until a more generic 
powerseq-backlight driver takes over. ;)



To answer your last question, yes, this single patch does allow me to
enable the backlight on some boards (in particular, the one I'm working
on).


Cool - may I ask which one? All the NV boards I tried so far required more
complex sequences for their panels.


This is for t114-dalmore.  There may be other gpios that are needed that I'm
not aware of off the top of my head.  For the backlight itself, this seems to
be the only one.


I don't know the details of Dalmore but would think there must also be 
at least one regulator involved. If it is set to be always on in the DT, 
then your solution of using an enable GPIO should work then, even if not 
necessarily optimal wrt power usage.


Alex.



[PATCH 6/8] watchdog: w83627hf: Add support for additional chips

2013-03-04 Thread Guenter Roeck
W83667HG-B, NCT6775, NCT6776 have the same watchdog registers as the
other supported chips but require a slightly different initialization.

Signed-off-by: Guenter Roeck 
---
 drivers/watchdog/Kconfig|   18 ++
 drivers/watchdog/w83627hf_wdt.c |   29 -
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index dd462af..346bc70 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -854,13 +854,23 @@ config VIA_WDT
Most people will say N.
 
 config W83627HF_WDT
-   tristate "W83627HF/W83627DHG Watchdog Timer"
+   tristate "Watchdog timer for W83627HF/W83627DHG and compatibles"
depends on X86
select WATCHDOG_CORE
---help---
- This is the driver for the hardware watchdog on the W83627HF chipset
- as used in Advantech PC-9578 and Tyan S2721-533 motherboards
- (and likely others). The driver also supports the W83627DHG chip.
+ This is the driver for the hardware watchdog on the following
+ Super I/O chips.
+   W83627DHG
+   W83627HF
+   W83627EHF
+   W83627THF
+   W83627UHG
+   W83637HF
+   W83667HG-B
+   W83687THF
+   NCT6775
+   NCT6776
+
  This watchdog simply watches your kernel to make sure it doesn't
  freeze, and if it does, it reboots your computer after a certain
  amount of time.
diff --git a/drivers/watchdog/w83627hf_wdt.c b/drivers/watchdog/w83627hf_wdt.c
index 3a2ba30..b23f296 100644
--- a/drivers/watchdog/w83627hf_wdt.c
+++ b/drivers/watchdog/w83627hf_wdt.c
@@ -47,7 +47,7 @@
 static int wdt_io;
 
 enum chips { w83627hf, w83637hf, w83627thf, w83687thf, w83627ehf, w83627dhg,
-w83627uhg };
+w83627uhg, w83667hg_b, nct6775, nct6776 };
 
 /*
  * Kernel methods.
@@ -67,6 +67,9 @@ enum chips { w83627hf, w83637hf, w83627thf, w83687thf, w83627ehf, w83627dhg,
 #define W83627EHF_ID   0x88
 #define W83627DHG_ID   0xa0
 #define W83627UHG_ID   0xa2
+#define W83667HG_B_ID  0xb3
+#define NCT6775_ID 0xb4
+#define NCT6776_ID 0xc3
 
 static void superio_outb(int reg, int val)
 {
@@ -130,6 +133,18 @@ static void w83627hf_init(enum chips chip)
case w83637hf:
case w83687thf:
break;
+   case w83667hg_b:
+   case nct6775:
+   case nct6776:
+   /*
+* These chips support more than one WDTO# output pin.
+* Don't touch it, and hope the BIOS does the right thing.
+*/
+   t = superio_inb(0xF5);
+   t |= 0x02;  /* enable the WDTO# output low pulse
+* to the KBRST# pin */
+   superio_outb(0xF5, t);
+   break;
default:
break;
}
@@ -268,6 +283,15 @@ static int wdt_find(int addr)
case W83627UHG_ID:
ret = w83627uhg;
break;
+   case W83667HG_B_ID:
+   ret = w83667hg_b;
+   break;
+   case NCT6775_ID:
+   ret = nct6775;
+   break;
+   case NCT6776_ID:
+   ret = nct6776;
+   break;
case 0xff:
break;
default:
@@ -291,6 +315,9 @@ static int __init wdt_init(void)
"W83627EHF",
"W83627DHG",
"W83627UHG",
+   "W83667HG-B",
+   "NCT6775",
+   "NCT6776",
};
 
wdt_io = 0x2e;
-- 
1.7.9.7



[PATCH 1/8] watchdog: w83627hf: Convert to watchdog infrastructure

2013-03-04 Thread Guenter Roeck
Signed-off-by: Guenter Roeck 
---
 drivers/watchdog/Kconfig|1 +
 drivers/watchdog/w83627hf_wdt.c |  234 ---
 2 files changed, 47 insertions(+), 188 deletions(-)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 9fcc70c..dd462af 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -856,6 +856,7 @@ config VIA_WDT
 config W83627HF_WDT
tristate "W83627HF/W83627DHG Watchdog Timer"
depends on X86
+   select WATCHDOG_CORE
---help---
  This is the driver for the hardware watchdog on the W83627HF chipset
  as used in Advantech PC-9578 and Tyan S2721-533 motherboards
diff --git a/drivers/watchdog/w83627hf_wdt.c b/drivers/watchdog/w83627hf_wdt.c
index 92f1326..cd82a47 100644
--- a/drivers/watchdog/w83627hf_wdt.c
+++ b/drivers/watchdog/w83627hf_wdt.c
@@ -1,6 +1,9 @@
 /*
  * w83627hf/thf WDT driver
  *
+ * (c) Copyright 2013 Guenter Roeck
+ * converted to watchdog infrastructure
+ *
  * (c) Copyright 2007 Vlad Drukker 
  * added support for W83627THF.
  *
@@ -31,42 +34,21 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
-#include 
-
 
 #define WATCHDOG_NAME "w83627hf/thf/hg/dhg WDT"
 #define WATCHDOG_TIMEOUT 60/* 60 sec default timeout */
 
-static unsigned long wdt_is_open;
-static char expect_close;
-static DEFINE_SPINLOCK(io_lock);
-
 /* You must set this - there is no sane way to probe for this board. */
 static int wdt_io = 0x2E;
 module_param(wdt_io, int, 0);
 MODULE_PARM_DESC(wdt_io, "w83627hf/thf WDT io port (default 0x2E)");
 
-static int timeout = WATCHDOG_TIMEOUT; /* in seconds */
-module_param(timeout, int, 0);
-MODULE_PARM_DESC(timeout,
-   "Watchdog timeout in seconds. 1 <= timeout <= 255, default="
-   __MODULE_STRING(WATCHDOG_TIMEOUT) ".");
-
-static bool nowayout = WATCHDOG_NOWAYOUT;
-module_param(nowayout, bool, 0);
-MODULE_PARM_DESC(nowayout,
-   "Watchdog cannot be stopped once started (default="
-   __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
-
 /*
  * Kernel methods.
  */
@@ -120,8 +102,8 @@ static void w83627hf_init(void)
t = inb_p(WDT_EFDR);  /* read CRF6 */
if (t != 0) {
pr_info("Watchdog already running. Resetting timeout to %d sec\n",
-   timeout);
-   outb_p(timeout, WDT_EFDR);/* Write back to CRF6 */
+   WATCHDOG_TIMEOUT);
+   outb_p(WATCHDOG_TIMEOUT, WDT_EFDR);/* Write back to CRF6 */
}
 
outb_p(0xF5, WDT_EFER); /* Select CRF5 */
@@ -141,171 +123,53 @@ static void w83627hf_init(void)
w83627hf_unselect_wd_register();
 }
 
-static void wdt_set_time(int timeout)
+static int wdt_set_time(unsigned int timeout)
 {
-   spin_lock(&io_lock);
-
w83627hf_select_wd_register();
-
outb_p(0xF6, WDT_EFER);/* Select CRF6 */
outb_p(timeout, WDT_EFDR); /* Write Timeout counter to CRF6 */
-
w83627hf_unselect_wd_register();
-
-   spin_unlock(&io_lock);
+   return 0;
 }
 
-static int wdt_ping(void)
+static int wdt_start(struct watchdog_device *wdog)
 {
-   wdt_set_time(timeout);
-   return 0;
+   return wdt_set_time(wdog->timeout);
 }
 
-static int wdt_disable(void)
+static int wdt_stop(struct watchdog_device *wdog)
 {
-   wdt_set_time(0);
-   return 0;
+   return wdt_set_time(0);
 }
 
-static int wdt_set_heartbeat(int t)
+static int wdt_set_timeout(struct watchdog_device *wdog, unsigned int timeout)
 {
-   if (t < 1 || t > 255)
-   return -EINVAL;
-   timeout = t;
+   wdt_set_time(timeout);
+   wdog->timeout = timeout;
+
return 0;
 }
 
-static int wdt_get_time(void)
+static unsigned int wdt_get_time(struct watchdog_device *wdog)
 {
-   int timeleft;
-
-   spin_lock(&io_lock);
+   unsigned int timeleft;
 
w83627hf_select_wd_register();
-
-   outb_p(0xF6, WDT_EFER);/* Select CRF6 */
-   timeleft = inb_p(WDT_EFDR); /* Read Timeout counter to CRF6 */
-
+   outb_p(0xF6, WDT_EFER);
+   timeleft = inb(WDT_EFDR);
w83627hf_unselect_wd_register();
 
-   spin_unlock(&io_lock);
-
return timeleft;
 }
 
-static ssize_t wdt_write(struct file *file, const char __user *buf,
-   size_t count, loff_t *ppos)
-{
-   if (count) {
-   if (!nowayout) {
-   size_t i;
-
-   expect_close = 0;
-
-   for (i = 0; i != count; i++) {
-   char c;
-   if (get_user(c, buf + i))
-   return -EFAULT;
-   if (c == 'V')
-   expect_close = 42;
-  

[PATCH 7/8] watchdog: w83627hf: Add support for W83697HF and W83697UG

2013-03-04 Thread Guenter Roeck
The major difference is that both chips use different watchdog control
and counter registers than the chips already supported.

Signed-off-by: Guenter Roeck 
---
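Reader's note: a minimal userspace sketch of the per-chip register
selection described above. struct wdt_regs and wdt_regs_for_chip() are
hypothetical names that do not appear in the patch (which uses the two
static ints cr_wdt_timeout and cr_wdt_control), and mapping both W83697
variants to the 0xf4/0xf3 pair is an assumption drawn from the commit
message.

```c
/* Chip list as extended by this patch. */
enum chips { w83627hf, w83697hf, w83697ug, w83637hf, w83627thf, w83687thf,
	     w83627ehf, w83627dhg, w83627uhg, w83667hg_b, nct6775, nct6776 };

/* Register numbers as defined by this patch. */
#define W83627HF_WDT_TIMEOUT	0xf6
#define W83697HF_WDT_TIMEOUT	0xf4
#define W83627HF_WDT_CONTROL	0xf5
#define W83697HF_WDT_CONTROL	0xf3

/* Hypothetical container for the two per-chip register numbers. */
struct wdt_regs {
	int timeout;
	int control;
};

/* Hypothetical helper: pick the register pair for the detected chip. */
static struct wdt_regs wdt_regs_for_chip(enum chips chip)
{
	/* Default: the W83627HF-style pair used by most supported chips. */
	struct wdt_regs r = { W83627HF_WDT_TIMEOUT, W83627HF_WDT_CONTROL };

	/* Assumption: both W83697 variants share the 0xf4/0xf3 pair. */
	if (chip == w83697hf || chip == w83697ug) {
		r.timeout = W83697HF_WDT_TIMEOUT;
		r.control = W83697HF_WDT_CONTROL;
	}
	return r;
}
```

Resolving the pair once at probe time keeps wdt_set_time() and friends
free of per-chip conditionals.
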
 drivers/watchdog/Kconfig|2 ++
 drivers/watchdog/w83627hf_wdt.c |   60 +++
 2 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 346bc70..acdd347 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -868,6 +868,8 @@ config W83627HF_WDT
W83637HF
W83667HG-B
W83687THF
+   W83697HF
+   W83697UG
NCT6775
NCT6776
 
diff --git a/drivers/watchdog/w83627hf_wdt.c b/drivers/watchdog/w83627hf_wdt.c
index b23f296..2acac16 100644
--- a/drivers/watchdog/w83627hf_wdt.c
+++ b/drivers/watchdog/w83627hf_wdt.c
@@ -45,9 +45,11 @@
 #define WATCHDOG_TIMEOUT 60/* 60 sec default timeout */
 
 static int wdt_io;
+static int cr_wdt_timeout; /* WDT timeout register */
+static int cr_wdt_control; /* WDT control register */
 
-enum chips { w83627hf, w83637hf, w83627thf, w83687thf, w83627ehf, w83627dhg,
-w83627uhg, w83667hg_b, nct6775, nct6776 };
+enum chips { w83627hf, w83697hf, w83697ug, w83637hf, w83627thf, w83687thf,
+w83627ehf, w83627dhg, w83627uhg, w83667hg_b, nct6775, nct6776 };
 
 /*
  * Kernel methods.
@@ -61,6 +63,8 @@ enum chips { w83627hf, w83637hf, w83627thf, w83687thf, w83627ehf, w83627dhg,
 #define W83627HF_LD_WDT0x08
 
 #define W83627HF_ID0x52
+#define W83697HF_ID0x60
+#define W83697UG_ID0x68
 #define W83637HF_ID0x70
 #define W83627THF_ID   0x82
 #define W83687THF_ID   0x85
@@ -71,6 +75,12 @@ enum chips { w83627hf, w83637hf, w83627thf, w83687thf, w83627ehf, w83627dhg,
 #define NCT6775_ID 0xb4
 #define NCT6776_ID 0xc3
 
+#define W83627HF_WDT_TIMEOUT   0xf6
+#define W83697HF_WDT_TIMEOUT   0xf4
+
+#define W83627HF_WDT_CONTROL   0xf5
+#define W83697HF_WDT_CONTROL   0xf3
+
 static void superio_outb(int reg, int val)
 {
outb(reg, WDT_EFER);
@@ -116,6 +126,17 @@ static void w83627hf_init(enum chips chip)
t = superio_inb(0x2B) & ~0x10;
superio_outb(0x2B, t); /* set GPIO24 to WDT0 */
break;
+   case w83697hf:
+   /* Set pin 119 to WDTO# mode (= CR29, WDT0) */
+   t = superio_inb(0x29) & ~0x60;
+   t |= 0x20;
+   superio_outb(0x29, t);
+   break;
+   case w83697ug:
+   /* Set pin 118 to WDTO# mode */
+   t = superio_inb(0x2b) & ~0x04;
+   superio_outb(0x2b, t);
+   break;
case w83627thf:
t = (superio_inb(0x2B) & ~0x08) | 0x04;
superio_outb(0x2B, t); /* set GPIO3 to WDT0 */
@@ -125,10 +146,10 @@ static void w83627hf_init(enum chips chip)
case w83627uhg:
t = superio_inb(0x2D) & ~0x01; /* PIN77 -> WDT0# */
superio_outb(0x2D, t); /* set GPIO5 to WDT0 */
-   t = superio_inb(0xF5);
+   t = superio_inb(cr_wdt_control);
t |= 0x02;  /* enable the WDTO# output low pulse
 * to the KBRST# pin */
-   superio_outb(0xF5, t);
+   superio_outb(cr_wdt_control, t);
break;
case w83637hf:
case w83687thf:
@@ -140,25 +161,25 @@ static void w83627hf_init(enum chips chip)
 * These chips support more than one WDTO# output pin.
 * Don't touch it, and hope the BIOS does the right thing.
 */
-   t = superio_inb(0xF5);
+   t = superio_inb(cr_wdt_control);
t |= 0x02;  /* enable the WDTO# output low pulse
 * to the KBRST# pin */
-   superio_outb(0xF5, t);
+   superio_outb(cr_wdt_control, t);
break;
default:
break;
}
 
-   t = superio_inb(0xF6);
+   t = superio_inb(cr_wdt_timeout);
if (t != 0) {
pr_info("Watchdog already running. Resetting timeout to %d sec\n",
WATCHDOG_TIMEOUT);
-   superio_outb(0xf6, WATCHDOG_TIMEOUT);
+   superio_outb(cr_wdt_timeout, WATCHDOG_TIMEOUT);
}
 
/* set second mode & disable keyboard turning off watchdog */
-   t = superio_inb(0xF5) & ~0x0C;
-   superio_outb(0xF5, t);
+   t = superio_inb(cr_wdt_control) & ~0x0C;
+   superio_outb(cr_wdt_control, t);
 
/* disable keyboard & mouse turning off watchdog */
t = superio_inb(0xF7) & ~0xC0;
@@ -171,7 +192,7 @@ static int wdt_set_time(unsigned int timeout)
 {
superio_enter();
superio_select(W83627HF_LD_WDT);
-   superio_outb(0xF6, timeout);
+   

[PATCH 8/8] watchdog: Remove drivers for W83697HF and W83697UG

2013-03-04 Thread Guenter Roeck
Since both chips are now supported by the w83627hf watchdog driver,
the chip specific drivers are no longer needed and can be removed.

Signed-off-by: Guenter Roeck 
---
 drivers/watchdog/Kconfig|   30 ---
 drivers/watchdog/Makefile   |2 -
 drivers/watchdog/w83697hf_wdt.c |  461 ---
 drivers/watchdog/w83697ug_wdt.c |  398 -
 4 files changed, 891 deletions(-)
 delete mode 100644 drivers/watchdog/w83697hf_wdt.c
 delete mode 100644 drivers/watchdog/w83697ug_wdt.c

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index acdd347..095edd0 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -882,36 +882,6 @@ config W83627HF_WDT
 
  Most people will say N.
 
-config W83697HF_WDT
-   tristate "W83697HF/W83697HG Watchdog Timer"
-   depends on X86
-   ---help---
- This is the driver for the hardware watchdog on the W83697HF/HG
- chipset as used in Dedibox/VIA motherboards (and likely others).
- This watchdog simply watches your kernel to make sure it doesn't
- freeze, and if it does, it reboots your computer after a certain
- amount of time.
-
- To compile this driver as a module, choose M here: the
- module will be called w83697hf_wdt.
-
- Most people will say N.
-
-config W83697UG_WDT
-   tristate "W83697UG/W83697UF Watchdog Timer"
-   depends on X86
-   ---help---
- This is the driver for the hardware watchdog on the W83697UG/UF
- chipset as used in MSI Fuzzy CX700 VIA motherboards (and likely others).
- This watchdog simply watches your kernel to make sure it doesn't
- freeze, and if it does, it reboots your computer after a certain
- amount of time.
-
- To compile this driver as a module, choose M here: the
- module will be called w83697ug_wdt.
-
- Most people will say N.
-
 config W83877F_WDT
tristate "W83877F (EMACS) Watchdog Timer"
depends on X86
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index a300b94..b330b1f 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -103,8 +103,6 @@ obj-$(CONFIG_SMSC_SCH311X_WDT) += sch311x_wdt.o
 obj-$(CONFIG_SMSC37B787_WDT) += smsc37b787_wdt.o
 obj-$(CONFIG_VIA_WDT) += via_wdt.o
 obj-$(CONFIG_W83627HF_WDT) += w83627hf_wdt.o
-obj-$(CONFIG_W83697HF_WDT) += w83697hf_wdt.o
-obj-$(CONFIG_W83697UG_WDT) += w83697ug_wdt.o
 obj-$(CONFIG_W83877F_WDT) += w83877f_wdt.o
 obj-$(CONFIG_W83977F_WDT) += w83977f_wdt.o
 obj-$(CONFIG_MACHZ_WDT) += machzwd.o
diff --git a/drivers/watchdog/w83697hf_wdt.c b/drivers/watchdog/w83697hf_wdt.c
deleted file mode 100644
index cd9f3c1..000
--- a/drivers/watchdog/w83697hf_wdt.c
+++ /dev/null
@@ -1,461 +0,0 @@
-/*
- * w83697hf/hg WDT driver
- *
- * (c) Copyright 2006 Samuel Tardieu 
- * (c) Copyright 2006 Marcus Junker 
- *
- * Based on w83627hf_wdt.c which is based on advantechwdt.c
- * which is based on wdt.c.
- * Original copyright messages:
- *
- * (c) Copyright 2003 Pádraig Brady 
- *
- * (c) Copyright 2000-2001 Marek Michalkiewicz 
- *
- * (c) Copyright 1996 Alan Cox ,
- * All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- *
- * Neither Marcus Junker nor ANDURAS AG admit liability nor provide
- * warranty for any of this software. This material is provided
- * "AS-IS" and at no charge.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-
-#define WATCHDOG_NAME "w83697hf/hg WDT"
-#define WATCHDOG_TIMEOUT 60/* 60 sec default timeout */
-#define WATCHDOG_EARLY_DISABLE 1   /* Disable until userland kicks in */
-
-static unsigned long wdt_is_open;
-static char expect_close;
-static DEFINE_SPINLOCK(io_lock);
-
-/* You must set this - there is no sane way to probe for this board. */
-static int wdt_io = 0x2e;
-module_param(wdt_io, int, 0);
-MODULE_PARM_DESC(wdt_io,
-   "w83697hf/hg WDT io port (default 0x2e, 0 = autodetect)");
-
-static int timeout = WATCHDOG_TIMEOUT; /* in seconds */
-module_param(timeout, int, 0);
-MODULE_PARM_DESC(timeout,
-   "Watchdog timeout in seconds. 1<= timeout <=255 (default="
-   __MODULE_STRING(WATCHDOG_TIMEOUT) ")");
-
-static bool nowayout = WATCHDOG_NOWAYOUT;
-module_param(nowayout, bool, 0);
-MODULE_PARM_DESC(nowayout,
-   "Watchdog cannot be stopped once started (default="
-   __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
-

[PATCH 3/8] watchdog: w83627hf: Enable watchdog device only if not already enabled

2013-03-04 Thread Guenter Roeck
There is no need to enable the watchdog device if it is already enabled.
Also, when enabling the watchdog device, only set the watchdog device
enable bit and do not touch other bits; depending on the chip type,
those bits may enable other functionality.

Signed-off-by: Guenter Roeck 
---
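Reader's note: a small userspace sketch of the read-modify-write
described above, with the CR30 register simulated by a plain variable.
cr30, cr30_writes and enable_wdt_device() are illustrative names only;
the driver itself performs the access through port I/O.

```c
#include <stdint.h>

/* Simulated configuration register and a write counter (illustration only). */
static uint8_t cr30;
static unsigned int cr30_writes;

static uint8_t cr30_read(void)
{
	return cr30;
}

static void cr30_write(uint8_t v)
{
	cr30 = v;
	cr30_writes++;
}

/* Set only bit 0; skip the write entirely if the bit is already set. */
static void enable_wdt_device(void)
{
	uint8_t t = cr30_read();

	if (!(t & 0x01))
		cr30_write(t | 0x01);	/* all other bits keep their value */
}
```

Writing a bare 0x01, as the old code did, would clear the upper bits,
which on some chips enable unrelated functions.
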
 drivers/watchdog/w83627hf_wdt.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/watchdog/w83627hf_wdt.c b/drivers/watchdog/w83627hf_wdt.c
index 98a4fd0..6f023e1 100644
--- a/drivers/watchdog/w83627hf_wdt.c
+++ b/drivers/watchdog/w83627hf_wdt.c
@@ -95,7 +95,9 @@ static void w83627hf_init(void)
}
 
outb_p(0x30, WDT_EFER); /* select CR30 */
-   outb_p(0x01, WDT_EFDR); /* set bit 0 to activate GPIO2 */
+   t = inb(WDT_EFDR);
+   if (!(t & 0x01))
+   outb_p(t | 0x01, WDT_EFDR); /* set bit 0 to activate GPIO2 */
 
outb_p(0xF6, WDT_EFER); /* Select CRF6 */
t = inb_p(WDT_EFDR);  /* read CRF6 */
-- 
1.7.9.7

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/8] watchdog: w83627hf: Auto-detect IO address and supported chips

2013-03-04 Thread Guenter Roeck
Instead of requiring the user to provide the IO address as a module
parameter, auto-detect both the address and the supported chip.

Signed-off-by: Guenter Roeck 
---
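Reader's note: a table-driven variant of the chip ID lookup, written as
a userspace sketch. The patch itself uses a switch statement in
wdt_find(); chip_ids[] and chip_from_id() are hypothetical names. The ID
values are the ones the patch defines.

```c
#include <errno.h>
#include <stddef.h>

enum chips { w83627hf, w83637hf, w83627thf, w83687thf, w83627ehf, w83627dhg,
	     w83627uhg };

/* Chip ID byte (read from CR20) paired with the matching enum value. */
static const struct {
	unsigned char id;
	enum chips chip;
} chip_ids[] = {
	{ 0x52, w83627hf },
	{ 0x70, w83637hf },
	{ 0x82, w83627thf },
	{ 0x85, w83687thf },
	{ 0x88, w83627ehf },
	{ 0xa0, w83627dhg },
	{ 0xa2, w83627uhg },
};

/* Hypothetical helper: map a chip ID byte to enum chips, or -ENODEV. */
static int chip_from_id(unsigned char id)
{
	size_t i;

	for (i = 0; i < sizeof(chip_ids) / sizeof(chip_ids[0]); i++)
		if (chip_ids[i].id == id)
			return chip_ids[i].chip;
	return -ENODEV;	/* unknown ID, or 0xff when nothing answers */
}
```

With a table, adding a chip is a one-line change, at the cost of losing
the switch's separate handling of 0xff versus an unsupported ID.
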
 drivers/watchdog/w83627hf_wdt.c |  121 ---
 1 file changed, 101 insertions(+), 20 deletions(-)

diff --git a/drivers/watchdog/w83627hf_wdt.c b/drivers/watchdog/w83627hf_wdt.c
index 8ae71fb..3a2ba30 100644
--- a/drivers/watchdog/w83627hf_wdt.c
+++ b/drivers/watchdog/w83627hf_wdt.c
@@ -44,10 +44,10 @@
 #define WATCHDOG_NAME "w83627hf/thf/hg/dhg WDT"
 #define WATCHDOG_TIMEOUT 60/* 60 sec default timeout */
 
-/* You must set this - there is no sane way to probe for this board. */
-static int wdt_io = 0x2E;
-module_param(wdt_io, int, 0);
-MODULE_PARM_DESC(wdt_io, "w83627hf/thf WDT io port (default 0x2E)");
+static int wdt_io;
+
+enum chips { w83627hf, w83637hf, w83627thf, w83687thf, w83627ehf, w83627dhg,
+w83627uhg };
 
 /*
  * Kernel methods.
@@ -60,6 +60,14 @@ MODULE_PARM_DESC(wdt_io, "w83627hf/thf WDT io port (default 0x2E)");
 
 #define W83627HF_LD_WDT0x08
 
+#define W83627HF_ID0x52
+#define W83637HF_ID0x70
+#define W83627THF_ID   0x82
+#define W83687THF_ID   0x85
+#define W83627EHF_ID   0x88
+#define W83627DHG_ID   0xa0
+#define W83627UHG_ID   0xa2
+
 static void superio_outb(int reg, int val)
 {
outb(reg, WDT_EFER);
@@ -88,29 +96,44 @@ static void superio_exit(void)
outb_p(0xAA, WDT_EFER); /* Leave extended function mode */
 }
 
-/* tyan motherboards seem to set F5 to 0x4C ?
- * So explicitly init to appropriate value. */
-
-static void w83627hf_init(void)
+static void w83627hf_init(enum chips chip)
 {
unsigned char t;
 
superio_enter();
superio_select(W83627HF_LD_WDT);
-   t = superio_inb(0x20);  /* check chip version   */
-   if (t == 0x82) {/* W83627THF*/
-   t = (superio_inb(0x2b) & 0xf7);
-   superio_outb(0x2b, t | 0x04); /* set GPIO3 to WDT0 */
-   } else if (t == 0x88 || t == 0xa0) {/* W83627EHF / W83627DHG */
-   t = superio_inb(0x2d);
-   superio_outb(0x2d, t & ~0x01);  /* set GPIO5 to WDT0 */
-   }
 
/* set CR30 bit 0 to activate GPIO2 */
t = superio_inb(0x30);
if (!(t & 0x01))
superio_outb(0x30, t | 0x01);
 
+   switch (chip) {
+   case w83627hf:
+   t = superio_inb(0x2B) & ~0x10;
+   superio_outb(0x2B, t); /* set GPIO24 to WDT0 */
+   break;
+   case w83627thf:
+   t = (superio_inb(0x2B) & ~0x08) | 0x04;
+   superio_outb(0x2B, t); /* set GPIO3 to WDT0 */
+   break;
+   case w83627ehf:
+   case w83627dhg:
+   case w83627uhg:
+   t = superio_inb(0x2D) & ~0x01; /* PIN77 -> WDT0# */
+   superio_outb(0x2D, t); /* set GPIO5 to WDT0 */
+   t = superio_inb(0xF5);
+   t |= 0x02;  /* enable the WDTO# output low pulse
+* to the KBRST# pin */
+   superio_outb(0xF5, t);
+   break;
+   case w83637hf:
+   case w83687thf:
+   break;
+   default:
+   break;
+   }
+
t = superio_inb(0xF6);
if (t != 0) {
pr_info("Watchdog already running. Resetting timeout to %d sec\n",
@@ -120,8 +143,6 @@ static void w83627hf_init(void)
 
/* set second mode & disable keyboard turning off watchdog */
t = superio_inb(0xF5) & ~0x0C;
-   /* enable the WDTO# output low pulse to the KBRST# pin */
-   t |= 0x02;
superio_outb(0xF5, t);
 
/* disable keyboard & mouse turning off watchdog */
@@ -217,19 +238,79 @@ static struct notifier_block wdt_notifier = {
.notifier_call = wdt_notify_sys,
 };
 
+static int wdt_find(int addr)
+{
+   u8 val;
+   int ret = -ENODEV;
+
+   superio_enter();
+   superio_select(W83627HF_LD_WDT);
+   val = superio_inb(0x20);
+   switch (val) {
+   case W83627HF_ID:
+   ret = w83627hf;
+   break;
+   case W83637HF_ID:
+   ret = w83637hf;
+   break;
+   case W83627THF_ID:
+   ret = w83627thf;
+   break;
+   case W83687THF_ID:
+   ret = w83687thf;
+   break;
+   case W83627EHF_ID:
+   ret = w83627ehf;
+   break;
+   case W83627DHG_ID:
+   ret = w83627dhg;
+   break;
+   case W83627UHG_ID:
+   ret = w83627uhg;
+   break;
+   case 0xff:
+   break;
+   default:
+   pr_err("Unsupported chip ID: 0x%02x\n", val);
+   break;
+   }
+   superio_exit();
+   return ret;
+}
+
 static int __init wdt_init(void)
 {
int ret;
bool nowayout = WATCHDOG_NOWAYOUT;
+ 

[PATCH 2/8] watchdog: w83627hf: Enable watchdog only once

2013-03-04 Thread Guenter Roeck
It is unnecessary to enable the logical device and WDT0 each time
the watchdog is accessed. Do it only once during initialization.

Signed-off-by: Guenter Roeck 
---
 drivers/watchdog/w83627hf_wdt.c |   35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/watchdog/w83627hf_wdt.c b/drivers/watchdog/w83627hf_wdt.c
index cd82a47..98a4fd0 100644
--- a/drivers/watchdog/w83627hf_wdt.c
+++ b/drivers/watchdog/w83627hf_wdt.c
@@ -60,28 +60,10 @@ MODULE_PARM_DESC(wdt_io, "w83627hf/thf WDT io port (default 0x2E)");
 
 static void w83627hf_select_wd_register(void)
 {
-   unsigned char c;
outb_p(0x87, WDT_EFER); /* Enter extended function mode */
outb_p(0x87, WDT_EFER); /* Again according to manual */
-
-   outb(0x20, WDT_EFER);   /* check chip version   */
-   c = inb(WDT_EFDR);
-   if (c == 0x82) {/* W83627THF*/
-   outb_p(0x2b, WDT_EFER); /* select GPIO3 */
-   c = ((inb_p(WDT_EFDR) & 0xf7) | 0x04); /* select WDT0 */
-   outb_p(0x2b, WDT_EFER);
-   outb_p(c, WDT_EFDR);/* set GPIO3 to WDT0 */
-   } else if (c == 0x88 || c == 0xa0) {/* W83627EHF / W83627DHG */
-   outb_p(0x2d, WDT_EFER); /* select GPIO5 */
-   c = inb_p(WDT_EFDR) & ~0x01; /* PIN77 -> WDT0# */
-   outb_p(0x2d, WDT_EFER);
-   outb_p(c, WDT_EFDR); /* set GPIO5 to WDT0 */
-   }
-
outb_p(0x07, WDT_EFER); /* point to logical device number reg */
outb_p(0x08, WDT_EFDR); /* select logical device 8 (GPIO2) */
-   outb_p(0x30, WDT_EFER); /* select CR30 */
-   outb_p(0x01, WDT_EFDR); /* set bit 0 to activate GPIO2 */
 }
 
 static void w83627hf_unselect_wd_register(void)
@@ -98,6 +80,23 @@ static void w83627hf_init(void)
 
w83627hf_select_wd_register();
 
+   outb(0x20, WDT_EFER);   /* check chip version   */
+   t = inb(WDT_EFDR);
+   if (t == 0x82) {/* W83627THF*/
+   outb_p(0x2b, WDT_EFER); /* select GPIO3 */
+   t = ((inb_p(WDT_EFDR) & 0xf7) | 0x04); /* select WDT0 */
+   outb_p(0x2b, WDT_EFER);
+   outb_p(t, WDT_EFDR);/* set GPIO3 to WDT0 */
+   } else if (t == 0x88 || t == 0xa0) {/* W83627EHF / W83627DHG */
+   outb_p(0x2d, WDT_EFER); /* select GPIO5 */
+   t = inb_p(WDT_EFDR) & ~0x01; /* PIN77 -> WDT0# */
+   outb_p(0x2d, WDT_EFER);
+   outb_p(t, WDT_EFDR); /* set GPIO5 to WDT0 */
+   }
+
+   outb_p(0x30, WDT_EFER); /* select CR30 */
+   outb_p(0x01, WDT_EFDR); /* set bit 0 to activate GPIO2 */
+
outb_p(0xF6, WDT_EFER); /* Select CRF6 */
t = inb_p(WDT_EFDR);  /* read CRF6 */
if (t != 0) {
-- 
1.7.9.7



[PATCH 4/8] watchdog: w83627hf: Use helper functions to access superio registers

2013-03-04 Thread Guenter Roeck
Use helper functions, named similarly to those in other drivers, to
access superio registers.

Signed-off-by: Guenter Roeck 
---
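Reader's note: a userspace sketch of the index/data access pattern the
helpers wrap, run against a simulated chip so it can be tried without
hardware. The sim_* names and superio_clear_bits() are illustrative and
not part of the patch; the real helpers use outb()/inb() on WDT_EFER and
WDT_EFDR.

```c
#include <stdint.h>

/* Simulated Super I/O chip: one index latch plus 256 registers. */
static uint8_t sim_index;
static uint8_t sim_regs[256];

/* Stand-ins for the EFER (index) and EFDR (data) port accesses. */
static void sim_write_index(uint8_t reg)
{
	sim_index = reg;
}

static void sim_write_data(uint8_t val)
{
	sim_regs[sim_index] = val;
}

static uint8_t sim_read_data(void)
{
	return sim_regs[sim_index];
}

static void superio_outb(int reg, int val)
{
	sim_write_index((uint8_t)reg);	/* select the register */
	sim_write_data((uint8_t)val);	/* then write its value */
}

static int superio_inb(int reg)
{
	sim_write_index((uint8_t)reg);	/* select the register */
	return sim_read_data();		/* then read its value */
}

/* Read-modify-write built on the helpers, in the style of the CRF7 update. */
static void superio_clear_bits(int reg, int mask)
{
	superio_outb(reg, superio_inb(reg) & ~mask);
}
```

Each register access then reads as a single call instead of a
select/transfer pair, which is what shrinks w83627hf_init() in the diff
below.
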
 drivers/watchdog/w83627hf_wdt.c |   97 +--
 1 file changed, 52 insertions(+), 45 deletions(-)

diff --git a/drivers/watchdog/w83627hf_wdt.c b/drivers/watchdog/w83627hf_wdt.c
index 6f023e1..8ae71fb 100644
--- a/drivers/watchdog/w83627hf_wdt.c
+++ b/drivers/watchdog/w83627hf_wdt.c
@@ -58,15 +58,32 @@ MODULE_PARM_DESC(wdt_io, "w83627hf/thf WDT io port (default 0x2E)");
(same as EFER) */
 #define WDT_EFDR (WDT_EFIR+1) /* Extended Function Data Register */
 
-static void w83627hf_select_wd_register(void)
+#define W83627HF_LD_WDT0x08
+
+static void superio_outb(int reg, int val)
+{
+   outb(reg, WDT_EFER);
+   outb(val, WDT_EFDR);
+}
+
+static inline int superio_inb(int reg)
+{
+   outb(reg, WDT_EFER);
+   return inb(WDT_EFDR);
+}
+
+static void superio_enter(void)
 {
outb_p(0x87, WDT_EFER); /* Enter extended function mode */
outb_p(0x87, WDT_EFER); /* Again according to manual */
-   outb_p(0x07, WDT_EFER); /* point to logical device number reg */
-   outb_p(0x08, WDT_EFDR); /* select logical device 8 (GPIO2) */
 }
 
-static void w83627hf_unselect_wd_register(void)
+static void superio_select(int ld)
+{
+   superio_outb(0x07, ld);
+}
+
+static void superio_exit(void)
 {
outb_p(0xAA, WDT_EFER); /* Leave extended function mode */
 }
@@ -78,58 +95,48 @@ static void w83627hf_init(void)
 {
unsigned char t;
 
-   w83627hf_select_wd_register();
-
-   outb(0x20, WDT_EFER);   /* check chip version   */
-   t = inb(WDT_EFDR);
+   superio_enter();
+   superio_select(W83627HF_LD_WDT);
+   t = superio_inb(0x20);  /* check chip version   */
if (t == 0x82) {/* W83627THF*/
-   outb_p(0x2b, WDT_EFER); /* select GPIO3 */
-   t = ((inb_p(WDT_EFDR) & 0xf7) | 0x04); /* select WDT0 */
-   outb_p(0x2b, WDT_EFER);
-   outb_p(t, WDT_EFDR);/* set GPIO3 to WDT0 */
+   t = (superio_inb(0x2b) & 0xf7);
+   superio_outb(0x2b, t | 0x04); /* set GPIO3 to WDT0 */
} else if (t == 0x88 || t == 0xa0) {/* W83627EHF / W83627DHG */
-   outb_p(0x2d, WDT_EFER); /* select GPIO5 */
-   t = inb_p(WDT_EFDR) & ~0x01; /* PIN77 -> WDT0# */
-   outb_p(0x2d, WDT_EFER);
-   outb_p(t, WDT_EFDR); /* set GPIO5 to WDT0 */
+   t = superio_inb(0x2d);
+   superio_outb(0x2d, t & ~0x01);  /* set GPIO5 to WDT0 */
}
 
-   outb_p(0x30, WDT_EFER); /* select CR30 */
-   t = inb(WDT_EFDR);
+   /* set CR30 bit 0 to activate GPIO2 */
+   t = superio_inb(0x30);
if (!(t & 0x01))
-   outb_p(t | 0x01, WDT_EFDR); /* set bit 0 to activate GPIO2 */
+   superio_outb(0x30, t | 0x01);
 
-   outb_p(0xF6, WDT_EFER); /* Select CRF6 */
-   t = inb_p(WDT_EFDR);  /* read CRF6 */
+   t = superio_inb(0xF6);
if (t != 0) {
pr_info("Watchdog already running. Resetting timeout to %d sec\n",
WATCHDOG_TIMEOUT);
-   outb_p(WATCHDOG_TIMEOUT, WDT_EFDR);/* Write back to CRF6 */
+   superio_outb(0xf6, WATCHDOG_TIMEOUT);
}
 
-   outb_p(0xF5, WDT_EFER); /* Select CRF5 */
-   t = inb_p(WDT_EFDR);  /* read CRF5 */
-   t &= ~0x0C;   /* set second mode & disable keyboard
-   turning off watchdog */
-   t |= 0x02;/* enable the WDTO# output low pulse
-   to the KBRST# pin (PIN60) */
-   outb_p(t, WDT_EFDR);/* Write back to CRF5 */
-
-   outb_p(0xF7, WDT_EFER); /* Select CRF7 */
-   t = inb_p(WDT_EFDR);  /* read CRF7 */
-   t &= ~0xC0;   /* disable keyboard & mouse turning off
-   watchdog */
-   outb_p(t, WDT_EFDR);/* Write back to CRF7 */
-
-   w83627hf_unselect_wd_register();
+   /* set second mode & disable keyboard turning off watchdog */
+   t = superio_inb(0xF5) & ~0x0C;
+   /* enable the WDTO# output low pulse to the KBRST# pin */
+   t |= 0x02;
+   superio_outb(0xF5, t);
+
+   /* disable keyboard & mouse turning off watchdog */
+   t = superio_inb(0xF7) & ~0xC0;
+   superio_outb(0xF7, t);
+
+   superio_exit();
 }
 
 static int wdt_set_time(unsigned int timeout)
 {
-   w83627hf_select_wd_register();
-   outb_p(0xF6, WDT_EFER);/* Select CRF6 */
-   outb_p(timeout, WDT_EFDR); /* Write Timeout counter to CRF6 */
-   w83627hf_unselect_wd_register();
+   superio_enter();
+   superio_select(W83627HF_LD_WDT);
+   superio_outb(0xF6, timeout);
+   superio_exit();
return 0;
 }
 
@@ -155,10 +162,10 @@ static 

[PATCH 0/8] watchdog: w83627hf: Convert to watchdog infrastructure

2013-03-04 Thread Guenter Roeck
Convert to watchdog infrastructure, cleanup, add support for additional
chips, and merge with W83697HF and W83697UG watchdog drivers.

Tested with NCT6775 and NCT6776. I'll be able to test the code with W83627UHG.
Additional test feedback, especially for other chips, would be appreciated.


[PATCH] cpufreq: ARM big LITTLE: Add generic cpufreq driver and its DT glue

2013-03-04 Thread Viresh Kumar
big LITTLE is ARM's new architecture, focusing on the power/performance needs
of modern systems. More information about big LITTLE can be found here:

http://www.arm.com/products/processors/technologies/biglittleprocessing.php
http://lwn.net/Articles/481055/

In order to keep cpufreq support for all big LITTLE platforms simple and
generic, this patch adds a generic cpufreq driver layer for all big LITTLE
platforms.

The driver is divided into two parts:
- Core driver: Generic and shared across all big LITTLE SoCs
- Glue drivers: Per-platform drivers providing ops to the core driver

This patch adds a generic glue driver which extracts information from the
Device Tree.

Future SoCs can either reuse the DT glue or write their own, depending on
their needs.

Signed-off-by: Sudeep KarkadaNagesha 
Signed-off-by: Viresh Kumar 
---

This is pushed here:

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-biglittle
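
Reader's note: a userspace sketch of the core/glue split described
above, assuming the core accepts one ops table from one glue driver at a
time. All names (bl_glue_ops, bl_core_register, dt_glue) and the
frequency values are made up for illustration; they are not the
interfaces or data from this patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops table that a platform glue hands to the core. */
struct bl_glue_ops {
	const char *name;
	/* Fill freqs[] for a cluster; return the number of entries written. */
	int (*get_freq_table)(int cluster, unsigned int *freqs, int max);
};

static const struct bl_glue_ops *registered_ops;

/* Core side: accept a single glue, refuse a second one. */
static int bl_core_register(const struct bl_glue_ops *ops)
{
	if (!ops || !ops->get_freq_table)
		return -EINVAL;
	if (registered_ops)
		return -EBUSY;
	registered_ops = ops;
	return 0;
}

static void bl_core_unregister(const struct bl_glue_ops *ops)
{
	if (registered_ops == ops)
		registered_ops = NULL;
}

/* Glue side: stand-in for a table that would be parsed from the DT. */
static int dt_get_freq_table(int cluster, unsigned int *freqs, int max)
{
	static const unsigned int table[] = { 500000, 600000, 700000 };
	int n = (int)(sizeof(table) / sizeof(table[0]));
	int i;

	(void)cluster;	/* this sketch uses the same table for every cluster */
	for (i = 0; i < n && i < max; i++)
		freqs[i] = table[i];
	return i;
}

static const struct bl_glue_ops dt_glue = {
	.name = "dt-bl",
	.get_freq_table = dt_get_freq_table,
};
```

The core never needs to know where the table came from, which is what
lets a future SoC swap in its own glue without touching the core.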

 .../bindings/cpufreq/arm_big_little_dt.txt |  29 +++
 MAINTAINERS|  11 +
 drivers/cpufreq/Kconfig.arm|  13 +
 drivers/cpufreq/Makefile   |   4 +
 drivers/cpufreq/arm_big_little.c   | 278 +
 drivers/cpufreq/arm_big_little.h   |  40 +++
 drivers/cpufreq/arm_big_little_dt.c| 125 +
 7 files changed, 500 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/cpufreq/arm_big_little_dt.txt
 create mode 100644 drivers/cpufreq/arm_big_little.c
 create mode 100644 drivers/cpufreq/arm_big_little.h
 create mode 100644 drivers/cpufreq/arm_big_little_dt.c

diff --git a/Documentation/devicetree/bindings/cpufreq/arm_big_little_dt.txt b/Documentation/devicetree/bindings/cpufreq/arm_big_little_dt.txt
new file mode 100644
index 000..6f534eb
--- /dev/null
+++ b/Documentation/devicetree/bindings/cpufreq/arm_big_little_dt.txt
@@ -0,0 +1,29 @@
+Generic ARM big LITTLE cpufreq driver's DT glue
+---
+
+This is the DT-specific glue layer for the generic cpufreq driver for big LITTLE systems.
+
+Both required and optional properties listed below must be defined under node
+cluster*. * can be 0 or 1.
+
+Required properties:
+- freqs: List of all supported frequencies.
+
+Optional properties:
+- clock-latency: Specify the possible maximum transition latency for clock, in
+  unit of nanoseconds.
+
+Examples:
+
+cluster0: cluster@0 {
+..
+
+   freqs = <5 6 7 8 9 10 11 12>;
+   clock-latency = <20>;
+
+   ..
+
+cores {
+   ..
+};
+};
diff --git a/MAINTAINERS b/MAINTAINERS
index 554fd30..b14b749 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2206,6 +2206,17 @@ S:   Maintained
 F: drivers/cpufreq/
 F: include/linux/cpufreq.h
 
+CPU FREQUENCY DRIVERS - ARM BIG LITTLE
+M: Viresh Kumar 
+M: Sudeep KarkadaNagesha 
+L: cpuf...@vger.kernel.org
+L: linux...@vger.kernel.org
+W: http://www.arm.com/products/processors/technologies/biglittleprocessing.php
+S: Maintained
+F: drivers/cpufreq/arm_big_little.h
+F: drivers/cpufreq/arm_big_little.c
+F: drivers/cpufreq/arm_big_little_dt.c
+
 CPUID/MSR DRIVER
 M: "H. Peter Anvin" 
 S: Maintained
diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index 030ddf6..fdf54a9 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -2,6 +2,19 @@
 # ARM CPU Frequency scaling drivers
 #
 
+config ARM_BIG_LITTLE_CPUFREQ
+   tristate
+   depends on ARM_CPU_TOPOLOGY
+
+config ARM_DT_BL_CPUFREQ
+   tristate "Generic ARM big LITTLE CPUfreq driver probed via DT"
+   select ARM_BIG_LITTLE_CPUFREQ
+   depends on OF
+   default n
+   help
+ This enables the Generic CPUfreq driver for ARM big.LITTLE platform.
+ This gets frequency tables from DT.
+
 config ARM_OMAP2PLUS_CPUFREQ
bool "TI OMAP2+"
depends on ARCH_OMAP2PLUS
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 863fd18..d1b0832 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -44,6 +44,10 @@ obj-$(CONFIG_X86_INTEL_PSTATE)   += intel_pstate.o
 
 
##
 # ARM SoC drivers
+obj-$(CONFIG_ARM_BIG_LITTLE_CPUFREQ)   += arm_big_little.o
+# big LITTLE per platform glues. Keep DT_BL_CPUFREQ as the last entry in all big
+# LITTLE drivers, so that it is probed last.
+obj-$(CONFIG_ARM_DT_BL_CPUFREQ)+= arm_big_little_dt.o
 obj-$(CONFIG_UX500_SOC_DB8500) += dbx500-cpufreq.o
 obj-$(CONFIG_ARM_S3C2416_CPUFREQ)  += s3c2416-cpufreq.o
 obj-$(CONFIG_ARM_S3C64XX_CPUFREQ)  += s3c64xx-cpufreq.o
diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
new file mode 100644
index 

Re: [PATCH V2 01/14] MIPS: Build uasm-generated code only once to avoid CPU Hotplug problem

2013-03-04 Thread 陈华才
I'm sorry, this is the only patch; please ignore the [01/14] in the subject.

> Currently, clear_page()/copy_page() are generated dynamically by the
> micro-assembler (uasm). They are unusable until uasm_resolve_relocs() has
> finished, because jump labels are illegal before that. Since these
> functions are shared by every CPU, we call build_clear_page()/
> build_copy_page() only once, at boot time. Without this patch, programs
> suffer random memory corruption (segmentation fault, bus error, etc.)
> during CPU hotplug (e.g. one CPU is using clear_page() while another is
> generating it in cpu_cache_init()).
>
> For similar reasons we modify build_tlb_refill_handler()'s invocation.
>
> V2:
> 1, Rework the code so that CPU#0 can be brought online/offline.
> 2, Introduce cpu_has_local_ebase feature since some types of MIPS CPU
> need a per-CPU tlb_refill_handler().
>
> Signed-off-by: Huacai Chen 
> Signed-off-by: Hongbing Hu 
> ---
>  arch/mips/include/asm/cpu-features.h   |3 +++
>  .../asm/mach-loongson/cpu-feature-overrides.h  |1 +
>  arch/mips/mm/page.c|   10 ++
>  arch/mips/mm/tlbex.c   |   10 --
>  4 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/include/asm/cpu-features.h b/arch/mips/include/asm/cpu-features.h
> index c507b93..1204408 100644
> --- a/arch/mips/include/asm/cpu-features.h
> +++ b/arch/mips/include/asm/cpu-features.h
> @@ -110,6 +110,9 @@
>  #ifndef cpu_has_pindexed_dcache
>  #define cpu_has_pindexed_dcache  (cpu_data[0].dcache.flags & MIPS_CACHE_PINDEX)
>  #endif
> +#ifndef cpu_has_local_ebase
> +#define cpu_has_local_ebase  1
> +#endif
>
>  /*
>   * I-Cache snoops remote store.  This only matters on SMP.  Some multiprocessors
> diff --git a/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h b/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
> index 1a05d85..8eec8e2 100644
> --- a/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
> +++ b/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
> @@ -57,5 +57,6 @@
>  #define cpu_has_vint 0
>  #define cpu_has_vtag_icache  0
>  #define cpu_has_watch1
> +#define cpu_has_local_ebase  0
>
>  #endif /* __ASM_MACH_LOONGSON_CPU_FEATURE_OVERRIDES_H */
> diff --git a/arch/mips/mm/page.c b/arch/mips/mm/page.c
> index 8e666c5..6a39c01 100644
> --- a/arch/mips/mm/page.c
> +++ b/arch/mips/mm/page.c
> @@ -247,6 +247,11 @@ void __cpuinit build_clear_page(void)
>   struct uasm_label *l = labels;
>   struct uasm_reloc *r = relocs;
>   int i;
> + static atomic_t run_once = ATOMIC_INIT(0);
> +
> + if (atomic_xchg(&run_once, 1)) {
> + return;
> + }
>
>   memset(labels, 0, sizeof(labels));
>   memset(relocs, 0, sizeof(relocs));
> @@ -389,6 +394,11 @@ void __cpuinit build_copy_page(void)
>   struct uasm_label *l = labels;
>   struct uasm_reloc *r = relocs;
>   int i;
> + static atomic_t run_once = ATOMIC_INIT(0);
> +
> + if (atomic_xchg(&run_once, 1)) {
> + return;
> + }
>
>   memset(labels, 0, sizeof(labels));
>   memset(relocs, 0, sizeof(relocs));
> diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
> index 1c8ac49..4a8b294 100644
> --- a/arch/mips/mm/tlbex.c
> +++ b/arch/mips/mm/tlbex.c
> @@ -2161,8 +2161,11 @@ void __cpuinit build_tlb_refill_handler(void)
>   case CPU_TX3922:
>   case CPU_TX3927:
>  #ifndef CONFIG_MIPS_PGD_C0_CONTEXT
> - build_r3000_tlb_refill_handler();
> + if (cpu_has_local_ebase)
> + build_r3000_tlb_refill_handler();
>   if (!run_once) {
> + if (!cpu_has_local_ebase)
> + build_r3000_tlb_refill_handler();
>   build_r3000_tlb_load_handler();
>   build_r3000_tlb_store_handler();
>   build_r3000_tlb_modify_handler();
> @@ -2191,9 +2194,12 @@ void __cpuinit build_tlb_refill_handler(void)
>   build_r4000_tlb_load_handler();
>   build_r4000_tlb_store_handler();
>   build_r4000_tlb_modify_handler();
> + if (!cpu_has_local_ebase)
> + build_r4000_tlb_refill_handler();
>   run_once++;
>   }
> - build_r4000_tlb_refill_handler();
> + if (cpu_has_local_ebase)
> + build_r4000_tlb_refill_handler();
>   }
>  }
>
> --
> 1.7.7.3
>
>


-- 
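Jiangsu Lemote Technology Co., Ltd.

Software Department, Chen Huacai

E-mail: che...@lemote.com

Web: http://www.lemote.com/

Add: Menglan Industrial Park, Yushan Town, Changshu City, Jiangsu Province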

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] pwm_bl: Add support for backlight enable GPIO

2013-03-04 Thread Andrew Chew
I sent out a new patch that enables/disables the backlight enable gpio.

> On 03/05/2013 01:00 PM, Andrew Chew wrote:
> > I did come to the same conclusion regarding the platform data breakage.
> > I'm expecting that the use of platform data will go away, at least on
> > ARM, since we are all aggressively moving what used to be in platform
> > data into the device tree.  Do other platforms use this driver?
> 
> I can see at least 29 users of platform_pwm_backlight_data, all ARM with the
> exception of one unicore32. I guess at least for the foreseeable future
> platform data will remain.

I'm not sure how to solve this, then.  Any suggestions?

> > I remember hearing that there is some work in progress to encapsulate
> > gpios into a struct, rather than passing it around as a bare integer,
> > so when that happens, we can use NULL for no-gpio, which should take
> > care of the platform data issue as well.  It's kind of difficult to
> > work around this problem otherwise.
> 
> Yes, actually I am doing the GPIO rework. If you are not in too much of a hurry
> you might want to wait for it to happen (should not be too long now that the core
> has been reworked). At the same time, GPIO descriptors will also enable the
> power sequences, so if you wait even longer (or help me with it), this patch
> might not even be needed at all. Of course if you want to support this
> *now*, this is still the shortest path.

Sadly, I do need this now, and I'd rather do it as cleanly as possible rather
than maintaining a hack.  The project I am working on is very pedantic.

> > I agree that we should be turning on and off the backlight enable gpio
> > as needed to save power.  I just haven't gotten there yet.  I can try
> > to modify this patch if that's your preference, or I can follow up
> > with a patch to add this in the very near future.
> 
> That's ultimately for Thierry to say, but submitting a new revision makes
> more sense IMHO - it is not a big change and there are other issues to
> address (uninitialized GPIO in platform data) anyway.

Done.

> > To answer your last question, yes, this single patch does allow me to
> > enable the backlight on some boards (in particular, the one I'm working
> > on).
> 
> Cool - may I ask which one? All the NV boards I tried so far required more
> complex sequences for their panels.

This is for t114-dalmore.  There may be other gpios that are needed that I'm
not aware of off the top of my head.  For the backlight itself, this seems to
be the only one.


[PATCH 3/3] mfd: davinci_voicecodec: use module_platform_driver_probe()

2013-03-04 Thread Jingoo Han
This patch uses module_platform_driver_probe() macro which makes
the code smaller and simpler.

Signed-off-by: Jingoo Han 
---
 drivers/mfd/davinci_voicecodec.c |   12 +---
 1 files changed, 1 insertions(+), 11 deletions(-)

diff --git a/drivers/mfd/davinci_voicecodec.c b/drivers/mfd/davinci_voicecodec.c
index c0bcc87..c60ab0c 100644
--- a/drivers/mfd/davinci_voicecodec.c
+++ b/drivers/mfd/davinci_voicecodec.c
@@ -177,17 +177,7 @@ static struct platform_driver davinci_vc_driver = {
.remove = davinci_vc_remove,
 };
 
-static int __init davinci_vc_init(void)
-{
-   return platform_driver_probe(&davinci_vc_driver, davinci_vc_probe);
-}
-module_init(davinci_vc_init);
-
-static void __exit davinci_vc_exit(void)
-{
-   platform_driver_unregister(&davinci_vc_driver);
-}
-module_exit(davinci_vc_exit);
+module_platform_driver_probe(davinci_vc_driver, davinci_vc_probe);
 
 MODULE_AUTHOR("Miguel Aguilar");
 MODULE_DESCRIPTION("Texas Instruments DaVinci Voice Codec Core Interface");
-- 
1.7.2.5
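
For readers unfamiliar with the macro used in this series, here is a rough user-space sketch of the init/exit boilerplate that module_platform_driver_probe() stands in for. Everything in it is a stand-in invented for illustration: the stripped-down struct platform_driver, the stubbed platform_driver_probe() and platform_driver_unregister(), the demo_* names and the DEMO_MODULE_PLATFORM_DRIVER_PROBE macro. The real macro is defined in include/linux/platform_device.h and additionally hooks the generated functions up with module_init()/module_exit().

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types; NOT the real definitions. */
struct platform_device { int id; };
struct platform_driver {
	int (*remove)(struct platform_device *);
	int registered;		/* illustration only */
};

/* Stub: the real function registers the driver and binds probe once. */
static int platform_driver_probe(struct platform_driver *drv,
				 int (*probe)(struct platform_device *))
{
	assert(drv != NULL && probe != NULL);
	drv->registered = 1;
	return 0;
}

/* Stub: the real function removes the driver from the platform bus. */
static void platform_driver_unregister(struct platform_driver *drv)
{
	drv->registered = 0;
}

/*
 * Hypothetical user-space analogue of module_platform_driver_probe():
 * token pasting builds <driver>_init() and <driver>_exit().
 */
#define DEMO_MODULE_PLATFORM_DRIVER_PROBE(drv, probe)	\
static int drv##_init(void)				\
{							\
	return platform_driver_probe(&(drv), probe);	\
}							\
static void drv##_exit(void)				\
{							\
	platform_driver_unregister(&(drv));		\
}

static int demo_probe(struct platform_device *pdev) { (void)pdev; return 0; }
static int demo_remove(struct platform_device *pdev) { (void)pdev; return 0; }

static struct platform_driver demo_driver = { .remove = demo_remove };

/* One line replaces the hand-written init/exit pair removed by the patch. */
DEMO_MODULE_PLATFORM_DRIVER_PROBE(demo_driver, demo_probe)
```

Calling demo_driver_init() and then demo_driver_exit() mirrors what happens at module load and unload; the point of the macro is only that the two functions no longer have to be written by hand in every driver.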




[PATCH 2/3] mfd: htc-pasic3: use module_platform_driver_probe()

2013-03-04 Thread Jingoo Han
This patch uses module_platform_driver_probe() macro which makes
the code smaller and simpler.

Signed-off-by: Jingoo Han 
---
 drivers/mfd/htc-pasic3.c |   13 +
 1 files changed, 1 insertions(+), 12 deletions(-)

diff --git a/drivers/mfd/htc-pasic3.c b/drivers/mfd/htc-pasic3.c
index 9e5453d..0285fce 100644
--- a/drivers/mfd/htc-pasic3.c
+++ b/drivers/mfd/htc-pasic3.c
@@ -208,18 +208,7 @@ static struct platform_driver pasic3_driver = {
.remove = pasic3_remove,
 };
 
-static int __init pasic3_base_init(void)
-{
-   return platform_driver_probe(&pasic3_driver, pasic3_probe);
-}
-
-static void __exit pasic3_base_exit(void)
-{
-   platform_driver_unregister(&pasic3_driver);
-}
-
-module_init(pasic3_base_init);
-module_exit(pasic3_base_exit);
+module_platform_driver_probe(pasic3_driver, pasic3_probe);
 
 MODULE_AUTHOR("Philipp Zabel ");
 MODULE_DESCRIPTION("Core driver for HTC PASIC3");
-- 
1.7.2.5




[PATCH 1/3] mfd: ab3100-otp: use module_platform_driver_probe()

2013-03-04 Thread Jingoo Han
This patch uses module_platform_driver_probe() macro which makes
the code smaller and simpler.

Signed-off-by: Jingoo Han 
---
 drivers/mfd/ab3100-otp.c |   14 +-
 1 files changed, 1 insertions(+), 13 deletions(-)

diff --git a/drivers/mfd/ab3100-otp.c b/drivers/mfd/ab3100-otp.c
index 8440010..d7ce016 100644
--- a/drivers/mfd/ab3100-otp.c
+++ b/drivers/mfd/ab3100-otp.c
@@ -248,19 +248,7 @@ static struct platform_driver ab3100_otp_driver = {
.remove  = __exit_p(ab3100_otp_remove),
 };
 
-static int __init ab3100_otp_init(void)
-{
-   return platform_driver_probe(&ab3100_otp_driver,
-ab3100_otp_probe);
-}
-
-static void __exit ab3100_otp_exit(void)
-{
-   platform_driver_unregister(&ab3100_otp_driver);
-}
-
-module_init(ab3100_otp_init);
-module_exit(ab3100_otp_exit);
+module_platform_driver_probe(ab3100_otp_driver, ab3100_otp_probe);
 
 MODULE_AUTHOR("Linus Walleij ");
 MODULE_DESCRIPTION("AB3100 OTP Readout Driver");
-- 
1.7.2.5




[PATCH 1/1 v2] pwm_bl: Add support for backlight enable GPIO

2013-03-04 Thread Andrew Chew
The backlight enable GPIO is specified in the device tree node for
backlight.

Signed-off-by: Andrew Chew 
---
I decided to go ahead with disabling/enabling the backlight via GPIO as
needed.  Note that I named the new functions pwm_backlight_enable() and
pwm_backlight_disable() (instead of something more gpio-specific) because
I thought it would be convenient to have a generic hook for when someone
wants to add yet more stuff to be done on enable/disable.

I tested this by going into /sys/class/backlight/backlight.n and manually
adjusting the brightness, and checked the gpio state to see that it had
the appropriate value.
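
The polarity handling described above can be sketched in plain user-space C as follows. This is only an illustration of the mapping from "backlight on/off" to a physical line level; DEMO_ACTIVE_HIGH and demo_enable_gpio_level() are hypothetical names and are not part of the patch or of the kernel GPIO API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag playing the role GPIOF_INIT_HIGH plays in the patch. */
#define DEMO_ACTIVE_HIGH 0x1u

/*
 * Return the physical line level (0 or 1) that puts the backlight in the
 * requested logical state, given the polarity recorded in flags.
 */
static int demo_enable_gpio_level(unsigned int flags, bool on)
{
	bool active_high = (flags & DEMO_ACTIVE_HIGH) != 0;

	assert((flags & ~DEMO_ACTIVE_HIGH) == 0);	/* only one flag defined */
	return on == active_high ? 1 : 0;
}
```

With an active-high line, "on" drives the GPIO to 1 and "off" to 0; with an active-low line (the default in the binding), the levels are inverted.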

 .../bindings/video/backlight/pwm-backlight.txt |2 +
 drivers/video/backlight/pwm_bl.c   |   50 ++--
 include/linux/pwm_backlight.h  |2 +
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
index 1e4fc72..1ed4f0f 100644
--- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
+++ b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
@@ -14,6 +14,8 @@ Required properties:
 Optional properties:
   - pwm-names: a list of names for the PWM devices specified in the
"pwms" property (see PWM binding[0])
+  - enable-gpio: a GPIO that needs to be used to enable the backlight
+  - enable-gpio-active-high: polarity of GPIO is active high (default is low)
 
 [0]: Documentation/devicetree/bindings/pwm/pwm.txt
 
diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c
index 069983c..7dd426e 100644
--- a/drivers/video/backlight/pwm_bl.c
+++ b/drivers/video/backlight/pwm_bl.c
@@ -20,10 +20,13 @@
 #include 
 #include 
 #include 
+#include 
 
 struct pwm_bl_data {
struct pwm_device   *pwm;
struct device   *dev;
+   int enable_gpio;
+   unsigned int    enable_gpio_flags;
unsigned int    period;
unsigned int    lth_brightness;
unsigned int    *levels;
@@ -35,6 +38,20 @@ struct pwm_bl_data {
void    (*exit)(struct device *);
 };
 
+static void pwm_backlight_enable(struct pwm_bl_data *pb)
+{
+   if (gpio_is_valid(pb->enable_gpio))
+   gpio_set_value(pb->enable_gpio,
+  pb->enable_gpio_flags & GPIOF_INIT_HIGH ? 1 : 0);
+}
+
+static void pwm_backlight_disable(struct pwm_bl_data *pb)
+{
+   if (gpio_is_valid(pb->enable_gpio))
+   gpio_set_value(pb->enable_gpio,
+  pb->enable_gpio_flags & GPIOF_INIT_HIGH ? 0 : 1);
+}
+
 static int pwm_backlight_update_status(struct backlight_device *bl)
 {
struct pwm_bl_data *pb = dev_get_drvdata(&bl->dev);
@@ -53,6 +70,7 @@ static int pwm_backlight_update_status(struct backlight_device *bl)
if (brightness == 0) {
pwm_config(pb->pwm, 0, pb->period);
pwm_disable(pb->pwm);
+   pwm_backlight_disable(pb);
} else {
int duty_cycle;
 
@@ -67,6 +85,7 @@ static int pwm_backlight_update_status(struct backlight_device *bl)
 (duty_cycle * (pb->period - pb->lth_brightness) / max);
pwm_config(pb->pwm, duty_cycle, pb->period);
pwm_enable(pb->pwm);
+   pwm_backlight_enable(pb);
}
 
if (pb->notify_after)
@@ -146,10 +165,15 @@ static int pwm_backlight_parse_dt(struct device *dev,
}
 
/*
-* TODO: Most users of this driver use a number of GPIOs to control
-*   backlight power. Support for specifying these needs to be
-*   added.
+* If "enable-gpio" is present, use that GPIO to enable the backlight.
+* The presence (or not) of "enable-gpio-active-high" will determine
+* the value of the GPIO.
 */
+   data->enable_gpio = of_get_named_gpio(node, "enable-gpio", 0);
+   if (of_property_read_bool(node, "enable-gpio-active-high"))
+   data->enable_gpio_flags = GPIOF_OUT_INIT_HIGH;
+   else
+   data->enable_gpio_flags = GPIOF_OUT_INIT_LOW;
 
return 0;
 }
@@ -207,12 +231,23 @@ static int pwm_backlight_probe(struct platform_device *pdev)
} else
max = data->max_brightness;
 
+   pb->enable_gpio = data->enable_gpio;
+   pb->enable_gpio_flags = data->enable_gpio_flags;
pb->notify = data->notify;
pb->notify_after = data->notify_after;
pb->check_fb = data->check_fb;
pb->exit = data->exit;
pb->dev = &pdev->dev;
 
+   if (gpio_is_valid(pb->enable_gpio)) {
+   ret = gpio_request_one(pb->enable_gpio,
+   GPIOF_DIR_OUT | pb->enable_gpio_flags, "bl_en");
+   if (ret) {
+   dev_err(>dev, "failed to 

Re: [PATCH 1/1] pwm_bl: Add support for backlight enable GPIO

2013-03-04 Thread Alex Courbot

On 03/05/2013 01:00 PM, Andrew Chew wrote:

> I did come to the same conclusion regarding the platform data breakage.
> I'm expecting that the use of platform data will go away, at least on ARM,
> since we are all aggressively moving what used to be in platform data into
> the device tree.  Do other platforms use this driver?

I can see at least 29 users of platform_pwm_backlight_data, all ARM with
the exception of one unicore32. I guess at least for the foreseeable
future platform data will remain.

> I remember hearing that there is some work in progress to encapsulate
> gpios into a struct, rather than passing it around as a bare integer, so when
> that happens, we can use NULL for no-gpio, which should take care of the
> platform data issue as well.  It's kind of difficult to work around this problem
> otherwise.

Yes, actually I am doing the GPIO rework. If you are not in too much of a
hurry you might want to wait for it to happen (should not be too long now
that the core has been reworked). At the same time, GPIO descriptors will
also enable the power sequences, so if you wait even longer (or help me
with it), this patch might not even be needed at all. Of course if you
want to support this *now*, this is still the shortest path.

> I agree that we should be turning on and off the backlight enable gpio as
> needed to save power.  I just haven't gotten there yet.  I can try to modify
> this patch if that's your preference, or I can follow up with a patch to add
> this in the very near future.

That's ultimately for Thierry to say, but submitting a new revision
makes more sense IMHO - it is not a big change and there are other
issues to address (uninitialized GPIO in platform data) anyway.

> To answer your last question, yes, this single patch does allow me to enable
> the backlight on some boards (in particular, the one I'm working on).

Cool - may I ask which one? All the NV boards I tried so far required
more complex sequences for their panels.


Alex.



[PATCH V2 01/14] MIPS: Build uasm-generated code only once to avoid CPU Hotplug problem

2013-03-04 Thread Huacai Chen
Currently, clear_page()/copy_page() are generated dynamically by the
micro-assembler (uasm). They are unusable until uasm_resolve_relocs() has
finished, because the jump labels are invalid before that. Since these
functions are shared by every CPU, build_clear_page()/build_copy_page()
must be called only once, at boot time. Without this patch, programs
get random memory corruption (segmentation fault, bus error, etc.)
during CPU hotplug (e.g. one CPU is using clear_page() while another is
generating it in cpu_cache_init()).

For similar reasons we modify build_tlb_refill_handler()'s invocation.

V2:
1, Rework the code so that CPU#0 can be brought online/offline.
2, Introduce the cpu_has_local_ebase feature, since some types of MIPS CPU
need a per-CPU tlb_refill_handler().

Signed-off-by: Huacai Chen 
Signed-off-by: Hongbing Hu 
---
 arch/mips/include/asm/cpu-features.h   |3 +++
 .../asm/mach-loongson/cpu-feature-overrides.h  |1 +
 arch/mips/mm/page.c|   10 ++
 arch/mips/mm/tlbex.c   |   10 --
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/cpu-features.h b/arch/mips/include/asm/cpu-features.h
index c507b93..1204408 100644
--- a/arch/mips/include/asm/cpu-features.h
+++ b/arch/mips/include/asm/cpu-features.h
@@ -110,6 +110,9 @@
 #ifndef cpu_has_pindexed_dcache
 #define cpu_has_pindexed_dcache (cpu_data[0].dcache.flags & MIPS_CACHE_PINDEX)
 #endif
+#ifndef cpu_has_local_ebase
+#define cpu_has_local_ebase 1
+#endif
 
 /*
  * I-Cache snoops remote store.  This only matters on SMP.  Some multiprocessors
diff --git a/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h b/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
index 1a05d85..8eec8e2 100644
--- a/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-loongson/cpu-feature-overrides.h
@@ -57,5 +57,6 @@
 #define cpu_has_vint   0
 #define cpu_has_vtag_icache    0
 #define cpu_has_watch  1
+#define cpu_has_local_ebase    0
 
 #endif /* __ASM_MACH_LOONGSON_CPU_FEATURE_OVERRIDES_H */
diff --git a/arch/mips/mm/page.c b/arch/mips/mm/page.c
index 8e666c5..6a39c01 100644
--- a/arch/mips/mm/page.c
+++ b/arch/mips/mm/page.c
@@ -247,6 +247,11 @@ void __cpuinit build_clear_page(void)
struct uasm_label *l = labels;
struct uasm_reloc *r = relocs;
int i;
+   static atomic_t run_once = ATOMIC_INIT(0);
+
+   if (atomic_xchg(&run_once, 1)) {
+   return;
+   }
 
memset(labels, 0, sizeof(labels));
memset(relocs, 0, sizeof(relocs));
@@ -389,6 +394,11 @@ void __cpuinit build_copy_page(void)
struct uasm_label *l = labels;
struct uasm_reloc *r = relocs;
int i;
+   static atomic_t run_once = ATOMIC_INIT(0);
+
+   if (atomic_xchg(&run_once, 1)) {
+   return;
+   }
 
memset(labels, 0, sizeof(labels));
memset(relocs, 0, sizeof(relocs));
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index 1c8ac49..4a8b294 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -2161,8 +2161,11 @@ void __cpuinit build_tlb_refill_handler(void)
case CPU_TX3922:
case CPU_TX3927:
 #ifndef CONFIG_MIPS_PGD_C0_CONTEXT
-   build_r3000_tlb_refill_handler();
+   if (cpu_has_local_ebase)
+   build_r3000_tlb_refill_handler();
if (!run_once) {
+   if (!cpu_has_local_ebase)
+   build_r3000_tlb_refill_handler();
build_r3000_tlb_load_handler();
build_r3000_tlb_store_handler();
build_r3000_tlb_modify_handler();
@@ -2191,9 +2194,12 @@ void __cpuinit build_tlb_refill_handler(void)
build_r4000_tlb_load_handler();
build_r4000_tlb_store_handler();
build_r4000_tlb_modify_handler();
+   if (!cpu_has_local_ebase)
+   build_r4000_tlb_refill_handler();
run_once++;
}
-   build_r4000_tlb_refill_handler();
+   if (cpu_has_local_ebase)
+   build_r4000_tlb_refill_handler();
}
 }
 
-- 
1.7.7.3
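
The run-once guard that this patch adds to build_clear_page() and build_copy_page() can be illustrated outside the kernel with C11 atomics. The sketch below assumes atomic_exchange() from <stdatomic.h> as a user-space counterpart of the kernel's atomic_xchg(); demo_build_once() and demo_build_count are hypothetical names. Note that such a guard only prevents a second build; it does not make a later caller wait until the first caller has finished.

```c
#include <assert.h>
#include <stdatomic.h>

static int demo_build_count;	/* how many times the body really ran */

/*
 * Only the first caller sees the old value 0 and proceeds; every later
 * caller sees 1 and returns early, leaving the generated code untouched.
 */
static void demo_build_once(void)
{
	static atomic_int run_once;	/* static storage: starts at 0 */

	if (atomic_exchange(&run_once, 1))
		return;

	demo_build_count++;	/* stands in for emitting the page routine */
	assert(demo_build_count == 1);
}
```

However many CPUs come online later and call demo_build_once(), the body runs a single time, which is the property the patch relies on to keep a routine from being rewritten while another CPU is executing it.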



Re: [RFC PATCH v2 0/4] Add support for S3 non-stop TSC support.

2013-03-04 Thread Jason Gunthorpe
On Tue, Mar 05, 2013 at 11:53:02AM +0800, Feng Tang wrote:

> > // Drops some small precision along the way but is simple..
> > static inline u64 cyclecounter_cyc2ns_128(const struct cyclecounter *cc,
> >   cycle_t cycles)
> > {
> > u64 max = U64_MAX/cc->mult;
> > u64 num = cycles/max;
> > u64 result = num * ((max * cc->mult) >> cc->shift);
> > return result + cyclecounter_cyc2ns(cc, cycles - num*cc->mult);
> > }
> 
> Your way is surely more accurate, if maintainers are ok with adding
> the new API, I will use it.

Okay, give it a good look though, I only wrote it out in email, never
tested it :)

> > You may want to also CC the maintainers of all the ARM subsystems that
> > use read_persistent_clock and check with them to ensure this new
> > interface will let them migrate their implementations as well.
> 
> Maybe I didn't get it well, my patches didn't change the
> read_persistent_clock(), but inject a new way of counting suspended
> time. It should have no functional changes to existing platforms.

Right, your patches are fine stand alone.

The ARM case of plat-omap/counter_32k.c would ideally be converted to
use your new API though, that is what I meant about involving them.

I'm not sure about mach-tegra/timer.c though - it seems to be using a
counter as well but somehow sharing registers with the RTC?

Regards,
Jason
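
As a way to check the chunked conversion quoted in this message, here is a self-contained user-space sketch. It is not the kernel API: struct demo_cyclecounter, demo_cyc2ns() and demo_cyc2ns_wide() are hypothetical stand-ins for struct cyclecounter and cyclecounter_cyc2ns(). One deliberate difference from the quoted draft: the draft subtracts num*cc->mult when converting the remainder, whereas this sketch subtracts num*max, on the assumption that the remainder is meant to be whatever is left after removing num whole chunks of max cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the mult/shift pair of struct cyclecounter. */
struct demo_cyclecounter {
	uint32_t mult;
	uint32_t shift;
};

/* Plain conversion; the product wraps at 64 bits for large cycle counts. */
static uint64_t demo_cyc2ns(const struct demo_cyclecounter *cc, uint64_t cycles)
{
	return (cycles * cc->mult) >> cc->shift;
}

/*
 * Chunked conversion: split cycles into num chunks of max cycles each,
 * where max * mult still fits in 64 bits, then convert the remainder.
 * Each chunk is rounded down, so a little precision may be lost per chunk.
 */
static uint64_t demo_cyc2ns_wide(const struct demo_cyclecounter *cc,
				 uint64_t cycles)
{
	uint64_t max, num, result;

	assert(cc->mult != 0 && cc->shift < 64);
	max = UINT64_MAX / cc->mult;
	num = cycles / max;
	result = num * ((max * cc->mult) >> cc->shift);
	/* Assumption: the remainder is cycles - num * max. */
	return result + demo_cyc2ns(cc, cycles - num * max);
}
```

With mult = 1 << 20 and shift = 20 (one cycle per nanosecond), the plain conversion wraps once cycles reaches 1 << 44, while the chunked version still returns the expected value for inputs such as 1 << 50.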

