Re: linux-next: build failure after merge of the final tree (Linus' tree related)

2011-06-16 Thread KAMEZAWA Hiroyuki
On Fri, 17 Jun 2011 15:38:09 +1000
Stephen Rothwell  wrote:

> Hi all,
> 
> After merging the final tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
> 
> mm/page_cgroup.c: In function 'page_cgroup_init':
> mm/page_cgroup.c:309:13: error: 'pg_data_t' has no member named 'node_end_pfn'
> 
> Caused by commit 37573e8c7182 ("memcg: fix init_page_cgroup nid with
> sparsemem").  On powerpc, node_end_pfn() is defined to be (NODE_DATA
> (nid)->node_end_pfn) where NODE_DATA(nid) is (node_data[nid]) and
> node_data is struct pglist_data *node_data[].  As far as I can see,
> struct pglist_data has never had a member called node_end_pfn.
> 
> This commit introduces the only use of node_end_pfn() in the generic
> kernel code.  Presumably the powerpc definition needs to be fixed (to
> maybe something like the x86 version).  It looks like the sparc version
> is broken as well.
> 

Sorry, here is a fix I posted today, but it has no ack yet.
==
>From 507cc95c5ba2351bff16c5421255d1395a3b555b Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki 
Date: Thu, 16 Jun 2011 17:28:07 +0900
Subject: [PATCH] Fix node_start/end_pfn() definition for mm/page_cgroup.c

commit 21a3c96 uses node_start/end_pfn(nid) to detect the start/end
of nodes. But they are not defined in linux/mmzone.h; they are defined
in arch/???/include/mmzone.h, which is included only under
CONFIG_NEED_MULTIPLE_NODES=y.

Then, we see
mm/page_cgroup.c: In function 'page_cgroup_init':
mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn'
mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn'

So, fixing page_cgroup.c is one idea...

But node_start_pfn()/node_end_pfn() are very generic macros and
should be implemented in the same manner for all archs.
(m32r has a different implementation...)

This patch removes the definitions of node_start/end_pfn() from each
arch and defines a unified one in linux/mmzone.h. It is no longer under
CONFIG_NEED_MULTIPLE_NODES.

The result of the macro expansion (in mm/page_cgroup.c) is:

for !NUMA
  start_pfn = ((&contig_page_data)->node_start_pfn);
  end_pfn = ({ pg_data_t *__pgdat = (&contig_page_data);
               __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});

for NUMA (x86-64)
  start_pfn = ((node_data[nid])->node_start_pfn);
  end_pfn = ({ pg_data_t *__pgdat = (node_data[nid]);
               __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});

Signed-off-by: KAMEZAWA Hiroyuki 

Changelog:
 - fixed to avoid using "nid" twice in node_end_pfn() macro.
---
 arch/alpha/include/asm/mmzone.h   |1 -
 arch/m32r/include/asm/mmzone.h|8 +---
 arch/parisc/include/asm/mmzone.h  |7 ---
 arch/powerpc/include/asm/mmzone.h |7 ---
 arch/sh/include/asm/mmzone.h  |4 
 arch/sparc/include/asm/mmzone.h   |2 --
 arch/tile/include/asm/mmzone.h|   11 ---
 arch/x86/include/asm/mmzone_32.h  |   11 ---
 arch/x86/include/asm/mmzone_64.h  |3 ---
 include/linux/mmzone.h|7 +++
 10 files changed, 8 insertions(+), 53 deletions(-)

diff --git a/arch/alpha/include/asm/mmzone.h b/arch/alpha/include/asm/mmzone.h
index 8af56ce..445dc42 100644
--- a/arch/alpha/include/asm/mmzone.h
+++ b/arch/alpha/include/asm/mmzone.h
@@ -56,7 +56,6 @@ PLAT_NODE_DATA_LOCALNR(unsigned long p, int n)
  * Given a kernel address, find the home node of the underlying memory.
  */
 #define kvaddr_to_nid(kaddr)   pa_to_nid(__pa(kaddr))
-#define node_start_pfn(nid)(NODE_DATA(nid)->node_start_pfn)
 
 /*
  * Given a kaddr, LOCAL_BASE_ADDR finds the owning node of the memory
diff --git a/arch/m32r/include/asm/mmzone.h b/arch/m32r/include/asm/mmzone.h
index 9f3b5ac..115ced3 100644
--- a/arch/m32r/include/asm/mmzone.h
+++ b/arch/m32r/include/asm/mmzone.h
@@ -14,12 +14,6 @@ extern struct pglist_data *node_data[];
 #define NODE_DATA(nid) (node_data[nid])
 
 #define node_localnr(pfn, nid) ((pfn) - NODE_DATA(nid)->node_start_pfn)
-#define node_start_pfn(nid)(NODE_DATA(nid)->node_start_pfn)
-#define node_end_pfn(nid)  \
-({ \
-   pg_data_t *__pgdat = NODE_DATA(nid);\
-   __pgdat->node_start_pfn + __pgdat->node_spanned_pages - 1;  \
-})
 
 #define pmd_page(pmd)  (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
 /*
@@ -44,7 +38,7 @@ static __inline__ int pfn_to_nid(unsigned long pfn)
int node;
 
for (node = 0 ; node < MAX_NUMNODES ; node++)
-   if (pfn >= node_start_pfn(node) && pfn <= node_end_pfn(node))
+   if (pfn >= node_start_pfn(node) && pfn < node_end_pfn(node))
break;
 
return node;
diff --git a/arch/parisc/include/asm/mmzone.h b/arch/parisc/include/asm/mmzone.h
index 9608d2c..e67eb9c 100644
--- a/arch/parisc/include/asm/mmzone.h
+++ b/arch/parisc/include/asm/mmzone.h
@@ -14,13 +14,6 @@ extern struct node_map_data node_data[];
 
 #define NODE_DA

RE: powerpc: Add printk companion for ppc_md.progress

2011-06-16 Thread Benjamin Herrenschmidt
On Fri, 2011-06-17 at 15:39 +1000, Benjamin Herrenschmidt wrote:
> (Original mail lost in my email cleanup so this isn't a proper reply)
> 
> I'll apply that for now, but I'd very much like somebody to just get rid
> of the whole ppc_md.progress business.
> 
> We have printk working early enough nowadays (and we can use udbg for
> debugging).
> 
> It was meant to display magic numbers on the panel of IBM machines, I
> don't think it was ever useful...

Hrm... neither of your 2 pending patches applies cleanly (the
free_initmem factoring & this one). I haven't had a chance to check why
yet; can you check on your side?

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 3/3] powerpc: POWER7 optimised copy_to_user/copy_from_user using VMX

2011-06-16 Thread Benjamin Herrenschmidt
On Fri, 2011-06-17 at 14:54 +1000, Anton Blanchard wrote:
> plain text document attachment (power7_copy_tofrom_user)
> Implement a POWER7 optimised copy_to_user/copy_from_user using VMX.
> For large aligned copies this new loop is over 10% faster, and for
> large unaligned copies it is over 200% faster.
> 
> If we take a fault we fall back to the old version, this keeps
> things relatively simple and easy to verify.

Same re-entrancy comment as for the other ones: preempt & interrupts...
Except here it is worse, since you may page fault and thus lose the VMX
state completely.

Cheers,
Ben.





Re: [PATCH 2/3] powerpc: POWER7 optimised memcpy using VMX

2011-06-16 Thread Benjamin Herrenschmidt
O
> +.Lvmx_copy:
> + mflrr0
> + std r4,56(r1)
> + std r5,64(r1)
> + std r0,16(r1)
> + stdur1,-STACKFRAMESIZE(r1)
> + bl  .enable_kernel_altivec
> + ld  r0,STACKFRAMESIZE+16(r1)
> + ld  r3,STACKFRAMESIZE+48(r1)
> + ld  r4,STACKFRAMESIZE+56(r1)
> + ld  r5,STACKFRAMESIZE+64(r1)
> + mtlrr0

Disable interrupts? We won't save the VMX state on interrupts, and
memcpy is definitely re-entrant.

Or only run the optimization when not at interrupt time.

Cheers,
Ben.



Re: [PATCH 1/3] powerpc: POWER7 optimised copy_page using VMX

2011-06-16 Thread Benjamin Herrenschmidt
On Fri, 2011-06-17 at 14:53 +1000, Anton Blanchard wrote:

> +#include 
> +#include 
> +
> +#define STACKFRAMESIZE   112
> +
> +_GLOBAL(copypage_power7)
> + mflrr0
> + std r3,48(r1)
> + std r4,56(r1)
> + std r0,16(r1)
> + stdur1,-STACKFRAMESIZE(r1)
> +
> + bl  .enable_kernel_altivec

Don't you need to preempt disable? Or even irq disable? Or do we know
copy_page will never be called at irq time?

Also I wonder if you wouldn't be better off instead just manually
enabling it in the MSR and saving some VRs (if no current thread regs
are attached)? That would be re-entrant.

> + ld  r12,STACKFRAMESIZE+16(r1)
> + ld  r4,STACKFRAMESIZE+56(r1)
> + li  r0,(PAGE_SIZE/128)
> + li  r6,16
> + ld  r3,STACKFRAMESIZE+48(r1)
> + li  r7,32
> + li  r8,48
> + mtctr   r0
> + li  r9,64
> + li  r10,80
> + mtlrr12
> + li  r11,96
> + li  r12,112
> + addir1,r1,STACKFRAMESIZE
> +
> + .align  5

Do we know that the blank will be filled with something harmless ?

> +1:   lvx vr7,r0,r4
> + lvx vr6,r4,r6
> + lvx vr5,r4,r7
> + lvx vr4,r4,r8
> + lvx vr3,r4,r9
> + lvx vr2,r4,r10
> + lvx vr1,r4,r11
> + lvx vr0,r4,r12
> + addir4,r4,128
> + stvxvr7,r0,r3
> + stvxvr6,r3,r6
> + stvxvr5,r3,r7
> + stvxvr4,r3,r8
> + stvxvr3,r3,r9
> + stvxvr2,r3,r10
> + stvxvr1,r3,r11
> + stvxvr0,r3,r12
> + addir3,r3,128
> + bdnz1b

What about lvxl ? You aren't likely to re-use the source data soon
right ?

Hrm... re-reading the arch, it looks like the "l" variant is quirky and
should really only be used on the last load of a cache block, but in
your case it should be ok to put it on the last accesses since we know
the alignment.

> + blr
> Index: linux-powerpc/arch/powerpc/lib/Makefile
> ===
> --- linux-powerpc.orig/arch/powerpc/lib/Makefile  2011-05-19 
> 19:57:38.058570608 +1000
> +++ linux-powerpc/arch/powerpc/lib/Makefile   2011-06-17 07:39:58.996165527 
> +1000
> @@ -16,7 +16,8 @@ obj-$(CONFIG_HAS_IOMEM) += devres.o
>  
>  obj-$(CONFIG_PPC64)  += copypage_64.o copyuser_64.o \
>  memcpy_64.o usercopy_64.o mem_64.o string.o \
> -checksum_wrappers_64.o hweight_64.o
> +checksum_wrappers_64.o hweight_64.o \
> +copypage_power7.o
>  obj-$(CONFIG_XMON)   += sstep.o ldstfp.o
>  obj-$(CONFIG_KPROBES)+= sstep.o ldstfp.o
>  obj-$(CONFIG_HAVE_HW_BREAKPOINT) += sstep.o ldstfp.o
> Index: linux-powerpc/arch/powerpc/lib/copypage_64.S
> ===
> --- linux-powerpc.orig/arch/powerpc/lib/copypage_64.S 2011-06-06 
> 08:07:35.0 +1000
> +++ linux-powerpc/arch/powerpc/lib/copypage_64.S  2011-06-17 
> 07:39:58.996165527 +1000
> @@ -17,7 +17,11 @@ PPC64_CACHES:
>  .section".text"
>  
>  _GLOBAL(copy_page)
> +BEGIN_FTR_SECTION
>   lis r5,PAGE_SIZE@h
> +FTR_SECTION_ELSE
> +b   .copypage_power7
> +ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POWER7)
>   ori r5,r5,PAGE_SIZE@l
>  BEGIN_FTR_SECTION
>   ld  r10,PPC64_CACHES@toc(r2)
> 




Re: [PATCH 1/3] powerpc: POWER7 optimised copy_page using VMX

2011-06-16 Thread Benjamin Herrenschmidt
On Fri, 2011-06-17 at 14:53 +1000, Anton Blanchard wrote:
> plain text document attachment (power7_copypage)
> Implement a POWER7 optimised copy_page using VMX. We copy a cacheline
> at a time using VMX loads and stores.
> 
> Signed-off-by: Anton Blanchard 
> ---
> 
> How do we want to handle per machine optimised functions? I create
> yet another feature bit, but feature bits might get out of control
> at some point.

I've been wondering about that for some time. The feature bit itself
isn't a big deal; for the in-kernel feature it's easy to split that into
separate masks (CPU features, cache features, debug features,
whatever...) but I don't much like the branch tricks, which won't scale
well once we have 4 or 5 versions.

What I really want is a way to patch the call sites to branch to an
alternate function.

Michael and I looked at that a while back when pondering merging
book3e/s, but we never got to something satisfactory; maybe we didn't
look hard enough at what our toolchain is capable of...

Cheers,
Ben.

> Index: linux-powerpc/arch/powerpc/include/asm/cputable.h
> ===
> --- linux-powerpc.orig/arch/powerpc/include/asm/cputable.h2011-06-06 
> 08:07:35.128707749 +1000
> +++ linux-powerpc/arch/powerpc/include/asm/cputable.h 2011-06-17 
> 07:39:58.996165527 +1000
> @@ -200,6 +200,7 @@ extern const char *powerpc_base_platform
>  #define CPU_FTR_POPCNTB  
> LONG_ASM_CONST(0x0400)
>  #define CPU_FTR_POPCNTD  
> LONG_ASM_CONST(0x0800)
>  #define CPU_FTR_ICSWX
> LONG_ASM_CONST(0x1000)
> +#define CPU_FTR_POWER7   
> LONG_ASM_CONST(0x2000)
>  
>  #ifndef __ASSEMBLY__
>  
> @@ -423,7 +424,7 @@ extern const char *powerpc_base_platform
>   CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
>   CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
>   CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
> - CPU_FTR_ICSWX | CPU_FTR_CFAR)
> + CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_POWER7)
>  #define CPU_FTRS_CELL(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
>   CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
>   CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
> Index: linux-powerpc/arch/powerpc/lib/copypage_power7.S
> ===
> --- /dev/null 1970-01-01 00:00:00.0 +
> +++ linux-powerpc/arch/powerpc/lib/copypage_power7.S  2011-06-17 
> 07:39:58.996165527 +1000
> @@ -0,0 +1,70 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright (C) IBM Corporation, 2011
> + *
> + * Author: Anton Blanchard 
> + */
> +#include 
> +#include 
> +
> +#define STACKFRAMESIZE   112
> +
> +_GLOBAL(copypage_power7)
> + mflrr0
> + std r3,48(r1)
> + std r4,56(r1)
> + std r0,16(r1)
> + stdur1,-STACKFRAMESIZE(r1)
> +
> + bl  .enable_kernel_altivec
> +
> + ld  r12,STACKFRAMESIZE+16(r1)
> + ld  r4,STACKFRAMESIZE+56(r1)
> + li  r0,(PAGE_SIZE/128)
> + li  r6,16
> + ld  r3,STACKFRAMESIZE+48(r1)
> + li  r7,32
> + li  r8,48
> + mtctr   r0
> + li  r9,64
> + li  r10,80
> + mtlrr12
> + li  r11,96
> + li  r12,112
> + addir1,r1,STACKFRAMESIZE
> +
> + .align  5
> +1:   lvx vr7,r0,r4
> + lvx vr6,r4,r6
> + lvx vr5,r4,r7
> + lvx vr4,r4,r8
> + lvx vr3,r4,r9
> + lvx vr2,r4,r10
> + lvx vr1,r4,r11
> + lvx vr0,r4,r12
> + addir4,r4,128
> + stvxvr7,r0,r3
> + stvxvr6,r3,r6
> + stvxvr5,r3,r7
> + stvxvr4,r3,r8
> + stvxvr3,r3,r9
> + stvxvr2,r3,r10
> + stvxvr1,r3,r11
> + stvxvr0,r3,r12
> + addir3,r3,128
> + bdnz1b
> +
> + blr
> Index: linux-powerpc/arch/powerpc/lib/Makefile
> ===
> --- linux-powerpc.orig/arch/powerpc/lib/Makefile  2011-05-19 
> 19:57:38.058570608 +1000
> +++ linux-powerpc/arch/powerpc/lib/Makefile   2011-06-17 07:39

RE: powerpc: Add printk companion for ppc_md.progress

2011-06-16 Thread Benjamin Herrenschmidt
(Original mail lost in my email cleanup so this isn't a proper reply)

I'll apply that for now, but I'd very much like somebody to just get rid
of the whole ppc_md.progress business.

We have printk working early enough nowadays (and we can use udbg for
debugging).

It was meant to display magic numbers on the panel of IBM machines, I
don't think it was ever useful...

Cheers,
Ben.




linux-next: build failure after merge of the final tree (Linus' tree related)

2011-06-16 Thread Stephen Rothwell
Hi all,

After merging the final tree, today's linux-next build (powerpc
allyesconfig) failed like this:

mm/page_cgroup.c: In function 'page_cgroup_init':
mm/page_cgroup.c:309:13: error: 'pg_data_t' has no member named 'node_end_pfn'

Caused by commit 37573e8c7182 ("memcg: fix init_page_cgroup nid with
sparsemem").  On powerpc, node_end_pfn() is defined to be (NODE_DATA
(nid)->node_end_pfn) where NODE_DATA(nid) is (node_data[nid]) and
node_data is struct pglist_data *node_data[].  As far as I can see,
struct pglist_data has never had a member called node_end_pfn.

This commit introduces the only use of node_end_pfn() in the generic
kernel code.  Presumably the powerpc definition needs to be fixed (to
maybe something like the x86 version).  It looks like the sparc version
is broken as well.

I have left the powerpc allyesconfig build broken for today (as it is
also broken by other changes).
-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



Re: [PATCH v3 2/2] powerpc: add support for MPIC message register API

2011-06-16 Thread Benjamin Herrenschmidt
On Tue, 2011-05-31 at 14:19 -0500, Meador Inge wrote:
> Some MPIC implementations contain one or more blocks of message registers
> that are used to send messages between cores via IPIs.  A simple API has
> been added to access (get/put, read, write, etc ...) these message registers.
> The available message registers are initially discovered via nodes in the
> device tree.  A separate commit contains a binding for the message register
> nodes.

Ok, so I finally got to look at that in a bit more detail...
> +#ifndef _ASM_MPIC_MSGR_H
> +#define _ASM_MPIC_MSGR_H
> +
> +#include 
> +
> +struct mpic_msgr {
> + u32 __iomem *addr;
> + u32 __iomem *mer;
> + u32 __iomem *msr;
> + int irq;
> + atomic_t in_use;
> + int num;
> +};

General comment... I'm really not a fan of "msgr"; I'd rather see
"mpic_message_*". It's a tad more verbose but looks a lot better, no?

Also do you need those 3 iomem pointers ? Not just one with fixed
offsets ? Or do they come from vastly different sources ?

atomic_t in_use looks fishy, but let's see how you use it...

> +extern struct mpic_msgr* mpic_msgr_get(unsigned int reg_num);
> +extern void mpic_msgr_put(struct mpic_msgr* msgr);
> +extern void mpic_msgr_enable(struct mpic_msgr *msgr);
> +extern void mpic_msgr_disable(struct mpic_msgr *msgr);
> +extern void mpic_msgr_write(struct mpic_msgr *msgr, u32 message);
> +extern u32 mpic_msgr_read(struct mpic_msgr *msgr);
> +extern void mpic_msgr_clear(struct mpic_msgr *msgr);
> +extern void mpic_msgr_set_destination(struct mpic_msgr *msgr, u32 cpu_num);
> +extern int mpic_msgr_get_irq(struct mpic_msgr *msgr);

Documentation of the API please.
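(For instance, kernel-doc comments along these lines — a sketch only, with the semantics inferred from the implementation quoted below, so the author should correct as needed:)

```c
/**
 * mpic_msgr_get - reserve a message register for exclusive use
 * @reg_num: global message register index
 *
 * Returns the register on success, ERR_PTR(-ENODEV) if @reg_num is out
 * of range, or ERR_PTR(-EBUSY) if the register is already claimed.
 * Release it again with mpic_msgr_put().
 */
extern struct mpic_msgr *mpic_msgr_get(unsigned int reg_num);
```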

> +#endif
> diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
> index f7b0772..4d65593 100644
> --- a/arch/powerpc/platforms/Kconfig
> +++ b/arch/powerpc/platforms/Kconfig
> @@ -78,6 +78,14 @@ config MPIC_WEIRD
>   bool
>   default n
>  
> +config MPIC_MSGR
> + bool "MPIC message register support"
> + depends on MPIC
> + default n
> + help
> +   Enables support for the MPIC message registers.  These
> +   registers are used for inter-processor communication.
> +
>  config PPC_I8259
>   bool
>   default n
> diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
> index 1e0c933..6d40185 100644
> --- a/arch/powerpc/sysdev/Makefile
> +++ b/arch/powerpc/sysdev/Makefile
> @@ -3,7 +3,8 @@ subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
>  ccflags-$(CONFIG_PPC64)  := -mno-minimal-toc
>  
>  mpic-msi-obj-$(CONFIG_PCI_MSI)   += mpic_msi.o mpic_u3msi.o 
> mpic_pasemi_msi.o
> -obj-$(CONFIG_MPIC)   += mpic.o $(mpic-msi-obj-y)
> +mpic-msgr-obj-$(CONFIG_MPIC_MSGR)+= mpic_msgr.o
> +obj-$(CONFIG_MPIC)   += mpic.o $(mpic-msi-obj-y) $(mpic-msgr-obj-y)
>  fsl-msi-obj-$(CONFIG_PCI_MSI)+= fsl_msi.o
>  obj-$(CONFIG_PPC_MSI_BITMAP) += msi_bitmap.o
>  
> diff --git a/arch/powerpc/sysdev/mpic_msgr.c b/arch/powerpc/sysdev/mpic_msgr.c
> new file mode 100644
> index 000..bfa0612
> --- /dev/null
> +++ b/arch/powerpc/sysdev/mpic_msgr.c
> @@ -0,0 +1,279 @@
> +/*
> + * Copyright 2011-2012, Meador Inge, Mentor Graphics Corporation.
> + *
> + * Some ideas based on un-pushed work done by Vivek Mahajan, Jason Jin, and
> + * Mingkai Hu from Freescale Semiconductor, Inc.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; version 2 of the
> + * License.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define MPIC_MSGR_REGISTERS_PER_BLOCK 4
> +#define MSGR_INUSE 0
> +#define MSGR_FREE 1
> +
> +/* Internal structure used *only* for IO mapping register blocks. */
> +struct mpic_msgr_block {
> + struct msgr {
> + u32 msgr;
> + u8 res[12];
> + } msgrs[MPIC_MSGR_REGISTERS_PER_BLOCK];
> + u8 res0[192];
> + u32 mer;
> + u8 res1[12];
> + u32 msr;
> +};

So this represents HW registers? Please make that clear in the comment.
I'm not a terrible fan of using structures to map HW, especially with
so few registers.

> +static struct mpic_msgr **mpic_msgrs = 0;
> +static unsigned int mpic_msgr_count = 0;
> +
> +struct mpic_msgr* mpic_msgr_get(unsigned int reg_num)
> +{
> + struct mpic_msgr* msgr;
> +
> + if (reg_num >= mpic_msgr_count)
> + return ERR_PTR(-ENODEV);
> +
> + msgr = mpic_msgrs[reg_num];

No locking on the array access; might be ok if those things are never
plugged in/out, I suppose...

> + if (atomic_cmpxchg(&msgr->in_use, MSGR_FREE, MSGR_INUSE) == MSGR_FREE)
> + return msgr;
> +
> + return ERR_PTR(-EBUSY);
> +}
> +EXPORT_SYMBOL(mpic_msgr_get);

So how are those things intended to be used? Do clients get a fixed
"register" number to use? It looks like this stuff would have been
better off using 

Re: [PATCH 1/3] powerpc: POWER7 optimised copy_page using VMX

2011-06-16 Thread Anton Blanchard
Hi,

> Yeah, I'm pretty against CPU_FTR_POWER7.  Every loon is going to
> attach anything POWER7 to it.  
> 
> I'm keen to see it setup in  __setup_cpu_power7.  Either a function
> pointer or use the patch_instruction infrastructure to avoid indirect
> function calls on small copies.  

Instruction patching in __setup_cpu_power7 could work. We might want to
have a nop at the start of the base functions and a label at the start
of the next instruction so we can easily override the base function and
jump back to it if things are too hard (like I do in the
copy_tofrom_user patch).

Anton


Re: [PATCH 1/3] powerpc: POWER7 optimised copy_page using VMX

2011-06-16 Thread Michael Neuling
> Implement a POWER7 optimised copy_page using VMX. We copy a cacheline
> at a time using VMX loads and stores.
> 
> Signed-off-by: Anton Blanchard 
> ---
> 
> How do we want to handle per machine optimised functions? I create
> yet another feature bit, but feature bits might get out of control
> at some point.

Yeah, I'm pretty against CPU_FTR_POWER7.  Every loon is going to attach
anything POWER7 to it.  

I'm keen to see it set up in __setup_cpu_power7.  Either a function
pointer or use the patch_instruction infrastructure to avoid indirect
function calls on small copies.

Mikey


Re: [PATCH 2/2] powerpc/book3e-64: reraise doorbell when masked by soft-irq-disable

2011-06-16 Thread Benjamin Herrenschmidt
On Tue, 2011-05-24 at 06:51 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2011-05-23 at 15:26 -0500, Scott Wood wrote:
> > On Sat, 21 May 2011 08:32:58 +1000
> > Benjamin Herrenschmidt  wrote:
> > 
> > > On Fri, 2011-05-20 at 14:00 -0500, Scott Wood wrote:
> > > > Signed-off-by: Scott Wood 
> > > > ---
> > > >  arch/powerpc/kernel/exceptions-64e.S |   22 +-
> > > >  1 files changed, 21 insertions(+), 1 deletions(-)
> > > 
> > > You can probably remove the doorbell re-check when enabling interrupts
> > > now, can't you ?
> > 
> > Ah, so that's how it currently gets away without re-raising when the
> > interrupt happens. :-)
> > 
> > I'll remove it.
> 
> Yup, I was too lazy to make a special case in the exception handlers :-)

Are you going to send a re-spin ?

Cheers,
Ben.




[PATCH 3/3] powerpc: POWER7 optimised copy_to_user/copy_from_user using VMX

2011-06-16 Thread Anton Blanchard
Implement a POWER7 optimised copy_to_user/copy_from_user using VMX.
For large aligned copies this new loop is over 10% faster, and for
large unaligned copies it is over 200% faster.

If we take a fault we fall back to the old version, this keeps
things relatively simple and easy to verify.

(The detailed comments below are copied from the POWER7 optimised
memcpy patch for completeness).

On POWER7 unaligned stores rarely slow down - they only flush when
a store crosses a 4KB page boundary. Furthermore this flush is
handled completely in hardware and should be 20-30 cycles.

Unaligned loads on the other hand flush much more often - whenever
crossing a 128 byte cache line, or a 32 byte sector if either sector
is an L1 miss.

Considering this information we really want to get the loads aligned
and not worry about the alignment of the stores. Microbenchmarks
confirm that this approach is much faster than the current unaligned
copy loop that uses shifts and rotates to ensure both loads and
stores are aligned.

We also want to try and do the stores in cacheline aligned, cacheline
sized chunks. If the store queue is unable to merge an entire
cacheline of stores then the L2 cache will have to do a
read/modify/write. Even worse, we will serialise this with the stores
in the next iteration of the copy loop since both iterations hit
the same cacheline.

Based on this, the new loop does the following things:


1 - 127 bytes
Get the source 8 byte aligned and use 8 byte loads and stores. Pretty
boring and similar to how the current loop works.

128 - 4095 bytes
Get the source 8 byte aligned and use 8 byte loads and stores,
1 cacheline at a time. We aren't doing the stores in cacheline
aligned chunks so we will potentially serialise once per cacheline.
Even so it is much better than the loop we have today.

4096 - bytes
If both source and destination have the same alignment get them both
16 byte aligned, then get the destination cacheline aligned. Do
cacheline sized loads and stores using VMX.

If source and destination do not have the same alignment, we get the
destination cacheline aligned, and use permute to do aligned loads.

In both cases the VMX loop should be optimal - we always do aligned
loads and stores and are always doing stores in cacheline aligned,
cacheline sized chunks.


The VMX breakpoint of 4096 bytes was chosen using this microbenchmark:

http://ozlabs.org/~anton/junkcode/copy_to_user.c

Since we are using VMX and there is a cost to saving and restoring
the user VMX state there are two broad cases we need to benchmark:

- Best case - userspace never uses VMX

- Worst case - userspace always uses VMX

In reality a userspace process will sit somewhere between these two
extremes. Since we need to test both aligned and unaligned copies we
end up with 4 combinations. The point at which the VMX loop begins to
win is:

0% VMX
aligned 2048 bytes
unaligned   2048 bytes

100% VMX
aligned 16384 bytes
unaligned   8192 bytes

Considering this is a microbenchmark, the data is hot in cache and
the VMX loop has better store queue merging properties we set the
breakpoint to 4096 bytes, a little below the unaligned breakpoints.

Some future optimisations we can look at:

- Looking at the perf data, a significant part of the cost when a task
  is always using VMX is the extra exception we take to restore the
  VMX state. As such we should do something similar to the x86
  optimisation that restores FPU state for heavy users. ie:

/*
 * If the task has used fpu the last 5 timeslices, just do a full
 * restore of the math state immediately to avoid the trap; the
 * chances of needing FPU soon are obviously high now
 */
preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

  and 

/*
 * fpu_counter contains the number of consecutive context switches
 * that the FPU is used. If this is over a threshold, the lazy fpu
 * saving becomes unlazy to save the trap. This is an unsigned char
 * so that after 256 times the counter wraps and the behavior turns
 * lazy again; this to deal with bursty apps that only use FPU for
 * a short time
 */

- We could create a paca bit to mirror the VMX enabled MSR bit and check
  that first, avoiding multiple calls to enable_kernel_altivec.

- We could have two VMX breakpoints, one for when we know the user VMX
  state is loaded into the registers and one when it isn't. This could
  be a second bit in the paca so we can calculate the break points quickly.

Signed-off-by: Anton Blanchard 
---

Index: linux-powerpc/arch/powerpc/lib/copyuser_64.S
===
--- linux-powerpc.orig/arch/powerpc/lib/copyuser_64.S   2011-06-17 
14:05:30.013020235 +1000
+++ linux-powerpc/arch/powerpc/lib/copyuser_64.S2011-06-17 
14:27:43.026572962 +1000
@@ -11,6 +11,10 @@
 
.align  7
 _GLOBAL(__

[PATCH 2/3] powerpc: POWER7 optimised memcpy using VMX

2011-06-16 Thread Anton Blanchard
Implement a POWER7 optimised memcpy using VMX. For large aligned
copies this new loop is over 10% faster and for large unaligned
copies it is over 200% faster.

On POWER7 unaligned stores rarely slow down - they only flush when
a store crosses a 4KB page boundary. Furthermore this flush is
handled completely in hardware and should be 20-30 cycles.

Unaligned loads on the other hand flush much more often - whenever
crossing a 128 byte cache line, or a 32 byte sector if either sector
is an L1 miss.

Considering this information we really want to get the loads aligned
and not worry about the alignment of the stores. Microbenchmarks
confirm that this approach is much faster than the current unaligned
copy loop that uses shifts and rotates to ensure both loads and
stores are aligned.

We also want to try and do the stores in cacheline aligned, cacheline
sized chunks. If the store queue is unable to merge an entire
cacheline of stores then the L2 cache will have to do a
read/modify/write. Even worse, we will serialise this with the stores
in the next iteration of the copy loop since both iterations hit
the same cacheline.

Based on this, the new loop does the following things:


1 - 127 bytes
Get the source 8 byte aligned and use 8 byte loads and stores. Pretty
boring and similar to how the current loop works.

128 - 4095 bytes
Get the source 8 byte aligned and use 8 byte loads and stores,
1 cacheline at a time. We aren't doing the stores in cacheline
aligned chunks so we will potentially serialise once per cacheline.
Even so it is much better than the loop we have today.

4096 - bytes
If both source and destination have the same alignment get them both
16 byte aligned, then get the destination cacheline aligned. Do
cacheline sized loads and stores using VMX.

If source and destination do not have the same alignment, we get the
destination cacheline aligned, and use permute to do aligned loads.

In both cases the VMX loop should be optimal - we always do aligned
loads and stores and are always doing stores in cacheline aligned,
cacheline sized chunks.


The VMX breakpoint of 4096 bytes was chosen using this microbenchmark:

http://ozlabs.org/~anton/junkcode/copy_to_user.c

(Note that the breakpoint analysis was done with the copy_tofrom_user
version of the loop and using varying sizes and alignments to read(). 
It's much easier to create a benchmark using read() that can control
the size and alignment of a kernel copy loop and synchronise it with
userspace doing optional VMX instructions).

Since we are using VMX and there is a cost to saving and restoring
the user VMX state there are two broad cases we need to benchmark:

- Best case - userspace never uses VMX

- Worst case - userspace always uses VMX

In reality a userspace process will sit somewhere between these two
extremes. Since we need to test both aligned and unaligned copies we
end up with 4 combinations. The point at which the VMX loop begins to
win is:

0% VMX
aligned 2048 bytes
unaligned   2048 bytes

100% VMX
aligned 16384 bytes
unaligned   8192 bytes

Considering that this is a microbenchmark, that the data is hot in cache,
and that the VMX loop has better store queue merging properties, we set
the breakpoint to 4096 bytes, a little below the unaligned breakpoints.

Some future optimisations we can look at:

- Looking at the perf data, a significant part of the cost when a task
  is always using VMX is the extra exception we take to restore the
  VMX state. As such we should do something similar to the x86
  optimisation that restores FPU state for heavy users. ie:

/*
 * If the task has used fpu the last 5 timeslices, just do a full
 * restore of the math state immediately to avoid the trap; the
 * chances of needing FPU soon are obviously high now
 */
preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

  and 

/*
 * fpu_counter contains the number of consecutive context switches
 * that the FPU is used. If this is over a threshold, the lazy fpu
 * saving becomes unlazy to save the trap. This is an unsigned char
 * so that after 256 times the counter wraps and the behavior turns
 * lazy again; this to deal with bursty apps that only use FPU for
 * a short time
 */

- We could create a paca bit to mirror the VMX enabled MSR bit and check
  that first, avoiding repeated calls to enable_kernel_altivec.

- We could have two VMX breakpoints, one for when we know the user VMX
  state is loaded into the registers and one when it isn't. This could
  be a second bit in the paca so we can calculate the break points quickly.
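The two-breakpoint idea can be sketched as follows. This is a sketch under stated assumptions: whether the user's VMX state is live would come from the hypothetical second paca bit, and the 2048/8192 values are illustrative, taken from the 0%-VMX and 100%-VMX aligned benchmark numbers earlier in the text:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative breakpoints, not the kernel's real values. */
#define VMX_BREAK_STATE_LIVE	2048	/* user VMX already in registers */
#define VMX_BREAK_STATE_SAVED	8192	/* restoring user VMX state costs extra */

size_t vmx_breakpoint(bool user_vmx_live)
{
	/* user_vmx_live stands in for the proposed second paca bit */
	return user_vmx_live ? VMX_BREAK_STATE_LIVE : VMX_BREAK_STATE_SAVED;
}

bool use_vmx_copy(size_t len, bool user_vmx_live)
{
	return len >= vmx_breakpoint(user_vmx_live);
}
```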

Signed-off-by: Anton Blanchard 
---

Index: linux-powerpc/arch/powerpc/lib/Makefile
===
--- linux-powerpc.orig/arch/powerpc/lib/Makefile2011-06-17 
08:38:25.786110167 +1000
+++ linux-powerpc/arch/powerpc/lib/Makefile 2011-0

[PATCH 1/3] powerpc: POWER7 optimised copy_page using VMX

2011-06-16 Thread Anton Blanchard
Implement a POWER7 optimised copy_page using VMX. We copy a cacheline
at a time using VMX loads and stores.

Signed-off-by: Anton Blanchard 
---

How do we want to handle per-machine optimised functions? I've created
yet another feature bit here, but feature bits might get out of control
at some point.

Index: linux-powerpc/arch/powerpc/include/asm/cputable.h
===
--- linux-powerpc.orig/arch/powerpc/include/asm/cputable.h  2011-06-06 
08:07:35.128707749 +1000
+++ linux-powerpc/arch/powerpc/include/asm/cputable.h   2011-06-17 
07:39:58.996165527 +1000
@@ -200,6 +200,7 @@ extern const char *powerpc_base_platform
 #define CPU_FTR_POPCNTB
LONG_ASM_CONST(0x0400)
 #define CPU_FTR_POPCNTD
LONG_ASM_CONST(0x0800)
 #define CPU_FTR_ICSWX  LONG_ASM_CONST(0x1000)
+#define CPU_FTR_POWER7 LONG_ASM_CONST(0x2000)
 
 #ifndef __ASSEMBLY__
 
@@ -423,7 +424,7 @@ extern const char *powerpc_base_platform
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
-   CPU_FTR_ICSWX | CPU_FTR_CFAR)
+   CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_POWER7)
 #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
Index: linux-powerpc/arch/powerpc/lib/copypage_power7.S
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-powerpc/arch/powerpc/lib/copypage_power7.S2011-06-17 
07:39:58.996165527 +1000
@@ -0,0 +1,70 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2011
+ *
+ * Author: Anton Blanchard 
+ */
+#include 
+#include 
+
+#define STACKFRAMESIZE 112
+
+_GLOBAL(copypage_power7)
+	mflr	r0
+	std	r3,48(r1)
+	std	r4,56(r1)
+	std	r0,16(r1)
+	stdu	r1,-STACKFRAMESIZE(r1)
+
+	bl	.enable_kernel_altivec
+
+	ld	r12,STACKFRAMESIZE+16(r1)
+	ld	r4,STACKFRAMESIZE+56(r1)
+	li	r0,(PAGE_SIZE/128)
+	li	r6,16
+	ld	r3,STACKFRAMESIZE+48(r1)
+	li	r7,32
+	li	r8,48
+	mtctr	r0
+	li	r9,64
+	li	r10,80
+	mtlr	r12
+	li	r11,96
+	li	r12,112
+	addi	r1,r1,STACKFRAMESIZE
+
+	.align	5
+1:	lvx	vr7,r0,r4
+	lvx	vr6,r4,r6
+	lvx	vr5,r4,r7
+	lvx	vr4,r4,r8
+	lvx	vr3,r4,r9
+	lvx	vr2,r4,r10
+	lvx	vr1,r4,r11
+	lvx	vr0,r4,r12
+	addi	r4,r4,128
+	stvx	vr7,r0,r3
+	stvx	vr6,r3,r6
+	stvx	vr5,r3,r7
+	stvx	vr4,r3,r8
+	stvx	vr3,r3,r9
+	stvx	vr2,r3,r10
+	stvx	vr1,r3,r11
+	stvx	vr0,r3,r12
+	addi	r3,r3,128
+	bdnz	1b
+
+	blr
Index: linux-powerpc/arch/powerpc/lib/Makefile
===
--- linux-powerpc.orig/arch/powerpc/lib/Makefile2011-05-19 
19:57:38.058570608 +1000
+++ linux-powerpc/arch/powerpc/lib/Makefile 2011-06-17 07:39:58.996165527 
+1000
@@ -16,7 +16,8 @@ obj-$(CONFIG_HAS_IOMEM)   += devres.o
 
 obj-$(CONFIG_PPC64)	+= copypage_64.o copyuser_64.o \
			   memcpy_64.o usercopy_64.o mem_64.o string.o \
-			   checksum_wrappers_64.o hweight_64.o
+			   checksum_wrappers_64.o hweight_64.o \
+			   copypage_power7.o
 obj-$(CONFIG_XMON) += sstep.o ldstfp.o
 obj-$(CONFIG_KPROBES)  += sstep.o ldstfp.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += sstep.o ldstfp.o
Index: linux-powerpc/arch/powerpc/lib/copypage_64.S
===
--- linux-powerpc.orig/arch/powerpc/lib/copypage_64.S   2011-06-06 
08:07:35.0 +1000
+++ linux-powerpc/arch/powerpc/lib/copypage_64.S2011-06-17 
07:39:58.996165527 +1000
@@ -17,7 +17,11 @@ PPC64_CACHES:
   

[PATCH 0/3] POWER7 optimised copy loops

2011-06-16 Thread Anton Blanchard
Here are POWER7 optimised versions of copy_page, memcpy and
copy_tofrom_user.

Anton

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH V1 5/7] cpuidle: (POWER) cpuidle driver for pSeries

2011-06-16 Thread Benjamin Herrenschmidt
On Tue, 2011-06-07 at 22:00 +0530, Trinabh Gupta wrote:

> +static int snooze_loop(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv,
> + int index)
> +{
> + unsigned long in_purr, out_purr;
> + ktime_t kt_before, kt_after;
> + s64 usec_delta;
> +
> + /*
> +  * Indicate to the HV that we are idle. Now would be
> +  * a good time to find other work to dispatch.
> +  */
> + get_lppaca()->idle = 1;
> + get_lppaca()->donate_dedicated_cpu = 1;
> + in_purr = mfspr(SPRN_PURR);
> +
> + kt_before = ktime_get_real();

Don't you want to timestamp before you tell the HV that you are idle ?
Or is the above stuff only polled by phyp when partition interrupts are
enabled ?

> + local_irq_enable();
> + set_thread_flag(TIF_POLLING_NRFLAG);
> + while (!need_resched()) {
> + ppc64_runlatch_off();
> + HMT_low();
> + HMT_very_low();
> + }
> + HMT_medium();
> + clear_thread_flag(TIF_POLLING_NRFLAG);
> + smp_mb();
> + local_irq_disable();
> +
> + kt_after = ktime_get_real();
> + usec_delta = ktime_to_us(ktime_sub(kt_after, kt_before));
> +
> + out_purr = mfspr(SPRN_PURR);
> + get_lppaca()->wait_state_cycles += out_purr - in_purr;
> + get_lppaca()->donate_dedicated_cpu = 0;
> + get_lppaca()->idle = 0;
> +
> + dev->last_residency = (int)usec_delta;
> +
> + return index;
> +}
> +
> +static int dedicated_cede_loop(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv,
> + int index)
> +{
> + unsigned long in_purr, out_purr;
> + ktime_t kt_before, kt_after;
> + s64 usec_delta;
> +
> + /*
> +  * Indicate to the HV that we are idle. Now would be
> +  * a good time to find other work to dispatch.
> +  */
> + get_lppaca()->idle = 1;
> + get_lppaca()->donate_dedicated_cpu = 1;
> + in_purr = mfspr(SPRN_PURR);
> +
> + kt_before = ktime_get_real();

There's a bit too much code duplication for my taste here between the
two functions. Not sure if it can be helped, maybe with some inlines
for the prolog/epilogue ... Looks like stuff that's easy to "fix" in one
place and forget the other...
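One way to address the duplication Ben points out is to pull the shared bookkeeping into a prolog/epilog pair. A minimal sketch, with the lppaca fields and the PURR SPR read stubbed out so the structure is visible; all names here are hypothetical, not the driver's:

```c
/* Stubbed stand-ins for get_lppaca() fields and mfspr(SPRN_PURR). */
struct fake_lppaca {
	unsigned long wait_state_cycles;
	unsigned char idle;
	unsigned char donate_dedicated_cpu;
};

struct fake_lppaca lp;
unsigned long purr;		/* stand-in for mfspr(SPRN_PURR) */

unsigned long idle_loop_prolog(void)
{
	/* Indicate to the HV that we are idle; dispatch other work. */
	lp.idle = 1;
	lp.donate_dedicated_cpu = 1;
	return purr;		/* in_purr */
}

void idle_loop_epilog(unsigned long in_purr)
{
	lp.wait_state_cycles += purr - in_purr;
	lp.donate_dedicated_cpu = 0;
	lp.idle = 0;
}
```

Each of snooze_loop, dedicated_cede_loop and shared_cede_loop would then only carry its distinctive middle section, so a fix lands in one place.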

> + ppc64_runlatch_off();
> + HMT_medium();
> + cede_processor();
> +
> + kt_after = ktime_get_real();
> + usec_delta = ktime_to_us(ktime_sub(kt_after, kt_before));
> +
> + out_purr = mfspr(SPRN_PURR);
> + get_lppaca()->wait_state_cycles += out_purr - in_purr;
> + get_lppaca()->donate_dedicated_cpu = 0;
> + get_lppaca()->idle = 0;
> +
> + dev->last_residency = (int)usec_delta;
> +
> + return index;
> +}
> +
> +static int shared_cede_loop(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv,
> + int index)
> +{
> + unsigned long in_purr, out_purr;
> + ktime_t kt_before, kt_after;
> + s64 usec_delta;
> +
> + /*
> +  * Indicate to the HV that we are idle. Now would be
> +  * a good time to find other work to dispatch.
> +  */
> + get_lppaca()->idle = 1;
> + get_lppaca()->donate_dedicated_cpu = 1;
> + in_purr = mfspr(SPRN_PURR);
> +
> + kt_before = ktime_get_real();
> + /*
> +  * Yield the processor to the hypervisor.  We return if
> +  * an external interrupt occurs (which are driven prior
> +  * to returning here) or if a prod occurs from another
> +  * processor. When returning here, external interrupts
> +  * are enabled.
> +  */
> + cede_processor();
> +
> + kt_after = ktime_get_real();
> +
> + usec_delta = ktime_to_us(ktime_sub(kt_after, kt_before));
> +
> + out_purr = mfspr(SPRN_PURR);
> + get_lppaca()->wait_state_cycles += out_purr - in_purr;
> + get_lppaca()->donate_dedicated_cpu = 0;
> + get_lppaca()->idle = 0;
> +
> + dev->last_residency = (int)usec_delta;
> +
> + return index;
> +}
> +
> +/*
> + * States for dedicated partition case.
> + */
> +static struct cpuidle_state dedicated_states[MAX_IDLE_STATE_COUNT] = {
> + { /* Snooze */
> + .name = "snooze",
> + .desc = "snooze",
> + .flags = CPUIDLE_FLAG_TIME_VALID,
> + .exit_latency = 0,
> + .target_residency = 0,
> + .enter = &snooze_loop },
> + { /* CEDE */
> + .name = "CEDE",
> + .desc = "CEDE",
> + .flags = CPUIDLE_FLAG_TIME_VALID,
> + .exit_latency = 1,
> + .target_residency = 10,
> + .enter = &dedicated_cede_loop },
> +};
> +
> +/*
> + * States for shared partition case.
> + */
> +static struct cpuidle_state shared_states[MAX_IDLE_STATE_COUNT] = {
> + { /* Shared Cede */
> + .name = "Shared Cede",
> + .desc = "Shared Cede",
> + .flags = CPUIDLE_FLAG_TIME_VALID,
> + .exit_latency = 0,
> + .target_res

Re: [RFC PATCH V1 4/7] cpuidle: (powerpc) Add cpu_idle_wait() to allow switching idle routines

2011-06-16 Thread Benjamin Herrenschmidt
On Tue, 2011-06-07 at 22:00 +0530, Trinabh Gupta wrote:

> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> index 39a2baa..932392b 100644
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -102,6 +102,24 @@ void cpu_idle(void)
>   }
>  }
>  
> +static void do_nothing(void *unused)
> +{
> +}
> +
> +/*
> + * cpu_idle_wait - Used to ensure that all the CPUs come out of the old
> + * idle loop and start using the new idle loop.
> + * Required while changing idle handler on SMP systems.
> + * Caller must have changed idle handler to the new value before the call.
> + */
> +void cpu_idle_wait(void)
> +{
> + smp_mb();
> + /* kick all the CPUs so that they exit out of old idle routine */
> + smp_call_function(do_nothing, NULL, 1);
> +}
> +EXPORT_SYMBOL_GPL(cpu_idle_wait);
> +
>  int powersave_nap;
>  
>  #ifdef CONFIG_SYSCTL

This is gross :-)

Do you absolutely need to ensure the idle task has changed, or is
just kicking it with a send reschedule enough ?

Cheers,
Ben.




Re: [RFC PATCH V1 1/7] cpuidle: create bootparam "cpuidle.off=1"

2011-06-16 Thread Benjamin Herrenschmidt
On Tue, 2011-06-07 at 21:59 +0530, Trinabh Gupta wrote:
> From: Len Brown 
> 
> useful for disabling cpuidle to fall back
> to architecture-default idle loop
> 
> cpuidle drivers and governors will fail to register.
> on x86 they'll say so:
> 
> intel_idle: intel_idle yielding to (null)
> ACPI: acpi_idle yielding to (null)
> 
> Signed-off-by: Len Brown 
> ---

When you carry over somebody's patch like this you need to also add your
own signed-off-by.

Have those generic changes been reviewed by whoever is in charge of that
cpuidle framework ?

Cheers,
Ben.

>  Documentation/kernel-parameters.txt |3 +++
>  drivers/cpuidle/cpuidle.c   |   10 ++
>  drivers/cpuidle/cpuidle.h   |1 +
>  drivers/cpuidle/driver.c|3 +++
>  drivers/cpuidle/governor.c  |3 +++
>  5 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index d9a203b..5697faf 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -546,6 +546,9 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
>   /proc//coredump_filter.
>   See also Documentation/filesystems/proc.txt.
>  
> + cpuidle.off=1   [CPU_IDLE]
> + disable the cpuidle sub-system
> +
>   cpcihp_generic= [HW,PCI] Generic port I/O CompactPCI driver
>   Format:
>   ,,,[,]
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 406be83..a171b9e 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -28,6 +28,12 @@ LIST_HEAD(cpuidle_detected_devices);
>  static void (*pm_idle_old)(void);
>  
>  static int enabled_devices;
> +static int off __read_mostly;
> +
> +int cpuidle_disabled(void)
> +{
> + return off;
> +}
>  
>  #if defined(CONFIG_ARCH_HAS_CPU_IDLE_WAIT)
>  static void cpuidle_kick_cpus(void)
> @@ -397,6 +403,9 @@ static int __init cpuidle_init(void)
>  {
>   int ret;
>  
> + if (cpuidle_disabled())
> + return -ENODEV;
> +
>   pm_idle_old = pm_idle;
>  
>   ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
> @@ -408,4 +417,5 @@ static int __init cpuidle_init(void)
>   return 0;
>  }
>  
> +module_param(off, int, 0444);
>  core_initcall(cpuidle_init);
> diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
> index 33e50d5..38c3fd8 100644
> --- a/drivers/cpuidle/cpuidle.h
> +++ b/drivers/cpuidle/cpuidle.h
> @@ -13,6 +13,7 @@ extern struct list_head cpuidle_governors;
>  extern struct list_head cpuidle_detected_devices;
>  extern struct mutex cpuidle_lock;
>  extern spinlock_t cpuidle_driver_lock;
> +extern int cpuidle_disabled(void);
>  
>  /* idle loop */
>  extern void cpuidle_install_idle_handler(void);
> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
> index 33e3189..284d7af 100644
> --- a/drivers/cpuidle/driver.c
> +++ b/drivers/cpuidle/driver.c
> @@ -50,6 +50,9 @@ int cpuidle_register_driver(struct cpuidle_driver *drv)
>   if (!drv)
>   return -EINVAL;
>  
> + if (cpuidle_disabled())
> + return -ENODEV;
> +
>   spin_lock(&cpuidle_driver_lock);
>   if (cpuidle_curr_driver) {
>   spin_unlock(&cpuidle_driver_lock);
> diff --git a/drivers/cpuidle/governor.c b/drivers/cpuidle/governor.c
> index 724c164..ea2f8e7 100644
> --- a/drivers/cpuidle/governor.c
> +++ b/drivers/cpuidle/governor.c
> @@ -81,6 +81,9 @@ int cpuidle_register_governor(struct cpuidle_governor *gov)
>   if (!gov || !gov->select)
>   return -EINVAL;
>  
> + if (cpuidle_disabled())
> + return -ENODEV;
> +
>   mutex_lock(&cpuidle_lock);
>   if (__cpuidle_find_governor(gov->name) == NULL) {
>   ret = 0;
> 




Re: [PATCH] perf_events: Enable idle state tracing for pseries (ppc64)

2011-06-16 Thread Benjamin Herrenschmidt
On Wed, 2011-06-01 at 18:05 +0530, Deepthi Dharwar wrote:
> Hi,
> 
> Please find below a patch, which has perf_events added for pseries (ppc64)
> platform in order to emit the trace required for perf timechart. 
> It essentially enables perf timechart for the pseries platform to analyse
> power savings events like cpuidle states.

Unless I'm mistaken, you added traces to dedicated CPU idle sleep but
not to shared processor. Any reason ?

Also I don't really know that tracing stuff, but what's the point of
having start/end _and_ trace_cpu_idle if you're going to always start &
end around a single occurrence of trace_cpu_idle ?

Wouldn't there be a way to start/end and then trace the snooze and
subsequent cede within the same start/end section or that makes no
sense ?

Also would there be any interest in doing the tracing more generically
in idle.c ?

Cheers,
Ben.



Re: [PATCH] Add cpufreq driver for Momentum Maple boards

2011-06-16 Thread Benjamin Herrenschmidt
On Sat, 2011-05-21 at 14:28 +0400, Dmitry Eremin-Solenikov wrote:
> Add simple cpufreq driver for Maple-based boards (ppc970fx evaluation
> kit and others). Driver is based on a cpufreq driver for 64-bit powermac
> boxes with all pmac-dependant features removed and simple cleanup
> applied.

No special comment other than please replace all the g5_* with maple_
for consistency.

Cheers,
Ben.

> Signed-off-by: Dmitry Eremin-Solenikov 
> ---
>  arch/powerpc/kernel/misc_64.S  |4 +-
>  arch/powerpc/platforms/Kconfig |8 +
>  arch/powerpc/platforms/maple/Makefile  |1 +
>  arch/powerpc/platforms/maple/cpufreq.c |  317 
> 
>  4 files changed, 328 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/platforms/maple/cpufreq.c
> 
> diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> index 206a321..c442aae 100644
> --- a/arch/powerpc/kernel/misc_64.S
> +++ b/arch/powerpc/kernel/misc_64.S
> @@ -339,7 +339,7 @@ _GLOBAL(real_205_writeb)
>  #endif /* CONFIG_PPC_PASEMI */
>  
> 
> -#ifdef CONFIG_CPU_FREQ_PMAC64
> +#if defined(CONFIG_CPU_FREQ_PMAC64) || defined(CONFIG_CPU_FREQ_MAPLE)
>  /*
>   * SCOM access functions for 970 (FX only for now)
>   *
> @@ -408,7 +408,7 @@ _GLOBAL(scom970_write)
>   /* restore interrupts */
>   mtmsrd  r5,1
>   blr
> -#endif /* CONFIG_CPU_FREQ_PMAC64 */
> +#endif /* CONFIG_CPU_FREQ_PMAC64 || CONFIG_CPU_FREQ_MAPLE */
>  
> 
>  /*
> diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
> index f7b0772..4c5eb5b 100644
> --- a/arch/powerpc/platforms/Kconfig
> +++ b/arch/powerpc/platforms/Kconfig
> @@ -187,6 +187,14 @@ config PPC_PASEMI_CPUFREQ
> This adds the support for frequency switching on PA Semi
> PWRficient processors.
>  
> +config CPU_FREQ_MAPLE
> + bool "Support for Maple 970FX Evaluation Board"
> + depends on PPC_MAPLE
> + select CPU_FREQ_TABLE
> + help
> +   This adds support for frequency switching on Maple 970FX
> +   Evaluation Board and compatible boards (IBM JS2x blades).
> +
>  endmenu
>  
>  config PPC601_SYNC_FIX
> diff --git a/arch/powerpc/platforms/maple/Makefile 
> b/arch/powerpc/platforms/maple/Makefile
> index 1be1a99..0b3e3e3 100644
> --- a/arch/powerpc/platforms/maple/Makefile
> +++ b/arch/powerpc/platforms/maple/Makefile
> @@ -1 +1,2 @@
>  obj-y+= setup.o pci.o time.o
> +obj-$(CONFIG_CPU_FREQ_MAPLE) += cpufreq.o
> diff --git a/arch/powerpc/platforms/maple/cpufreq.c 
> b/arch/powerpc/platforms/maple/cpufreq.c
> new file mode 100644
> index 000..854adfa
> --- /dev/null
> +++ b/arch/powerpc/platforms/maple/cpufreq.c
> @@ -0,0 +1,317 @@
> +/*
> + *  Copyright (C) 2011 Dmitry Eremin-Solenikov
> + *  Copyright (C) 2002 - 2005 Benjamin Herrenschmidt 
> 
> + *  and   Markus Demleitner 
> 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This driver adds basic cpufreq support for SMU & 970FX based G5 Macs,
> + * that is iMac G5 and latest single CPU desktop.
> + */
> +
> +#undef DEBUG
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DBG(fmt...) pr_debug(fmt)
> +
> +/* see 970FX user manual */
> +
> +#define SCOM_PCR 0x0aa001/* PCR scom addr */
> +
> +#define PCR_HILO_SELECT  0x8000U /* 1 = PCR, 0 = PCRH */
> +#define PCR_SPEED_FULL   0xU /* 1:1 speed value */
> +#define PCR_SPEED_HALF   0x0002U /* 1:2 speed value */
> +#define PCR_SPEED_QUARTER0x0004U /* 1:4 speed value */
> +#define PCR_SPEED_MASK   0x000eU /* speed mask */
> +#define PCR_SPEED_SHIFT  17
> +#define PCR_FREQ_REQ_VALID   0x0001U /* freq request valid */
> +#define PCR_VOLT_REQ_VALID   0x8000U /* volt request valid */
> +#define PCR_TARGET_TIME_MASK 0x6000U /* target time */
> +#define PCR_STATLAT_MASK 0x1f00U /* STATLAT value */
> +#define PCR_SNOOPLAT_MASK0x00f0U /* SNOOPLAT value */
> +#define PCR_SNOOPACC_MASK0x000fU /* SNOOPACC value */
> +
> +#define SCOM_PSR 0x408001/* PSR scom addr */
> +/* warning: PSR is a 64 bits register */
> +#define PSR_CMD_RECEIVED 0x2000U   /* command received */
> +#define PSR_CMD_COMPLETED0x1000U   /* command completed */
> +#define PSR_CUR_SPEED_MASK   0x0300U   /* current speed */
> +#define PSR_CUR_SPEED_SHIFT  (56)
> +
> +/*
> + * The G5 only supports two frequencies (Quarter speed is not supported)
> + */
> +#define CPUFREQ_HIGH  0
> +#define CPUFREQ_LOW   1
> 

Re: [PATCH] powerpc/book3e-64: use a separate TLB handler when linear map is bolted

2011-06-16 Thread Benjamin Herrenschmidt
On Fri, 2011-06-03 at 17:12 -0500, Scott Wood wrote:
> On MMUs such as FSL where we can guarantee the entire linear mapping is
> bolted, we don't need to worry about linear TLB misses.  If on top of
> that we do a full table walk, we get rid of all recursive TLB faults, and
> can dispense with some state saving.  This gains a few percent on
> TLB-miss-heavy workloads, and around 50% on a benchmark that had a high
> rate of virtual page table faults under the normal handler.
> 
> While touching the EX_TLB layout, remove EX_TLB_MMUCR0, EX_TLB_SRR0, and
> EX_TLB_SRR1 as they're not used.
> 
> Signed-off-by: Scott Wood 
> ---
> This turned out to be a little faster than the virtual pmd approach
> on the sort benchmark as well as lmbench's lat_mem_rd with page stride.
> 
> It's slightly slower than virtual pmd (around 1%), but still faster than
> current code, on linear tests such as lmbench's bw_mem cp.

Does this completely replace your previous series of 7 patches ? (IE.
Should I ditch them in patchwork ?) Or does it apply on top of them ?

Some comments inline...

>  #define SET_IVOR(vector_number, vector_offset)   \
> diff --git a/arch/powerpc/include/asm/mmu_context.h 
> b/arch/powerpc/include/asm/mmu_context.h
> index a73668a..9d9e444 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -54,6 +54,7 @@ static inline void switch_mm(struct mm_struct *prev, struct 
> mm_struct *next,
>   /* 64-bit Book3E keeps track of current PGD in the PACA */
>  #ifdef CONFIG_PPC_BOOK3E_64
>   get_paca()->pgd = next->pgd;
> + get_paca()->extlb[0][EX_TLB_PGD / 8] = (unsigned long)next->pgd;
>  #endif
>   /* Nothing else to do if we aren't actually switching */
>   if (prev == next)
> @@ -110,6 +111,7 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
>   /* 64-bit Book3E keeps track of current PGD in the PACA */
>  #ifdef CONFIG_PPC_BOOK3E_64
>   get_paca()->pgd = NULL;
> + get_paca()->extlb[0][EX_TLB_PGD / 8] = 0;
>  #endif
>  }

Why do you keep a copy of the pgd there since it's in the PACA already
and you have r13 setup in your handlers ?

> diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
> index af08922..0f4ab86 100644
> --- a/arch/powerpc/mm/tlb_low_64e.S
> +++ b/arch/powerpc/mm/tlb_low_64e.S
> @@ -30,6 +30,212 @@
>  #define VPTE_PGD_SHIFT   (VPTE_PUD_SHIFT + PUD_INDEX_SIZE)
>  #define VPTE_INDEX_SIZE (VPTE_PGD_SHIFT + PGD_INDEX_SIZE)
>  
> +/**
> + **
> + * TLB miss handling for Book3E with a bolted linear mapping  *
> + * No virtual page table, no nested TLB misses*
> + **
> + **/
> +
> +.macro tlb_prolog_bolted addr
> + mtspr   SPRN_SPRG_TLB_SCRATCH,r13
> + mfspr   r13,SPRN_SPRG_PACA
> + std r10,PACA_EXTLB+EX_TLB_R10(r13)
> + mfcrr10
> + std r11,PACA_EXTLB+EX_TLB_R11(r13)
> + mfspr   r11,SPRN_SPRG_TLB_SCRATCH

Do you need that ? Can't you leave r13 in scratch the whole way and
just pop it out in the error case when branching to DSI/ISI ? The only
thing is that TLB_SCRATCH needs to be saved/restored by
crit/debug/mcheck, but that's worth it to save cycles in the TLB miss
handler, no ?

> + std r16,PACA_EXTLB+EX_TLB_R16(r13)
> + mfspr   r16,\addr   /* get faulting address */
> + std r14,PACA_EXTLB+EX_TLB_R14(r13)
> + ld  r14,PACA_EXTLB+EX_TLB_PGD(r13)

Why not get PGD from paca ?

> + std r15,PACA_EXTLB+EX_TLB_R15(r13)
> + std r10,PACA_EXTLB+EX_TLB_CR(r13)
> + std r11,PACA_EXTLB+EX_TLB_R13(r13)
> + TLB_MISS_PROLOG_STATS_BOLTED
> +.endm
> +
> +.macro tlb_epilog_bolted
> + ld  r14,PACA_EXTLB+EX_TLB_CR(r13)
> + ld  r10,PACA_EXTLB+EX_TLB_R10(r13)
> + ld  r11,PACA_EXTLB+EX_TLB_R11(r13)
> + mtcrr14
> + ld  r14,PACA_EXTLB+EX_TLB_R14(r13)
> + ld  r15,PACA_EXTLB+EX_TLB_R15(r13)
> + TLB_MISS_RESTORE_STATS_BOLTED
> + ld  r16,PACA_EXTLB+EX_TLB_R16(r13)
> + ld  r13,PACA_EXTLB+EX_TLB_R13(r13)
> +.endm
> +
> +/* Data TLB miss */
> + START_EXCEPTION(data_tlb_miss_bolted)
> + tlb_prolog_bolted SPRN_DEAR
> +
> + /* We need _PAGE_PRESENT and  _PAGE_ACCESSED set */
> +
> + /* We do the user/kernel test for the PID here along with the RW test
> +  */
> + /* We pre-test some combination of permissions to avoid double
> +  * faults:
> +  *
> +  * We move the ESR:ST bit into the position of _PAGE_BAP_SW in the PTE
> +  * ESR_ST   is 0x0080
> +  * _PAGE_BAP_SW is 0x0010
> +  * So the shift is >> 19. This tests for supervisor writeability.
> +  * If the page happens to be supervisor writeable and n

[PATCH 4/5] powerpc/pseries: Re-implement HVSI as part of hvc_vio

2011-06-16 Thread Benjamin Herrenschmidt
On pseries machines, consoles are provided by the hypervisor using
a low level get_chars/put_chars type interface. However, this is
really just a transport to the service processor which implements
them either as "raw" console (networked consoles, HMC, ...) or as
"hvsi" serial ports.

The later is a simple packet protocol on top of the raw character
interface that is supposed to convey additional "serial port" style
semantics. In practice however, all it does is provide a way to
read the CD line and set/clear our DTR line, that's it.
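The only "serial port" semantics HVSI conveys in practice can be sketched as a tiocmget/tiocmset pair. A sketch only: TIOCM_DTR and TIOCM_CAR are the standard termios modem-control bits, and the state variable stands in for the real link to the service processor:

```c
/* Standard termios modem bits (values from the usual ioctl definitions). */
#define TIOCM_DTR 0x002
#define TIOCM_CAR 0x040			/* carrier detect (CD) */

int hvsi_mctrl;				/* stand-in modem-control state */

int hvsi_tiocmget(void)
{
	return hvsi_mctrl;		/* CD is visible here when the FSP raises it */
}

void hvsi_tiocmset(int set, int clear)
{
	/* Only DTR is writable from our side; CD is reported by the FSP. */
	hvsi_mctrl = (hvsi_mctrl | (set & TIOCM_DTR)) & ~(clear & TIOCM_DTR);
}
```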

We currently implement the "raw" protocol as an hvc console backend
(/dev/hvcN) and the "hvsi" protocol using a separate tty driver
(/dev/hvsi0).

However, this is quite impractical. The arbitrary difference between
the two types of devices has been a major source of user (and distro)
confusion. Additionally, there's a separate mini-HVSI implementation
in the pseries platform code for our low level debug console and early
boot kernel messages, which means code duplication, though that low
level variant is impractical as it's incapable of doing the initial
protocol negotiation to establish the link to the FSP.

This essentially replaces the dedicated hvsi driver and the platform
udbg code completely by extending the existing hvc_vio backend used
in "raw" mode so that:

 - It now supports HVSI as well
 - We add support for hvc backend providing tiocm{get,set}
 - It also provides a udbg interface for early debug and boot console

This is overall less code, though this will only be obvious once we
remove the old "hvsi" driver, which is still available for now. When
the old driver is enabled, the new code still kicks in for the low
level udbg console, replacing the old mini implementation in the
platform code; it just doesn't provide the higher level "hvc" interface.

In addition to producing generally simpler code, this has several
benefits over our current situation:

 - The user/distro only has to deal with /dev/hvcN for the hypervisor
console, avoiding all sort of confusion that has plagued us in the past

 - The tty, kernel and low level debug console all use the same code
base which supports the full protocol establishment process, thus the
console is now available much earlier than it used to be with the
old HVSI driver. The kernel console works much earlier and udbg is
available much earlier too. Hackers can enable a hard coded very-early
debug console as well that works with HVSI (previously that was only
supported for the "raw" mode).

I've tried to keep the same semantics as hvsi relative to how I react
to things like CD changes, with some subtle differences though:

 - I clear DTR on close if HUPCL is set

 - Current hvsi triggers a hangup if it detects an up->down transition
   on CD (you can still open a console with CD down). My new implementation
   triggers a hangup if the link to the FSP is severed, and severs it
   upon detecting an up->down transition on CD.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/Kconfig.debug   |   15 +
 arch/powerpc/include/asm/udbg.h  |1 +
 arch/powerpc/kernel/udbg.c   |3 +
 arch/powerpc/platforms/pseries/lpar.c|  189 
 arch/powerpc/platforms/pseries/pseries.h |3 +-
 arch/powerpc/platforms/pseries/setup.c   |5 +-
 drivers/tty/hvc/Kconfig  |5 +
 drivers/tty/hvc/Makefile |3 +-
 drivers/tty/hvc/hvc_console.c|   23 +-
 drivers/tty/hvc/hvc_console.h|4 +
 drivers/tty/hvc/hvc_vio.c|  725
--
 11 files changed, 749 insertions(+), 227 deletions(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index e72dcf6..067cb84 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -167,6 +167,13 @@ config PPC_EARLY_DEBUG_LPAR
  Select this to enable early debugging for a machine with a HVC
  console on vterm 0.
 
+config PPC_EARLY_DEBUG_LPAR_HVSI
+   bool "LPAR HVSI Console"
+   depends on PPC_PSERIES
+   help
+ Select this to enable early debugging for a machine with a HVSI
+ console on a specified vterm.
+
 config PPC_EARLY_DEBUG_G5
bool "Apple G5"
depends on PPC_PMAC64
@@ -253,6 +260,14 @@ config PPC_EARLY_DEBUG_WSP
 
 endchoice
 
+config PPC_EARLY_DEBUG_HVSI_VTERMNO
+   hex "vterm number to use with early debug HVSI"
+   depends on PPC_EARLY_DEBUG_LPAR_HVSI
+   default "0x3000"
+   help
+ You probably want 0x3000 for your first serial port and
+ 0x3001 for your second one
+
 config PPC_EARLY_DEBUG_44x_PHYSLOW
hex "Low 32 bits of early debug UART physical address"
depends on PPC_EARLY_DEBUG_44x
diff --git a/arch/powerpc/include/asm/udbg.h
b/arch/powerpc/include/asm/udbg.h
index 58580e9..93e05d1 100644
--- a/arch/powerpc/include/asm/udbg.h
+++ b/arch/powerpc/include/asm/udbg.h
@@ -40,6 +40,7 @@ extern void udbg_a

[PATCH 3/5] powerpc/udbg: Register udbg console generically

2011-06-16 Thread Benjamin Herrenschmidt
When CONFIG_PPC_EARLY_DEBUG is set, call register_early_udbg_console()
early from generic code.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/kernel/udbg.c|2 ++
 arch/powerpc/platforms/pseries/lpar.c |2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/udbg.c b/arch/powerpc/kernel/udbg.c
index 23d65ab..a57e61e 100644
--- a/arch/powerpc/kernel/udbg.c
+++ b/arch/powerpc/kernel/udbg.c
@@ -68,6 +68,8 @@ void __init udbg_early_init(void)
 
 #ifdef CONFIG_PPC_EARLY_DEBUG
console_loglevel = 10;
+
+   register_early_udbg_console();
 #endif
 }
 
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index 39e6e0a..e3a96c4 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -193,8 +193,6 @@ void __init udbg_init_debug_lpar(void)
udbg_putc = udbg_putcLP;
udbg_getc = udbg_getcLP;
udbg_getc_poll = udbg_getc_pollLP;
-
-   register_early_udbg_console();
 }
 
 /* returns 0 if couldn't find or use /chosen/stdout as console */




[PATCH 5/5] powerpc/pseries: Move hvsi support into a library

2011-06-16 Thread Benjamin Herrenschmidt
This will allow a different backend to share it.
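For context, a minimal sketch of the callback-registration pattern the library uses: the backend hands its transport functions to an init routine and the protocol code never talks to the hypervisor directly. The `demo_` names below are illustrative, not the real hvsi_lib symbols; only the field names mirror the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for the hvsi_priv callback slots in the new header;
 * field names mirror the patch, everything else is illustrative. */
struct demo_hvsi_priv {
	int (*get_chars)(uint32_t termno, char *buf, int count);
	int (*put_chars)(uint32_t termno, const char *buf, int count);
	uint32_t termno;
	unsigned int is_console:1;
};

/* Hypothetical backend transport stubs: a real backend would wrap
 * hvc_get_chars()/hvc_put_chars() or another low-level channel. */
static int demo_get_chars(uint32_t termno, char *buf, int count)
{
	(void)termno;
	if (count > 0) {
		buf[0] = 'x';	/* pretend one byte arrived */
		return 1;
	}
	return 0;
}

static int demo_put_chars(uint32_t termno, const char *buf, int count)
{
	(void)termno; (void)buf;
	return count;	/* pretend everything was written */
}

/* Mirrors the shape of hvsi_init(): stash termno and the two transport
 * callbacks so the shared protocol code is backend-agnostic. */
static void demo_hvsi_init(struct demo_hvsi_priv *pv,
			   int (*get)(uint32_t, char *, int),
			   int (*put)(uint32_t, const char *, int),
			   int termno, int is_console)
{
	memset(pv, 0, sizeof(*pv));
	pv->get_chars = get;
	pv->put_chars = put;
	pv->termno = termno;
	pv->is_console = is_console ? 1 : 0;
}
```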

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/hvsi.h |   34 +++
 drivers/tty/hvc/Makefile|2 +-
 drivers/tty/hvc/hvc_vio.c   |  405 ++---
 drivers/tty/hvc/hvsi_lib.c  |  426 +++
 4 files changed, 482 insertions(+), 385 deletions(-)
 create mode 100644 drivers/tty/hvc/hvsi_lib.c

diff --git a/arch/powerpc/include/asm/hvsi.h b/arch/powerpc/include/asm/hvsi.h
index ab2ddd7..91e0453 100644
--- a/arch/powerpc/include/asm/hvsi.h
+++ b/arch/powerpc/include/asm/hvsi.h
@@ -56,5 +56,39 @@ struct hvsi_query_response {
} u;
 } __attribute__((packed));
 
+/* hvsi lib struct definitions */
+#define HVSI_INBUF_SIZE255
+struct tty_struct;
+struct hvsi_priv {
+   unsigned intinbuf_len;  /* data in input buffer */
+   unsigned char   inbuf[HVSI_INBUF_SIZE];
+   unsigned intinbuf_cur;  /* Cursor in input buffer */
+   unsigned intinbuf_pktlen;   /* packet length from cursor */
+   atomic_tseqno;  /* packet sequence number */
+   unsigned intopened:1;   /* driver opened */
+   unsigned intestablished:1;  /* protocol established */
+   unsigned intis_console:1;   /* used as a kernel console device */
+   unsigned intmctrl_update:1; /* modem control updated */
+   unsigned short  mctrl;  /* modem control */
+   struct tty_struct *tty; /* tty structure */
+   int (*get_chars)(uint32_t termno, char *buf, int count);
+   int (*put_chars)(uint32_t termno, const char *buf, int count);
+   uint32_ttermno;
+};
+
+/* hvsi lib functions */
+struct hvc_struct;
+extern void hvsi_init(struct hvsi_priv *pv,
+ int (*get_chars)(uint32_t termno, char *buf, int count),
+ int (*put_chars)(uint32_t termno, const char *buf,
+  int count),
+ int termno, int is_console);
+extern int hvsi_open(struct hvsi_priv *pv, struct hvc_struct *hp);
+extern void hvsi_close(struct hvsi_priv *pv, struct hvc_struct *hp);
+extern int hvsi_read_mctrl(struct hvsi_priv *pv);
+extern int hvsi_write_mctrl(struct hvsi_priv *pv, int dtr);
+extern void hvsi_establish(struct hvsi_priv *pv);
+extern int hvsi_get_chars(struct hvsi_priv *pv, char *buf, int count);
+extern int hvsi_put_chars(struct hvsi_priv *pv, const char *buf, int count);
 
 #endif /* _HVSI_H */
diff --git a/drivers/tty/hvc/Makefile b/drivers/tty/hvc/Makefile
index 69a444b..e292053 100644
--- a/drivers/tty/hvc/Makefile
+++ b/drivers/tty/hvc/Makefile
@@ -1,4 +1,4 @@
-obj-$(CONFIG_HVC_CONSOLE)  += hvc_vio.o
+obj-$(CONFIG_HVC_CONSOLE)  += hvc_vio.o hvsi_lib.o
 obj-$(CONFIG_HVC_OLD_HVSI) += hvsi.o
 obj-$(CONFIG_HVC_ISERIES)  += hvc_iseries.o
 obj-$(CONFIG_HVC_RTAS) += hvc_rtas.o
diff --git a/drivers/tty/hvc/hvc_vio.c b/drivers/tty/hvc/hvc_vio.c
index d4e0850..ade73fa 100644
--- a/drivers/tty/hvc/hvc_vio.c
+++ b/drivers/tty/hvc/hvc_vio.c
@@ -67,22 +67,10 @@ typedef enum hv_protocol {
HV_PROTOCOL_HVSI
 } hv_protocol_t;
 
-#define HV_INBUF_SIZE  255
-
 struct hvterm_priv {
-   u32 termno; /* HV term number */
-   hv_protocol_t   proto;  /* Raw data or HVSI packets */
-   unsigned intinbuf_len;  /* Data in input buffer */
-   unsigned char   inbuf[HV_INBUF_SIZE];
-   unsigned intinbuf_cur;  /* Cursor in input buffer */
-   unsigned intinbuf_pktlen;   /* HVSI packet lenght from cursor */
-   atomic_tseqno;  /* HVSI packet sequence number */
-   unsigned intopened:1;   /* HVSI driver opened */
-   unsigned intestablished:1;  /* HVSI protocol established */
-   unsigned intis_console:1;   /* Used as a kernel console device */
-   unsigned intmctrl_update:1; /* HVSI modem control updated */
-   unsigned short  mctrl;  /* HVSI modem control */
-   struct tty_struct *tty; /* TTY structure */
+   u32 termno; /* HV term number */
+   hv_protocol_t   proto;  /* Raw data or HVSI packets */
+   struct hvsi_privhvsi;   /* HVSI specific data */
 };
 static struct hvterm_priv *hvterm_privs[MAX_NR_HVC_CONSOLES];
 
@@ -139,348 +127,24 @@ static const struct hv_ops hvterm_raw_ops = {
.notifier_hangup = notifier_hangup_irq,
 };
 
-static int hvterm_hvsi_send_packet(struct hvterm_priv *pv, struct hvsi_header 
*packet)
-{
-   packet->seqno = atomic_inc_return(&pv->seqno);
-
-   /* Assumes that always succeeds, works in practice */
-   return hvc_put_chars(pv->termno, (char *)packet, packet->len);
-}
-
-static void hvterm_hvsi_start_handshake(struct hvterm_priv *pv)
-{
-   struct hvsi_query q;
-
-   /* Reset state */
-   pv->established = 0;
-   atomic_set(&p

[PATCH 2/5] powerpc/pseries: Factor HVSI header struct in packet definitions

2011-06-16 Thread Benjamin Herrenschmidt
Embed the struct hvsi_header in the various packet definitions
rather than open coding it multiple times. Will help provide
stronger type checking.
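As an illustration of the stronger type checking: once the common header is embedded, a helper can take `struct hvsi_header *` and be shared by all packet types, and passing anything else becomes a compile error rather than a silent reinterpretation of the first four bytes. The sketch below is illustrative, not driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Common header, embedded in each packet type as in the patch. */
struct hvsi_header {
	uint8_t  type;
	uint8_t  len;
	uint16_t seqno;
} __attribute__((packed));

struct hvsi_query {
	struct hvsi_header hdr;
	uint16_t verb;
} __attribute__((packed));

/* Shared helper: callers pass &pkt.hdr, and the compiler enforces that
 * the argument really is an hvsi_header. */
static void hvsi_fill_header(struct hvsi_header *hdr, uint8_t type,
			     uint8_t len, uint16_t seqno)
{
	hdr->type = type;
	hdr->len = len;
	hdr->seqno = seqno;
}
```

Because the struct is packed, embedding the header does not change the wire layout: `verb` still starts right after the 4-byte header.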

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/hvsi.h |   16 ++---
 drivers/tty/hvc/hvsi.c  |   66 +++---
 2 files changed, 37 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/hvsi.h b/arch/powerpc/include/asm/hvsi.h
index f13125a..ab2ddd7 100644
--- a/arch/powerpc/include/asm/hvsi.h
+++ b/arch/powerpc/include/asm/hvsi.h
@@ -29,16 +29,12 @@ struct hvsi_header {
 } __attribute__((packed));
 
 struct hvsi_data {
-   uint8_t  type;
-   uint8_t  len;
-   uint16_t seqno;
+   struct hvsi_header hdr;
uint8_t  data[HVSI_MAX_OUTGOING_DATA];
 } __attribute__((packed));
 
 struct hvsi_control {
-   uint8_t  type;
-   uint8_t  len;
-   uint16_t seqno;
+   struct hvsi_header hdr;
uint16_t verb;
/* optional depending on verb: */
uint32_t word;
@@ -46,16 +42,12 @@ struct hvsi_control {
 } __attribute__((packed));
 
 struct hvsi_query {
-   uint8_t  type;
-   uint8_t  len;
-   uint16_t seqno;
+   struct hvsi_header hdr;
uint16_t verb;
 } __attribute__((packed));
 
 struct hvsi_query_response {
-   uint8_t  type;
-   uint8_t  len;
-   uint16_t seqno;
+   struct hvsi_header hdr;
uint16_t verb;
uint16_t query_seqno;
union {
diff --git a/drivers/tty/hvc/hvsi.c b/drivers/tty/hvc/hvsi.c
index 0b35793..c94e2f5 100644
--- a/drivers/tty/hvc/hvsi.c
+++ b/drivers/tty/hvc/hvsi.c
@@ -295,18 +295,18 @@ static int hvsi_version_respond(struct hvsi_struct *hp, 
uint16_t query_seqno)
struct hvsi_query_response packet __ALIGNED__;
int wrote;
 
-   packet.type = VS_QUERY_RESPONSE_PACKET_HEADER;
-   packet.len = sizeof(struct hvsi_query_response);
-   packet.seqno = atomic_inc_return(&hp->seqno);
+   packet.hdr.type = VS_QUERY_RESPONSE_PACKET_HEADER;
+   packet.hdr.len = sizeof(struct hvsi_query_response);
+   packet.hdr.seqno = atomic_inc_return(&hp->seqno);
packet.verb = VSV_SEND_VERSION_NUMBER;
packet.u.version = HVSI_VERSION;
packet.query_seqno = query_seqno+1;
 
-   pr_debug("%s: sending %i bytes\n", __func__, packet.len);
-   dbg_dump_hex((uint8_t*)&packet, packet.len);
+   pr_debug("%s: sending %i bytes\n", __func__, packet.hdr.len);
+   dbg_dump_hex((uint8_t*)&packet, packet.hdr.len);
 
-   wrote = hvc_put_chars(hp->vtermno, (char *)&packet, packet.len);
-   if (wrote != packet.len) {
+   wrote = hvc_put_chars(hp->vtermno, (char *)&packet, packet.hdr.len);
+   if (wrote != packet.hdr.len) {
printk(KERN_ERR "hvsi%i: couldn't send query response!\n",
hp->index);
return -EIO;
@@ -321,7 +321,7 @@ static void hvsi_recv_query(struct hvsi_struct *hp, uint8_t 
*packet)
 
switch (hp->state) {
case HVSI_WAIT_FOR_VER_QUERY:
-   hvsi_version_respond(hp, query->seqno);
+   hvsi_version_respond(hp, query->hdr.seqno);
__set_state(hp, HVSI_OPEN);
break;
default:
@@ -579,16 +579,16 @@ static int hvsi_query(struct hvsi_struct *hp, uint16_t 
verb)
struct hvsi_query packet __ALIGNED__;
int wrote;
 
-   packet.type = VS_QUERY_PACKET_HEADER;
-   packet.len = sizeof(struct hvsi_query);
-   packet.seqno = atomic_inc_return(&hp->seqno);
+   packet.hdr.type = VS_QUERY_PACKET_HEADER;
+   packet.hdr.len = sizeof(struct hvsi_query);
+   packet.hdr.seqno = atomic_inc_return(&hp->seqno);
packet.verb = verb;
 
-   pr_debug("%s: sending %i bytes\n", __func__, packet.len);
-   dbg_dump_hex((uint8_t*)&packet, packet.len);
+   pr_debug("%s: sending %i bytes\n", __func__, packet.hdr.len);
+   dbg_dump_hex((uint8_t*)&packet, packet.hdr.len);
 
-   wrote = hvc_put_chars(hp->vtermno, (char *)&packet, packet.len);
-   if (wrote != packet.len) {
+   wrote = hvc_put_chars(hp->vtermno, (char *)&packet, packet.hdr.len);
+   if (wrote != packet.hdr.len) {
printk(KERN_ERR "hvsi%i: couldn't send query (%i)!\n", 
hp->index,
wrote);
return -EIO;
@@ -622,20 +622,20 @@ static int hvsi_set_mctrl(struct hvsi_struct *hp, 
uint16_t mctrl)
struct hvsi_control packet __ALIGNED__;
int wrote;
 
-   packet.type = VS_CONTROL_PACKET_HEADER,
-   packet.seqno = atomic_inc_return(&hp->seqno);
-   packet.len = sizeof(struct hvsi_control);
+   packet.hdr.type = VS_CONTROL_PACKET_HEADER,
+   packet.hdr.seqno = atomic_inc_return(&hp->seqno);
+   packet.hdr.len = sizeof(struct hvsi_control);
packet.verb = VSV_SET_MODEM_CTL;
packet.mask = HVSI_TSDTR;
 
if (mctrl & TIOCM_DTR)
  

[PATCH 1/5] powerpc/hvsi: Move HVSI protocol definitions to a header file

2011-06-16 Thread Benjamin Herrenschmidt
This moves various HVSI protocol definitions from the hvsi.c
driver to a header file that can be used later on by a udbg
implementation.
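As an aside, the packed layout of these definitions can be sanity-checked with plain C. The helper below is a sketch; it assumes, as the driver's send path does, that `len` counts the 4-byte header plus the payload actually used.

```c
#include <assert.h>
#include <stdint.h>

#define HVSI_MAX_OUTGOING_DATA 12

/* Same layout as the hvsi_data definition being moved to asm/hvsi.h. */
struct hvsi_data {
	uint8_t  type;
	uint8_t  len;
	uint16_t seqno;
	uint8_t  data[HVSI_MAX_OUTGOING_DATA];
} __attribute__((packed));

/* Wire length of a data packet: 4-byte header plus payload, with the
 * payload capped at HVSI_MAX_OUTGOING_DATA bytes per packet. */
static unsigned int hvsi_data_packet_len(unsigned int payload)
{
	return 4 + (payload > HVSI_MAX_OUTGOING_DATA ?
		    HVSI_MAX_OUTGOING_DATA : payload);
}
```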

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/hvsi.h |   68 +++
 drivers/tty/hvc/hvsi.c  |   63 +---
 2 files changed, 69 insertions(+), 62 deletions(-)
 create mode 100644 arch/powerpc/include/asm/hvsi.h

diff --git a/arch/powerpc/include/asm/hvsi.h b/arch/powerpc/include/asm/hvsi.h
new file mode 100644
index 000..f13125a
--- /dev/null
+++ b/arch/powerpc/include/asm/hvsi.h
@@ -0,0 +1,68 @@
+#ifndef _HVSI_H
+#define _HVSI_H
+
+#define VS_DATA_PACKET_HEADER   0xff
+#define VS_CONTROL_PACKET_HEADER0xfe
+#define VS_QUERY_PACKET_HEADER  0xfd
+#define VS_QUERY_RESPONSE_PACKET_HEADER 0xfc
+
+/* control verbs */
+#define VSV_SET_MODEM_CTL1 /* to service processor only */
+#define VSV_MODEM_CTL_UPDATE 2 /* from service processor only */
+#define VSV_CLOSE_PROTOCOL   3
+
+/* query verbs */
+#define VSV_SEND_VERSION_NUMBER 1
+#define VSV_SEND_MODEM_CTL_STATUS 2
+
+/* yes, these masks are not consecutive. */
+#define HVSI_TSDTR 0x01
+#define HVSI_TSCD  0x20
+
+#define HVSI_MAX_OUTGOING_DATA 12
+#define HVSI_VERSION 1
+
+struct hvsi_header {
+   uint8_t  type;
+   uint8_t  len;
+   uint16_t seqno;
+} __attribute__((packed));
+
+struct hvsi_data {
+   uint8_t  type;
+   uint8_t  len;
+   uint16_t seqno;
+   uint8_t  data[HVSI_MAX_OUTGOING_DATA];
+} __attribute__((packed));
+
+struct hvsi_control {
+   uint8_t  type;
+   uint8_t  len;
+   uint16_t seqno;
+   uint16_t verb;
+   /* optional depending on verb: */
+   uint32_t word;
+   uint32_t mask;
+} __attribute__((packed));
+
+struct hvsi_query {
+   uint8_t  type;
+   uint8_t  len;
+   uint16_t seqno;
+   uint16_t verb;
+} __attribute__((packed));
+
+struct hvsi_query_response {
+   uint8_t  type;
+   uint8_t  len;
+   uint16_t seqno;
+   uint16_t verb;
+   uint16_t query_seqno;
+   union {
+   uint8_t  version;
+   uint32_t mctrl_word;
+   } u;
+} __attribute__((packed));
+
+
+#endif /* _HVSI_H */
diff --git a/drivers/tty/hvc/hvsi.c b/drivers/tty/hvc/hvsi.c
index 8a8d637..0b35793 100644
--- a/drivers/tty/hvc/hvsi.c
+++ b/drivers/tty/hvc/hvsi.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define HVSI_MAJOR 229
 #define HVSI_MINOR 128
@@ -109,68 +110,6 @@ enum HVSI_PROTOCOL_STATE {
 };
 #define HVSI_CONSOLE 0x1
 
-#define VS_DATA_PACKET_HEADER   0xff
-#define VS_CONTROL_PACKET_HEADER0xfe
-#define VS_QUERY_PACKET_HEADER  0xfd
-#define VS_QUERY_RESPONSE_PACKET_HEADER 0xfc
-
-/* control verbs */
-#define VSV_SET_MODEM_CTL1 /* to service processor only */
-#define VSV_MODEM_CTL_UPDATE 2 /* from service processor only */
-#define VSV_CLOSE_PROTOCOL   3
-
-/* query verbs */
-#define VSV_SEND_VERSION_NUMBER 1
-#define VSV_SEND_MODEM_CTL_STATUS 2
-
-/* yes, these masks are not consecutive. */
-#define HVSI_TSDTR 0x01
-#define HVSI_TSCD  0x20
-
-struct hvsi_header {
-   uint8_t  type;
-   uint8_t  len;
-   uint16_t seqno;
-} __attribute__((packed));
-
-struct hvsi_data {
-   uint8_t  type;
-   uint8_t  len;
-   uint16_t seqno;
-   uint8_t  data[HVSI_MAX_OUTGOING_DATA];
-} __attribute__((packed));
-
-struct hvsi_control {
-   uint8_t  type;
-   uint8_t  len;
-   uint16_t seqno;
-   uint16_t verb;
-   /* optional depending on verb: */
-   uint32_t word;
-   uint32_t mask;
-} __attribute__((packed));
-
-struct hvsi_query {
-   uint8_t  type;
-   uint8_t  len;
-   uint16_t seqno;
-   uint16_t verb;
-} __attribute__((packed));
-
-struct hvsi_query_response {
-   uint8_t  type;
-   uint8_t  len;
-   uint16_t seqno;
-   uint16_t verb;
-   uint16_t query_seqno;
-   union {
-   uint8_t  version;
-   uint32_t mctrl_word;
-   } u;
-} __attribute__((packed));
-
-
-
 static inline int is_console(struct hvsi_struct *hp)
 {
return hp->flags & HVSI_CONSOLE;





Re: [PATCH v2] kexec-tools: powerpc: Use the #address-cells information to parse memory/reg

2011-06-16 Thread Simon Horman
On Thu, Jun 16, 2011 at 04:15:13PM +0530, Suzuki K. Poulose wrote:
> The format of memory/reg is based on #address-cells and #size-cells.
> Currently, kexec-tools does not use these values when parsing the
> memory/reg contents, so kexec cannot handle cases where #address-cells
> and #size-cells differ (e.g., PPC440X).
> 
> This patch introduces a read_memory_region_limits(), which parses the
> memory/reg contents based on the values of #address-cells and #size-cells.

Thanks, applied.
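For reference, the cell-based parsing the patch describes boils down to folding consecutive 32-bit cells into one value, with the cell counts taken from #address-cells and #size-cells. The sketch below is illustrative, not the actual read_memory_region_limits() code, and assumes the cells are already in host byte order (a real parser must convert from the device tree's big-endian encoding).

```c
#include <assert.h>
#include <stdint.h>

/* Combine `cells` consecutive 32-bit cells from `reg` into one value,
 * advancing *pos -- the pattern needed when #address-cells or
 * #size-cells is greater than 1 (e.g. 2 on PPC440X). */
static uint64_t read_cells(const uint32_t *reg, int *pos, int cells)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < cells; i++)
		v = (v << 32) | reg[(*pos)++];
	return v;
}
```

With #address-cells = 2 and #size-cells = 2, one memory/reg entry is then `addr = read_cells(reg, &pos, 2); size = read_cells(reg, &pos, 2);`.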


Re: [PATCH][v3] Add support for RTC device: pt7c4338 in rtc-ds1307.c

2011-06-16 Thread Wolfram Sang
On Thu, Jun 16, 2011 at 10:38:11AM -0500, Timur Tabi wrote:
> Priyanka Jain wrote:
> > PT7C4338 chip is being manufactured by Pericom Technology Inc.
> > It is a serial real-time clock which provides:
> > 1)Low-power clock/calendar.
> > 2)Programmable square-wave output.
> > It has 56 bytes of nonvolatile RAM.
> > Its register set is same as that of rtc device: DS1307.
> > 
> > 
> > Signed-off-by: Priyanka Jain 
> 
> Acked-by: Timur Tabi 

Reviewed-by: Wolfram Sang 

-- 
Pengutronix e.K.   | Wolfram Sang|
Industrial Linux Solutions | http://www.pengutronix.de/  |



[PATCH] powerpc/e500: fix breakage with fsl_rio_mcheck_exception

2011-06-16 Thread Scott Wood
The wrong MCSR bit was being used on e500mc.  MCSR_BUS_RBERR only exists
on e500v1/v2.  Use MCSR_LD on e500mc, and remove all MCSR checking
in fsl_rio_mcheck_exception as we now no longer call that function
if the appropriate bit in MCSR is not set.

If RIO support was enabled at compile-time, but was never probed, just
return from fsl_rio_mcheck_exception rather than dereference a NULL
pointer.

TODO: There is still a remaining, though comparatively minor, issue in
that this recovery mechanism will falsely engage if there's an unrelated
MCSR_LD event at the same time as a RIO error.

Signed-off-by: Scott Wood 
---
 arch/powerpc/kernel/traps.c   |2 +-
 arch/powerpc/sysdev/fsl_rio.c |   33 +
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0ff4ab9..6414a0d 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -425,7 +425,7 @@ int machine_check_e500mc(struct pt_regs *regs)
unsigned long reason = mcsr;
int recoverable = 1;
 
-   if (reason & MCSR_BUS_RBERR) {
+   if (reason & MCSR_LD) {
recoverable = fsl_rio_mcheck_exception(regs);
if (recoverable == 1)
goto silent_out;
diff --git a/arch/powerpc/sysdev/fsl_rio.c b/arch/powerpc/sysdev/fsl_rio.c
index 5b206a2..b3fd081 100644
--- a/arch/powerpc/sysdev/fsl_rio.c
+++ b/arch/powerpc/sysdev/fsl_rio.c
@@ -283,23 +283,24 @@ static void __iomem *rio_regs_win;
 #ifdef CONFIG_E500
 int fsl_rio_mcheck_exception(struct pt_regs *regs)
 {
-   const struct exception_table_entry *entry = NULL;
-   unsigned long reason = mfspr(SPRN_MCSR);
-
-   if (reason & MCSR_BUS_RBERR) {
-   reason = in_be32((u32 *)(rio_regs_win + RIO_LTLEDCSR));
-   if (reason & (RIO_LTLEDCSR_IER | RIO_LTLEDCSR_PRT)) {
-   /* Check if we are prepared to handle this fault */
-   entry = search_exception_tables(regs->nip);
-   if (entry) {
-   pr_debug("RIO: %s - MC Exception handled\n",
-__func__);
-   out_be32((u32 *)(rio_regs_win + RIO_LTLEDCSR),
-0);
-   regs->msr |= MSR_RI;
-   regs->nip = entry->fixup;
-   return 1;
-   }
+   const struct exception_table_entry *entry;
+   unsigned long reason;
+
+   if (!rio_regs_win)
+   return 0;
+
+   reason = in_be32((u32 *)(rio_regs_win + RIO_LTLEDCSR));
+   if (reason & (RIO_LTLEDCSR_IER | RIO_LTLEDCSR_PRT)) {
+   /* Check if we are prepared to handle this fault */
+   entry = search_exception_tables(regs->nip);
+   if (entry) {
+   pr_debug("RIO: %s - MC Exception handled\n",
+__func__);
+   out_be32((u32 *)(rio_regs_win + RIO_LTLEDCSR),
+0);
+   regs->msr |= MSR_RI;
+   regs->nip = entry->fixup;
+   return 1;
}
}
 
-- 
1.7.4.1



Re: [PATCH][v3] Add support for RTC device: pt7c4338 in rtc-ds1307.c

2011-06-16 Thread Timur Tabi
Priyanka Jain wrote:
> PT7C4338 chip is being manufactured by Pericom Technology Inc.
> It is a serial real-time clock which provides:
> 1)Low-power clock/calendar.
> 2)Programmable square-wave output.
> It has 56 bytes of nonvolatile RAM.
> Its register set is same as that of rtc device: DS1307.
> 
> 
> Signed-off-by: Priyanka Jain 

Acked-by: Timur Tabi 

-- 
Timur Tabi
Linux kernel developer at Freescale



[PATCH] libata/sas: only set FROZEN flag if new EH is supported

2011-06-16 Thread Nishanth Aravamudan
On 16.06.2011 [08:28:39 -0500], Brian King wrote:
> On 06/16/2011 02:51 AM, Tejun Heo wrote:
> > On Wed, Jun 15, 2011 at 04:34:17PM -0700, Nishanth Aravamudan wrote:
> >>> That looks like the right thing to do. For ipr's usage of
> >>> libata, we don't have the concept of a port frozen state, so this flag
> >>> should really never get set. The alternate way to fix this would be to
> >>> only set ATA_PFLAG_FROZEN in ata_port_alloc if ap->ops->error_handler
> >>> is not NULL.
> >>
> >> It seemed like ipr is as you say, but I wasn't sure if it was
> >> appropriate to make the change above in the common libata-scis code or
> >> not. I don't want to break some other device on accident.
> >>
> >> Also, I tried your suggestion, but I don't think that can happen in
> >> ata_port_alloc? ata_port_alloc is allocated ap itself, and it seems like
> >> ap->ops typically gets set only after ata_port_alloc returns?
> > 
> > Maybe we can test error_handler in ata_sas_port_start()?
> 
> Good point. Since libsas is converted to the new eh now, we would need to have
> this test.

Commit 7b3a24c57d2eeda8dba9c205342b12689c4679f9 ("ahci: don't enable
port irq before handler is registered") caused a regression for CD-ROMs
attached to the IPR SATA bus on Power machines:

  ata_port_alloc: ENTER
  ata_port_probe: ata1: bus probe begin
  ata1.00: ata_dev_read_id: ENTER
  ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
  ata1.00: ata_dev_read_id: ENTER
  ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
  ata1.00: limiting speed to UDMA7:PIO5
  ata1.00: ata_dev_read_id: ENTER
  ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
  ata1.00: disabled
  ata_port_probe: ata1: bus probe end
  scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices 
might not be configured

The FROZEN flag added in that commit is only cleared by the new EH code,
which is not used by ipr. Clear this flag in the SAS code if we don't
support new EH.

Reported-by: Benjamin Herrenschmidt 
Signed-off-by: Nishanth Aravamudan 

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index d51f979..ebe1685 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3797,6 +3797,12 @@ EXPORT_SYMBOL_GPL(ata_sas_port_alloc);
  */
 int ata_sas_port_start(struct ata_port *ap)
 {
+   /*
+* the port is marked as frozen at allocation time, but if we don't
+* have new eh, we won't thaw it
+*/
+   if (!ap->ops->error_handler)
+   ap->pflags &= ~ATA_PFLAG_FROZEN;
return 0;
 }
 EXPORT_SYMBOL_GPL(ata_sas_port_start);


-- 
Nishanth Aravamudan 
IBM Linux Technology Center


[RFC PATCH 6/7] powerpc: Update the default FIT image to use the correct load/boot addresses

2011-06-16 Thread Michal Simek
From: John Williams 

The default kernel_fdt.its hard-codes zero load/start addresses, but this may
no longer be true.

As we copy the FIT tree descriptor, update these values based on the incoming
ELF payload.

Signed-off-by: John Williams 
---
 arch/powerpc/boot/wrapper |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 594aa02..54fbc2e 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -281,7 +281,11 @@ uboot-fit)
 rm -f "$ofile"
 #[ "$vmz" != vmlinux.bin.gz ] && mv "$vmz" "vmlinux.bin.gz"
 mv "$dtb" "target.dtb"
-cp arch/powerpc/boot/kernel_fdt.its .
+# Check the ELF file for a non-zero load/entry address
+membase=${membase:2:8}
+sed -e "s/load = <.*$/load = <${membase}>;/g" \
+-e "s/entry = <.*$/entry = <${membase}>;/g" \
+arch/powerpc/boot/kernel_fdt.its > kernel_fdt.its
 mkimage -f kernel_fdt.its "$ofile"
 #rm kernet_fdt.its
 exit 0
-- 
1.5.5.6



[RFC PATCH 3/7] powerpc: simpleboot get load address from ELF instead of assuming zero

2011-06-16 Thread Michal Simek
From: John Williams 

simpleboot currently assumes that the physical load address is zero, even if
the ELF payload has a non-zero paddr.

This is a simple fix that avoids a custom platform_ops handler in this case.
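A sketch of the relevant ELF bookkeeping: the physical load address comes from the PT_LOAD program header's p_paddr field, which parse_elf32() now records. The struct and helper below are illustrative (field layout per the ELF spec; the byte-swapping a cross-endian wrapper needs is omitted).

```c
#include <assert.h>
#include <stdint.h>

/* ELF32 program header, fields per the ELF specification. */
typedef struct {
	uint32_t p_type;
	uint32_t p_offset;
	uint32_t p_vaddr;
	uint32_t p_paddr;	/* physical load address the patch now honours */
	uint32_t p_filesz;
	uint32_t p_memsz;
	uint32_t p_flags;
	uint32_t p_align;
} demo_elf32_phdr;

#define PT_LOAD 1

/* Return the physical load address of the first PT_LOAD segment, or 0
 * if none -- the value prep_kernel() now uses instead of assuming 0. */
static uint32_t demo_first_load_paddr(const demo_elf32_phdr *ph, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (ph[i].p_type == PT_LOAD)
			return ph[i].p_paddr;
	return 0;
}
```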

Signed-off-by: John Williams 
---
 arch/powerpc/boot/elf.h  |1 +
 arch/powerpc/boot/elf_util.c |2 ++
 arch/powerpc/boot/main.c |1 +
 3 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/boot/elf.h b/arch/powerpc/boot/elf.h
index 1941bc5..39af242 100644
--- a/arch/powerpc/boot/elf.h
+++ b/arch/powerpc/boot/elf.h
@@ -150,6 +150,7 @@ struct elf_info {
unsigned long loadsize;
unsigned long memsize;
unsigned long elfoffset;
+   unsigned long loadaddr;
 };
 int parse_elf64(void *hdr, struct elf_info *info);
 int parse_elf32(void *hdr, struct elf_info *info);
diff --git a/arch/powerpc/boot/elf_util.c b/arch/powerpc/boot/elf_util.c
index 1567a0c..3aef4f0 100644
--- a/arch/powerpc/boot/elf_util.c
+++ b/arch/powerpc/boot/elf_util.c
@@ -43,6 +43,7 @@ int parse_elf64(void *hdr, struct elf_info *info)
info->loadsize = (unsigned long)elf64ph->p_filesz;
info->memsize = (unsigned long)elf64ph->p_memsz;
info->elfoffset = (unsigned long)elf64ph->p_offset;
+   info->loadaddr = (unsigned long)elf64ph->p_paddr;
 
return 1;
 }
@@ -74,5 +75,6 @@ int parse_elf32(void *hdr, struct elf_info *info)
info->loadsize = elf32ph->p_filesz;
info->memsize = elf32ph->p_memsz;
info->elfoffset = elf32ph->p_offset;
+   info->loadaddr = elf32ph->p_paddr;
return 1;
 }
diff --git a/arch/powerpc/boot/main.c b/arch/powerpc/boot/main.c
index a28f021..fbbffa5 100644
--- a/arch/powerpc/boot/main.c
+++ b/arch/powerpc/boot/main.c
@@ -56,6 +56,7 @@ static struct addr_range prep_kernel(void)
if (platform_ops.vmlinux_alloc) {
addr = platform_ops.vmlinux_alloc(ei.memsize);
} else {
+   addr = ei.loadaddr;
/*
 * Check if the kernel image (without bss) would overwrite the
 * bootwrapper. The device tree has been moved in fdt_init()
-- 
1.5.5.6



[RFC PATCH 7/7] powerpc: Support RELOCATABLE kernel for PPC44x

2011-06-16 Thread Michal Simek
Changes:
- Find out the address where the kernel runs
- Create the initial 256MB TLB entry from the detected runtime address

Limitations:
- Kernel must be aligned to 256MB

Backport:
- Changes in page.h are backported from newer kernel version

The mmu_mapin_ram() function has to reflect the offset of the memory start.
memstart_addr and kernstart_addr are set up directly from asm
code to ensure that only ppc44x is affected.
Signed-off-by: Michal Simek 
---
 arch/powerpc/Kconfig|3 ++-
 arch/powerpc/include/asm/page.h |7 ++-
 arch/powerpc/kernel/head_44x.S  |   28 
 arch/powerpc/mm/44x_mmu.c   |6 +-
 4 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 45c9683..34c521e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -796,7 +796,8 @@ config LOWMEM_CAM_NUM
 
 config RELOCATABLE
bool "Build a relocatable kernel (EXPERIMENTAL)"
-   depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM && FSL_BOOKE
+   depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM
+   depends on FSL_BOOKE || (44x && !SMP)
help
  This builds a kernel image that is capable of running at the
  location the kernel is loaded at (some alignment restrictions may
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 4940662..e813cc2 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -108,8 +108,13 @@ extern phys_addr_t kernstart_addr;
 #define pfn_to_kaddr(pfn)  __va((pfn) << PAGE_SHIFT)
 #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
 
-#define __va(x) ((void *)((unsigned long)(x) + PAGE_OFFSET - MEMORY_START))
+#ifdef CONFIG_BOOKE
+#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + 
KERNELBASE))
+#define __pa(x) ((unsigned long)(x) + PHYSICAL_START - KERNELBASE)
+#else
+#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + PAGE_OFFSET - 
MEMORY_START))
 #define __pa(x) ((unsigned long)(x) - PAGE_OFFSET + MEMORY_START)
+#endif
 
 /*
  * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI,
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index d80ce05..6a63d32 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -59,6 +59,17 @@ _ENTRY(_start);
 * of abatron_pteptrs
 */
nop
+
+#ifdef CONFIG_RELOCATABLE
+   bl  jump/* Find our address */
+   nop
+jump:  mflrr25  /* Make it accessible */
+   /* just for and */
+   lis r26, 0xfff0@h
+   ori r26, r26, 0xfff0@l
+   and.r21, r25, r26
+#endif
+
 /*
  * Save parameters we are passed
  */
@@ -135,9 +146,14 @@ skpinv:addir4,r4,1 /* 
Increment */
lis r3,PAGE_OFFSET@h
ori r3,r3,PAGE_OFFSET@l
 
+#ifdef CONFIG_RELOCATABLE
+   /* load physical address where kernel runs */
+   mr  r4,r21
+#else
/* Kernel is at PHYSICAL_START */
lis r4,PHYSICAL_START@h
ori r4,r4,PHYSICAL_START@l
+#endif
 
/* Load the kernel PID = 0 */
li  r0,0
@@ -258,6 +274,18 @@ skpinv:addir4,r4,1 /* 
Increment */
mr  r5,r29
mr  r6,r28
mr  r7,r27
+
+#ifdef CONFIG_RELOCATABLE
+   /* save kernel and memory start */
+   lis r25,kernstart_addr@h
+   ori r25,r25,kernstart_addr@l
+   stw r21,4(r25)
+
+   lis r25,memstart_addr@h
+   ori r25,r25,memstart_addr@l
+   stw r21,4(r25)
+#endif
+
bl  machine_init
bl  MMU_init
 
diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
index 4a55061..ecf4a20 100644
--- a/arch/powerpc/mm/44x_mmu.c
+++ b/arch/powerpc/mm/44x_mmu.c
@@ -91,12 +91,16 @@ void __init MMU_init_hw(void)
 unsigned long __init mmu_mapin_ram(void)
 {
unsigned long addr;
+   unsigned long offset = 0;
 
+#if defined(CONFIG_RELOCATABLE)
+   offset = memstart_addr;
+#endif
/* Pin in enough TLBs to cover any lowmem not covered by the
 * initial 256M mapping established in head_44x.S */
for (addr = PHYSICAL_START + PPC_PIN_SIZE; addr < lowmem_end_addr;
 addr += PPC_PIN_SIZE)
-   ppc44x_pin_tlb(addr + PAGE_OFFSET, addr);
+   ppc44x_pin_tlb(addr + PAGE_OFFSET - offset, addr);
 
return total_lowmem;
 }
-- 
1.5.5.6



[RFC PATCH 5/7] powerpc: Consider a non-zero boot address when computing the bootwrapper start

2011-06-16 Thread Michal Simek
From: John Williams 

There's no fundamental reason the bootwrapper can't boot off a non-zero base,
we just need to make sure we account for it in the link.

Do this by adding the (up-aligned) kernel size to membase, and using that as
the link address.
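The awk one-liner's arithmetic amounts to rounding the kernel image size up to the next 1MB boundary and adding membase. A sketch in C, not the actual wrapper code:

```c
#include <assert.h>

#define DEMO_MB 0x100000UL

/* Round `kernel_size` up to a 1MB boundary and add the physical
 * membase -- the link address the wrapper computes from the ELF. */
static unsigned long demo_link_address(unsigned long membase,
				       unsigned long kernel_size)
{
	return membase + ((kernel_size + DEMO_MB - 1) & ~(DEMO_MB - 1));
}
```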

Signed-off-by: John Williams 
---
 arch/powerpc/boot/wrapper |   14 +-
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index e148053..594aa02 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -139,9 +139,16 @@ tmp=$tmpdir/zImage.$$.o
 ksection=.kernel:vmlinux.strip
 isection=.kernel:initrd
 
-# default auto-calculate link_address to make room for the kernel
+# physical offset of kernel image
+membase=`${CROSS}objdump -p "$kernel" | grep -m 1 LOAD | awk '{print $7}'`
+
+# auto-calculate link_address to make room for the kernel
 # round up kernel image size to nearest megabyte
-link_address=`${CROSS}size -x ${kernel} | grep ${kernel} | awk 
'{printf("0x%08x", and($4 + 0x0f, 0xfffe))}'`
+
+# don't forget to add membase for non-zero kernel boot
+membase_dec=`printf "%d" $membase`
+
+link_address=`${CROSS}size -x ${kernel} | grep ${kernel} | awk -v 
membase=$membase_dec '{printf("0x%08x", membase + and($4 + 0x0f, 
0xfffe))}'`
 
 case "$platform" in
 pseries)
@@ -259,9 +266,6 @@ if [ -n "$version" ]; then
 uboot_version="-n Linux-$version"
 fi
 
-# physical offset of kernel image
-membase=`${CROSS}objdump -p "$kernel" | grep -m 1 LOAD | awk '{print $7}'`
-
 case "$platform" in
 uboot)
 rm -f "$ofile"
-- 
1.5.5.6



[RFC PATCH 4/7] powerpc: Let simpleboot function with non zero-based memory maps

2011-06-16 Thread Michal Simek
From: John Williams 

It is unnecessarily restrictive to fatal() if there is physical memory at a
non-zero base address.

Signed-off-by: John Williams 
---
 arch/powerpc/boot/simpleboot.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/boot/simpleboot.c b/arch/powerpc/boot/simpleboot.c
index 21cd480..910ae05 100644
--- a/arch/powerpc/boot/simpleboot.c
+++ b/arch/powerpc/boot/simpleboot.c
@@ -56,11 +56,6 @@ void platform_init(unsigned long r3, unsigned long r4, 
unsigned long r5,
if (size < (*na+*ns) * sizeof(u32))
fatal("cannot get memory range\n");
 
-   /* Only interested in memory based at 0 */
-   for (i = 0; i < *na; i++)
-   if (*reg++ != 0)
-   fatal("Memory range is not based at address 0\n");
-
/* get the memsize and trucate it to under 4G on 32 bit machines */
memsize64 = 0;
for (i = 0; i < *ns; i++)
-- 
1.5.5.6



[RFC PATCH 2/7] powerpc: Permit non-zero physical start address for PPC44x

2011-06-16 Thread Michal Simek
From: John Williams 

The initial TLB entry is 256M, meaning that the physical base address must be
256M aligned.
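The alignment constraint is the usual power-of-two test; a sketch (the DEMO_ names are illustrative):

```c
#include <assert.h>

#define DEMO_TLB_PIN_SIZE 0x10000000UL	/* 256MB initial TLB entry */

/* The physical start must sit on a 256MB boundary so the single pinned
 * TLB entry set up in head_44x.S can cover the start of the kernel. */
static int demo_is_valid_physical_start(unsigned long pstart)
{
	return (pstart & (DEMO_TLB_PIN_SIZE - 1)) == 0;
}
```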

Signed-off-by: John Williams 
---
 arch/powerpc/Kconfig |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d00131c..45c9683 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -845,7 +845,7 @@ config KERNEL_START
 
 config PHYSICAL_START_BOOL
bool "Set physical address where the kernel is loaded"
-   depends on ADVANCED_OPTIONS && FLATMEM && FSL_BOOKE
+   depends on ADVANCED_OPTIONS && FLATMEM && (FSL_BOOKE || 44x)
help
  This gives the physical address where the kernel is loaded.
 
@@ -858,6 +858,7 @@ config PHYSICAL_START
 
 config PHYSICAL_ALIGN
hex
+   default "0x10000000" if 44x
   default "0x04000000" if FSL_BOOKE
help
  This value puts the alignment restrictions on physical address
-- 
1.5.5.6
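The constraint that the new PHYSICAL_ALIGN default encodes can be sketched as a small stand-alone check (the 256MB value follows from the initial pinned TLB entry described above; the helper name is made up for illustration, this is not kernel code):

```c
#include <stdint.h>

/* On 44x the initial pinned TLB entry covers 256MB, so the kernel's
 * physical load address must be 256MB aligned. */
#define PHYS_ALIGN_44X 0x10000000u

static int physical_start_ok(uint32_t physical_start)
{
	/* Aligned iff no bits below the alignment boundary are set. */
	return (physical_start & (PHYS_ALIGN_44X - 1)) == 0;
}
```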



[RFC PATCH 1/7] powerpc: ppc440 remove zero physical memory base assumption

2011-06-16 Thread Michal Simek
From: John Williams 

The macro PHYSICAL_START is available in this context, currently always with
the value zero.  However, that will change in a future patchset.

For now, just remove the zero physical address start assumption in head_44x.S
where we set up the initial TLB, and in the later MMU setup where we map in
the remainder of lowmem if required.
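The effect on the lowmem pinning loop in mmu_mapin_ram() can be modelled stand-alone like this (PPC_PIN_SIZE is the 256MB pinned-entry size on 44x; the counting helper is illustrative, not kernel code):

```c
#include <stdint.h>

#define PPC_PIN_SIZE 0x10000000u	/* 256MB pinned TLB entry on 44x */

/*
 * Count the extra pinned entries needed to cover physical lowmem
 * [phys_start, lowmem_end) beyond the initial entry established in
 * head_44x.S.  With a possibly non-zero base, the loop starts at
 * phys_start + PPC_PIN_SIZE rather than at PPC_PIN_SIZE.
 */
static unsigned extra_pinned_entries(uint32_t phys_start, uint32_t lowmem_end)
{
	unsigned n = 0;
	uint32_t addr;

	for (addr = phys_start + PPC_PIN_SIZE; addr < lowmem_end;
	     addr += PPC_PIN_SIZE)
		n++;
	return n;
}
```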

Signed-off-by: John Williams 
---
 arch/powerpc/kernel/head_44x.S |5 +++--
 arch/powerpc/mm/44x_mmu.c  |2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 18d8a16..d80ce05 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -135,8 +135,9 @@ skpinv:	addi	r4,r4,1		/* Increment */
lis r3,PAGE_OFFSET@h
ori r3,r3,PAGE_OFFSET@l
 
-   /* Kernel is at the base of RAM */
-   li r4, 0/* Load the kernel physical address */
+   /* Kernel is at PHYSICAL_START */
+   lis r4,PHYSICAL_START@h
+   ori r4,r4,PHYSICAL_START@l
 
/* Load the kernel PID = 0 */
li  r0,0
diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
index 98052ac..4a55061 100644
--- a/arch/powerpc/mm/44x_mmu.c
+++ b/arch/powerpc/mm/44x_mmu.c
@@ -94,7 +94,7 @@ unsigned long __init mmu_mapin_ram(void)
 
/* Pin in enough TLBs to cover any lowmem not covered by the
 * initial 256M mapping established in head_44x.S */
-   for (addr = PPC_PIN_SIZE; addr < lowmem_end_addr;
+   for (addr = PHYSICAL_START + PPC_PIN_SIZE; addr < lowmem_end_addr;
 addr += PPC_PIN_SIZE)
ppc44x_pin_tlb(addr + PAGE_OFFSET, addr);
 
-- 
1.5.5.6



[RFC] Relocatable kernel for ppc44x

2011-06-16 Thread Michal Simek
Hi,

John mentioned in his email that we have some patches.
I am also sending them [patches 1/7-6/7]. They add support for a non-zero
boot address for ppc44x.

Patch 7/7 adds relocatable support for ppc44x.

All patches are against 2.6.31.13, but it shouldn't be a big problem
to port them to the latest and greatest.

We are using 256MB alignment, which is fine for our purposes, but I think
it should be simple to change it to a different scheme.

I have tested it on the QEMU system emulator with u-boot, using
vmlinux.bin.gz and a DTB. I am not sure about your configuration, but it is
best not to relocate the DTB: the DTB address should stay within the first
256MB boundary so that it is reachable through the initially pinned 256MB
TLB entry. I haven't tested any other configuration, but I have tried not
to break anyone. I am not sure whether I can use r25, r26 and r21 (I need
two registers for storing temporary values); I look forward to your
recommendations.

The idea of this patch set is to find out where the kernel runs, pin the
256MB TLB entry and set up memstart_addr and kernstart_addr in asm without
breaking generic PPC code, and to set up TLB entries for larger memories in
mmu_mapin_ram. I have tested the system with up to 768MB of RAM.

Thanks for your comments,
Michal

P.S.: Sorry for faults in PPC asm - I am not a PPC expert.




Re: libata/ipr/powerpc: regression between 2.6.39-rc4 and 2.6.39-rc5

2011-06-16 Thread Brian King
On 06/16/2011 02:51 AM, Tejun Heo wrote:
> On Wed, Jun 15, 2011 at 04:34:17PM -0700, Nishanth Aravamudan wrote:
>>> That looks like the right thing to do. For ipr's usage of
>>> libata, we don't have the concept of a port frozen state, so this flag
>>> should really never get set. The alternate way to fix this would be to
>>> only set ATA_PFLAG_FROZEN in ata_port_alloc if ap->ops->error_handler
>>> is not NULL.
>>
>> It seemed like ipr is as you say, but I wasn't sure if it was
>> appropriate to make the change above in the common libata-scsi code or
>> not. I don't want to break some other device by accident.
>>
>> Also, I tried your suggestion, but I don't think that can happen in
>> ata_port_alloc? ata_port_alloc allocates ap itself, and it seems like
>> ap->ops typically gets set only after ata_port_alloc returns?
> 
> Maybe we can test error_handler in ata_sas_port_start()?

Good point. Since libsas is converted to the new eh now, we would need to have
this test.
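The check being discussed can be sketched self-contained like this. The types and the flag value are minimal stand-ins, not the kernel headers, and the placement in the SAS port-start path is only this thread's suggestion, not a merged patch:

```c
#include <stddef.h>

/* Stand-in definitions; the real ones live in <linux/libata.h>. */
#define ATA_PFLAG_FROZEN (1u << 1)	/* illustrative bit value */

struct ata_port_operations {
	void (*error_handler)(void *ap);
};

struct ata_port {
	unsigned int pflags;
	const struct ata_port_operations *ops;
};

/*
 * A driver without a new-style error handler (as with ipr via libsas)
 * has no concept of a frozen port, so drop the flag at port start.
 */
static void sas_port_start_check(struct ata_port *ap)
{
	if (!ap->ops->error_handler)
		ap->pflags &= ~ATA_PFLAG_FROZEN;
}
```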

Thanks,

Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center




[PATCH v2] kexec-tools: powerpc: Use the #address-cells information to parse memory/reg

2011-06-16 Thread Suzuki K. Poulose

The problem was with the mail client. Also, I mistyped the "To" field in the
previous one. Resending.

ChangeLog from V1:
* Changed the interface for read_memory_region_limits to use 'int fd'
* Use sizeof(variable) for read(), instead of sizeof(type).

---
The format of memory/reg is based on #address-cells and #size-cells. Currently,
kexec-tools doesn't use these values when parsing the memory/reg contents.
Hence kexec cannot handle cases where #address-cells and #size-cells are
different (e.g., PPC440x).

This patch introduces read_memory_region_limits(), which parses the
memory/reg contents based on the values of #address-cells and #size-cells.
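The cell-based parsing described above can be sketched as follows. This is an illustrative stand-alone model, not the kexec-tools implementation, and the helper names are made up:

```c
/*
 * A device-tree "reg" property is a sequence of big-endian 32-bit
 * cells: #address-cells cells of base address followed by
 * #size-cells cells of size.
 */
#include <stdint.h>

/* Assemble one big-endian value from ncells 32-bit cells. */
static uint64_t read_cells(const unsigned char *p, int ncells)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < ncells * 4; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Compute [start, end) of the first range in a reg property. */
static void parse_reg(const unsigned char *reg, int addr_cells,
		      int size_cells, uint64_t *start, uint64_t *end)
{
	*start = read_cells(reg, addr_cells);
	*end = *start + read_cells(reg + addr_cells * 4, size_cells);
}
```

A platform with, say, two address cells and one size cell packs each range into 12 bytes, which is exactly what fixed sizeof(unsigned long)-based parsing misreads.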

Signed-off-by: Suzuki K. Poulose 
---

 kexec/arch/ppc/crashdump-powerpc.c |   33 ++--
 kexec/arch/ppc/fs2dt.c |   14 ---
 kexec/arch/ppc/kexec-ppc.c |  158 ++--
 kexec/arch/ppc/kexec-ppc.h |6 +
 4 files changed, 129 insertions(+), 82 deletions(-)

diff --git a/kexec/arch/ppc/crashdump-powerpc.c b/kexec/arch/ppc/crashdump-powerpc.c
index 1dd6485..77a01e1 100644
--- a/kexec/arch/ppc/crashdump-powerpc.c
+++ b/kexec/arch/ppc/crashdump-powerpc.c
@@ -81,7 +81,7 @@ static int get_crash_memory_ranges(struct memory_range **range, int *ranges)
char fname[256];
char buf[MAXBYTES];
DIR *dir, *dmem;
-   FILE *file;
+   int fd;
struct dirent *dentry, *mentry;
int i, n, crash_rng_len = 0;
unsigned long long start, end, cstart, cend;
@@ -123,17 +123,16 @@ static int get_crash_memory_ranges(struct memory_range **range, int *ranges)
if (strcmp(mentry->d_name, "reg"))
continue;
strcat(fname, "/reg");
-   file = fopen(fname, "r");
-   if (!file) {
+   fd = open(fname, O_RDONLY);
+   if (fd < 0) {
perror(fname);
closedir(dmem);
closedir(dir);
goto err;
}
-   n = fread(buf, 1, MAXBYTES, file);
-   if (n < 0) {
-   perror(fname);
-   fclose(file);
+   n = read_memory_region_limits(fd, &start, &end);
+   if (n != 0) {
+   close(fd);
closedir(dmem);
closedir(dir);
goto err;
@@ -146,24 +145,6 @@ static int get_crash_memory_ranges(struct memory_range **range, int *ranges)
goto err;
}
 
-   /*
-* FIXME: This code fails on platforms that
-* have more than one memory range specified
-* in the device-tree's /memory/reg property.
-* or where the #address-cells and #size-cells
-* are not identical.
-*
-* We should interpret the /memory/reg property
-* based on the values of the #address-cells and
-* #size-cells properites.
-*/
-   if (n == (sizeof(unsigned long) * 2)) {
-   start = ((unsigned long *)buf)[0];
-   end = start + ((unsigned long *)buf)[1];
-   } else {
-   start = ((unsigned long long *)buf)[0];
-   end = start + ((unsigned long long *)buf)[1];
-   }
if (start == 0 && end >= (BACKUP_SRC_END + 1))
start = BACKUP_SRC_END + 1;
 
@@ -212,7 +193,7 @@ static int get_crash_memory_ranges(struct memory_range **range, int *ranges)
= RANGE_RAM;
memory_ranges++;
}
-   fclose(file);
+   close(fd);
}
closedir(dmem);
}
diff --git a/kexec/arch/ppc/fs2dt.c b/kexec/arch/ppc/fs2dt.c
index 238a3f2..733515a 100644
--- a/kexec/arch/ppc/fs2dt.c
+++ b/kexec/arch/ppc/fs2dt.c
@@ -137,21 +137,11 @@ static void add_usable_mem_property(int fd, int len)
if (strncmp(bname, "/memory@", 8) && strcmp(bname, "/memory"))
return;
 
-   if (len < 2 * sizeof(unsigned long))
-   die("unrecoverable error: not enough data for mem property\n");
-   len = 2 * sizeof(unsigned long);
-
if (lseek(fd, 0, SEEK_SET) < 0)
die("unrecoverable error: error seeking in \"%s\": %s\n",
pathname, strerror(errno));
-   if (read(fd, buf, len) != len)
-   die("unrecoverable error: error reading \"%s\": %s\n",
-   pathname, strerror(errno));
-
-   if (~0ULL - buf[0] < buf[1])
-   die("unrecoverable error: mem property overflow\n");
-   base = buf[0];


Re: libata/ipr/powerpc: regression between 2.6.39-rc4 and 2.6.39-rc5

2011-06-16 Thread Tejun Heo
On Wed, Jun 15, 2011 at 04:34:17PM -0700, Nishanth Aravamudan wrote:
> > That looks like the right thing to do. For ipr's usage of
> > libata, we don't have the concept of a port frozen state, so this flag
> > should really never get set. The alternate way to fix this would be to
> > only set ATA_PFLAG_FROZEN in ata_port_alloc if ap->ops->error_handler
> > is not NULL.
> 
> It seemed like ipr is as you say, but I wasn't sure if it was
> appropriate to make the change above in the common libata-scsi code or
> not. I don't want to break some other device by accident.
>
> Also, I tried your suggestion, but I don't think that can happen in
> ata_port_alloc? ata_port_alloc allocates ap itself, and it seems like
> ap->ops typically gets set only after ata_port_alloc returns?

Maybe we can test error_handler in ata_sas_port_start()?

Thanks.

-- 
tejun


[21/85] seqlock: Don't smp_rmb in seqlock reader spin loop

2011-06-16 Thread Greg KH
2.6.33-longterm review patch.  If anyone has any objections, please let us know.

--

From: Milton Miller 

commit 5db1256a5131d3b133946fa02ac9770a784e6eb2 upstream.

Move the smp_rmb after cpu_relax loop in read_seqlock and add
ACCESS_ONCE to make sure the test and return are consistent.

A multi-threaded core in the lab didn't like the update
from 2.6.35 to 2.6.36, to the point it would hang during
boot when multiple threads were active.  Bisection showed
af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 (clockevents:
Remove the per cpu tick skew) as the culprit and it is
supported with stack traces showing xtime_lock waits including
tick_do_update_jiffies64 and/or update_vsyscall.

Experimentation showed the combination of cpu_relax and smp_rmb
was significantly slowing the progress of other threads sharing
the core, and this patch is effective in avoiding the hang.

A theory is the rmb is affecting the whole core while the
cpu_relax is causing a resource rebalance flush, together they
cause an interference cadence that is unbroken when the seqlock
reader has interrupts disabled.

At first I was confused why the refactor in
3c22cd5709e8143444a6d08682a87f4c57902df3 (kernel: optimise
seqlock) didn't affect this patch application, but after some
study that affected seqcount not seqlock. The new seqcount was
not factored back into the seqlock.  I defer that to the future.

While the removal of the timer interrupt offset created
contention for the xtime lock while a cpu does the
additional work to update the system clock, the seqlock
implementation with the tight rmb spin loop goes back much
further, and is just waiting for the right trigger.
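The fixed reader loop can be modelled in userspace with C11 atomics. An assumption here: relaxed loads plus an acquire fence stand in for the kernel's ACCESS_ONCE plus smp_rmb; this is a model of the pattern, not the kernel code:

```c
#include <stdatomic.h>

struct seqlock_model { atomic_uint sequence; };

static unsigned read_seqbegin_model(struct seqlock_model *sl)
{
	unsigned ret;

	/* Spin on a plain load while a writer is active (odd sequence);
	 * no barrier is issued per iteration. */
	do {
		ret = atomic_load_explicit(&sl->sequence,
					   memory_order_relaxed);
	} while (ret & 1);

	/* One barrier once a stable, even value is observed, so the
	 * sequence read is ordered before the protected data reads. */
	atomic_thread_fence(memory_order_acquire);
	return ret;
}

static int read_seqretry_model(struct seqlock_model *sl, unsigned start)
{
	/* Order the protected data reads before re-checking the sequence. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&sl->sequence,
				    memory_order_relaxed) != start;
}
```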

Signed-off-by: Milton Miller 
Cc: 
Cc: Linus Torvalds 
Cc: Andi Kleen 
Cc: Nick Piggin 
Cc: Benjamin Herrenschmidt 
Cc: Anton Blanchard 
Cc: Paul McKenney 
Acked-by: Eric Dumazet 
Link: http://lkml.kernel.org/r/%3Cseqlock-rmb%40mdm.bga.com%3E
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman 

---
 include/linux/seqlock.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -88,12 +88,12 @@ static __always_inline unsigned read_seq
unsigned ret;
 
 repeat:
-   ret = sl->sequence;
-   smp_rmb();
+   ret = ACCESS_ONCE(sl->sequence);
if (unlikely(ret & 1)) {
cpu_relax();
goto repeat;
}
+   smp_rmb();
 
return ret;
 }




[21/91] seqlock: Don't smp_rmb in seqlock reader spin loop

2011-06-16 Thread Greg KH
2.6.32-longterm review patch.  If anyone has any objections, please let us know.

--

From: Milton Miller 

commit 5db1256a5131d3b133946fa02ac9770a784e6eb2 upstream.

Move the smp_rmb after cpu_relax loop in read_seqlock and add
ACCESS_ONCE to make sure the test and return are consistent.

A multi-threaded core in the lab didn't like the update
from 2.6.35 to 2.6.36, to the point it would hang during
boot when multiple threads were active.  Bisection showed
af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 (clockevents:
Remove the per cpu tick skew) as the culprit and it is
supported with stack traces showing xtime_lock waits including
tick_do_update_jiffies64 and/or update_vsyscall.

Experimentation showed the combination of cpu_relax and smp_rmb
was significantly slowing the progress of other threads sharing
the core, and this patch is effective in avoiding the hang.

A theory is the rmb is affecting the whole core while the
cpu_relax is causing a resource rebalance flush, together they
cause an interference cadence that is unbroken when the seqlock
reader has interrupts disabled.

At first I was confused why the refactor in
3c22cd5709e8143444a6d08682a87f4c57902df3 (kernel: optimise
seqlock) didn't affect this patch application, but after some
study that affected seqcount not seqlock. The new seqcount was
not factored back into the seqlock.  I defer that to the future.

While the removal of the timer interrupt offset created
contention for the xtime lock while a cpu does the
additional work to update the system clock, the seqlock
implementation with the tight rmb spin loop goes back much
further, and is just waiting for the right trigger.

Signed-off-by: Milton Miller 
Cc: 
Cc: Linus Torvalds 
Cc: Andi Kleen 
Cc: Nick Piggin 
Cc: Benjamin Herrenschmidt 
Cc: Anton Blanchard 
Cc: Paul McKenney 
Acked-by: Eric Dumazet 
Link: http://lkml.kernel.org/r/%3Cseqlock-rmb%40mdm.bga.com%3E
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman 

---
 include/linux/seqlock.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -88,12 +88,12 @@ static __always_inline unsigned read_seq
unsigned ret;
 
 repeat:
-   ret = sl->sequence;
-   smp_rmb();
+   ret = ACCESS_ONCE(sl->sequence);
if (unlikely(ret & 1)) {
cpu_relax();
goto repeat;
}
+   smp_rmb();
 
return ret;
 }

