Re: kernel panic during kernel module load (powerpc specific part)

2012-06-06 Thread Steffen Rumler



On Fri, Jun 01, 2012 at 11:33:37AM +, Wrobel Heinz-R39252 wrote:

I believe that the basic premise is that you should provide a directly
reachable copy of the save/restore functions, even if this means that
you need several copies of the functions.

I just fixed a very similar problem with grub2 in fact. It was using r0
and trashing the saved LR that way.

The real fix is indeed to statically link those gcc "helpers", we
shouldn't generate things like cross-module calls inside function prologs
and epilogues, when stackframes aren't even guaranteed to be reliable.

However, in the grub2 case, it was easier to just use r12 :-)

For not just the module loading case, I believe r12 is the only real solution 
now. I checked one debugger capable of doing ELF download. It also uses r12 for 
trampoline code. I am guessing for the reason that prompted this discussion.


I disagree. Look carefully at Ben's answer: cross-module calls
are intrinsically dangerous when stack frames are in a transient
state.


Without r12 we'd have to change standard libraries to automagically link in gcc 
helpers for any conceivable non-.text section, which I am not sure is feasible. 
How would you write section independent helper functions which link to any 
section needing them?!

I don't think that it is that bad: the helpers should be linked to the default
.text section when needed. Typically the init code and so on are mapped within
reach of that section (otherwise you'll end up with the linker complaining that
it finds overflowing branch offsets between .text and .init.text).


Asking users to create their own section specific copy of helper functions is 
definitely not portable if the module or other code is not architecture 
dependent.

Well, it automagically works on 64 bit. There it is performed by magic built
into the linker.


It is a normal gcc feature that you can assign specific code to non-.text 
sections and it is not documented that it may crash depending on the OS arch 
the ELF is built for, so asking for a Power Architecture specific change on 
tool libs to make Power Architecture Linux happy seems a bit much to ask.


Once again I disagree.


Using r12 in any Linux related trampoline code seems a reachable goal, and it
would eliminate the conflict with the ABI.


There is no conflict with the ABI. These functions are supposed to be directly
reachable from whatever code section may need them.

Now I have a question: how did you get the need for this?

None of my kernels uses them:
- if I compile with -O2, the compiler simply expands epilogue and prologue to 
series of lwz and stw
- if I compile with -Os, the compiler generates lmw/stmw which give the 
smallest possible cache footprint

Neither did I find a single reference to these functions in several systems 
that I grepped for.

Regards,
Gabriel


Hi,

how should we continue here ?
There is the kernel panic, I've described.

Technically, there is a conflict between the code generated by the compiler
and the loader in module_32.c, at least when using -Os.
Because the prologue/epilogue is part of the .text and init_module() is part of 
.init.text (in the case __init is applied, as usual),
a directly reachable call is not always possible.
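For reference, here is where the reach limit comes from. A PowerPC I-form branch (`b`/`bl`) carries a 24-bit word offset, which is a signed 26-bit byte displacement of roughly ±32 MB. The following is a minimal sketch of the reachability test and the `bl` encoding. `rel24_reachable()` and `encode_bl()` are hypothetical names for illustration, not the kernel's module loader code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: can a "bl" placed at address "from" reach "to"?
 * LI is 24 bits shifted left by 2, i.e. a signed 26-bit byte
 * displacement in [-0x02000000, 0x01fffffc], word aligned.
 */
static bool rel24_reachable(uint32_t from, uint32_t to)
{
	int64_t disp = (int64_t)to - (int64_t)from;

	if (disp & 3)
		return false;	/* branch targets must be word aligned */
	return disp >= -0x02000000LL && disp <= 0x01fffffcLL;
}

/* Hypothetical: encode "bl to" at "from" (primary opcode 18, AA=0, LK=1). */
static uint32_t encode_bl(uint32_t from, uint32_t to)
{
	assert(rel24_reachable(from, to));
	/* Unsigned subtraction wraps, so negative displacements encode correctly. */
	return (18u << 26) | ((to - from) & 0x03fffffcu) | 1u;
}
```

When the target is out of range, the loader falls back to a trampoline generated by `do_plt_call()`. Which scratch register that trampoline clobbers is what this thread is about.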

Thanks
Steffen

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH] sched/numa: do load balance between remote nodes

2012-06-06 Thread Alex Shi
commit cb83b629b removed the NODE sched domain and checks whether the node
distance in the SLIT table is farther than REMOTE_DISTANCE; if so, the
load balance chance at exec/fork/wake_affine points is lost.

But even when the node distance is farther than REMOTE_DISTANCE,
modern CPUs have QPI-like connections that keep memory access between
nodes from being too slow. So the above loss on NUMA machines causes a
huge performance regression on benchmarks such as hackbench, tbench,
netperf and oltp.

This patch restores the old scheduler behavior on all my Intel
platforms (NHM EP/EX, WSM EP, SNB EP/EP4S), and so removes the
performance regressions. (All of them have just two kinds of distance,
10 and 21.)

Signed-off-by: Alex Shi 
---
 kernel/sched/core.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 39eb601..b2ee41a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6286,7 +6286,7 @@ static int sched_domains_curr_level;
 
 static inline int sd_local_flags(int level)
 {
-   if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
+   if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
return 0;
 
return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
-- 
1.7.5.4
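The effect of the one-line change shows up once concrete values are plugged in. The following is a small sketch under stated assumptions:

- It uses the then-default constants REMOTE_DISTANCE = 20 and RECLAIM_DISTANCE = 30. An architecture may override the latter.
- The `DEMO_*` flag bits are placeholders, not the kernel's SD_* values.

For the SLIT distance 21 reported on these machines, the old test drops the balancing flags and the patched test keeps them.

```c
#include <assert.h>

/* Assumed defaults of the time (include/linux/topology.h). */
#define REMOTE_DISTANCE  20
#define RECLAIM_DISTANCE 30

/* Placeholder flag bits for illustration only. */
#define DEMO_BALANCE_EXEC 0x1
#define DEMO_BALANCE_FORK 0x2
#define DEMO_WAKE_AFFINE  0x4
#define DEMO_LOCAL_FLAGS  (DEMO_BALANCE_EXEC | DEMO_BALANCE_FORK | DEMO_WAKE_AFFINE)

/* Pre-patch test: anything farther than REMOTE_DISTANCE loses the flags. */
static int sd_local_flags_old(int distance)
{
	return distance > REMOTE_DISTANCE ? 0 : DEMO_LOCAL_FLAGS;
}

/* Patched test: only distances beyond RECLAIM_DISTANCE lose them. */
static int sd_local_flags_new(int distance)
{
	return distance > RECLAIM_DISTANCE ? 0 : DEMO_LOCAL_FLAGS;
}
```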



Re: [RFC PATCH] sched/numa: do load balance between remote nodes

2012-06-06 Thread Peter Zijlstra
On Wed, 2012-06-06 at 14:52 +0800, Alex Shi wrote:
> -   if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
> +   if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE) 

I actually considered this.. I just felt a little uneasy re-purposing
the RECLAIM_DISTANCE for this, but I guess it's all the same anyway. Both
mean expensive-away-distance.

So I've taken this.

thanks!


[PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

2012-06-06 Thread Ananth N Mavinakayanahalli
From: Ananth N Mavinakayanahalli 

On RISC architectures like powerpc, instructions are fixed size.
Instruction analysis on such platforms is just a matter of (insn % 4).
Pass the vaddr at which the uprobe is to be inserted so that
arch_uprobe_analyze_insn() can flag misaligned registration requests.

Signed-off-by: Ananth N Mavinakaynahalli 
---
 arch/x86/include/asm/uprobes.h |2 +-
 arch/x86/kernel/uprobes.c  |3 ++-
 kernel/events/uprobes.c|2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

Index: uprobes-24may/arch/x86/include/asm/uprobes.h
===
--- uprobes-24may.orig/arch/x86/include/asm/uprobes.h
+++ uprobes-24may/arch/x86/include/asm/uprobes.h
@@ -48,7 +48,7 @@ struct arch_uprobe_task {
 #endif
 };
 
-extern int  arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm);
+extern int  arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, loff_t vaddr);
 extern int  arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs);
 extern int  arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs);
 extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
Index: uprobes-24may/arch/x86/kernel/uprobes.c
===
--- uprobes-24may.orig/arch/x86/kernel/uprobes.c
+++ uprobes-24may/arch/x86/kernel/uprobes.c
@@ -409,9 +409,10 @@ static int validate_insn_bits(struct arc
  * arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
  * @mm: the probed address space.
  * @arch_uprobe: the probepoint information.
+ * @vaddr: virtual address at which to install the probepoint
  * Return 0 on success or a -ve number on error.
  */
-int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm)
+int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, loff_t vaddr)
 {
int ret;
struct insn insn;
Index: uprobes-24may/kernel/events/uprobes.c
===
--- uprobes-24may.orig/kernel/events/uprobes.c
+++ uprobes-24may/kernel/events/uprobes.c
@@ -697,7 +697,7 @@ install_breakpoint(struct uprobe *uprobe
if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn))
return -EEXIST;
 
-   ret = arch_uprobe_analyze_insn(&uprobe->arch, mm);
+   ret = arch_uprobe_analyze_insn(&uprobe->arch, mm, vaddr);
if (ret)
return ret;
 

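On a fixed-width ISA, the check that this extra argument enables is a single mask test. The following is a hedged sketch: `ppc_uprobe_vaddr_ok()` is a hypothetical name. The 4-byte size matches `MAX_UINSN_BYTES` in the powerpc port, and the sketch takes `unsigned long` as raised in review.

```c
#include <assert.h>
#include <errno.h>

#define PPC_INSN_SIZE 4	/* every powerpc instruction is 4 bytes */

/* Hypothetical helper: reject probe addresses not on an instruction boundary. */
static int ppc_uprobe_vaddr_ok(unsigned long vaddr)
{
	if (vaddr & (PPC_INSN_SIZE - 1))	/* equivalent to vaddr & 0x03 */
		return -EINVAL;
	return 0;
}
```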


[PATCH 2/2] [POWERPC] uprobes: powerpc port

2012-06-06 Thread Ananth N Mavinakayanahalli
From: Ananth N Mavinakayanahalli 

This is the port of uprobes to powerpc. Usage is similar to x86.

One TODO in this port compared to x86 is the uprobe abort_xol() logic.
x86 depends on the thread_struct.trap_nr (absent in powerpc) to determine
if a signal was caused when the uprobed instruction was single-stepped/
emulated, in which case, we reset the instruction pointer to the probed
address and retry the probe again.

[root@ ~]# ./bin/perf probe -x /lib64/libc.so.6 malloc
Added new event:
  probe_libc:malloc(on 0xb4860)

You can now use it in all perf tools, such as:

perf record -e probe_libc:malloc -aR sleep 1

[root@ ~]# ./bin/perf record -e probe_libc:malloc -aR sleep 20
[ perf record: Woken up 22 times to write data ]
[ perf record: Captured and wrote 5.843 MB perf.data (~255302 samples) ]
[root@ ~]# ./bin/perf report --stdio
# 
# captured on: Mon Jun  4 05:26:31 2012
# hostname : .ibm.com
# os release : 3.4.0-uprobe
# perf version : 3.4.0
# arch : ppc64
# nrcpus online : 4
# nrcpus avail : 4
# cpudesc : POWER6 (raw), altivec supported
# cpuid : 62,769
# total memory : 7310528 kB
# cmdline : /root/bin/perf record -e probe_libc:malloc -aR sleep 20
# event : name = probe_libc:malloc, type = 2, config = 0x124, config1 = 0x0, con
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# 
#
# Samples: 83K of event 'probe_libc:malloc'
# Event count (approx.): 83484
#
# Overhead       Command  Shared Object  Symbol
# ........  ............  .............  ..........
#
    69.05%           tar  libc-2.12.so   [.] malloc
    28.57%            rm  libc-2.12.so   [.] malloc
     1.32%  avahi-daemon  libc-2.12.so   [.] malloc
     0.58%          bash  libc-2.12.so   [.] malloc
     0.28%          sshd  libc-2.12.so   [.] malloc
     0.08%    irqbalance  libc-2.12.so   [.] malloc
     0.05%         bzip2  libc-2.12.so   [.] malloc
     0.04%         sleep  libc-2.12.so   [.] malloc
     0.03%    multipathd  libc-2.12.so   [.] malloc
     0.01%      sendmail  libc-2.12.so   [.] malloc
     0.01%     automount  libc-2.12.so   [.] malloc


Signed-off-by: Ananth N Mavinakayanahalli 
Index: linux-3.5-rc1/arch/powerpc/include/asm/thread_info.h
===
--- linux-3.5-rc1.orig/arch/powerpc/include/asm/thread_info.h	2012-06-03 06:59:26.0 +0530
+++ linux-3.5-rc1/arch/powerpc/include/asm/thread_info.h	2012-06-03 21:05:48.226233001 +0530
@@ -96,6 +96,7 @@
 #define TIF_RESTOREALL 11  /* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR12  /* Force successful syscall return */
 #define TIF_NOTIFY_RESUME  13  /* callback before returning to user */
+#define TIF_UPROBE 14  /* breakpointed or single-stepping */
 #define TIF_SYSCALL_TRACEPOINT 15  /* syscall tracepoint instrumentation */
 
 /* as above, but as bit values */
@@ -112,12 +113,13 @@
 #define _TIF_RESTOREALL(1<
+ */
+
+#include 
+
+typedef unsigned int uprobe_opcode_t;
+
+#define MAX_UINSN_BYTES    4
+#define UPROBE_XOL_SLOT_BYTES  (MAX_UINSN_BYTES)
+
+#define UPROBE_SWBP_INSN   0x7fe00008
+#define UPROBE_SWBP_INSN_SIZE  4 /* swbp insn size in bytes */
+
+struct arch_uprobe {
+   u8  insn[MAX_UINSN_BYTES];
+};
+
+struct arch_uprobe_task {
+};
+
+extern int  arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, loff_t vaddr);
+extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
+extern int  arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs);
+extern int  arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs);
+extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
+extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data);
+extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
+#endif /* _ASM_UPROBES_H */
Index: linux-3.5-rc1/arch/powerpc/kernel/Makefile
===
--- linux-3.5-rc1.orig/arch/powerpc/kernel/Makefile	2012-06-03 06:59:26.0 +0530
+++ linux-3.5-rc1/arch/powerpc/kernel/Makefile	2012-06-03 21:05:48.226233001 +0530
@@ -96,6 +96,7 @@
 obj-$(CONFIG_BOOTX_TEXT)   += btext.o
 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_KPROBES)  += kprobes.o
+obj-$(CONFIG_UPROBES)  += uprobes.o
 obj-$(CONFIG_PPC_UDBG_16550)   += legacy_serial.o udbg_16550.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-$(CONFIG_SWIOTLB)  += dma-swiotlb.o
Index: linux-3.5-rc1/arch/powerpc/kernel/signal.c
===
--- linux-3.5-rc1.orig/arch/powerpc/kernel/signal.c 2012-06-03 
06:59:26.0 +0530
+++ linux-3.5-rc1/arch/powerpc/kernel/signal.c  2012-06-03 21:

Re: [PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

2012-06-06 Thread Peter Zijlstra
On Wed, 2012-06-06 at 14:49 +0530, Ananth N Mavinakayanahalli wrote:
> +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, loff_t vaddr)

Don't we traditionally use unsigned long to pass vaddrs?


Re: [PATCH 2/2] [POWERPC] uprobes: powerpc port

2012-06-06 Thread Peter Zijlstra
On Wed, 2012-06-06 at 14:51 +0530, Ananth N Mavinakayanahalli wrote:
> One TODO in this port compared to x86 is the uprobe abort_xol() logic.
> x86 depends on the thread_struct.trap_nr (absent in powerpc) to determine
> if a signal was caused when the uprobed instruction was single-stepped/
> emulated, in which case, we reset the instruction pointer to the probed
> address and retry the probe again. 

Another curious difference is that x86 uses an instruction decoder and
contains massive tables to validate we can probe a particular
instruction.

Can we probe all possible PPC instructions?


Re: [PATCH v5 1/5] powerpc/85xx: implement hardware timebase sync

2012-06-06 Thread Zhao Chenhui
On Tue, Jun 05, 2012 at 11:07:41AM -0500, Scott Wood wrote:
> On 06/05/2012 04:08 AM, Zhao Chenhui wrote:
> > On Fri, Jun 01, 2012 at 10:40:00AM -0500, Scott Wood wrote:
> >> I know you say this is for dual-core chips only, but it would be nice if
> >> you'd write this in a way that doesn't assume that (even if the
> >> corenet-specific timebase freezing comes later).
> > 
> > At this point, I have not thought about how to implement the 
> > corenet-specific timebase freezing.
> 
> I wasn't asking you to.  I was asking you to not have logic that breaks
> with more than 2 CPUs.

These routines are only called in the dual-core case.

> 
> >> Do we need an isync after setting the timebase, to ensure it's happened
> >> before we enable the timebase?  Likewise, do we need a readback after
> >> disabling the timebase to ensure it's disabled before we read the
> >> timebase in give_timebase?
> > 
> > I checked the e500 core manual (Chapter 2.16 Synchronization Requirements 
> > for SPRs).
> > Only some SPR registers need an isync. The timebase registers do not.
> 
> I don't trust that, and the consequences of having the sync be imperfect
> are too unpleasant to chance it.
> 
> > I did a readback in mpc85xx_timebase_freeze().
> 
> Sorry, missed that somehow.
> 
> >>> +#ifdef CONFIG_KEXEC
> >>> + np = of_find_matching_node(NULL, guts_ids);
> >>> + if (np) {
> >>> + guts = of_iomap(np, 0);
> >>> + smp_85xx_ops.give_timebase = mpc85xx_give_timebase;
> >>> + smp_85xx_ops.take_timebase = mpc85xx_take_timebase;
> >>> + of_node_put(np);
> >>> + } else {
> >>> + smp_85xx_ops.give_timebase = smp_generic_give_timebase;
> >>> + smp_85xx_ops.take_timebase = smp_generic_take_timebase;
> >>> + }
> >>
> >> Do not use smp_generic_give/take_timebase, ever.  If you don't have the
> >> guts node, then just assume the timebase is already synced.
> >>
> >> -Scott
> > 
> > smp_generic_give/take_timebase was the default for KEXEC before.
> 
> That was a mistake.
> 
> > If we do not set them, KEXEC may fail on other platforms.
> 
> What platforms?
> 
> -Scott

Such as P4080, P3041, etc.

-Chenhui




Re: [PATCH 2/2] [POWERPC] uprobes: powerpc port

2012-06-06 Thread Ananth N Mavinakayanahalli
On Wed, Jun 06, 2012 at 11:27:02AM +0200, Peter Zijlstra wrote:
> On Wed, 2012-06-06 at 14:51 +0530, Ananth N Mavinakayanahalli wrote:
> > One TODO in this port compared to x86 is the uprobe abort_xol() logic.
> > x86 depends on the thread_struct.trap_nr (absent in powerpc) to determine
> > if a signal was caused when the uprobed instruction was single-stepped/
> > emulated, in which case, we reset the instruction pointer to the probed
> > address and retry the probe again. 
> 
> Another curious difference is that x86 uses an instruction decoder and
> contains massive tables to validate we can probe a particular
> instruction.
> 
> Can we probe all possible PPC instructions?

For the kernel, the only ones that are off limits are rfi (return from
interrupt), mtmsr (move to msr). All other instructions can be probed.

Both those instructions are supervisor level, so we won't see them in
userspace at all; so we should be able to probe all user level
instructions.

I am not aware of specific caveats for vector/altivec instructions;
maybe Paul or Ben are more suitable to comment on that.

Ananth
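To make the "only rfi and mtmsr" point concrete, here is a sketch of how such a check could look on the raw instruction word. `ppc_insn_unprobeable()` is a hypothetical helper, not the kernel's kprobes code. The masks follow the X/XL-form layout: a primary opcode in the top 6 bits and an extended opcode in bits 21-30 (IBM numbering). rfi is opcode 19/XO 50; mtmsr is opcode 31/XO 146.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Keep primary opcode (top 6 bits) and extended opcode (10 bits above bit 0). */
#define PPC_XOP_MASK   0xfc0007feu
#define PPC_INSN_RFI   0x4c000064u	/* opcode 19, XO 50 */
#define PPC_INSN_MTMSR 0x7c000124u	/* opcode 31, XO 146; rS/L fields masked off */

/* Hypothetical helper: true for the supervisor-only instructions named above. */
static bool ppc_insn_unprobeable(uint32_t insn)
{
	uint32_t op = insn & PPC_XOP_MASK;

	return op == PPC_INSN_RFI || op == PPC_INSN_MTMSR;
}
```

Both instructions are privileged, so a user-space text segment should never legitimately contain them. That is why the reply concludes that all user-level instructions can be probed.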



Re: [PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

2012-06-06 Thread Ananth N Mavinakayanahalli
On Wed, Jun 06, 2012 at 11:23:52AM +0200, Peter Zijlstra wrote:
> On Wed, 2012-06-06 at 14:49 +0530, Ananth N Mavinakayanahalli wrote:
> > +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, loff_t vaddr)
> 
> Don't we traditionally use unsigned long to pass vaddrs?

Right. But the vaddr we pass here is vma_info->vaddr which is loff_t.
I guess I should've made that clear in the patch description.

Ananth



Re: [PATCH v5 2/5] powerpc/85xx: add HOTPLUG_CPU support

2012-06-06 Thread Zhao Chenhui
On Tue, Jun 05, 2012 at 11:15:52AM -0500, Scott Wood wrote:
> On 06/05/2012 06:18 AM, Zhao Chenhui wrote:
> > On Mon, Jun 04, 2012 at 11:32:47AM -0500, Scott Wood wrote:
> >> On 06/04/2012 06:04 AM, Zhao Chenhui wrote:
> >>> On Fri, Jun 01, 2012 at 04:27:27PM -0500, Scott Wood wrote:
>  On 05/11/2012 06:53 AM, Zhao Chenhui wrote:
> > -#ifdef CONFIG_KEXEC
> > +#if defined(CONFIG_KEXEC) || defined(CONFIG_HOTPLUG_CPU)
> 
>  Let's not grow lists like this.  Is there any harm in building it
>  unconditionally?
> 
>  -Scott
> >>>
> >>> We need this ifdef. We only set give_timebase/take_timebase
> >>> when CONFIG_KEXEC or CONFIG_HOTPLUG_CPU is defined.
> >>
> >> If we really need this to be a compile-time decision, make a new symbol
> >> for it, but I really think this should be decided at runtime.  Just
> >> because we have kexec or hotplug support enabled doesn't mean that's
> >> actually what we're doing at the moment.
> >>
> >> -Scott
> > 
> > If the user does not enable kexec or hotplug, this code is redundant.
> > So use CONFIG_KEXEC and CONFIG_HOTPLUG_CPU to guard it.
> 
> My point is that these lists tend to grow and be a maintenance pain.
> For small things it's often better to not worry about saving a few
> bytes.  For larger things that need to be conditional, define a new
> symbol rather than growing ORed lists like this.
> 
> -Scott

I agree with you in principle. But there are only two config options
in this patch, and it is unlikely to grow. 

-Chenhui



Re: [PATCH v5 5/5] powerpc/85xx: add support to JOG feature using cpufreq interface

2012-06-06 Thread Zhao Chenhui
On Tue, Jun 05, 2012 at 10:58:41AM -0500, Scott Wood wrote:
> On 06/05/2012 05:59 AM, Zhao Chenhui wrote:
> > On Fri, Jun 01, 2012 at 06:30:55PM -0500, Scott Wood wrote:
> >> On 05/11/2012 06:53 AM, Zhao Chenhui wrote:
> >>> The jog mode frequency transition process on the MPC8536 is similar to
> >>> the deep sleep process. The driver needs to save the CPU state and restore
> >>> it after CPU warm reset.
> >>>
> >>> Note:
> >>>  * The I/O peripherals such as PCIe and eTSEC may lose packets during
> >>>the jog mode frequency transition.
> >>
> >> That might be acceptable for eTSEC, but it is not acceptable to lose
> >> anything on PCIe.  Especially not if you're going to make this "default y".
> > 
> > It is a hardware limitation.
> 
> Then make sure jog isn't used if PCIe is used.
> 
> Maybe you could do something with the suspend infrastructure, but this
> is sufficiently heavyweight that transitions should be manually
> requested, not triggered by the automatic cpufreq governor.
> 
> Does this apply to p1022, or just mpc8536?

Both of them.

> 
> > Peripherals in the platform will not be operating
> > during the jog mode frequency transition process.
> 
> What ensures this?
> 
> -Scott

Hardware ensures it without software intervention.

-Chenhui



Re: [PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

2012-06-06 Thread Ananth N Mavinakayanahalli
On Wed, Jun 06, 2012 at 11:40:15AM +0200, Ingo Molnar wrote:
> 
> * Ananth N Mavinakayanahalli  wrote:
> 
> > On Wed, Jun 06, 2012 at 11:23:52AM +0200, Peter Zijlstra wrote:
> > > On Wed, 2012-06-06 at 14:49 +0530, Ananth N Mavinakayanahalli wrote:
> > > > +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, loff_t vaddr)
> > > 
> > > Don't we traditionally use unsigned long to pass vaddrs?
> > 
> > Right. But the vaddr we pass here is vma_info->vaddr which is loff_t.
> > I guess I should've made that clear in the patch description.
> 
> Why not fix struct vma_info's vaddr type?

Agreed. Will fix and send v2.

Ananth



Re: [RFC PATCH] sched/numa: do load balance between remote nodes

2012-06-06 Thread Sergei Shtylyov

Hello.

On 06-06-2012 10:52, Alex Shi wrote:

> commit cb83b629b

   Please also specify that commit's summary in parens.

> remove the NODE sched domain and check if the node
> distance in SLIT table is farther than REMOTE_DISTANCE, if so, it will
> lose the load balance chance at exec/fork/wake_affine points.

> But actually, even the node distance is farther than REMOTE_DISTANCE,
> Modern CPUs also has QPI like connections, that make memory access is

   "Is" not needed here.

> not too slow between nodes.  So above losing on NUMA machine make a
> huge performance regression on benchmark: hackbench, tbench, netperf
> and oltp etc.

> This patch will recover the scheduler behavior to old mode on all my
> Intel platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and so remove the
> perfromance regressions. (all of them just has 2 kinds distance, 10 21)

> Signed-off-by: Alex Shi

WBR, Sergei


Re: kernel panic during kernel module load (powerpc specific part)

2012-06-06 Thread Benjamin Herrenschmidt
On Wed, 2012-06-06 at 09:36 +0200, Steffen Rumler wrote:
> 
> how should we continue here ?
> There is the kernel panic, I've described.
> 
> Technically, there is an conflict between the code generated by the
> compiler and the loader in module_32.c, at least by using -Os.
> Because the prologue/epilogue is part of the .text and init_module()
> is part of .init.text (in the case __init is applied, as usual),
> a directly reachable call is not always possible. 

As we discussed earlier, if you could submit a patch to use r12 instead,
we should merge that.

Cheers,
Ben.




Re: [PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

2012-06-06 Thread Srikar Dronamraju
* Ingo Molnar  [2012-06-06 11:40:15]:

> 
> * Ananth N Mavinakayanahalli  wrote:
> 
> > On Wed, Jun 06, 2012 at 11:23:52AM +0200, Peter Zijlstra wrote:
> > > On Wed, 2012-06-06 at 14:49 +0530, Ananth N Mavinakayanahalli wrote:
> > > > +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, loff_t vaddr)
> > > 
> > > Don't we traditionally use unsigned long to pass vaddrs?
> > 
> > Right. But the vaddr we pass here is vma_info->vaddr which is loff_t.
> > I guess I should've made that clear in the patch description.
> 
> Why not fix struct vma_info's vaddr type?
> 

Calculating and comparing vaddr results uses variables of type loff_t.
To avoid typecasting and overflow at each of these places, we used
loff_t.

Ananth, install_breakpoint() already has a variable addr of type
unsigned long. Why don't you use addr instead of vaddr?

-- 
Thanks and regards
Srikar



Re: [PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

2012-06-06 Thread Ingo Molnar

* Ananth N Mavinakayanahalli  wrote:

> On Wed, Jun 06, 2012 at 11:23:52AM +0200, Peter Zijlstra wrote:
> > On Wed, 2012-06-06 at 14:49 +0530, Ananth N Mavinakayanahalli wrote:
> > > +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, loff_t vaddr)
> > 
> > Don't we traditionally use unsigned long to pass vaddrs?
> 
> Right. But the vaddr we pass here is vma_info->vaddr which is loff_t.
> I guess I should've made that clear in the patch description.

Why not fix struct vma_info's vaddr type?

Thanks,

Ingo


[PATCH] kernel panic during kernel module load (powerpc specific part)

2012-06-06 Thread Steffen Rumler

Hi,

The patch below is intended to fix the following problem.

According to the PowerPC EABI specification, the GPR r11 is assigned
the dedicated function to point to the previous stack frame.
In the powerpc-specific kernel module loader, do_plt_call()
(in arch/powerpc/kernel/module_32.c), the GPR r11 is also used
to generate trampoline code.

This combination crashes the kernel in the following case:

  + The compiler has generated the prologue and epilogue,
    which are part of the .text section.
  + The compiler has generated the code for the module init entry point,
    part of the .init.text section (in the case it is marked with __init).
  + On returning from the module init entry point, the epilogue is reached
    via a branch instruction.
  + If the epilogue is too far away, a relative branch instruction cannot be
    used. Instead, trampoline code is generated in do_plt_call() in order to
    jump via a register. Unfortunately, the code generated by do_plt_call()
    destroys the content of GPR r11.
  + Because GPR r11 no longer holds the right stack frame pointer,
    the kernel crashes right after the epilogue.

The fix just uses GPR r12 instead of GPR r11 for generating the trampoline
code. According to statements from Freescale, this is also safe from an
EABI perspective.

I've tested the fix for kernel 2.6.33 on MPC8541.
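The hex constants in the patch encode the four instructions `lis`/`addi`/`mtctr`/`bctr`, with the register field changed from 11 to 12. The following sketch builds the four-word stub for either register so the constants can be checked:

- `build_plt_stub()` and the encoder names are hypothetical.
- The bit layouts are the PowerPC D-form (`lis` = `addis rD,0,imm`; `addi`) and XFX-form (`mtspr`) formats.

```c
#include <assert.h>
#include <stdint.h>

/* lis rD,sym@ha == addis rD,0,sym@ha (D-form, primary opcode 15). */
static uint32_t enc_lis_ha(unsigned rd, uint32_t val)
{
	assert(rd < 32);
	return (15u << 26) | (rd << 21) | (((val + 0x8000) >> 16) & 0xffff);
}

/* addi rD,rA,sym@l (D-form, primary opcode 14); the low half is sign-extended. */
static uint32_t enc_addi_lo(unsigned rd, unsigned ra, uint32_t val)
{
	assert(rd < 32 && ra < 32);
	return (14u << 26) | (rd << 21) | (ra << 16) | (val & 0xffff);
}

/*
 * mtctr rS == mtspr 9,rS (opcode 31, XO 467). The SPR field is split into
 * two 5-bit halves; for SPR 9 only the low half (bits 16-20) is non-zero.
 */
static uint32_t enc_mtctr(unsigned rs)
{
	assert(rs < 32);
	return (31u << 26) | (rs << 21) | (9u << 16) | (467u << 1);
}

/* Hypothetical: fill a 4-word trampoline that jumps to val via register reg. */
static void build_plt_stub(uint32_t jump[4], unsigned reg, uint32_t val)
{
	jump[0] = enc_lis_ha(reg, val);		/* lis   reg,val@ha    */
	jump[1] = enc_addi_lo(reg, reg, val);	/* addi  reg,reg,val@l */
	jump[2] = enc_mtctr(reg);		/* mtctr reg           */
	jump[3] = 0x4e800420u;			/* bctr                */
}
```

With `reg = 11`, the words match the old `entry->jump[]` constants; with `reg = 12`, they match the new ones. The patch therefore changes only which register the stub overwrites.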

Signed-off-by: Steffen Rumler 
---

--- orig/arch/powerpc/kernel/module_32.c	2012-06-06 16:04:28.956446788 +0200
+++ new/arch/powerpc/kernel/module_32.c	2012-06-06 16:04:17.746290683 +0200
@@ -187,8 +187,8 @@

 static inline int entry_matches(struct ppc_plt_entry *entry, Elf32_Addr val)
 {
-   if (entry->jump[0] == 0x3d600000 + ((val + 0x8000) >> 16)
-   && entry->jump[1] == 0x396b0000 + (val & 0xffff))
+   if (entry->jump[0] == 0x3d800000 + ((val + 0x8000) >> 16)
+   && entry->jump[1] == 0x398c0000 + (val & 0xffff))
return 1;
return 0;
 }
@@ -215,10 +215,9 @@
entry++;
}

-   /* Stolen from Paul Mackerras as well... */
-   entry->jump[0] = 0x3d600000+((val+0x8000)>>16);/* lis r11,sym@ha */
-   entry->jump[1] = 0x396b0000 + (val&0xffff);  /* addi r11,r11,sym@l*/
-   entry->jump[2] = 0x7d6903a6; /* mtctr r11 */
+   entry->jump[0] = 0x3d800000+((val+0x8000)>>16); /* lis r12,sym@ha */
+   entry->jump[1] = 0x398c0000 + (val&0xffff); /* addi r12,r12,sym@l*/
+   entry->jump[2] = 0x7d8903a6;/* mtctr r12 */
entry->jump[3] = 0x4e800420; /* bctr */

DEBUGP("Initialized plt for 0x%x at %p\n", val, entry);


Re: [PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

2012-06-06 Thread Oleg Nesterov
On 06/06, Ananth N Mavinakayanahalli wrote:
>
> From: Ananth N Mavinakayanahalli 
>
> On RISC architectures like powerpc, instructions are fixed size.
> Instruction analysis on such platforms is just a matter of (insn % 4).
> Pass the vaddr at which the uprobe is to be inserted so that
> arch_uprobe_analyze_insn() can flag misaligned registration requests.

And the next patch checks "vaddr & 0x03".

But why do you need this new arg? arch_uprobe_analyze_insn() could
check "container_of(auprobe, struct uprobe, arch)->offset & 0x3" with
the same effect, no? vm_start/vm_pgoff are obviously page-aligned.

Oleg.
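Oleg's argument can be checked directly. The probe address is derived from the file offset as `vaddr = vm_start + offset - (vm_pgoff << PAGE_SHIFT)`. Both `vm_start` and `vm_pgoff << PAGE_SHIFT` are page multiples, so the low two bits of `vaddr` and `offset` always agree. The following is a minimal sketch, assuming 4 KiB pages; `struct demo_vma` is a cut-down stand-in, not the kernel's `vm_area_struct`.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Cut-down stand-in for the two vm_area_struct fields that matter here. */
struct demo_vma {
	unsigned long vm_start;	/* page aligned start of the mapping */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/* Map a file offset inside the mapping to its user virtual address. */
static unsigned long demo_offset_to_vaddr(const struct demo_vma *vma, uint64_t offset)
{
	return vma->vm_start + (unsigned long)offset -
	       (vma->vm_pgoff << DEMO_PAGE_SHIFT);
}
```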



Re: [PATCH] powerpc: Fix assmption of end_of_DRAM() returns end address

2012-06-06 Thread Andrea Arcangeli
Hi,

On Wed, Jun 06, 2012 at 03:30:17PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-06-06 at 00:46 +, Bhushan Bharat-R65777 wrote:
> 
> > > >> memblock_end_of_DRAM() returns end_address + 1, not end address.
> > > >> While some code assumes that it returns end address.
> > > >
> > > > Shouldn't we instead fix it the other way around ? IE, make
> > > > memblock_end_of_DRAM() does what the name implies, which is to
> > return
> > > > the last byte of DRAM, and fix the -other- callers not to make bad
> > > > assumptions ?
> > > 
> > > That was my impression too when I saw this patch.
> > 
> > Initially I also intended to do so. I initiated a email on linux-mm@
> > subject "memblock_end_of_DRAM()  return end address + 1" and the only
> > response I received from Andrea was:
> > 
> > "
> > It's normal that "end" means "first byte offset out of the range". End
> > = not ok.
> > end = start+size.
> > This is true for vm_end too. So it's better to keep it that way.
> > My suggestion is to just fix point 1 below and audit the rest :)
> > "
> 
> Oh well, I don't care enough to fight this battle in my current state so

I wish you to get well soon Ben!

> unless Dave has more stamina than I have today, I'm ok with the patch.

Well it doesn't really matter in the end what is decided as long as
something is decided :). I was asked through a forward so I only
expressed my preference...
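The convention Andrea describes is the half-open interval [start, end), where `end = start + size` is the first byte past the range. The following illustrative sketch uses hypothetical names. It shows that the last valid byte is `end - 1` and that containment tests use `<` rather than `<=`, which is the assumption the patched callers had to get right.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open physical range [start, end): end is one past the last byte. */
struct demo_range {
	uint64_t start;
	uint64_t end;
};

static struct demo_range demo_range_from_size(uint64_t start, uint64_t size)
{
	struct demo_range r = { start, start + size };	/* end = start + size */
	return r;
}

static bool demo_range_contains(struct demo_range r, uint64_t addr)
{
	return addr >= r.start && addr < r.end;	/* '<', not '<=' */
}

/* Last valid byte; only meaningful for a non-empty range. */
static uint64_t demo_range_last_byte(struct demo_range r)
{
	return r.end - 1;
}
```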


Re: [PATCH] powerpc: Fix assmption of end_of_DRAM() returns end address

2012-06-06 Thread David Miller
From: Benjamin Herrenschmidt 
Date: Wed, 06 Jun 2012 15:30:17 +1000

> On Wed, 2012-06-06 at 00:46 +, Bhushan Bharat-R65777 wrote:
> 
>> > >> memblock_end_of_DRAM() returns end_address + 1, not end address.
>> > >> While some code assumes that it returns end address.
>> > >
>> > > Shouldn't we instead fix it the other way around ? IE, make
>> > > memblock_end_of_DRAM() does what the name implies, which is to
>> return
>> > > the last byte of DRAM, and fix the -other- callers not to make bad
>> > > assumptions ?
>> > 
>> > That was my impression too when I saw this patch.
>> 
>> Initially I also intended to do so. I initiated a email on linux-mm@
>> subject "memblock_end_of_DRAM()  return end address + 1" and the only
>> response I received from Andrea was:
>> 
>> "
>> It's normal that "end" means "first byte offset out of the range". End
>> = not ok.
>> end = start+size.
>> This is true for vm_end too. So it's better to keep it that way.
>> My suggestion is to just fix point 1 below and audit the rest :)
>> "
> 
> Oh well, I don't care enough to fight this battle in my current state so
> unless Dave has more stamina than I have today, I'm ok with the patch.

I'm definitely without the stamina to fight something like this right now :)


Re: [PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

2012-06-06 Thread Srikar Dronamraju
* Oleg Nesterov  [2012-06-06 17:08:48]:

> On 06/06, Ananth N Mavinakayanahalli wrote:
> >
> > From: Ananth N Mavinakayanahalli 
> >
> > On RISC architectures like powerpc, instructions are fixed size.
> > Instruction analysis on such platforms is just a matter of (insn % 4).
> > Pass the vaddr at which the uprobe is to be inserted so that
> > arch_uprobe_analyze_insn() can flag misaligned registration requests.
> 
> And the next patch checks "vaddr & 0x03".
> 
> But why do you need this new arg? arch_uprobe_analyze_insn() could
> check "container_of(auprobe, struct uprobe, arch)->offset & 0x3" with
> the same effect, no? vm_start/vm_pgoff are obviously page-aligned.
> 

We can't use container_of because we moved the definition of struct
uprobe to kernel/events/uprobe.c. This was possible before, when the struct
uprobe definition was in include/uprobes.h.
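On powerpc every instruction is 4 bytes long and word-aligned, so the check in the follow-up patch reduces to testing the two low bits of the probe address. Oleg's alternative works because the vaddr and the file offset differ only by page-aligned quantities. Below is a small sketch of both. It assumes 4 KiB pages and uses hypothetical helper names, since the kernel's own helpers are not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u  /* assumption: 4 KiB pages */
#define PPC_INSN_SIZE     4u   /* fixed-size powerpc instructions */

/* The "vaddr & 0x03" test: true if the probe address is word-aligned. */
static bool probe_addr_aligned(uint64_t vaddr)
{
	return (vaddr & (PPC_INSN_SIZE - 1)) == 0;
}

/*
 * Map a file offset to a virtual address inside a mapping that starts at
 * vm_start and maps file page vm_pgoff there. Both vm_start and
 * (vm_pgoff << PAGE_SHIFT) are page-aligned, so the low bits of the
 * result equal the low bits of the offset.
 */
static uint64_t offset_to_vaddr(uint64_t vm_start, uint64_t vm_pgoff,
				uint64_t offset)
{
	return vm_start + offset - (vm_pgoff << SKETCH_PAGE_SHIFT);
}
```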

-- 
Thanks and Regards
Srikar



Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user

2012-06-06 Thread Segher Boessenkool

+err1;  dcbz    r0,r3


There is no such instruction, you probably meant "dcbz 0,r3"?


Segher



Re: [PATCH 2/2] [POWERPC] uprobes: powerpc port

2012-06-06 Thread Jim Keniston
On Wed, 2012-06-06 at 15:05 +0530, Ananth N Mavinakayanahalli wrote:
> On Wed, Jun 06, 2012 at 11:27:02AM +0200, Peter Zijlstra wrote:
> > On Wed, 2012-06-06 at 14:51 +0530, Ananth N Mavinakayanahalli wrote:
> > > One TODO in this port compared to x86 is the uprobe abort_xol() logic.
> > > x86 depends on the thread_struct.trap_nr (absent in powerpc) to determine
> > > if a signal was caused when the uprobed instruction was single-stepped/
> > > emulated, in which case, we reset the instruction pointer to the probed
> > > address and retry the probe again. 
> > 
> > Another curious difference is that x86 uses an instruction decoder and
> > contains massive tables to validate we can probe a particular
> > instruction.

Part of that difference is because the x86 instruction set is a lot more
complex.  Another part is due to the lack, back when the x86 code was
created, of robust handling by uprobes of traps by probed instructions.
So we refused to probe instructions that we knew (or strongly suspected)
would generate traps in user mode -- e.g., privileged instructions,
illegal instructions.  A couple of times we had to "legalize"
instructions or prefixes that we didn't originally expect to encounter.

> > 
> > Can we probe all possible PPC instructions?
> 
> For the kernel, the only ones that are off limits are rfi (return from
> interrupt), mtmsr (move to msr). All other instructions can be probed.
> 
> Both those instructions are supervisor level, so we won't see them in
> userspace at all; so we should be able to probe all user level
> instructions.

Presumably rfi or mtmsr could show up in the instruction stream via an
erroneous or mischievous asm statement.  It'd be good to verify that you
handle that gracefully.

> 
> I am not aware of specific caveats for vector/altivec instructions;
> maybe Paul or Ben are more suitable to comment on that.
> 
> Ananth
> 

Jim
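Suppose uprobes needs to reject or gracefully handle a stray `rfi` or `mtmsr` in user code. Recognising these instructions only requires the primary opcode (the top 6 bits) and the X/XL-form extended opcode (bits 1 to 10, counting from the least significant bit). Here is a sketch. It assumes the standard encodings, `rfi` = opcode 19 / XO 50 and `mtmsr` = opcode 31 / XO 146. It is not the code from the port under review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static unsigned primary_opcode(uint32_t insn)
{
	return insn >> 26;
}

/* Extended opcode field used by X-form and XL-form instructions. */
static unsigned extended_opcode(uint32_t insn)
{
	return (insn >> 1) & 0x3ff;
}

/* True for the two supervisor-only instructions named above. */
static bool is_rfi_or_mtmsr(uint32_t insn)
{
	unsigned op = primary_opcode(insn);
	unsigned xo = extended_opcode(insn);

	if (op == 19 && xo == 50)   /* rfi */
		return true;
	if (op == 31 && xo == 146)  /* mtmsr */
		return true;
	return false;
}
```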




Re: [PATCH v5 2/5] powerpc/85xx: add HOTPLUG_CPU support

2012-06-06 Thread Scott Wood
On 06/06/2012 04:59 AM, Zhao Chenhui wrote:
> On Tue, Jun 05, 2012 at 11:15:52AM -0500, Scott Wood wrote:
>> On 06/05/2012 06:18 AM, Zhao Chenhui wrote:
>>> If the user does not enable kexec or hotplug, this code is redundant.
>>> So use CONFIG_KEXEC and CONFIG_HOTPLUG_CPU to guard it.
>>
>> My point is that these lists tend to grow and be a maintenance pain.
>> For small things it's often better to not worry about saving a few
>> bytes.  For larger things that need to be conditional, define a new
>> symbol rather than growing ORed lists like this.
>>
>> -Scott
> 
> I agree with you in principle. But there are only two config options
> in this patch, and it is unlikely to grow. 

That's what everybody says when these things start. :-)

-Scott



Re: [PATCH v5 1/5] powerpc/85xx: implement hardware timebase sync

2012-06-06 Thread Scott Wood
On 06/06/2012 04:31 AM, Zhao Chenhui wrote:
> On Tue, Jun 05, 2012 at 11:07:41AM -0500, Scott Wood wrote:
>> On 06/05/2012 04:08 AM, Zhao Chenhui wrote:
>>> On Fri, Jun 01, 2012 at 10:40:00AM -0500, Scott Wood wrote:
>>>> I know you say this is for dual-core chips only, but it would be nice if
>>>> you'd write this in a way that doesn't assume that (even if the
>>>> corenet-specific timebase freezing comes later).
>>>
>>> At this point, I have not thought about how to implement the
>>> corenet-specific timebase freezing.
>>
>> I wasn't asking you to.  I was asking you to not have logic that breaks
>> with more than 2 CPUs.
> 
> These routines are only called in the dual-core case.

Come on, you know we have chips with more than two cores.  Why design
such a limitation into it, just because you're not personally interested
in supporting anything but e500v2?

Is it so hard to make it work for an arbitrary number of cores?

>>> If we do not set them, KEXEC may fail on other platforms.
>>
>> What platforms?
> 
> Such as P4080, P3041, etc.

So we need to wait for corenet timebase sync before we stop causing
problems in virtualization, simulators, etc. if a kernel has kexec or
cpu hotplug enabled (whether used or not)?

Can you at least make sure we're actually in a kexec/hotplug scenario at
runtime?

Or just implement corenet timebase sync -- it's not that different.

-Scott



Re: [PATCH v5 4/5] fsl_pmc: Add API to enable device as wakeup event source

2012-06-06 Thread Scott Wood
On 06/05/2012 11:06 PM, Li Yang wrote:
> On Wed, Jun 6, 2012 at 2:05 AM, Scott Wood  wrote:
>> You ignored "what about devices other than ethernet".
> 
> No, I haven't.  Other devices are so at least for now.

I don't understand that last sentence.  Other devices are what?

-Scott



Re: [PATCH] PPC: PCI: Fix pcibios_io_space_offset() so it works for 32-bit ptr/64-bit rsrcs

2012-06-06 Thread Scott Wood
On 06/05/2012 10:50 PM, Ben Collins wrote:
> The commit introducing pcibios_io_space_offset() was ignoring 32-bit to
> 64-bit sign extension, which is the case on ppc32 with 64-bit resource
> addresses. This only seems to have shown up while running under QEMU for the
> e500mc target. It may or may not be suboptimal that QEMU has an IO base
> address > 32-bits for the e500-pci implementation, but 1) it's still a
> regression and 2) it's more correct to handle things this way.

Where do you see addresses over 32 bits in QEMU's e500-pci, at least
with current mainline QEMU and the mpc8544ds model?

I/O space should be at 0xe100.

I'm also not sure what this has to do with the virtual address returned
by ioremap().

> Signed-off-by: Ben Collins 
> Cc: Benjamin Herrenschmidt 
> ---
>  arch/powerpc/kernel/pci-common.c |8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index 8e78e93..be9ced7 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
>   return pci_enable_resources(dev, mask);
>  }
>  
> +/* Before assuming too much here, take care to realize that we need sign
> + * extension from 32-bit pointers to 64-bit resource addresses to work.
> + */
>  resource_size_t pcibios_io_space_offset(struct pci_controller *hose)
>  {
> - return (unsigned long) hose->io_base_virt - _IO_BASE;
> + long vbase = (long)hose->io_base_virt;
> + long io_base = _IO_BASE;
> +
> + return (resource_size_t)(vbase - io_base);

Why do we want sign extension here?

If we do want it, there are a lot of other places in this file where the
same calculation is done.

-Scott



Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user

2012-06-06 Thread Benjamin Herrenschmidt
On Wed, 2012-06-06 at 18:40 +0200, Segher Boessenkool wrote:
> > +err1;  dcbz    r0,r3
> 
> There is no such instruction, you probably meant "dcbz 0,r3"?

This reminds me... what would happen if we changed all our

#define r0  0
#define r1  1

etc... to:

#define r0  %r0
#define r1  %r1

?

I'm thinking it might help catch that sort of nasties (and some of them
can be really nasty, such as inverting mfspr/mtspr arguments, or vs ori,
etc... ). I'm sure we'd have a problem with a few macros & inline
constructs but nothing we can't fix..

(Haven't tested ... still home, officially sick :-)

Cheers,
Ben.




Re: [PATCH] PPC: PCI: Fix pcibios_io_space_offset() so it works for 32-bit ptr/64-bit rsrcs

2012-06-06 Thread Benjamin Herrenschmidt
On Wed, 2012-06-06 at 16:15 -0500, Scott Wood wrote:
> On 06/05/2012 10:50 PM, Ben Collins wrote:
> > The commit introducing pcibios_io_space_offset() was ignoring 32-bit to
> > 64-bit sign extension, which is the case on ppc32 with 64-bit resource
> > addresses. This only seems to have shown up while running under QEMU for the
> > e500mc target. It may or may not be suboptimal that QEMU has an IO base
> > address > 32-bits for the e500-pci implementation, but 1) it's still a
> > regression and 2) it's more correct to handle things this way.
> 
> Where do you see addresses over 32 bits in QEMU's e500-pci, at least
> with current mainline QEMU and the mpc8544ds model?
> 
> I/O space should be at 0xe100.
> 
> I'm also not sure what this has to do with the virtual address returned
> by ioremap().

This is due to how we calculate IO offsets on ppc32, see below

> > Signed-off-by: Ben Collins 
> > Cc: Benjamin Herrenschmidt 
> > ---
> >  arch/powerpc/kernel/pci-common.c |8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> > index 8e78e93..be9ced7 100644
> > --- a/arch/powerpc/kernel/pci-common.c
> > +++ b/arch/powerpc/kernel/pci-common.c
> > @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
> >  }
> >  
> > +/* Before assuming too much here, take care to realize that we need sign
> > + * extension from 32-bit pointers to 64-bit resource addresses to work.
> > + */
> >  resource_size_t pcibios_io_space_offset(struct pci_controller *hose)
> >  {
> > -   return (unsigned long) hose->io_base_virt - _IO_BASE;
> > +   long vbase = (long)hose->io_base_virt;
> > +   long io_base = _IO_BASE;
> > +
> > +   return (resource_size_t)(vbase - io_base);
> 
> Why do we want sign extension here?
> 
> If we do want it, there are a lot of other places in this file where the
> same calculation is done.

We should probably as much as possible factor it, but basically what
happens is that to access IO space, we turn:

 outb(port)

into
 out_8(_IO_BASE + port)

With _IO_BASE being a global.

Now what happens when you have several PHBs ? Well, we make _IO_BASE be
the result of ioremap'ing the IO space window of the first one, minus
the bus address corresponding to the beginning of that window. Then for
each device, we offset devices with the offset calculated above.
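The scheme described above can be modelled with plain 32-bit arithmetic. `_IO_BASE` is the first PHB's ioremap'ed window minus its bus address. Every other PHB gets an offset relative to that, and a port number is turned into a virtual address by adding `_IO_BASE`. Here is a user-space sketch that uses a hypothetical `struct phb_model`. All arithmetic is modulo 2^32 to mimic ppc32 pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one host bridge's IO window (ppc32 addresses). */
struct phb_model {
	uint32_t io_base_virt;  /* ioremap'ed start of the IO window */
	uint32_t io_bus_start;  /* bus address at the start of the window */
};

/* _IO_BASE: the first PHB's virtual window minus its bus address. */
static uint32_t model_io_base(const struct phb_model *first)
{
	return first->io_base_virt - first->io_bus_start;
}

/* Per-PHB offset applied to its devices' IO resources (wraps mod 2^32). */
static uint32_t model_io_offset(const struct phb_model *h, uint32_t io_base)
{
	return h->io_base_virt - io_base;
}

/* outb(port) becomes out_8(_IO_BASE + port). */
static uint32_t model_port_to_virt(uint32_t io_base, uint32_t port)
{
	return io_base + port;
}
```

Consider windows that start at bus address 0, which is the normal case. A device at bus IO address `b` behind PHB `h` gets port `b + offset`, and `_IO_BASE + b + offset` collapses to `h->io_base_virt + b`. This is why a single global base is enough.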

Now that means that we can end up with funky arithmetic in a couple of
cases:

 - If the bus address of the IO space is larger than the virtual address
returned by ioremap (it's a bit silly to use large IO addresses but it's
technically possible, normally IO windows start at 0 bus-side though).
In fact I wouldn't be surprised if we have various other bugs if IO
windows don't start at 0 (you may want to double check your dts setup
here).

 - If the ioremap'ed address of the IO space of another domain is lower
than the ioremap'ed address of the first domain, in which case the
calculation:

hose->io_base_virt - _IO_BASE

results in a negative offset.

Thus we need to make sure that this offset is fully sign extended so
that things work properly when applied to a resource_size_t which can be
64-bit.
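The failure mode is easiest to see with concrete numbers. When the second window is mapped below the first, the 32-bit difference is negative. Widening it to a 64-bit `resource_size_t` through an unsigned type zero-extends it, while going through a signed type (as the patch does with `long`) sign-extends it. The sketch below assumes a 64-bit `resource_size_t` and modular 32-bit pointer arithmetic. The helper names are illustrative, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t resource_size_t;  /* 64-bit resources on ppc32, as in the report */

/* Old form: the unsigned 32-bit difference is zero-extended to 64 bits. */
static resource_size_t offset_zero_extended(uint32_t vbase, uint32_t io_base)
{
	uint32_t diff = vbase - io_base;  /* wraps modulo 2^32 */
	return (resource_size_t)diff;
}

/* Patched form: treat the 32-bit difference as signed, then widen. */
static resource_size_t offset_sign_extended(uint32_t vbase, uint32_t io_base)
{
	uint32_t diff = vbase - io_base;
	/* Portable equivalent of (resource_size_t)(int32_t)diff. */
	int64_t sdiff = (diff & 0x80000000u) ? (int64_t)diff - 0x100000000LL
					     : (int64_t)diff;
	return (resource_size_t)sdiff;
}
```

With the sign-extended offset, the 64-bit sum wraps back to the intended value below 4 GiB. With the zero-extended offset, the sum lands above 4 GiB and no longer corresponds to the intended port. Positive offsets come out the same either way.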

On ppc64 we do things differently: we have a single linear region that
holds all IO spaces, and _IO_BASE is the beginning of it, so offsets are
never negative. We can do that because we don't care about wasting address
space there.

Cheers,
Ben.



Re: [PATCH] PPC: PCI: Fix pcibios_io_space_offset() so it works for 32-bit ptr/64-bit rsrcs

2012-06-06 Thread Ben Collins

On Jun 6, 2012, at 5:15 PM, Scott Wood wrote:

> On 06/05/2012 10:50 PM, Ben Collins wrote:
>> The commit introducing pcibios_io_space_offset() was ignoring 32-bit to
>> 64-bit sign extension, which is the case on ppc32 with 64-bit resource
>> addresses. This only seems to have shown up while running under QEMU for the
>> e500mc target. It may or may not be suboptimal that QEMU has an IO base
>> address > 32-bits for the e500-pci implementation, but 1) it's still a
>> regression and 2) it's more correct to handle things this way.
> 
> Where do you see addresses over 32 bits in QEMU's e500-pci, at least
> with current mainline QEMU and the mpc8544ds model?
> 
> I/O space should be at 0xe100.

The problem is this:

pci_bus :00: root bus resource [io  0xffbed000-0xffbfcfff] (bus address [0x1-0x1])

Without the fix that I sent, it ends up looking like:

pci_bus :00: root bus resource [io  0xffbed000-0xffbfcfff] (bus address [0x-0x])

And that's when some devices fail to be assigned valid bar 0's and the kernel 
complains because of it.

> I'm also not sure what this has to do with the virtual address returned
> by ioremap().
> 
>> Signed-off-by: Ben Collins 
>> Cc: Benjamin Herrenschmidt 
>> ---
>> arch/powerpc/kernel/pci-common.c |8 +++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>> index 8e78e93..be9ced7 100644
>> --- a/arch/powerpc/kernel/pci-common.c
>> +++ b/arch/powerpc/kernel/pci-common.c
>> @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
>>  return pci_enable_resources(dev, mask);
>> }
>> 
>> +/* Before assuming too much here, take care to realize that we need sign
>> + * extension from 32-bit pointers to 64-bit resource addresses to work.
>> + */
>> resource_size_t pcibios_io_space_offset(struct pci_controller *hose)
>> {
>> -return (unsigned long) hose->io_base_virt - _IO_BASE;
>> +long vbase = (long)hose->io_base_virt;
>> +long io_base = _IO_BASE;
>> +
>> +return (resource_size_t)(vbase - io_base);
> 
> Why do we want sign extension here?
> 
> If we do want it, there are a lot of other places in this file where the
> same calculation is done.
> 
> -Scott
> 

--
Ben Collins
Servergy, Inc.
(757) 243-7557



Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user

2012-06-06 Thread Olof Johansson
On Mon, Jun 4, 2012 at 7:02 PM, Anton Blanchard  wrote:
>
> I blame Mikey for this. He elevated my slightly dubious testcase:
>
> # dd if=/dev/zero of=/dev/null bs=1M count=1
>
> to benchmark status. And naturally we need to be number 1 at creating
> zeros. So let's improve __clear_user some more.
>
> As Paul suggests we can use dcbz for large lengths. This patch gets
> the destination cacheline aligned then uses dcbz on whole cachelines.
>
> Before:
> 1048576 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s
>
> After:
> 1048576 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s
>
> 39 GB/s, a new record.
>
> Signed-off-by: Anton Blanchard 

Besides the comments from Segher, feel free to add:

Tested-by: Olof Johansson 
Acked-by: Olof Johansson 

Didn't help performance all that much on pa6t, but it didn't go down.
Too low on cycles to actually analyze why at this time.

-Olof
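The strategy in Anton's description can be sketched in portable C. It stores up to the next cacheline boundary, zeroes whole cachelines with `dcbz`, then finishes the tail. This is only a model. `memset` on a full line stands in for `dcbz`, and user-access fault handling is omitted. The 128-byte line size is an assumption; the real code must use the CPU's actual block size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CACHELINE 128u  /* assumption; must match the real dcbz block size */

struct clear_plan {
	size_t head;   /* bytes cleared with ordinary stores before alignment */
	size_t lines;  /* whole cachelines cleared with dcbz */
	size_t tail;   /* leftover bytes after the last whole line */
};

static struct clear_plan plan_clear(uintptr_t dst, size_t len)
{
	struct clear_plan p;
	size_t mis = dst & (SKETCH_CACHELINE - 1);
	size_t head = mis ? SKETCH_CACHELINE - mis : 0;

	if (head > len)
		head = len;
	p.head = head;
	len -= head;
	p.lines = len / SKETCH_CACHELINE;
	p.tail = len % SKETCH_CACHELINE;
	return p;
}

static void clear_buffer(unsigned char *dst, size_t len)
{
	struct clear_plan p = plan_clear((uintptr_t)dst, len);

	memset(dst, 0, p.head);
	dst += p.head;
	for (size_t i = 0; i < p.lines; i++) {
		memset(dst, 0, SKETCH_CACHELINE);  /* stands in for one dcbz */
		dst += SKETCH_CACHELINE;
	}
	memset(dst, 0, p.tail);
}
```

Short or unaligned requests degrade gracefully: if the length ends before the first boundary, everything is handled as head bytes and no line is zeroed.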


Re: [RFC PATCH] sched/numa: do load balance between remote nodes

2012-06-06 Thread Alex Shi
On 06/06/2012 05:01 PM, Peter Zijlstra wrote:

> On Wed, 2012-06-06 at 14:52 +0800, Alex Shi wrote:
>> -   if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
>> +   if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE) 
> 
> I actually considered this.. I just felt a little uneasy re-purposing
> the RECLAIM_DISTANCE for this, but I guess it's all the same anyway. Both
> mean expensive-away-distance.
> 


I understand you, the BIOS guys don't have a good alignment with us on
this.

> So I've taken this.
> 
> thanks!




Re: [PATCH] PPC: PCI: Fix pcibios_io_space_offset() so it works for 32-bit ptr/64-bit rsrcs

2012-06-06 Thread Scott Wood
On 06/06/2012 05:21 PM, Benjamin Herrenschmidt wrote:
> Now that means that we can end up with funky arithmetic in a couple of
> cases:
> 
>  - If the bus address of the IO space is larger than the virtual address
> returned by ioremap (it's a bit silly to use large IO addresses but it's
> technically possible, normally IO windows start at 0 bus-side though).
> In fact I wouldn't be surprised if we have various other bugs if IO
> windows don't start at 0 (you may want to double check your dts setup
> here).

The dts does show the I/O beginning at bus address zero:

 ranges = <0x200 0x0 0xc000 0xc000 0x0 0x2000
           0x100 0x0 0x0 0xe100 0x0 0x1>;

>  - If the ioremap'ed address of the IO space of another domain is lower
> than the ioremap'ed address of the first domain, in which case the
> calculation:
> 
>   hose->io_base_virt - _IO_BASE
> 
> results in a negative offset.

There should have been only one PCI domain in the QEMU case.

-Scott



[PATCH] mpc85xx_edac: fix error: too few arguments to function 'edac_mc_alloc'

2012-06-06 Thread Kim Phillips
commit ca0907b "edac: Remove the legacy EDAC ABI" broke mpc85xx_edac
in the following manner:

mpc85xx_edac.c:983:35: error: too few arguments to function 'edac_mc_alloc'

This patch puts back the missing 'layers' argument.

Cc: Mauro Carvalho Chehab 
Signed-off-by: Kim Phillips 
---
 drivers/edac/mpc85xx_edac.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index 4c40235..0e37462 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -980,7 +980,8 @@ static int __devinit mpc85xx_mc_err_probe(struct platform_device *op)
layers[1].type = EDAC_MC_LAYER_CHANNEL;
layers[1].size = 1;
layers[1].is_virt_csrow = false;
-   mci = edac_mc_alloc(edac_mc_idx, ARRAY_SIZE(layers), sizeof(*pdata));
+   mci = edac_mc_alloc(edac_mc_idx, ARRAY_SIZE(layers), layers,
+   sizeof(*pdata));
if (!mci) {
devres_release_group(&op->dev, mpc85xx_mc_err_probe);
return -ENOMEM;
-- 
1.7.10.2




Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user

2012-06-06 Thread Paul Mackerras
On Wed, Jun 06, 2012 at 06:40:54PM +0200, Segher Boessenkool wrote:
> >+err1;   dcbz    r0,r3
> 
> There is no such instruction, you probably meant "dcbz 0,r3"?

There certainly is such an instruction, though it doesn't do exactly
what a naive reader might expect.  Using 0 rather than r0 or %r0
improves readability but makes no difference to the assembler or the
cpu.

Paul.
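Paul's point comes from how X-form storage instructions form their effective address. The RA field follows the "(RA|0)" rule, so register number 0 in that slot means the literal value 0, not the contents of GPR0. The asm headers here define `r0` as plain `0`, so `dcbz r0,r3` and `dcbz 0,r3` assemble to the same instruction. `dcbz` then zeroes the whole cache block containing the computed address. Here is a small C model of both steps. The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for X-form storage ops: EA = (RA|0) + (RB). */
static uint64_t xform_ea(const uint64_t gpr[32], unsigned ra, unsigned rb)
{
	uint64_t base = (ra == 0) ? 0 : gpr[ra];  /* RA=0 means literal zero */
	return base + gpr[rb];
}

/* Start of the cache block dcbz would zero (block_size a power of two). */
static uint64_t dcbz_block_start(uint64_t ea, uint64_t block_size)
{
	return ea & ~(block_size - 1);
}
```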


Re: [PATCH v5 1/5] powerpc/85xx: implement hardware timebase sync

2012-06-06 Thread Zhao Chenhui
On Wed, Jun 06, 2012 at 01:26:16PM -0500, Scott Wood wrote:
> On 06/06/2012 04:31 AM, Zhao Chenhui wrote:
> > On Tue, Jun 05, 2012 at 11:07:41AM -0500, Scott Wood wrote:
> >> On 06/05/2012 04:08 AM, Zhao Chenhui wrote:
> >>> On Fri, Jun 01, 2012 at 10:40:00AM -0500, Scott Wood wrote:
> >>>> I know you say this is for dual-core chips only, but it would be nice if
> >>>> you'd write this in a way that doesn't assume that (even if the
> >>>> corenet-specific timebase freezing comes later).
> >>>
> >>> At this point, I have not thought about how to implement the
> >>> corenet-specific timebase freezing.
> >>
> >> I wasn't asking you to.  I was asking you to not have logic that breaks
> >> with more than 2 CPUs.
> > 
> > These routines are only called in the dual-core case.
> 
> Come on, you know we have chips with more than two cores.  Why design
> such a limitation into it, just because you're not personally interested
> in supporting anything but e500v2?
> 
> Is it so hard to make it work for an arbitrary number of cores?
> 
> >>> If we do not set them, KEXEC may fail on other platforms.
> >>
> >> What platforms?
> > 
> > Such as P4080, P3041, etc.
> 
> So we need to wait for corenet timebase sync before we stop causing
> problems in virtualization, simulators, etc. if a kernel has kexec or
> cpu hotplug enabled (whether used or not)?
> 
> Can you at least make sure we're actually in a kexec/hotplug scenario at
> runtime?
> 
> Or just implement corenet timebase sync -- it's not that different.
> 
> -Scott

We are also working on the corenet timebase sync. Our plan is to handle the
dual-core case first, then the case of more than 2 cores. We will submit the
corenet timebase sync patch soon.

-Chenhui



Re: [PATCH v5 4/5] fsl_pmc: Add API to enable device as wakeup event source

2012-06-06 Thread Li Yang
On Thu, Jun 7, 2012 at 2:29 AM, Scott Wood  wrote:
> On 06/05/2012 11:06 PM, Li Yang wrote:
>> On Wed, Jun 6, 2012 at 2:05 AM, Scott Wood  wrote:
>>> You ignored "what about devices other than ethernet".
>>
>> No, I haven't.  Other devices are so at least for now.
>
> I don't understand that last sentence.  Other devices are what?

Perhaps I misunderstood your question "what about devices other than
ethernet".  Did you mean how devices other than ethernet would
know how to set it?

Other wakeup-capable devices can call the API once they are up and
running.  It will be the pmc driver's responsibility to find out whether
that specific device can be configured as a wakeup source for the SoC.

Leo

Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user

2012-06-06 Thread Michael Neuling
Benjamin Herrenschmidt  wrote:

> On Wed, 2012-06-06 at 18:40 +0200, Segher Boessenkool wrote:
> > > +err1;  dcbz    r0,r3
> > 
> > There is no such instruction, you probably meant "dcbz 0,r3"?
> 
> This reminds me... what would happen if we changed all our
> 
> #define   r0  0
> #define   r1  1
> 
> etc... to:
> 
> #define r0  %r0
> #define r1  %r1
> 
> ?
> 
> I'm thinking it might help catch that sort of nasties (and some of them
> can be really nasty, such as inverting mfspr/mtspr arguments, or vs ori,
> etc... ). I'm sure we'd have a problem with a few macros & inline
> constructs but nothing we can't fix..

One problem with this is when we construct the instructions, like using
anything from ppc-opcode.h.  eg. using PPC_POPCNTB would need to go from:
PPC_POPCNTB(r3,r3) 
to:
PPC_POPCNTB(3,3) 
Which is less readable IMHO.

Mikey


Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user

2012-06-06 Thread Michael Ellerman
On Thu, 2012-06-07 at 16:05 +1000, Michael Neuling wrote:
> Benjamin Herrenschmidt  wrote:
> 
> > On Wed, 2012-06-06 at 18:40 +0200, Segher Boessenkool wrote:
> > > > +err1;  dcbz    r0,r3
> > > 
> > > There is no such instruction, you probably meant "dcbz 0,r3"?
> > 
> > This reminds me... what would happen if we changed all our
> > 
> > #define r0  0
> > #define r1  1
> > 
> > etc... to:
> > 
> > #define r0  %r0
> > #define r1  %r1
> > 
> > ?
> > 
> > I'm thinking it might help catch that sort of nasties (and some of them
> > can be really nasty, such as inverting mfspr/mtspr arguments, or vs ori,
> > etc... ). I'm sure we'd have a problem with a few macros & inline
> > constructs but nothing we can't fix..
> 
> One problem with this is when we construct the instructions, like using
> anything from ppc-opcode.h.  eg. using PPC_POPCNTB would need to go from:
> PPC_POPCNTB(r3,r3) 
> to:
> PPC_POPCNTB(3,3) 

#define R(x)  x

#define PPC_POPCNTB(R(3), R(3))

??

cheers




Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user

2012-06-06 Thread Michael Neuling
Michael Ellerman  wrote:

> On Thu, 2012-06-07 at 16:05 +1000, Michael Neuling wrote:
> > Benjamin Herrenschmidt  wrote:
> > 
> > > On Wed, 2012-06-06 at 18:40 +0200, Segher Boessenkool wrote:
> > > > > +err1;  dcbz    r0,r3
> > > > 
> > > > There is no such instruction, you probably meant "dcbz 0,r3"?
> > > 
> > > This reminds me... what would happen if we changed all our
> > > 
> > > #define   r0  0
> > > #define   r1  1
> > > 
> > > etc... to:
> > > 
> > > #define r0  %r0
> > > #define r1  %r1
> > > 
> > > ?
> > > 
> > > I'm thinking it might help catch that sort of nasties (and some of them
> > > can be really nasty, such as inverting mfspr/mtspr arguments, or vs ori,
> > > etc... ). I'm sure we'd have a problem with a few macros & inline
> > > constructs but nothing we can't fix..
> > 
> > One problem with this is when we construct the instructions, like using
> > anything from ppc-opcode.h.  eg. using PPC_POPCNTB would need to go from:
> > PPC_POPCNTB(r3,r3) 
> > to:
> > PPC_POPCNTB(3,3) 
> 
> #define R(x)  x

#define R(x)  (x)

> #define PPC_POPCNTB(R(3), R(3))

Maybe, looks pretty gross but you're the maintainer! :-)

Mikey


Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user

2012-06-06 Thread Michael Ellerman
On Thu, 2012-06-07 at 16:12 +1000, Michael Neuling wrote:
> Michael Ellerman  wrote:
> 
> > On Thu, 2012-06-07 at 16:05 +1000, Michael Neuling wrote:
> > > Benjamin Herrenschmidt  wrote:
> > > 
> > > > On Wed, 2012-06-06 at 18:40 +0200, Segher Boessenkool wrote:
> > > > > > +err1;  dcbz    r0,r3
> > > > > 
> > > > > There is no such instruction, you probably meant "dcbz 0,r3"?
> > > > 
> > > > This reminds me... what would happen if we changed all our
> > > > 
> > > > #define r0  0
> > > > #define r1  1
> > > > 
> > > > etc... to:
> > > > 
> > > > #define r0  %r0
> > > > #define r1  %r1
> > > > 
> > > > ?
> > > > 
> > > > I'm thinking it might help catch that sort of nasties (and some of them
> > > > can be really nasty, such as inverting mfspr/mtspr arguments, or vs ori,
> > > > etc... ). I'm sure we'd have a problem with a few macros & inline
> > > > constructs but nothing we can't fix..
> > > 
> > > One problem with this is when we construct the instructions, like using
> > > anything from ppc-opcode.h.  eg. using PPC_POPCNTB would need to go from:
> > > PPC_POPCNTB(r3,r3) 
> > > to:
> > > PPC_POPCNTB(3,3) 
> > 
> > #define R(x)  x
> 
> #define R(x)  (x) 
> 
> > #define PPC_POPCNTB(R(3), R(3))
> 
> Maybe, looks pretty gross but you're the maintainer! :-)

No I am not!


I agree it's fairly gross. But I'll take gross and correct over ungross
and buggy.

cheers




Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user

2012-06-06 Thread Benjamin Herrenschmidt
On Thu, 2012-06-07 at 16:05 +1000, Michael Neuling wrote:

> One problem with this is when we construct the instructions, like using
> anything from ppc-opcode.h.  eg. using PPC_POPCNTB would need to go from:
> PPC_POPCNTB(r3,r3) 
> to:
> PPC_POPCNTB(3,3) 
> Which is less readable IMHO.

Yes, I know. Not much to do about this, but it might still be worth it,
how much time wasted due to mixing up or with ori in asm somewhere ?

One option would be to #define R3 (or _r3) for use in those macros so
we still have something nicer than just "3"... oh well.
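`%r3` cannot simply be passed to the ppc-opcode.h macros because they build the instruction word arithmetically, masking and shifting each register number into its field. The sketch below is modelled on that style of macro. The field shifts follow the X-form layout, with RS at bit 21 and RA at bit 16 counting from the least significant bit. The `SK_*` names are stand-ins, not the kernel's macros. `R3` is the kind of numeric alias Ben floats, not an existing kernel symbol.

```c
#include <assert.h>
#include <stdint.h>

/* Field helpers in the style of ppc-opcode.h (sketch, not the kernel's). */
#define SK_PPC_RA(a)     ((((uint32_t)(a)) & 0x1f) << 16)
#define SK_PPC_RS(s)     ((((uint32_t)(s)) & 0x1f) << 21)
#define SK_INST_POPCNTB  0x7c0000f4u  /* primary opcode 31, XO 122 */

/* Numeric alias: still a plain number, so the arithmetic above works. */
#define R3 3

static uint32_t encode_popcntb(unsigned ra, unsigned rs)
{
	return SK_INST_POPCNTB | SK_PPC_RA(ra) | SK_PPC_RS(rs);
}
```

With `#define r3 %r3` in force, `SK_PPC_RA(r3)` would expand to an expression containing `%r3`, which is no longer a plain number. This is why a separate numeric spelling would be needed for these macros.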

Cheers,
Ben.




[v2][PATCH 1/1] ppc64: fix missing to check all bits of _TIF_USER_WORK_MASK in preempt

2012-06-06 Thread Tiejun Chen
In entry_64.S version of ret_from_except_lite, you'll notice that
in the !preempt case, after we've checked MSR_PR we test for any
TIF flag in _TIF_USER_WORK_MASK to decide whether to go to do_work
or not. However, in the preempt case, we do a convoluted trick to
test SIGPENDING only if PR was set and always test NEED_RESCHED ...
but we forget to test any other bit of _TIF_USER_WORK_MASK !!! So
that means that with preempt, we completely fail to test for things
like single step, syscall tracing, etc...

This should be fixed along the following lines:

 - Test PR. If not set, go to resume_kernel, else continue.

 - In resume_kernel, do what the original do_work did for a return to
   kernel mode (the preemption check).

 - Otherwise, always test _TIF_USER_WORK_MASK to decide whether to do
   the original user_work; if no bit is set, restore directly.
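The control flow the fix aims for can be written out as a small decision function. This makes it easy to check that user-mode returns now look at every bit of the work mask, with or without CONFIG_PREEMPT. This is a C model only. The flag values are hypothetical, and `preempt_count`/`irqs_soft_enabled` stand in for the TI_PREEMPT and SOFTE checks in the assembly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real _TIF_* values differ. */
#define F_NEED_RESCHED   (1u << 0)
#define F_SIGPENDING     (1u << 1)
#define F_SINGLESTEP     (1u << 2)
#define F_SYSCALL_TRACE  (1u << 3)
#define USER_WORK_MASK   (F_NEED_RESCHED | F_SIGPENDING | F_SINGLESTEP | F_SYSCALL_TRACE)

enum exit_action { DO_RESTORE, DO_SCHEDULE, DO_NOTIFY_RESUME, DO_PREEMPT };

static enum exit_action choose_exit_action(bool to_user, unsigned flags,
					   bool config_preempt,
					   int preempt_count,
					   bool irqs_soft_enabled)
{
	if (to_user) {
		/* Same test whether or not CONFIG_PREEMPT is set. */
		if (!(flags & USER_WORK_MASK))
			return DO_RESTORE;
		if (flags & F_NEED_RESCHED)
			return DO_SCHEDULE;       /* then re-run the exit checks */
		return DO_NOTIFY_RESUME;      /* signals, single step, tracing, ... */
	}
	/* resume_kernel: preempt only if allowed and interrupts are soft-enabled. */
	if (config_preempt && (flags & F_NEED_RESCHED) &&
	    preempt_count == 0 && irqs_soft_enabled)
		return DO_PREEMPT;
	return DO_RESTORE;
}
```

The first test case below is the one the old preempt path missed. A pending single step on return to user mode now reaches do_notify_resume.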

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Tiejun Chen 
---
v2:
* reorganize the original do_work/user_work

 arch/powerpc/kernel/entry_64.S |   97 ---
 1 files changed, 40 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index ed1718f..5971c85 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -558,27 +558,54 @@ _GLOBAL(ret_from_except_lite)
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 
-#ifdef CONFIG_PREEMPT
clrrdi  r9,r1,THREAD_SHIFT  /* current_thread_info() */
-   li  r0,_TIF_NEED_RESCHED    /* bits to check */
ld  r3,_MSR(r1)
ld  r4,TI_FLAGS(r9)
-   /* Move MSR_PR bit in r3 to _TIF_SIGPENDING position in r0 */
-   rlwimi  r0,r3,32+TIF_SIGPENDING-MSR_PR_LG,_TIF_SIGPENDING
-   and.    r0,r4,r0    /* check NEED_RESCHED and maybe SIGPENDING */
-   bne do_work
-
-#else /* !CONFIG_PREEMPT */
-   ld  r3,_MSR(r1) /* Returning to user mode? */
andi.   r3,r3,MSR_PR
-   beq restore /* if not, just restore regs and return */
+   beq resume_kernel
 
/* Check current_thread_info()->flags */
+   andi.   r0,r4,_TIF_USER_WORK_MASK
+   beq restore
+
+   andi.   r0,r4,_TIF_NEED_RESCHED
+   beq 1f
+   bl  .restore_interrupts
+   bl  .schedule
+   b   .ret_from_except_lite
+
+1: bl  .save_nvgprs
+   bl  .restore_interrupts
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  .do_notify_resume
+   b   .ret_from_except
+
+resume_kernel:
+#ifdef CONFIG_PREEMPT
+   /* Check if we need to preempt */
+   andi.   r0,r4,_TIF_NEED_RESCHED
+   beq+    restore
+   /* Check that preempt_count() == 0 and interrupts are enabled */
+   lwz r8,TI_PREEMPT(r9)
+   cmpwi   cr1,r8,0
+   ld  r0,SOFTE(r1)
+   cmpdi   r0,0
+   crandc  eq,cr1*4+eq,eq
+   bne restore
+
+   /*
+* Here we are preempting the current task. We want to make
+* sure we are soft-disabled first
+*/
+   SOFT_DISABLE_INTS(r3,r4)
+1: bl  .preempt_schedule_irq
+
+   /* Re-test flags and eventually loop */
clrrdi  r9,r1,THREAD_SHIFT
ld  r4,TI_FLAGS(r9)
-   andi.   r0,r4,_TIF_USER_WORK_MASK
-   bne do_work
-#endif /* !CONFIG_PREEMPT */
+   andi.   r0,r4,_TIF_NEED_RESCHED
+   bne 1b
+#endif /* CONFIG_PREEMPT */
 
.globl  fast_exc_return_irq
 fast_exc_return_irq:
@@ -759,50 +786,6 @@ restore_check_irq_replay:
 #endif /* CONFIG_PPC_BOOK3E */
 1: b   .ret_from_except /* What else to do here ? */
  
-
-
-3:
-do_work:
-#ifdef CONFIG_PREEMPT
-   andi.   r0,r3,MSR_PR/* Returning to user mode? */
-   bne user_work
-   /* Check that preempt_count() == 0 and interrupts are enabled */
-   lwz r8,TI_PREEMPT(r9)
-   cmpwi   cr1,r8,0
-   ld  r0,SOFTE(r1)
-   cmpdi   r0,0
-   crandc  eq,cr1*4+eq,eq
-   bne restore
-
-   /*
-* Here we are preempting the current task. We want to make
-* sure we are soft-disabled first
-*/
-   SOFT_DISABLE_INTS(r3,r4)
-1: bl  .preempt_schedule_irq
-
-   /* Re-test flags and eventually loop */
-   clrrdi  r9,r1,THREAD_SHIFT
-   ld  r4,TI_FLAGS(r9)
-   andi.   r0,r4,_TIF_NEED_RESCHED
-   bne 1b
-   b   restore
-
-user_work:
-#endif /* CONFIG_PREEMPT */
-
-   andi.   r0,r4,_TIF_NEED_RESCHED
-   beq 1f
-   bl  .restore_interrupts
-   bl  .schedule
-   b   .ret_from_except_lite
-
-1: bl  .save_nvgprs
-   bl  .restore_interrupts
-   addi    r3,r1,STACK_FRAME_OVERHEAD
-   bl  .do_notify_resume
-   b   .ret_from_except
-
 unrecov_restore:
addi    r3,r1,STACK_FRAME_OVERHEAD
bl  .unrecoverable_exception
-- 
1.5.6
