Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-18 Thread Benjamin Herrenschmidt
On Tue, 2006-04-18 at 15:45 +1000, Paul Mackerras wrote:
 Benjamin Herrenschmidt writes:
 
  Looks good to me except that we need the same for ppc64 since the 970
  theorically has the same problem...
 
 OK, does this look OK to everyone, before I send it off to Linus?  I
 now use a bit in the thread_info rather than using the HID0 bits
 themselves to indicate that we're napping, since the m[ft]spr might be
 slow.  I added a `local_flags' field to the thread_info struct for
 things that are only changed by the task itself and therefore don't
 need to be accessed atomically.
 
 This version does the same sort of change for the 970 as for 6xx.

Hrm...

The 970 version bloats the exception prolog significantly... I
understand now why you were talking about putting the code in the exit
path on irc ... I don't like it that way Also, if you want to keep
it, maybe use a separate CONFIG_PPC_970STYLE_NAP or something that gets
selected by platforms that can do it ?

I suppose a PACA field would be less inefficient but still sucks... the
exception return to userland code path already accesses thread_info and
definitely looks like a better place to put it... as long as we never
have to add dodgy workarounds when getting out of NAP like we do on 6xx.

Ben.



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-18 Thread Benjamin Herrenschmidt
On Tue, 2006-04-18 at 16:32 +1000, Paul Mackerras wrote:
 Benjamin Herrenschmidt writes:
 
  The 970 version bloats the exception prolog significantly... I
 
 Four instructions, in the external and decrementer interrupt entry
 paths - I don't think that's really significant bloat.

Yeah well.. including a load.. ok, I admit that should be fairly hot...
btw, I suppose you took care of having those local flags in some hot
cache line ? :)

 More likely we'll get more situations like Cell where we come in
 through the soft reset vector after sleep.

Yeah.

Ben.


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-18 Thread Paul Mackerras
Benjamin Herrenschmidt writes:

 Looks good to me except that we need the same for ppc64 since the 970
 theorically has the same problem...

OK, does this look OK to everyone, before I send it off to Linus?  I
now use a bit in the thread_info rather than using the HID0 bits
themselves to indicate that we're napping, since the m[ft]spr might be
slow.  I added a `local_flags' field to the thread_info struct for
things that are only changed by the task itself and therefore don't
need to be accessed atomically.

This version does the same sort of change for the 970 as for 6xx.

Oh, and I also fixed a stupid bug in the 32-bit stack overflow code,
where we put _end into r11, and then if there was a stack overflow,
saved registers into the stack frame pointed to by r11. :)

Paul.

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 54b48f3..8f85c5e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -91,6 +91,7 @@ #endif /* CONFIG_SPE */
 #endif /* CONFIG_PPC64 */
 
DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));
+   DEFINE(TI_LOCAL_FLAGS, offsetof(struct thread_info, local_flags));
DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
DEFINE(TI_TASK, offsetof(struct thread_info, task));
 #ifdef CONFIG_PPC32
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index b3a9794..8866fd2 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -128,37 +128,36 @@ #if defined(CONFIG_40x) || defined(CONFI
stw r12,4(r11)
 #endif
b   3f
+
 2: /* if from kernel, check interrupted DOZE/NAP mode and
  * check for stack overflow
  */
+   lwz r9,THREAD_INFO-THREAD(r12)
+   cmplw   r1,r9   /* if r1 = current-thread_info */
+   ble-stack_ovf   /* then the kernel stack overflowed */
+5:
 #ifdef CONFIG_6xx
-   mfspr   r11,SPRN_HID0
-   mtcrr11
-BEGIN_FTR_SECTION
-   bt- 8,4f/* Check DOZE */
-END_FTR_SECTION_IFSET(CPU_FTR_CAN_DOZE)
-BEGIN_FTR_SECTION
-   bt- 9,4f/* Check NAP */
-END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
+   tophys(r9,r9)   /* check local flags */
+   lwz r12,TI_LOCAL_FLAGS(r9)
+   mtcrf   0x01,r12
+   bt- 31-TLF_NAPPING,4f
 #endif /* CONFIG_6xx */
.globl transfer_to_handler_cont
 transfer_to_handler_cont:
-   lwz r11,THREAD_INFO-THREAD(r12)
-   cmplw   r1,r11  /* if r1 = current-thread_info */
-   ble-stack_ovf   /* then the kernel stack overflowed */
 3:
mflrr9
lwz r11,0(r9)   /* virtual address of handler */
lwz r9,4(r9)/* where to go when done */
-   FIX_SRR1(r10,r12)
mtspr   SPRN_SRR0,r11
mtspr   SPRN_SRR1,r10
mtlrr9
SYNC
RFI /* jump to handler, enable MMU */
 
-#ifdef CONFIG_6xx  
-4: b   power_save_6xx_restore
+#ifdef CONFIG_6xx
+4: rlwinm  r12,r12,0,~_TLF_NAPPING
+   stw r12,TI_LOCAL_FLAGS(r9)
+   b   power_save_6xx_restore
 #endif
 
 /*
@@ -167,10 +166,10 @@ #endif
  */
 stack_ovf:
/* sometimes we use a statically-allocated stack, which is OK. */
-   lis r11,[EMAIL PROTECTED]
-   ori r11,r11,[EMAIL PROTECTED]
-   cmplw   r1,r11
-   ble 3b  /* r1 = _end is OK */
+   lis r12,[EMAIL PROTECTED]
+   ori r12,r12,[EMAIL PROTECTED]
+   cmplw   r1,r12
+   ble 5b  /* r1 = _end is OK */
SAVE_NVGPRS(r11)
addir3,r1,STACK_FRAME_OVERHEAD
lis r1,[EMAIL PROTECTED]
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index a5ae04a..3b500dc 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -381,6 +381,7 @@ #define STD_EXCEPTION_COMMON_LITE(trap, 
.globl label##_common;  \
 label##_common:\
EXCEPTION_PROLOG_COMMON(trap, PACA_EXGEN);  \
+   FINISH_NAP; \
DISABLE_INTS;   \
bl  .ppc64_runlatch_on; \
addir3,r1,STACK_FRAME_OVERHEAD; \
@@ -388,6 +389,25 @@ label##_common:
\
b   .ret_from_except_lite
 
 /*
+ * When the idle code in power4_idle puts the CPU into NAP mode,
+ * it has to do so in a loop, and relies on the external interrupt
+ * and decrementer interrupt entry code to get it out of the loop.
+ * It sets the _TLF_NAPPING bit in current_thread_info()-local_flags
+ * to signal that it is in the loop and needs help to get out.
+ */
+#if defined(CONFIG_PPC_PMAC) || 

Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-18 Thread Paul Mackerras
Benjamin Herrenschmidt writes:

 The 970 version bloats the exception prolog significantly... I

Four instructions, in the external and decrementer interrupt entry
paths - I don't think that's really significant bloat.

 understand now why you were talking about putting the code in the exit
 path on irc ... I don't like it that way Also, if you want to keep
 it, maybe use a separate CONFIG_PPC_970STYLE_NAP or something that gets
 selected by platforms that can do it ?

The config option makes sense.

 I suppose a PACA field would be less inefficient but still sucks... the
 exception return to userland code path already accesses thread_info and
 definitely looks like a better place to put it... as long as we never
 have to add dodgy workarounds when getting out of NAP like we do on 6xx.

More likely we'll get more situations like Cell where we come in
through the soft reset vector after sleep.

Paul.


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-18 Thread Olof Johansson
On Tue, Apr 18, 2006 at 04:32:46PM +1000, Paul Mackerras wrote:

  understand now why you were talking about putting the code in the exit
  path on irc ... I don't like it that way Also, if you want to keep
  it, maybe use a separate CONFIG_PPC_970STYLE_NAP or something that gets
  selected by platforms that can do it ?
 
 The config option makes sense.

How about a CPU feature bit instead? That way it's possible to build a
multiplatform kernel without getting the overhead on other platforms.
4 nops should be manageable?


-Olof


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-18 Thread Olof Johansson
On Tue, Apr 18, 2006 at 09:56:00AM -0500, Olof Johansson wrote:
 On Tue, Apr 18, 2006 at 04:32:46PM +1000, Paul Mackerras wrote:
 
   understand now why you were talking about putting the code in the exit
   path on irc ... I don't like it that way Also, if you want to keep
   it, maybe use a separate CONFIG_PPC_970STYLE_NAP or something that gets
   selected by platforms that can do it ?
  
  The config option makes sense.
 
 How about a CPU feature bit instead? That way it's possible to build a
 multiplatform kernel without getting the overhead on other platforms.
 4 nops should be manageable?

DOH! I'm going blind, that's already there. So, Ben, you're unhappy
with nopping it out?


-Olof


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-18 Thread Becky Bruce
Paul,

This new version of the patch breaks 32-bit arch/ppc builds. The changes 
to the idle_6xx code are shared between architectures, but the entry.S 
code and asm-offsets.c are not.  

Here's a patch that puts the changes in arch/ppc as well.  Builds and 
boots on 834x (which is CONFIG_6xx).  

-B

ppc: Fix powersave on arch/ppc

Fix asm_offsets.c and entry.S to work with the new power save code.  
Changes in arch/powerpc needed to exist in arch/ppc as well since the 
idle code is shared by both ppc and powerpc..

Signed-off-by: Becky Bruce [EMAIL PROTECTED]

---
commit c9b42c4b6ad17aea51066604b70ea7ec399d2e45
tree 38642212d6396ecad721efca967ae1fc8924e096
parent c85f35d06479bf7cb5cfc7cda0ea218a23ed2dc4
author Becky Bruce [EMAIL PROTECTED] Tue, 18 Apr 2006 14:12:03 -0500
committer Becky Bruce [EMAIL PROTECTED] Tue, 18 Apr 2006 14:12:03 -0500

 arch/ppc/kernel/asm-offsets.c |1 +
 arch/ppc/kernel/entry.S   |   33 -
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/ppc/kernel/asm-offsets.c b/arch/ppc/kernel/asm-offsets.c
index 77e4dc7..cc7c4ae 100644
--- a/arch/ppc/kernel/asm-offsets.c
+++ b/arch/ppc/kernel/asm-offsets.c
@@ -134,6 +134,7 @@ main(void)
DEFINE(TI_TASK, offsetof(struct thread_info, task));
DEFINE(TI_EXECDOMAIN, offsetof(struct thread_info, exec_domain));
DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));
+   DEFINE(TI_LOCAL_FLAGS, offsetof(struct thread_info, flags));
DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
 
diff --git a/arch/ppc/kernel/entry.S b/arch/ppc/kernel/entry.S
index 5891ecb..1adc914 100644
--- a/arch/ppc/kernel/entry.S
+++ b/arch/ppc/kernel/entry.S
@@ -128,29 +128,26 @@ transfer_to_handler:
stw r12,4(r11)
 #endif
b   3f
+
 2: /* if from kernel, check interrupted DOZE/NAP mode and
  * check for stack overflow
  */
+   lwz r9,THREAD_INFO-THREAD(r12)
+   cmplw   r1,r9   /* if r1 = current-thread_info */
+   ble-stack_ovf   /* then the kernel stack overflowed */
+5:
 #ifdef CONFIG_6xx
-   mfspr   r11,SPRN_HID0
-   mtcrr11
-BEGIN_FTR_SECTION
-   bt- 8,4f/* Check DOZE */
-END_FTR_SECTION_IFSET(CPU_FTR_CAN_DOZE)
-BEGIN_FTR_SECTION
-   bt- 9,4f/* Check NAP */
-END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
+   tophys(r9,r9)   /* check local flags */
+   lwz r12,TI_LOCAL_FLAGS(r9)
+   mtcrf   0x01,r12
+   bt- 31-TLF_NAPPING,4f
 #endif /* CONFIG_6xx */
.globl transfer_to_handler_cont
 transfer_to_handler_cont:
-   lwz r11,THREAD_INFO-THREAD(r12)
-   cmplw   r1,r11  /* if r1 = current-thread_info */
-   ble-stack_ovf   /* then the kernel stack overflowed */
 3:
mflrr9
lwz r11,0(r9)   /* virtual address of handler */
lwz r9,4(r9)/* where to go when done */
-   FIX_SRR1(r10,r12)
mtspr   SPRN_SRR0,r11
mtspr   SPRN_SRR1,r10
mtlrr9
@@ -158,7 +155,9 @@ transfer_to_handler_cont:
RFI /* jump to handler, enable MMU */
 
 #ifdef CONFIG_6xx
-4: b   power_save_6xx_restore
+4: rlwinm  r12,r12,0,~_TLF_NAPPING
+   stw r12,TI_LOCAL_FLAGS(r9)
+   b   power_save_6xx_restore
 #endif
 
 /*
@@ -167,10 +166,10 @@ transfer_to_handler_cont:
  */
 stack_ovf:
/* sometimes we use a statically-allocated stack, which is OK. */
-   lis r11,[EMAIL PROTECTED]
-   ori r11,r11,[EMAIL PROTECTED]
-   cmplw   r1,r11
-   ble 3b  /* r1 = _end is OK */
+   lis r12,[EMAIL PROTECTED]
+   ori r12,r12,[EMAIL PROTECTED]
+   cmplw   r1,r12
+   ble 5b  /* r1 = _end is OK */
SAVE_NVGPRS(r11)
addir3,r1,STACK_FRAME_OVERHEAD
lis r1,[EMAIL PROTECTED]


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-15 Thread Michael Schmitz
  Actually, I think the problem is that the code linux is using to turn
  on nap mode is not guaranteed to put the processor in nap mode by the
  time the blr in ppc6xx_idle occurs.

 Thanks, Becky.

 This patch fixes it for me.  Comments, anyone?

Works for me :-)

Michael


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-14 Thread Becky Bruce


On Apr 13, 2006, at 5:20 AM, Benjamin Herrenschmidt wrote:


(For those who haven't followed the beginning, current git locks up at
boot on most recent powermacs. It was tracked down to a weird problem
with the idle code. My latest experiments seem to show something dodgy
with MSR_POW). Help from Freescale folks would be appreciated.



Ben, I think I know what the problem is - comments below.


On Sat, 2006-04-08 at 12:55 +1000, Paul Mackerras wrote:

This patch fixes it for me on my powerbook (1.5GHz albook).  The  
issue

seems to be that the cpu objects to HID0_NAP being cleared in HID0.
If I have this code power_save_6xx_restore, it hangs:

_GLOBAL(power_save_6xx_restore)
mfspr   r11,SPRN_HID0
rlwinm  r11,r11,0,10,8  /* Clear NAP */
mtspr   SPRN_HID0,r11
b   transfer_to_handler_cont

If I take out that rlwinm, it boots.  Bizaare.


If you do that, you cause the transfer_to_handler to always call
power_save_6xx_restore even when not coming back from idle.

I did a bit more tracking and it's very strange At first, I
discovered that adding a printk after the call to power_save fixed  
it. I

did all sort of tests and if my memory serves me well, a simple mb()
there will fix it too. In fact, what I noticed is that if I do

 if (mfmsr()  MSR_POW)
printk(GACK !\n);

After calling ppc_md.power_save() and before local_irq_enable(), it  
does

trigger ! But with an mb() just before, it doesn't. In fact, you don't
need an mb()... all you need is another mfmsr(). Thus a dummy msmsr()
fixes the stale MSR_POW in there.

That is very dodgy. Looks like we get a stale MSR_POW upon return from
the exception that interrupted sleep, causing the next
local_irq_enable() to block forever.


Actually, I think the problem is that the code linux is using to turn  
on nap mode is not guaranteed to put the processor in nap mode by the  
time the blr in ppc6xx_idle occurs.


This is at the bottom of ppc6xx_idle:

mfmsr   r7
ori r7,r7,MSR_EE
orisr7,r7,[EMAIL PROTECTED]
sync
isync
mtmsr   r7
isync
sync
blr

Unfortunately, NAP mode does not necessarily fully take effect for  
some number of cycles after the mtmsr, and the sync isn't enough to  
guarantee this.  So it's entirely possible that you execute the blr  
and carry on with the next function, which is local_irq_enable (or  
perhaps a MSR read in the case of your test code) which is going see  
the MSR value with POW set because you haven't started napping yet.


The above code should really look like this:

mfmsr   r7
ori r7,r7,MSR_EE
orisr7,r7,[EMAIL PROTECTED]
sync
isync
mtmsr   r7
isync
label:
b label
blr



The next question is how does it get there... my idea at first was  
that

we get MSR_POW in SRR1 in that exception and put it back in with rfi
(and the CPU gets it that way instead of ignoring it). Sounds like a
lovely explanation if we also add that a sync or an mfmsr clears  
this

weird condition. However, I added clearing of MSR_POW in r9 in
EXCEPTION_PROLOG_2() and it didn't fix it (but maybe I did something
wrong, I was tired).


This wouldn't help - MSR[POW] is cleared on exception and is not a  
bit that is saved in SRR1.


Hope this helps - I don't have hardware to test this on, so I can't  
be sure, but it seems to explain the behavior you're seeing if I'm  
understanding the problem correctly.


Cheers,
Becky


--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-14 Thread Becky Bruce


On Apr 13, 2006, at 3:55 PM, Benjamin Herrenschmidt wrote:





The above code should really look like this:

 mfmsr   r7
 ori r7,r7,MSR_EE
 orisr7,r7,[EMAIL PROTECTED]
 sync
 isync
 mtmsr   r7
 isync
label:
 b label
blr


Ohhh ... we always assumed mtmsr with MSR_POW was
immediate/synchronous ! That explains a lot. The problem with the  
above

though is that we'll never get out unless we also hack the exception
path to change the return address once an exception happens. It's not
that difficult especially since we already have a special case to  
handle
returning from NAP there, on ppc32 at least. ppc64 will need a bit  
more

investigation.



Agreed, this is yuck :(

Do you see another way to loop until NAP has gone ? Maybe reading  
msr in

a loop until POW gets cleared would do the trick ?


So, it makes sense to me that this would work, but I suspect there  
may be hardware wierdness - the user manual is very specific about  
the code sequence that should be used (although I've given you a  
slightly different sequence in my last mail that is also known to  
work and is cleaner, IMHO).  Let me check with one of our HW  
designers to see if this is OK.  It might be tomorrow before I have  
an answer - it's after 4:30 here and some of them are early birds,  
and might have already left for the day.


FYI, the user's manual recommends this sequence:
loop:
  sync
  mtmsr POW
  isync
  b loop




Hope this helps - I don't have hardware to test this on, so I can't
be sure, but it seems to explain the behavior you're seeing if I'm
understanding the problem correctly.


It definitely does ! Thanks a lot.



NP.

Cheers!
-B


--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-14 Thread Olof Johansson
Hi,

On Fri, Apr 14, 2006 at 12:07:23PM -0700, Paul Mackerras wrote:
 Becky Bruce writes:
 
  Actually, I think the problem is that the code linux is using to turn  
  on nap mode is not guaranteed to put the processor in nap mode by the  
  time the blr in ppc6xx_idle occurs.
 
 Thanks, Becky.
 
 This patch fixes it for me.  Comments, anyone?

The bf mnemonics had me scratching my head a while, it's not listed as
a simplified mnemonic in the 64-bit PEM. Two questions below.

  _GLOBAL(power_save_6xx_restore)
 + tophys(r11, r1) /* Make the idle task do a blr */
 + lwz r9,_LINK(r11)
 + stw r9,_NIP(r11)
   mfspr   r11,SPRN_HID0
 - rlwinm. r11,r11,0,10,8  /* Clear NAP  copy NAP bit !state to cr1 EQ */
 - cror4*cr1+eq,4*cr0+eq,4*cr0+eq
 + rlwinm  r11,r11,0,10,8  /* Clear NAP */
  BEGIN_FTR_SECTION
   rlwinm  r11,r11,0,9,7   /* Clear DOZE */
  END_FTR_SECTION_IFSET(CPU_FTR_CAN_DOZE)
   mtspr   SPRN_HID0, r11
  
  #ifdef DEBUG
 - beq cr1,1f
 + bf  9,1f

Where is cr0 set now -- you took the dot off of rlwinm?

   lis r11,(nap_return_count-KERNELBASE)@ha
   lwz r9,[EMAIL PROTECTED](r11)
   addir9,r9,1
   stw r9,[EMAIL PROTECTED](r11)
  1:
  #endif
 - 
 +
 +#ifdef CONFIG_SMP
   rlwinm  r9,r1,0,0,18
   tophys(r9,r9)
   lwz r11,TI_CPU(r9)
   slwir11,r11,2
 +#else
 + li  r11,0
 +#endif
   /* Todo make sure all these are in the same page
 -  * and load r22 (@ha part + CPU offset) only once
 +  * and load r11 (@ha part + CPU offset) only once
*/
  BEGIN_FTR_SECTION
 - beq cr1,1f
 + bf  9,1f

Same comment as above w.r.t. cr0?



-Olof


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-14 Thread Olof Johansson
On Fri, Apr 14, 2006 at 01:19:36PM -0700, Paul Mackerras wrote:
 Olof Johansson writes:
 
  Where is cr0 set now -- you took the dot off of rlwinm?
 
 transfer_to_handler does mfspr r11,SPRN_HID0; mtcr r11 before jumping
 to power_save_6xx_restore.  The rlwinm. was wrong anyway since it was
 setting cr0.eq based on all the *other* bits in HID0, not HID0_NAP
 (doh!).

Oh, right, you even updated the comment to reflect this. My bad.
Patch looks good to me then.


-Olof


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-14 Thread Benjamin Herrenschmidt
On Fri, 2006-04-14 at 12:07 -0700, Paul Mackerras wrote:
 Becky Bruce writes:
 
  Actually, I think the problem is that the code linux is using to turn  
  on nap mode is not guaranteed to put the processor in nap mode by the  
  time the blr in ppc6xx_idle occurs.
 
 Thanks, Becky.
 
 This patch fixes it for me.  Comments, anyone?

Looks good to me except that we need the same for ppc64 since the 970
theorically has the same problem...

Ben.



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-14 Thread Benjamin Herrenschmidt
On Fri, 2006-04-14 at 15:00 -0500, Becky Bruce wrote:
 He's being sneaky - there's a copy of HID0 in the CR at this point  
 from the caller, and bit 9 is the position for NAP.

It's a trick I learned from Darwin :) They do that regulary when code is
very cpu-feature dependant, like cache code for example, they put the
cpu features bitmask in CR and do branches based on individual bits of
it here or there.

Ben.



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-14 Thread Paul Mackerras
Becky Bruce writes:

 Actually, I think the problem is that the code linux is using to turn  
 on nap mode is not guaranteed to put the processor in nap mode by the  
 time the blr in ppc6xx_idle occurs.

Thanks, Becky.

This patch fixes it for me.  Comments, anyone?

Paul.

diff -urN powerpc-merge/arch/powerpc/kernel/idle_6xx.S 
pmac-2.6.17-rc1/arch/powerpc/kernel/idle_6xx.S
--- powerpc-merge/arch/powerpc/kernel/idle_6xx.S2006-04-04 
23:09:16.0 -0700
+++ pmac-2.6.17-rc1/arch/powerpc/kernel/idle_6xx.S  2006-04-14 
10:29:54.0 -0700
@@ -151,41 +151,47 @@
isync
mtmsr   r7
isync
-   sync
-   blr
+1: b   1b

 /*
  * Return from NAP/DOZE mode, restore some CPU specific registers,
  * we are called with DR/IR still off and r2 containing physical
- * address of current.
+ * address of current.  R11 and CR contain HID0.  We have to preserve
+ * r10 and r12.
  */
 _GLOBAL(power_save_6xx_restore)
+   tophys(r11, r1) /* Make the idle task do a blr */
+   lwz r9,_LINK(r11)
+   stw r9,_NIP(r11)
mfspr   r11,SPRN_HID0
-   rlwinm. r11,r11,0,10,8  /* Clear NAP  copy NAP bit !state to cr1 EQ */
-   cror4*cr1+eq,4*cr0+eq,4*cr0+eq
+   rlwinm  r11,r11,0,10,8  /* Clear NAP */
 BEGIN_FTR_SECTION
rlwinm  r11,r11,0,9,7   /* Clear DOZE */
 END_FTR_SECTION_IFSET(CPU_FTR_CAN_DOZE)
mtspr   SPRN_HID0, r11
 
 #ifdef DEBUG
-   beq cr1,1f
+   bf  9,1f
lis r11,(nap_return_count-KERNELBASE)@ha
lwz r9,[EMAIL PROTECTED](r11)
addir9,r9,1
stw r9,[EMAIL PROTECTED](r11)
 1:
 #endif
-   
+
+#ifdef CONFIG_SMP
rlwinm  r9,r1,0,0,18
tophys(r9,r9)
lwz r11,TI_CPU(r9)
slwir11,r11,2
+#else
+   li  r11,0
+#endif
/* Todo make sure all these are in the same page
-* and load r22 (@ha part + CPU offset) only once
+* and load r11 (@ha part + CPU offset) only once
 */
 BEGIN_FTR_SECTION
-   beq cr1,1f
+   bf  9,1f
addis   r9,r11,(nap_save_msscr0-KERNELBASE)@ha
lwz r9,[EMAIL PROTECTED](r9)
mtspr   SPRN_MSSCR0, r9


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-14 Thread Becky Bruce
He's being sneaky - there's a copy of HID0 in the CR at this point  
from the caller, and bit 9 is the position for NAP.


-B

On Apr 14, 2006, at 2:54 PM, Olof Johansson wrote:


Hi,

On Fri, Apr 14, 2006 at 12:07:23PM -0700, Paul Mackerras wrote:

Becky Bruce writes:

Actually, I think the problem is that the code linux is using to  
turn
on nap mode is not guaranteed to put the processor in nap mode by  
the

time the blr in ppc6xx_idle occurs.


Thanks, Becky.

This patch fixes it for me.  Comments, anyone?


The bf mnemonics had me scratching my head a while, it's not listed as
a simplified mnemonic in the 64-bit PEM. Two questions below.


 _GLOBAL(power_save_6xx_restore)
+   tophys(r11, r1) /* Make the idle task do a blr */
+   lwz r9,_LINK(r11)
+   stw r9,_NIP(r11)
mfspr   r11,SPRN_HID0
-	rlwinm.	r11,r11,0,10,8	/* Clear NAP  copy NAP bit !state to cr1  
EQ */

-   cror4*cr1+eq,4*cr0+eq,4*cr0+eq
+   rlwinm  r11,r11,0,10,8  /* Clear NAP */
 BEGIN_FTR_SECTION
rlwinm  r11,r11,0,9,7   /* Clear DOZE */
 END_FTR_SECTION_IFSET(CPU_FTR_CAN_DOZE)
mtspr   SPRN_HID0, r11

 #ifdef DEBUG
-   beq cr1,1f
+   bf  9,1f


Where is cr0 set now -- you took the dot off of rlwinm?


lis r11,(nap_return_count-KERNELBASE)@ha
lwz r9,[EMAIL PROTECTED](r11)
addir9,r9,1
stw r9,[EMAIL PROTECTED](r11)
 1:
 #endif
-   
+
+#ifdef CONFIG_SMP
rlwinm  r9,r1,0,0,18
tophys(r9,r9)
lwz r11,TI_CPU(r9)
slwir11,r11,2
+#else
+   li  r11,0
+#endif
/* Todo make sure all these are in the same page
-* and load r22 (@ha part + CPU offset) only once
+* and load r11 (@ha part + CPU offset) only once
 */
 BEGIN_FTR_SECTION
-   beq cr1,1f
+   bf  9,1f


Same comment as above w.r.t. cr0?



-Olof



--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-14 Thread Paul Mackerras
Olof Johansson writes:

 Where is cr0 set now -- you took the dot off of rlwinm?

transfer_to_handler does mfspr r11,SPRN_HID0; mtcr r11 before jumping
to power_save_6xx_restore.  The rlwinm. was wrong anyway since it was
setting cr0.eq based on all the *other* bits in HID0, not HID0_NAP
(doh!).

Paul.


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-14 Thread Becky Bruce

On Fri, 2006-04-14 at 12:07 -0700, Paul Mackerras wrote:


Becky Bruce writes:



Actually, I think the problem is that the code linux is using to turn
on nap mode is not guaranteed to put the processor in nap mode by the
time the blr in ppc6xx_idle occurs.



Thanks, Becky.

This patch fixes it for me.  Comments, anyone?


Patch LGTM, as well.  I like the approach.

Thanks!
-Becky


--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-13 Thread Benjamin Herrenschmidt

 
 The above code should really look like this:
 
  mfmsr   r7
  ori r7,r7,MSR_EE
  orisr7,r7,[EMAIL PROTECTED]
  sync
  isync
  mtmsr   r7
  isync
 label:
  b label
   blr

Ohhh ... we always assumed mtmsr with MSR_POW was
immediate/synchronous ! That explains a lot. The problem with the above
though is that we'll never get out unless we also hack the exception
path to change the return address once an exception happens. It's not
that difficult especially since we already have a special case to handle
returning from NAP there, on ppc32 at least. ppc64 will need a bit more
investigation.

Do you see another way to loop until NAP has gone ? Maybe reading msr in
a loop until POW gets cleared would do the trick ?

 Hope this helps - I don't have hardware to test this on, so I can't  
 be sure, but it seems to explain the behavior you're seeing if I'm  
 understanding the problem correctly.

It definitely does ! Thanks a lot.

Ben.



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-13 Thread Benjamin Herrenschmidt

 FYI, the user's manual recommends this sequence:
 loop:
sync
mtmsr POW
isync
b loop

Ok, that's what OS X does... I always wondered ...

So ideally, we should do something similar to the above and set some
global bit somewhere telling the exception path to change the return
address. In either case, the actual form of the loop becomes fairly
irrelevant.

I need to verify what's up with the 970. I noticed Apple has some
additional weird tricks involving setting the DEC to a short value but
setting POW without EE (though I don't remember for sure, I should dbl
check their code). I suppose I should ask some IBM folks there.

Ben.
 


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)

2006-04-13 Thread Olof Johansson
On Fri, Apr 14, 2006 at 08:37:21AM +1000, Benjamin Herrenschmidt wrote:

 I need to verify what's up with the 970. I noticed Apple has some

970 keeps executing too. I guess it's never bitten hard enough to
trigger anything serious.


-Olof


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]