Re: PPC upstream kernel ignored DABR bug

2008-03-26 Thread Josh Boyer
On Wed, 26 Mar 2008 15:57:32 -0500
Josh Boyer <[EMAIL PROTECTED]> wrote:

> On Wed, 12 Mar 2008 18:47:45 -0700 (PDT)
> Roland McGrath <[EMAIL PROTECTED]> wrote:
> 
> > The only machine I have at home for testing powerpc is an Apple G5,
> > supplied to me by IBM.  It says:
> > cpu : PPC970FX, altivec supported
> > revision: 3.0 (pvr 003c 0300)
> > so I am guessing this document applies to the chips I have.  Since I can't
> > test on other chips myself, it is plausible from what I've seen that there
> > is no mysterious kernel problem and only this hardware problem.  The
> > description of the hardware problem would not make me think that it would
> > behave this way, but it is not very detailed or precise, or at least does
> > not seem so to a reader not expert on powerpc.
> 
> I ran the testcase on my older G5 today with:
> 
> cpu : PPC970, altivec supported
> revision: 2.2 (pvr 0039 0202)
> 
> and it also failed after a few iterations.  This was with
> 2.6.25-0.121.rc5.git4.fc9 as the kernel, which is fairly close to mainline.  
> At the least, this doesn't seem to be 970FX related.  I'll try building a 
> vanilla 2.6.25-rc7 later this evening to see if that makes a difference.

Still failed with a -vanilla build of 2.6.25-rc7.

josh
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: PPC upstream kernel ignored DABR bug

2008-03-26 Thread Josh Boyer
On Wed, 12 Mar 2008 18:47:45 -0700 (PDT)
Roland McGrath <[EMAIL PROTECTED]> wrote:
 
> The only machine I have at home for testing powerpc is an Apple G5,
> supplied to me by IBM.  It says:
>   cpu : PPC970FX, altivec supported
>   revision: 3.0 (pvr 003c 0300)
> so I am guessing this document applies to the chips I have.  Since I can't
> test on other chips myself, it is plausible from what I've seen that there
> is no mysterious kernel problem and only this hardware problem.  The
> description of the hardware problem would not make me think that it would
> behave this way, but it is not very detailed or precise, or at least does
> not seem so to a reader not expert on powerpc.

I ran the testcase on my older G5 today with:

cpu : PPC970, altivec supported
revision: 2.2 (pvr 0039 0202)

and it also failed after a few iterations.  This was with
2.6.25-0.121.rc5.git4.fc9 as the kernel, which is fairly close to mainline.  At 
the least, this doesn't seem to be 970FX related.  I'll try building a vanilla 
2.6.25-rc7 later this evening to see if that makes a difference.

josh


Re: PPC upstream kernel ignored DABR bug

2008-03-16 Thread Benjamin Herrenschmidt

On Fri, 2008-03-14 at 09:42 +0100, Segher Boessenkool wrote:
> > I saw no effect from that change.  So now we're back to pure mystery,
> > I guess.
> 
> Hey, we know something now: it's "just" a problem in the kernel :-)

We don't know that for sure. The DABR context switching code is trivial
enough...

Ben.




Re: PPC upstream kernel ignored DABR bug

2008-03-16 Thread Benjamin Herrenschmidt

> Since the 970 kernel never sets DABRX currently, #8 cannot explain
> _intermittent_ problems: either it always works, or never does.

Uh... could be the boot code setting it, the setting happening on LSU0
but not LSU1. No ?

Ben.



Re: PPC upstream kernel ignored DABR bug

2008-03-14 Thread Segher Boessenkool
> > If this doesn't help, and the failures stay intermittent, I don't think
> > there is a close-to-the-hardware problem here.
>
> I saw no effect from that change.  So now we're back to pure mystery,
> I guess.

Hey, we know something now: it's "just" a problem in the kernel :-)


Segher



Re: PPC upstream kernel ignored DABR bug

2008-03-14 Thread Roland McGrath
> In both these cases, the storage access goes to LSU0, so you're
> not hitting the errata.

I'll take your word for it.

> If this doesn't help, and the failures stay intermittent, I don't think
> there is a close-to-the-hardware problem here.

I saw no effect from that change.  So now we're back to pure mystery, I guess.


Thanks,
Roland


Re: PPC upstream kernel ignored DABR bug

2008-03-13 Thread Segher Boessenkool

> The pointer to the test case was given here before.

Oh, I missed that.  Anyway, I wanted to see the asm, and who knows,
with different compiler versions and all that.

> 0x1984 :   bl  0x10001750 
> 0x1988 :   lis r9,4097
> ---> 0x198c :   stw r29,7792(r9)

> 0x1d4c :   bl  0x1a88
> 0x1d50 :   ld  r2,40(r1)
> 0x1d54 :   ld  r9,-32688(r2)
> ---> 0x1d58 :   std r29,0(r9)


In both these cases, the storage access goes to LSU0, so you're
not hitting the errata.

I noticed set_dabr() doesn't do proper synchronisation insns, could
you try this patch?  I doubt it helps, but it changes the code to do
"the right thing".


diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4846bf5..ee925f5 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -250,7 +250,9 @@ int set_dabr(unsigned long dabr)
 
 	/* XXX should we have a CPU_FTR_HAS_DABR ? */
 #if defined(CONFIG_PPC64) || defined(CONFIG_6xx)
+	asm("sync");
 	mtspr(SPRN_DABR, dabr);
+	asm("isync");
 #endif
 	return 0;
 }


(badly copy/pasted, please apply by hand.  Will send a real patch later ;-) )

If this doesn't help, and the failures stay intermittent, I don't think
there is a close-to-the-hardware problem here.


Segher



Re: PPC upstream kernel ignored DABR bug

2008-03-13 Thread Roland McGrath
> Since the 970 kernel never sets DABRX currently, #8 cannot explain
> _intermittent_ problems: either it always works, or never does.

That's kind of what I thought, but I couldn't make enough sense of
the #8 text to be very sure.

> You could be happening upon #5, if the non-triggering data breakpoints
> are with vector loads/stores in strange code.

They are not.

> It would help if you could give us the disassembly of some code where the
> breakpoint did not trigger; say, that insn and the previous 20 or so insns.

The pointer to the test case was given here before.

http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/ppc-dabr-race.c?cvsroot=systemtap

-m32
Dump of assembler code for function child_thread:
0x1950 :   stwu    r1,-32(r1)
0x1954 :   li  r3,207
0x1958 :   mflr    r0
0x195c :   stw r29,20(r1)
0x1960 :   stw r0,36(r1)
0x1964 :   crclr   4*cr1+eq
0x1968 :   bl  0x10001680 
0x196c :   lis r11,4097
0x1970 :   mr  r29,r3
0x1974 :   li  r3,1
0x1978 :   lwz r9,7800(r11)
0x197c :   addi    r9,r9,1
0x1980 :   stw r9,7800(r11)
0x1984 :   bl  0x10001750 
0x1988 :   lis r9,4097
--->0x198c :   stw r29,7792(r9)
0x1990 :   bl  0x10001760 
0x1994 :   bl  0x10001760 
0x1998 :   b   0x1990 
End of assembler dump.

-m64
Dump of assembler code for function child_thread:
0x1d10 :   mflr    r0
0x1d14 :   std r29,-24(r1)
0x1d18 :   li  r3,207
0x1d1c :   std r0,16(r1)
0x1d20 :   stdu    r1,-144(r1)
0x1d24 :   bl  0x1b68
0x1d28 :   ld  r2,40(r1)
0x1d2c :   ld  r11,-32696(r2)
0x1d30 :   mr  r29,r3
0x1d34 :   li  r3,1
0x1d38 :   extsw   r29,r29
0x1d3c :   lwz r9,0(r11)
0x1d40 :   addi    r9,r9,1
0x1d44 :   clrldi  r9,r9,32
0x1d48 :   stw r9,0(r11)
0x1d4c :   bl  0x1a88
0x1d50 :   ld  r2,40(r1)
0x1d54 :   ld  r9,-32688(r2)
--->0x1d58 :   std r29,0(r9)
0x1d5c :   nop
0x1d60 :   bl  0x19a8
0x1d64 :   ld  r2,40(r1)
0x1d68 :   b   0x1d60 

0x1d6c :   .long 0x0
0x1d70 :   .long 0x1
0x1d74 :  lwz r0,0(r3)
End of assembler dump.


Thanks,
Roland


Re: PPC upstream kernel ignored DABR bug

2008-03-13 Thread Segher Boessenkool

> AFAICT the DABRX register just has two global bits that enable paying
> attention to the DABR register.

It has four bits:

01  match in user mode
02  match in supervisor mode
04  match in hypervisor mode
08  ignore translation field in DABR

If the kernel can write to DABRX, it is running in hypervisor mode, so
it should set 07 instead of 03 (as it currently does) if it wants to
match in kernel mode; or 01, if it doesn't.

OTOH, the Apple version of the 970 is special (it has no separate
hypervisor mode); still, 07 should always work.
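
As a minimal sketch of the bit layout listed above: the `SK_` names below are made up for illustration and are not the kernel's own macros; only the four bit values come from the list. It shows how 07 (match in all three modes) and 01 (user mode only) are composed.

```c
/* Hypothetical names for this sketch; only the bit values come from
 * the list above. */
enum {
    SK_DABRX_USER       = 0x1, /* match in user mode */
    SK_DABRX_SUPERVISOR = 0x2, /* match in supervisor mode */
    SK_DABRX_HYPERVISOR = 0x4, /* match in hypervisor mode */
    SK_DABRX_IGNORE_BT  = 0x8  /* ignore translation field in DABR */
};

/* Value to program into DABRX.  User-mode matches are always enabled;
 * kernel matches need both the supervisor and hypervisor bits, since a
 * kernel that can write DABRX is itself running in hypervisor mode. */
static unsigned int sk_dabrx_value(int match_kernel)
{
    unsigned int v = SK_DABRX_USER;

    if (match_kernel)
        v |= SK_DABRX_SUPERVISOR | SK_DABRX_HYPERVISOR;
    return v;
}
```

With these assumptions, sk_dabrx_value(1) gives 07 and sk_dabrx_value(0) gives 01.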


> It only needs to be set once at boot time
> (as the cell code does).  I don't see how missing that initialization could
> ever have explained the behavior we see where DABR matches are intermittent.
> If those DABRX bits weren't set then no DABR match would have happened.
> (Apparently they are set before boot on an Apple G5.)

I don't see the Apple boot code initialising DABRX; maybe the bootup state
for DABRX is 07, dunno.  Either way, it would be good if the kernel set it
properly, esp. if it wants to enable or disable matches in the kernel itself.

> What we actually see is that DABR matches seem to be reliable when things
> are slow, and get intermittent when there are enough threads with DABR set.



> I happened across:
>
> http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/79B6E24422AA101287256E93006C957E/$file/PowerPC_970FX_errata_DD3.X_V1.7.pdf
>
> which is "IBM PowerPC 970FX RISC Microprocessor Errata List for DD3.X"
> and contains "Erratum #8: DABRX register might not always be updated correctly":
>
> The only machine I have at home for testing powerpc is an Apple G5,
> supplied to me by IBM.  It says:
> cpu : PPC970FX, altivec supported
> revision: 3.0 (pvr 003c 0300)
> so I am guessing this document applies to the chips I have.

Indeed.

> Since I can't
> test on other chips myself, it is plausible from what I've seen that there
> is no mysterious kernel problem and only this hardware problem.  The
> description of the hardware problem would not make me think that it would
> behave this way, but it is not very detailed or precise, or at least does
> not seem so to a reader not expert on powerpc.


Since the 970 kernel never sets DABRX currently, #8 cannot explain
_intermittent_ problems: either it always works, or never does.

You could be happening upon #5, if the non-triggering data breakpoints
are with vector loads/stores in strange code.

> I don't know what I can do next to tell whether this processor erratum is in
> fact what's happening in the test case.  If it is, I don't know if there
> might be some arcane way to work around it despite "None" cited above.

It would help if you could give us the disassembly of some code where the
breakpoint did not trigger; say, that insn and the previous 20 or so insns.



Segher



Re: PPC upstream kernel ignored DABR bug

2008-03-13 Thread Luis Machado
On Wed, 2008-03-12 at 23:30 +0100, Jens Osterkamp wrote:
> > Just to make sure, i tested the binary against the 2.6.25-rc4 kernel. It
> > still fails. So this is really an open bug for PPC.
> 
> On a Cell- or 970-based machine ?
> 
> Gruß,
>   Jens

On a 970-based machine.

Regards,

-- 
Luis Machado
Software Engineer 
IBM Linux Technology Center


Re: PPC upstream kernel ignored DABR bug

2008-03-12 Thread Roland McGrath
AFAICT the DABRX register just has two global bits that enable paying
attention to the DABR register.  It only needs to be set once at boot time
(as the cell code does).  I don't see how missing that initialization could
ever have explained the behavior we see where DABR matches are intermittent.
If those DABRX bits weren't set then no DABR match would have happened.
(Apparently they are set before boot on an Apple G5.)

What we actually see is that DABR matches seem to be reliable when things
are slow, and get intermittent when there are enough threads with DABR set.

I searched the web trying to figure out what a DABRX register does so I
could just go try it myself rather than waiting another n months for powerpc
folks to forget about it again.  (I did try it, and 
mtspr(SPRN_DABRX, DABRX_KERNEL | DABRX_USER);
makes no difference to the test on my machine, even done in set_dabr every
time we set SPRN_DABR.)

I happened across:

http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/79B6E24422AA101287256E93006C957E/$file/PowerPC_970FX_errata_DD3.X_V1.7.pdf

which is "IBM PowerPC 970FX RISC Microprocessor Errata List for DD3.X"
and contains "Erratum #8: DABRX register might not always be updated correctly":

Projected Impact
  The data address breakpoint function might not always work.
Workaround
  None.
Status
  A fix is not planned at this time for the PowerPC 970FX.

The only machine I have at home for testing powerpc is an Apple G5,
supplied to me by IBM.  It says:
cpu : PPC970FX, altivec supported
revision: 3.0 (pvr 003c 0300)
so I am guessing this document applies to the chips I have.  Since I can't
test on other chips myself, it is plausible from what I've seen that there
is no mysterious kernel problem and only this hardware problem.  The
description of the hardware problem would not make me think that it would
behave this way, but it is not very detailed or precise, or at least does
not seem so to a reader not expert on powerpc.

So, uh, go IBM!

I'm in the minority in this conversation as someone not expert on powerpc,
and as someone not employed by IBM.  (I don't really mind finding public IBM
documents about powerpc on the web and telling IBM powerpc folks about them.
But, well.)

I don't know what I can do next to tell whether this processor erratum is in
fact what's happening in the test case.  If it is, I don't know if there
might be some arcane way to work around it despite "None" cited above.


Thanks,
Roland


Re: PPC upstream kernel ignored DABR bug

2008-03-12 Thread Jens Osterkamp

> Just to make sure, i tested the binary against the 2.6.25-rc4 kernel. It
> still fails. So this is really an open bug for PPC.

On a Cell- or 970-based machine ?

Regards,
Jens

IBM Deutschland Entwicklung GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Herbert Kircher 
Registered office: Böblingen
Commercial register: Amtsgericht Stuttgart, HRB 243294

Re: PPC upstream kernel ignored DABR bug

2008-03-12 Thread Luis Machado
Hi,

> On the Blade DABRX had to be set additional to DABR. PS3 and Celleb
> already did this. Uli Weigand found this back in November. I submitted
> a patch for this which went into 2.6.25-rc4.
> Can you please try again with rc4 ?

> Gruß,
> 
> Jens

Just to make sure, i tested the binary against the 2.6.25-rc4 kernel. It
still fails. So this is really an open bug for PPC.

-- 
Luis Machado
Software Engineer 
IBM Linux Technology Center


Re: PPC upstream kernel ignored DABR bug

2008-03-10 Thread Segher Boessenkool

> > On the Blade DABRX had to be set additional to DABR. PS3 and Celleb
> > already did this. Uli Weigand found this back in November. I submitted
> > a patch for this which went into 2.6.25-rc4.
> > Can you please try again with rc4 ?
>
> This is not the problem.  This came up before and everyone seems to have
> forgotten.  This bug has been reproduced on G5's, which do not have DABRX
> as I understand it.


970 (all versions) _does_ have a DABRX register.  Dunno if it has
the same register definition (I cannot find DABRX in the Cell docs).


Segher



Re: PPC upstream kernel ignored DABR bug

2008-03-10 Thread Roland McGrath
The G5 that I have says:

cpu : PPC970FX, altivec supported
revision: 3.0 (pvr 003c 0300)

and it does indeed reproduce this bug.

It is also strange for it to be the DABRX issue given the failure mode.
That is, it works sometimes but unreliably (as if the context switch
sometimes fails to install the value).


Thanks,
Roland


Re: PPC upstream kernel ignored DABR bug

2008-03-10 Thread Olof Johansson
On Mon, Mar 10, 2008 at 04:36:37PM -0300, Luis Machado wrote:
> On Mon, 2008-03-10 at 12:19 -0700, Roland McGrath wrote:
> > > On the Blade DABRX had to be set additional to DABR. PS3 and Celleb
> > > already did this. Uli Weigand found this back in November. I submitted
> > > a patch for this which went into 2.6.25-rc4.
> > > Can you please try again with rc4 ?
> > 
> > This is not the problem.  This came up before and everyone seems to have
> > forgotten.  This bug has been reproduced on G5's, which do not have DABRX
> > as I understand it.
> 
> Yes, now that you mention it, I've been able to reproduce this on 970FX
> blades, which I don't think have DABRX registers. I guess it's almost
> the same CPU as the G5's.

What Apple called the G5 was, over its production run, three different
CPUs:

970
970FX
970MP

970 was only used in the very first models. 970MP was used in the last
(the models with pci-express and up to 4 cpus). 970FX was used on almost
everything else in between.
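
For telling these apart from the "pvr" values people have pasted in this thread, here is a rough sketch. It assumes the usual split of the PVR into a 16-bit version and a 16-bit revision, and only knows the two version numbers actually reported in this thread (0039 and 003c); the `sk_` names are made up for the example.

```c
#include <stdint.h>

/* Hypothetical helper type: a PVR split into its fields. */
struct sk_pvr {
    unsigned int version;   /* upper 16 bits, e.g. 0x003c */
    unsigned int rev_major; /* e.g. 3 for "revision: 3.0" */
    unsigned int rev_minor; /* e.g. 0 for "revision: 3.0" */
};

static struct sk_pvr sk_decode_pvr(uint32_t pvr)
{
    struct sk_pvr d;

    d.version   = (pvr >> 16) & 0xffff;
    d.rev_major = (pvr >> 8) & 0xff;
    d.rev_minor = pvr & 0xff;
    return d;
}

/* Only the versions seen in this thread are listed; anything else,
 * including the 970MP, is reported as unknown by this sketch. */
static const char *sk_cpu_name(unsigned int version)
{
    switch (version) {
    case 0x0039: return "PPC970";
    case 0x003c: return "PPC970FX";
    default:     return "unknown";
    }
}
```

Under those assumptions "pvr 003c 0300" decodes to PPC970FX revision 3.0 and "pvr 0039 0202" to PPC970 revision 2.2, matching the cpuinfo lines quoted in the thread.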


-Olof


Re: PPC upstream kernel ignored DABR bug

2008-03-10 Thread Luis Machado
On Mon, 2008-03-10 at 12:19 -0700, Roland McGrath wrote:
> > On the Blade DABRX had to be set additional to DABR. PS3 and Celleb
> > already did this. Uli Weigand found this back in November. I submitted
> > a patch for this which went into 2.6.25-rc4.
> > Can you please try again with rc4 ?
> 
> This is not the problem.  This came up before and everyone seems to have
> forgotten.  This bug has been reproduced on G5's, which do not have DABRX
> as I understand it.

Yes, now that you mention it, I've been able to reproduce this on 970FX
blades, which I don't think have DABRX registers. I guess it's almost
the same CPU as the G5's.

Regards,

-- 
Luis Machado
Software Engineer 
IBM Linux Technology Center



Re: PPC upstream kernel ignored DABR bug

2008-03-10 Thread Roland McGrath
> On the Blade DABRX had to be set additional to DABR. PS3 and Celleb
> already did this. Uli Weigand found this back in November. I submitted
> a patch for this which went into 2.6.25-rc4.
> Can you please try again with rc4 ?

This is not the problem.  This came up before and everyone seems to have
forgotten.  This bug has been reproduced on G5's, which do not have DABRX
as I understand it.


Thanks,
Roland


Re: PPC upstream kernel ignored DABR bug

2008-03-10 Thread Luis Machado
> On the Blade DABRX had to be set additional to DABR. PS3 and Celleb
> already did this. Uli Weigand found this back in November. I submitted
> a patch for this which went into 2.6.25-rc4.
> Can you please try again with rc4 ?

I will try it and will post the results back.

Thanks Jens.

Regards,
-- 
Luis Machado
Software Engineer 
IBM Linux Technology Center



Re: PPC upstream kernel ignored DABR bug

2008-03-10 Thread Jens Osterkamp
On Monday 10 March 2008, Luis Machado wrote:
> > Yes, I know. I tried it on the PS3 first and couldn't reproduce
> > the bug he saw on the blade.
> 
> Arnd,
> 
> Do we have any news on this topic? 
> 
> I've seen this happening quite often within GDB when using hardware
> watchpoints on a shared variable in a threaded (7+ threads) binary.
> Sometimes the watchpoint won't trigger, even though the monitored
> variable's value was modified.

On the Blade DABRX had to be set additional to DABR. PS3 and Celleb
already did this. Uli Weigand found this back in November. I submitted
a patch for this which went into 2.6.25-rc4.
Can you please try again with rc4 ?

Regards,

Jens



Re: PPC upstream kernel ignored DABR bug

2008-03-09 Thread Luis Machado
> Yes, I know. I tried it on the PS3 first and couldn't reproduce
> the bug he saw on the blade.

Arnd,

Do we have any news on this topic? 

I've seen this happening quite often within GDB when using hardware
watchpoints on a shared variable in a threaded (7+ threads) binary.
Sometimes the watchpoint won't trigger, even though the monitored
variable's value was modified.

Appreciate your feedback.

Best regards,

-- 
Luis Machado
LoP Toolchain
Software Engineer 
IBM Linux Technology Center



Re: PPC upstream kernel ignored DABR bug

2007-11-28 Thread Arnd Bergmann
On Wednesday 28 November 2007 23:59:36 Geoff Levand wrote:
> > This sounds like a bug recently reported by Uli Weigand. BenH
> > said he'd take a look, but it probably fell under the table.
> > The problem found by Uli is that on certain processors (Cell/B.E.
> > in his case), the DABRX register needs to be set in order for
> > the DABR to take effect.
>
> Just as a note, the PS3's lv1_set_dabr(), which we used for
> ppc_md.set_dabr sets up both the DABRX and DABR registers.

Yes, I know. I tried it on the PS3 first and couldn't reproduce
the bug he saw on the blade.

Arnd <><


Re: PPC upstream kernel ignored DABR bug

2007-11-28 Thread Geoff Levand
Arnd Bergmann wrote:
> On Monday 26 November 2007, Jan Kratochvil wrote:
>> Hi,
>> 
>> this testcase:
>> http://people.redhat.com/jkratoch/dabr-lost.c
>> 
>> reproduces a PPC DABR kernel bug.  The variable `variable' should not get
>> modified as the thread modifying it should be caught by its DABR:
>> 
>> $ ./dabr-lost
>> TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318
>> TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318
>> TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318
>> TID 30914: hitting the variable
>> TID 30915: hitting the variable
>> TID 30916: hitting the variable
>> variable found = 30916, caught TID = 30914
>> TID 30916: DABR 0x10012a77
>> Variable got modified by a thread which has DABR still set!
>> 
> 
> This sounds like a bug recently reported by Uli Weigand. BenH
> said he'd take a look, but it probably fell under the table.
> The problem found by Uli is that on certain processors (Cell/B.E.
> in his case), the DABRX register needs to be set in order for
> the DABR to take effect.

Just as a note, the PS3's lv1_set_dabr(), which we used for
ppc_md.set_dabr sets up both the DABRX and DABR registers.

-Geoff



Re: PPC upstream kernel ignored DABR bug

2007-11-28 Thread Jan Kratochvil
On Wed, 28 Nov 2007 13:28:48 +0100, Arnd Bergmann wrote:
> On Wednesday 28 November 2007, Jan Kratochvil wrote:
> > Please be aware DABR works fine if the same code runs just 1 (always) or
> > 2 (sometimes) threads.  It starts failing with too many threads running:
> > 
> > $ ./dabr-lost
> > TID 32725: DABR 0x1001279f NIP 0xfecf41c
> > TID 32726: DABR 0x1001279f NIP 0xfecf41c
> > TID 32725: hitting the variable
> > variable found = -1, caught TID = 32725
> > TID 32726: hitting the variable
> > variable found = -1, caught TID = 32726
> > The kernel bug did not get reproduced - increase THREADS.
> > 
> > As I did not find any code in that kernel touching DABRX its value should not
> > be dependent on the number of threads running.
> > 
> 
> Right, this is a different problem from the one reported by Uli.
> From what I can tell, your problem is that you set the DABR only
> in one thread, so the other threads don't see it. DABR is saved
> in the thread_struct, so setting it in one thread doesn't have
> an impact on any other thread.

It even prints out above:
TID 32725: DABR 0x1001279f NIP 0xfecf41c
TID 32726: DABR 0x1001279f NIP 0xfecf41c

that it wrote DABR in both the threads and it has also successfully read it
back from each thread specifically (according to its thread-specific TID).

for (threadi = 0; threadi < THREADS; threadi++)
{
  pid_t tid = thread[threadi];

  setup (tid);
...
}
static void setup (pid_t tid)
{
...
  l = ptrace (PTRACE_SET_DEBUGREG, tid, NULL, (void *) dabr);
...
}

Also, if I did not set DABR specifically for each thread, it would not work in
90% of cases for `THREADS == 2'.  And it would not work for `THREADS == 4' if
they are busylooping (therefore not in a syscall).
TID 596: DABR 0x100127a7 NIP 0x1dbc
TID 597: DABR 0x100127a7 NIP 0x1db0
TID 598: DABR 0x100127a7 NIP 0x1dac
TID 599: DABR 0x100127a7 NIP 0x1dbc
TID 596: hitting the variable
variable found = -1, caught TID = 596
TID 599: hitting the variable
variable found = -1, caught TID = 599
TID 597: hitting the variable
variable found = -1, caught TID = 597
TID 598: hitting the variable
variable found = -1, caught TID = 598
The kernel bug got workarounded by WORKAROUND_SET_DABR_IN_SYSCALL.

(I found out now WORKAROUND_SET_DABR_IN_SYSCALL only reduces the probability of
the failure, it is not a 100% workaround of the problem in the testcase.)


There is some tricky kernel code around it but I did not try to debug it:

struct task_struct *__switch_to(struct task_struct *prev,
struct task_struct *new)
{
...
if (unlikely(__get_cpu_var(current_dabr) != new->thread.dabr)) {
set_dabr(new->thread.dabr);
__get_cpu_var(current_dabr) = new->thread.dabr;
}
...
}
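
To make the logic of that check easier to follow, here is a small userspace model of it. This is only a sketch: no real SPR is touched, all `sk_` names are invented for the example, and it says nothing about where the actual bug is.

```c
/* Userspace model only; every name here is hypothetical. */
struct sk_thread {
    unsigned long dabr;         /* per-thread value, as in thread_struct */
};

struct sk_cpu {
    unsigned long hw_dabr;      /* stands in for the DABR SPR */
    unsigned long current_dabr; /* per-CPU cache of the last value written */
    unsigned int  writes;       /* how often the "SPR" was written */
};

static void sk_set_dabr(struct sk_cpu *cpu, unsigned long dabr)
{
    cpu->hw_dabr = dabr;
    cpu->writes++;
}

/* Same shape as the check quoted above: the register is rewritten only
 * when the incoming thread's value differs from the cached one. */
static void sk_switch_to(struct sk_cpu *cpu, const struct sk_thread *next)
{
    if (cpu->current_dabr != next->dabr) {
        sk_set_dabr(cpu, next->dabr);
        cpu->current_dabr = next->dabr;
    }
}
```

The model only stays correct as long as hw_dabr never changes behind the cache's back; switching between two threads that share a DABR value skips the write entirely. Whether that invariant holds on the affected machines is not something this thread establishes.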



Regards,
Jan

Re: PPC upstream kernel ignored DABR bug

2007-11-28 Thread Arnd Bergmann
On Wednesday 28 November 2007, Jan Kratochvil wrote:
> Please be aware DABR works fine if the same code runs just 1 (always) or
> 2 (sometimes) threads.  It starts failing with too many threads running:
> 
> $ ./dabr-lost
> TID 32725: DABR 0x1001279f NIP 0xfecf41c
> TID 32726: DABR 0x1001279f NIP 0xfecf41c
> TID 32725: hitting the variable
> variable found = -1, caught TID = 32725
> TID 32726: hitting the variable
> variable found = -1, caught TID = 32726
> The kernel bug did not get reproduced - increase THREADS.
> 
> As I did not find any code in that kernel touching DABRX its value should not
> be dependent on the number of threads running.
> 

Right, this is a different problem from the one reported by Uli.
From what I can tell, your problem is that you set the DABR only
in one thread, so the other threads don't see it. DABR is saved
in the thread_struct, so setting it in one thread doesn't have
an impact on any other thread.

Arnd <><

Re: PPC upstream kernel ignored DABR bug

2007-11-28 Thread Jan Kratochvil
On Tue, 27 Nov 2007 23:35:36 +0100, Arnd Bergmann wrote:
> On Monday 26 November 2007, Jan Kratochvil wrote:
> > Hi,
> > 
> > this testcase:
> > http://people.redhat.com/jkratoch/dabr-lost.c
> > 
> > reproduces a PPC DABR kernel bug.  The variable `variable' should not get
> > modified as the thread modifying it should be caught by its DABR:
> > 
> > $ ./dabr-lost
> > TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318
> > TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318
> > TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318
> > TID 30914: hitting the variable
> > TID 30915: hitting the variable
> > TID 30916: hitting the variable
> > variable found = 30916, caught TID = 30914
> > TID 30916: DABR 0x10012a77
> > Variable got modified by a thread which has DABR still set!
> > 
> 
> This sounds like a bug recently reported by Uli Weigand. BenH
> said he'd take a look, but it probably fell under the table.
> The problem found by Uli is that on certain processors (Cell/B.E.
> in his case), the DABRX register needs to be set in order for
> the DABR to take effect.

Please be aware DABR works fine if the same code runs just 1 (always) or
2 (sometimes) threads.  It starts failing with too many threads running:

$ ./dabr-lost
TID 32725: DABR 0x1001279f NIP 0xfecf41c
TID 32726: DABR 0x1001279f NIP 0xfecf41c
TID 32725: hitting the variable
variable found = -1, caught TID = 32725
TID 32726: hitting the variable
variable found = -1, caught TID = 32726
The kernel bug did not get reproduced - increase THREADS.

As I did not find any code in that kernel touching DABRX its value should not
be dependent on the number of threads running.


Regards,
Lace

Re: PPC upstream kernel ignored DABR bug

2007-11-27 Thread Arnd Bergmann
On Monday 26 November 2007, Jan Kratochvil wrote:
> Hi,
> 
> this testcase:
> http://people.redhat.com/jkratoch/dabr-lost.c
> 
> reproduces a PPC DABR kernel bug.  The variable `variable' should not get
> modified as the thread modifying it should be caught by its DABR:
> 
> $ ./dabr-lost
> TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318
> TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318
> TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318
> TID 30914: hitting the variable
> TID 30915: hitting the variable
> TID 30916: hitting the variable
> variable found = 30916, caught TID = 30914
> TID 30916: DABR 0x10012a77
> Variable got modified by a thread which has DABR still set!
> 

This sounds like a bug recently reported by Uli Weigand. BenH
said he'd take a look, but it probably fell under the table.
The problem found by Uli is that on certain processors (Cell/B.E.
in his case), the DABRX register needs to be set in order for
the DABR to take effect.

Arnd <><


PPC upstream kernel ignored DABR bug

2007-11-26 Thread Jan Kratochvil
Hi,

this testcase:
http://people.redhat.com/jkratoch/dabr-lost.c

reproduces a PPC DABR kernel bug.  The variable `variable' should not get
modified as the thread modifying it should be caught by its DABR:

$ ./dabr-lost
TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318
TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318
TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318
TID 30914: hitting the variable
TID 30915: hitting the variable
TID 30916: hitting the variable
variable found = 30916, caught TID = 30914
TID 30916: DABR 0x10012a77
Variable got modified by a thread which has DABR still set!

At the `variable found =' line the parent ptracer found that thread TID 30916
had written its value into the variable, even though its DABR had already been set.
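
For reading the DABR values printed above, a small decoding sketch. It assumes the usual DABR layout, where the low three bits are the read, write and translation flags and the remaining bits are the doubleword-aligned address; the `SK_` names are invented here and only modelled on the kernel's DABR_* definitions.

```c
/* Hypothetical names; the bit positions are an assumption about the
 * DABR layout, not something taken from the testcase output. */
#define SK_DABR_DATA_READ   (1UL << 0)
#define SK_DABR_DATA_WRITE  (1UL << 1)
#define SK_DABR_TRANSLATION (1UL << 2)
#define SK_DABR_FLAGS_MASK  0x7UL

/* Doubleword-aligned address being watched. */
static unsigned long sk_dabr_address(unsigned long dabr)
{
    return dabr & ~SK_DABR_FLAGS_MASK;
}

/* Flag bits: which accesses trap, and whether translation is on. */
static unsigned long sk_dabr_flags(unsigned long dabr)
{
    return dabr & SK_DABR_FLAGS_MASK;
}
```

Under that assumption 0x10012a77 watches the doubleword at 0x10012a70 with all three flag bits set, i.e. both reads and writes should trap.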

As the behavior is dependent on the current weather I expect the scheduling
matters there.

It is important that the target thread is in the `nanosleep' syscall.  If you define
WORKAROUND_SET_DABR_IN_SYSCALL in the testcase, it busyloops in userland and
the bug is no longer reproduced.

I got it reproduced on a utrace-patched kernel on dual-CPU Power5 and Roland
McGrath reported it reproduced on the vanilla upstream kernel on a Mac G5.



Regards,
Jan Kratochvil