Re: PPC upstream kernel ignored DABR bug
On Wed, 26 Mar 2008 15:57:32 -0500
Josh Boyer <[EMAIL PROTECTED]> wrote:

> On Wed, 12 Mar 2008 18:47:45 -0700 (PDT)
> Roland McGrath <[EMAIL PROTECTED]> wrote:
>
> > The only machine I have at home for testing powerpc is an Apple G5,
> > supplied to me by IBM. It says:
> >     cpu : PPC970FX, altivec supported
> >     revision: 3.0 (pvr 003c 0300)
> > so I am guessing this document applies to the chips I have. Since I can't
> > test on other chips myself, it is plausible from what I've seen that there
> > is no mysterious kernel problem and only this hardware problem. The
> > description of the hardware problem would not make me think that it would
> > behave this way, but it is not very detailed or precise, or at least does
> > not seem so to a reader not expert on powerpc.
>
> I ran the testcase on my older G5 today with:
>
>     cpu : PPC970, altivec supported
>     revision: 2.2 (pvr 0039 0202)
>
> and it also failed after a few iterations. This was with
> 2.6.25-0.121.rc5.git4.fc9 as the kernel, which is fairly close to mainline.
> At the least, this doesn't seem to be 970FX related. I'll try building a
> vanilla 2.6.25-rc7 later this evening to see if that makes a difference.

Still failed with a -vanilla build of 2.6.25-rc7.

josh
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: PPC upstream kernel ignored DABR bug
On Wed, 12 Mar 2008 18:47:45 -0700 (PDT)
Roland McGrath <[EMAIL PROTECTED]> wrote:

> The only machine I have at home for testing powerpc is an Apple G5,
> supplied to me by IBM. It says:
>     cpu : PPC970FX, altivec supported
>     revision: 3.0 (pvr 003c 0300)
> so I am guessing this document applies to the chips I have. Since I can't
> test on other chips myself, it is plausible from what I've seen that there
> is no mysterious kernel problem and only this hardware problem. The
> description of the hardware problem would not make me think that it would
> behave this way, but it is not very detailed or precise, or at least does
> not seem so to a reader not expert on powerpc.

I ran the testcase on my older G5 today with:

    cpu : PPC970, altivec supported
    revision: 2.2 (pvr 0039 0202)

and it also failed after a few iterations. This was with
2.6.25-0.121.rc5.git4.fc9 as the kernel, which is fairly close to mainline.
At the least, this doesn't seem to be 970FX related. I'll try building a
vanilla 2.6.25-rc7 later this evening to see if that makes a difference.

josh
Re: PPC upstream kernel ignored DABR bug
On Fri, 2008-03-14 at 09:42 +0100, Segher Boessenkool wrote:
> > I saw no effect from that change. So now we're back to pure mystery,
> > I guess.
>
> Hey, we know something now: it's "just" a problem in the kernel :-)

We don't know that for sure. The DABR context switching code is trivial
enough...

Ben.
Re: PPC upstream kernel ignored DABR bug
> Since the 970 kernel never sets DABRX currently, #8 cannot explain
> _intermittent_ problems: either it always works, or never does.

Uh... could be the boot code setting it, the setting happening on LSU0
but not LSU1. No ?

Ben.
Re: PPC upstream kernel ignored DABR bug
> > If this doesn't help, and the failures stay intermittent, I don't think
> > there is a close-to-the-hardware problem here.
>
> I saw no effect from that change. So now we're back to pure mystery,
> I guess.

Hey, we know something now: it's "just" a problem in the kernel :-)

Segher
Re: PPC upstream kernel ignored DABR bug
> In both these cases, the storage access goes to LSU0, so you're
> not hitting the errata.

I'll take your word for it.

> If this doesn't help, and the failures stay intermittent, I don't think
> there is a close-to-the-hardware problem here.

I saw no effect from that change. So now we're back to pure mystery,
I guess.

Thanks,
Roland
Re: PPC upstream kernel ignored DABR bug
> The pointer to the test case was given here before.

Oh, I missed that. Anyway, I wanted to see the asm, and who knows, with
different compiler versions and all that.

>     0x1984 : bl 0x10001750
>     0x1988 : lis r9,4097
> --->0x198c : stw r29,7792(r9)

>     0x1d4c : bl 0x1a88
>     0x1d50 : ld r2,40(r1)
>     0x1d54 : ld r9,-32688(r2)
> --->0x1d58 : std r29,0(r9)

In both these cases, the storage access goes to LSU0, so you're
not hitting the errata.

I noticed set_dabr() doesn't do proper synchronisation insns, could
you try this patch? I doubt it helps, but it changes the code to do
"the right thing".

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4846bf5..ee925f5 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -250,7 +250,9 @@ int set_dabr(unsigned long dabr)
 	/* XXX should we have a CPU_FTR_HAS_DABR ? */
 #if defined(CONFIG_PPC64) || defined(CONFIG_6xx)
+	asm("sync");
 	mtspr(SPRN_DABR, dabr);
+	asm("isync");
 #endif
 	return 0;
 }

(badly copy/pasted, please apply by hand. Will send a real patch
later ;-) )

If this doesn't help, and the failures stay intermittent, I don't think
there is a close-to-the-hardware problem here.

Segher
Re: PPC upstream kernel ignored DABR bug
> Since the 970 kernel never sets DABRX currently, #8 cannot explain
> _intermittent_ problems: either it always works, or never does.

That's kind of what I thought, but I couldn't make enough sense of the #8
text to be very sure.

> You could be happening upon #5, if the non-triggering data breakpoints
> are with vector loads/stores in strange code.

They are not.

> It would help if you could give us the disassembly of some code where the
> breakpoint did not trigger; say, that insn and the previous 20 or so insns.

The pointer to the test case was given here before.
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/ppc-dabr-race.c?cvsroot=systemtap

-m32
Dump of assembler code for function child_thread:
    0x1950 : stwu r1,-32(r1)
    0x1954 : li r3,207
    0x1958 : mflr r0
    0x195c : stw r29,20(r1)
    0x1960 : stw r0,36(r1)
    0x1964 : crclr 4*cr1+eq
    0x1968 : bl 0x10001680
    0x196c : lis r11,4097
    0x1970 : mr r29,r3
    0x1974 : li r3,1
    0x1978 : lwz r9,7800(r11)
    0x197c : addi r9,r9,1
    0x1980 : stw r9,7800(r11)
    0x1984 : bl 0x10001750
    0x1988 : lis r9,4097
--->0x198c : stw r29,7792(r9)
    0x1990 : bl 0x10001760
    0x1994 : bl 0x10001760
    0x1998 : b 0x1990
End of assembler dump.

-m64
Dump of assembler code for function child_thread:
    0x1d10 : mflr r0
    0x1d14 : std r29,-24(r1)
    0x1d18 : li r3,207
    0x1d1c : std r0,16(r1)
    0x1d20 : stdu r1,-144(r1)
    0x1d24 : bl 0x1b68
    0x1d28 : ld r2,40(r1)
    0x1d2c : ld r11,-32696(r2)
    0x1d30 : mr r29,r3
    0x1d34 : li r3,1
    0x1d38 : extsw r29,r29
    0x1d3c : lwz r9,0(r11)
    0x1d40 : addi r9,r9,1
    0x1d44 : clrldi r9,r9,32
    0x1d48 : stw r9,0(r11)
    0x1d4c : bl 0x1a88
    0x1d50 : ld r2,40(r1)
    0x1d54 : ld r9,-32688(r2)
--->0x1d58 : std r29,0(r9)
    0x1d5c : nop
    0x1d60 : bl 0x19a8
    0x1d64 : ld r2,40(r1)
    0x1d68 : b 0x1d60
    0x1d6c : .long 0x0
    0x1d70 : .long 0x1
    0x1d74 : lwz r0,0(r3)
End of assembler dump.

Thanks,
Roland
Re: PPC upstream kernel ignored DABR bug
> AFAICT the DABRX register just has two global bits that enable paying
> attention to the DABR register.

It has four bits:

    01 match in user mode
    02 match in supervisor mode
    04 match in hypervisor mode
    08 ignore translation field in DABR

If the kernel can write to DABRX, it is running in hypervisor mode,
so it should set 07 instead of 03 (as it currently does) if it wants
to match in kernel mode; or 01, if it doesn't. OTOH, the Apple
version of the 970 is special (it has no separate hypervisor mode);
still, 07 should always work.

> It only needs to be set once at boot time (as the cell code does).
> I don't see how missing that initialization could ever have explained
> the behavior we see where DABR matches are intermittent. If those DABRX
> bits weren't set then no DABR match would have happened. (Apparently
> they are set before boot on an Apple G5.)

I don't see the Apple boot code initialising DABRX; maybe the bootup
state for DABRX is 07, dunno. Either way, it would be good if the
kernel set it properly, esp. if it wants to enable or disable matches
in the kernel itself.

> What we actually see is that DABR matches seem to be reliable when
> things are slow, and get intermittent when there are enough threads
> with DABR set.
>
> I happened across:
> http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/79B6E24422AA101287256E93006C957E/$file/PowerPC_970FX_errata_DD3.X_V1.7.pdf
> which is "IBM PowerPC 970FX RISC Microprocessor Errata List for DD3.X"
> and contains "Erratum #8: DABRX register might not always be updated
> correctly":
>
> The only machine I have at home for testing powerpc is an Apple G5,
> supplied to me by IBM. It says:
>     cpu : PPC970FX, altivec supported
>     revision: 3.0 (pvr 003c 0300)
> so I am guessing this document applies to the chips I have.

Indeed.

> Since I can't test on other chips myself, it is plausible from what
> I've seen that there is no mysterious kernel problem and only this
> hardware problem. The description of the hardware problem would not
> make me think that it would behave this way, but it is not very
> detailed or precise, or at least does not seem so to a reader not
> expert on powerpc.

Since the 970 kernel never sets DABRX currently, #8 cannot explain
_intermittent_ problems: either it always works, or never does.

You could be happening upon #5, if the non-triggering data breakpoints
are with vector loads/stores in strange code.

> I don't know what I can do next to tell whether this processor erratum
> is in fact what's happening in the test case. If it is, I don't know if
> there might be some arcane way to work around it despite "None" cited
> above.

It would help if you could give us the disassembly of some code where the
breakpoint did not trigger; say, that insn and the previous 20 or so insns.

Segher
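For readers following the 07-versus-03 argument, the four DABRX bits listed in this message can be written down as masks. This is only a sketch: the bit values are the ones given above, but the macro names and the dabrx_value() helper are made up for illustration and are not the kernel's definitions.

```c
#include <stdint.h>

/* DABRX bit values as listed in the message above (970 family).
 * The macro names are illustrative assumptions, not kernel names. */
#define DABRX_MATCH_USER        0x1u  /* match in user mode */
#define DABRX_MATCH_SUPERVISOR  0x2u  /* match in supervisor mode */
#define DABRX_MATCH_HYPERVISOR  0x4u  /* match in hypervisor mode */
#define DABRX_IGNORE_XLATE      0x8u  /* ignore translation field in DABR */

/* Hypothetical helper: build a DABRX value from the privilege
 * levels in which a DABR match should be reported. */
static inline uint32_t dabrx_value(int user, int supervisor, int hypervisor)
{
    uint32_t v = 0;

    if (user)
        v |= DABRX_MATCH_USER;
    if (supervisor)
        v |= DABRX_MATCH_SUPERVISOR;
    if (hypervisor)
        v |= DABRX_MATCH_HYPERVISOR;
    return v;
}
```

With these masks, dabrx_value(1, 1, 0) gives the 03 that the message says is currently used, dabrx_value(1, 1, 1) gives the suggested 07, and dabrx_value(1, 0, 0) gives 01 for user-only matching.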
Re: PPC upstream kernel ignored DABR bug
On Wed, 2008-03-12 at 23:30 +0100, Jens Osterkamp wrote:
> > Just to make sure, I tested the binary against the 2.6.25-rc4 kernel. It
> > still fails. So this is really an open bug for PPC.
>
> On a Cell- or 970-based machine ?
>
> Regards,
> Jens

On a 970-based machine.

Regards,
--
Luis Machado
Software Engineer
IBM Linux Technology Center
Re: PPC upstream kernel ignored DABR bug
AFAICT the DABRX register just has two global bits that enable paying
attention to the DABR register. It only needs to be set once at boot time
(as the cell code does). I don't see how missing that initialization could
ever have explained the behavior we see where DABR matches are intermittent.
If those DABRX bits weren't set then no DABR match would have happened.
(Apparently they are set before boot on an Apple G5.) What we actually see
is that DABR matches seem to be reliable when things are slow, and get
intermittent when there are enough threads with DABR set.

I searched the web trying to figure out what a DABRX register does so I
could just go try it myself rather than waiting another n months for powerpc
folks to forget about it again. (I did try it, and
mtspr(SPRN_DABRX, DABRX_KERNEL | DABRX_USER); makes no difference to the
test on my machine, even done in set_dabr every time we set SPRN_DABR.)

I happened across:
http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/79B6E24422AA101287256E93006C957E/$file/PowerPC_970FX_errata_DD3.X_V1.7.pdf
which is "IBM PowerPC 970FX RISC Microprocessor Errata List for DD3.X"
and contains "Erratum #8: DABRX register might not always be updated
correctly":

    Projected Impact
        The data address breakpoint function might not always work.
    Workaround
        None.
    Status
        A fix is not planned at this time for the PowerPC 970FX.

The only machine I have at home for testing powerpc is an Apple G5,
supplied to me by IBM. It says:
    cpu : PPC970FX, altivec supported
    revision: 3.0 (pvr 003c 0300)
so I am guessing this document applies to the chips I have. Since I can't
test on other chips myself, it is plausible from what I've seen that there
is no mysterious kernel problem and only this hardware problem. The
description of the hardware problem would not make me think that it would
behave this way, but it is not very detailed or precise, or at least does
not seem so to a reader not expert on powerpc.

So, uh, go IBM! I'm in the minority in this conversation as someone not
expert on powerpc, and as someone not employed by IBM. (I don't really mind
finding public IBM documents about powerpc on the web and telling IBM
powerpc folks about them. But, well.)

I don't know what I can do next to tell whether this processor erratum
is in fact what's happening in the test case. If it is, I don't know if
there might be some arcane way to work around it despite "None" cited above.

Thanks,
Roland
Re: PPC upstream kernel ignored DABR bug
> Just to make sure, I tested the binary against the 2.6.25-rc4 kernel. It
> still fails. So this is really an open bug for PPC.

On a Cell- or 970-based machine ?

Regards,
Jens

IBM Deutschland Entwicklung GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Herbert Kircher
Registered office: Böblingen
Court of registration: Amtsgericht Stuttgart, HRB 243294
Re: PPC upstream kernel ignored DABR bug
Hi,

> On the Blade DABRX had to be set in addition to DABR. PS3 and Celleb
> already did this. Uli Weigand found this back in November. I submitted
> a patch for this which went into 2.6.25-rc4.
> Can you please try again with rc4 ?
> Regards,
>
> Jens

Just to make sure, I tested the binary against the 2.6.25-rc4 kernel. It
still fails. So this is really an open bug for PPC.

--
Luis Machado
Software Engineer
IBM Linux Technology Center
Re: PPC upstream kernel ignored DABR bug
> > On the Blade DABRX had to be set in addition to DABR. PS3 and Celleb
> > already did this. Uli Weigand found this back in November. I submitted
> > a patch for this which went into 2.6.25-rc4.
> > Can you please try again with rc4 ?
>
> This is not the problem. This came up before and everyone seems to have
> forgotten. This bug has been reproduced on G5's, which do not have DABRX
> as I understand it.

970 (all versions) _does_ have a DABRX register. Dunno if it has
the same register definition (I cannot find DABRX in the Cell docs).

Segher
Re: PPC upstream kernel ignored DABR bug
The G5 that I have says:
    cpu : PPC970FX, altivec supported
    revision: 3.0 (pvr 003c 0300)
and it does indeed reproduce this bug.

It would also be strange for it to be the DABRX issue given the failure
mode. That is, it works sometimes but unreliably (as if the context switch
sometimes fails to install the value).

Thanks,
Roland
Re: PPC upstream kernel ignored DABR bug
On Mon, Mar 10, 2008 at 04:36:37PM -0300, Luis Machado wrote:
> On Mon, 2008-03-10 at 12:19 -0700, Roland McGrath wrote:
> > > On the Blade DABRX had to be set in addition to DABR. PS3 and Celleb
> > > already did this. Uli Weigand found this back in November. I submitted
> > > a patch for this which went into 2.6.25-rc4.
> > > Can you please try again with rc4 ?
> >
> > This is not the problem. This came up before and everyone seems to have
> > forgotten. This bug has been reproduced on G5's, which do not have DABRX
> > as I understand it.
>
> Yes, now that you mention it, I've been able to reproduce this on 970FX
> blades, which I don't think have DABRX registers. I guess it's almost
> the same CPU as the G5's.

What Apple called the G5 was, over the production runs, three different
CPUs:

    970
    970FX
    970MP

970 was only used in the very first models. 970MP was used in the last
(the models with pci-express and up to 4 cpus). 970FX was used on almost
everything else in between.

-Olof
Re: PPC upstream kernel ignored DABR bug
On Mon, 2008-03-10 at 12:19 -0700, Roland McGrath wrote:
> > On the Blade DABRX had to be set in addition to DABR. PS3 and Celleb
> > already did this. Uli Weigand found this back in November. I submitted
> > a patch for this which went into 2.6.25-rc4.
> > Can you please try again with rc4 ?
>
> This is not the problem. This came up before and everyone seems to have
> forgotten. This bug has been reproduced on G5's, which do not have DABRX
> as I understand it.

Yes, now that you mention it, I've been able to reproduce this on 970FX
blades, which I don't think have DABRX registers. I guess it's almost
the same CPU as the G5's.

Regards,
--
Luis Machado
Software Engineer
IBM Linux Technology Center
Re: PPC upstream kernel ignored DABR bug
> On the Blade DABRX had to be set in addition to DABR. PS3 and Celleb
> already did this. Uli Weigand found this back in November. I submitted
> a patch for this which went into 2.6.25-rc4.
> Can you please try again with rc4 ?

This is not the problem. This came up before and everyone seems to have
forgotten. This bug has been reproduced on G5's, which do not have DABRX
as I understand it.

Thanks,
Roland
Re: PPC upstream kernel ignored DABR bug
> On the Blade DABRX had to be set in addition to DABR. PS3 and Celleb
> already did this. Uli Weigand found this back in November. I submitted
> a patch for this which went into 2.6.25-rc4.
> Can you please try again with rc4 ?

I will try it and post the results back. Thanks, Jens.

Regards,
--
Luis Machado
Software Engineer
IBM Linux Technology Center
Re: PPC upstream kernel ignored DABR bug
On Monday 10 March 2008, Luis Machado wrote:
> > Yes, I know. I tried it on the PS3 first and couldn't reproduce
> > the bug he saw on the blade.
>
> Arnd,
>
> Do we have any news on this topic?
>
> I've seen this happening quite often within GDB when using hardware
> watchpoints on a shared variable in a threaded (7+ threads) binary.
> Sometimes the watchpoint won't trigger, even though the monitored
> variable's value was modified.

On the Blade DABRX had to be set in addition to DABR. PS3 and Celleb
already did this. Uli Weigand found this back in November. I submitted
a patch for this which went into 2.6.25-rc4.
Can you please try again with rc4 ?

Regards,
Jens
Re: PPC upstream kernel ignored DABR bug
> Yes, I know. I tried it on the PS3 first and couldn't reproduce
> the bug he saw on the blade.

Arnd,

Do we have any news on this topic?

I've seen this happening quite often within GDB when using hardware
watchpoints on a shared variable in a threaded (7+ threads) binary.
Sometimes the watchpoint won't trigger, even though the monitored
variable's value was modified.

Appreciate your feedback.

Best regards,
--
Luis Machado
LoP Toolchain
Software Engineer
IBM Linux Technology Center
Re: PPC upstream kernel ignored DABR bug
On Wednesday 28 November 2007 23:59:36 Geoff Levand wrote:
> > This sounds like a bug recently reported by Uli Weigand. BenH
> > said he'd take a look, but it probably fell under the table.
> > The problem found by Uli is that on certain processors (Cell/B.E.
> > in his case), the DABRX register needs to be set in order for
> > the DABR to take effect.
>
> Just as a note, the PS3's lv1_set_dabr(), which we used for
> ppc_md.set_dabr sets up both the DABRX and DABR registers.

Yes, I know. I tried it on the PS3 first and couldn't reproduce
the bug he saw on the blade.

Arnd <><
Re: PPC upstream kernel ignored DABR bug
Arnd Bergmann wrote:
> On Monday 26 November 2007, Jan Kratochvil wrote:
>> Hi,
>>
>> this testcase:
>> http://people.redhat.com/jkratoch/dabr-lost.c
>>
>> reproduces a PPC DABR kernel bug. The variable `variable' should not get
>> modified as the thread modifying it should be caught by its DABR:
>>
>> $ ./dabr-lost
>> TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318
>> TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318
>> TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318
>> TID 30914: hitting the variable
>> TID 30915: hitting the variable
>> TID 30916: hitting the variable
>> variable found = 30916, caught TID = 30914
>> TID 30916: DABR 0x10012a77
>> Variable got modified by a thread which has DABR still set!
>>
>
> This sounds like a bug recently reported by Uli Weigand. BenH
> said he'd take a look, but it probably fell under the table.
> The problem found by Uli is that on certain processors (Cell/B.E.
> in his case), the DABRX register needs to be set in order for
> the DABR to take effect.

Just as a note, the PS3's lv1_set_dabr(), which we used for
ppc_md.set_dabr sets up both the DABRX and DABR registers.

-Geoff
Re: PPC upstream kernel ignored DABR bug
On Wed, 28 Nov 2007 13:28:48 +0100, Arnd Bergmann wrote:
> On Wednesday 28 November 2007, Jan Kratochvil wrote:
> > Please be aware DABR works fine if the same code runs just 1 (always) or
> > 2 (sometimes) threads. It starts failing with too many threads running:
> >
> > $ ./dabr-lost
> > TID 32725: DABR 0x1001279f NIP 0xfecf41c
> > TID 32726: DABR 0x1001279f NIP 0xfecf41c
> > TID 32725: hitting the variable
> > variable found = -1, caught TID = 32725
> > TID 32726: hitting the variable
> > variable found = -1, caught TID = 32726
> > The kernel bug did not get reproduced - increase THREADS.
> >
> > As I did not find any code in that kernel touching DABRX its value should
> > not be dependent on the number of threads running.
> >
>
> Right, this is a different problem from the one reported by Uli.
> From what I can tell, your problem is that you set the DABR only
> in one thread, so the other threads don't see it. DABR is saved
> in the thread_struct, so setting it in one thread doesn't have
> an impact on any other thread.

It even prints out above:
TID 32725: DABR 0x1001279f NIP 0xfecf41c
TID 32726: DABR 0x1001279f NIP 0xfecf41c

that it wrote DABR in both threads and also successfully read it back
from each thread specifically (according to its thread-specific TID).

  for (threadi = 0; threadi < THREADS; threadi++)
    {
      pid_t tid = thread[threadi];
      setup (tid);
      ...
    }

static void setup (pid_t tid)
{
  ...
  l = ptrace (PTRACE_SET_DEBUGREG, tid, NULL, (void *) dabr);
  ...
}

Also, if I did not set DABR specifically for each thread, it would not
work in 90% of cases for `THREADS == 2'. And it would not work for
`THREADS == 4' if they are busylooping (therefore not in a syscall).

TID 596: DABR 0x100127a7 NIP 0x1dbc
TID 597: DABR 0x100127a7 NIP 0x1db0
TID 598: DABR 0x100127a7 NIP 0x1dac
TID 599: DABR 0x100127a7 NIP 0x1dbc
TID 596: hitting the variable
variable found = -1, caught TID = 596
TID 599: hitting the variable
variable found = -1, caught TID = 599
TID 597: hitting the variable
variable found = -1, caught TID = 597
TID 598: hitting the variable
variable found = -1, caught TID = 598
The kernel bug got workarounded by WORKAROUND_SET_DABR_IN_SYSCALL.

(I have now found that WORKAROUND_SET_DABR_IN_SYSCALL only reduces the
probability of the failure; it is not a 100% workaround for the problem in
the testcase.)

There is some tricky kernel code around it but I did not try to debug it:

struct task_struct *__switch_to(struct task_struct *prev,
	struct task_struct *new)
{
...
	if (unlikely(__get_cpu_var(current_dabr) != new->thread.dabr)) {
		set_dabr(new->thread.dabr);
		__get_cpu_var(current_dabr) = new->thread.dabr;
	}
...
}

Regards,
Jan
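The __switch_to() excerpt above only writes the register when the incoming thread's DABR differs from a per-CPU cached copy. A minimal user-space model of that logic is sketched below to make the behaviour easier to reason about. Everything here (struct model_cpu, struct model_thread, model_set_dabr, model_switch_to, the writes counter) is a hypothetical stand-in, not kernel code, and the model says nothing about where the actual bug lies.

```c
/* Hypothetical stand-ins for the kernel structures in the excerpt. */
struct model_thread {
    unsigned long dabr;         /* per-thread DABR, as in thread_struct */
};

struct model_cpu {
    unsigned long hw_dabr;      /* models the SPR itself */
    unsigned long cached_dabr;  /* models the per-CPU current_dabr */
    unsigned int writes;        /* counts modelled register writes */
};

/* Models set_dabr(): write the "hardware" register. */
static void model_set_dabr(struct model_cpu *cpu, unsigned long dabr)
{
    cpu->hw_dabr = dabr;
    cpu->writes++;
}

/* Models the DABR part of __switch_to(): the register is written
 * only when the incoming thread's value differs from the cache. */
static void model_switch_to(struct model_cpu *cpu,
                            const struct model_thread *next)
{
    if (cpu->cached_dabr != next->dabr) {
        model_set_dabr(cpu, next->dabr);
        cpu->cached_dabr = next->dabr;
    }
}
```

In this model, switching between two threads that carry the same DABR value causes no second register write, which is the situation in the test case where every thread watches the same variable. The model also makes one property explicit: correctness depends on cached_dabr always agreeing with hw_dabr on that CPU.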
Re: PPC upstream kernel ignored DABR bug
On Wednesday 28 November 2007, Jan Kratochvil wrote:
> Please be aware DABR works fine if the same code runs just 1 (always) or
> 2 (sometimes) threads. It starts failing with too many threads running:
>
> $ ./dabr-lost
> TID 32725: DABR 0x1001279f NIP 0xfecf41c
> TID 32726: DABR 0x1001279f NIP 0xfecf41c
> TID 32725: hitting the variable
> variable found = -1, caught TID = 32725
> TID 32726: hitting the variable
> variable found = -1, caught TID = 32726
> The kernel bug did not get reproduced - increase THREADS.
>
> As I did not find any code in that kernel touching DABRX its value should not
> be dependent on the number of threads running.
>

Right, this is a different problem from the one reported by Uli.
From what I can tell, your problem is that you set the DABR only
in one thread, so the other threads don't see it. DABR is saved
in the thread_struct, so setting it in one thread doesn't have
an impact on any other thread.

Arnd <><
Re: PPC upstream kernel ignored DABR bug
On Tue, 27 Nov 2007 23:35:36 +0100, Arnd Bergmann wrote:
> On Monday 26 November 2007, Jan Kratochvil wrote:
> > Hi,
> >
> > this testcase:
> > http://people.redhat.com/jkratoch/dabr-lost.c
> >
> > reproduces a PPC DABR kernel bug. The variable `variable' should not get
> > modified as the thread modifying it should be caught by its DABR:
> >
> > $ ./dabr-lost
> > TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318
> > TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318
> > TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318
> > TID 30914: hitting the variable
> > TID 30915: hitting the variable
> > TID 30916: hitting the variable
> > variable found = 30916, caught TID = 30914
> > TID 30916: DABR 0x10012a77
> > Variable got modified by a thread which has DABR still set!
> >
>
> This sounds like a bug recently reported by Uli Weigand. BenH
> said he'd take a look, but it probably fell under the table.
> The problem found by Uli is that on certain processors (Cell/B.E.
> in his case), the DABRX register needs to be set in order for
> the DABR to take effect.

Please be aware DABR works fine if the same code runs just 1 (always) or
2 (sometimes) threads. It starts failing with too many threads running:

$ ./dabr-lost
TID 32725: DABR 0x1001279f NIP 0xfecf41c
TID 32726: DABR 0x1001279f NIP 0xfecf41c
TID 32725: hitting the variable
variable found = -1, caught TID = 32725
TID 32726: hitting the variable
variable found = -1, caught TID = 32726
The kernel bug did not get reproduced - increase THREADS.

As I did not find any code in that kernel touching DABRX its value should not
be dependent on the number of threads running.

Regards,
Lace
Re: PPC upstream kernel ignored DABR bug
On Monday 26 November 2007, Jan Kratochvil wrote:
> Hi,
>
> this testcase:
> http://people.redhat.com/jkratoch/dabr-lost.c
>
> reproduces a PPC DABR kernel bug. The variable `variable' should not get
> modified as the thread modifying it should be caught by its DABR:
>
> $ ./dabr-lost
> TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318
> TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318
> TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318
> TID 30914: hitting the variable
> TID 30915: hitting the variable
> TID 30916: hitting the variable
> variable found = 30916, caught TID = 30914
> TID 30916: DABR 0x10012a77
> Variable got modified by a thread which has DABR still set!
>

This sounds like a bug recently reported by Uli Weigand. BenH
said he'd take a look, but it probably fell under the table.
The problem found by Uli is that on certain processors (Cell/B.E.
in his case), the DABRX register needs to be set in order for
the DABR to take effect.

Arnd <><
PPC upstream kernel ignored DABR bug
Hi,

this testcase:
http://people.redhat.com/jkratoch/dabr-lost.c

reproduces a PPC DABR kernel bug. The variable `variable' should not get
modified as the thread modifying it should be caught by its DABR:

$ ./dabr-lost
TID 30914: DABR 0x10012a77 NIP 0x80f6ebb318
TID 30915: DABR 0x10012a77 NIP 0x80f6ebb318
TID 30916: DABR 0x10012a77 NIP 0x80f6ebb318
TID 30914: hitting the variable
TID 30915: hitting the variable
TID 30916: hitting the variable
variable found = 30916, caught TID = 30914
TID 30916: DABR 0x10012a77
Variable got modified by a thread which has DABR still set!

At the `variable found =' line the parent ptracer found that the thread with
TID 30916 wrote the value into the variable, even though it had DABR already
set before.

As the behavior is dependent on the current weather I expect the scheduling
matters there. It is important that the target thread is in the `nanosleep'
syscall. If you define WORKAROUND_SET_DABR_IN_SYSCALL in the testcase it
busyloops in the userland and the bug is no longer reproduced.

I reproduced it on a utrace-patched kernel on a dual-CPU Power5, and Roland
McGrath reported reproducing it on the vanilla upstream kernel on a Mac G5.

Regards,
Jan Kratochvil
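As a reading aid for the DABR values printed by the test case, the sketch below splits a value such as 0x10012a77 into a watched address and flag bits. It assumes the usual powerpc layout in which the low three bits of DABR are flags (read, write, translation) and the rest is a doubleword-aligned address; that layout is not stated anywhere in this thread, and the macro and function names are made up for illustration.

```c
/* Assumed DABR layout (not stated in this thread): the low three
 * bits are flags, the remaining bits an 8-byte-aligned address.
 * All names below are illustrative, not the kernel's. */
#define DABR_FLAG_READ   0x1ul  /* trap on data read */
#define DABR_FLAG_WRITE  0x2ul  /* trap on data write */
#define DABR_FLAG_XLATE  0x4ul  /* breakpoint translation enabled */
#define DABR_FLAG_MASK   0x7ul

/* Address part of a DABR value. */
static inline unsigned long dabr_address(unsigned long dabr)
{
    return dabr & ~DABR_FLAG_MASK;
}

/* Flag part of a DABR value. */
static inline unsigned long dabr_flags(unsigned long dabr)
{
    return dabr & DABR_FLAG_MASK;
}

/* Combine an address and flags into a DABR value. */
static inline unsigned long dabr_encode(unsigned long addr,
                                        unsigned long flags)
{
    return (addr & ~DABR_FLAG_MASK) | (flags & DABR_FLAG_MASK);
}
```

Under that assumption, the 0x10012a77 in the output above corresponds to address 0x10012a70 with all three flag bits set, so both reads and writes of the variable would be expected to trap.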