Re: powerpc/pci sysdata batch hangs G5 boot
On Sat, Mar 19, 2011 at 9:41 PM, Hugh Dickins hu...@google.com wrote:
> Hi Grant,
>
> I've been unable to boot mmotm on the G5 for a few weeks; and now that the problem has reached Linus, I've bisected and it converges on your:
>
> commit b5d937de0367d26f65b9af1aef5f2c34c1939be0
> powerpc/pci: Make both ppc32 and ppc64 use sysdata for pci_controller

Hi Hugh,

Thanks for the testing. I don't have access to a G5 unfortunately. Are you able to capture the good/bad console output and send it to me? A digital photo would be fine if you can't grab the raw text. Add #define DEBUG to the top of arch/powerpc/kernel/pci-common.c above the #includes too if you don't mind.

I'm investigating on my end. I suspect that I've messed up retrieval of the hose pointer.

Thanks,
g.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
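Grant's request to put #define DEBUG above the #includes matters because, in a build without dynamic debug, the kernel's pr_debug() is selected by an #ifdef DEBUG test inside a header: if DEBUG is only defined after that header has been included, the debug calls compile to nothing. The userspace sketch below models that behaviour. It assumes the debug output in pci-common.c goes through pr_debug() or a similar DEBUG-guarded macro; the pr_debug() shown here, the debug_lines counter and scan_hose() are simplified, hypothetical stand-ins, not the kernel's definitions.

```c
/*
 * Must appear before the "header" that defines pr_debug(); in this
 * sketch that header is the macro block just below.
 */
#define DEBUG 1

#include <assert.h>
#include <stdio.h>

/* Counts emitted debug lines so the effect of DEBUG can be checked. */
static int debug_lines;

/*
 * Simplified stand-in for pr_debug(), not <linux/printk.h>.
 * The #ifdef is evaluated once, here, so defining DEBUG after
 * this point would have no effect on pr_debug().
 */
#ifdef DEBUG
#define pr_debug(...) ((void)(debug_lines++, printf(__VA_ARGS__)))
#else
#define pr_debug(...) ((void)0)
#endif

/* Hypothetical caller standing in for code in pci-common.c. */
void scan_hose(int index)
{
	assert(index >= 0);
	pr_debug("PCI: scanning hose %d\n", index);
}
```

Moving the #define DEBUG line below the #ifdef block would leave debug_lines at 0 after the same calls.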
Re: powerpc/pci sysdata batch hangs G5 boot
On Sun, 2011-03-20 at 00:25 -0600, Grant Likely wrote:
> On Sat, Mar 19, 2011 at 9:41 PM, Hugh Dickins hu...@google.com wrote:
> > Hi Grant,
> >
> > I've been unable to boot mmotm on the G5 for a few weeks; and now that the problem has reached Linus, I've bisected and it converges on your:
> >
> > commit b5d937de0367d26f65b9af1aef5f2c34c1939be0
> > powerpc/pci: Make both ppc32 and ppc64 use sysdata for pci_controller
>
> Hi Hugh,
>
> Thanks for the testing. I don't have access to a G5 unfortunately. Are you able to capture the good/bad console output and send it to me? A digital photo would be fine if you can't grab the raw text. Add #define DEBUG to the top of arch/powerpc/kernel/pci-common.c above the #includes too if you don't mind.
>
> I'm investigating on my end. I suspect that I've messed up retrieval of the hose pointer.

Hrm, you merged that already ? I would have liked to have a chance to at least test and review properly...

Oh well, I have G5's here, I'll see if I can find what's wrong tomorrow.

Cheers,
Ben.
PowerMac7,3 dvd drive?
Hi,

I am seeing ... issues with the optical drive (hda) under 2.6.36. I can't mount disks:

[root@PowerMacG5 ~]# mount -r /dev/hda /mnt/cdrom
mount: /dev/hda already mounted or /mnt/cdrom busy

The log has:

[ 239.922268] hda: irq timeout: status=0xd0 { Busy }
[ 239.922485] hda: possibly failed opcode: 0xa0

eject hda will hang ... longer than my patience.

At first I thought the drive was going south. But I don't see this (at least so far) on 2.6.28.

Thanks!

kevin
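For reference when reading the log above, the status byte can be decoded bit by bit. Going by the legacy ATA status register layout, 0xd0 is BSY | DRDY | DSC; while BSY is set the other bits are not meaningful, which is presumably why the old IDE driver printed only "{ Busy }". The failed opcode 0xa0 is the ATAPI PACKET command, the command used to send CD/DVD commands to the drive. The decoder below is an illustrative sketch; the table and function names are hypothetical, not the kernel's.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Legacy ATA status register bits, most significant first. */
struct ata_bit {
	unsigned char mask;
	const char *name;
};

static const struct ata_bit ata_status_bits[] = {
	{ 0x80, "BSY" },  /* busy: remaining bits are not valid */
	{ 0x40, "DRDY" }, /* device ready */
	{ 0x20, "DF" },   /* device fault */
	{ 0x10, "DSC" },  /* seek complete (legacy) */
	{ 0x08, "DRQ" },  /* data request */
	{ 0x04, "CORR" }, /* corrected data (legacy) */
	{ 0x02, "IDX" },  /* index (legacy) */
	{ 0x01, "ERR" },  /* error */
};

/*
 * Writes the names of the set bits into buf, separated by spaces,
 * and returns how many names were written. Stops early if buf is
 * too small; buf is always NUL-terminated when len > 0.
 */
int ata_decode_status(unsigned char status, char *buf, size_t len)
{
	size_t i, n_bits = sizeof(ata_status_bits) / sizeof(ata_status_bits[0]);
	int count = 0;

	if (buf == NULL || len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < n_bits; i++) {
		size_t used = strlen(buf);
		int n;

		if (!(status & ata_status_bits[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     count ? " " : "", ata_status_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break; /* out of space */
		count++;
	}
	return count;
}
```

Decoding 0xd0 with a 64-byte buffer yields "BSY DRDY DSC".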
Re: mmotm threatens ppc preemption again
On Sat, 2011-03-19 at 21:11 -0700, Hugh Dickins wrote:
> As I warned a few weeks ago, Jeremy has vmalloc apply_to_pte_range patches in mmotm, which again assault PowerPC's expectations, and cause lots of noise with CONFIG_PREEMPT=y CONFIG_PREEMPT_DEBUG=y.
>
> This time in vmalloc as well as vfree; and Peter's fix to the last lot, which went into 2.6.38, doesn't protect against these ones. Here's what I now see when I swapon and swapoff:

Right. And we said from day one we had the HARD WIRED assumption that arch_enter/leave_lazy_mmu_mode() was ALWAYS going to be called within a PTE lock section, and we did get reassurance that it was going to remain so.

So why is it ok for them to change those and break us like that ?

Seriously, this is going out of control. If we can't even rely on fundamental locking assumptions in the VM to remain reasonably stable or at least get some amount of -care- from who changes them as to whether they break others and work with us to fix them, wtf ?

I don't know what the right way to fix that is. We have an absolute requirement that the batching we start within a lazy MMU section is complete and flushed before any other PTE in that section can be touched by anything else. Do we -at least- keep that guarantee ?

If yes, then maybe preempt_disable/enable() around arch_enter/leave_lazy_mmu_mode() in apply_to_pte_range() would do...

Or maybe I should just prevent any batching of init_mm :-(

Cheers,
Ben.
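Ben's first suggestion, wrapping arch_enter/leave_lazy_mmu_mode() in preempt_disable()/preempt_enable() inside apply_to_pte_range(), can be illustrated with a toy single-CPU model. Every name below is hypothetical, and the model only shows the invariant being asked for (no task switch while a batch of PTE updates is still queued); it does not reflect the actual powerpc hash-table batching code, nor does it say whether Jeremy's patches preserve the guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: all names are hypothetical, not kernel APIs. */
#define BATCH_MAX 16

static int preempt_count; /* > 0 means preemption is disabled */
static int batch_len;     /* queued, not yet flushed PTE updates */
static bool lazy_active;  /* inside the lazy MMU section */
static int flushes;       /* number of batch flushes performed */

static void model_preempt_disable(void) { preempt_count++; }

static void model_preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

static void flush_batch(void)
{
	if (batch_len) {
		batch_len = 0;
		flushes++;
	}
}

/* The suggested wrapping: no preemption for the whole lazy section. */
static void model_enter_lazy_mmu(void)
{
	model_preempt_disable();
	lazy_active = true;
}

static void model_leave_lazy_mmu(void)
{
	flush_batch(); /* batch must be complete before anyone else runs */
	lazy_active = false;
	model_preempt_enable();
}

static void queue_pte_update(void)
{
	assert(lazy_active);
	if (batch_len == BATCH_MAX)
		flush_batch();
	batch_len++;
}

/* Where the scheduler could switch tasks; returns true if it may. */
static bool model_may_schedule(void)
{
	if (preempt_count > 0)
		return false;
	assert(batch_len == 0); /* a pending batch here would be the bug */
	return true;
}

/* Stand-in for apply_to_pte_range() updating npte entries. */
int model_apply_to_pte_range(int npte)
{
	model_enter_lazy_mmu();
	for (int i = 0; i < npte; i++) {
		queue_pte_update();
		assert(!model_may_schedule()); /* never preemptible mid-batch */
	}
	model_leave_lazy_mmu();
	return flushes;
}
```

With BATCH_MAX at 16, model_apply_to_pte_range(40) flushes twice when the batch fills and once more on leaving lazy mode, and the scheduler check is never reachable while updates are pending.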
Re: [PATCH v2] powerpc/ptrace: remove BUG_ON when full register set not available
On Wed, 2011-03-16 at 08:37 -0500, Michael Wolf wrote:
> In some cases during a threaded core dump not all the threads will have a full register set. This will cause problems when the sigkill is sent to the thread. To solve this problem a poison value (0xdeadbeef) will be placed in the buffer in place of the actual register values. This will affect gpr14 to gpr31.
>
> Signed-off-by: Mike Wolf m...@linux.vnet.ibm.com

Patch is busted on ppc32 (you add #define's in the middle of a multi-line macro) and of doubtful stylistic value :-) I'll merge a slightly reworked variant that includes a new cset comment with Paulus' explanation in it.

Cheers,
Ben.

> --
> --- linux-2.6.32-71.el6.ppc64.orig/arch/powerpc/include/asm/ptrace.h	2010-08-31 23:56:50.0 -0500
> +++ linux-2.6.32-71.el6.ppc64/arch/powerpc/include/asm/ptrace.h	2011-03-14 11:43:33.176667099 -0500
> @@ -123,8 +123,14 @@ extern int ptrace_put_reg(struct task_st
>  #define TRAP(regs)	((regs)->trap & ~0xF)
>  #ifdef __powerpc64__
>  #define CHECK_FULL_REGS(regs)	BUG_ON(regs->trap & 1)
> +#define PARTIAL_REG_FILL	0xdeadbeefdeadbeefUL
> +#define PARTIAL_REG_START	14
> +#define PARTIAL_REG_END	31
>  #else
>  #define CHECK_FULL_REGS(regs) \
> +#define PARTIAL_REG_FILL	0xdeadbeef
> +#define PARTIAL_REG_START	14
> +#define PARTIAL_REG_END	31
>  do { \
>  	if ((regs)->trap & 1) \
>  		printk(KERN_CRIT "%s: partial register set\n", __func__); \
> --- linux-2.6.32-71.el6.ppc64.orig/arch/powerpc/kernel/ptrace.c	2009-12-02 21:51:21.0 -0600
> +++ linux-2.6.32-71.el6.ppc64/arch/powerpc/kernel/ptrace.c	2011-03-14 13:01:51.955586126 -0500
> @@ -125,11 +125,16 @@ static int gpr_get(struct task_struct *t
>  		   void *kbuf, void __user *ubuf)
>  {
>  	int ret;
> +	int partial_reg;
>  
>  	if (target->thread.regs == NULL)
>  		return -EIO;
>  
> -	CHECK_FULL_REGS(target->thread.regs);
> +	if (!FULL_REGS(target->thread.regs))
> +		/* We have a partial register set. Fill 14-31 with bogus values */
> +		for(partial_reg=PARTIAL_REG_START;partial_reg <= PARTIAL_REG_END;
> +			partial_reg++)
> +			target->thread.regs->gpr[partial_reg] = PARTIAL_REG_FILL;
>  
>  	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
>  				  target->thread.regs,
> @@ -536,11 +541,16 @@ static int gpr32_get(struct task_struct
>  	compat_ulong_t *k = kbuf;
>  	compat_ulong_t __user *u = ubuf;
>  	compat_ulong_t reg;
> +	int partial_reg;
>  
>  	if (target->thread.regs == NULL)
>  		return -EIO;
>  
> -	CHECK_FULL_REGS(target->thread.regs);
> +	if (!FULL_REGS(target->thread.regs))
> +		/* We have a partial register set. Fill 14-31 with bogus values */
> +		for(partial_reg=PARTIAL_REG_START;partial_reg <= PARTIAL_REG_END;
> +			partial_reg++)
> +			target->thread.regs->gpr[partial_reg] = PARTIAL_REG_FILL;
>  
>  	pos /= sizeof(reg);
>  	count /= sizeof(reg);
[PATCH] powerpc/ptrace: Remove BUG_ON when full register set not available
From: Mike Wolf m...@linux.vnet.ibm.com

In some cases during a threaded core dump not all the threads will have a full register set. This happens when the signal causing the core dump races with a thread exiting. The race happens when the exiting thread has entered the kernel for the last time before the signal arrives, but doesn't get far enough through the exit code to avoid being included in the core dump. So we get a thread included in the core dump which is never going to go out to userspace again and only has a partial register set recorded.

Normally we would catch each thread as it is about to go into userspace and capture the full register set then. However, this exiting thread is never going to go out to userspace again, so we have no way to capture its full register set. It doesn't really matter, though, as this is a thread which is effectively already dead.

So instead of hitting a BUG() in this case (a really bad choice of action in the first place), we use a poison value for the register values.

[BenH]: Some cosmetic/stylistic changes and fix build on ppc32

Signed-off-by: Mike Wolf m...@linux.vnet.ibm.com
Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 arch/powerpc/include/asm/ptrace.h |    2 ++
 arch/powerpc/kernel/ptrace.c      |   15 ++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 0175a67..48223f9 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -125,8 +125,10 @@ extern int ptrace_put_reg(struct task_struct *task, int regno,
 #endif /* ! __powerpc64__ */
 #define TRAP(regs)		((regs)->trap & ~0xF)
 #ifdef __powerpc64__
+#define NV_REG_POISON		0xdeadbeefdeadbeefUL
 #define CHECK_FULL_REGS(regs)	BUG_ON(regs->trap & 1)
 #else
+#define NV_REG_POISON		0xdeadbeef
 #define CHECK_FULL_REGS(regs) \
 do { \
 	if ((regs)->trap & 1) \
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 9065369..895b082 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -229,12 +229,16 @@ static int gpr_get(struct task_struct *target, const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
 		   void *kbuf, void __user *ubuf)
 {
-	int ret;
+	int i, ret;
 
 	if (target->thread.regs == NULL)
 		return -EIO;
 
-	CHECK_FULL_REGS(target->thread.regs);
+	if (!FULL_REGS(target->thread.regs)) {
+		/* We have a partial register set. Fill 14-31 with bogus values */
+		for (i = 14; i < 32; i++)
+			target->thread.regs->gpr[i] = NV_REG_POISON;
+	}
 
 	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
 				  target->thread.regs,
@@ -641,11 +645,16 @@ static int gpr32_get(struct task_struct *target,
 	compat_ulong_t *k = kbuf;
 	compat_ulong_t __user *u = ubuf;
 	compat_ulong_t reg;
+	int i;
 
 	if (target->thread.regs == NULL)
 		return -EIO;
 
-	CHECK_FULL_REGS(target->thread.regs);
+	if (!FULL_REGS(target->thread.regs)) {
+		/* We have a partial register set. Fill 14-31 with bogus values */
+		for (i = 14; i < 32; i++)
+			target->thread.regs->gpr[i] = NV_REG_POISON;
+	}
 
 	pos /= sizeof(reg);
 	count /= sizeof(reg);
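The poisoning step in the patch above can be exercised outside the kernel with a mock register structure. This sketch assumes, as CHECK_FULL_REGS() implies, that a set low bit in ->trap marks a partial register set, in which case r14-r31 (the non-volatile GPRs) were never saved. struct mock_pt_regs, MOCK_FULL_REGS(), MOCK_NV_REG_POISON and poison_partial_regs() are hypothetical stand-ins, and the 32-bit poison value is used for simplicity.

```c
#include <assert.h>

/* Mock with only the fields this example needs; not struct pt_regs. */
struct mock_pt_regs {
	unsigned long gpr[32];
	unsigned long trap;
};

/* 32-bit poison for simplicity; 64-bit builds use 0xdeadbeefdeadbeefUL. */
#define MOCK_NV_REG_POISON 0xdeadbeefUL

/* Low bit of ->trap set means only the volatile registers were saved. */
#define MOCK_FULL_REGS(regs) ((((regs)->trap) & 1) == 0)

/* Returns how many registers were overwritten with the poison value. */
int poison_partial_regs(struct mock_pt_regs *regs)
{
	int i, n = 0;

	assert(regs != 0);
	if (MOCK_FULL_REGS(regs))
		return 0; /* full set saved: leave real values alone */

	/* r14-r31 hold whatever was left over, so mark them as bogus. */
	for (i = 14; i < 32; i++, n++)
		regs->gpr[i] = MOCK_NV_REG_POISON;
	return n;
}
```

A partial set (for example trap = 0x301) gets 18 registers poisoned and r0-r13 untouched; a full set is returned unchanged, which is the behaviour that replaces the old BUG_ON().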
Re: PowerMac7,3 dvd drive?
On Sun, 2011-03-20 at 18:52 -0500, kevin diggs wrote:
> I am seeing ... issues with the optical drive (hda) under 2.6.36. I can't mount disks:
>
> [root@PowerMacG5 ~]# mount -r /dev/hda /mnt/cdrom
> mount: /dev/hda already mounted or /mnt/cdrom busy
>
> The log has:
>
> [ 239.922268] hda: irq timeout: status=0xd0 { Busy }
> [ 239.922485] hda: possibly failed opcode: 0xa0
>
> eject hda will hang ... longer than my patience.
>
> At first I thought the drive was going south. But I don't see this (at least so far) on 2.6.28.

Do you see something similar if you use the new libata based driver (macio-ata) instead of the old IDE driver ?

It does look like the drive itself is crashing tho. Maybe something the IDE CDROM driver does upsets it...

Cheers,
Ben.
[PATCH] powerpc: Fix accounting of softirq time when idle
commit cf9efce0ce31 (powerpc: Account time using timebase rather than PURR) used in_irq() to detect if the time was spent in interrupt processing. This only catches hardirq context so if we are in softirq context and in the idle loop we end up accounting it as idle time. If we instead use in_interrupt() we catch both softirq and hardirq time.

The issue was found when running a network intensive workload. top showed the following:

0.0%us, 1.1%sy, 0.0%ni, 85.7%id, 0.0%wa, 9.9%hi, 3.3%si, 0.0%st

85.7% idle. But this was wildly different to the perf events data. To confirm the suspicion I ran something to keep the core busy:

# yes > /dev/null

8.2%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 10.3%hi, 81.4%si, 0.0%st

We only got 8.2% of the CPU for the userspace task and softirq has shot up to 81.4%.

With the patch below top shows the correct stats:

0.0%us, 0.0%sy, 0.0%ni, 5.3%id, 0.0%wa, 13.3%hi, 81.3%si, 0.0%st

Signed-off-by: Anton Blanchard an...@samba.org
Cc: sta...@kernel.org
---

Index: linux-2.6/arch/powerpc/kernel/time.c
===
--- linux-2.6.orig/arch/powerpc/kernel/time.c	2011-03-21 12:05:12.056482258 +1100
+++ linux-2.6/arch/powerpc/kernel/time.c	2011-03-21 12:05:18.516721851 +1100
@@ -356,7 +356,7 @@ void account_system_vtime(struct task_st
 	}
 	get_paca()->user_time_scaled += user_scaled;
 
-	if (in_irq() || idle_task(smp_processor_id()) != tsk) {
+	if (in_interrupt() || idle_task(smp_processor_id()) != tsk) {
 		account_system_time(tsk, 0, delta, sys_scaled);
 		if (stolen)
 			account_steal_time(stolen);
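The difference between in_irq() and in_interrupt() comes down to which bits of preempt_count they test. The sketch below uses a simplified layout (softirq count at bit 8, hardirq count at bit 16, NMI bits omitted), which is an assumption for illustration rather than the exact layout of any particular kernel version, together with a hypothetical classify() loosely modelled on the condition in account_system_vtime().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified preempt_count layout; an assumption for illustration. */
#define SOFTIRQ_SHIFT 8
#define HARDIRQ_SHIFT 16
#define SOFTIRQ_MASK  (0xffUL << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK  (0x3ffUL << HARDIRQ_SHIFT)

/* Models in_irq(): true only in hardirq context. */
static bool model_in_irq(unsigned long pc)
{
	return (pc & HARDIRQ_MASK) != 0;
}

/* Models in_interrupt(): true in hardirq or softirq context. */
static bool model_in_interrupt(unsigned long pc)
{
	return (pc & (HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
}

enum bucket { BUCKET_IDLE, BUCKET_SYSTEM };

/*
 * Where a slice of kernel time is charged. use_in_interrupt selects
 * the patched test (in_interrupt) instead of the old one (in_irq).
 */
enum bucket classify(unsigned long pc, bool is_idle_task, bool use_in_interrupt)
{
	bool in_int = use_in_interrupt ? model_in_interrupt(pc)
				       : model_in_irq(pc);

	if (in_int || !is_idle_task)
		return BUCKET_SYSTEM;
	return BUCKET_IDLE;
}
```

With only the softirq count raised while the idle task runs, the in_irq() variant charges the time to idle, consistent with the inflated 85.7%id reading above, while the in_interrupt() variant charges it to system time, which the real account_system_time() then splits further into hardirq, softirq and system.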
Re: mmotm threatens ppc preemption again
On Mon, 21 Mar 2011, Benjamin Herrenschmidt wrote:
> On Sat, 2011-03-19 at 21:11 -0700, Hugh Dickins wrote:
> > As I warned a few weeks ago, Jeremy has vmalloc apply_to_pte_range patches in mmotm, which again assault PowerPC's expectations, and cause lots of noise with CONFIG_PREEMPT=y CONFIG_PREEMPT_DEBUG=y.
> >
> > This time in vmalloc as well as vfree; and Peter's fix to the last lot, which went into 2.6.38, doesn't protect against these ones. Here's what I now see when I swapon and swapoff:
>
> Right. And we said from day one we had the HARD WIRED assumption that arch_enter/leave_lazy_mmu_mode() was ALWAYS going to be called within a PTE lock section, and we did get reassurance that it was going to remain so.
>
> So why is it ok for them to change those and break us like that ?

It's not ok. Sounds like Andrew should not forward

mm-remove-unused-token-argument-from-apply_to_page_range-callback.patch
mm-add-apply_to_page_range_batch.patch
ioremap-use-apply_to_page_range_batch-for-ioremap_page_range.patch
vmalloc-use-plain-pte_clear-for-unmaps.patch
vmalloc-use-apply_to_page_range_batch-for-vunmap_page_range.patch
vmalloc-use-apply_to_page_range_batch-for-vmap_page_range_noflush.patch
vmalloc-use-apply_to_page_range_batch-in-alloc_vm_area.patch
xen-mmu-use-apply_to_page_range_batch-in-xen_remap_domain_mfn_range.patch
xen-grant-table-use-apply_to_page_range_batch.patch

or some subset (the vmalloc-use-apply ones? and the ioremap one?) of that set to Linus for 2.6.39. Your call.

> Seriously, this is going out of control. If we can't even rely on fundamental locking assumptions in the VM to remain reasonably stable or at least get some amount of -care- from who changes them as to whether they break others and work with us to fix them, wtf ?

I know next to nothing of arch_enter/leave_lazy_mmu_mode(), and the same is probably true of most mm developers. The only people who have it defined to anything interesting appear to be powerpc and xen and lguest: so it would be a gentleman's agreement between you and Jeremy and Rusty. If Jeremy has changed the rules without your agreement, then you can fight a duel at daybreak, or, since your daybreaks are at different times, Jeremy's patches just shouldn't go forward yet.

> I don't know what the right way to fix that is. We have an absolute requirement that the batching we start within a lazy MMU section is complete and flushed before any other PTE in that section can be touched by anything else. Do we -at least- keep that guarantee ?

I'm guessing it's a guarantee of the same kind as led me to skip page_table_lock on init_mm in 2.6.15: no locking to guarantee it, but it would have to be a kernel bug, in a driver or wherever, for us to be accessing such a section while it was in transit (short of speculative access prior to tlb flush).

> If yes, then maybe preempt_disable/enable() around arch_enter/leave_lazy_mmu_mode() in apply_to_pte_range() would do...
>
> Or maybe I should just prevent any batching of init_mm :-(

I don't see where you're doing batching on init_mm today: it looks as if Jeremy's patches, by using the same code as he has for user mms, are now enabling batching on init_mm, and you should :-)

But I may be all wrong: it's between you and Jeremy, and until he defends them, his patches should not go forward.

Hugh
[git pull] Please pull powerpc.git merge branch
Hi Linus,

Here's a fix for the regression introduced by b5d937de0367d26f65b9af1aef5f2c34c1939be0 along with a bug fix from Mike Wolf for a nasty BUG_ON() that shouldn't be there for some odd case of threaded core dumps, and 3 patches from Meador Inge that I plain forgot to include before.

Cheers,
Ben.

The following changes since commit a952baa034ae7c2e4a66932005cbc7ebbccfe28d:

  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input (2011-03-19 22:27:06 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

Benjamin Herrenschmidt (1):
      powerpc/pci: Fix crash in PCI code on ppc64 when matching device nodes

Meador Inge (3):
      powerpc: Document the Open PIC device tree binding
      powerpc: Make MPIC honor the pic-no-reset device tree property
      powerpc: Factoring mpic cpu id fetching into a function

Mike Wolf (1):
      powerpc/ptrace: Remove BUG_ON when full register set not available

 Documentation/devicetree/bindings/open-pic.txt |   98
 arch/powerpc/include/asm/mpic.h                |    4 +
 arch/powerpc/include/asm/ptrace.h              |    2 +
 arch/powerpc/kernel/pci_dn.c                   |    7 +-
 arch/powerpc/kernel/ptrace.c                   |   15 +++-
 arch/powerpc/sysdev/mpic.c                     |   85 +++-
 6 files changed, 184 insertions(+), 27 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/open-pic.txt
Re: mmotm threatens ppc preemption again
On Sun, 2011-03-20 at 18:41 -0700, Hugh Dickins wrote:
> > I don't know what the right way to fix that is. We have an absolute requirement that the batching we start within a lazy MMU section is complete and flushed before any other PTE in that section can be touched by anything else. Do we -at least- keep that guarantee ?
>
> I'm guessing it's a guarantee of the same kind as led me to skip page_table_lock on init_mm in 2.6.15: no locking to guarantee it, but it would have to be a kernel bug, in a driver or wherever, for us to be accessing such a section while it was in transit (short of speculative access prior to tlb flush).

As long as the races to avoid are between map/unmap vs. access, yes, it -should- be fine, and we used to not do demand faulting on kernel space (but for how long ?). I'm wondering why we don't just stick a ptl in there or is there a good reason why we can't ?

> I don't see where you're doing batching on init_mm today: it looks as if Jeremy's patches, by using the same code as he has for user mms, are now enabling batching on init_mm, and you should :-)
>
> But I may be all wrong: it's between you and Jeremy, and until he defends them, his patches should not go forward.

We don't do it today (batching). Jeremy's patches have the side effect of enabling it, which isn't wrong per-se ... but on our side relies on some locking assumptions we are missing.

Cheers,
Ben.
Re: mmotm threatens ppc preemption again
On Mon, 21 Mar 2011, Benjamin Herrenschmidt wrote:
> As long as the races to avoid are between map/unmap vs. access, yes, it -should- be fine, and we used to not do demand faulting on kernel space (but for how long ?). I'm wondering why we don't just stick a ptl in there or is there a good reason why we can't ?

We can - but we usually prefer to avoid unnecessary locking. An arch function which locks init_mm.page_table_lock on powerpc, but does nothing on others?

Hugh
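Hugh's suggestion of an arch hook that takes init_mm.page_table_lock on powerpc but does nothing elsewhere could look roughly like the following. It is a userspace sketch only: struct mm_model, ARCH_LOCKS_KERNEL_PTES and the arch_kernel_pte_lock()/arch_kernel_pte_unlock() pair are hypothetical, and a C11 atomic_flag spinlock stands in for the kernel's spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct mm_struct used here. */
struct mm_model {
	atomic_flag page_table_lock;
	bool is_init_mm;
};

/* 1 on an arch that wants kernel PTE locking (powerpc in Hugh's idea),
 * 0 elsewhere, where the hooks compile down to nothing. */
#define ARCH_LOCKS_KERNEL_PTES 1

static void spin_lock_model(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		; /* spin until the holder releases it */
}

static void spin_unlock_model(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Returns true if the lock was taken, so the caller knows to drop it. */
static bool arch_kernel_pte_lock(struct mm_model *mm)
{
#if ARCH_LOCKS_KERNEL_PTES
	if (mm->is_init_mm) {
		spin_lock_model(&mm->page_table_lock);
		return true;
	}
#endif
	(void)mm; /* user mms already run under their own PTE lock */
	return false;
}

static void arch_kernel_pte_unlock(struct mm_model *mm, bool locked)
{
	assert(!locked || mm->is_init_mm);
	if (locked)
		spin_unlock_model(&mm->page_table_lock);
}
```

A caller such as apply_to_pte_range() would take the lock before entering lazy MMU mode and drop it after leaving; an architecture that sets ARCH_LOCKS_KERNEL_PTES to 0 pays nothing. The cost, as Ben notes in his reply, is that kernel and user page tables would then follow different locking rules.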
Re: mmotm threatens ppc preemption again
On Sun, 2011-03-20 at 19:20 -0700, Hugh Dickins wrote:
> > As long as the races to avoid are between map/unmap vs. access, yes, it -should- be fine, and we used to not do demand faulting on kernel space (but for how long ?). I'm wondering why we don't just stick a ptl in there or is there a good reason why we can't ?
>
> We can - but we usually prefer to avoid unnecessary locking. An arch function which locks init_mm.page_table_lock on powerpc, but does nothing on others?

That still means gratuitous differences between how the normal and kernel page tables are handled. Maybe that's not worth bothering ...

Cheers,
Ben.
Re: [PATCH 1/2] kdump: Allow shrinking of kdump region to be overridden
On Tue, 15 Mar 2011 22:22:19 +0530, Mahesh J Salgaonkar wrote:
> On Tue, Mar 15, 2011 at 03:52:38PM +0800, Américo Wang wrote:
> > On Tue, Mar 15, 2011 at 2:13 AM, Mahesh J Salgaonkar mah...@linux.vnet.ibm.com wrote:
> > > During free we do free all of them including RMO region. But since the rtas region is always on top of RMO, crashkernel memory overlaps rtas region and we end up freeing that even, which is causing the crash.
> >
> > Okay, but with this patch applied, we will just ignore rtas region, right?
>
> Correct.
>
> > Thus, when I echo 0 to free all the 128M crashkernel memory, the final result will be 32M left, which means crash_size will still show 32M. This looks odd. How about skipping the 32M as a whole? I mean once the region being freed has overlap with this rtas region, skip the whole rtas region, and let crash_size show 0?
>
> The existing code from crash_shrink_memory() function reduces the crash size to 0 when echo'ed 0. I did test this patchset and verified that /sys/kernel/kexec_crash_size show 0 value.

Oh, ok.

Acked-by: WANG Cong xiyou.wangc...@gmail.com

Thanks.
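The overlap arithmetic behind this exchange can be sketched as follows: when a range of crashkernel memory is released, the part that overlaps the reserved RTAS region has to be kept. freeable_bytes() and the example layout are hypothetical, this is not the code in the patch, and the figures in the usage note are only chosen to match the 128M/32M example in the thread.

```c
#include <assert.h>

/*
 * Hypothetical helper: bytes of [start, end) that can be returned to
 * the system when the part overlapping [rsv_start, rsv_end) (e.g. the
 * RTAS region) must be kept.
 */
unsigned long long freeable_bytes(unsigned long long start,
				  unsigned long long end,
				  unsigned long long rsv_start,
				  unsigned long long rsv_end)
{
	unsigned long long lo, hi, overlap = 0;

	assert(start <= end && rsv_start <= rsv_end);

	/* Intersection of the two half-open ranges, if any. */
	lo = start > rsv_start ? start : rsv_start;
	hi = end < rsv_end ? end : rsv_end;
	if (lo < hi)
		overlap = hi - lo;

	return (end - start) - overlap;
}
```

For a 128M crashkernel region at [32M, 160M) and an RTAS region at [128M, 160M), freeable_bytes() reports 96M, leaving the 32M overlap in place, which is the leftover Américo was asking about.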