Re: Regression in 32-bit ppc kernel
On 04/27/2012 07:42 PM, Benjamin Herrenschmidt wrote:
> Ok, so you do have a serial port, probably two even :-) One of them is
> connected to the infrared transceiver and the other one is probably
> connected to the internal modem. (The modem itself might not use it, some
> of these machines use an i2s/i2c modem, some use a usb modem, but the
> serial port is wired to the connector regardless).

I have done a little more debugging. The problem is definitely coming from
drivers/tty/serial/pmac_zilog.c. I am getting ChanB interrupts while open,
which causes the following code segment to return IRQ_NONE:

    if (r3 & (CHBEXT | CHBTxIP | CHBRxIP)) {
        if (!ZS_IS_OPEN(uap_a)) {
            pmz_debug("ChanB interrupt while open !\n");
            goto skip_b;
        }
        write_zsreg(uap_b, R0, RES_H_IUS);
        zssync(uap_b);
        if (r3 & CHBEXT)

When this section is entered, r3 == 0x2 (CHBTxIP).

Larry
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
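A minimal user-space sketch of how the r3 value reported above decodes. It assumes the CHB*/CHA* names follow the standard Z8530 RR3 (interrupt pending) bit layout; `rr3_has_chan_b()` and `decode_rr3()` are hypothetical helpers for illustration, not driver code. Only the channel B mask is taken from the excerpt quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Z8530 RR3 interrupt-pending bits, named as in the thread. The values
 * follow the Z8530 RR3 layout; that they match pmac_zilog.h is assumed. */
#define CHBEXT  0x01 /* channel B external/status interrupt pending */
#define CHBTxIP 0x02 /* channel B transmit interrupt pending */
#define CHBRxIP 0x04 /* channel B receive interrupt pending */
#define CHAEXT  0x08 /* channel A external/status interrupt pending */
#define CHATxIP 0x10 /* channel A transmit interrupt pending */
#define CHARxIP 0x20 /* channel A receive interrupt pending */

/* Hypothetical helper: nonzero if r3 would enter the channel B branch,
 * using the same mask as the test quoted from pmz_interrupt(). */
static int rr3_has_chan_b(unsigned char r3)
{
    return (r3 & (CHBEXT | CHBTxIP | CHBRxIP)) != 0;
}

/* Hypothetical helper: writes the names of the bits set in r3 into buf,
 * separated by '|', and returns buf (empty string if no bit is set). */
static const char *decode_rr3(unsigned char r3, char *buf, size_t len)
{
    static const struct { unsigned char bit; const char *name; } bits[] = {
        { CHBEXT, "CHBEXT" }, { CHBTxIP, "CHBTxIP" }, { CHBRxIP, "CHBRxIP" },
        { CHAEXT, "CHAEXT" }, { CHATxIP, "CHATxIP" }, { CHARxIP, "CHARxIP" },
    };
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (!(r3 & bits[i].bit))
            continue;
        /* Append a separator, then the name, never overflowing buf. */
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, bits[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

For the value Larry saw, `rr3_has_chan_b(0x2)` is true and `decode_rr3(0x2, buf, sizeof buf)` yields "CHBTxIP": a pending channel B transmit interrupt only, with nothing pending on channel A.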
Re: Regression in 32-bit ppc kernel
Larry Finger <larry.fin...@lwfinger.net> writes:

> I have done a little more debugging. The problem is definitely coming from
> drivers/tty/serial/pmac_zilog.c. I am getting ChanB interrupts while open,
> which causes the following code segment to return IRQ_NONE:
>
>     if (r3 & (CHBEXT | CHBTxIP | CHBRxIP)) {
>         if (!ZS_IS_OPEN(uap_a)) {

s/uap_a/uap_b/?

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
And now for something completely different.
Re: Regression in 32-bit ppc kernel
On Sat, 2012-04-28 at 13:09 -0500, Larry Finger wrote:
> I have done a little more debugging. The problem is definitely coming from
> drivers/tty/serial/pmac_zilog.c. I am getting ChanB interrupts while open,
> which causes the following code segment to return IRQ_NONE:
>
>     if (r3 & (CHBEXT | CHBTxIP | CHBRxIP)) {
>         if (!ZS_IS_OPEN(uap_a)) {
>             pmz_debug("ChanB interrupt while open !\n");
>             goto skip_b;
>         }
>         write_zsreg(uap_b, R0, RES_H_IUS);
>         zssync(uap_b);
>         if (r3 & CHBEXT)
>
> When this section is entered, r3 == 0x2 (CHBTxIP).

Ok. The debug code was meant to spell "while not open" btw :-)

I have some ideas what's going on. I think the irda stuff can trigger
interrupts during the open/close sequence before ZS_IS_OPEN is true. I'll
send a fix.

Cheers,
Ben.
Re: Regression in 32-bit ppc kernel
On Sat, 2012-04-28 at 20:23 +0200, Andreas Schwab wrote:
> Larry Finger <larry.fin...@lwfinger.net> writes:
>
>> I have done a little more debugging. The problem is definitely coming from
>> drivers/tty/serial/pmac_zilog.c. I am getting ChanB interrupts while open,
>> which causes the following code segment to return IRQ_NONE:
>>
>>     if (r3 & (CHBEXT | CHBTxIP | CHBRxIP)) {
>>         if (!ZS_IS_OPEN(uap_a)) {
>
> s/uap_a/uap_b/?

Good catch... Let's see if that fixes it for Larry...

Cheers,
Ben.
Re: Regression in 32-bit ppc kernel
On Sun, 2012-04-29 at 08:41 +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2012-04-28 at 13:09 -0500, Larry Finger wrote:
>> I have done a little more debugging. The problem is definitely coming from
>> drivers/tty/serial/pmac_zilog.c. I am getting ChanB interrupts while open,
>> which causes the following code segment to return IRQ_NONE:
>>
>>     if (r3 & (CHBEXT | CHBTxIP | CHBRxIP)) {
>>         if (!ZS_IS_OPEN(uap_a)) {
>>             pmz_debug("ChanB interrupt while open !\n");
>>             goto skip_b;
>>         }
>>         write_zsreg(uap_b, R0, RES_H_IUS);
>>         zssync(uap_b);
>>         if (r3 & CHBEXT)
>>
>> When this section is entered, r3 == 0x2 (CHBTxIP).
>
> Ok. The debug code was meant to spell "while not open" btw :-)
>
> I have some ideas what's going on. I think the irda stuff can trigger
> interrupts during the open/close sequence before ZS_IS_OPEN is true. I'll
> send a fix.

Hrm, actually, Andreas also found an actual bug here, as we aren't testing
uap_b but uap_a ... oops. I think when I tested chan b I always had chan a
open :-)

That will be easy to fix. Can you try changing the uap_a test above into a
uap_b test and see if that fixes some of it for you?

Cheers,
Ben.
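To see why the uap_a/uap_b mix-up only bites when channel A is closed, here is a minimal user-space model of the channel B guard. `struct chan`, `chan_b_part()`, `buggy()` and `fixed()` are hypothetical stand-ins for the driver state and ZS_IS_OPEN(); only the RR3 mask and the IRQ_NONE/IRQ_HANDLED outcome follow the excerpt quoted in this thread, and the bit values assume the Z8530 layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Channel B bits of RR3, as named in the thread (Z8530 layout assumed). */
#define CHBEXT  0x01
#define CHBTxIP 0x02
#define CHBRxIP 0x04

/* Mirrors the kernel's irqreturn values for IRQ_NONE and IRQ_HANDLED. */
enum irqreturn { IRQ_NONE = 0, IRQ_HANDLED = 1 };

/* Hypothetical stand-in for one SCC channel; `open` plays ZS_IS_OPEN(). */
struct chan {
    bool open;
};

/* Channel B part of the handler. `gate` is the channel whose open state
 * is tested before channel B is serviced. */
static enum irqreturn chan_b_part(unsigned char r3, const struct chan *gate)
{
    assert(gate != NULL);
    if (!(r3 & (CHBEXT | CHBTxIP | CHBRxIP)))
        return IRQ_NONE;    /* nothing pending on channel B */
    if (!gate->open)
        return IRQ_NONE;    /* the "goto skip_b" path: interrupt not serviced */
    /* The real driver acks the interrupt and services channel B here. */
    return IRQ_HANDLED;
}

/* As quoted in the thread: channel B is gated on channel A (uap_a). */
static enum irqreturn buggy(unsigned char r3, const struct chan *a,
                            const struct chan *b)
{
    (void)b;
    return chan_b_part(r3, a);
}

/* With s/uap_a/uap_b/ applied: channel B is gated on itself. */
static enum irqreturn fixed(unsigned char r3, const struct chan *a,
                            const struct chan *b)
{
    (void)a;
    return chan_b_part(r3, b);
}
```

With channel A closed and channel B open, `buggy(CHBTxIP, ...)` returns IRQ_NONE while `fixed()` returns IRQ_HANDLED; with channel A open, both return IRQ_HANDLED, which is consistent with the bug staying hidden while chan a was open during testing.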
Re: Regression in 32-bit ppc kernel
On 04/28/2012 05:48 PM, Benjamin Herrenschmidt wrote:
> On Sat, 2012-04-28 at 20:23 +0200, Andreas Schwab wrote:
>> Larry Finger <larry.fin...@lwfinger.net> writes:
>>
>>> I have done a little more debugging. The problem is definitely coming from
>>> drivers/tty/serial/pmac_zilog.c. I am getting ChanB interrupts while open,
>>> which causes the following code segment to return IRQ_NONE:
>>>
>>>     if (r3 & (CHBEXT | CHBTxIP | CHBRxIP)) {
>>>         if (!ZS_IS_OPEN(uap_a)) {
>>
>> s/uap_a/uap_b/?
>
> Good catch... Let's see if that fixes it for Larry...

Yes, good catch by Andreas. That change does fix the problem.

Ben - Do you want to fix the typos for open/not open with the same patch?

Larry
Re: Regression in 32-bit ppc kernel
On Sat, 2012-04-28 at 18:17 -0500, Larry Finger wrote:
> Yes, good catch by Andreas. That change does fix the problem.
>
> Ben - Do you want to fix the typos for open/not open with the same patch?

Sure, if you're going to do a proper patch, by all means please fix those
too :-) Does it fix all the occurrences of the problem for you?

Cheers,
Ben.
Re: Regression in 32-bit ppc kernel
On 04/28/2012 06:23 PM, Benjamin Herrenschmidt wrote:
> On Sat, 2012-04-28 at 18:17 -0500, Larry Finger wrote:
>> Yes, good catch by Andreas. That change does fix the problem.
>>
>> Ben - Do you want to fix the typos for open/not open with the same patch?
>
> Sure, if you're going to do a proper patch, by all means please fix those
> too :-) Does it fix all the occurrences of the problem for you?

Yes. After the patch is applied, there are no more "nobody cared" IRQ
messages. I will prepare the patch and send it to you.

Larry
Re: Regression in 32-bit ppc kernel
On 04/25/2012 04:44 PM, Benjamin Herrenschmidt wrote:
> Do we know what the bad interrupt maps to? Also what is the value of
> NR_IRQ and do you have SPARSE_IRQ enabled? Can you try with the latter
> disabled and NR_IRQ set to something large, such as 128? (You may be able
> to check the interrupt mapping in debugfs)

Sorry, I was unable to find anything in debugfs to help me learn about
interrupt mapping. The value of CONFIG_NR_IRQS is already 512. I have not
tried reducing it to 128. The setting for CONFIG_SPARSE_IRQ was on, and
changing it to off did not make any difference.

I finished the bisection, which led to

commit a79dd5ae5a8f49688d65b89a859f2b98a7ee5538
Author: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Date:   Thu Dec 15 11:13:03 2011 +1100

    tty/serial/pmac_zilog: Fix suspend & resume

As this seemed to be an improbable result, I did the full test by checking
out the previous commit (43ca5d3). That kernel was good. Then I used quilt
to add commit a79dd5a as a patch and the fault returned.

I then noticed that you said in the commit message that "I removed some
code for handling unexpected interrupt which should never be hit". It
appears that my box does indeed hit such an unexpected interrupt.

I could always get rid of the fault by disabling CONFIG_SERIAL_PMACZILOG,
but I would like to fix the problem if possible.

Thanks,

Larry
Re: Regression in 32-bit ppc kernel
On Fri, 2012-04-27 at 10:38 -0500, Larry Finger wrote:
> Sorry, I was unable to find anything in debugfs to help me learn about
> interrupt mapping. The value of CONFIG_NR_IRQS is already 512. I have not
> tried reducing it to 128. The setting for CONFIG_SPARSE_IRQ was on, and
> changing it to off did not make any difference.
>
> I finished the bisection, which led to
>
> commit a79dd5ae5a8f49688d65b89a859f2b98a7ee5538
> Author: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> Date:   Thu Dec 15 11:13:03 2011 +1100
>
>     tty/serial/pmac_zilog: Fix suspend & resume
>
> As this seemed to be an improbable result, I did the full test by checking
> out the previous commit (43ca5d3). That kernel was good. Then I used quilt
> to add commit a79dd5a as a patch and the fault returned.
>
> I then noticed that you said in the commit message that "I removed some
> code for handling unexpected interrupt which should never be hit". It
> appears that my box does indeed hit such an unexpected interrupt.
>
> I could always get rid of the fault by disabling CONFIG_SERIAL_PMACZILOG,
> but I would like to fix the problem if possible.

Right, it should be fixed. I need to understand where the unexpected
interrupt comes from. Can you tell me (or remind me) what specific machine
model you are using? Are you putting the console on the serial port?

Cheers,
Ben.
Re: Regression in 32-bit ppc kernel
On 04/27/2012 05:26 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2012-04-27 at 10:38 -0500, Larry Finger wrote:
>> Sorry, I was unable to find anything in debugfs to help me learn about
>> interrupt mapping. The value of CONFIG_NR_IRQS is already 512. I have not
>> tried reducing it to 128. The setting for CONFIG_SPARSE_IRQ was on, and
>> changing it to off did not make any difference.
>>
>> I finished the bisection, which led to
>>
>> commit a79dd5ae5a8f49688d65b89a859f2b98a7ee5538
>> Author: Benjamin Herrenschmidt <b...@kernel.crashing.org>
>> Date:   Thu Dec 15 11:13:03 2011 +1100
>>
>>     tty/serial/pmac_zilog: Fix suspend & resume
>>
>> As this seemed to be an improbable result, I did the full test by checking
>> out the previous commit (43ca5d3). That kernel was good. Then I used quilt
>> to add commit a79dd5a as a patch and the fault returned.
>>
>> I then noticed that you said in the commit message that "I removed some
>> code for handling unexpected interrupt which should never be hit". It
>> appears that my box does indeed hit such an unexpected interrupt.
>>
>> I could always get rid of the fault by disabling CONFIG_SERIAL_PMACZILOG,
>> but I would like to fix the problem if possible.
>
> Right, it should be fixed. I need to understand where the unexpected
> interrupt comes from. Can you tell me (or remind me) what specific machine
> model you are using? Are you putting the console on the serial port?

It is a 15" PowerBook G4. I think they call it a Titanium. The console is
not on a serial port. In fact, the reason that I did not think this patch
was a problem is that the serial port does not appear to be connected to an
external port. I was unaware that there was a serial port on the
motherboard. There is a modem jack, but no 9- or 25-pin connector that
would indicate a standard serial port.

There are two stack dumps with the same trace. I posted the first, but the
second is preceded by the lines

[c02adca0] pmz_interrupt
Disabling IRQ #23
ttyPZ1: IrDA setup for 57600 bps, dongle version: 4
ttyPZ1: IrDA setup for 115200 bps, dongle version: 4
irq 23: nobody cared (try booting with the "irqpoll" option)

As I am not sure how to put options in with yaboot, I have not tried that.

Larry
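The "nobody cared" / "Disabling IRQ #23" pair above comes from the kernel's spurious-interrupt detector, which is why a handler that keeps returning IRQ_NONE eventually gets its line shut off. Below is a simplified user-space model of that logic, assuming the thresholds used by `note_interrupt()` in kernel/irq/spurious.c (a window of 100000 interrupts, with more than 99900 unhandled ones tripping it); the time-based reset of the unhandled count in the real code is omitted, and `struct irq_stats` and `note_irq()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thresholds assumed to mirror note_interrupt() in kernel/irq/spurious.c. */
#define SAMPLE_WINDOW   100000u
#define UNHANDLED_LIMIT  99900u

/* Hypothetical per-IRQ bookkeeping for this model. */
struct irq_stats {
    unsigned int count;     /* interrupts seen in the current window */
    unsigned int unhandled; /* of those, how many returned IRQ_NONE */
    bool disabled;          /* set once the line is shut off */
};

/* Record one interrupt outcome. Returns true on the call that disables the
 * line, i.e. where the kernel would print "nobody cared" and
 * "Disabling IRQ #n". */
static bool note_irq(struct irq_stats *s, bool handled)
{
    assert(s != NULL);
    if (s->disabled)
        return false;           /* a disabled line delivers no more interrupts */
    if (!handled)
        s->unhandled++;
    if (++s->count < SAMPLE_WINDOW)
        return false;           /* window not complete yet */
    s->count = 0;
    if (s->unhandled > UNHANDLED_LIMIT)
        s->disabled = true;     /* almost nothing was handled: stuck IRQ */
    s->unhandled = 0;
    return s->disabled;
}
```

In this model, feeding 100000 unhandled interrupts makes the last call return true, while a 50/50 mix of handled and unhandled interrupts never disables the line.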
Re: Regression in 32-bit ppc kernel
On Fri, 2012-04-27 at 19:02 -0500, Larry Finger wrote:
> It is a 15" PowerBook G4. I think they call it a Titanium. The console is
> not on a serial port. In fact, the reason that I did not think this patch
> was a problem is that the serial port does not appear to be connected to an
> external port. I was unaware that there was a serial port on the
> motherboard. There is a modem jack, but no 9- or 25-pin connector that
> would indicate a standard serial port.
>
> There are two stack dumps with the same trace. I posted the first, but the
> second is preceded by the lines
>
> [c02adca0] pmz_interrupt
> Disabling IRQ #23
> ttyPZ1: IrDA setup for 57600 bps, dongle version: 4
> ttyPZ1: IrDA setup for 115200 bps, dongle version: 4
> irq 23: nobody cared (try booting with the "irqpoll" option)
>
> As I am not sure how to put options in with yaboot, I have not tried that.

Ok, so you do have a serial port, probably two even :-) One of them is
connected to the infrared transceiver and the other one is probably
connected to the internal modem. (The modem itself might not use it, some
of these machines use an i2s/i2c modem, some use a usb modem, but the
serial port is wired to the connector regardless).

Cheers,
Ben.
Re: Regression in 32-bit ppc kernel
On 04/24/2012 11:11 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2012-04-24 at 21:37 -0500, Larry Finger wrote:
>>>> Somewhere between v3.2 and v3.3, the kernel in my PowerBook G4 started
>>>> issuing the following traceback on bootup:
>>>
>>> Does it continue working afterward or not at all? Are you using the old
>>> IDE driver or the newer libata based pata_macio?
>>
>> Yes, it finishes the boot, and appears to work correctly. If a device is
>> missing, I do not know what it is. I think I am using the old IDE driver.
>
> Interesting. Does it make a difference if you switch to pata_macio?

After a few tries, I managed to change over to pata_macio. Fortunately,
most of the system used /dev/disk/by-id or UUID, thus most of the process
was getting all the kernel pieces built in. Unfortunately, the original
problem remains.

I have resumed the bisecting - only 11 steps to go. I should have it by
Friday! :)

Larry
Re: Regression in 32-bit ppc kernel
On Wed, 2012-04-25 at 10:00 -0500, Larry Finger wrote:
> After a few tries, I managed to change over to pata_macio. Fortunately,
> most of the system used /dev/disk/by-id or UUID, thus most of the process
> was getting all the kernel pieces built in. Unfortunately, the original
> problem remains.
>
> I have resumed the bisecting - only 11 steps to go. I should have it by
> Friday! :)

Thanks!

Do we know what the bad interrupt maps to? Also what is the value of NR_IRQ
and do you have SPARSE_IRQ enabled? Can you try with the latter disabled
and NR_IRQ set to something large, such as 128? (You may be able to check
the interrupt mapping in debugfs)

Cheers,
Ben.
Regression in 32-bit ppc kernel
Hi,

Somewhere between v3.2 and v3.3, the kernel in my PowerBook G4 started
issuing the following traceback on bootup:

[ 40.264006] irq 23: nobody cared (try booting with the "irqpoll" option)
[ 40.264031] Call Trace:
[ 40.264070] [dfff3f00] [c000984c] show_stack+0x7c/0x194 (unreliable)
[ 40.264102] [dfff3f40] [c00a6840] __report_bad_irq+0x44/0xf4
[ 40.264119] [dfff3f60] [c00a6adc] note_interrupt+0x1ec/0x2ac
[ 40.264135] [dfff3f80] [c00a48a8] handle_irq_event_percpu+0x250/0x2b8
[ 40.264152] [dfff3fd0] [c00a4944] handle_irq_event+0x34/0x54
[ 40.264169] [dfff3fe0] [c00a7514] handle_fasteoi_irq+0xb4/0x124
[ 40.264192] [dfff3ff0] [c000f5bc] call_handle_irq+0x18/0x28
[ 40.264208] [dec85ce0] [c000719c] do_IRQ+0x114/0x1cc
[ 40.264226] [dec85d10] [c0015868] ret_from_except+0x0/0x1c
[ 40.264254] --- Exception: 501 at find_vma+0x10/0x80
[ 40.264259]     LR = do_page_fault+0x26c/0x6ac
[ 40.264272] [dec85dd0] [c03f0128] do_page_fault+0x25c/0x6ac (unreliable)
[ 40.264289] [dec85f40] [c00155e4] handle_page_fault+0xc/0x80
[ 40.264327] --- Exception: 301 at 0x4800a174

The problem still exists in v3.4-rc3. I am currently doing a bisection of
this problem, but it will take a long time to complete.

Note: IRQ 23 is not active in v3.2.

Thanks,

Larry
Re: Regression in 32-bit ppc kernel
On Tue, 2012-04-24 at 17:58 -0500, Larry Finger wrote:
> Hi,
>
> Somewhere between v3.2 and v3.3, the kernel in my PowerBook G4 started
> issuing the following traceback on bootup:

Does it continue working afterward or not at all? Are you using the old
IDE driver or the newer libata based pata_macio?

Cheers,
Ben.

> [ 40.264006] irq 23: nobody cared (try booting with the "irqpoll" option)
> [ 40.264031] Call Trace:
> [ 40.264070] [dfff3f00] [c000984c] show_stack+0x7c/0x194 (unreliable)
> [ 40.264102] [dfff3f40] [c00a6840] __report_bad_irq+0x44/0xf4
> [ 40.264119] [dfff3f60] [c00a6adc] note_interrupt+0x1ec/0x2ac
> [ 40.264135] [dfff3f80] [c00a48a8] handle_irq_event_percpu+0x250/0x2b8
> [ 40.264152] [dfff3fd0] [c00a4944] handle_irq_event+0x34/0x54
> [ 40.264169] [dfff3fe0] [c00a7514] handle_fasteoi_irq+0xb4/0x124
> [ 40.264192] [dfff3ff0] [c000f5bc] call_handle_irq+0x18/0x28
> [ 40.264208] [dec85ce0] [c000719c] do_IRQ+0x114/0x1cc
> [ 40.264226] [dec85d10] [c0015868] ret_from_except+0x0/0x1c
> [ 40.264254] --- Exception: 501 at find_vma+0x10/0x80
> [ 40.264259]     LR = do_page_fault+0x26c/0x6ac
> [ 40.264272] [dec85dd0] [c03f0128] do_page_fault+0x25c/0x6ac (unreliable)
> [ 40.264289] [dec85f40] [c00155e4] handle_page_fault+0xc/0x80
> [ 40.264327] --- Exception: 301 at 0x4800a174
>
> The problem still exists in v3.4-rc3. I am currently doing a bisection of
> this problem, but it will take a long time to complete.
>
> Note: IRQ 23 is not active in v3.2.
>
> Thanks,
>
> Larry
Re: Regression in 32-bit ppc kernel
On 04/24/2012 06:53 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2012-04-24 at 17:58 -0500, Larry Finger wrote:
>> Hi,
>>
>> Somewhere between v3.2 and v3.3, the kernel in my PowerBook G4 started
>> issuing the following traceback on bootup:
>
> Does it continue working afterward or not at all? Are you using the old
> IDE driver or the newer libata based pata_macio?

Yes, it finishes the boot, and appears to work correctly. If a device is
missing, I do not know what it is. I think I am using the old IDE driver.

Larry
Re: Regression in 32-bit ppc kernel
On Tue, 2012-04-24 at 21:37 -0500, Larry Finger wrote:
>>> Somewhere between v3.2 and v3.3, the kernel in my PowerBook G4 started
>>> issuing the following traceback on bootup:
>>
>> Does it continue working afterward or not at all? Are you using the old
>> IDE driver or the newer libata based pata_macio?
>
> Yes, it finishes the boot, and appears to work correctly. If a device is
> missing, I do not know what it is. I think I am using the old IDE driver.

Interesting. Does it make a difference if you switch to pata_macio?

Cheers,
Ben.