Re: Lossy interrupts on x86_64
On Wed, 12 Sep 2007 08:33:15 -0700 Jesse Barnes wrote:

> I just narrowed down a weird problem where I was losing more than 50% of
> my vblank interrupts to what seems to be the hires timers patch. Stock
> 2.6.23-rc5 works fine, but the latest (171) kernel from rawhide drops
> most of my interrupts unless I also have another interrupt source
> running (e.g. if I hold down a key or move the mouse I get the expected
> number of vblank interrupts, otherwise I get between 3 and 30 instead
> of the expected 60 per second).
>
> Any ideas? It seems like it might be bad APIC programming, but I
> haven't gone through those mods to look for suspects...

Also tickless? (NO_HZ ?)

I think I've seen some emails about tickless and keystrokes being
needed to cause interrupts... but I'm not positive about it.

but you said "any ideas"

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: Lossy interrupts on x86_64
On Wednesday, September 12, 2007, Randy Dunlap wrote:
> On Wed, 12 Sep 2007 08:33:15 -0700 Jesse Barnes wrote:
> > I just narrowed down a weird problem where I was losing more than
> > 50% of my vblank interrupts to what seems to be the hires timers
> > patch. Stock 2.6.23-rc5 works fine, but the latest (171) kernel
> > from rawhide drops most of my interrupts unless I also have another
> > interrupt source running (e.g. if I hold down a key or move the
> > mouse I get the expected number of vblank interrupts, otherwise I
> > get between 3 and 30 instead of the expected 60 per second).
> >
> > Any ideas? It seems like it might be bad APIC programming, but I
> > haven't gone through those mods to look for suspects...
>
> Also tickless? (NO_HZ ?)
>
> I think I've seen some emails about tickless and keystrokes being
> needed to cause interrupts... but I'm not positive about it.
>
> but you said "any ideas"

Yeah, there's NO_HZ in the rawhide kernel too, but I'm getting timer
ticks normally afaict, it's just vblank interrupts that get lost...
/proc/interrupts on this machine (from NO_HZ, hires kernel):

[EMAIL PROTECTED] ~]$ cat /proc/interrupts
           CPU0       CPU1
  0:     290050     289541   IO-APIC-edge      timer
  1:       3862       3956   IO-APIC-edge      i8042
  8:          0          0   IO-APIC-edge      rtc0
  9:       1632       1643   IO-APIC-fasteoi   acpi
 12:     183662     183926   IO-APIC-edge      i8042
 14:      20626      20717   IO-APIC-edge      libata
 15:          0          0   IO-APIC-edge      libata
 16:      46812      46825   IO-APIC-fasteoi   yenta, uhci_hcd:usb3, [EMAIL PROTECTED]::00:02.0
 17:      63715      63653   IO-APIC-fasteoi   uhci_hcd:usb4, HDA Intel, firewire_ohci, iwl4965
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
 19:         52         36   IO-APIC-fasteoi   ehci_hcd:usb7
 20:         43         46   IO-APIC-fasteoi   uhci_hcd:usb1
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
 22:          2          1   IO-APIC-fasteoi   ehci_hcd:usb6
2297:    937944   PCI-MSI-edge      eth0
2298:     12392      12402   PCI-MSI-edge      ahci
NMI:          0          0
LOC:     290913     335027
ERR:          0
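Jesse's symptom is a rate (3 to 30 vblank interrupts per second instead of 60), and one way to measure it is to sample the relevant /proc/interrupts line twice and divide the difference by the interval. The C sketch below is illustrative only: `sum_irq_line` and `irq_rate` are hypothetical helpers, and the parser assumes the layout shown above (a label, a colon, one count per CPU, then the chip and device names). Note that IRQ 16 here is shared with yenta and uhci_hcd:usb3, so its count also includes interrupts from those devices, not just the 00:02.0 device.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one /proc/interrupts line such as
 *   " 16:      46812      46825   IO-APIC-fasteoi   yenta, ..."
 * Copies the label before the first colon (e.g. "16" or "LOC") into irq
 * and returns the sum of the first ncpus count columns, or -1 if the
 * line has no label or fewer numeric columns than ncpus. */
static long long sum_irq_line(const char *line, int ncpus,
                              char *irq, size_t irqlen)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;                      /* e.g. the "CPU0 CPU1" header */

    /* Skip the right-justification padding in front of the label. */
    const char *start = line;
    while (start < colon && isspace((unsigned char)*start))
        start++;
    size_t n = (size_t)(colon - start);
    if (n == 0 || n >= irqlen)
        return -1;
    memcpy(irq, start, n);
    irq[n] = '\0';

    /* Add up one count per CPU column; strtoll skips leading blanks. */
    const char *p = colon + 1;
    long long total = 0;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        char *end;
        long long v = strtoll(p, &end, 10);
        if (end == p)
            return -1;                  /* column missing or not a number */
        total += v;
        p = end;
    }
    return total;
}

/* Interrupts per second between two totals sampled `seconds` apart. */
static double irq_rate(long long before, long long after, double seconds)
{
    assert(seconds > 0.0);
    return (double)(after - before) / seconds;
}
```

For example, the IRQ 16 line above with `ncpus = 2` yields 46812 + 46825 = 93637; a second reading taken one second later, passed to `irq_rate` together with the first, gives the per-second rate to compare against the expected 60. A line with fewer count columns than `ncpus` (such as ERR above) is reported as -1 rather than silently undercounted.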
Re: Lossy interrupts on x86_64
On Wed, 2007-09-12 at 09:29 -0700, Jesse Barnes wrote:
> On Wednesday, September 12, 2007, Randy Dunlap wrote:
> > On Wed, 12 Sep 2007 08:33:15 -0700 Jesse Barnes wrote:
> > > I just narrowed down a weird problem where I was losing more than
> > > 50% of my vblank interrupts to what seems to be the hires timers
> > > patch. Stock 2.6.23-rc5 works fine, but the latest (171) kernel
> > > from rawhide drops most of my interrupts unless I also have another
> > > interrupt source running (e.g. if I hold down a key or move the
> > > mouse I get the expected number of vblank interrupts, otherwise I
> > > get between 3 and 30 instead of the expected 60 per second).
> > >
> > > Any ideas? It seems like it might be bad APIC programming, but I
> > > haven't gone through those mods to look for suspects...
> >
> > Also tickless? (NO_HZ ?)
> >
> > I think I've seen some emails about tickless and keystrokes being
> > needed to cause interrupts... but I'm not positive about it.

That's a suspend / resume problem which we are hunting.

> > but you said "any ideas"
>
> Yeah, there's NO_HZ in the rawhide kernel too, but I'm getting timer
> ticks normally afaict, it's just vblank interrupts that get lost...

Jesse,

does it make any difference when you boot the box with:

nohz=off

on the kernel command line ?

tglx
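Because the results reported in this thread hinge on which boot options were actually in effect (nohz=off, later highres=off, noapic, pci=nomsi), it can help to confirm them against the running kernel's command line, which Linux exposes in /proc/cmdline. The following C sketch is an illustration under stated assumptions: `cmdline_has` is a hypothetical helper that treats the command line as whitespace-separated tokens and requires an exact whole-token match, so a substring buried inside another option does not count.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Token separators on a kernel command line; /proc/cmdline ends in '\n'. */
static bool is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: true if `opt` occurs in `cmdline` as a whole
 * whitespace-separated token (e.g. "nohz=off"), so that text embedded in
 * a longer token such as "rootflags=data=writeback" does not match. */
static bool cmdline_has(const char *cmdline, const char *opt)
{
    assert(cmdline != NULL && opt != NULL);

    size_t len = strlen(opt);
    if (len == 0)
        return false;

    /* Try every occurrence until one is bounded by separators. */
    for (const char *p = strstr(cmdline, opt); p != NULL;
         p = strstr(p + 1, opt)) {
        bool starts = (p == cmdline) || is_sep(p[-1]);
        bool ends = p[len] == '\0' || is_sep(p[len]);
        if (starts && ends)
            return true;
    }
    return false;
}
```

Given, say, the command line from the dmesg attached later in the thread with a hypothetical nohz=off appended, `cmdline_has(cl, "nohz=off")` is true, while `cmdline_has(cl, "data=writeback")` is false because that text only appears inside the rootflags= token.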
Re: Lossy interrupts on x86_64
Jesse Barnes wrote:
> I just narrowed down a weird problem where I was losing more than 50% of
> my vblank interrupts to what seems to be the hires timers patch. Stock
> 2.6.23-rc5 works fine, but the latest (171) kernel from rawhide drops
> most of my interrupts unless I also have another interrupt source
> running (e.g. if I hold down a key or move the mouse I get the expected
> number of vblank interrupts, otherwise I get between 3 and 30 instead
> of the expected 60 per second).
>
> Any ideas? It seems like it might be bad APIC programming, but I
> haven't gone through those mods to look for suspects...

What happens if you boot with 'noapic' or 'pci=nomsi'? Please post
dmesg as well so we can see how the kernel is initializing the relevant
hardware.

-- Chris
Re: Lossy interrupts on x86_64
On 09/12/2007 12:29 PM, Jesse Barnes wrote:
>
> [EMAIL PROTECTED] ~]$ cat /proc/interrupts
>            CPU0       CPU1
>   0:     290050     289541   IO-APIC-edge      timer
>   1:       3862       3956   IO-APIC-edge      i8042
>   8:          0          0   IO-APIC-edge      rtc0
>   9:       1632       1643   IO-APIC-fasteoi   acpi
>  12:     183662     183926   IO-APIC-edge      i8042
>  14:      20626      20717   IO-APIC-edge      libata
>  15:          0          0   IO-APIC-edge      libata
>  16:      46812      46825   IO-APIC-fasteoi   yenta, uhci_hcd:usb3, [EMAIL PROTECTED]::00:02.0
>  17:      63715      63653   IO-APIC-fasteoi   uhci_hcd:usb4, HDA Intel, firewire_ohci, iwl4965
>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>  19:         52         36   IO-APIC-fasteoi   ehci_hcd:usb7
>  20:         43         46   IO-APIC-fasteoi   uhci_hcd:usb1
>  21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
>  22:          2          1   IO-APIC-fasteoi   ehci_hcd:usb6
> 2297:    937944   PCI-MSI-edge      eth0
> 2298:     12392      12402   PCI-MSI-edge      ahci
> NMI:          0          0
> LOC:     290913     335027
> ERR:          0
>

Hmm, is there any way to force those "fasteoi" handlers to "level" instead?
Re: Lossy interrupts on x86_64
On Mon, 2007-09-17 at 12:13 -0700, Jesse Barnes wrote:
> On Wednesday, September 12, 2007, Thomas Gleixner wrote:
> > does it make any difference when you boot the box with:
> >
> > nohz=off
> >
> > on the kernel command line ?
>
> Yeah, that makes a difference: the box hangs when I start receiving
> vblank interrupts instead. However it's not a hard hang, I think X
> just becomes unresponsive. I can still hit the power button on the
> laptop and the machine shuts down gracefully, but ctl-alt-delete and
> ctl-alt-backspace don't work. So X is probably still up and in charge
> of input but may not be getting any more timeslices or something.

Eeek, that sounds scary. Can you add "highres=off" as well ?

tglx
Re: Lossy interrupts on x86_64
On Wednesday, September 12, 2007, Thomas Gleixner wrote:
> does it make any difference when you boot the box with:
>
> nohz=off
>
> on the kernel command line ?

Yeah, that makes a difference: the box hangs when I start receiving
vblank interrupts instead. However it's not a hard hang, I think X
just becomes unresponsive. I can still hit the power button on the
laptop and the machine shuts down gracefully, but ctl-alt-delete and
ctl-alt-backspace don't work. So X is probably still up and in charge
of input but may not be getting any more timeslices or something.

So to summarize:
  2.6.23-rc5: works
  2.6.23-0.171.rc5.git1.fc8 w/o boot options: lossy interrupts
  2.6.23-0.171.rc5.git1.fc8 nohz=off: hang when starting 3d apps using
    vblank interrupts

Sorry it took so long for me to get back to you, I've been out of range
for a few days...

Thanks,
Jesse
Re: Lossy interrupts on x86_64
On Thursday, September 13, 2007, Chris Snook wrote:
> Jesse Barnes wrote:
> > I just narrowed down a weird problem where I was losing more than
> > 50% of my vblank interrupts to what seems to be the hires timers
> > patch. Stock 2.6.23-rc5 works fine, but the latest (171) kernel
> > from rawhide drops most of my interrupts unless I also have another
> > interrupt source running (e.g. if I hold down a key or move the
> > mouse I get the expected number of vblank interrupts, otherwise I
> > get between 3 and 30 instead of the expected 60 per second).
> >
> > Any ideas? It seems like it might be bad APIC programming, but I
> > haven't gone through those mods to look for suspects...
>
> What happens if you boot with 'noapic' or 'pci=nomsi'? Please post
> dmesg as well so we can see how the kernel is initializing the
> relevant hardware.

noapic gives me the same behavior as nohz=off and pci=nomsi doesn't seem
to do anything (still get the same lossy interrupts behavior).

dmesg from stock boot (i.e. lossy problem) attached.
Jesse

Linux version 2.6.23-0.171.rc5.git1.fc8 (kojibuilder@) (gcc version 4.1.2 20070821 (Red Hat 4.1.2-23)) #1 SMP Mon Sep 10 16:55:15 EDT 2007
Command line: ro root=/dev/VolGroup00/LogVol00 quiet selinux=0 rootflags=data=writeback
BIOS-provided physical RAM map:
 BIOS-e820: - 00099800 (usable)
 BIOS-e820: 00099800 - 000a (reserved)
 BIOS-e820: 000d6000 - 000d8000 (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 7d6b (usable)
 BIOS-e820: 7d6b - 7d6cd000 (ACPI data)
 BIOS-e820: 7d6cd000 - 7d70 (ACPI NVS)
 BIOS-e820: 7d70 - 7e00 (reserved)
 BIOS-e820: f000 - f400 (reserved)
 BIOS-e820: fec0 - fec1 (reserved)
 BIOS-e820: fed0 - fed00400 (reserved)
 BIOS-e820: fed14000 - fed1a000 (reserved)
 BIOS-e820: fed1c000 - fed9 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: ff00 - 0001 (reserved)
Entering add_active_range(0, 0, 153) 0 entries of 3200 used
Entering add_active_range(0, 256, 513712) 1 entries of 3200 used
end_pfn_map = 1048576
DMI present.
ACPI: RSDP 000F68D0, 0024 (r2 LENOVO)
ACPI: XSDT 7D6BCA72, 0094 (r1 LENOVO TP-7L1090 LTP0)
ACPI: FACP 7D6BCC00, 00F4 (r3 LENOVO TP-7L1090 LNVO1)
ACPI Warning (tbfadt-0442): Optional field "Gpe1Block" has zero address or length: 102C/0 [20070126]
ACPI: DSDT 7D6BD00C, FBEF (r1 LENOVO TP-7L1090 MSFT 300)
ACPI: FACS 7D6E4000, 0040
ACPI: SSDT 7D6BCDB4, 0258 (r1 LENOVO TP-7L1090 MSFT 300)
ACPI: ECDT 7D6CCBFB, 0052 (r1 LENOVO TP-7L1090 LNVO1)
ACPI: TCPA 7D6CCC4D, 0032 (r2 LENOVO TP-7L1090 LNVO1)
ACPI: APIC 7D6CCC7F, 0068 (r1 LENOVO TP-7L1090 LNVO1)
ACPI: MCFG 7D6CCCE7, 003C (r1 LENOVO TP-7L1090 LNVO1)
ACPI: HPET 7D6CCD23, 0038 (r1 LENOVO TP-7L1090 LNVO1)
ACPI: SLIC 7D6CCDF0, 0176 (r1 LENOVO TP-7L1090 LTP0)
ACPI: BOOT 7D6CCF66, 0028 (r1 LENOVO TP-7L1090 LTP1)
ACPI: ASF! 7D6CCF8E, 0072 (r16 LENOVO TP-7L1090 PTL 1)
ACPI: SSDT 7D6E26D9, 025F (r1 LENOVO TP-7L1090 INTL 20050513)
ACPI: SSDT 7D6E2938, 00A6 (r1 LENOVO TP-7L1090 INTL 20050513)
ACPI: SSDT 7D6E29DE, 04F7 (r1 LENOVO TP-7L1090 INTL 20050513)
ACPI: SSDT 7D6E2ED5, 01D8 (r1 LENOVO TP-7L1090 INTL 20050513)
No NUMA configuration found
Faking a node at -7d6b
Entering add_active_range(0, 0, 153) 0 entries of 3200 used
Entering add_active_range(0, 256, 513712) 1 entries of 3200 used
Bootmem setup node 0 -7d6b
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->      153
    0:      256 ->   513712
On node 0 totalpages: 513609
  DMA zone: 96 pages used for memmap
  DMA zone: 2499 pages reserved
  DMA zone: 1398 pages, LIFO batch:0
  DMA32 zone: 11944 pages used for memmap
  DMA32 zone: 497672 pages, LIFO batch:31
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 1, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 us
Re: Lossy interrupts on x86_64
On Monday, September 17, 2007, Thomas Gleixner wrote:
> On Mon, 2007-09-17 at 12:13 -0700, Jesse Barnes wrote:
> > On Wednesday, September 12, 2007, Thomas Gleixner wrote:
> > > does it make any difference when you boot the box with:
> > >
> > > nohz=off
> > >
> > > on the kernel command line ?
> >
> > Yeah, that makes a difference: the box hangs when I start
> > receiving vblank interrupts instead. However it's not a hard hang,
> > I think X just becomes unresponsive. I can still hit the power
> > button on the laptop and the machine shuts down gracefully, but
> > ctl-alt-delete and ctl-alt-backspace don't work. So X is probably
> > still up and in charge of input but may not be getting any more
> > timeslices or something.
>
> Eeek, that sounds scary. Can you add "highres=off" as well ?

with nohz=off and highres=off I get a similar but not quite identical
hang: the mouse pointer still moves around but my desktop is still hung
and glxgears doesn't come up.

Behavior is identical without boot options and with highres=off however
(i.e. lossy interrupts in both cases).

Jesse
Re: Lossy interrupts on x86_64
On Thu, 2007-09-20 at 12:22 -0700, Jesse Barnes wrote:
> > Eeek, that sounds scary. Can you add "highres=off" as well ?
>
> FWIW I just tried your linux-2.6-hires tree with the attached config and
> still see the problem. It doesn't look like NO_HZ is even an option in
> that tree...

Right, that's a 2.6-hrt update tree for Linus to pull. The 64 bit parts
are not in there. It's basically Linus + some fixes.

tglx
Re: Lossy interrupts on x86_64
On Thursday, September 20, 2007, Thomas Gleixner wrote:
> On Thu, 2007-09-20 at 12:22 -0700, Jesse Barnes wrote:
> > > Eeek, that sounds scary. Can you add "highres=off" as well ?
> >
> > FWIW I just tried your linux-2.6-hires tree with the attached
> > config and still see the problem. It doesn't look like NO_HZ is
> > even an option in that tree...
>
> Right, that's a 2.6-hrt update tree for Linus to pull. The 64 bit
> parts are not in there. It's basically Linus + some fixes.

Arg, looks like this is actually a DRM problem, but it doesn't exist in
the DRM upstream tree, only the upstream kernel tree. I've only seen it
on 965 chips though, and they have other vblank related problems, so I
won't worry about it for 2.6.23 proper.

Thanks,
Jesse