Re: [PATCH] power: supply: fix sbs-charger build, needs REGMAP_I2C
Hi Randy,
thank you very much. I would not mind dropping my name, but I have now tested
the patch with 5.4.89, so you may also add

Tested-by: Martin Mokrejs

It also happened with 5.10.7, which is probably obvious.

Thank you for the quick action.
Martin

On 16/01/2021 22:13, Randy Dunlap wrote:
> CHARGER_SBS should select REGMAP_I2C since it uses API(s) that are
> provided by that Kconfig symbol.
>
> Fixes these errors:
>
> ../drivers/power/supply/sbs-charger.c:149:21: error: variable ‘sbs_regmap’
> has initializer but incomplete type
>  static const struct regmap_config sbs_regmap = {
> ../drivers/power/supply/sbs-charger.c:150:3: error: ‘const struct
> regmap_config’ has no member named ‘reg_bits’
>   .reg_bits = 8,
> ../drivers/power/supply/sbs-charger.c:155:23: error: ‘REGMAP_ENDIAN_LITTLE’
> undeclared here (not in a function)
>   .val_format_endian = REGMAP_ENDIAN_LITTLE, /* since based on SMBus */
> ../drivers/power/supply/sbs-charger.c: In function ‘sbs_probe’:
> ../drivers/power/supply/sbs-charger.c:183:17: error: implicit declaration of
> function ‘devm_regmap_init_i2c’; did you mean ‘devm_request_irq’?
> [-Werror=implicit-function-declaration]
>   chip->regmap = devm_regmap_init_i2c(client, &sbs_regmap);
> ../drivers/power/supply/sbs-charger.c: At top level:
> ../drivers/power/supply/sbs-charger.c:149:35: error: storage size of
> ‘sbs_regmap’ isn’t known
>  static const struct regmap_config sbs_regmap = {
>
> Fixes: feb583e37f8a ("power: supply: add sbs-charger driver")
> Signed-off-by: Randy Dunlap
> Cc: Sebastian Reichel
> Cc: linux...@vger.kernel.org
> Cc: Martin Mokrejs
> Cc: Greg Kroah-Hartman
> Cc: nicolassae...@gmail.com
> Cc: Nicolas Saenz Julienne
> Cc: Rafael J. Wysocki
> ---
> Martin, do you want Reported-by: on this?
>
>  drivers/power/supply/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> --- linux-next-20210115.orig/drivers/power/supply/Kconfig
> +++ linux-next-20210115/drivers/power/supply/Kconfig
> @@ -229,6 +229,7 @@ config BATTERY_SBS
>  config CHARGER_SBS
>  	tristate "SBS Compliant charger"
>  	depends on I2C
> +	select REGMAP_I2C
>  	help
>  	  Say Y to include support for SBS compliant battery chargers.
>
Re: [PATCH] i2c: i801: fix memleak on probe error
Thanks for the note, I was just compiling a new 3.10.24 kernel to test it. ;-)
So far I have only booted an old 3.9 kernel. After plugging in an external USB3
drive I got the message, so I know that I can still reproduce the error and
that I have the right .config in the running kernel.

Will wait for another fix instead.
Martin

Peter Wu wrote:

Never mind this patch, it does not really fix the memleak because
i2c_set_adapdata() calls dev_set_drvdata() which allocates memory. (I must
have run kmemleak too early: right after boot it did not give any warnings,
now it does.)

RFC: what about dropping i2c_set_adapdata() from the probe function and
replacing i2c_get_adapdata(adapter) by pci_get_drvdata(adapter->pci_dev) on
top of this patch? I am not sure what the purpose of i2c_set_adapdata is,
hence this question.

Regards,
Peter

On Monday 23 December 2013 10:39:38 Peter Wu wrote:

The driver-specific data for i801 was only set for the device on success,
which led to a memory leak on error paths (for instance, when there is a
resource conflict with ACPI). (The driver core clears the driver data (if
set) if the probe routine fails.)

Fix it by setting the driver data right after successful memory allocation,
before reaching any error paths.

References: http://lkml.org/lkml/2013/1/23/191
Reported-by: Martin Mokrejs
Tested-by: Peter Wu [ACPI conflict error path]
Signed-off-by: Peter Wu
---
Hi Jean,

This memleak issue is still present in v3.13-rc4-256-gb7000ad. From kmemleak:

unreferenced object 0x88022f501a00 (size 256):
  comm "systemd-udevd", pid 209, jiffies 4294896115 (age 2872.520s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .N..
    ff ff ff ff ff ff ff ff f4 e2 53 82 ff ff ff ff  ..S.
  backtrace:
    [815d29ce] kmemleak_alloc+0x4e/0xb0
    [8116ea5a] kmem_cache_alloc_trace+0xfa/0x1e0
    [813efc63] device_private_init+0x23/0x80
    [813f2b49] dev_set_drvdata+0x39/0x50
    [a0294539] i801_probe+0x59/0x528 [i2c_i801]
    [81332d95] local_pci_probe+0x45/0xa0
    [81333be9] pci_device_probe+0xd9/0x130
    [813f30e7] driver_probe_device+0x87/0x390
    [813f34c3] __driver_attach+0x93/0xa0
    [813f102b] bus_for_each_dev+0x6b/0xb0
    [813f2b0e] driver_attach+0x1e/0x20
    [813f26e8] bus_add_driver+0x188/0x260
    [813f3b04] driver_register+0x64/0xf0
    [81332930] __pci_register_driver+0x60/0x70
    [a02990af] 0xa02990af
    [81000312] do_one_initcall+0xf2/0x1a0

The dmesg for this laptop also contains a resource conflict message, just
like the reporter's (Martin Mokrejs):

[ 15.409772] ACPI Warning: 0x1840-0x185f SystemIO conflicts with Region \_SB_.PCI0.SBUS.SMBI 1 (20131115/utaddress-251)
[ 15.413439] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

With this patch applied on top of almost 3.13-rc5 (v3.13-rc4-256-gb7000ad),
the memleak is gone.

Regards,
Peter
---
 drivers/i2c/busses/i2c-i801.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
index 737e298..a7096bf 100644
--- a/drivers/i2c/busses/i2c-i801.c
+++ b/drivers/i2c/busses/i2c-i801.c
@@ -1117,6 +1117,7 @@ static int i801_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	if (!priv)
 		return -ENOMEM;
 
+	pci_set_drvdata(dev, priv);
 	i2c_set_adapdata(&priv->adapter, priv);
 	priv->adapter.owner = THIS_MODULE;
 	priv->adapter.class = i801_get_adapter_class(priv);
@@ -1236,8 +1237,6 @@ static int i801_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	/* We ignore errors - multiplexing is optional */
 	i801_add_mux(priv);
 
-	pci_set_drvdata(dev, priv);
-
 	return 0;
 
 exit_free_irq:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [sched_delayed] sched: RT throttling activated
While you are probably thinking about the iwlwifi issue causing RT throttling,
I have one more interesting follow-up below.

Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
>>> It means you have (a) real-time task(s) that consume significant amount
>>
>> How can I find them?
>
> ps -deo pid,cls,cmd | grep -e RR -e FF
>
> Should do I suppose
>
>> I don't think I need the RT, I have two CPU-bound
>> processes and want to run them at max speed. Rest of the system is
>> unimportant.
>>
>> I still don't understand what the $subj message actually says. Does it say
>> the RT-requiring task was slowed down? I am a bit lost here.
>
> Yeah, they were forcibly stopped from running for a little while.
>
>>> of time. At some point we throttle them in an attempt to keep the system
>>> from falling over.
>>
>> Will I get companion "[sched_delayed] sched: RT throttling deactivated"
>> at some point?
>
> Nope, you get that message once to tell you that we throttle RT tasks.
>
>> Are python-based apps requiring the realtime features?
>
> I'm fairly sure python could use the relevant scheduling classes, but I
> don't speak snake so I really wouldn't know.
>
>> I used to get the messages below which are now gone with my CPU cooler being
>> replaced yesterday:
>>
>> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled
>> (total events = 153727)
>
>> mcelog report in such cases:
>>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 1 THERMAL EVENT TSC 1bf82e2a146
>> TIME 1375536062 Sat Aug 3 15:21:02 2013
>> Processor 1 heated above trip temperature. Throttling enabled.
>> Please check your system cooling. Performance will be impacted
>> STATUS 880003c3 MCGSTATUS 0
>> MCGCAP c07 APICID 2 SOCKETID 0
>> CPUID Vendor Intel Family 6 Model 42
>
> Right, those are thermal events throttling the speed of your CPU to keep
> the thing from heat damaging itself.
>
>> While my CPU cooler got replaced even now I still get (hence this email
>> thread):
>>
>> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 7ff67badff00
>> sp 7fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
>> [44520.259205] [sched_delayed] sched: RT throttling activated
>> [48956.057816] blah.py[16623]: segfault at 2f ip 7fd462e5d046 sp
>> 7fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
>> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 7fe255715f00
>> sp 7fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
>> [49942.020084] blah.py[6950]: segfault at d0 ip 7f3e8a9acf9c sp
>> 7fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
>> [66696.443342] blah.py[8015]: segfault at cf ip 7f798f708f9c sp
>> 7fff420336e0 error 4 in libpython2.7.so.1.0[7f798f66+173000]
>> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 7f7b17a85f00
>> sp 7fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
>> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 7fc54cd17f00 sp
>> 7fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
>>
>>
>> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?
>
> That of the past 1s, 0.95s were spent running RR/FIFO tasks. It is a
> warning that comes only once per boot and should prompt you to
> investigate.
>
> You can turn the throttle off, but be advised that running a RR/FIFO
> task at 100% can (and generally does) negatively affect the running of
> your system (as in, these tasks can prevent system duties from taking
> place and eventually make the system come to a halt).
>
>
> As to those faults, investigate if your python prog does something
> particularly weird or your runtime is in order. Otherwise I would advise
> you to run memtest for a while to make sure your machine is in proper
> working order.

Hmm, meanwhile the core dumps filled up the /var/dumps/ directory on my /
filesystem. I do not have timing information about how long after boot that
happened. I deleted some files on the disk and thought I was done. Now, a few
hours later, I noticed:

[85451.247130] traps: blah.py[30787] general protection ip:7faf7b57a046 sp:7fffd9f7b1d0 error:0 in libpython2.7.so.1.0[7faf7b499000+173000]
[87125.493730] nr_pdflush_threads exported in /proc is scheduled for removal
[87125.494238] sysctl: The scan_unevictable_pages sysctl/node-interface has been disabled for lack of a legitimate use case. If you have one, please send an email to linux...
Re: [sched_delayed] sched: RT throttling activated
Martin Mokrejs wrote:
>>>> Nope, you get that message once to tell you that we throttle RT tasks.
>>>
>>> I think the message could be improved to explain that this is a warn ONCE
>>> message and that there is no "[sched_delayed] sched: RT throttling
>>> deactivated" counterpart message to be anticipated.
>>
>> Would something like:
>>
>> sched: [ONCE] RT throttle hit -- inspect system configuration.
>>
>> Be a better message?
>
> Not really. I would prefer something like:
>
> [sched_delayed] sched: stopped running $cmd on CPU%d in favor of RR/FIFO task
> $psname

Actually, it would be good to retain the message text that appears in the
current kernel, so that people searching for it (e.g. with Google) can find
the newer syntax and possibly this thread. Maybe much better would be:

[sched_delayed] sched: RT throttling limit $d hit. Stopped running $cmd on CPU%d in favor of RR/FIFO task $psname. Will not issue any more of these messages until reboot.

I know, looong line.

I just realized this is about some threshold limit value, and you mean that
iwlwifi contributed the highest increase compared to the other kernel threads
on my system.

sysctl -q -a | grep -i limit

does not show what the actual value is. I am probably looking in the wrong
place. ;-)
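
For readers searching for the threshold asked about above: the "0.95s of the past 1s" figure quoted elsewhere in this thread corresponds to the two scheduler sysctls kernel.sched_rt_period_us and kernel.sched_rt_runtime_us, which do not contain the word "limit" and so are missed by the grep. The following is only a minimal sketch, assuming the usual procfs location /proc/sys/kernel; the helper names rt_budget_fraction and read_rt_limits are made up for this illustration and are not part of any kernel tool.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumed standard procfs directory holding the scheduler sysctls.
PROC_SCHED = Path("/proc/sys/kernel")


def rt_budget_fraction(period_us: int, runtime_us: int) -> Optional[float]:
    """Return the share of each period that RR/FIFO tasks may consume.

    A negative runtime (the kernel uses -1) means throttling is disabled,
    which is reported here as None.
    """
    if runtime_us < 0:
        return None
    return runtime_us / period_us


def read_rt_limits(base: Path = PROC_SCHED) -> Tuple[int, int]:
    """Read (period_us, runtime_us); raises OSError if the files are missing."""
    period = int((base / "sched_rt_period_us").read_text())
    runtime = int((base / "sched_rt_runtime_us").read_text())
    return period, runtime
```

With the usual defaults of 1000000 and 950000 microseconds, rt_budget_fraction gives 0.95, matching the figure quoted in the thread. The same two values can also be printed directly with "sysctl kernel.sched_rt_period_us kernel.sched_rt_runtime_us".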
Re: [sched_delayed] sched: RT throttling activated
Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 01:35:24PM +0200, Martin Mokrejs wrote:
>
>> # ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['
>
> This explicitly only lists kernel threads; from your other comment:
>
>> The shell/python tasks have 'TS' in place of the FF value in the second
>> column
>> so I guess they are not requiring realtime responsiveness.
>
> I'll assume you actually inspected the other tasks and found none.

Yes, the other (false) matches were in the third or later columns, so I
wanted to match just the true matches and cut the rest. I admit, this is not
a general-purpose regexp and it is misleading.

>
>>     7 FF  [migration/0]
>>    10 FF  [watchdog/0]
>>    11 FF  [watchdog/1]
>>    12 FF  [migration/1]
>>    17 FF  [migration/2]
>>    22 FF  [migration/3]
>
> The 'migration' threads only look like FIFO threads but they're secretly
> not and don't count to the limit. The watchdog threads shouldn't run
> much either.
>
>>  2161 FF  [irq/50-iwlwifi]
>
> Oh a threaded interrupt, I presume you're not using "threadirqs" since

Is that what you are talking about?

CONFIG_IRQ_FORCED_THREADING=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y

> this is the only interrupt thread around and I see a
> 'request_threaded_irq()' call in
> drivers/net/wireless/iwlwifi/pcie/trans.c
>
> And wow, why would that thing consume that much cpu.
>
> Johill, ever seen the iwlwifi interrupt go 'funny' and consume gobs of
> cpu-time?

I am not sure if I understand you, but in case it helps somebody, here are the
current values:

# cat /proc/interrupts
           CPU0       CPU1
  0:         23          0   IO-APIC-edge      timer
  1:         42          0   IO-APIC-edge      i8042
  8:         36          0   IO-APIC-edge      rtc0
  9:          3          0   IO-APIC-fasteoi   acpi
 12:     404650          0   IO-APIC-edge      i8042
 16:        109          0   IO-APIC-fasteoi   ehci_hcd:usb1
 23:     583646          0   IO-APIC-fasteoi   ehci_hcd:usb2
 40:          0          0   PCI-MSI-edge      pciehp
 41:      54319          0   PCI-MSI-edge      i915
 42:     553802          0   PCI-MSI-edge      ahci
 43:          0          0   PCI-MSI-edge      enp5s0
 44:     257268          0   PCI-MSI-edge      xhci_hcd
 45:          0          0   PCI-MSI-edge      xhci_hcd
 46:          0          0   PCI-MSI-edge      xhci_hcd
 47:          0          0   PCI-MSI-edge      xhci_hcd
 48:          0          0   PCI-MSI-edge      xhci_hcd
 49:     465462          0   PCI-MSI-edge      snd_hda_intel
 50:    3895788          0   PCI-MSI-edge      iwlwifi
NMI:       8687       9483   Non-maskable interrupts
LOC:   17531664   16978131   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:       8687       9483   Performance monitoring interrupts
IWI:     213009     205171   IRQ work interrupts
RTR:          3          0   APIC ICR read retries
RES:19226514491695           Rescheduling interrupts
CAL:      73741     348678   Function call interrupts
TLB:      98634         73   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:286286                   Machine check polls
ERR:          0
MIS:          0

# ifconfig wlp9s0
wlp9s0: flags=4163  mtu 1500
        inet 192.168.0.24  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::4e80:93ff:fe15:e6c7  prefixlen 64  scopeid 0x20
        ether 4c:80:93:15:e6:c7  txqueuelen 1000  (Ethernet)
        RX packets 811806  bytes 992611146 (946.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 490006  bytes 71390887 (68.0 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

# dmesg
...
[ 11.789302] Intel(R) Wireless WiFi driver for Linux, in-tree:d
[ 11.789310] Copyright(c) 2003-2013 Intel Corporation
[ 11.791626] iwlwifi :09:00.0: irq 50 for MSI/MSI-X
[ 12.044905] iwlwifi :09:00.0: loaded firmware version 18.168.6.1 op_mode iwldvm
[ 13.896033] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEBUG enabled
[ 13.896041] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEBUGFS disabled
[ 13.896044] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEVICE_TRACING disabled
[ 13.896047] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEVICE_TESTMODE disabled
[ 13.896049] iwlwifi :09:00.0: CONFIG_IWLWIFI_P2P disabled
[ 13.896054] iwlwifi :09:00.0: Detected Intel(R) Centrino(R) Wireless-N 1030 BGN, REV=0xB0
[ 13.896173] iwlwifi :09:00.0: L1 Disabled; Enabling L0S
[ 13.917705] ieee80211 phy0: Selected rate control algorithm 'iwl-agn-rs'

>
>
>>> Nope, you get that message once to tell you that we throttle RT tasks.
>>
>> I think the message could improved to explain this is a warn ONCE message and
>>
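
The grep patterns discussed in this message match "RR" or "FF" anywhere on a line, which is why they either produce false matches in the command column or end up restricted to kernel threads. A less fragile approach is to compare only the CLS column. The sketch below is an illustration under assumptions, not something posted in the thread: the function name realtime_tasks is invented, the sample text imitates the "ps -deo pid,cls,cmd" output format, and PID 3000 is a made-up row added to show a false match being skipped.

```python
from typing import List, Tuple

# ps reports SCHED_FIFO as "FF" and SCHED_RR as "RR" in the CLS column.
RT_CLASSES = {"FF", "RR"}


def realtime_tasks(ps_output: str) -> List[Tuple[int, str, str]]:
    """Return (pid, cls, cmd) for rows whose CLS column is FF or RR."""
    tasks = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # pid, cls, remainder of the command line
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header line and malformed rows
        pid, cls, cmd = fields
        if cls in RT_CLASSES:
            tasks.append((int(pid), cls, cmd.strip()))
    return tasks


# Sample input: three rows taken from the thread plus one hypothetical
# time-sharing (TS) row whose command line happens to contain "FF".
SAMPLE = """\
  PID CLS CMD
    7 FF  [migration/0]
   10 FF  [watchdog/0]
 2161 FF  [irq/50-iwlwifi]
 3000 TS  grep -e RR -e FF
"""
```

Because only the second column is compared, userspace real-time tasks (whose command is not in square brackets) would be reported as well, while the TS row is ignored even though its command line contains "FF".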
Re: [sched_delayed] sched: RT throttling activated
Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
>>> It means you have (a) real-time task(s) that consume significant amount
>>
>> How can I find them?
>
> ps -deo pid,cls,cmd | grep -e RR -e FF

# ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['
    7 FF  [migration/0]
   10 FF  [watchdog/0]
   11 FF  [watchdog/1]
   12 FF  [migration/1]
   17 FF  [migration/2]
   22 FF  [migration/3]
 2161 FF  [irq/50-iwlwifi]
#

The shell/python tasks have 'TS' in place of the FF value in the second
column, so I guess they are not requiring realtime responsiveness.

>
> Should do I suppose
>
>> I don't think I need the RT, I have two CPU-bound
>> processes and want to run them at max speed. Rest of the system is
>> unimportant.
>>
>> I still don't understand what the $subj message actually says. Does it say
>> the RT-requiring task was slowed down? I am a bit lost here.
>
> Yeah, they were forcibly stopped from running for a little while.
>
>>> of time. At some point we throttle them in an attempt to keep the system
>>> from falling over.
>>
>> Will I get companion "[sched_delayed] sched: RT throttling deactivated"
>> at some point?
>
> Nope, you get that message once to tell you that we throttle RT tasks.

I think the message could be improved to explain that this is a warn ONCE
message and that there is no "[sched_delayed] sched: RT throttling
deactivated" counterpart message to be anticipated.

>
>> Are python-based apps requiring the realtime features?
>
> I'm fairly sure python could use the relevant scheduling classes, but I
> don't speak snake so I really wouldn't know.
>
>> I used to get the messages below which are now gone with my CPU cooler being
>> replaced yesterday:
>>
>> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled
>> (total events = 153727)
>
>> mcelog report in such cases:
>>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 1 THERMAL EVENT TSC 1bf82e2a146
>> TIME 1375536062 Sat Aug 3 15:21:02 2013
>> Processor 1 heated above trip temperature. Throttling enabled.
>> Please check your system cooling. Performance will be impacted
>> STATUS 880003c3 MCGSTATUS 0
>> MCGCAP c07 APICID 2 SOCKETID 0
>> CPUID Vendor Intel Family 6 Model 42
>
> Right, those are thermal events throttling the speed of your CPU to keep
> the thing from heat damaging itself.
>
>> While my CPU cooler got replaced even now I still get (hence this email
>> thread):
>>
>> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 7ff67badff00
>> sp 7fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
>> [44520.259205] [sched_delayed] sched: RT throttling activated
>> [48956.057816] blah.py[16623]: segfault at 2f ip 7fd462e5d046 sp
>> 7fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
>> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 7fe255715f00
>> sp 7fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
>> [49942.020084] blah.py[6950]: segfault at d0 ip 7f3e8a9acf9c sp
>> 7fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
>> [66696.443342] blah.py[8015]: segfault at cf ip 7f798f708f9c sp
>> 7fff420336e0 error 4 in libpython2.7.so.1.0[7f798f66+173000]
>> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 7f7b17a85f00
>> sp 7fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
>> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 7fc54cd17f00 sp
>> 7fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
>>
>>
>> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?
>
> That of the past 1s, 0.95s were spent running RR/FIFO tasks. It is a
> warning that comes only once per boot and should prompt you to
> investigate.

Could the kernel itself log some kind of equivalent of the
"ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['" command?

>
> You can turn the throttle off, but be advised that running a RR/FIFO
> task at 100% can (and generally does) negatively affect the running of
> your system (as in, these tasks can prevent system duties from taking
> place and eventually make the system come to a halt).

Provided I have in my .config:

# grep EMPT .config.current
# CONFIG_PREEMPT_RCU is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

does that mean that I can't do much about those kernel tasks reported by the
ps command above? Or could kernel be tuned to be even less dema
Re: [sched_delayed] sched: RT throttling activated
Hi Peter,

Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 10:53:02AM +0200, Martin Mokrejs wrote:
>> Hi,
>> I tried to figure out what this message really means. I came to
>> https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions
>> but I am still lost. I lack in the FAQ some user-related information.
>> The first paragraph is still unclear to me. I have a i7-2640M based
>> laptop, hyperthreading is enabled by BIOS but I shut down the two
>> emulated cores by (no BIOS option to disable HT):
>>
>> Would you please clarify what the "[sched_delayed] sched: RT throttling
>> activated" really means?
>
> It means you have (a) real-time task(s) that consume significant amount

How can I find them? I don't think I need the RT, I have two CPU-bound
processes and want to run them at max speed. Rest of the system is
unimportant.

I still don't understand what the $subj message actually says. Does it say
the RT-requiring task was slowed down? I am a bit lost here.

> of time. At some point we throttle them in an attempt to keep the system
> from falling over.

Will I get companion "[sched_delayed] sched: RT throttling deactivated"
at some point?

>
>> Is that because there is some RT-requiring application on my system?
>
> Yep.

Which? How can I find them and turn that requirement off (if I understand
right, they interrupt my long-running computing processes)?

>
>> I don't know of any (or don't care about real-time responsiveness except
>> that ALSA drivers require me to have CONFIG_SND_HRTIMER=y). Per Google
>> answers could the culprit be nfsd? Then I will recompile it as a module.
>
> Unlikely, I don't think I've ever seen anybody run their nfsd with RT

Maybe false info in that thread, I don't know:
http://forums.opensuse.org/english/get-technical-help-here/applications/482756-kernel-panic-rt-throttling-activated.html

> priority. Also, you can run RT tasks regardless of the config options.
> SCHED_RR and SCHED_FIFO are POSIX specified and always available.

Are python-based apps requiring the realtime features?

I used to get the messages below, which are now gone since my CPU cooler was
replaced yesterday:

[ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled (total events = 153727)
[ 4172.717277] CPU1: Package temperature above threshold, cpu clock throttled (total events = 158008)
[ 4172.717348] CPU0: Package temperature above threshold, cpu clock throttled (total events = 158008)
[ 4172.718291] CPU1: Core temperature/speed normal
[ 4172.718293] CPU1: Package temperature/speed normal
[ 4172.718347] CPU0: Package temperature/speed normal
[ 4205.336883] mce: [Hardware Error]: Machine check events logged
...
[ 8966.052786] CPU1: Core temperature/speed normal
[ 8966.052788] CPU0: Package temperature/speed normal
[ 8966.052791] CPU1: Package temperature/speed normal
[ 9266.421068] CPU1: Core temperature above threshold, cpu clock throttled (total events = 530778)
[ 9266.421070] CPU0: Package temperature above threshold, cpu clock throttled (total events = 547228)
[ 9266.421075] CPU1: Package temperature above threshold, cpu clock throttled (total events = 547228)
[ 9266.422076] CPU1: Core temperature/speed normal
[ 9266.422078] CPU0: Package temperature/speed normal
[ 9266.422081] CPU1: Package temperature/speed normal
[ 9445.150679] [sched_delayed] sched: RT throttling activated
[ 9566.792369] CPU1: Core temperature above threshold, cpu clock throttled (total events = 559429)
[ 9566.792372] CPU0: Package temperature above threshold, cpu clock throttled (total events = 576882)
[ 9566.792378] CPU1: Package temperature above threshold, cpu clock throttled (total events = 576882)
[ 9566.793377] CPU1: Core temperature/speed normal
[ 9566.793380] CPU0: Package temperature/speed normal
[ 9566.793382] CPU1: Package temperature/speed normal
[ 9872.630811] CPU1: Core temperature above threshold, cpu clock throttled (total events = 583223)
[ 9872.630813] CPU0: Package temperature above threshold, cpu clock throttled (total events = 601532)
[ 9872.630817] CPU1: Package temperature above threshold, cpu clock throttled (total events = 601532)
[ 9872.631818] CPU1: Core temperature/speed normal
[ 9872.631820] CPU0: Package temperature/speed normal
[ 9872.631823] CPU1: Package temperature/speed normal

mcelog report in such cases:

Hardware event. This is not a software error.
MCE 0
CPU 1 THERMAL EVENT TSC 1bf82e2a146
TIME 1375536062 Sat Aug 3 15:21:02 2013
Processor 1 heated above trip temperature. Throttling enabled.
Please check your system cooling. Performance will be impacted
STATUS 880003c3 MCGSTATUS 0
MCGCAP c07 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 42

Although my CPU cooler got replaced, I still get the following even now
(hence this email thread):

[39564.452795] blah.py[14396]: segfault
[sched_delayed] sched: RT throttling activated
Hi,
  I tried to figure out what this message really means. I came to
https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions but I am still lost.
The FAQ lacks some user-oriented information, and its first paragraph is still
unclear to me.

I have an i7-2640M based laptop. Hyperthreading is enabled by the BIOS (there is
no BIOS option to disable HT), so I shut down the two emulated cores with:

  echo 0 > /sys/devices/system/cpu/cpu2/online
  echo 0 > /sys/devices/system/cpu/cpu3/online

At least I hope I shut down the emulated ones. i7z claims I did the right thing and
the IntelPerformanceCounterMonitorV2.5.1/pcm.x application says the same:

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
 L3MISS: L3 cache misses
 L2MISS: L2 cache misses (including other core's L2 cache *hits*)
 L3HIT : L3 cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
 L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
 READ  : bytes read from memory controller (in GBytes)
 WRITE : bytes written to memory controller (in GBytes)
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature

 Core (SKT) | EXEC | IPC  | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE | TEMP

   0    0     1.78   1.51   1.18   1.18    1595 K   3363 K   0.53    0.00    0.09    0.02    N/A    N/A     23
   1    0     1.21   1.03   1.18   1.18    9359 K     13 M   0.31    0.00    0.51    0.04    N/A    N/A     24
 ---
 SKT0         1.50   1.27   1.18   1.18      10 M     16 M   0.35    0.00    0.30    0.03    1.32   0.37    24
 ---
 TOTAL  *     1.50   1.27   1.18   1.18      10 M     16 M   0.35    0.00    0.30    0.03    1.32   0.37    N/A

 Instructions retired: 8368 M ; Active cycles: 6594 M ; Time (TSC): 2797 Mticks ; C0 (active,non-halted) core residency: 100.00 %

 C1 core residency: 0.00 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 0.00 %
 C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %

 PHYSICAL CORE IPC                 : 1.27 => corresponds to 31.73 % utilization for cores in active state
 Instructions per nominal CPU cycle: 1.50 => corresponds to 37.40 % core utilization over time interval
 --
 -- SKT0 package consumed 28.18 Joules
 --
 TOTAL: 28.18 Joules

Why do I get the message at all? I have in the 3.10.9 kernel:

...
CONFIG_IOSCHED_DEADLINE=y
CONFIG_DEFAULT_IOSCHED="deadline"
...
CONFIG_NR_CPUS=4
...
# CONFIG_PREEMPT_RCU is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
...
# CONFIG_SCHED_MC is not set
CONFIG_SCHED_HRTICK=y

I fear this is about the CPU being overloaded (both cores loaded by user processes),
but why do I get the message at all?

Cpu speed from cpuinfo 2796.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 2796 MHz
  CPU Multiplier 28x || Bus clock frequency (BCLK) 99.86 MHz

Socket [0] - [physical cores=2, logical cores=2, max online cores ever=2]
  TURBO ENABLED on 2 Cores, Hyper Threading OFF
  Max Frequency without considering Turbo 2895.86 MHz (99.86 x [29])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is  35x/33x/33x/33x
  Real Current Frequency 3295.29 MHz [99.86 x 33.00] (Max of below)
        Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %   C7 %  Temp
        Core 1 [0]:       3295.28 (33.00x)         100       0       0      0      0    76
        Core 2 [1]:       3295.29 (33.00x)         100       0       0      0      0    76

Would you please clarify what the "[sched_delayed] sched: RT throttling activated"
really means? Is that because there is some RT-requiring application on my system?
I don't know of any (or don't care about
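For readers who want to verify which logical CPUs are hyperthread siblings before offlining them (the message above only hopes that cpu2 and cpu3 are the right ones), here is a minimal sketch. It assumes the standard sysfs topology file /sys/devices/system/cpu/cpuN/topology/thread_siblings_list is present; the helper names are hypothetical and not part of any tool mentioned in this thread.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0,2" or "0-1" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def secondary_siblings(sibling_lists):
    """Given one sibling list per CPU, return every CPU that is not the
    lowest-numbered member of its sibling group."""
    extra = set()
    for text in sibling_lists:
        group = parse_cpu_list(text)
        extra.update(group[1:])
    return sorted(extra)


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every CPU that currently exposes one."""
    paths = sorted(Path(root).glob("cpu[0-9]*/topology/thread_siblings_list"))
    return [p.read_text() for p in paths]

# Usage (on a live system): secondary_siblings(read_sibling_lists())
```

The result is only a list of candidates to offline; which numbering a given machine uses for siblings varies, which is the reason to check rather than assume.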
Re: [sched_delayed] sched: RT throttling activated
Hi Peter,

Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 10:53:02AM +0200, Martin Mokrejs wrote:
>> Hi,
>>   I tried to figure out what this message really means. I came to
>> https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions but I am still lost.
>> I lack in the FAQ some user-related information. The first paragraph is still
>> unclear to me.
>>
>> I have a i7-2640M based laptop, hyperthreading is enabled by BIOS but I shut
>> down the two emulated cores by (no BIOS option to disable HT):
>
>> Would you please clarify what the "[sched_delayed] sched: RT throttling activated"
>> really means?
>
> It means you have (a) real-time task(s) that consume significant amount

How can I find them?

I don't think I need the RT, I have two CPU-bound processes and want to run them
at max speed. Rest of the system is unimportant. I still don't understand what
the $subj message actually says. Does it say the RT-requiring task was slowed down?
I am a bit lost here.

> of time. At some point we throttle them in an attempt to keep the system
> from falling over.

Will I get a companion "[sched_delayed] sched: RT throttling deactivated" at some point?

>> Is that because there is some RT-requiring application on my system?
>
> Yep.

Which? How can I find them and turn that requirement off (if I understand right
they interrupt my long-living computing processes)?

>> I don't know of any (or don't care about real-time responsiveness except
>> that ALSA drivers require me to have CONFIG_SND_HRTIMER=y). Per Google
>> answers could the culprit be nfsd? Then I will recompile it as a module.
>
> Unlikely, I don't think I've ever seen anybody run their nfsd with RT

Maybe false info in that thread, I don't know:
http://forums.opensuse.org/english/get-technical-help-here/applications/482756-kernel-panic-rt-throttling-activated.html

> priority.
>
> Also, you can run RT tasks regardless of the config options. SCHED_RR
> and SCHED_FIFO are POSIX specified and always available.

Are python-based apps requiring the realtime features?
I used to get the messages below which are now gone with my CPU cooler being replaced yesterday:

[ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled (total events = 153727)
[ 4172.717277] CPU1: Package temperature above threshold, cpu clock throttled (total events = 158008)
[ 4172.717348] CPU0: Package temperature above threshold, cpu clock throttled (total events = 158008)
[ 4172.718291] CPU1: Core temperature/speed normal
[ 4172.718293] CPU1: Package temperature/speed normal
[ 4172.718347] CPU0: Package temperature/speed normal
[ 4205.336883] mce: [Hardware Error]: Machine check events logged
...
[ 8966.052786] CPU1: Core temperature/speed normal
[ 8966.052788] CPU0: Package temperature/speed normal
[ 8966.052791] CPU1: Package temperature/speed normal
[ 9266.421068] CPU1: Core temperature above threshold, cpu clock throttled (total events = 530778)
[ 9266.421070] CPU0: Package temperature above threshold, cpu clock throttled (total events = 547228)
[ 9266.421075] CPU1: Package temperature above threshold, cpu clock throttled (total events = 547228)
[ 9266.422076] CPU1: Core temperature/speed normal
[ 9266.422078] CPU0: Package temperature/speed normal
[ 9266.422081] CPU1: Package temperature/speed normal
[ 9445.150679] [sched_delayed] sched: RT throttling activated
[ 9566.792369] CPU1: Core temperature above threshold, cpu clock throttled (total events = 559429)
[ 9566.792372] CPU0: Package temperature above threshold, cpu clock throttled (total events = 576882)
[ 9566.792378] CPU1: Package temperature above threshold, cpu clock throttled (total events = 576882)
[ 9566.793377] CPU1: Core temperature/speed normal
[ 9566.793380] CPU0: Package temperature/speed normal
[ 9566.793382] CPU1: Package temperature/speed normal
[ 9872.630811] CPU1: Core temperature above threshold, cpu clock throttled (total events = 583223)
[ 9872.630813] CPU0: Package temperature above threshold, cpu clock throttled (total events = 601532)
[ 9872.630817] CPU1: Package temperature above threshold, cpu clock throttled (total events = 601532)
[ 9872.631818] CPU1: Core temperature/speed normal
[ 9872.631820] CPU0: Package temperature/speed normal
[ 9872.631823] CPU1: Package temperature/speed normal

mcelog report in such cases:

Hardware event. This is not a software error.
MCE 0
CPU 1 THERMAL EVENT TSC 1bf82e2a146
TIME 1375536062 Sat Aug  3 15:21:02 2013
Processor 1 heated above trip temperature. Throttling enabled.
Please check your system cooling. Performance will be impacted
STATUS 880003c3 MCGSTATUS 0
MCGCAP c07 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 42

While my CPU cooler got replaced even now I still get (hence this email thread):

[39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 7ff67badff00 sp 7fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
[44520.259205] [sched_delayed] sched: RT throttling activated
[48956.057816] blah.py[16623]: segfault
Re: [sched_delayed] sched: RT throttling activated
Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
>>> It means you have (a) real-time task(s) that consume significant amount
>>
>> How can I find them?
>
>   ps -deo pid,cls,cmd | grep -e RR -e FF

# ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['
    7  FF [migration/0]
   10  FF [watchdog/0]
   11  FF [watchdog/1]
   12  FF [migration/1]
   17  FF [migration/2]
   22  FF [migration/3]
 2161  FF [irq/50-iwlwifi]
#

The shell/python tasks have 'TS' in place of the FF value in the second column
so I guess they are not requiring realtime responsiveness.

> Should do I suppose
>
>> I don't think I need the RT, I have two CPU-bound processes and want to run them
>> at max speed. Rest of the system is unimportant. I still don't understand what
>> the $subj message actually says. Does it say the RT-requiring task was slowed down?
>> I am a bit lost here.
>
> Yeah, they were forcibly stopped from running for a little while.
>
>>> of time. At some point we throttle them in an attempt to keep the system
>>> from falling over.
>>
>> Will I get companion "[sched_delayed] sched: RT throttling deactivated" at some point?
>
> Nope, you get that message once to tell you that we throttle RT tasks.

I think the message could be improved to explain that this is a warn ONCE message
and that there is no "[sched_delayed] sched: RT throttling deactivated" counterpart
message to be anticipated.

>> Are python-based apps requiring the realtime features?
>
> I'm fairly sure python could use the relevant scheduling classes, but I
> don't speak snake so I really wouldn't know.
>
>> I used to get the messages below which are now gone with my CPU cooler being replaced yesterday:
>>
>> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled (total events = 153727)
>
>> mcelog report in such cases:
>>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 1 THERMAL EVENT TSC 1bf82e2a146
>> TIME 1375536062 Sat Aug  3 15:21:02 2013
>> Processor 1 heated above trip temperature. Throttling enabled.
>> Please check your system cooling. Performance will be impacted
>> STATUS 880003c3 MCGSTATUS 0
>> MCGCAP c07 APICID 2 SOCKETID 0
>> CPUID Vendor Intel Family 6 Model 42
>
> Right, those are thermal events throttling the speed of your CPU to keep
> the thing from heat damaging itself.
>
>> While my CPU cooler got replaced even now I still get (hence this email thread):
>>
>> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 7ff67badff00 sp 7fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
>> [44520.259205] [sched_delayed] sched: RT throttling activated
>> [48956.057816] blah.py[16623]: segfault at 2f ip 7fd462e5d046 sp 7fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
>> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 7fe255715f00 sp 7fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
>> [49942.020084] blah.py[6950]: segfault at d0 ip 7f3e8a9acf9c sp 7fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
>> [66696.443342] blah.py[8015]: segfault at cf ip 7f798f708f9c sp 7fff420336e0 error 4 in libpython2.7.so.1.0[7f798f66+173000]
>> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 7f7b17a85f00 sp 7fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
>> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 7fc54cd17f00 sp 7fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
>>
>> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?
>
> That of the past 1s, 0.95s were spent running RR/FIFO tasks. It is a
> warning that comes only once per boot and should prompt you to
> investigate.

Could the kernel itself log some kind of equivalent of the
ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \[' command?

> You can turn the throttle off, but be advised that running a RR/FIFO
> task at 100% can (and generally does) negatively affect the running of
> your system (as in, these tasks can prevent system duties from taking
> place and eventually make the system come to a halt).

Provided I have in my .config:

# grep EMPT .config.current
# CONFIG_PREEMPT_RCU is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

does that mean that I can't do much about those kernel tasks reported by the
ps command above? Or could the kernel be tuned to be even less demanding and not
interrupt my tasks that often (no idea how often that happens if the message is
logged only once, nor how much harm it causes).

> As to those faults, investigate if your python prog does something
> particularly weird or your runtime is in order. Otherwise I would advise
> you to run memtest for a while to make sure your machine is in proper
> working order.

I will re-check the stacktraces but last time I did I did not come to a single
place where it crashes. OK, will re-test the memory again but I think it is fine.
It seemed those results of the overheated CPU and thermal
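The ps pipeline discussed in this message matches "RR" or "FF" anywhere on the line unless the pattern is narrowed, which is what led to the kernel-thread-only regexp. A minimal sketch of a stricter filter follows: it compares only the CLS column. The function names are hypothetical, and the sketch assumes a procps-style ps that accepts the pid, cls and cmd output keywords.

```python
import subprocess

# Scheduling classes as printed in the CLS column of ps:
# FF is SCHED_FIFO, RR is SCHED_RR.
RT_CLASSES = {"FF", "RR"}


def rt_tasks(ps_output):
    """Return (pid, cls, cmd) for every line whose CLS column is FF or RR."""
    tasks = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # header line or malformed row
        pid, cls, cmd = fields
        if cls in RT_CLASSES:
            tasks.append((int(pid), cls, cmd.strip()))
    return tasks


def list_rt_tasks():
    """Run ps and filter its output; covers user tasks and kernel threads."""
    result = subprocess.run(
        ["ps", "-eo", "pid,cls,cmd"],
        capture_output=True, text=True, check=True,
    )
    return rt_tasks(result.stdout)
```

Because the comparison is made on the second column only, a command line that merely contains the letters FF or RR is not reported.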
Re: [sched_delayed] sched: RT throttling activated
Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 01:35:24PM +0200, Martin Mokrejs wrote:
>> # ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['
>
> This explicitly only lists kernel threads; from your other comment:
>
>> The shell/python tasks have 'TS' in place of the FF value in the second column
>> so I guess they are not requiring realtime responsiveness.
>
> I'll assume you actually inspected the other tasks and found none.

Yes, the other (false) matches were in the third or later columns so I wanted to
match just the true matches and cut&paste them. I admit, this is not a
general-purpose regexp and is misleading.

>>     7  FF [migration/0]
>>    10  FF [watchdog/0]
>>    11  FF [watchdog/1]
>>    12  FF [migration/1]
>>    17  FF [migration/2]
>>    22  FF [migration/3]
>
> The 'migration' threads only look like FIFO threads but they're secretly
> not and don't count to the limit. The watchdog threads shouldn't run
> much either.
>
>>  2161  FF [irq/50-iwlwifi]
>
> Oh a threaded interrupt, I presume you're not using threadirqs since

Is that what you talk about?

CONFIG_IRQ_FORCED_THREADING=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y

> this is the only interrupt thread around and I see a
> 'request_threaded_irq()' call in
> drivers/net/wireless/iwlwifi/pcie/trans.c
>
> And wow, why would that thing consume that much cpu.
>
> Johill, ever seen the iwlwifi interrupt go 'funny' and consume gobs of
> cpu-time?

I am not sure if I understand you, but in case it helps somebody, the current values are:

# cat /proc/interrupts
           CPU0       CPU1
  0:         23          0   IO-APIC-edge      timer
  1:         42          0   IO-APIC-edge      i8042
  8:         36          0   IO-APIC-edge      rtc0
  9:          3          0   IO-APIC-fasteoi   acpi
 12:     404650          0   IO-APIC-edge      i8042
 16:        109          0   IO-APIC-fasteoi   ehci_hcd:usb1
 23:     583646          0   IO-APIC-fasteoi   ehci_hcd:usb2
 40:          0          0   PCI-MSI-edge      pciehp
 41:      54319          0   PCI-MSI-edge      i915
 42:     553802          0   PCI-MSI-edge      ahci
 43:          0          0   PCI-MSI-edge      enp5s0
 44:     257268          0   PCI-MSI-edge      xhci_hcd
 45:          0          0   PCI-MSI-edge      xhci_hcd
 46:          0          0   PCI-MSI-edge      xhci_hcd
 47:          0          0   PCI-MSI-edge      xhci_hcd
 48:          0          0   PCI-MSI-edge      xhci_hcd
 49:     465462          0   PCI-MSI-edge      snd_hda_intel
 50:    3895788          0   PCI-MSI-edge      iwlwifi
NMI:       8687       9483   Non-maskable interrupts
LOC:   17531664   16978131   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:       8687       9483   Performance monitoring interrupts
IWI:     213009     205171   IRQ work interrupts
RTR:          3          0   APIC ICR read retries
RES:19226514491695   Rescheduling interrupts
CAL:      73741     348678   Function call interrupts
TLB:      98634         73   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:286286   Machine check polls
ERR:          0
MIS:          0

# ifconfig wlp9s0
wlp9s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.24  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::4e80:93ff:fe15:e6c7  prefixlen 64  scopeid 0x20<link>
        ether 4c:80:93:15:e6:c7  txqueuelen 1000  (Ethernet)
        RX packets 811806  bytes 992611146 (946.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 490006  bytes 71390887 (68.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# dmesg
...
[ 11.789302] Intel(R) Wireless WiFi driver for Linux, in-tree:d
[ 11.789310] Copyright(c) 2003-2013 Intel Corporation
[ 11.791626] iwlwifi :09:00.0: irq 50 for MSI/MSI-X
[ 12.044905] iwlwifi :09:00.0: loaded firmware version 18.168.6.1 op_mode iwldvm
[ 13.896033] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEBUG enabled
[ 13.896041] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEBUGFS disabled
[ 13.896044] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEVICE_TRACING disabled
[ 13.896047] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEVICE_TESTMODE disabled
[ 13.896049] iwlwifi :09:00.0: CONFIG_IWLWIFI_P2P disabled
[ 13.896054] iwlwifi :09:00.0: Detected Intel(R) Centrino(R) Wireless-N 1030 BGN, REV=0xB0
[ 13.896173] iwlwifi :09:00.0: L1 Disabled; Enabling L0S
[ 13.917705] ieee80211 phy0: Selected rate control algorithm 'iwl-agn-rs'

>>> Nope, you get that message once to tell you that we throttle RT tasks.
>>
>> I think the message could improved to explain this is a warn ONCE message
>> and that there is no "[sched_delayed] sched: RT throttling deactivated"
>> counterpart message to be anticipated.
>
> Would something like:
>
>   sched: [ONCE] RT throttle hit -- inspect system configuration
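To compare interrupt sources such as the ones listed in this message, the per-CPU counters in /proc/interrupts can be summed per line. The sketch below is only an illustration with hypothetical function names. A high interrupt count does not by itself show how much CPU time the corresponding IRQ thread consumed, so treat the ranking as a hint about where to look.

```python
def irq_totals(text):
    """Map each /proc/interrupts label to (sum of per-CPU counts, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        if not counts:
            continue
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def busiest(text, n=3):
    """Return the n lines with the highest total count as (total, label, description)."""
    rows = [(total, label, desc) for label, (total, desc) in irq_totals(text).items()]
    return sorted(rows, reverse=True)[:n]

# Usage (on a live system):
#   with open("/proc/interrupts") as fh:
#       print(busiest(fh.read()))
```

Lines whose columns have been fused together by copy and paste will not parse into separate per-CPU values, so the function should be fed the file itself rather than text copied from an email.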
Re: [sched_delayed] sched: RT throttling activated
Martin Mokrejs wrote:
>>>> Nope, you get that message once to tell you that we throttle RT tasks.
>>>
>>> I think the message could improved to explain this is a warn ONCE message
>>> and that there is no "[sched_delayed] sched: RT throttling deactivated"
>>> counterpart message to be anticipated.
>>
>> Would something like:
>>
>>   sched: [ONCE] RT throttle hit -- inspect system configuration.
>>
>> Be a better message?
>
> Not really. I would prefer something like:
>
> [sched_delayed] sched: stopped running $cmd on CPU%d in favor of RR/FIFO task $psname

Actually, to retain the message text appearing in the current kernel, so that
people searching e.g. Google can find the newer syntax and possibly this thread,
maybe much better would be:

[sched_delayed] sched: RT throttling limit $d hit. Stopped running $cmd on CPU%d in favor of RR/FIFO task $psname. Will not issue any more these messages until reboot.

I know, looong line.

I just realized this is about some threshold limit value, and you mean that
iwlwifi contributed the highest increase compared to the other kernel threads
on my system. "sysctl -q -a | grep -i limit" does not show what the actual
value is. I am probably looking in the wrong place. ;-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
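On the question of where the threshold lives: the two knobs are named sched_rt_runtime_us and sched_rt_period_us (under /proc/sys/kernel), so a grep for "limit" does not find them. With the usual defaults of 950000 and 1000000 microseconds they give the 0.95s-per-1s budget mentioned earlier in the thread, and a runtime of -1 disables the throttle. The sketch below reads and interprets them; the function names are hypothetical, and the exact defaults on a given kernel are an assumption to verify locally.

```python
def rt_budget(runtime_us, period_us):
    """Fraction of each period that RT tasks may consume,
    or None when throttling is disabled (runtime of -1)."""
    if runtime_us < 0:
        return None
    return runtime_us / period_us


def read_sysctl(name, root="/proc/sys/kernel"):
    """Read an integer scheduler sysctl from procfs."""
    with open(f"{root}/{name}") as fh:
        return int(fh.read().strip())


def current_rt_budget():
    """Compute the RT budget of the running system."""
    return rt_budget(
        read_sysctl("sched_rt_runtime_us"),
        read_sysctl("sched_rt_period_us"),
    )
```

The same values can be listed from a shell with "sysctl -a | grep sched_rt".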
Re: [sched_delayed] sched: RT throttling activated
While you are probably thinking about the iwlwifi issue causing RT throttling
I have one more interesting followup below.

Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
>>> It means you have (a) real-time task(s) that consume significant amount
>>
>> How can I find them?
>
>   ps -deo pid,cls,cmd | grep -e RR -e FF
>
> Should do I suppose
>
>> I don't think I need the RT, I have two CPU-bound processes and want to run them
>> at max speed. Rest of the system is unimportant. I still don't understand what
>> the $subj message actually says. Does it say the RT-requiring task was slowed down?
>> I am a bit lost here.
>
> Yeah, they were forcibly stopped from running for a little while.
>
>>> of time. At some point we throttle them in an attempt to keep the system
>>> from falling over.
>>
>> Will I get companion "[sched_delayed] sched: RT throttling deactivated" at some point?
>
> Nope, you get that message once to tell you that we throttle RT tasks.
>
>> Are python-based apps requiring the realtime features?
>
> I'm fairly sure python could use the relevant scheduling classes, but I
> don't speak snake so I really wouldn't know.
>
>> I used to get the messages below which are now gone with my CPU cooler being replaced yesterday:
>>
>> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled (total events = 153727)
>
>> mcelog report in such cases:
>>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 1 THERMAL EVENT TSC 1bf82e2a146
>> TIME 1375536062 Sat Aug  3 15:21:02 2013
>> Processor 1 heated above trip temperature. Throttling enabled.
>> Please check your system cooling. Performance will be impacted
>> STATUS 880003c3 MCGSTATUS 0
>> MCGCAP c07 APICID 2 SOCKETID 0
>> CPUID Vendor Intel Family 6 Model 42
>
> Right, those are thermal events throttling the speed of your CPU to keep
> the thing from heat damaging itself.
>
>> While my CPU cooler got replaced even now I still get (hence this email thread):
>>
>> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 7ff67badff00 sp 7fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
>> [44520.259205] [sched_delayed] sched: RT throttling activated
>> [48956.057816] blah.py[16623]: segfault at 2f ip 7fd462e5d046 sp 7fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
>> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 7fe255715f00 sp 7fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
>> [49942.020084] blah.py[6950]: segfault at d0 ip 7f3e8a9acf9c sp 7fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
>> [66696.443342] blah.py[8015]: segfault at cf ip 7f798f708f9c sp 7fff420336e0 error 4 in libpython2.7.so.1.0[7f798f66+173000]
>> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 7f7b17a85f00 sp 7fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
>> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 7fc54cd17f00 sp 7fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
>>
>> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?
>
> That of the past 1s, 0.95s were spent running RR/FIFO tasks. It is a
> warning that comes only once per boot and should prompt you to
> investigate.
>
> You can turn the throttle off, but be advised that running a RR/FIFO
> task at 100% can (and generally does) negatively affect the running of
> your system (as in, these tasks can prevent system duties from taking
> place and eventually make the system come to a halt).
>
> As to those faults, investigate if your python prog does something
> particularly weird or your runtime is in order. Otherwise I would advise
> you to run memtest for a while to make sure your machine is in proper
> working order.

Hmm, meanwhile the core dumps filled up my /var/dumps/ directory on the /
filesystem. I do not have timing information for when that happened relative
to bootup. I deleted some files on the disk and thought I was done. Now, a few
hours later, I noticed:

[85451.247130] traps: blah.py[30787] general protection ip:7faf7b57a046 sp:7fffd9f7b1d0 error:0 in libpython2.7.so.1.0[7faf7b499000+173000]
[87125.493730] nr_pdflush_threads exported in /proc is scheduled for removal
[87125.494238] sysctl: The scan_unevictable_pages sysctl/node-interface has been disabled for lack of a legitimate use case. If you have one, please send an email to linux...@kvack.org.
[97959.812943] blah.py[13069]: segfault at 7f1f2cfdca58 ip 7f1f2db87f00 sp 7fffade41768 error 4 in libpython2.7.so.1.0[7f1f2da77000+173000]

I bet the disk was full at about time 87125. The laptop has 16GB of RAM and the
coredump files are really big, 300MB to 8GB. However, the nr_pdflush_threads
message sounds scary. Does linux 3.10.9 want to delete /proc on the fly? ;-)

Martin
Re: [PATCH] pciehp: Add pciehp_surprise module option
Takashi Iwai wrote:
> At Wed, 20 Mar 2013 19:41:38 +0100,
> Martin Mokrejs wrote:
>>
>> Hi Takashi,
>>   would you please describe your test system in more detail? How
>> about 'lspci -tv'? And 'lsusb -v' of the broken device?
>
> I left the machine in my office, so I'll give details tomorrow.
> It's a Realtek 5249 PCI-e card reader, and this appears as a PCI
> device once when registered by pciehp. At cold boot, it doesn't
> appear in lspci. It appears only when you insert the card. Also,
> this device is no USB. It's supported by mfd/rtsx_pci driver in 3.9.
>
>> If I have Realtek MediaCardReader enabled in BIOS, no card in it, coldboot,
>> and hot insert an ExpressCard into the slot, the Realtek MediaCardReader
>> pops up in dmesg as a new PCI device. How about you?
>
> The device is hotplugged only when the option of my patch is enabled,
> i.e. overriding the surprise capability check.
>
>> My card does NOT show in lspci (maybe because I never plugged in a data card
>> into it) but does show in lsusb:
>
> So, it's a completely different case...
>
>>
>> Bus 002 Device 005: ID 0bda:0138 Realtek Semiconductor Corp. RTS5138 Card Reader Controller
>> Device Descriptor:
>>   bLength                18
>>   bDescriptorType         1
>>   bcdUSB               2.00
>>   bDeviceClass            0 (Defined at Interface level)
>>   bDeviceSubClass         0
>>   bDeviceProtocol         0
>>   bMaxPacketSize0        64
>>   idVendor           0x0bda Realtek Semiconductor Corp.
>>   idProduct          0x0138 RTS5138 Card Reader Controller
>>   bcdDevice           38.82
>>   iManufacturer           1 Generic
>>   iProduct                2 USB2.0-CRW
>>   iSerial                 3 2009051638820
>>
>>
>> Can you try coldboot without a media card inserted before power up without
>> your patch and check whether the CardReader pops up after you plugin some
>> ExpressCard into an ExpressCardSlot (not the CardReader)? I presume it is
>> a laptop. ;-)
>
> When you boot without the card, there is no PCI device. Triggering
> PCI bus rescan also doesn't expose it. But, when you insert the card,
> you'll get the notification in pciehp (seeing "Card present on Slot"
> message), but pciehp doesn't do anything right now unless the
> surprising bit is set. The device may appear if you trigger the PCI
> bus rescan at this moment, too, though.
>
>> 2. Is the hotplug broken also under acpiphp? And again, does it get detected
>> once you plugin some card into an ExpressCard slot?
>
> acpiphp doesn't load on this machine.

While we concluded above that I have a different card (USB-hooked Realtek card),
I have needed pcie_aspm=off since about the 3.5 kernel to get acpiphp working.
It does not work for all express cards but maybe this will help you to get
*acpiphp* to recognize the slot?

Note: the same kernel command line option pcie_aspm=off breaks *pciehp* on my
laptop, so don't forget to delete it from grub.conf if you want to stick with
*pciehp* (if your hardware is prone to hit the same bug as mine:
https://bugzilla.kernel.org/show_bug.cgi?id=59391 ). Just in case you could
switch to acpiphp. ;)

Martin
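The "PCI bus rescan" mentioned in this exchange is normally triggered by writing 1 to /sys/bus/pci/rescan as root. A minimal sketch that also reports which devices appeared afterwards follows. The function names are hypothetical, root privileges are assumed, and whether the card reader actually shows up depends on the hardware behaviour discussed above.

```python
import os


def pci_devices(root="/sys/bus/pci/devices"):
    """Return the set of PCI device addresses currently listed in sysfs."""
    return set(os.listdir(root))


def trigger_pci_rescan(path="/sys/bus/pci/rescan"):
    """Ask the kernel to rescan all PCI buses (needs root on a real system)."""
    with open(path, "w") as fh:
        fh.write("1")


def rescan_and_diff(devices_root="/sys/bus/pci/devices",
                    rescan_path="/sys/bus/pci/rescan"):
    """Rescan and return the addresses that were not present before."""
    before = pci_devices(devices_root)
    trigger_pci_rescan(rescan_path)
    after = pci_devices(devices_root)
    return sorted(after - before)
```

Running rescan_and_diff() right after inserting the card corresponds to the "trigger the PCI bus rescan at this moment" step described in the quoted text.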
Re: [PATCH] ACPI: Fix potential NULL pointer dereference in acpi_processor_add()
Hanjun Guo wrote:
> On 2013-5-29 7:30, Rafael J. Wysocki wrote:
>> On Thursday, May 23, 2013 08:44:26 PM Hanjun Guo wrote:
>>> In acpi_processor_add(), get_cpu_device() will return NULL sometimes,
>>> although the chances are small, I think it should be fixed.
>>>
>>> Signed-off-by: Hanjun Guo
>>
>> This patch isn't necessary any more after the changes queued up for 3.11
>> in the acpi-hotplug branch of the linux-pm.git tree.
>
> Ok, I noticed your patch set, just drop my patch.

But shouldn't this go to stable at least? I checked linux-3.9.4 and it applies
fine. Whether this is relevant for other stable series I will leave up to
somebody else. ;)

Martin

>
> Thanks
> Hanjun
>
>>
>> Thanks,
>> Rafael
>>
>>
>>> ---
>>>  drivers/acpi/processor_driver.c |    4
>>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
>>> index bec717f..dd64f23 100644
>>> --- a/drivers/acpi/processor_driver.c
>>> +++ b/drivers/acpi/processor_driver.c
>>> @@ -579,6 +579,10 @@ static int __cpuinit acpi_processor_add(struct acpi_device *device)
>>>  	per_cpu(processors, pr->id) = pr;
>>>
>>>  	dev = get_cpu_device(pr->id);
>>> +	if (!dev) {
>>> +		result = -ENODEV;
>>> +		goto err_clear_processor;
>>> +	}
>>>  	if (sysfs_create_link(&device->dev.kobj, &dev->kobj, "sysdev")) {
>>>  		result = -EFAULT;
>>>  		goto err_clear_processor;
>>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: PROBLEM: modprobe hang at startup (3.8.x, 3.9.x, IBM x3550)
Hi,
  while you are chasing some problem with i2c-i801, I would like to mention
that I never got an answer on the thread https://lkml.org/lkml/2013/1/23/405
about a kmemleak reported by the kernel. Maybe this could give you a hint?
If these do not overlap, I would be glad anyway to receive an answer via the
original thread I started.
Thank you,
Martin

Jean Delvare wrote:
> Hi Robert,
>
> On Thu, 16 May 2013 13:44:55 +1000, Robert Norris wrote:
>> On Wed, May 15, 2013 at 09:49:23PM +0200, Jean Delvare wrote:
>>>> Interrupt: pin B routed to IRQ 0
>>>
>>> Hmm, this "IRQ 0" is quite odd. I'm wondering if this could be the
>>> reason for this hang. Was it with the i2c-i801 driver loaded, or
>>> blacklisted? Please check if it makes a difference.
>>
>> That was without the driver loaded (blacklisted). After loading (with
>> interrupts enabled) we get:
>>
>> Interrupt: pin B routed to IRQ 20
>
> For the record, I also see the IRQ value change after loading the
> i2c-i801 driver on my system (with an ICH10 south bridge.) From 14 to
> 22 in my case. So it's a bit different (no IRQ 0) but still
> somewhat similar, so I'm still not sure if this has anything to do with
> your issue.
>
>>
>>> Do you see the same (and more generally, this issue) on one, some or
>>> all of your x3550 servers?
>>
>> The issue has occurred on at least three x3550s (we have 11). I haven't
>> tested more, because knowingly crashing production machines sucks.
>
> Yes of course, I understand, I did not expect you to do that ;)
>
>> This appears to be the case on other machines. With the module
>> blacklisted (never loaded), lspci shows IRQ 0. After load, IRQ 20.
>> (tested on 3.4 and 3.9).
>
> OK.
>
>>> Are you using IPMI on these machines?
>>
>> Yes, but only for monitoring/sensors, if that makes a difference.
>
> IPMI is still likely to access the SMBus controller. If there's a BMC
> in the machine, it can also access the SMBus slave with its own
> controller. It would be good to rule this out by disabling IPMI
> completely, removing the BMC from the machine if it has one, and
> checking if it makes the issue go away or not.
>
>>> I would appreciate if you could test the following:
>>> * Blacklist i2c-i801 and ics932s401 so that none of them get
>>> auto-loaded.
>>
>> Done.
>>
>>> * Manually load i2c-i801 with interrupts enabled, and see what
>>> happens.
>>
>> Returned immediately:
>>
>> [ 60.527140] i801_smbus 0000:00:1f.3: SMBus using PCI Interrupt
>
> This confirms that the i2c-i801 driver loading itself isn't the problem.
>
>>> * If no hang happens, load i2c-dev, find the i801 bus number with
>>> i2cdetect -l (from the i2c-tools package - it should be 4 according
>>> to what you reported so far but there is no guarantee that it won't
>>> change across reboots.)
>>
>> $ i2cdetect -l
>> i2c-0   i2c     Radeon i2c bit bus DVI_DDC          I2C adapter
>> i2c-1   i2c     Radeon i2c bit bus VGA_DDC          I2C adapter
>> i2c-2   i2c     Radeon i2c bit bus MONID            I2C adapter
>> i2c-3   i2c     Radeon i2c bit bus CRT2_DDC         I2C adapter
>> i2c-4   smbus   SMBus I801 adapter at 0440          SMBus adapter
>>
>>> Then do a simple read from a random address
>>> with:
>>> # i2cget 4 0x50 0x00
>>> (Adjust the bus number as needed.)
>>> I am curious if this will hang as well or only when accessing the
>>> clock chip at address 0x69.
>>
>> Yep, that one hangs. The hung task handler picked it up after a few
>> minutes.
>
> OK, this means that any transaction request to the SMBus controller
> causes the hang.
>
> The i2c-i801 driver is optimistically using wait_event() when waiting
> for an interrupt to arrive. I suppose that the interrupt is never
> delivered in your case (all 0 in /proc/interrupts.)
>
> Daniel, shouldn't we use wait_event_timeout() instead to catch issues
> like this and fail cleanly? Maybe even fallback to polling
> automatically?
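The idea raised at the end of the message above (wait with a timeout instead of waiting forever, then fall back to polling) can be illustrated outside the kernel. The following is only a user-space Python analogy, not i2c-i801 code: `wait_for_completion`, `irq_event`, `poll_status` and all default values are hypothetical names and numbers chosen for this sketch.

```python
import threading
import time


def wait_for_completion(irq_event, poll_status, timeout=0.05,
                        poll_interval=0.005, max_polls=20):
    """Wait for an interrupt-style event without ever blocking forever.

    Returns "irq" if the event fired within `timeout` seconds,
    "polled" if the event never arrived but polling saw completion,
    and "timeout" if neither happened.
    """
    # Bounded wait: the analogue of a wait with a timeout rather than
    # an unconditional wait for an interrupt that may never arrive.
    if irq_event.wait(timeout):
        return "irq"
    # Fallback: the interrupt was not delivered, so poll the status.
    for _ in range(max_polls):
        if poll_status():
            return "polled"
        time.sleep(poll_interval)
    return "timeout"
```

With a design like this, a missing interrupt results in an error code or a slower polled transfer instead of a hung task.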
3.9-linux-next-20130501: OOPS in intel_pstate_sample
Hi,
  I opened yet another bug https://bugzilla.kernel.org/show_bug.cgi?id=57411 .
This is maybe a dupe of bug https://bugzilla.kernel.org/show_bug.cgi?id=57401
(which is vanilla 3.9) but happened on linux-next-20130501 after I did
"dmesg | less".

? pid_param_set
intel_pstate_timer_func
call_timer_fn
? __internal_add_timer
? pid_param_set
run_timer_softirq
__do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
? sysret_check

A camera picture of the stacktrace is attached to the bug
https://bugzilla.kernel.org/show_bug.cgi?id=57411

Please forward this to the appropriate person.
Thanks,
Martin
linux-3.9: OOPS in intel_timer_pstate_func
Hi,
  I just got this kernel crash on my laptop, which has been running fine so
far on 3.7.10 (and 3.8.5 if really necessary). The 3.9 kernel was running
for maybe 2 hrs at the most. :(

? cpumask_weight
call_timer_fn.clone
? init_timer_key
run_timer_softirq
? cpumask_weight
__do_softirq
smp_apic_timer_interrupt
apic_timer_interrupt
? cpuidle_wrap_enter
? cpuidle_wrap_enter
cpuidle_enter_tk
cpuidle_enter_state
cpuidle_call
cpu_idle
rest_init
? csum_partial_copy_generic
start_kernel
? repair_env_string
x86_64_start_reservations
x86_64_start_kernel

I just opened a bug at https://bugzilla.kernel.org/show_bug.cgi?id=57401
with a camera picture of the screen with the stacktrace. I failed to find a
component like CPU or IRQ, so please forward this to the appropriate person.
Thank you,
Martin
Re: [Update][PATCH] PCI / PM: Disable runtime PM of PCIe ports
Bjorn Helgaas wrote:
> On Mon, Apr 1, 2013 at 2:51 PM, Rafael J. Wysocki wrote:
>> On Monday, April 01, 2013 11:34:46 AM Bjorn Helgaas wrote:
>>> [+cc Zheng, who added this with 71a83bd727]
>>>
>>> On Sat, Mar 30, 2013 at 4:38 PM, Rafael J. Wysocki wrote:
>>>> From: Rafael J. Wysocki
>>>>
>>>> The runtime PM of PCIe ports turns out to be quite fragile, as in
>>>> some cases things work while in some other cases they don't and we
>>>> don't seem to have a good way to determine whether or not they are
>>>> going to work in advance.
>>>
>>> Do you have any references to problems encountered when enabling
>>> runtime PM for PCIe ports? That information will be useful to anybody
>>> who wants to take another crack at getting this working.
>>
>> Well, bug 53811 is one example and problems recently reported by
>> Martin are another. Do you want me to dig deeper?
>
> OK, I got this one:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=53811
>
> Martin has reported a lot of problems lately, and I don't know which
> are related to runtime PM for PCIe ports. I was hoping for a couple
> URLs to put in the changelog so that when somebody gets the itch to
> make this work, they have some useful info to start from. If you
> point me at a specific message, I'll dig up an archive URL for it.

In the thread "Re: 3.8.2: xhci port is dead until pcieport PME# goes to
disabled" http://marc.info/?t=136328222600002&r=1&w=2 I reported that if
the upstream PCI Express root port 1c.4 of the xHCI controller at 0b:00 is
suspended, the USB3 socket on the laptop appears dead. Initially I found
that 'lsusb -v' rescues the dead socket and is accompanied by these lines
in the logs:

[ 1445.597641] pcieport 0000:00:1c.4: PME# disabled
[ 1445.617667] xhci_hcd 0000:0b:00.0: PME# disabled

Ying Huang then realized elsewhere that I am running laptop-mode-tools,
although in their config file I set that they should NOT be run when on AC
power. It looks like they do enable the 'auto' power mode, as seen in the
/sys/bus/pci/devices/*/power/control files, already upon bootup. BTW, even
worse, if I do

/etc/init.d/laptop-mode-tools stop

they restore some initial values. :(( So, if I meanwhile forced 'on' for
some device, they will return it back to 'auto' and the device will
immediately suspend. ;-)

Provided I uninstalled laptop-mode-tools and made sure all control files
say 'on' (and hence the runtime_status files say 'active'), my problem
with a dead xHCI port is avoided. I find it weird that the suspend of the
port happens only upon USB device unplug. The port does not suspend by
itself if unused.

What is not clear to me is how the kernel is going to handle
laptop-mode-tools having enabled power saving on 1c.4. In my naive user's
view, the kernel does not realize and *check* whether a user tool or a
desperate user tried to suspend an upstream port while there is something
bound to it, and it does not apply a check for cascaded devices
(1c.4 -> 0b:00 and 1c.7 -> 11:00 in my case).

I am writing this without a reference, but modprobe of a driver can
overcome a suspended root port. In this particular case I mean my 1c.7
port and its downstream 11:00 ExpressCard device. Off the top of my head,
I am not sure if modprobe overcame both 1c.7 and 11:00 being initially
suspended. I could dig it out from the "Re: 3.9-rc1: pciehp and eSATA card
SiI 3132, no XHCI" http://marc.info/?t=136305008800001&r=1&w=2 thread if
you want. Or it might be easier for you to test it yourself.

So, for me the issue is not fixed, but if you decide to disable runtime
power saving for devices under pcieport I don't mind. Their mishandling
definitely causes my acpiphp hotplug issues under 3.7-3.8 kernels (3.9-rc
not tested), whereas these PM issues do not explain why pciehp is broken
on 3.7-3.9-rc1. Anyway, this patch can only be good, because I would like
to use laptop-mode-tools, and they will for sure put one of the devices
into 'auto' and it will likely fall into suspend.

Martin

>
> Otherwise, I'm afraid we'll just oscillate between "enable PM, find
> bug, disable PM, enable PM, find same bug, disable PM, etc..."
>
> Bjorn
>
>>>> For this reason, avoid enabling runtime PM for PCIe ports by
>>>> keeping their runtime PM reference counters always above 0 for the
>>>> time being.
>>>>
>>>> Signed-off-by: Rafael J. Wysocki
>>>> ---
>>>>
>>>> This version also removes the no longer necessary (and empty anyway)
>>>> port_runtime_pm_black_list[] table.
>>>>
>>>> Thanks,
>>>> Rafael
>>>>
>>>> ---
>>>>  drivers/pci/pcie/portdrv_pci.c |   13 -------------
>>>>  1 file changed, 13 deletions(-)
>>>>
>>>> Index: linux-pm/drivers/pci/pcie/portdrv_pci.c
>>>> ===================================================================
>>>> --- linux-pm.orig/drivers/pci/pcie/portdrv_pci.c
>>>> +++ linux-pm/drivers/pci/pcie/portdrv_pci.c
>>>> @@ -185,14 +185,6 @@ static const struct dev_pm_ops pcie_port
>>>>  #endif /* !PM */
>>>>
>>>>  /*
>>>> - * PCIe port runtime suspend is broken for some chipsets, so use a
>>>> - * black list
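The check described in the message above (an upstream port left in 'auto' while something is bound below it) can be done by hand from user space. The following is a minimal Python sketch under stated assumptions: `read_power_state` and `risky_parents` are hypothetical helper names, and the parent/child mapping is supplied by the caller (for example taken from `lspci -tv`) rather than discovered automatically.

```python
from pathlib import Path


def read_power_state(devices_dir):
    """Return {device: (control, runtime_status)} for each device directory.

    `devices_dir` would normally be /sys/bus/pci/devices; it is a
    parameter so the function can be tried on any directory tree.
    """
    states = {}
    for dev in sorted(Path(devices_dir).iterdir()):
        control = dev / "power" / "control"
        status = dev / "power" / "runtime_status"
        if control.is_file() and status.is_file():
            states[dev.name] = (control.read_text().strip(),
                                status.read_text().strip())
    return states


def risky_parents(states, parent_of):
    """List (parent, child) pairs where the parent is allowed to suspend.

    `parent_of` maps a child device address to its upstream port address.
    A parent whose control file says "auto" may runtime-suspend even
    though the child is present.
    """
    return [
        (parent, child)
        for child, parent in sorted(parent_of.items())
        if child in states and states.get(parent, ("on", ""))[0] == "auto"
    ]
```

For the laptop discussed here, the mapping would be something like `{"0000:0b:00.0": "0000:00:1c.4", "0000:11:00.0": "0000:00:1c.7"}`.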
Re: [Update][PATCH] PCI / ACPI: Always resume devices on ACPI wakeup notifications
So, I re-tested with the patch and 3.8.3 but without laptop-mode-tools.
The xHCI port works fine provided
/sys/bus/pci/devices/0000:0b:00.0/power/control is set to on and
/sys/bus/pci/devices/0000:00:1c.4/power/control is also set to on.
If I set the parent 1c.4 to auto, it gets suspended and the port seems dead
until a device is plugged in and I wake it using lsusb -vv.

There must be a bug in Linux such that it cannot overcome the upstream 1c.4
sleeping when it wants to access 0b:00. Or, more likely, that upstream root
port should be prevented from falling asleep, right?

# lspci -tv
-[0000:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
           +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
           +-16.0  Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1
           +-1a.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2
           +-1b.0  Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller
           +-1c.0-[03-04]--
           +-1c.1-[05-06]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller
           +-1c.3-[09-0a]----00.0  Intel Corporation Centrino Wireless-N 1030 [Rainbow Peak]
           +-1c.4-[0b-0c]----00.0  Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller
           +-1c.7-[11-16]----00.0  Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller
           +-1d.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1
           +-1f.0  Intel Corporation HM67 Express Chipset Family LPC Controller
           +-1f.2  Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller
           \-1f.3  Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller
#

I have attached the lspci -vvv -n output. Interestingly, maybe, the state
of the TI xHCI controller ended up changed after my tests. I booted up with
power/control set to on for all devices, because laptop-mode-tools are
uninstalled. I fiddled with the echo commands tweaking 1c.4 and 0b:00, but
in the end I set both back to "on".
However, below is some diff. Don't know what that means. Maybe because I tried to write '0', 'off', 'none' to the control file? ;-) 00:1c.4 0604: 8086:1c18 (rev b5) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag- RBE+ FLReset- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 128 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- LnkCap: Port #5, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us ClockPM- Surprise- LLActRep+ BwNot- LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+ SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- Slot #4, PowerLimit 10.000W; Interlock- NoCompl+ SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock- SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- Changed: MRL- PresDet- LinkState+ RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible- RootCap: CRSVisible- - RootSta: PME ReqID , PMEStatus- PMEPending- + RootSta: PME ReqID 0b00, PMEStatus- PMEPending- DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd- LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance 
De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [80] MSI: Enable-
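On the question above about writing '0', 'off' or 'none' to the control file: to my knowledge the kernel's sysfs power ABI documents only "on" and "auto" as values for power/control, and other strings should simply be rejected. A small Python sketch of a guarded write follows; `set_power_control` and `VALID_MODES` are hypothetical names for this illustration, and the device directory is passed in so the helper can be exercised on a scratch directory before touching /sys.

```python
from pathlib import Path

# Assumption: "on" and "auto" are the only values power/control accepts.
VALID_MODES = ("on", "auto")


def set_power_control(device_dir, mode):
    """Write `mode` to <device_dir>/power/control and return what is read back.

    Raises ValueError for anything other than "on" or "auto", so a typo
    such as "off" is caught before it reaches the control file.
    """
    if mode not in VALID_MODES:
        raise ValueError("power/control accepts only %r, got %r"
                         % (VALID_MODES, mode))
    control = Path(device_dir) / "power" / "control"
    control.write_text(mode + "\n")
    return control.read_text().strip()
```

On the real system this needs root, for example `set_power_control("/sys/bus/pci/devices/0000:00:1c.4", "on")`.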
Re: [Update][PATCH] PCI / ACPI: Always resume devices on ACPI wakeup notifications
Sarah,
  please let me know if you feel the test was skewed by laptop-mode-tools
kicking in, although I believed they were not running while I was on AC
power. I was testing under these conditions:

vostro ~ # grep . /sys/bus/pci/devices/*/power/control
/sys/bus/pci/devices/0000:00:00.0/power/control:auto
/sys/bus/pci/devices/0000:00:02.0/power/control:auto
/sys/bus/pci/devices/0000:00:16.0/power/control:auto
/sys/bus/pci/devices/0000:00:1a.0/power/control:auto
/sys/bus/pci/devices/0000:00:1b.0/power/control:auto
/sys/bus/pci/devices/0000:00:1c.0/power/control:auto
/sys/bus/pci/devices/0000:00:1c.1/power/control:auto
/sys/bus/pci/devices/0000:00:1c.3/power/control:auto
/sys/bus/pci/devices/0000:00:1c.4/power/control:auto
/sys/bus/pci/devices/0000:00:1c.7/power/control:auto
/sys/bus/pci/devices/0000:00:1d.0/power/control:auto
/sys/bus/pci/devices/0000:00:1f.0/power/control:auto
/sys/bus/pci/devices/0000:00:1f.2/power/control:auto
/sys/bus/pci/devices/0000:00:1f.3/power/control:auto
/sys/bus/pci/devices/0000:05:00.0/power/control:auto
/sys/bus/pci/devices/0000:09:00.0/power/control:auto
/sys/bus/pci/devices/0000:0b:00.0/power/control:auto
/sys/bus/pci/devices/0000:11:00.0/power/control:auto
vostro ~ # grep . /sys/bus/pci/devices/*/power/runtime_status
/sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended
/sys/bus/pci/devices/0000:00:02.0/power/runtime_status:active
/sys/bus/pci/devices/0000:00:16.0/power/runtime_status:suspended
/sys/bus/pci/devices/0000:00:1a.0/power/runtime_status:active
/sys/bus/pci/devices/0000:00:1b.0/power/runtime_status:active
/sys/bus/pci/devices/0000:00:1c.0/power/runtime_status:suspended
/sys/bus/pci/devices/0000:00:1c.1/power/runtime_status:active
/sys/bus/pci/devices/0000:00:1c.3/power/runtime_status:active
/sys/bus/pci/devices/0000:00:1c.4/power/runtime_status:active
/sys/bus/pci/devices/0000:00:1c.7/power/runtime_status:active
/sys/bus/pci/devices/0000:00:1d.0/power/runtime_status:active
/sys/bus/pci/devices/0000:00:1f.0/power/runtime_status:active
/sys/bus/pci/devices/0000:00:1f.2/power/runtime_status:active
/sys/bus/pci/devices/0000:00:1f.3/power/runtime_status:suspended
/sys/bus/pci/devices/0000:05:00.0/power/runtime_status:active
/sys/bus/pci/devices/0000:09:00.0/power/runtime_status:active
/sys/bus/pci/devices/0000:0b:00.0/power/runtime_status:active
/sys/bus/pci/devices/0000:11:00.0/power/runtime_status:active
vostro ~ #

My apologies if that skewed the test, and thanks for your detailed
explanations. I will, however, insert a few questions below.

Sarah Sharp wrote:
> On Fri, Mar 29, 2013 at 04:05:54PM +0100, Martin Mokrejs wrote:
>
>> Nevertheless, I went to check if the USB3 socket dies after first unplug of device
>> or not anymore thanks to the patch being tested:
>>
>> I plugged into the USB3.0 socket a mouse, it worked. Around its unplug I got:
>>
>> [ 94.954779] hub 3-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
>> [ 94.954795] hub 3-0:1.0: hub_suspend
>> [ 94.954802] usb usb3: bus auto-suspend, wakeup 1
>> [ 94.954817] xhci_hcd 0000:0b:00.0: xhci_hub_status_data: stopping port polling.
>> [ 94.954835] xhci_hcd 0000:0b:00.0: xhci_suspend: stopping port polling.
>> [ 94.954857] xhci_hcd 0000:0b:00.0: // Setting command ring address to 0xd6007001
>> [ 94.954898] xhci_hcd 0000:0b:00.0: hcd_pci_runtime_suspend: 0
>> [ 94.954983] xhci_hcd 0000:0b:00.0: PME# enabled
>> [ 169.622513] hub 2-1:1.0: state 7 ports 8 chg evt 0004
>> [ 169.623057] hub 2-1:1.0: port 2, status 0101, change 0001, 12 Mb/s
>> [ 169.777012] hub 2-1:1.0: debounce: port 2: total 100ms stable 100ms status 0x101
>> [ 169.856992] usb 2-1.2: new low-speed USB device number 4 using ehci-pci
>>
>> and the port was dead, no matter what "lsusb -v or -vv" options I tried. At about
>> [ 169.622513] I plugged the mouse into a USB2.0 socket (do not know if that
>> is 1a.0 or 1d.0).
>
> All right, I wonder if the USB core/xHCI driver is forgetting to clear a
> port status change bit after the device is unplugged. That can cause
> the xHCI host to not give us a port status change event later (and thus
> no PME). Looking at the logs later, it doesn't seem like we do this
> though.
>
>> If I run lsusb -vv it does (with the problematic patch):
>>
>> [ 1760.414086] pcieport 0000:00:1c.4: PME# disabled
>> [ 1760.434314] xhci_hcd 0000:0b:00.0: PME# disabled
>> [ 1760.434327] xhci_hcd 0000:0b:00.0: enabling bus mastering
>> [ 1760.434338] xhci_hcd 0000:0b:00.0: // Setting command ring address to 0xd6007001
>> [ 1760.434360] xhci_hcd 0000:0b:00.0: Port Status Change Event for port 2
>
> Ok, so the xHCI driver *is* getting a port status change event, and thus
> must have gott
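When comparing logs like the ones quoted above, it can help to pull out just the "PME# enabled/disabled" transitions per device. The following Python sketch does that with a regular expression; `PME_LINE` and `pme_events` are hypothetical names, and the pattern assumes the usual "[timestamp] driver address: message" layout of these dmesg lines.

```python
import re

# "[ 94.954983] xhci_hcd 0000:0b:00.0: PME# enabled" style lines.
PME_LINE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+(\S+)\s+(\S+):\s+PME# (enabled|disabled)"
)


def pme_events(log_text):
    """Return (timestamp, driver, device, state) tuples in log order."""
    return [
        (float(stamp), driver, device, state)
        for stamp, driver, device, state in PME_LINE.findall(log_text)
    ]
```

Feeding it the output of `dmesg` shows at a glance whether the root port and the xHCI controller change state together.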
Re: [Update][PATCH] PCI / ACPI: Always resume devices on ACPI wakeup notifications
Hi,
I applied these patches on top of 3.8.3, hoping they would fix my issue from the thread "Re: 3.8.2: xhci port is dead until pcieport PME# goes to disabled", but unfortunately it is even worse! Now lsusb -v and lsusb -vv do wake up the xHCI port, but it falls asleep again immediately, more quickly than I am able to plug a device into the socket. To get a device working in the USB3 socket I need to plug it in and then run lsusb -vv; only then is it recognized.

Without the patch, 'lsusb -vv' woke up the port (PME# disabled happened on both 1c.4 and 0b:00.0) and I had unlimited time to find some USB device and plug it into the slot.

I noticed this message some while after bootup (no external USB devices were connected to the laptop, neither to the USB2 socket nor to the USB3.0 sockets), before I started the tests:

[ 36.594171] xhci_hcd :0b:00.0: xhci_suspend: stopping port polling.
[ 36.594202] xhci_hcd :0b:00.0: // Setting command ring address to 0xd6007001
[ 36.594247] xhci_hcd :0b:00.0: hcd_pci_runtime_suspend: 0
[ 36.594349] xhci_hcd :0b:00.0: PME# enabled
[ 36.703695] r8169 :05:00.0 eth0: link down
[ 37.098299] microcode: CPU0 updated to revision 0x28, date = 2012-04-24
[ 37.098941] microcode: CPU1 updated to revision 0x28, date = 2012-04-24
[ 37.098944] perf_event_intel: PEBS enabled due to microcode update
[ 38.343029] r8169 :05:00.0 eth0: link up
[ 39.094944] r8169 :05:00.0 eth0: link down
[ 41.492768] r8169 :05:00.0 eth0: link up
[ 62.782910] xhci_hcd :0b:00.0: Poll event ring: 4294943584
[ 62.782938] xhci_hcd :0b:00.0: op reg status = 0x
[ 62.782939] xhci_hcd :0b:00.0: HW died, polling stopped.
[ 88.754183] pcieport :00:1c.0: PME# enabled
[ 88.764182] xhci_hcd :0b:00.0: PME# disabled
[ 88.764192] xhci_hcd :0b:00.0: enabling bus mastering
[ 88.764206] xhci_hcd :0b:00.0: // Setting command ring address to 0xd6007001
[ 88.764242] xhci_hcd :0b:00.0: Port Status Change Event for port 2
[ 88.764246] xhci_hcd :0b:00.0: resume root hub
[ 88.764259] xhci_hcd :0b:00.0: handle_port_status: starting port polling.
[ 88.764276] xhci_hcd :0b:00.0: xhci_resume: starting port polling.
[ 88.764281] xhci_hcd :0b:00.0: hcd_pci_runtime_resume: 0

What does "HW died" mean? Why is 1c.0 here? What is this device actually doing?

00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Bus: primary=00, secondary=03, subordinate=04, sec-latency=0
    I/O behind bridge: f000-0fff
    Memory behind bridge: fff0-000f
    Prefetchable memory behind bridge: fff0-000f
    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
    BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
    Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us
            ClockPM- Surprise- LLActRep+ BwNot-
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
            Slot #0, PowerLimit 10.000W; Interlock- NoCompl+
        SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
            Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
        SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
            Changed: MRL- PresDet- LinkState-
        RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
        RootCap: CRSVisible-
        RootSta: PME ReqID , PMEStatus- PMEPending-
        DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-,
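
To follow the suspend/resume sequence in a log like the one above without reading every line, the PME# transitions can be pulled out on their own. This is only a minimal sketch: the function name `pme_events` is made up for illustration, and it assumes the usual dmesg layout of "[timestamp] driver address: message".

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): read kernel log lines on
# standard input and print "timestamp device state" for each
# "PME# enabled" / "PME# disabled" line.
pme_events() {
    awk '/PME# (enabled|disabled)/ {
        ts = $0
        sub(/^\[ */, "", ts)             # drop the leading "[" and padding
        sub(/\].*$/, "", ts)             # keep only the timestamp
        line = $0
        sub(/^\[[^]]*\] */, "", line)    # drop the whole "[timestamp] " prefix
        n = split(line, f, " ")          # f[1]=driver, f[2]=address, f[n]=state
        dev = f[2]
        sub(/:$/, "", dev)               # strip the colon after the address
        print ts, dev, f[n]
    }'
}
```

Typical use would be `dmesg | pme_events`, which makes it easier to see which root port and which endpoint toggled PME# and in what order.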
Re: [Update][PATCH] PCI / ACPI: Always resume devices on ACPI wakeup notifications
Sarah,
please let me know if you feel the test was screwed up by laptop-mode-tools kicking in, although I believed they were not running while I was on AC power. I was testing under these conditions:

vostro ~ # grep . /sys/bus/pci/devices/*/power/control
/sys/bus/pci/devices/:00:00.0/power/control:auto
/sys/bus/pci/devices/:00:02.0/power/control:auto
/sys/bus/pci/devices/:00:16.0/power/control:auto
/sys/bus/pci/devices/:00:1a.0/power/control:auto
/sys/bus/pci/devices/:00:1b.0/power/control:auto
/sys/bus/pci/devices/:00:1c.0/power/control:auto
/sys/bus/pci/devices/:00:1c.1/power/control:auto
/sys/bus/pci/devices/:00:1c.3/power/control:auto
/sys/bus/pci/devices/:00:1c.4/power/control:auto
/sys/bus/pci/devices/:00:1c.7/power/control:auto
/sys/bus/pci/devices/:00:1d.0/power/control:auto
/sys/bus/pci/devices/:00:1f.0/power/control:auto
/sys/bus/pci/devices/:00:1f.2/power/control:auto
/sys/bus/pci/devices/:00:1f.3/power/control:auto
/sys/bus/pci/devices/:05:00.0/power/control:auto
/sys/bus/pci/devices/:09:00.0/power/control:auto
/sys/bus/pci/devices/:0b:00.0/power/control:auto
/sys/bus/pci/devices/:11:00.0/power/control:auto
vostro ~ # grep . /sys/bus/pci/devices/*/power/runtime_status
/sys/bus/pci/devices/:00:00.0/power/runtime_status:suspended
/sys/bus/pci/devices/:00:02.0/power/runtime_status:active
/sys/bus/pci/devices/:00:16.0/power/runtime_status:suspended
/sys/bus/pci/devices/:00:1a.0/power/runtime_status:active
/sys/bus/pci/devices/:00:1b.0/power/runtime_status:active
/sys/bus/pci/devices/:00:1c.0/power/runtime_status:suspended
/sys/bus/pci/devices/:00:1c.1/power/runtime_status:active
/sys/bus/pci/devices/:00:1c.3/power/runtime_status:active
/sys/bus/pci/devices/:00:1c.4/power/runtime_status:active
/sys/bus/pci/devices/:00:1c.7/power/runtime_status:active
/sys/bus/pci/devices/:00:1d.0/power/runtime_status:active
/sys/bus/pci/devices/:00:1f.0/power/runtime_status:active
/sys/bus/pci/devices/:00:1f.2/power/runtime_status:active
/sys/bus/pci/devices/:00:1f.3/power/runtime_status:suspended
/sys/bus/pci/devices/:05:00.0/power/runtime_status:active
/sys/bus/pci/devices/:09:00.0/power/runtime_status:active
/sys/bus/pci/devices/:0b:00.0/power/runtime_status:active
/sys/bus/pci/devices/:11:00.0/power/runtime_status:active
vostro ~ #

My apologies if that twisted the test, and thanks for your detailed explanations. I do have a few questions below, however.

Sarah Sharp wrote:
> On Fri, Mar 29, 2013 at 04:05:54PM +0100, Martin Mokrejs wrote:
>
>> Nevertheless, I went to check if the USB3 socket dies after first unplug of device
>> or not anymore thanks to the patch being tested:
>>
>> I plugged into the USB3.0 socket a mouse, it worked. Around its unplug I got:
>>
>> [ 94.954779] hub 3-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
>> [ 94.954795] hub 3-0:1.0: hub_suspend
>> [ 94.954802] usb usb3: bus auto-suspend, wakeup 1
>> [ 94.954817] xhci_hcd :0b:00.0: xhci_hub_status_data: stopping port polling.
>> [ 94.954835] xhci_hcd :0b:00.0: xhci_suspend: stopping port polling.
>> [ 94.954857] xhci_hcd :0b:00.0: // Setting command ring address to 0xd6007001
>> [ 94.954898] xhci_hcd :0b:00.0: hcd_pci_runtime_suspend: 0
>> [ 94.954983] xhci_hcd :0b:00.0: PME# enabled
>> [ 169.622513] hub 2-1:1.0: state 7 ports 8 chg evt 0004
>> [ 169.623057] hub 2-1:1.0: port 2, status 0101, change 0001, 12 Mb/s
>> [ 169.777012] hub 2-1:1.0: debounce: port 2: total 100ms stable 100ms status 0x101
>> [ 169.856992] usb 2-1.2: new low-speed USB device number 4 using ehci-pci
>>
>> and the port was dead, no matter what lsusb -v or -vv options I tried. At about
>> [ 169.622513] I plugged the mouse into a USB2.0 socket (do not know if that is 1a.0 or 1d.0).
>
> All right, I wonder if the USB core/xHCI driver is forgetting to clear a
> port status change bit after the device is unplugged. That can cause
> the xHCI host to not give us a port status change event later (and thus
> no PME). Looking at the logs later, it doesn't seem like we do this
> though.
>
>> If I run lsusb -vv it does (with the problematic patch):
>>
>> [ 1760.414086] pcieport :00:1c.4: PME# disabled
>> [ 1760.434314] xhci_hcd :0b:00.0: PME# disabled
>> [ 1760.434327] xhci_hcd :0b:00.0: enabling bus mastering
>> [ 1760.434338] xhci_hcd :0b:00.0: // Setting command ring address to 0xd6007001
>> [ 1760.434360] xhci_hcd :0b:00.0: Port Status Change Event for port 2
>
> Ok, so the xHCI driver *is* getting a port status change event, and thus
> must have gotten a PME. So the PCI layer is doing its job.
>
>> [ 1760.434363] xhci_hcd :0b:00.0: resume root hub
>> [ 1760.434367] xhci_hcd :0b:00.0: handle_port_status: starting port polling.
>> [ 1760.434378] xhci_hcd :0b:00.0: xhci_resume: starting port polling.
>> [ 1760.434383] xhci_hcd :0b:00.0: hcd_pci_runtime_resume: 0
>> [ 1760.434388
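
The two grep commands in this message print the state of every PCI device. A small sketch, assuming the standard sysfs layout, that prints only the devices currently reported as suspended follows. The name `suspended_pci_devices` is hypothetical, and the optional argument (default `/sys`) exists only so the function can be tried against a copy of the tree.

```shell
#!/bin/sh
# Hypothetical helper: list PCI devices whose runtime PM status is
# "suspended". usage: suspended_pci_devices [SYSFS_ROOT]
suspended_pci_devices() {
    root=${1:-/sys}
    for f in "$root"/bus/pci/devices/*/power/runtime_status; do
        [ -r "$f" ] || continue                 # skip unreadable or unmatched entries
        if [ "$(cat "$f")" = "suspended" ]; then
            dev=${f%/power/runtime_status}      # strip the attribute path
            echo "${dev##*/}"                   # keep only the device address
        fi
    done
}
```

On the listing above this would be expected to name 00:00.0, 00:16.0, 00:1c.0 and 00:1f.3, which makes a sleeping root port easier to spot.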
Re: [Update][PATCH] PCI / ACPI: Always resume devices on ACPI wakeup notifications
So, I re-tested with the patch on 3.8.3, but without laptop-mode-tools. The xHCI port works fine provided /sys/bus/pci/devices/:0b:00.0/power/control is set to on and /sys/bus/pci/devices/:00:1c.4/power/control is also set to on. If I set the parent 1c.4 to auto, it gets suspended and the port seems dead until a device is plugged in and I wake it using lsusb -vv.

There must be a bug in Linux so that it cannot cope with the upstream 1c.4 sleeping while it wants to access 0b:00. Or, more likely, that upstream root port should be prevented from falling asleep, right?

# lspci -tv
-[:00]-+-00.0 Intel Corporation 2nd Generation Core Processor Family DRAM Controller
       +-02.0 Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
       +-16.0 Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1
       +-1a.0 Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2
       +-1b.0 Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller
       +-1c.0-[03-04]--
       +-1c.1-[05-06]00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller
       +-1c.3-[09-0a]00.0 Intel Corporation Centrino Wireless-N 1030 [Rainbow Peak]
       +-1c.4-[0b-0c]00.0 Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller
       +-1c.7-[11-16]00.0 Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller
       +-1d.0 Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1
       +-1f.0 Intel Corporation HM67 Express Chipset Family LPC Controller
       +-1f.2 Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller
       \-1f.3 Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller
#

I have attached the lspci -vvv -n output. Interestingly, maybe, the lspci output related to the TI xHCI controller had changed by the end of my tests. I booted with power/control set to on for all devices, because laptop-mode-tools was uninstalled. I fiddled with echo commands, tweaking 1c.4 and 0b:00, but in the end set both back to on. However, below is the diff. I don't know what it means. Maybe it is because I tried to write '0', 'off', 'none' to the control file? ;-)

 00:1c.4 0604: 8086:1c18 (rev b5) (prog-if 00 [Normal decode])
     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
     Latency: 0, Cache Line Size: 64 bytes
     Bus: primary=00, secondary=0b, subordinate=0c, sec-latency=0
     I/O behind bridge: f000-0fff
     Memory behind bridge: f7d0-f7df
     Prefetchable memory behind bridge: fff0-000f
     Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
     BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
         PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
     Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
         DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
             ExtTag- RBE+ FLReset-
         DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
             MaxPayload 128 bytes, MaxReadReq 128 bytes
         DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
         LnkCap: Port #5, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us
             ClockPM- Surprise- LLActRep+ BwNot-
         LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
         LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
         SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
             Slot #4, PowerLimit 10.000W; Interlock- NoCompl+
         SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
             Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
         SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
             Changed: MRL- PresDet- LinkState+
         RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
         RootCap: CRSVisible-
-        RootSta: PME ReqID , PMEStatus- PMEPending-
+        RootSta: PME ReqID 0b00, PMEStatus- PMEPending-
         DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
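
For repeating the experiment described in this message, the following is a minimal sketch of pinning a device and its upstream root port to the same runtime PM mode. The sysfs power/control attribute accepts only "on" and "auto", so writes of other strings such as '0', 'off' or 'none' should simply be rejected; the sketch checks the mode before writing. The helper name `set_runtime_pm` is hypothetical, and the paths in the usage note assume PCI domain 0000, which this archive has dropped from the addresses.

```shell
#!/bin/sh
# Hypothetical helper: write a runtime PM mode to power/control of each
# given device directory. usage: set_runtime_pm on|auto DEVICE_DIR...
set_runtime_pm() {
    mode=$1
    shift
    case $mode in
        on|auto) ;;                             # the only two accepted values
        *)
            echo "set_runtime_pm: mode must be 'on' or 'auto'" >&2
            return 1
            ;;
    esac
    for dev in "$@"; do
        echo "$mode" > "$dev/power/control" || return 1
    done
}
```

Run as root, something like `set_runtime_pm on /sys/bus/pci/devices/0000:00:1c.4 /sys/bus/pci/devices/0000:0b:00.0` would keep both the root port and the xHCI controller awake, and `set_runtime_pm auto ...` on the root port alone would reproduce the failing case.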
Re: 3.9-rc3+: reports battery as 0 mWh capacity on thinkpad x60
Pavel Machek wrote:
> Hi!
>
>> pavel@amd:~$ cat /proc/acpi/battery/BAT0/info
>> present: yes
>> design capacity: 0 mWh
>> last full capacity: 0 mWh
>> battery technology: rechargeable
>> design voltage: 14400 mV
>>
>> This worked before... at least it works in 2.6 kernel used by debian.
>
> This works for me in 3.9-rc3. May I see your .config?
> ...
>>> But problem is not in /proc, /sys has zeros, too.
>>>
>>> pavel@amd:~$ cat /sys/class/power_supply/BAT0/energy_full
>>> 0
>>> pavel@amd:~$ cat /sys/class/power_supply/BAT0/energy_full_design
>>> 0
>>> pavel@amd:~$ cat /sys/class/power_supply/BAT0/model_name
>>> 93P5030
>>> pavel@amd:~$
>>
>> Can you narrow the time frame when it stopped working a bit?
>
> Well, 2.6.32 from debian works ok, and self-compiled 3.1+ kernel also
> seems to work ok.
>
> I'm not sure if 3.7+ kernels worked, actually... I'd have to do some
> compiling to check.

FYI, on 3.7.10 I don't have the above files. See below what I do have:

# for f in /sys/class/power_supply/BAT0/*; do echo $f; cat $f; done
/sys/class/power_supply/BAT0/alarm
0
/sys/class/power_supply/BAT0/capacity
106
/sys/class/power_supply/BAT0/charge_full
4126000
/sys/class/power_supply/BAT0/charge_full_design
440
/sys/class/power_supply/BAT0/charge_now
440
/sys/class/power_supply/BAT0/current_now
1000
/sys/class/power_supply/BAT0/cycle_count
0
/sys/class/power_supply/BAT0/device
cat: /sys/class/power_supply/BAT0/device: Is a directory
/sys/class/power_supply/BAT0/manufacturer
SMP
/sys/class/power_supply/BAT0/model_name
DELL 8NH551B
/sys/class/power_supply/BAT0/power
cat: /sys/class/power_supply/BAT0/power: Is a directory
/sys/class/power_supply/BAT0/present
1
/sys/class/power_supply/BAT0/serial_number
 2630
/sys/class/power_supply/BAT0/status
Full
/sys/class/power_supply/BAT0/subsystem
cat: /sys/class/power_supply/BAT0/subsystem: Is a directory
/sys/class/power_supply/BAT0/technology
Li-ion
/sys/class/power_supply/BAT0/type
Battery
/sys/class/power_supply/BAT0/uevent
POWER_SUPPLY_NAME=BAT0
POWER_SUPPLY_STATUS=Full
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_CYCLE_COUNT=0
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=1110
POWER_SUPPLY_VOLTAGE_NOW=12294000
POWER_SUPPLY_CURRENT_NOW=1000
POWER_SUPPLY_CHARGE_FULL_DESIGN=440
POWER_SUPPLY_CHARGE_FULL=4126000
POWER_SUPPLY_CHARGE_NOW=440
POWER_SUPPLY_CAPACITY=106
POWER_SUPPLY_MODEL_NAME=DELL 8NH551B
POWER_SUPPLY_MANUFACTURER=SMP
POWER_SUPPLY_SERIAL_NUMBER= 2630
/sys/class/power_supply/BAT0/voltage_min_design
1110
/sys/class/power_supply/BAT0/voltage_now
12294000
#
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
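
The capacity value of 106 in the listing above is consistent with a percentage computed as charge_now * 100 / charge_full, and a reported full capacity of 0, as on the X60, would make that division impossible. The following is only a sketch under that assumption; `battery_percent` is a hypothetical helper, and the 4400000 used in the usage note is an assumed charge_now value, since the number printed in this archive appears to be damaged.

```shell
#!/bin/sh
# Hypothetical helper: integer percentage of remaining charge.
# usage: battery_percent CHARGE_NOW CHARGE_FULL   (same unit for both)
battery_percent() {
    now=$1
    full=$2
    if [ "$full" -le 0 ]; then
        # A full capacity of 0 (as reported on the X60) cannot be used.
        echo "battery_percent: charge_full must be positive" >&2
        return 1
    fi
    echo $(( now * 100 / full ))
}
```

For example, `battery_percent 4400000 4126000` gives 106 under the assumed charge_now, while `battery_percent 0 0` fails instead of dividing by zero.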
Re: [PATCH] pciehp: Add pciehp_surprise module option
Martin Mokrejs wrote:
> Hi Takashi,
> would you please describe your test system in more detail? How about 'lspci -tv'? And 'lsusb -v' of the broken device?
>
> 1. For me on Dell Vostro 3550 with a SandyBridge chip doing all SATA+USB2+ExpressCardSlot:
>
> 00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
> 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
> 00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
> 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05) (prog-if 20 [EHCI])
> 00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
> 00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b5) (prog-if 00 [Normal decode])
> 00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5) (prog-if 00 [Normal decode])
> 00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5) (prog-if 00 [Normal decode])
> 00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 8 (rev b5) (prog-if 00 [Normal decode])
> 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05) (prog-if 20 [EHCI])
> 00:1f.0 ISA bridge: Intel Corporation HM67 Express Chipset Family LPC Controller (rev 05)
> 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05) (prog-if 01 [AHCI 1.0])
> 00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
> 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller (rev 06)
> 09:00.0 Network controller: Intel Corporation Centrino Wireless-N 1030 [Rainbow Peak] (rev 34)
> 0b:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02) (prog-if 30 [XHCI])
> 11:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
> #
>
> If I have Realtek MediaCardReader enabled in BIOS, no card in it, coldboot, and hot
> insert an ExpressCard into the slot, the Realtek MediaCardReader pops up in dmesg as
> a new PCI device. How about you?

Err, sorry, not as a PCI device as I said; it gets re-detected as a USB device:

[4.220009] hub 2-1:1.0: port 6, status 0101, change , 12 Mb/s
[4.291831] usb 2-1.6: new high-speed USB device number 5 using ehci_hcd
[4.409353] usb 2-1.6: default language 0x0409
[4.414740] usb 2-1.6: udev 5, busnum 2, minor = 132
[4.414745] usb 2-1.6: New USB device found, idVendor=0bda, idProduct=0138
[4.414858] usb 2-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[4.414967] usb 2-1.6: Product: USB2.0-CRW
[4.415069] usb 2-1.6: Manufacturer: Generic
[4.415172] usb 2-1.6: SerialNumber: 2009051638820
[4.416956] usb 2-1.6: usb_probe_device
[4.416962] usb 2-1.6: configuration #1 chosen from 1 choice
[4.419477] usb 2-1.6: adding 2-1.6:1.0 (config #1, interface 0)
[4.424094] usb-storage 2-1.6:1.0: usb_probe_interface
[4.424103] usb-storage 2-1.6:1.0: usb_probe_interface - got id
[4.424276] ums-realtek 2-1.6:1.0: usb_probe_interface
[4.424279] ums-realtek 2-1.6:1.0: usb_probe_interface - got id
[4.440838] scsi6 : usb-storage 2-1.6:1.0
cut
[ 222.748820] pci :11:00.0: [1095:3132] type 00 class 0x018000
[ 222.748865] pci :11:00.0: reg 10: [mem 0x-0x007f 64bit]
[ 222.748898] pci :11:00.0: reg 18: [mem 0x-0x3fff 64bit]
[ 222.748919] pci :11:00.0: reg 20: [io 0x-0x007f]
[ 222.748960] pci :11:00.0: reg 30: [mem 0x-0x0007 pref]
[ 222.749095] pci :11:00.0: supports D1 D2
[ 222.769438] pci :11:00.0: BAR 6: assigned [mem 0xf000-0xf007 pref]
[ 222.769442] pci :11:00.0: BAR 2: assigned [mem 0xf6c0-0xf6c03fff 64bit]
[ 222.769464] pci :11:00.0: BAR 2: set to [mem 0xf6c0-0xf6c03fff 64bit] (PCI address [0xf6c0-0xf6c03fff])
[ 222.769466] pci :11:00.0: BAR 0: assigned [mem 0xf6c04000-0xf6c0407f 64bit]
[ 222.769487] pci :11:00.0: BAR 0: set to [mem 0xf6c04000-0xf6c0407f 64bit] (PCI address [0xf6c04000-0xf6c0407f])
[ 222.769489] pci :11:00.0: BAR 4:
Re: [PATCH] pciehp: Add pciehp_surprise module option
Hi Takashi,
  would you please describe your test system in more detail? How about
'lspci -tv'? And 'lsusb -v' of the broken device?

1. For me on a Dell Vostro 3550 with a SandyBridge chip doing all of
SATA+USB2+ExpressCardSlot:

00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05) (prog-if 20 [EHCI])
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b5) (prog-if 00 [Normal decode])
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5) (prog-if 00 [Normal decode])
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5) (prog-if 00 [Normal decode])
00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 8 (rev b5) (prog-if 00 [Normal decode])
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05) (prog-if 20 [EHCI])
00:1f.0 ISA bridge: Intel Corporation HM67 Express Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05) (prog-if 01 [AHCI 1.0])
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller (rev 06)
09:00.0 Network controller: Intel Corporation Centrino Wireless-N 1030 [Rainbow Peak] (rev 34)
0b:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02) (prog-if 30 [XHCI])
11:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
#

If I have the Realtek MediaCardReader enabled in BIOS, no card in it,
coldboot, and hot insert an ExpressCard into the slot, the Realtek
MediaCardReader pops up in dmesg as a new PCI device. How about you?
My card does NOT show in lspci (maybe because I never plugged a data
card into it) but does show in lsusb:

Bus 002 Device 005: ID 0bda:0138 Realtek Semiconductor Corp. RTS5138 Card Reader Controller
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x0bda Realtek Semiconductor Corp.
  idProduct          0x0138 RTS5138 Card Reader Controller
  bcdDevice           38.82
  iManufacturer           1 Generic
  iProduct                2 USB2.0-CRW
  iSerial                 3 2009051638820

Can you try a coldboot without a media card inserted before power up,
without your patch, and check whether the CardReader pops up after you
plug some ExpressCard into an ExpressCard slot (not the CardReader)?
I presume it is a laptop. ;-)

2. Is the hotplug broken also under acpiphp? And again, does it get
detected once you plug some card into an ExpressCard slot?

3. Does the device appear under lsusb in addition to lspci?

4. How does the 'lack of the hotplug surprise (PCI_EXP_SLTCAP_HPS)
capability bit' manifest in 'lspci -vvv' output? A diff before and after
the patch?

5. Where is the *real* bug in the code, that "linux" ignores the fact that
one of the PCIe Root Ports (or the whole PCI Bridge?) does not support
'hotplug surprise'? Or is this about the hooked up "third-party" PCI
devices? Why does it affect other PCIe ports of the bridge?

It would be nice if you could look into any of my previous emails to
linux-pci and, with your current knowledge, comment whether here or there
I faced the same problem. It looks like it. Disabling the hotplug is a
no go for me, I need hotplug for my ExpressCards. So far I have rather
kept the MediaCardReader disabled in BIOS.

But thank you, I did not know that inserting a data card into a CardReader
is supposed to give me an lspci entry for it. So far I saw only the one in
lsusb.

Thank you,
Martin

Takashi Iwai wrote:
> We encountered a problem that on some HP machines the Realtek PCI-e
> card reader device appears only when you inserted a card before the
> cold boot. While debugging, it turned out that the device is actually
> handled via PCI-e hotplug in
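
Regarding question 4 above: one rough way to compare the capability bit before and after the patch would be to pull the SltCap line out of saved 'lspci -vvv' output. A minimal sketch, assuming lspci prints the flag as 'Surprise+' or 'Surprise-' on the SltCap line and that device headers carry no domain prefix; the function name slot_surprise is only an illustration:

```shell
#!/bin/sh
# slot_surprise: read 'lspci -vvv' text on stdin and print, for every
# device that has a SltCap line, its address and the Surprise flag.
slot_surprise() {
    awk '
        # A device header starts in column 0 with bus:dev.fn, e.g. "00:1c.0"
        /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7] / { dev = $1 }
        /SltCap:/ {
            flag = "Surprise?"          # printed if the flag is not on this line
            for (i = 1; i <= NF; i++)
                if ($i ~ /^Surprise[+-]$/) flag = $i
            print dev, flag
        }
    '
}
```

Saving 'lspci -vvv' (as root) once with and once without the patch and running 'slot_surprise < file' on each should then give two short lists that are easy to diff.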
Re: [PATCH] pciehp: Add pciehp_surprise module option
Martin Mokrejs wrote:
> [snip]
> If I have Realtek MediaCardReader enabled in BIOS, no card in it,
> coldboot, and hot insert an ExpressCard into the slot, the Realtek
> MediaCardReader pops up in dmesg as a new PCI device. How about you?

Err, not PCI device as I said, sorry, but gets re-detected as a USB device:

[4.220009] hub 2-1:1.0: port 6, status 0101, change , 12 Mb/s
[4.291831] usb 2-1.6: new high-speed USB device number 5 using ehci_hcd
[4.409353] usb 2-1.6: default language 0x0409
[4.414740] usb 2-1.6: udev 5, busnum 2, minor = 132
[4.414745] usb 2-1.6: New USB device found, idVendor=0bda, idProduct=0138
[4.414858] usb 2-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[4.414967] usb 2-1.6: Product: USB2.0-CRW
[4.415069] usb 2-1.6: Manufacturer: Generic
[4.415172] usb 2-1.6: SerialNumber: 2009051638820
[4.416956] usb 2-1.6: usb_probe_device
[4.416962] usb 2-1.6: configuration #1 chosen from 1 choice
[4.419477] usb 2-1.6: adding 2-1.6:1.0 (config #1, interface 0)
[4.424094] usb-storage 2-1.6:1.0: usb_probe_interface
[4.424103] usb-storage 2-1.6:1.0: usb_probe_interface - got id
[4.424276] ums-realtek 2-1.6:1.0: usb_probe_interface
[4.424279] ums-realtek 2-1.6:1.0: usb_probe_interface - got id
[4.440838] scsi6 : usb-storage 2-1.6:1.0

cut

[ 222.748820] pci :11:00.0: [1095:3132] type 00 class 0x018000
[ 222.748865] pci :11:00.0: reg 10: [mem 0x-0x007f 64bit]
[ 222.748898] pci :11:00.0: reg 18: [mem 0x-0x3fff 64bit]
[ 222.748919] pci :11:00.0: reg 20: [io 0x-0x007f]
[ 222.748960] pci :11:00.0: reg 30: [mem 0x-0x0007 pref]
[ 222.749095] pci :11:00.0: supports D1 D2
[ 222.769438] pci :11:00.0: BAR 6: assigned [mem 0xf000-0xf007 pref]
[ 222.769442] pci :11:00.0: BAR 2: assigned [mem 0xf6c0-0xf6c03fff 64bit]
[ 222.769464] pci :11:00.0: BAR 2: set to [mem 0xf6c0-0xf6c03fff 64bit] (PCI address [0xf6c0-0xf6c03fff])
[ 222.769466] pci :11:00.0: BAR 0: assigned [mem 0xf6c04000-0xf6c0407f 64bit]
[ 222.769487] pci :11:00.0: BAR 0: set to [mem 0xf6c04000-0xf6c0407f 64bit] (PCI address [0xf6c04000-0xf6c0407f])
[ 222.769489] pci :11:00.0: BAR 4: assigned [io 0xc000-0xc07f]
[ 222.769496] pci :11:00.0: BAR 4: set to [io 0xc000-0xc07f] (PCI address [0xc000-0xc07f])
[ 222.891588] sata_sil24 :11:00.0: version 1.1
[ 222.891606] sata_sil24
Re: [PATCH] pci: Disable slot presence detection around bus reset
Hi Alex,
  I was just going to ask you whether your patch would "explain" why pciehp
has, in my experience, broken presence detection while acpiphp has not (on
a 3.7 kernel), and whether the patch will fix it. I did some testing in the
past on a 3.2 kernel and on 3.7.1, with no fixes. Maybe you are interested
in these threads? Actually, another user confirmed that pciehp is broken on
3.7 while, luckily, he was also able to switch to acpiphp. Still, it is
weird that the behavior is different for different express cards (USB3 vs.
SATA vs. RS232 vs. firewire).

Five thread subjects on card presence detection:

Re: 3.2.11: PCI Express card cannot be re-detected withing cca 60sec timeframe
Re: linux-3.4-rc5: eSATA Sil3132 ExpressCard removal results in warn_slowpath_common
Re: Dell Vostro 3550: pci_hotplug+acpiphp require 'pcie_aspm=force' on kernel command-line for hotplug to work
Re: [PATCH v3 3/6] ACPI/pci_slot: update PCI slot information when PCI hotplug event happens
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

Maybe you will crack it? ;-)
Thanks,
Martin

Alex Williamson wrote:
> On Thu, 2013-02-14 at 11:37 -0700, Alex Williamson wrote:
>> A bus reset can trigger a presence detection change and result in a
>> surprise hotplug. This is generally not what we want to happen when
>> trying to reset a device. Disable the presence detection control
>> on bridges around bus reset.
>>
>> Signed-off-by: Alex Williamson
>> ---
>>  drivers/pci/pci.c | 29 ++++++++++++++++++++++++-----
>>  1 file changed, 24 insertions(+), 5 deletions(-)
>
>
> Hmm, this doesn't seem to be sufficient, still seeing it
> occasionally :-\
>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 5cb5820..c1f7d77 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -3229,8 +3229,8 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
>>
>>  static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
>>  {
>> -	u16 ctrl;
>> -	struct pci_dev *pdev;
>> +	u16 ctrl, flags, sltctl = 0;
>> +	struct pci_dev *pdev, *bridge;
>>
>>  	if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
>>  		return -ENOTTY;
>> @@ -3242,15 +3242,34 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
>>  	if (probe)
>>  		return 0;
>>
>> -	pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
>> +	bridge = dev->bus->self;
>> +
>> +	/*
>> +	 * If the parent device supports a slot with presence detection
>> +	 * change enabled, holding the bus in reset can trigger that and
>> +	 * cause an unwanted surprise removal. Disable presence detection
>> +	 * around the bus reset.
>> +	 */
>> +	pcie_capability_read_word(bridge, PCI_EXP_FLAGS, &flags);
>> +	if (flags & PCI_EXP_FLAGS_SLOT) {
>> +		pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &sltctl);
>> +		if (sltctl & PCI_EXP_SLTCTL_PDCE)
>> +			pcie_capability_write_word(bridge, PCI_EXP_SLTCTL,
>> +						   sltctl & ~PCI_EXP_SLTCTL_PDCE);
>> +	}
>> +
>> +	pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &ctrl);
>>  	ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
>> -	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
>> +	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
>>  	msleep(100);
>>
>>  	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>> -	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
>> +	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
>>  	msleep(100);
>>
>> +	if (sltctl & PCI_EXP_SLTCTL_PDCE)
>> +		pcie_capability_write_word(bridge, PCI_EXP_SLTCTL, sltctl);
>> +
>>  	return 0;
>>  }
>>
>>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
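
The masking step in the patch above (clear PDCE before the reset, write the saved value back afterwards) can be tried out on plain numbers. A minimal sketch, assuming PCI_EXP_SLTCTL_PDCE is 0x0008 as in the kernel's pci_regs.h; the helper names mask_pdce and pdce_enabled are made up for illustration and touch no hardware:

```shell
#!/bin/sh
# Bit value taken from the kernel's pci_regs.h
PCI_EXP_SLTCTL_PDCE=0x0008      # Presence Detect Changed Enable

# mask_pdce SLTCTL: print the 16-bit Slot Control value with PDCE cleared.
mask_pdce() {
    printf '0x%04x\n' $(( $1 & ~$PCI_EXP_SLTCTL_PDCE & 0xffff ))
}

# pdce_enabled SLTCTL: succeed if PDCE is set in the given value.
pdce_enabled() {
    [ $(( $1 & $PCI_EXP_SLTCTL_PDCE )) -ne 0 ]
}
```

On a live system the current Slot Control value could probably be read with something like 'setpci -s 00:1c.0 CAP_EXP+18.w' (the device address here is just the first root port from my listing), but I have not tested that as part of this sketch.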
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Chris,

Chris Clayton wrote:
> Hi Martin,
>
> On 01/28/13 21:02, Martin Mokrejs wrote:
>> Hi Chris,
>>
>> Chris Clayton wrote:
>>> Hi Martin,
>>>
>>> On 01/28/13 12:12, Martin Mokrejs wrote:
>>>> Chris Clayton wrote:
>>>
>>> I've struggled with this a little. For some reason, the expresscard
>>> doesn't always stay properly inserted in the slot when I insert it.
>>> Now that hotplug is working, the modules are being loaded and when
>>> the card pops out again, I get an oops because, of course, the driver
>>> is running and the card disappears. Perhaps the driver can be made a
>>> bit more robust to sudden disappearance of the card. I'll report the
>>
>> Yes, I had or maybe still have the same issues here. I used to get an Oops
>> for the sata_sil24 card and weird behavior for the USB3.0 NEC-based card.
>> It was always fine for a VIA-based firewire card and a serial PL2303-based
>> one. I found out it is better if a usb device is connected to the USB card
>> because if that slips out then the libata layer quickly realizes that.
>> If there was no device connected, the usb waits too long before it removes
>> the usb hub from the system. And if you plug the card back into the slot
>> meanwhile, weird things happen.
>>
> My usb3 expresscard device has arrived and I get an oops with that
> too, if I remove it without unloading the driver first. I guess it
> shouldn't be a surprise that the driver isn't expecting the device to
> disappear.

I avoided the oopses when a USB device was connected to the express card.
Nevertheless, you should report it to the linux-usb and linux-pci mailing
lists, along with the oops stacktrace (under a new thread). Maybe you suffer
from another Oops.

>
> As I mentioned, I have some trouble with the WinTV-HVR-1400 card,
> which sometimes pops out again, if I push it into the slot too hard
> (but I'm getting better at that with practice). So what I've done
> (with the usb3 card too) to avoid the oopsen is blacklist the driver
> in /etc/modprobe.d/blacklist.conf and then load them when I'm sure
> the card is properly inserted. Not exactly hotplug, but at least I
> don't have to reboot because of an oops - and it's not something I'm
> doing several times an hour.

Yeah, I also found my way around it: I do not fiddle much with the cards,
and if they slip out during insertion, I don't re-plug them too quickly (at
least with the USB3 card and the SATA card I had problems).
BTW, if you remove a card, you are supposed to push the card into the slot
so it gets ejected. Do not just pull it out (what I did in the beginning).
I was told that is not the right way (probably affects the PresDet status).

Martin
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Chris,

Chris Clayton wrote:
> Hi Martin,
>
> On 01/28/13 12:12, Martin Mokrejs wrote:
>> Chris Clayton wrote:
>>>
>>> [snip]
>>>
>>>> [chris:~]$ cat /proc/cmdline
>>>> root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6
>>>                     ^^
>>>                  **typo**
>>> I've run the test again with pcie_ports=native and the directories now get
>>> populated. Even better though, is that when I plug in the card, hotplug
>>> **works** and the card's drivers are loaded.
>>
>> BTW, I have with acpiphp on 3.7.4:
>>
>> ls -la /sys/bus/pci_express/devices
>> total 0
>> drwxr-xr-x 2 root root 0 Jan 28 13:07 .
>> drwxr-xr-x 4 root root 0 Jan 28 13:07 ..
>> $ ls -la /sys/bus/pci/devices/slots
>
>                      **typo**
> It should be /sys/bus/pci/slots.
>
>> ls: cannot access /sys/bus/pci/devices/slots: No such file or directory
>> $
>>
> With acpiphp, I get /sys/bus/pci_express/devices populated but
> /sys/bus/pci/slots is empty.

OK, I hadn't noticed the typo, but here is what I have with acpiphp:

# ls -laR /sys/bus/pci/slots
/sys/bus/pci/slots:
total 0
drwxr-xr-x 3 root root 0 Jan 27 17:14 .
drwxr-xr-x 5 root root 0 Jan 25 15:56 ..
drwxr-xr-x 2 root root 0 Jan 27 17:14 1

/sys/bus/pci/slots/1:
total 0
drwxr-xr-x 2 root root    0 Jan 27 17:14 .
drwxr-xr-x 3 root root    0 Jan 27 17:14 ..
-r--r--r-- 1 root root 4096 Jan 28 21:31 adapter
-r--r--r-- 1 root root 4096 Jan 27 17:14 address
-rw-r--r-- 1 root root 4096 Jan 28 21:31 attention
-r--r--r-- 1 root root 4096 Jan 28 21:31 cur_bus_speed
-r--r--r-- 1 root root 4096 Jan 28 21:31 latch
-r--r--r-- 1 root root 4096 Jan 28 21:31 max_bus_speed
lrwxrwxrwx 1 root root    0 Jan 28 21:31 module -> ../../../../module/acpiphp
-rw-r--r-- 1 root root 4096 Jan 28 21:31 power
#

>
>> And for me hotplug also works (as far as I can tell). ;-)
>>
>>>
>>> Excellent! Thank you so much for your help (and patience) Martin and Yijing.
>>>
>>> Now to solving why running scandvb doesn't find any TV channels.
>>
>> Would be fine if you could re-do the PresDet checks and confirm whether it
>> is also broken
>> for you under pciehp.
>
> I've struggled with this a little. For some reason, the expresscard
> doesn't always stay properly inserted in the slot when I insert it.
> Now that hotplug is working, the modules are being loaded and when
> the card pops out again, I get an oops because, of course, the driver
> is running and the card disappears. Perhaps the driver can be made a
> bit more robust to sudden disappearance of the card. I'll report the

Yes, I had or maybe still have the same issues here. I used to get an Oops
for the sata_sil24 card and weird behavior for the USB3.0 NEC-based card.
It was always fine for a VIA-based firewire card and a serial PL2303-based
one. I found out it is better if a usb device is connected to the USB card
because if that slips out then the libata layer quickly realizes that.
If there was no device connected, the usb waits too long before it removes
the usb hub from the system. And if you plug the card back into the slot
meanwhile, weird things happen.

> oops later. Anyway, to run these tests I built a kernel without the
> dvb card's drivers, effectively simulating the situation I had before
> Yijing got hotplug working for me. The card popping out may also have
> affected these diffs a bit because, for example, the first one has
> the CorrErr flag changed, possibly because I had to have two or more
> goes at getting the card to lock in the slot. Yesterday that diff
> showed no changes. Anyway, here are the diffs:
>
> diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
> 262c262
> < DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> ---
> > DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> 295c295
> < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
> ---
> > 40: 10 80 42 01 00 80 00 00 00 00 11 00 12 3c 12 04
>
>
> diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt
>
>

BTW, with the NEC-based card I got into the PresDet- state only after every
second removal of the card. So, on every other diff attempt you won't see a
difference! But we are talking about acpiphp here (unlike pciehp), and with
that I also have no problems.

>
> =
> diff lspci.before_insertion.txt lspci.after_1st_removal.txt
> 112c112
> < 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 08 c0
> ---
> > 60: 20 20 ff 07 00 00 0
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Chris Clayton wrote:
>
> [snip]
>
>> [chris:~]$ cat /proc/cmdline
>> root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6
>                    ^^
>                 **typo**
> I've run the test again with pcie_ports=native and the directories now get
> populated. Even better though, is that when I plug in the card, hotplug
> **works** and the card's drivers are loaded.

BTW, I have with acpiphp on 3.7.4:

ls -la /sys/bus/pci_express/devices
total 0
drwxr-xr-x 2 root root 0 Jan 28 13:07 .
drwxr-xr-x 4 root root 0 Jan 28 13:07 ..
$ ls -la /sys/bus/pci/devices/slots
ls: cannot access /sys/bus/pci/devices/slots: No such file or directory
$

And for me hotplug also works (as far as I can tell). ;-)

>
> Excellent! Thank you so much for your help (and patience) Martin and Yijing.
>
> Now to solving why running scandvb doesn't find any TV channels.

Would be fine if you could re-do the PresDet checks and confirm whether it
is also broken for you under pciehp.

Martin
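
Since the 'pciehp_ports' vs. 'pcie_ports' typo cost us a round trip, here is a small check for it. A rough sketch only: the function name check_pcie_ports and its messages are invented for illustration, and it just scans a command-line string for the two spellings seen in this thread:

```shell
#!/bin/sh
# check_pcie_ports CMDLINE: look for pcie_ports= in a kernel command line.
# Returns 0 if it is present, 1 if only the misspelled pciehp_ports= is
# found, and 2 if neither spelling occurs.
check_pcie_ports() {
    for arg in $1; do                   # unquoted on purpose: split on spaces
        case $arg in
            pcie_ports=*)
                echo "ok: $arg"
                return 0 ;;
            pciehp_ports=*)
                echo "typo? $arg (did you mean pcie_ports=...)"
                return 1 ;;
        esac
    done
    echo "pcie_ports= not set"
    return 2
}
```

On a running system it would be called as: check_pcie_ports "$(cat /proc/cmdline)"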
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Chris Clayton wrote:
> Hi Yijing,
>
> On 01/28/13 02:40, Yijing Wang wrote:
>> Hi Chris,
>> Sorry for the delay reply. It seems like my reply last night was missed.
>>
>> From the sysinfo you provide, there are no pcie port devices under
>> /sys/bus/pci_express/devices.
>> Maybe because there are some problems with _OSC in your laptop, so pcie port
>> driver won't create pcie port device
>> for hotplug, aer and so on.
>>
>> Maybe you can add boot parameter "pcie_ports=native" and reboot your laptop.
>> Then use #modprobe pciehp pciehp_force=1 pciehp_debug=1 to load pciehp
>> modules.
>> After above actions, enter /sys/bus/pci_express/devices/ directory and
>> /sys/bus/pci/slots/
>> Some slots and pcie port devices should be there now.
>>
> Sorry, I've tried your suggestion, but the two directories are still empty.
>
> I verified the test environment as follows:
>
> [chris:~]$ uname -a
> Linux laptop 3.7.4 #15 SMP PREEMPT Mon Jan 28 09:43:57 GMT 2013 i686 GNU/Linux
> [chris:~]$ grep acpiphp /boot/System.map-3.7.4
> [chris:~]$ modinfo acpiphp
> modinfo: ERROR: Module acpiphp not found.
> [chris:~]$ modinfo pciehp
> filename:       /lib/modules/3.7.4/kernel/drivers/pci/hotplug/pciehp.ko
> license:        GPL
> description:    PCI Express Hot Plug Controller Driver
> author:         Dan Zink, Greg Kroah-Hartman, Dely Sy
> depends:
> intree:         Y
> vermagic:       3.7.4 SMP preempt mod_unload CORE2
> parm:           pciehp_detect_mode:Slot detection mode: pcie, acpi, auto
>   pcie          - Use PCIe based slot detection
>   acpi          - Use ACPI for slot detection
>   auto(default) - Auto select mode. Use acpi option if duplicate
>                   slot ids are found. Otherwise, use pcie option
>  (charp)
> parm:           pciehp_debug:Debugging mode enabled or not (bool)
> parm:           pciehp_poll_mode:Using polling mechanism for hot-plug events or not (bool)
> parm:           pciehp_poll_time:Polling mechanism frequency, in seconds (int)
> parm:           pciehp_force:Force pciehp, even if OSHP is missing (bool)
> [chris:~]$ cat /proc/cmdline
> root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6
> [chris:~]$ sudo modprobe pciehp pciehp_force=1 pciehp_debug=1
> [chris:~]$ lsmod
> Module                  Size  Used by
> pciehp                 19907  0
> [...]
>
> You will notice that the kernel I have used is 3.7.4. I hope that's a
> suitable kernel for your tests. I've moved away from the 3.8 development
> kernel onto one that's stable and on which Martin has identified a solution.
> I see Greg KH released 3.7.5 yesterday and it includes a pciehp change. I'll
> upgrade to that, run the tests again and report back.
>
> One question - should I include the (acpi) pci_slot driver in the kernel
> build or does pciehp populate the directories without pci_slot?

Hi Chris,
  I am not a kernel developer, but from the other threads at linux-pci I
gathered that in some scenarios there are problems with improper loading of
the hotplug modules. Therefore, the patches now floating around are meant to
disable hotplug module availability. That is why I suggested you try only
static kernel support for hotplug. That way you don't hit the issue. It is
for sure not addressed in 3.7.5; it seems that it is probably in -next.

Martin
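
To make the "are the two directories populated" check from Yijing's instructions repeatable between reboots, something along these lines may help. A minimal sketch; the function names dir_populated and report_hotplug_dirs are hypothetical, and the directories are passed as arguments so nothing is hard-coded:

```shell
#!/bin/sh
# dir_populated DIR: succeed if DIR exists and holds at least one entry.
dir_populated() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# report_hotplug_dirs DIR...: print one status line per directory.
report_hotplug_dirs() {
    for d in "$@"; do
        if dir_populated "$d"; then
            echo "$d: populated"
        else
            echo "$d: empty or missing"
        fi
    done
}
```

After booting with pcie_ports=native and loading pciehp it would be run as: report_hotplug_dirs /sys/bus/pci_express/devices /sys/bus/pci/slots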
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Chris Clayton wrote:
> [snip]
> [chris:~]$ cat /proc/cmdline
> root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6
>                    ^^ **typo**
>
> I've run the test again with pcie_ports=native and the directories now get
> populated. Even better though, is that when I plug in the card, hotplug
> **works** and the card's drivers are loaded.

BTW, this is what I have with acpiphp on 3.7.4:

$ ls -la /sys/bus/pci_express/devices
total 0
drwxr-xr-x 2 root root 0 Jan 28 13:07 .
drwxr-xr-x 4 root root 0 Jan 28 13:07 ..
$ ls -la /sys/bus/pci/devices/slots
ls: cannot access /sys/bus/pci/devices/slots: No such file or directory
$

And for me hotplug also works (as far as I can tell). ;-)

> Excellent! Thank you so much for your help (and patience) Martin and Yijing.
> Now to solving why running scandvb doesn't find any TV channels.

It would be good if you could redo the PresDet checks and confirm whether it is also broken for you under pciehp.

Martin
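A misspelled boot parameter such as pciehp_ports=native in place of pcie_ports=native is easy to overlook. Below is a minimal shell sketch that checks a command line string for an exact parameter. The helper name has_boot_param is hypothetical, and it only compares whole words in the string; it does not tell you whether the kernel actually understands the parameter.

```shell
#!/bin/sh
# has_boot_param CMDLINE PARAM
# Succeeds (exit status 0) only if PARAM occurs in CMDLINE as a complete,
# space-separated word. "has_boot_param" is a hypothetical helper name.
has_boot_param() {
    # Pad both ends with spaces so that the first and last words match too.
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage against the running kernel (not executed here):
#   has_boot_param "$(cat /proc/cmdline)" pcie_ports=native \
#       && echo "pcie_ports=native is set" \
#       || echo "pcie_ports=native is NOT set"
```

Matching whole words rather than substrings is what distinguishes the two spellings in this thread.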
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Chris, Chris Clayton wrote: Hi Martin, On 01/28/13 12:12, Martin Mokrejs wrote: Chris Clayton wrote: [snip] [chris:~]$ cat /proc/cmdline root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6 ^^ **typo** I've run the test again with pcie_ports=native and the directories now get populated. Even better though, is that when I plug in the card, hotplug **works** and the card's drivers are loaded. BTW, I have with acpiphp on 3.7.4: ls -la /sys/bus/pci_express/devices total 0 drwxr-xr-x 2 root root 0 Jan 28 13:07 . drwxr-xr-x 4 root root 0 Jan 28 13:07 .. $ ls -la /sys/bus/pci/devices/slots **typo** It should be /sys/bus/pci/slots. ls: cannot access /sys/bus/pci/devices/slots: No such file or directory $ With acpiphp, I get /sys/bus/pci_express/devices populated but /sys/bus/pci/slots is empty. OK, I haven't realized the typo, but I have here with acpiphp: # ls -laR /sys/bus/pci/slots /sys/bus/pci/slots: total 0 drwxr-xr-x 3 root root 0 Jan 27 17:14 . drwxr-xr-x 5 root root 0 Jan 25 15:56 .. drwxr-xr-x 2 root root 0 Jan 27 17:14 1 /sys/bus/pci/slots/1: total 0 drwxr-xr-x 2 root root0 Jan 27 17:14 . drwxr-xr-x 3 root root0 Jan 27 17:14 .. -r--r--r-- 1 root root 4096 Jan 28 21:31 adapter -r--r--r-- 1 root root 4096 Jan 27 17:14 address -rw-r--r-- 1 root root 4096 Jan 28 21:31 attention -r--r--r-- 1 root root 4096 Jan 28 21:31 cur_bus_speed -r--r--r-- 1 root root 4096 Jan 28 21:31 latch -r--r--r-- 1 root root 4096 Jan 28 21:31 max_bus_speed lrwxrwxrwx 1 root root0 Jan 28 21:31 module - ../../../../module/acpiphp -rw-r--r-- 1 root root 4096 Jan 28 21:31 power # And for me hotplug also works (as far as I can tell). ;-) Excellent! Thank you so much for your help (and patience) Martin and Yijing. Now to solving why running scandvb doesn't find any TV channels. Would be fine if you could re-do the PresDet checks and confirm whether it is also broken for you under pciehp. I've struggled with this a little. 
For some reason, the expresscard doesn't always stay properly inserted in the slot when I insert it. Now that hotplug is working, the modules are being loaded and when the card pops out again, I get an oops because, of course, the driver is running and the card disappears. Perhaps the driver can be made a bit more robust to sudden disappearance of the card. I'll report the Yes, I had or maybe still have same issues here. I used to get an Oops for sata_sil24 card weird behavior for USB3.0 NEC-based card. It was fine always for a VIA-based firewire card and serial PL2303-based one. I found out it is better if a usb device is connected to the USB card because if that slips out then the libata layer quickly realizes that. If there was no device connected, the usb waits too long before it removes the usb hub from the system. And if you plugin the card meanwhile back into the slot, weird thing happen. oops later. Anyway, to run these tests I built a kernel without the dvb card's drivers, effectively simulating the situation I had before Yijing got hotplug working for me. The card popping out may also have affected these diffs a bit because, for example, the first one has the CorrErr flag changed, possibly because I had to have two or more goes at getting the card to lock in the slot. Yesterday that diff showed no changes. Anyway, here are the diffs: diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt 262c262 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- --- DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- 295c295 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04 --- 40: 10 80 42 01 00 80 00 00 00 00 11 00 12 3c 12 04 diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt no difference BTW, with the NEC-based card only after every second removal of the card I got into PresDet- state. So, on every other diff attempt you won't see a difference! 
But we are talking about acpiphp here (unlike pciehp) and with that I also have no problems. = diff lspci.before_insertion.txt lspci.after_1st_removal.txt 112c112 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 08 c0 --- 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 00 c0 262,263c262,263 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 1us, L1 16us --- DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 512ns, L1 16us 265c265 LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Chris, Chris Clayton wrote: > Thanks again, Martin. > > Firstly, maybe we should remove the linux-media list from the copy list. I > imagine this hotplug stuff is just noise to them. > > [snip] >> Do you have any other express card around to try if it works at all? Try >> that always after a cold boot. >> > Not at the moment, but I ordered at USB3 expresscard yesterday, so I will > have one soon. > >> Posting a diff result of the below procedure might help: >> >> # lspci -vvvxxx > lspci.before_insertion.txt >> >> [plug your card into the slot] >> >> # lspci -vvvxxx > lspci.after_insertion.txt >> >> [ unplug your card] >> >> # lspci -vvvxxx > lspci.after_1st_removal.txt >> >> [re-plug your card into the slot] >> >> # lspci -vvvxxx > lspci.after_1st_re-insertion.txt >> >> [ unplug your card] >> >> # lspci -vvvxxx > lspci.after_2nd_removal.txt >> > > OK, I've been using kernel 3.8.0-rc kernels so far, but given that is still > under development, I've switched to 3.7.4, mainly because you are having > success with 3.7.x, acpiphp and pcie_aspm=off. 
I verified the environment as > follows: > > [chris:~]$ cat /proc/cmdline > root=/dev/sda5 pcie_aspm=off ro resume=/dev/sda6 > [chris:~]$ dmesg | grep ASPM > [0.00] PCIe ASPM is disabled > [0.348959] pci:00: ACPI _OSC support notification failed, disabling > PCIe ASPM > [chris:~]$ dmesg | grep acpiphp > [0.400846] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5 > [chris:~]$ dmesg | grep pciehp > [chris:~]$ uname -a > Linux laptop 3.7.4 #13 SMP PREEMPT Sun Jan 27 18:39:39 GMT 2013 i686 GNU/Linux > vostro ~ # cat /proc/cmdline root=/dev/sda5 pciehp.pciehp_debug=1 slub_debug=AFPZ pcie_aspm=off vostro ~ # dmesg | grep ASPM [0.00] PCIe ASPM is disabled vostro ~ # dmesg | grep acpiphp [2.449038] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5 [2.453757] acpiphp: Slot [1] registered vostro ~ # uname -a Linux vostro 3.7.4-default #2 SMP Mon Jan 21 22:45:22 MET 2013 x86_64 Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz GenuineIntel GNU/Linux vostro ~ # > >> Then compare them using diff. These should have no difference: >> >> diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt >> diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt >> > Correct, there were no differences. 
> >> >> These may have only little difference, or none: >> >> diff lspci.before_insertion.txt lspci.after_1st_removal.txt > > 263c263 > < LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency > L0 <1us, L1 <16us > --- > > LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency > L0 <512ns, L1 <16us > 265c265 > < LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- > CommClk- > --- > > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- > CommClk+ > 267c267 > < LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ > DLActive- BWMgmt- ABWMgmt- > --- > > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ > DLActive- BWMgmt+ ABWMgmt- > 273c273 > < Changed: MRL- PresDet- LinkState- > --- > > Changed: MRL- PresDet- LinkState+ > 295,296c295,296 > < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04 > < 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00 > --- > > 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04 > > 50: 40 00 11 50 60 b2 1c 00 08 00 00 01 00 00 00 00 > >> diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt >> > No difference. >> >> >> Finally, these should confirm whether the PresDet works for you (for me NOT >> with pciehp but does work with acpiphp). 
>> You should see PresDet- to PresDet+ changes in: >> > Yes, I do see the PresDet- to PresDet+ changes > >> diff lspci.before_insertion.txt lspci.after_insertion.txt > > 263c263 > < LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency > L0 <1us, L1 <16us > --- > > LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency > L0 <512ns, L1 <16us > 265c265 > < LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- > CommClk- > --- > > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- > CommClk+ > 267c267 > < LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ > DLActive- BWMgmt- ABWMgmt- > --- > > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ > DLActive+ BWMgmt+ ABWMgmt- > 272,273c272,273 > < SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- > Interlock- > < Changed: MRL- PresDet- LinkState- > --- > > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ > Interlock- > > Changed: MRL- PresDet- LinkState+ > 295,296c295,296 > < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04 > < 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00 > --- > > 40: 10 80 42 01 00 80
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Chris Clayton wrote: > > > On 01/27/13 14:26, Martin Mokrejs wrote: >> Chris Clayton wrote: >>> >>> >>> On 01/27/13 12:18, Yijing Wang wrote: >>>> 于 2013-01-27 19:19, Chris Clayton 写道: >>>>> Hi Yijing >>>>> >>>>> On 01/27/13 02:45, Yijing Wang wrote: >>>>>> 于 2013-01-27 4:54, Chris Clayton 写道: >>>>>>> Hi Martin, >>>>>>> >>>>>>> On 01/24/13 19:21, Martin Mokrejs wrote: >>>>>>>> Hi Chris, >>>>>>>> try to include in kernel only acpiphp and omit pciehp. Don't use >>>>>>>> modules but include >>>>>>>> them statically. And try, in addition, check whether "pcie_aspm=off" >>>>>>>> in grub.conf helped. >>>>>>>> >>>>>>> >>>>>>> Thanks for the tip. I had the pciehp driver installed, but it was a >>>>>>> module and not loaded. I didn't have acpiphp enabled at all. Building >>>>>>> them both in statically, appears to have papered over the cracks of the >>>>>>> oops :-) >>>>>> >>>>>> Not loaded pciehp driver? Remove the device from this slot without >>>>>> poweroff ? >>>>>> >>>>> >>>>> That's correct. When I first encountered the oops, I did not have the >>>>> pciehp driver loaded and removing the device from the slot whilst the >>>>> laptop was powered on resulted in the oops. >>>> >>>> Hmm, that's unsafe and dangerous, because device now may be running. >>>> There are two ways to trigger pci hot-add or hot-remove in linux, after >>>> loaded pciehp or acpiphp module >>>> (the two modules only one can loaded into system at the same time). You >>>> can trigger hot-add/hot-remove by >>>> sysfs interface under /sys/bus/pci/slots/[slot-name]/power or attention >>>> button on hardware (if your laptop supports that). >>>> >>> >>> OK, thanks for the advice. >>> >>>>>>> >>>>>>>> The best would if you subscribe to linux-pci, and read my recent >>>>>>>> threads >>>>>>>> about similar issues I had with express cards with Dell Vostro 3550. 
>>>>>>>> Further, there is >>>>>>>> a lot of changes to PCI hotplug done by Yingahi Liu and Rafael >>>>>>>> Wysockij, just browse the >>>>>>>> archives of linux-pci and see the pacthes and the discussion. >>>>>>> >>>>>>> Those discussions are way above my level of knowledge. I guess all this >>>>>>> work will be merged into mainline in due course, so I'll watch for them >>>>>>> in 3.9 or later. Unless, of course, there is a tree I could clone and >>>>>>> help test the changes with my laptop and expresscard. >>>>>>> >>>>>>> Hotplug isn't working at all on my Fujitsu laptop, so I can only get >>>>>>> the card recognised by rebooting with the card inserted (or by writing >>>>>>> 1 to/sys/bus/pci/rescan). There seem to be a few reports on this in the >>>>>>> kernel bugzilla, so I'll look through them and see what's being done. >>>>>> >>>>>> Hi Chris, >>>>>> What about use #modprobe pciehp pciehp_debug=1 pciehp_poll_mode=1 >>>>>> pciehp_poll_time=1 ? >>>>>> >>>>>> Can you resend the dmesg log and "lspci -vvv" info after hotplug device >>>>>> from your Fujitsu laptop with above module parameters? >>>>>> >>>>> >>>>> I wasn't sure whether or not the pciehp driver should be loaded on its >>>>> own or with the acpiphp driver also loaded. So I built them both as >>>>> modules and planned to try both, pciehp only and acpiphp only. However, >>>>> I've found that acpiphp will not load (regardless of whether or not >>>>> pciehp is already loaded). What I get is: >>>>> >>>>> [chris:~]$ sudo modprobe acpiphp debug=1 >>>>> modprobe: ERROR: could not insert 'acpiphp': No such device >> >> Are you sure you had pciehp already loaded? >> > Yes, I'm sure it was. Ah, sorry, wanted t
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Chris Clayton wrote:
>
> On 01/27/13 12:18, Yijing Wang wrote:
>> On 2013-01-27 19:19, Chris Clayton wrote:
>>> Hi Yijing
>>>
>>> On 01/27/13 02:45, Yijing Wang wrote:
>>>> On 2013-01-27 4:54, Chris Clayton wrote:
>>>>> Hi Martin,
>>>>>
>>>>> On 01/24/13 19:21, Martin Mokrejs wrote:
>>>>>> Hi Chris,
>>>>>> try to include in kernel only acpiphp and omit pciehp. Don't use
>>>>>> modules but include them statically. And try, in addition, check
>>>>>> whether "pcie_aspm=off" in grub.conf helped.
>>>>>>
>>>>>
>>>>> Thanks for the tip. I had the pciehp driver installed, but it was a
>>>>> module and not loaded. I didn't have acpiphp enabled at all. Building
>>>>> them both in statically, appears to have papered over the cracks of the
>>>>> oops :-)
>>>>
>>>> Not loaded pciehp driver? Remove the device from this slot without
>>>> poweroff ?
>>>>
>>>
>>> That's correct. When I first encountered the oops, I did not have the
>>> pciehp driver loaded and removing the device from the slot whilst the
>>> laptop was powered on resulted in the oops.
>>
>> Hmm, that's unsafe and dangerous, because device now may be running.
>> There are two ways to trigger pci hot-add or hot-remove in linux, after
>> loaded pciehp or acpiphp module
>> (the two modules only one can loaded into system at the same time). You can
>> trigger hot-add/hot-remove by
>> sysfs interface under /sys/bus/pci/slots/[slot-name]/power or attention
>> button on hardware (if your laptop supports that).
>>
>
> OK, thanks for the advice.
>
>>>>>
>>>>>> The best would if you subscribe to linux-pci, and read my recent threads
>>>>>> about similar issues I had with express cards with Dell Vostro 3550.
>>>>>> Further, there is a lot of changes to PCI hotplug done by Yinghai Lu
>>>>>> and Rafael Wysocki, just browse the archives of linux-pci and see the
>>>>>> patches and the discussion.
>>>>>
>>>>> Those discussions are way above my level of knowledge. I guess all this
>>>>> work will be merged into mainline in due course, so I'll watch for them
>>>>> in 3.9 or later. Unless, of course, there is a tree I could clone and
>>>>> help test the changes with my laptop and expresscard.
>>>>>
>>>>> Hotplug isn't working at all on my Fujitsu laptop, so I can only get the
>>>>> card recognised by rebooting with the card inserted (or by writing 1
>>>>> to /sys/bus/pci/rescan). There seem to be a few reports on this in the
>>>>> kernel bugzilla, so I'll look through them and see what's being done.
>>>>
>>>> Hi Chris,
>>>> What about use #modprobe pciehp pciehp_debug=1 pciehp_poll_mode=1
>>>> pciehp_poll_time=1 ?
>>>>
>>>> Can you resend the dmesg log and "lspci -vvv" info after hotplug device
>>>> from your Fujitsu laptop with above module parameters?
>>>>
>>>
>>> I wasn't sure whether or not the pciehp driver should be loaded on its own
>>> or with the acpiphp driver also loaded. So I built them both as modules and
>>> planned to try both, pciehp only and acpiphp only. However, I've found that
>>> acpiphp will not load (regardless of whether or not pciehp is already
>>> loaded). What I get is:
>>>
>>> [chris:~]$ sudo modprobe acpiphp debug=1
>>> modprobe: ERROR: could not insert 'acpiphp': No such device

Are you sure you had pciehp already loaded?

>>>
>>
>> Currently, If your hardware support pciehp native hotplug, acpiphp driver
>> will be rejected when loading it in system
>> (you can force loading it by add boot parameter pcie_aspm=off as Martin
>> said).
>>
>
> OK, thanks again for the advice. I've disabled the acpiphp driver.

Pity. For me, detection of an express card in the slot works only with acpiphp. With pciehp, PresDet is not updated properly upon removal/insertion, and sometimes, probably as a result of that, PresDet on the SltSta: line of lspci is not correct. So I moved away from pciehp. I have a Sandy Bridge based laptop, so I was hoping that with your i5-based laptop you would also have a good chance of getting rid of the pciehp issues.

Martin
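To see at a glance whether a hotplug driver has registered any slots, the sysfs directory mentioned above can be walked with a few lines of shell. This is only a sketch: the function name list_slots is hypothetical, and it assumes each slot is a subdirectory carrying a readable "power" attribute, as in the acpiphp listing elsewhere in this thread.

```shell
#!/bin/sh
# list_slots [SLOTS_DIR]
# Print "<slot-name> power=<value>" for every slot directory found.
# SLOTS_DIR defaults to /sys/bus/pci/slots; passing another directory makes
# the function usable on a copy of the tree. "list_slots" is a hypothetical
# helper name.
list_slots() {
    dir="${1:-/sys/bus/pci/slots}"
    for slot in "$dir"/*; do
        # With no slots the glob stays unexpanded; skip anything that is
        # not a directory.
        [ -d "$slot" ] || continue
        if [ -r "$slot/power" ]; then
            printf '%s power=%s\n' "${slot##*/}" "$(cat "$slot/power")"
        else
            printf '%s power=unknown\n' "${slot##*/}"
        fi
    done
}

# Usage (not executed here):
#   list_slots
```

No output means no slots are registered, which matches the "directories are still empty" symptom reported in this thread.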
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Chris Clayton wrote: On 01/27/13 14:26, Martin Mokrejs wrote: Chris Clayton wrote: On 01/27/13 12:18, Yijing Wang wrote: 于 2013-01-27 19:19, Chris Clayton 写道: Hi Yijing On 01/27/13 02:45, Yijing Wang wrote: 于 2013-01-27 4:54, Chris Clayton 写道: Hi Martin, On 01/24/13 19:21, Martin Mokrejs wrote: Hi Chris, try to include in kernel only acpiphp and omit pciehp. Don't use modules but include them statically. And try, in addition, check whether pcie_aspm=off in grub.conf helped. Thanks for the tip. I had the pciehp driver installed, but it was a module and not loaded. I didn't have acpiphp enabled at all. Building them both in statically, appears to have papered over the cracks of the oops :-) Not loaded pciehp driver? Remove the device from this slot without poweroff ? That's correct. When I first encountered the oops, I did not have the pciehp driver loaded and removing the device from the slot whilst the laptop was powered on resulted in the oops. Hmm, that's unsafe and dangerous, because device now may be running. There are two ways to trigger pci hot-add or hot-remove in linux, after loaded pciehp or acpiphp module (the two modules only one can loaded into system at the same time). You can trigger hot-add/hot-remove by sysfs interface under /sys/bus/pci/slots/[slot-name]/power or attention button on hardware (if your laptop supports that). OK, thanks for the advice. The best would if you subscribe to linux-pci, and read my recent threads about similar issues I had with express cards with Dell Vostro 3550. Further, there is a lot of changes to PCI hotplug done by Yingahi Liu and Rafael Wysockij, just browse the archives of linux-pci and see the pacthes and the discussion. Those discussions are way above my level of knowledge. I guess all this work will be merged into mainline in due course, so I'll watch for them in 3.9 or later. Unless, of course, there is a tree I could clone and help test the changes with my laptop and expresscard. 
Hotplug isn't working at all on my Fujitsu laptop, so I can only get the card recognised by rebooting with the card inserted (or by writing 1 to/sys/bus/pci/rescan). There seem to be a few reports on this in the kernel bugzilla, so I'll look through them and see what's being done. Hi Chris, What about use #modprobe pciehp pciehp_debug=1 pciehp_poll_mode=1 pciehp_poll_time=1 ? Can you resend the dmesg log and lspci -vvv info after hotplug device from your Fujitsu laptop with above module parameters? I wasn't sure whether or not the pciehp driver should be loaded on its own or with the acpiphp driver also loaded. So I built them both as modules and planned to try both, pciehp only and acpiphp only. However, I've found that acpiphp will not load (regardless of whether or not pciehp is already loaded). What I get is: [chris:~]$ sudo modprobe acpiphp debug=1 modprobe: ERROR: could not insert 'acpiphp': No such device Are you sure you had pciehp already loaded? Yes, I'm sure it was. Ah, sorry, wanted to say Are you sure you had NOT pciehp already loaded (loaded before)?. If you retry without loading it ever you might succeed with acpiphp. Currently, If your hardware support pciehp native hotplug, acpiphp driver will be rejected when loading it in system (you can force loading it by add boot parameter pcie_aspm=off as Martin said). OK, thanks again for the advice. I've disabled the acpiphp driver. Pitty. For me only with acpiphp works detection of express card in the slot. With pciehp the PresDet is not updated properly upon removal/insertion and sometimes, probably as a result of the previous, PresDet on the SltSta: line of lspci is not correct. So I moved away from pciehp. I have a SandyBridge based laptop so I was hoping with your i5-based laptop you have also great chance to get rid of pciehp issues. I've just (very carefully) set this up again (i.e. no pciehp driver (module or builtin), acpiphp driver built in and pcie_aspm=off on the kernel command line (via grub). 
My card is not detected on insertion. :-( Do you have any other express card around to try if it works at all? Try that always after a cold boot. Posting a diff result of the below procedure might help: # lspci -vvvxxx lspci.before_insertion.txt [plug your card into the slot] # lspci -vvvxxx lspci.after_insertion.txt [ unplug your card] # lspci -vvvxxx lspci.after_1st_removal.txt [re-plug your card into the slot] # lspci -vvvxxx lspci.after_1st_re-insertion.txt [ unplug your card] # lspci -vvvxxx lspci.after_2nd_removal.txt Then compare them using diff. These should have no difference: diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt These may have only little difference, or none: diff lspci.before_insertion.txt lspci.after_1st_removal.txt diff
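When comparing the saved lspci snapshots from the procedure above, the presence-detect flag can be pulled out directly instead of reading full diffs. A minimal sketch, assuming each snapshot file contains at least one "SltSta:" line as printed by lspci -vvv; the function name presdet_state is hypothetical, and only the first SltSta: line in the file is examined.

```shell
#!/bin/sh
# presdet_state FILE
# Print "PresDet+" or "PresDet-" as found on the first "SltSta:" line of
# FILE, where FILE holds saved "lspci -vvv" output. Prints nothing if no
# such line exists. "presdet_state" is a hypothetical helper name.
presdet_state() {
    grep 'SltSta:' "$1" | head -n 1 | sed -n 's/.*\(PresDet[+-]\).*/\1/p'
}

# Usage with the snapshot names from the procedure (not executed here):
#   presdet_state lspci.before_insertion.txt
#   presdet_state lspci.after_insertion.txt
```

If a snapshot covers several ports, only the first slot is reported; restricting the capture to the port in question with lspci -s and that port's address avoids the ambiguity.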
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Chris,

Chris Clayton wrote:
> Thanks again, Martin. Firstly, maybe we should remove the linux-media list
> from the copy list. I imagine this hotplug stuff is just noise to them.
>
> [snip]
>> Do you have any other express card around to try if it works at all? Try
>> that always after a cold boot.
>
> Not at the moment, but I ordered a USB3 expresscard yesterday, so I will
> have one soon.
>
>> Posting a diff result of the below procedure might help:
>>
>> # lspci -vvvxxx > lspci.before_insertion.txt
>> [plug your card into the slot]
>> # lspci -vvvxxx > lspci.after_insertion.txt
>> [unplug your card]
>> # lspci -vvvxxx > lspci.after_1st_removal.txt
>> [re-plug your card into the slot]
>> # lspci -vvvxxx > lspci.after_1st_re-insertion.txt
>> [unplug your card]
>> # lspci -vvvxxx > lspci.after_2nd_removal.txt
>
> OK, I've been using 3.8.0-rc kernels so far, but given that is still under
> development, I've switched to 3.7.4, mainly because you are having success
> with 3.7.x, acpiphp and pcie_aspm=off. I verified the environment as follows:
>
> [chris:~]$ cat /proc/cmdline
> root=/dev/sda5 pcie_aspm=off ro resume=/dev/sda6
> [chris:~]$ dmesg | grep ASPM
> [0.00] PCIe ASPM is disabled
> [0.348959] pci:00: ACPI _OSC support notification failed, disabling PCIe ASPM
> [chris:~]$ dmesg | grep acpiphp
> [0.400846] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [chris:~]$ dmesg | grep pciehp
> [chris:~]$ uname -a
> Linux laptop 3.7.4 #13 SMP PREEMPT Sun Jan 27 18:39:39 GMT 2013 i686 GNU/Linux

vostro ~ # cat /proc/cmdline
root=/dev/sda5 pciehp.pciehp_debug=1 slub_debug=AFPZ pcie_aspm=off
vostro ~ # dmesg | grep ASPM
[0.00] PCIe ASPM is disabled
vostro ~ # dmesg | grep acpiphp
[2.449038] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[2.453757] acpiphp: Slot [1] registered
vostro ~ # uname -a
Linux vostro 3.7.4-default #2 SMP Mon Jan 21 22:45:22 MET 2013 x86_64 Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz GenuineIntel GNU/Linux
vostro ~ #

>> Then compare them using diff. These should have no difference:
>> diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
>> diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt
>
> Correct, there were no differences.
>
>> These may have only little difference, or none:
>> diff lspci.before_insertion.txt lspci.after_1st_removal.txt
>
> 263c263
> < LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 1us, L1 16us
> ---
> > LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 512ns, L1 16us
> 265c265
> < LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
> ---
> > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
> 267c267
> < LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> ---
> > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> 273c273
> < Changed: MRL- PresDet- LinkState-
> ---
> > Changed: MRL- PresDet- LinkState+
> 295,296c295,296
> < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04
> < 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00
> ---
> > 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
> > 50: 40 00 11 50 60 b2 1c 00 08 00 00 01 00 00 00 00
>
>> diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt
>
> No difference.
>
>> Finally, these should confirm whether the PresDet works for you (for me NOT
>> with pciehp but does work with acpiphp). You should see PresDet- to PresDet+
>> changes in:
>
> Yes, I do see the PresDet- to PresDet+ changes
>
>> diff lspci.before_insertion.txt lspci.after_insertion.txt
>
> 263c263
> < LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 1us, L1 16us
> ---
> > LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 512ns, L1 16us
> 265c265
> < LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
> ---
> > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
> 267c267
> < LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> ---
> > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
> 272,273c272,273
> < SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
> < Changed: MRL- PresDet- LinkState-
> ---
> > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> > Changed: MRL- PresDet- LinkState+
> 295,296c295,296
> < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04
> < 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00
> ---
> > 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
> > 50: 40 00 11 70 60 b2 1c 00 08 00 40 01 00 00 00 00
>
>> diff lspci.after_1st_removal.txt lspci.after_1st_re-insertion.txt
>
> 267c267
> < LnkSta: Speed 2.5GT/s, Width x1,
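The PresDet check discussed in this message can also be done without reading the whole diff by eye. The following is only a minimal sketch in Python, assuming the lspci dumps were saved to the file names used in the procedure above and that each slot's state appears on a single "SltSta:" line; the function names are made up for illustration and are not part of any existing tool.

```python
import re

# Matches the presence-detect flag on a "SltSta:" line of `lspci -vvv` output.
# '.' does not match newlines, so the "Changed: ... PresDet-" continuation
# line that follows SltSta is deliberately not picked up.
SLTSTA_RE = re.compile(r"SltSta:.*?PresDet([+-])")


def presence_detect_states(lspci_text):
    """Return the PresDet flag ('+' or '-') of every SltSta line, in order."""
    return SLTSTA_RE.findall(lspci_text)


def changed_slots(before_text, after_text):
    """List (index, before, after) for every slot whose PresDet flag changed.

    The index is simply the position of the SltSta line in the dump
    (hypothetical numbering, not a PCI address).
    """
    before = presence_detect_states(before_text)
    after = presence_detect_states(after_text)
    return [(i, b, a) for i, (b, a) in enumerate(zip(before, after)) if b != a]


if __name__ == "__main__":
    # File names follow the procedure quoted in the message.
    with open("lspci.before_insertion.txt") as f:
        before = f.read()
    with open("lspci.after_insertion.txt") as f:
        after = f.read()
    for index, old, new in changed_slots(before, after):
        print(f"slot status #{index}: PresDet{old} -> PresDet{new}")
```

If hotplug detection works, comparing the before-insertion dump with the after-insertion dump should report one '-' to '+' transition; an empty result would match the behaviour described for pciehp on the Vostro.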
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Chris,

Chris Clayton wrote:
> Hi Martin,
>
> On 01/24/13 19:21, Martin Mokrejs wrote:
>> Hi Chris,
>>   try to include in the kernel only acpiphp and omit pciehp. Don't use
>> modules but include them statically. In addition, check whether
>> "pcie_aspm=off" in grub.conf helps.
>>
>
> Thanks for the tip. I had the pciehp driver installed, but it was a module
> and not loaded. I didn't have acpiphp enabled at all. Building them both in
> statically appears to have papered over the cracks of the oops :-)
>
>>   The best would be if you subscribed to linux-pci and read my recent
>> threads about similar issues I had with express cards on a Dell Vostro 3550.
>> Further, there are a lot of changes to PCI hotplug being done by Yinghai Lu
>> and Rafael Wysocki; just browse the archives of linux-pci and see the
>> patches and the discussion.
>
> Those discussions are way above my level of knowledge. I guess all this work
> will be merged into mainline in due course, so I'll watch for them in 3.9 or
> later. Unless, of course, there is a tree I could clone and help test the
> changes with my laptop and expresscard.
>
> Hotplug isn't working at all on my Fujitsu laptop, so I can only get the card
> recognised by rebooting with the card inserted (or by writing 1 to
> /sys/bus/pci/rescan). There seem to be a few reports on this in the kernel
> bugzilla, so I'll look through them and see what's being done.

That's what I suspected. Compile acpiphp in statically, with no pciehp at all
(not even as a module). Then it might work for you -- at least it does for me,
provided I use "pcie_aspm=off".

Martin

>
> Thanks again.
>
> Chris
>
>> Martin
>>
>> Chris Clayton wrote:
>>> Hi,
>>>
>>> I've today taken delivery of a WinTV-HVR-1400 expresscard TV Tuner and got
>>> an Oops when I removed it from the expresscard slot in my laptop. I will
>>> quite understand if the response to this report is "don't do that!", but in
>>> that case, how should one remove one of these cards?
>>>
>>> I have attached three files:
>>>
>>> 1. the dmesg output from when I rebooted the machine after the oops. I have
>>> turned debugging on in the dib7000p and cx23885 modules via module options
>>> in /etc/modprobe.d/hvr1400.conf;
>>>
>>> 2. the .config file for the kernel that oopsed.
>>>
>>> 3. the text of the oops message. I've typed this up from a photograph of
>>> the screen because the laptop was locked up and there was nothing in the
>>> log files. Apologies for any typos, but I have tried to be careful.
>>>
>>> Assuming the answer isn't "don't do that", let me know if I can provide any
>>> additional diagnostics, test any patches, etc. Please, however, cc me as
>>> I'm not subscribed.
>>>
>>> Chris
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
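The setup recommended in this message (acpiphp built in, no pciehp at all, pcie_aspm=off on the command line) can be checked mechanically. What follows is only a sketch in Python under stated assumptions: it assumes that acpiphp and pciehp correspond to the Kconfig symbols CONFIG_HOTPLUG_PCI_ACPI and CONFIG_HOTPLUG_PCI_PCIE, and the function name is hypothetical.

```python
def hotplug_config_issues(config_text, cmdline):
    """Return a list of deviations from the setup suggested in the thread.

    config_text: contents of the kernel .config file
    cmdline:     contents of /proc/cmdline
    The symbol names below are an assumption about which Kconfig options
    build acpiphp and pciehp; verify them against your kernel tree.
    """
    opts = {}
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines start with '#' and are skipped.
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            opts[key] = value

    issues = []
    if opts.get("CONFIG_HOTPLUG_PCI_ACPI") != "y":
        issues.append("acpiphp is not built in (CONFIG_HOTPLUG_PCI_ACPI != y)")
    if "CONFIG_HOTPLUG_PCI_PCIE" in opts:
        issues.append("pciehp is enabled (CONFIG_HOTPLUG_PCI_PCIE is set)")
    if "pcie_aspm=off" not in cmdline.split():
        issues.append("pcie_aspm=off is missing from the kernel command line")
    return issues


if __name__ == "__main__":
    with open(".config") as f:
        config_text = f.read()
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    for issue in hotplug_config_issues(config_text, cmdline):
        print("warning:", issue)
```

An empty list means the configuration matches what was reported to work on the Vostro 3550; it says nothing about whether the same setup helps on other hardware.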
Re: [PATCH] USB: XHCI: fix memory leak of URB-private data
Greg KH wrote:
> On Thu, Jan 24, 2013 at 10:53:25PM +0100, Martin Mokrejs wrote:
>> Hi Sarah and Alan,
>>   I just saw the 3.7.5 patches announced by Greg but I don't see this patch
>> in there. I don't know, but maybe this applies to older stable kernels as
>> well? Where was this patch posted originally, to linux-usb land?
>>
>> Ah, is that because the email was actually NOT sent to "stable@"? ;-)
>
> No. It's because the patch isn't in Linus's tree yet, which is one of
> the requirements for a patch to be able to get into the stable kernel
> releases.
>
> Please read the kernel file, Documentation/stable_kernel_rules.txt for
> more details if you are curious.

Thank you Greg! I still do not know how serious a memleak is and whether it is
eligible for -stable, but other than that, it was helpful to read the file.
Will see what happens. Meanwhile I will continue running my patched kernel. ;-)

Martin
Re: [PATCH] USB: XHCI: fix memory leak of URB-private data
Hi Sarah and Alan,
  I just saw the 3.7.5 patches announced by Greg but I don't see this patch in
there. I don't know, but maybe this applies to older stable kernels as well?
Where was this patch posted originally, to linux-usb land?

Ah, is that because the email was actually NOT sent to "stable@"? ;-)

Date: Thu, 17 Jan 2013 10:32:16 -0500 (EST)
From: Alan Stern <st...@rowland.harvard.edu>
To: Sarah Sharp <sarah.a.sh...@linux.intel.com>
cc: Martin Mokrejs <mmokr...@fold.natur.cuni.cz>, USB list <linux-...@vger.kernel.org>
Subject: [PATCH] USB: XHCI: fix memory leak of URB-private data
Message-ID: <pine.lnx.4.44l0.1301171031260.1339-100...@iolanthe.rowland.org>

Thank you,
Martin

Alan Stern wrote:
> This patch (as1640) fixes a memory leak in xhci-hcd. The urb_priv
> data structure isn't always deallocated in the handle_tx_event()
> routine for non-control transfers. The patch adds a kfree() call so
> that all paths end up freeing the memory properly.
>
> Signed-off-by: Alan Stern <st...@rowland.harvard.edu>
> Reported-and-tested-by: Martin Mokrejs <mmokr...@fold.natur.cuni.cz>
> CC: <sta...@vger.kernel.org>
>
> ---
>
>  drivers/usb/host/xhci-ring.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> Index: usb-3.7/drivers/usb/host/xhci-ring.c
> ===
> --- usb-3.7.orig/drivers/usb/host/xhci-ring.c
> +++ usb-3.7/drivers/usb/host/xhci-ring.c
> @@ -2580,6 +2580,8 @@ cleanup:
>  					(trb_comp_code != COMP_STALL &&
>  						trb_comp_code != COMP_BABBLE))
>  				xhci_urb_free_priv(xhci, urb_priv);
> +			else
> +				kfree(urb_priv);
>  
>  			usb_hcd_unlink_urb_from_ep(bus_to_hcd(urb->dev->bus), urb);
>  			if ((urb->actual_length != urb->transfer_buffer_length &&
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Chris,
  try to include in the kernel only acpiphp and omit pciehp. Don't use modules
but include them statically. In addition, check whether "pcie_aspm=off" in
grub.conf helps.

  The best would be if you subscribed to linux-pci and read my recent threads
about similar issues I had with express cards on a Dell Vostro 3550. Further,
there are a lot of changes to PCI hotplug being done by Yinghai Lu and Rafael
Wysocki; just browse the archives of linux-pci and see the patches and the
discussion.

Martin

Chris Clayton wrote:
> Hi,
>
> I've today taken delivery of a WinTV-HVR-1400 expresscard TV Tuner and got an
> Oops when I removed it from the expresscard slot in my laptop. I will quite
> understand if the response to this report is "don't do that!", but in that
> case, how should one remove one of these cards?
>
> I have attached three files:
>
> 1. the dmesg output from when I rebooted the machine after the oops. I have
> turned debugging on in the dib7000p and cx23885 modules via module options in
> /etc/modprobe.d/hvr1400.conf;
>
> 2. the .config file for the kernel that oopsed.
>
> 3. the text of the oops message. I've typed this up from a photograph of the
> screen because the laptop was locked up and there was nothing in the log
> files. Apologies for any typos, but I have tried to be careful.
>
> Assuming the answer isn't "don't do that", let me know if I can provide any
> additional diagnostics, test any patches, etc. Please, however, cc me as I'm
> not subscribed.
>
> Chris
linux-3.7.4: kmemleak in sctp_sysctl_net_register()?
Hi,
  today I got the following report from the kernel; it looks like it happened
when I started/used/quit the chromium browser. I haven't seen this with 3.7.1,
but I have been using the built-in kmemleak detector for only 2-3 weeks.

unreferenced object 0x880402d08000 (size 2048):
  comm "chrome_sandbox", pid 18437, jiffies 4310887172 (age 9097.630s)
  hex dump (first 32 bytes):
    b2 68 89 81 ff ff ff ff 20 04 04 f8 01 88 ff ff  .h.. ...
    04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00
  backtrace:
    [815b4aad] kmemleak_alloc+0x21/0x3e
    [81110352] slab_post_alloc_hook+0x28/0x2a
    [81113fad] __kmalloc_track_caller+0xf1/0x104
    [810f10c2] kmemdup+0x1b/0x30
    [81571e9f] sctp_sysctl_net_register+0x1f/0x72
    [8155d305] sctp_net_init+0x100/0x39f
    [814ad53c] ops_init+0xc6/0xf5
    [814ad5b7] setup_net+0x4c/0xd0
    [814ada5e] copy_net_ns+0x6d/0xd6
    [810938b1] create_new_namespaces+0xd7/0x147
    [810939f4] copy_namespaces+0x63/0x99
    [81076733] copy_process+0xa65/0x1233
    [81077030] do_fork+0x10b/0x271
    [8100a0e9] sys_clone+0x23/0x25
    [815dda73] stub_clone+0x13/0x20
    [] 0x

Please let me know if you need more info, like dmesg, .config or other. Hope
this helps.

Martin
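When several such reports pile up, it helps to pull out the first frame that is not part of the allocator itself, since that is usually the caller worth naming in a report subject. The following is only a sketch in Python; the set of "allocator" frames is an assumption drawn from the backtraces posted in these threads, not an official list, and the function names are invented for illustration.

```python
import re

# One backtrace frame: "[<address>] symbol+0xoff/0xsize". The bracketed part
# may be empty or truncated, as in archived copies of these reports.
FRAME_RE = re.compile(r"\[[^\]]*\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Frames that only show *how* memory was allocated (assumed list, taken from
# the backtraces in this thread).
ALLOCATOR_FRAMES = {
    "kmemleak_alloc",
    "slab_post_alloc_hook",
    "__kmalloc",
    "__kmalloc_track_caller",
    "kmemdup",
}


def backtrace_functions(report):
    """Return the symbol names of all frames in a kmemleak report, in order."""
    return FRAME_RE.findall(report)


def likely_leak_site(report):
    """Return the first frame that is not an allocator helper, or None."""
    for name in backtrace_functions(report):
        if name in ALLOCATOR_FRAMES or name.startswith("kzalloc"):
            continue
        return name
    return None
```

Applied to the report above, this would point at sctp_sysctl_net_register, which matches the function named in the subject line.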
Re: linux-3.7.[1,4]: kmemleak in i801_probe
Hi Jean,

Jean Delvare wrote:
> Hi Martin,
>
> On Wed, 23 Jan 2013 12:15:37 +0100, Martin Mokrejs wrote:
>> Hi,
>>   I already reported this to lkml recently with linux-3.7.1 but this is to
>> let you know that with 3.7.4 I am still getting this kmemleak reported by
>> the kernel.
>
> I don't read LKML.
>
>> unreferenced object 0x88040b614690 (size 256):
>>   comm "swapper/0", pid 1, jiffies 4294937573 (age 133834.550s)
>>   hex dump (first 32 bytes):
>>     00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .N..
>>     ff ff ff ff ff ff ff ff 08 7f 5d 82 ff ff ff ff  ..].
>>   backtrace:
>>     [815b4aad] kmemleak_alloc+0x21/0x3e
>>     [81110352] slab_post_alloc_hook+0x28/0x2a
>>     [8111288a] __kmalloc+0xf2/0x104
>>     [81305165] kzalloc.constprop.14+0xe/0x10
>>     [813055c6] device_private_init+0x14/0x63
>>     [813076a0] dev_set_drvdata+0x19/0x2f
>>     [815c4f5e] i801_probe+0x5e/0x451
>>     [81280e40] local_pci_probe+0x39/0x61
>>     [81281f53] pci_device_probe+0xc6/0xf3
>>     [81307c5d] driver_probe_device+0xa9/0x1c1
>>     [81307dcf] __driver_attach+0x5a/0x7e
>>     [8130650a] bus_for_each_dev+0x57/0x83
>>     [81307806] driver_attach+0x19/0x1b
>>     [813073d8] bus_add_driver+0xa8/0x1fa
>>     [81308241] driver_register+0x8c/0x106
>>     [81281b4e] __pci_register_driver+0x59/0x5d
>
> I am using the i2c-i801 driver, enabled kmemleak, but I don't get this
> leak. Did you have to do anything special to get it? Didn't you get a

Based on the dmesg timestamp I think I just logged in through xdm. Eh.
Actually, xdm crashes for me, so I have to do the following in the framebuffer
VT console:

root # /etc/init.d/xdm stop
user $ startx

and happily use my X. I have a bug report open at
https://bugs.freedesktop.org/show_bug.cgi?id=56608
but I doubt it is related to the i2c-i801 driver. But it is not clear why I
cannot just use xdm but can always start X11 via startx. And actually, rarely
(1/20 attempts?), without reinstalling my kernel or X11 server or drivers, I
can log in through xdm. But comparing Xorg.log files from successful xdm logins
against those from unsuccessful ones did not help so far. Only reordered items,
probably due to autoconfig.

So, I don't think it helps you with isolating the i2c-i801 driver memleak.

> similar leak with older kernels? Do you get a similar leak (with
> reference to dev_set_drvdata)?

With 3.7.1 I was getting the same stacktrace:

unreferenced object 0x88040b1c5230 (size 256):
  comm "swapper/0", pid 1, jiffies 4294937570 (age 182492.630s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .N..
    ff ff ff ff ff ff ff ff 38 3f 5d 82 ff ff ff ff  8?].
  backtrace:
    [815b1dbd] kmemleak_alloc+0x21/0x3e
    [81110536] slab_post_alloc_hook+0x28/0x2a
    [81112a6e] __kmalloc+0xf2/0x104
    [81302bd5] kzalloc.constprop.14+0xe/0x10
    [81303036] device_private_init+0x14/0x63
    [81305110] dev_set_drvdata+0x19/0x2f
    [815c1ed4] i801_probe+0x5e/0x451
    [81280fb3] local_pci_probe+0x5b/0xa2
    [81282074] pci_device_probe+0xc8/0xf7
    [813056cd] driver_probe_device+0xa9/0x1c1
    [8130583f] __driver_attach+0x5a/0x7e
    [81303f7a] bus_for_each_dev+0x57/0x83
    [81305276] driver_attach+0x19/0x1b
    [81304e48] bus_add_driver+0xa8/0x1fa
    [81305cb1] driver_register+0x8c/0x106
    [81281c6d] __pci_register_driver+0x5a/0x5e

Before 3.7.1 I did not use the kmemleak detector. While searching my older
emails/reports I found only that I loaded both drivers in the past (on a
2.6.32.59 kernel):

Mar 26 11:21:55 vostro kernel: i801_smbus :00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
Mar 26 11:21:55 vostro kernel: ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

And here is the relevant line from lspci from that time:

00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)

00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
        Subsystem: Dell Device 04b3
        Flags: medium devsel, IRQ 18
        Memory at f7f05000 (64-bit, non-prefetchable) [size=256]
        I/O ports at f040 [size=32]
        Kernel modules: i2c-i801

I don't think this will help you now. :(

>
> I can see that dev_set_drvdata may allocate memory (which I didn't
> know) and I admit I don't see where it gets released, however this is
> all happening in the driver core and isn't specific to the i2c-i801
> driver, so if there really is a leak there, you should see it in all
> drivers.

I am not a kernel developer at all, but maybe it is a small hint that the
kmemleak was reported when I was plugging and unplugging my external USB
drives?

Martin
Re: Use of memmap= to forcibly recover memory in 3GB-4GB range - is this safe?
Yinghai Lu wrote:
> On Wed, Jan 16, 2013 at 4:24 PM, Alex Villacís Lasso
> <a_villa...@palosanto.com> wrote:
>> On 16/01/13 02:11, Yinghai Lu wrote:
>>
>>> On Tue, Jan 15, 2013 at 5:47 PM, Alex Villacís Lasso
>>> <a_villa...@palosanto.com> wrote:
>>>> [0.00] e820: BIOS-provided physical RAM map:
>>>> [0.00] BIOS-e820: [mem 0x-0x0009f3ff] usable
>>>> [0.00] BIOS-e820: [mem 0x0009f400-0x0009] reserved
>>>> [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
>>>> [0.00] BIOS-e820: [mem 0x0010-0xcf58] usable
>>>> [0.00] BIOS-e820: [mem 0xcf59-0xcf5e2fff] ACPI NVS
>>>> [0.00] BIOS-e820: [mem 0xcf5e3000-0xcf5e] ACPI data
>>>> [0.00] BIOS-e820: [mem 0xcf5f-0xcf5f] reserved
>>>> [0.00] BIOS-e820: [mem 0xe000-0xefff] reserved
>>>> [0.00] BIOS-e820: [mem 0xfec0-0x] reserved
>>>> [0.00] NX (Execute Disable) protection: active
>>>
>>> ..
>>>> [0.00] original variable MTRRs
>>>> [0.00] reg 0, base: 4GB, range: 512MB, type WB
>>>> [0.00] reg 1, base: 4608MB, range: 256MB, type WB
>>>> [0.00] reg 2, base: 0GB, range: 2GB, type WB
>>>> [0.00] reg 3, base: 2GB, range: 1GB, type WB
>>>> [0.00] reg 4, base: 3GB, range: 256MB, type WB
>>>> [0.00] reg 5, base: 3319MB, range: 1MB, type UC
>>>> [0.00] reg 6, base: 3320MB, range: 8MB, type UC
>>>> [0.00] reg 7, base: 3318MB, range: 1MB, type UC
>>>> [0.00] total RAM covered: 4086M
>>>
>>> Can you apply attached debug patch to see if the raw e820 is right from
>>> BIOS ?
>
>> Done. The output is attached. I see no difference between raw and sanitized
>> maps.
>
> yeah, it is BIOS problem.
>
> you may either live with memmap= or try to get one BIOS update.

Hi Yinghai,
  wouldn't it be useful for others to include this patch in the kernel? It
might help someone else. Provided the maps are printed only when extra debug is
enabled in the kernel, I don't think it hurts. Right?

  Actually, if it could check for differences automatically and print a
warning, it would be even better.

Martin
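The automatic check suggested above can be tried out in userspace on saved dmesg output before anyone touches the kernel. This is only a sketch in Python and not the debug patch discussed in the thread: it assumes both maps are available as lines of the form "[mem 0xSTART-0xEND] type" with complete addresses, and the function names are hypothetical.

```python
import re

# Matches one map entry such as "BIOS-e820: [mem 0x...-0x...] usable".
E820_RE = re.compile(r"\[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\]\s+(.+?)\s*$")


def parse_e820(lines):
    """Parse map lines into (start, end, type) tuples; other lines are ignored."""
    entries = []
    for line in lines:
        match = E820_RE.search(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            entries.append((start, end, match.group(3)))
    return entries


def e820_differences(raw_lines, sanitized_lines):
    """Return (only_in_raw, only_in_sanitized) as sorted lists of entries."""
    raw = set(parse_e820(raw_lines))
    sanitized = set(parse_e820(sanitized_lines))
    return sorted(raw - sanitized), sorted(sanitized - raw)


def warn_if_different(raw_lines, sanitized_lines):
    """Print a warning for every entry present in only one of the two maps."""
    only_raw, only_sanitized = e820_differences(raw_lines, sanitized_lines)
    for start, end, kind in only_raw:
        print(f"warning: raw-only entry [mem {start:#x}-{end:#x}] {kind}")
    for start, end, kind in only_sanitized:
        print(f"warning: sanitized-only entry [mem {start:#x}-{end:#x}] {kind}")
    return not (only_raw or only_sanitized)
```

In the case reported in this thread the two maps were identical, so such a check would stay silent and the missing memory would still have to be blamed on the BIOS.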
Re: [PATCH v3 3/6] ACPI/pci_slot: update PCI slot information when PCI hotplug event happens
Hi,
I just hit this thread in my bloated Inbox.

Rafael J. Wysocki wrote:
> On Thursday, January 10, 2013 03:03:53 PM Yinghai Lu wrote:
>> On Thu, Jan 10, 2013 at 1:50 PM, Rafael J. Wysocki wrote:
>>> Well, I don't see what functional problems that can bring.
>>>
>>> In theory people may want to have them as modules to avoid loading them on
>>> systems that don't use PCI hotplug, but honestly I think that the complexity
>>> this causes us to deal with is not worth it.
>>>
>>> Moreover, removing the modularity may actually allow us to solve some
>>> ordering issues once and for good.
>>
>> No, the world is not really ideal yet.
>>
>> looks like laptops have problem with pci express cards.
>>
>> when pciehp is used, surprise insert/removal does not work because
>> PresDect does not change properly, so no interrupt is generated.
>> --- i suspects that is silicon problem.

That's what seemed to be the conclusion half a year ago, around 3.2.x/3.3.x,
for my issues as well (SandyBridge C6/C200 chipset).

>>
>> but when acpiphp is used, that surprise insert/removal is working.

That's what I discovered a few days ago as well. However, there are still some
differences between individual express cards and I just need to find some time
to dig through the data I collected.

>>
>> some laptop like thinkpad, just don't give osc to kernel..
>> [0.505117] pci:00: Requesting ACPI _OSC control (0x1d)
>> [0.505413] pci:00: ACPI _OSC request failed (AE_SUPPORT),
>> returned control mask: 0x0d
>> [0.505517] ACPI _OSC control for PCIe not granted, disabling ASPM
>>
>> and other laptop give that to kernel, in recent kernel will not give
>> acpiphp to have that slot, because it want to hold that for pciehp.
>> poor user have to pass 'pci_aspm=off" to disable _OSC for all.
>> --- please check the mail that i forward to you yesterday.
>
> Yes, this is a bug, but I'm not sure how to fix it yet.

Looks like what I see with Dell Vostro 3550 as well.

>
>> Anyway, we do need to let the user to have choice to use acpiphp and pciehp.
>> and it should be first come and first serve policy.
>
> And that's why you think they should be modules? I disagree if so.

For me it is easier to cold boot with a card plugged in and fiddle with
hotplug later if I want to unload the card. Until then, I can inspect whether
PresDet really reports that the card is in, and then see whether the system
reports the same after loading acpiphp or pciehp.

I wouldn't drop the possibility to have them as modules, at least for now,
when we finally have some clue what is going on and can load the modules as we
want while chasing the bugs.

But sorry for hijacking this thread, maybe I managed to delete your answers on
my thread ("Re: Dell Vostro 3550: pci_hotplug+acpiphp require
'pcie_aspm=force' on kernel command-line for hotplug to work"). I will go
through the web archives to make sure I did not miss something.

Martin
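One way to do the check described above (does the slot's Presence Detect bit agree with what is physically plugged in) is to look at the `SltSta:` line that `lspci -vv` prints for a PCIe port. This is a minimal sketch under assumptions: the helper name is made up and the sample line was typed by hand in the usual lspci layout, not captured from the Vostro 3550.

```python
import re


def presence_detect(lspci_vv_text):
    """Return True/False for the first PresDet+/PresDet- flag found, else None."""
    m = re.search(r"\bPresDet([+-])", lspci_vv_text)
    if m is None:
        return None
    return m.group(1) == "+"


# Illustrative slot status line (hand-written sample).
SAMPLE = "SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-"

print(presence_detect(SAMPLE))  # prints: True
```

Running it once before and once after loading acpiphp or pciehp, on the text for the same port, would show whether the reported presence state changes.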
Re: linux-3.7.1: OOPS in page_lock_anon_vma
Hugh Dickins wrote:
> On Sun, 6 Jan 2013, Martin Mokrejs wrote:
>
>> I was running 3.7.1 kernel quite fine for a while but I realized that it is
>> slow and that I should go and drop useless kernel drivers from my kernel.
>> I have a SandyBridge-based laptop and I found that I gain speed while
>> setting CONFIG_NO_HZ=y, CONFIG_PREEMPT_NONE=y, removing multicore scheduler,
>> asking configurator to set maximum amount of CPUs for my system (and not
>> blindly specifying 4 for my dual-core i7 processor).
>> Further I get faster system while removing IOMMU and DMA redirects while it
>> still emulates NUMA. And, I switched away from CFQ scheduler to deadline and
>> from SLAB to SLUB.
>> Finally, to make sure my CPU cores do not go back and forth between C0 and
>> C7 states and shutdown dynamically the 2 hyperthreaded cores. So I have
>> really only two, physical cores accessible. With performance CPU governor I
>> have 1/2 of context switches and both cores can be saturated by whatever
>> jobs (kernel compile or some computational jobs). It was not possible to get
>> the CPU running at turbo speed for a long while as it always went down time
>> to time. With ondemand governor I had cores in C7 for 50-70% of the time,
>> that was a bit better with performance governor but having the two
>> hyperthreaded cores disabled reduced the context switches by half,
>> rescheduling interrupts went down by several orders of magnitude. So it is
>> crunching at max turbo speed on both cores, temp about 80 °C.
>>
>> I think none of the changes relates to the kernel crash directly but I had
>> not a single crash with 3.7.1 for few weeks. After the tweaks I had 3-4
>> crashes this afternoon. The system always locked up so I could not see
>> anything. Luckily, be it actually the same crash or not, now my X11 screen
>> was dropped and to my framebuffer console and I got to see a kernel
>> stacktrace. Here is the first, fished out from /var/log/messages upon next
>> bootup:
>>
>> Jan 6 22:37:29 vostro kernel: [ 7663.251110] general protection fault: [#1] SMP
>> Jan 6 22:37:29 vostro kernel: [ 7663.251135] Modules linked in: i915 fbcon bitblit cfbfillrect softcursor cfbimgblt i2c_algo_bit font cfbcopyarea drm_kms_helper drm fb iwldvm iwlwifi fbdev sata_sil24
>> Jan 6 22:37:29 vostro kernel: [ 7663.251197] CPU 1
>> Jan 6 22:37:29 vostro kernel: [ 7663.251206] Pid: 795, comm: kswapd0 Not tainted 3.7.1-default #22 Dell Inc. Vostro 3550/
>> Jan 6 22:37:29 vostro kernel: [ 7663.251229] RIP: 0010:[815d3dee] [815d3dee] mutex_trylock+0xb/0x26
>> Jan 6 22:37:29 vostro kernel: [ 7663.251257] RSP: 0018:88040d25bbb8 EFLAGS: 00010246
>> Jan 6 22:37:29 vostro kernel: [ 7663.251273] RAX: 0001 RBX: 88040bfdc000 RCX: 88040d25bce8
>> Jan 6 22:37:29 vostro kernel: [ 7663.251293] RDX: RSI: RDI: 0720072007200728
>> Jan 6 22:37:29 vostro kernel: [ 7663.251313] RBP: 88040d25bbb8 R08: dead00200200 R09: dead00100100
>> Jan 6 22:37:29 vostro kernel: [ 7663.251333] R10: 88040d25bc38 R11: 8804078acec0 R12: 88040bfdc001
>> Jan 6 22:37:29 vostro kernel: [ 7663.251354] R13: ea0010137440 R14: 0720072007200728 R15: 0001
>> Jan 6 22:37:29 vostro kernel: [ 7663.251374] FS: () GS:88041fa8() knlGS:
>> Jan 6 22:37:29 vostro kernel: [ 7663.251396] CS: 0010 DS: ES: CR0: 80050033
>> Jan 6 22:37:29 vostro kernel: [ 7663.251413] CR2: 2b876c545978 CR3: 018f6000 CR4: 000407e0
>> Jan 6 22:37:29 vostro kernel: [ 7663.251432] DR0: DR1: DR2:
>> Jan 6 22:37:29 vostro kernel: [ 7663.251452] DR3: DR6: 0ff0 DR7: 0400
>> Jan 6 22:37:29 vostro kernel: [ 7663.251472] Process kswapd0 (pid: 795, threadinfo 88040d25a000, task 88040d07ce30)
>> Jan 6 22:37:29 vostro kernel: [ 7663.251494] Stack:
>> Jan 6 22:37:29 vostro kernel: [ 7663.251501] 88040d25bbe8 810f6994 ea0010137440
>> Jan 6 22:37:29 vostro kernel: [ 7663.251527] 88040d25bde8 88041fddad00 88040d25bc58 810f6b9e
>> Jan 6 22:37:29 vostro kernel: [ 7663.251551] 8804046d2dc0 810dee97 88040d25bce8
>> Jan 6 22:37:29 vostro kernel: [ 7663.251576] Call Trace:
>> Jan 6 22:37:29 vostro kernel: [ 7663.251587] [810f6994] page_lock_anon_vma+0x40/0xaf
>> Jan 6 22:37:29 vostro kernel: [ 7663.251605] [810f6b9e] page_referenced+0x78/0x1b7
>> Jan 6 22:37:29 vostro kernel: [ 7663.251623] [810e026a] shrink_active_list+0x209/0x305
>> Jan 6 22:37:29 vostro kernel: [ 7663.251641] [810e1269] kswapd+0x3fe/0x8ea
>> Jan 6 22:37:29 vostro kernel: [ 7663.251658] [81091697] ? wake_up_bit+0x25/0x25
>> Jan 6 22:37:29 vostro kernel: [ 7663.251675] [810e0e6b] ? try_to_free_pages+0x8c/0x8c
>> Jan 6 22:37:29 vostro kernel
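The tuning described in the message above includes taking the two hyperthreaded siblings offline. On Linux the set of CPUs that are currently online is exposed in `/sys/devices/system/cpu/online` as a range list such as `0-1`. A small sketch for checking the result after such a change follows; the helper names are made up and the example strings are illustrative, not read from the reported machine.

```python
def parse_cpulist(text):
    """Expand a kernel CPU list string such as '0-1,4' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def online_cpus(path="/sys/devices/system/cpu/online"):
    """Read and expand the list of online CPUs (Linux only)."""
    with open(path) as f:
        return parse_cpulist(f.read())


print(parse_cpulist("0-1"))    # prints: [0, 1]
print(parse_cpulist("0,2-3"))  # prints: [0, 2, 3]
```

Which logical CPU numbers are hyperthread siblings of each other is not visible from this file alone, so the sketch only confirms how many CPUs stayed online, not which ones.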
3.7.1: BUG filp (Not tainted): Poison overwritten
Hi,
today I received the following.

[ 124.927854] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[ 124.987250] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 124.992228] pci_bus :11: dev 00, created physical slot 1
[ 124.992448] acpiphp: Slot [1] registered
[ 233.258244] =
[ 233.258247] BUG filp (Not tainted): Poison overwritten
[ 233.258248] -
[ 233.258248] Disabling lock debugging due to kernel taint
[ 233.258250] INFO: 0x88040102-0x88040102001d. First byte 0x20 instead of 0x6b
[ 233.258253] INFO: Slab 0xea0010040800 objects=21 used=21 fp=0x (null) flags=0x204080
[ 233.258254] INFO: Object 0x88040102 @offset=0 fp=0x880401021e00
[ 233.258255] Object 88040102: 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07 . . . . . . . .
[ 233.258256] Object 880401020010: 20 07 20 07 20 07 20 07 20 07 20 07 20 07 6b 6b . . . . . . .kk
[ 233.258257] Object 880401020020: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258258] Object 880401020030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258259] Object 880401020040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258260] Object 880401020050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258260] Object 880401020060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258261] Object 880401020070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258262] Object 880401020080: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258263] Object 880401020090: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258264] Object 8804010200a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258265] Object 8804010200b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258265] Object 8804010200c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258266] Object 8804010200d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258267] Object 8804010200e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258268] Object 8804010200f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258269] Object 880401020100: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258269] Object 880401020110: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[ 233.258270] Object 880401020120: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkk.
[ 233.258271] Redzone 880401020130: bb bb bb bb bb bb bb bb
[ 233.258272] Padding 880401020140: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 233.258273] Padding 880401020150: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 233.258274] Padding 880401020160: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 233.258275] Padding 880401020170: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[ 233.258277] Pid: 4440, comm: lspci Tainted: G B 3.7.1-default #30
[ 233.258277] Call Trace:
[ 233.258283] [8111085b] ? print_section+0x38/0x3a
[ 233.258285] [81110d19] print_trailer+0x105/0x10e
[ 233.258287] [81110fe9] check_bytes_and_report+0xac/0xe5
[ 233.258290] [80e1] check_object+0xbf/0x1ad
[ 233.258291] [897f] ? check_slab+0xaf/0xbd
[ 233.258294] [81119b04] ? get_empty_filp+0x6f/0x155
[ 233.258297] [815d2a31] alloc_debug_processing+0x61/0xed
[ 233.258299] [815d34dd] __slab_alloc+0x344/0x3ba
[ 233.258301] [81119b04] ? get_empty_filp+0x6f/0x155
[ 233.258303] [8100536b] ? print_context_stack+0xa2/0xbe
[ 233.258305] [81119b04] ? get_empty_filp+0x6f/0x155
[ 233.258307] [81119b04] ? get_empty_filp+0x6f/0x155
[ 233.258309] [81112f50] kmem_cache_alloc+0x50/0xb6
[ 233.258310] [81119b04] get_empty_filp+0x6f/0x155
[ 233.258313] [81123e4b] path_openat+0x35/0x313
[ 233.258315] [8112440b] do_filp_open+0x33/0x81
[ 233.258317] [815d9b93] ? _raw_spin_unlock+0x23/0x27
[ 233.258320] [8112e4cb] ? __alloc_fd+0xe4/0xf6
[ 233.258322] [81118403] do_sys_open+0x68/0xfa
[ 233.258323] [811184b1] sys_open+0x1c/0x1e
[ 233.258325] [815da756] system_call_fastpath+0x1a/0x1f
[ 233.258327] FIX filp: Restoring 0x88040102-0x88040102001d=0x6b
[ 233.258327] FIX filp: Marking all objects used

If you need my .config or the full dmesg, please let me know, and please Cc:
me, ideally.

Martin
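The report above boils down to a freed `filp` object whose poison pattern was overwritten. With SLUB debugging, a freed object is filled with 0x6b and its last byte is 0xa5, which is why the dump is expected to be all `6b` up to a final `a5`. The sketch below locates the overwritten range in such a dump. The function name is made up, and the byte string is rebuilt by hand from the hex dump in the report (30 bytes of `20 07`, then poison, 0x130 bytes in total).

```python
POISON_FREE = 0x6B  # fill byte SLUB writes into freed objects
POISON_END = 0xA5   # last byte of a poisoned object


def find_poison_overwrite(obj):
    """Return (first, last) offsets of bytes that break the poison pattern, or None."""
    expected = bytes([POISON_FREE]) * (len(obj) - 1) + bytes([POISON_END])
    bad = [i for i, (got, want) in enumerate(zip(obj, expected)) if got != want]
    if not bad:
        return None
    return bad[0], bad[-1]


# Object contents rebuilt from the dump: offsets 0x00-0x1d hold "20 07" pairs.
OBJ = (
    bytes([0x20, 0x07] * 15)
    + bytes([POISON_FREE]) * (0x130 - 30 - 1)
    + bytes([POISON_END])
)

first, last = find_poison_overwrite(OBJ)
print(hex(first), hex(last))  # prints: 0x0 0x1d
```

The result matches the range in the `INFO:` line of the report, whose end address finishes in `001d`.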
Re: linux-3.7.1: OOPS in page_lock_anon_vma
Hi Hillf,
thank you for your answer on this, albeit I am not sure I understood your
point well.

Hillf Danton wrote:
> Hello Martin
>
> On Mon, Jan 7, 2013 at 6:59 AM, Martin Mokrejs mmokr...@fold.natur.cuni.cz wrote:
>> time to time. With ondemand governor I had cores in C7 for 50-70% of the
>> time, that was a bit better with performance governor but having the two
>> hyperthreaded cores disabled reduced the context switches by half,
>> rescheduling interrupts went down by several orders of magnitude. So it is
>> crunching at max turbo speed on both cores, temp about 80 °C.
>>
> Your boxen could be used to cook pizza, and check the
> recommended working temperature in the manual please.

I meant CPU temperature, not environment temperature. ;-) This is a laptop
with a dual-core i7.

# dmesg | grep -i temp
[2.233856] coretemp coretemp.0: TjMax is 100 degrees C
[2.233882] coretemp coretemp.0: TjMax is 100 degrees C
#

I am a bit worried about whether I really disabled the 2 hyperthreaded cores
(cpu2 and cpu3). Per the stats below, it looks like I inadvertently disabled
the second core and its hyperthreaded sibling? Or why are the counters not
updated for CPU1 below?

# cat /proc/interrupts
           CPU0       CPU1
  0:         30          0   IO-APIC-edge      timer
  1:         15          0   IO-APIC-edge      i8042
  8:         33          0   IO-APIC-edge      rtc0
  9:          2          0   IO-APIC-fasteoi   acpi
 12:        241          0   IO-APIC-edge      i8042
 16:         50          0   IO-APIC-fasteoi   ehci_hcd:usb1
 19:     464445          0   IO-APIC-fasteoi   sata_sil24
 23:      17324          0   IO-APIC-fasteoi   ehci_hcd:usb2
 40:          0          0   PCI-MSI-edge      pciehp
 41:         14          0   PCI-MSI-edge      mei
 42:     137666          0   PCI-MSI-edge      ahci
 43:      13901          0   PCI-MSI-edge      eth0
 44:      36022          0   PCI-MSI-edge      xhci_hcd
 45:          0          0   PCI-MSI-edge      xhci_hcd
 46:          0          0   PCI-MSI-edge      xhci_hcd
 47:          0          0   PCI-MSI-edge      xhci_hcd
 48:          0          0   PCI-MSI-edge      xhci_hcd
 49:        810          0   PCI-MSI-edge      snd_hda_intel
 50:          1          0   PCI-MSI-edge      iwlwifi
 51:        461          0   PCI-MSI-edge      i915
NMI:       6496       6111   Non-maskable interrupts
LOC:     526765     521983   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:       6496       6111   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RTR:          2          0   APIC ICR read retries
RES:     197262     220079   Rescheduling interrupts
CAL:         33     299572   Function call interrupts
TLB:       3302      19119   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         20         20   Machine check polls
ERR:          0
MIS:          0
#

i7z reports at the moment:

Cpu speed from cpuinfo 2793.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 2793 MHz
CPU Multiplier 28x || Bus clock frequency (BCLK) 99.75 MHz

Socket [0] - [physical cores=2, logical cores=2, max online cores ever=2]
TURBO ENABLED on 2 Cores, Hyper Threading OFF
Max Frequency without considering Turbo 2892.75 MHz (99.75 x [29])
Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is 35x/33x/33x/33x
Real Current Frequency 3291.75 MHz [99.75 x 33.00] (Max of below)
Core [core-id] :Actual Freq (Mult.)   C0%  Halt(C1)%  C3 %  C6 %  C7 %  Temp
Core 1 [0]:     3291.75 (33.00x)      100  0          0     0     0     87
Core 2 [1]:     3291.75 (33.00x)      100  0          0     0     0     81

# cat /proc/schedstat
version 15
timestamp 4295525245
cpu0 0 0 4348066 350860 2727228 2499580 4026361745866 2434254688153 3965236
domain0 3 25687 19018 2642 7492049 4293 7 0 19018 22219 21471 43 1338108 709 0 0 21471 342087 288140 40648 58479699 14067 33 5 288135 0 0 0 0 0 0 0 0 0 223256 24270 0
cpu1 0 0 4297136 324961 2565709 2361951 3810969849763 2437183692947 3941014
domain0 3 24296 17512 2837 7768706 4218 15 1 17511 22994 22053 48 1636623 896 0 0 22053 313125 260913 38828 58232101 14403 37 2 260911 0 0 0 0 0 0 0 0 0 198332 23230 0

# cat /proc/sched_debug
Sched Debug Version: v0.10, 3.7.1-default #24
ktime : 5888049.840626
sched_clk : 5878999.320221
cpu_clk : 5878999.320272
jiffies : 4295526100
sched_clock_stable : 1

sysctl_sched
.sysctl_sched_latency: 12.00
.sysctl_sched_min_granularity: 1.50
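For the question above about CPU1's counters, it can help to total the numbered (device) IRQ rows of `/proc/interrupts` per CPU and look at them separately from the per-CPU rows. This is a sketch that assumes the usual layout, a header of CPU names followed by `IRQ: count ... description` rows. The function name is hypothetical and the sample holds a few rows copied from the listing in the message. In that listing the LOC and RES rows do advance on CPU1, which would suggest CPU1 is online and the device interrupts are simply all being delivered to CPU0.

```python
def irq_totals(text):
    """Sum the counts of numbered (device) IRQ rows per CPU."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header row, e.g. ['CPU0', 'CPU1']
    totals = dict.fromkeys(cpus, 0)
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        if not label.strip().isdigit():  # skip NMI, LOC, RES and similar rows
            continue
        counts = rest.split()[:len(cpus)]
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals


# A few rows copied from the /proc/interrupts listing in the message.
SAMPLE = """\
           CPU0       CPU1
 19:     464445          0   IO-APIC-fasteoi   sata_sil24
 42:     137666          0   PCI-MSI-edge      ahci
LOC:     526765     521983   Local timer interrupts
"""

print(irq_totals(SAMPLE))  # prints: {'CPU0': 602111, 'CPU1': 0}
```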
Re: [QUESTION ON BUG] the rcu stall issue could not be reproduced
Hi,
  I see a few more RCU bugs reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=43028
https://bugzilla.kernel.org/show_bug.cgi?id=40092
https://bugzilla.kernel.org/show_bug.cgi?id=42997

And I placed my previous long email with logs at
https://bugzilla.kernel.org/show_bug.cgi?id=45091

I hope this helps eventually.
Martin

Mike Galbraith wrote:
> On Fri, 2012-07-20 at 11:09 +0800, Michael Wang wrote:
>> Hi, Mike, Martin, Dan
>>
>> I'm currently taking an eye on the rcu stall issue which was reported by
>> you in the mail:
>>
>> rcu: endless stalls
>>     From: Mike Galbraith
>> linux-3.4-rc7: rcu_sched self-detected stall on CPU
>>     From: Martin Mokrejs
>> RCU stalls in linux-next
>>     From: Dan Carpenter
>>
>> I try to reproduce the issue on my X86 server with 12 cpu
>
> The 'endless stalls' box was 341.3 times larger.  Dunno if you can
> even set a serial port slow enough to approximate all cores trying to
> gripe through a single pinhole simultaneously.
>
> -Mike
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc1 killed my ext3 filesystem cleanly unmounted
On Fri, May 18, 2007 at 04:20:39PM +0200, Jesper Juhl wrote:
> On 18/05/07, Martin Mokrejs <[EMAIL PROTECTED]> wrote:
> > Hi,
> >   I just tried the 2.6.22-rc1 candidate to test whether some bug I have
> > hit in the past still exists. I did use 2.6.20.6 so far. So, I have
> > cleanly rebooted to use the new kernel, after the machine came up I
> > tried to mess with the bug, and had to reboot again to play with kernel
> > commandline parameters. Unfortunately, on the next reboot fsck was
> > scheduled on my filesystem after 38 clean mounts. :( And the problem
> > started. The fsck found some unused inodes, but probably did not know
> > where they belong, so it deleted them automagically. Finally, the
> > fsck died because it could not find some '..' entry.
> >
>
> How do you know that the corruption was caused by 2.6.21-rc1 ?
> Isn't it possible that the corruption was created by an earlier
> kernel, but only detected when a forced fsck was run - which just
> happened to be while you were running 2.6.21-rc1 ...
>
> My point is that, as far as I can see, there's nothing tying
> 2.6.21-rc1 specifically to this corruption... or?

You might be right, but I thought the kernel was the more probable cause,
as that is what I have changed recently. ;) Or maybe someone can at least
say "No, no changes to be considered between 2.6.20.6 and 2.6.22-rc1." ;)
Martin
Re: 2.6.22-rc1 killed my ext3 filesystem cleanly unmounted
On Fri, May 18, 2007 at 07:38:18PM +0530, Kalpak Shah wrote:
> On Fri, 2007-05-18 at 15:51 +0200, Martin Mokrejs wrote:
> > On Fri, May 18, 2007 at 05:17:06PM +0530, Kalpak Shah wrote:
> > > On Fri, 2007-05-18 at 11:06 +0200, Martin Mokrejs wrote:
> > > > Hi,
> > > >   I just tried the 2.6.22-rc1 candidate to test whether some bug I have
> > > > hit in the past still exists. I did use 2.6.20.6 so far. So, I have
> > > > cleanly rebooted to use the new kernel, after the machine came up I
> > > > tried to mess with the bug, and had to reboot again to play with kernel
> > > > commandline parameters. Unfortunately, on the next reboot fsck was
> > > > scheduled on my filesystem after 38 clean mounts. :( And the problem
> > > > started. The fsck found some unused inodes, but probably did not know
> > > > where they belong, so it deleted them automagically. Finally, the
> > > > fsck died because it could not find some '..' entry.
> > > >
> > > > /dev/hda3: Entry '..' in .../??? (5701636) has deleted/unused inode
> > > > 5570561.  CLEARED.
> > > > Unconnected directory inode 5570567 (...)
> > > >
> > > > /dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> > > >         (i.e., without -a or -p options)
> > > >
> > >
> > > This means that e2fsck has reached a point where it needs user
> > > intervention. So you should not run e2fsck with -p, -a or -y options.
> > > Look up the e2fsck man page for more on this.
> >
> > Yeah, stupid init.d script in Gentoo. I will report at Gentoo as well but
> > how can I revert the changes? Can you say which directories were affected?
>
> No there is nothing wrong with your script, most problems get solved by
> -a or -p and hence your init.d script is correct in using these options.
>
> I don't understand what you mean by reverting your changes.

I would like to boot with another, previous, tested kernel and run another,
stable fsck version. Yes, I cannot say how it happened that ext3 had a broken
directory, but for sure before making changes to the filesystem I would
boot with a tested kernel and tools.

>
> An unconnected directory inode means that this directory (inode 5570567)
> does not have a valid ".." entry (which is the backpointer to its
> parent). So this directory will be moved to lost+found.

And those original "errors"? Didn't those modifications cause this in turn?

/dev/hda3 has been mounted 38 times without being checked, check forced
HTREE directory inode 1163319 has an invalid root node.
HTREE INDEX CLEARED
Entry '..' in .../??? (5570587) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5570620) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5570625) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5570567) has deleted/unused inode 5570561.  CLEARED.
[cut]

Martin
Re: 2.6.22-rc1 killed my ext3 filesystem cleanly unmounted
On Fri, May 18, 2007 at 05:17:06PM +0530, Kalpak Shah wrote:
> On Fri, 2007-05-18 at 11:06 +0200, Martin Mokrejs wrote:
> > Hi,
> >   I just tried the 2.6.22-rc1 candidate to test whether some bug I have
> > hit in the past still exists. I did use 2.6.20.6 so far. So, I have
> > cleanly rebooted to use the new kernel, after the machine came up I
> > tried to mess with the bug, and had to reboot again to play with kernel
> > commandline parameters. Unfortunately, on the next reboot fsck was
> > scheduled on my filesystem after 38 clean mounts. :( And the problem
> > started. The fsck found some unused inodes, but probably did not know
> > where they belong, so it deleted them automagically. Finally, the
> > fsck died because it could not find some '..' entry.
> >
> > /dev/hda3: Entry '..' in .../??? (5701636) has deleted/unused inode
> > 5570561.  CLEARED.
> > Unconnected directory inode 5570567 (...)
> >
> > /dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> >         (i.e., without -a or -p options)
> >
>
> This means that e2fsck has reached a point where it needs user
> intervention. So you should not run e2fsck with -p, -a or -y options.
> Look up the e2fsck man page for more on this.

Yeah, stupid init.d script in Gentoo. I will report it at Gentoo as well, but
how can I revert the changes? Can you say which directories were affected?
Thanks,
Martin
2.6.22-rc1 killed my ext3 filesystem cleanly unmounted
Hi,
  I just tried the 2.6.22-rc1 candidate to test whether some bug I have
hit in the past still exists. I did use 2.6.20.6 so far. So, I have
cleanly rebooted to use the new kernel, after the machine came up I
tried to mess with the bug, and had to reboot again to play with kernel
commandline parameters. Unfortunately, on the next reboot fsck was
scheduled on my filesystem after 38 clean mounts. :( And the problem
started. The fsck found some unused inodes, but probably did not know
where they belong, so it deleted them automagically. Finally, the
fsck died because it could not find some '..' entry.

Here is what happened, retyped from what my camera recorded. ;)

/dev/hda3 has been mounted 38 times without being checked, check forced
HTREE directory inode 1163319 has an invalid root node.
HTREE INDEX CLEARED
Entry '..' in .../??? (5570587) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5570620) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5570625) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5570567) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5570614) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5570603) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5586948) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5586957) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5701636) has deleted/unused inode 5570561.  CLEARED.
Unconnected directory inode 5570567 (...)

/dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)

Turning off the power, booting back into 2.6.20.6 and running (obviously)
the same fsck gives me:

/dev/hda3 contains a file system with errors, check forced.
Missing '..' in directory inode 5570587.

/dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)

What do you recommend I do now? I cannot say what the fsck version is,
but I can tell you this is a Gentoo Linux box on the ~x86 tree, so whatever
is in the "unstable" branch. :(

I do use the ext2/ext3 Windows driver from http://www.fs-driver.org/ to
access the filesystem. Even now, when the filesystem should be marked as
dirty, I can access it from Windows and see the files. Does the extfs.sys
ignore the mark? ;) Anyway, since that time there is a directory 'Recycled'
at the top level of the filesystem. ;-)

I do remember that recently one of the system packages in Gentoo possibly
installed some kind of a hash into the filesystem, or hashing support,
something like that. Sorry, I do not remember the details. I am just
thinking about what could have made the fsck think there is something wrong.

I think I would like to restore the filesystem to the state it was in
before the fsck ran. How can I do it? No, I do not have a backup of the
filesystem. :(

I subscribed to the email lists but please send me Cc: anyway.
Many thanks.
Martin
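
The question of which directories were touched can be partly answered from the fsck output itself. Below is a minimal Python sketch, assuming the messages have exactly the wording retyped in this report; LOG holds a few of those lines, and cleared_parent_links is a hypothetical helper name, not an e2fsprogs tool.

```python
import re
from collections import defaultdict

# A few lines copied from the fsck output in this report.
LOG = """\
Entry '..' in .../??? (5570587) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5570620) has deleted/unused inode 5570561.  CLEARED.
/dev/hda3: Entry '..' in .../??? (5701636) has deleted/unused inode 5570561.  CLEARED.
Unconnected directory inode 5570567 (...)
"""

# Group 1: inode of the directory whose '..' entry was cleared.
# Group 2: the deleted/unused inode that '..' used to point at.
PATTERN = re.compile(
    r"Entry '\.\.' in \S+ \((\d+)\) has deleted/unused inode (\d+)\."
)


def cleared_parent_links(log):
    """Map each deleted parent inode to the directory inodes that pointed at it."""
    by_parent = defaultdict(list)
    for directory, parent in PATTERN.findall(log):
        by_parent[int(parent)].append(int(directory))
    return dict(by_parent)


if __name__ == "__main__":
    for parent, children in cleared_parent_links(LOG).items():
        print(f"deleted parent inode {parent}: directory inodes {children}")
```

Every directory inode listed in the output had its '..' entry pointing at the same deleted inode, 5570561, so the affected directories were probably all children of one removed parent. This recovers inode numbers only, not path names.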
Re: init 0 stopped working
> I used to shut down my P4 machine based on ASUS P4C800E-Deluxe
> with a simple "init 0" command. That somehow broke between 2.6.12-rc6-git2
> and 2.6.13-rc1. The machine makes a sound like it is shutting down, but it
> immediately turns the power on again. I used acpi and the kernel
> configs should be almost identical in all cases, as I just recopy the
> previously used .config and run "make oldconfig".
>
> Any clues? It still happens even with 2.6.13-rc3-git2.

It was introduced after 2.6.12 but before or with 2.6.13-rc1.
It is not fixed by the acpi-20050708 patch for the 2.6.13 series.
I have tried with KEXEC both enabled and disabled, but the problem still persists.
init 0 stopped working
Hi,
  I used to shut down my P4 machine based on ASUS P4C800E-Deluxe
with a simple "init 0" command. That somehow broke between 2.6.12-rc6-git2
and 2.6.13-rc1. The machine makes a sound like it is shutting down, but it
immediately turns the power on again. I used acpi and the kernel
configs should be almost identical in all cases, as I just recopy the
previously used .config and run "make oldconfig".

Any clues? It still happens even with 2.6.13-rc3-git2.
Please Cc: me in replies.
Martin
Two 2.6.13-rc1 kernel crashes
Hi,
  I use Gentoo Linux on the i686 architecture with the XFS filesystem.
Recently the machine locked up on me 3 times, although at least once
sys-rq+b worked. Here is the log from the remote console. I don't remember
having such problems with 2.6.12-rc6-git2, which was my previous testing
kernel. The problems appear under heavy load when I compile/install some
packages and, maybe it's just a bad coincidence, when I move my USB
mouse in the fvwm2 environment. The machine locks.
Any clues? Please Cc: me in replies.
Martin

Linux version 2.6.13-rc1 ([EMAIL PROTECTED]) (gcc version 3.4.4 (Gentoo 3.4.4, ssp-3.4.4-1.0, pie-8.7.8)) #2 Mon Jul 4 01:13:46 CEST 2005
BIOS-provided physical RAM map:
 BIOS-e820: - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e8000 - 0010 (reserved)
 BIOS-e820: 0010 - bff3 (usable)
 BIOS-e820: bff3 - bff4 (ACPI data)
 BIOS-e820: bff4 - bfff (ACPI NVS)
 BIOS-e820: bfff - c000 (reserved)
 BIOS-e820: ffb8 - 0001 (reserved)
2175MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000ff780
DMI 2.3 present.
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] disabled)
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at c000 (gap: c000:3fb8)
Built 1 zonelists
Kernel command line: root=/dev/sda2 ide=reverse agp=try_unsupported console=ttyS0,57600n8 console=tty0 vga=792 idebus=66
ide_setup: ide=reverse : Enabled support for IDE inverse scan order.
ide_setup: idebus=66
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 3228.252 MHz processor.
Using tsc for high-res timesource
Console: colour dummy device 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 3112324k/3144896k available (2926k kernel code, 31420k reserved, 1612k data, 172k init, 2227392k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 6464.39 BogoMIPS (lpj=12928798)
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 09
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=3
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050309
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller :00:1f.1
PCI: Transparent bridge - :00:1e.0
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 *7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 10 *11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 12 devices
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
pnp: 00:08: ioport range 0x680-0x6ff has been reserved
pnp: 00:08: ioport range 0x290-0x297 has been reserved
Machine check exception polling timer started.
IA-32 Microcode Update Driver: v1.14 <[EMAIL PROTECTED]>
highmem bounce pool size: 64 pages
SGI XFS with no debug enabled
Initializing Cryptographic API
ACPI: PCI Interrupt :01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
radeonfb: Found Intel x86 BIOS ROM Image
radeonfb: Retreived PLL infos from BIOS
radeonfb: Reference=27.00 MHz
Re: find: /usr/src/linux-2.4.30/include/asm: Too many levels of symbolic links
DervishD wrote:
> > again I've hit some weird problem doing "make dep" for 2.4 kernel:
>
> Not a kernel problem but a findutils problem. Fixed in 4.2.19, but
> 4.2.20 was released recently. Upgrade.

You were right. Thanks!
M.
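
The error text in the subject is the standard message for the ELOOP error code. Below is a minimal Python sketch, assuming a Linux system, of how one could check whether a path such as include/asm really is a symlink loop or whether the tool reporting it is at fault (as turned out to be the case here). symlink_status is a hypothetical helper, and the temporary directory layout is invented for the demonstration.

```python
import errno
import os
import tempfile


def symlink_status(path):
    """Classify a path as 'ok', 'loop' (ELOOP) or 'dangling' (ENOENT)."""
    try:
        os.stat(path)  # follows symlinks, so a genuine loop raises ELOOP
    except OSError as exc:
        if exc.errno == errno.ELOOP:
            return "loop"
        if exc.errno == errno.ENOENT:
            return "dangling"
        raise
    return "ok"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tree:
        # A healthy link, like include/asm -> asm-i386 in a 2.4 tree.
        os.mkdir(os.path.join(tree, "asm-i386"))
        os.symlink("asm-i386", os.path.join(tree, "asm"))
        # A genuine loop: a -> b -> a.
        os.symlink("b", os.path.join(tree, "a"))
        os.symlink("a", os.path.join(tree, "b"))
        for name in ("asm", "a"):
            print(name, symlink_status(os.path.join(tree, name)))
```

If a check like this reports the link as fine while find still complains, that would point at the tool rather than the tree, which is consistent with the findutils upgrade resolving the problem.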