from:"Martin Mokrejs"

Re: [PATCH] power: supply: fix sbs-charger build, needs REGMAP_I2C

2021-01-17 Thread Martin Mokrejs

Hi Randy,
  thank you very much. I would not mind dropping my name but I tested the patch
now with 5.4.89 so you may actually also add

Tested-by: Martin Mokrejs 

The same build failure also happens with 5.10.7, which is probably obvious.

Thank you for the quick action.
Martin

On 16/01/2021 22:13, Randy Dunlap wrote:
> CHARGER_SBS should select REGMAP_I2C since it uses API(s) that are
> provided by that Kconfig symbol.
> 
> Fixes these errors:
> 
> ../drivers/power/supply/sbs-charger.c:149:21: error: variable ‘sbs_regmap’ 
> has initializer but incomplete type
>  static const struct regmap_config sbs_regmap = {
> ../drivers/power/supply/sbs-charger.c:150:3: error: ‘const struct 
> regmap_config’ has no member named ‘reg_bits’
>   .reg_bits = 8,
> ../drivers/power/supply/sbs-charger.c:155:23: error: ‘REGMAP_ENDIAN_LITTLE’ 
> undeclared here (not in a function)
>   .val_format_endian = REGMAP_ENDIAN_LITTLE, /* since based on SMBus */
> ../drivers/power/supply/sbs-charger.c: In function ‘sbs_probe’:
> ../drivers/power/supply/sbs-charger.c:183:17: error: implicit declaration of 
> function ‘devm_regmap_init_i2c’; did you mean ‘devm_request_irq’? 
> [-Werror=implicit-function-declaration]
>   chip->regmap = devm_regmap_init_i2c(client, &sbs_regmap);
> ../drivers/power/supply/sbs-charger.c: At top level:
> ../drivers/power/supply/sbs-charger.c:149:35: error: storage size of 
> ‘sbs_regmap’ isn’t known
>  static const struct regmap_config sbs_regmap = {
> 
> Fixes: feb583e37f8a ("power: supply: add sbs-charger driver")
> Signed-off-by: Randy Dunlap 
> Cc: Sebastian Reichel 
> Cc: linux...@vger.kernel.org
> Cc: Martin Mokrejs 
> Cc: Greg Kroah-Hartman 
> Cc: nicolassae...@gmail.com
> Cc: Nicolas Saenz Julienne 
> Cc: Rafael J. Wysocki 
> ---
> Martin, do you want Reported-by: on this?
> 
>  drivers/power/supply/Kconfig |1 +
>  1 file changed, 1 insertion(+)
> 
> --- linux-next-20210115.orig/drivers/power/supply/Kconfig
> +++ linux-next-20210115/drivers/power/supply/Kconfig
> @@ -229,6 +229,7 @@ config BATTERY_SBS
>  config CHARGER_SBS
>  	tristate "SBS Compliant charger"
>  	depends on I2C
> +	select REGMAP_I2C
>  	help
>  	  Say Y to include support for SBS compliant battery chargers.
>  
>

Re: [PATCH] i2c: i801: fix memleak on probe error

2013-12-23 Thread Martin Mokrejs


Thanks for the note, was just compiling a new 3.10.24 kernel to test it.
;-)

So far just booted an old 3.9 kernel and after plugging in an external
USB3 drive I got the message, just to be sure I am still able to reproduce
the error and that I have the right .config in the running kernel.

Will wait for another fix instead.
Martin

Peter Wu wrote:

Never mind this patch; it does not really fix the memleak, because
i2c_set_adapdata() calls dev_set_drvdata(), which allocates memory.
(I must have run kmemleak too early: right after boot it did not
give any warnings, now it does.)

RFC: what about dropping i2c_set_adapdata() from the probe function
and replacing i2c_get_adapdata(adapter) by
pci_get_drvdata(adapter->pci_dev) on top of this patch? I am not
sure what the purpose is for i2c_set_adapdata, hence this question.

Regards,
Peter

On Monday 23 December 2013 10:39:38 Peter Wu wrote:

The driver-specific data for i801 was only set for the device on
success, which led to a memory leak on error paths (for instance, when
there is a resource conflict with ACPI). (The driver core clears the
driver data, if set, when the probe routine fails.)

Fix it by setting the driver data right after successful memory
allocation, before reaching any error paths.

References: http://lkml.org/lkml/2013/1/23/191
Reported-by: Martin Mokrejs 
Tested-by: Peter Wu  [ACPI conflict error path]
Signed-off-by: Peter Wu 
---
Hi Jean,

This memleak issue is still present in v3.13-rc4-256-gb7000ad.
 From kmemleak:

unreferenced object 0x88022f501a00 (size 256):
   comm "systemd-udevd", pid 209, jiffies 4294896115 (age 2872.520s)
   hex dump (first 32 bytes):
 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .N..
 ff ff ff ff ff ff ff ff f4 e2 53 82 ff ff ff ff  ..S.
   backtrace:
 [815d29ce] kmemleak_alloc+0x4e/0xb0
 [8116ea5a] kmem_cache_alloc_trace+0xfa/0x1e0
 [813efc63] device_private_init+0x23/0x80
 [813f2b49] dev_set_drvdata+0x39/0x50
 [a0294539] i801_probe+0x59/0x528 [i2c_i801]
 [81332d95] local_pci_probe+0x45/0xa0
 [81333be9] pci_device_probe+0xd9/0x130
 [813f30e7] driver_probe_device+0x87/0x390
 [813f34c3] __driver_attach+0x93/0xa0
 [813f102b] bus_for_each_dev+0x6b/0xb0
 [813f2b0e] driver_attach+0x1e/0x20
 [813f26e8] bus_add_driver+0x188/0x260
 [813f3b04] driver_register+0x64/0xf0
 [81332930] __pci_register_driver+0x60/0x70
 [a02990af] 0xa02990af
 [81000312] do_one_initcall+0xf2/0x1a0

The dmesg for this laptop also contains a resource conflict message,
just like the reporter (Martin Mokrejs):

 [   15.409772] ACPI Warning: 0x1840-0x185f 
SystemIO conflicts with Region \_SB_.PCI0.SBUS.SMBI 1 (20131115/utaddress-251)
 [   15.413439] ACPI: If an ACPI driver is available for this device, you 
should use it instead of the native driver

With this patch applied on top of almost 3.13-rc5 (v3.13-rc4-256-gb7000ad),
the memleak is gone.

Regards,
Peter
---
  drivers/i2c/busses/i2c-i801.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
index 737e298..a7096bf 100644
--- a/drivers/i2c/busses/i2c-i801.c
+++ b/drivers/i2c/busses/i2c-i801.c
@@ -1117,6 +1117,7 @@ static int i801_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	if (!priv)
 		return -ENOMEM;
 
+	pci_set_drvdata(dev, priv);
 	i2c_set_adapdata(&priv->adapter, priv);
 	priv->adapter.owner = THIS_MODULE;
 	priv->adapter.class = i801_get_adapter_class(priv);
@@ -1236,8 +1237,6 @@ static int i801_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	/* We ignore errors - multiplexing is optional */
 	i801_add_mux(priv);
 
-	pci_set_drvdata(dev, priv);
-
 	return 0;
 
 exit_free_irq:





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [sched_delayed] sched: RT throttling activated

2013-08-23 Thread Martin Mokrejs

While you are probably thinking about the iwlwifi issue causing RT throttling
I have one more interesting followup below.

Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
>>> It means you have (a) real-time task(s) that consume significant amount
>>
>> How can I find them? 
> 
> ps -deo pid,cls,cmd | grep -e RR -e FF
> 
> Should do I suppose
> 
>> I don't think I need the RT, I have two CPU-bound
>> processes and want to run them at max speed. Rest of the system is 
>> unimportant.
>>
>> I still don't understand what the $subj message actually says. Does it say
>> the RT-requiring task was slowed down? I am a bit lost here.
> 
> Yeah, they were forcibly stopped from running for a little while.
> 
>>> of time. At some point we throttle them in an attempt to keep the system
>>> from falling over.
>>
>> Will I get companion "[sched_delayed] sched: RT throttling deactivated"
>> at some point?
> 
> Nope, you get that message once to tell you that we throttle RT tasks.
> 
>> Are python-based apps requiring the realtime features?
> 
> I'm fairly sure python could use the relevant scheduling classes, but I
> don't speak snake so I really wouldn't know.
> 
>> I used to get the messages below which are now gone with my CPU cooler being 
>> replaced yesterday:
>>
>> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled 
>> (total events = 153727)
> 
>> mcelog report in such cases:
>>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 1 THERMAL EVENT TSC 1bf82e2a146 
>> TIME 1375536062 Sat Aug  3 15:21:02 2013
>> Processor 1 heated above trip temperature. Throttling enabled.
>> Please check your system cooling. Performance will be impacted
>> STATUS 880003c3 MCGSTATUS 0
>> MCGCAP c07 APICID 2 SOCKETID 0 
>> CPUID Vendor Intel Family 6 Model 42
> 
> Right, those are thermal events throttling the speed of your CPU to keep
> the thing from heat damaging itself.
> 
>> While my CPU cooler got replaced even now I still get (hence this email 
>> thread):
>>
>> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 7ff67badff00 
>> sp 7fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
>> [44520.259205] [sched_delayed] sched: RT throttling activated
>> [48956.057816] blah.py[16623]: segfault at 2f ip 7fd462e5d046 sp 
>> 7fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
>> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 7fe255715f00 
>> sp 7fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
>> [49942.020084] blah.py[6950]: segfault at d0 ip 7f3e8a9acf9c sp 
>> 7fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
>> [66696.443342] blah.py[8015]: segfault at cf ip 7f798f708f9c sp 
>> 7fff420336e0 error 4 in libpython2.7.so.1.0[7f798f66+173000]
>> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 7f7b17a85f00 
>> sp 7fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
>> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 7fc54cd17f00 sp 
>> 7fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
>>
>>
>> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?
> 
> That of the past 1s, 0.95s were spent running RR/FIFO tasks. It is a
> warning that comes only once per boot and should prompt you to
> investigate.
> 
> You can turn the throttle off, but be advised that running a RR/FIFO
> task at 100% can (and generally does) negatively affect the running of
> your system (as in, these tasks can prevent system duties from taking
> place and eventually make the system come to a halt).
> 
> 
> As to those faults, investigate if your python prog does something
> particularly weird or your runtime is in order. Otherwise I would advise
> you to run memtest for a while to make sure your machine is in proper
> working order.

Hmm, meanwhile the core dumps filled up the /var/dumps/ directory on my /
filesystem. I do not know how long after boot that happened. I deleted some
files on the disk and thought I was done. Now, a few hours later, I noticed:

[85451.247130] traps: blah.py[30787] general protection ip:7faf7b57a046 
sp:7fffd9f7b1d0 error:0 in libpython2.7.so.1.0[7faf7b499000+173000]
[87125.493730] nr_pdflush_threads exported in /proc is scheduled for removal
[87125.494238] sysctl: The scan_unevictable_pages sysctl/node-interface has 
been disabled for lack of a legitimate use case.  If you have one, please send 
an email to linux...

Re: [sched_delayed] sched: RT throttling activated

2013-08-23 Thread Martin Mokrejs

Martin Mokrejs wrote:

>>>> Nope, you get that message once to tell you that we throttle RT tasks.
>>>
>>> I think the message could be improved to explain this is a warn ONCE
>>> message and that there is no "[sched_delayed] sched: RT throttling
>>> deactivated" counterpart message to be anticipated.
>>
>> Would something like: 
>>
>>   sched: [ONCE] RT throttle hit -- inspect system configuration.
>>
>> Be a better message?
> 
> Not really. I would prefer something like:
> 
> [sched_delayed] sched: stopped running $cmd on CPU%d in favor of RR/FIFO task 
> $psname

Actually, to retain the message text that appears in current kernels, so that
people searching (e.g. on Google) can find both the newer wording and possibly
this thread, something like this might be much better:

[sched_delayed] sched: RT throttling limit $d hit. Stopped running $cmd on 
CPU%d in favor of RR/FIFO task $psname. Will not issue any more of these
messages until reboot.

I know, looong line.

I just realized this is about some threshold value, and that you mean iwlwifi
contributed the largest increase compared to the other kernel threads on my
system.

sysctl -q -a |  grep -i limit

does not show the actual value. I am probably looking in the wrong place. ;-)
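
For reference, the threshold in question does not carry "limit" in its name,
which would explain why the grep comes up empty. Assuming a kernel that exposes
the usual scheduler sysctls as /proc/sys/kernel/sched_rt_period_us and
/proc/sys/kernel/sched_rt_runtime_us, the RT budget can be read roughly as in
the sketch below. The helper name rt_budget_percent is made up for
illustration; the "0.95s of the past 1s" figure quoted elsewhere in this thread
would correspond to the values 950000 and 1000000.

```shell
#!/bin/sh
# Sketch only: report the share of each period that RR/FIFO tasks may consume.
# rt_budget_percent is a hypothetical helper, not an existing tool.
rt_budget_percent() {
    period_us=$1
    runtime_us=$2
    # A runtime of -1 means RT throttling is switched off.
    if [ "$runtime_us" -lt 0 ]; then
        echo "unlimited"
        return 0
    fi
    echo $(( runtime_us * 100 / period_us ))
}

# Read the live values if the (assumed) sysctl files are present.
if [ -r /proc/sys/kernel/sched_rt_period_us ] &&
   [ -r /proc/sys/kernel/sched_rt_runtime_us ]; then
    period=$(cat /proc/sys/kernel/sched_rt_period_us)
    runtime=$(cat /proc/sys/kernel/sched_rt_runtime_us)
    echo "RT budget: $(rt_budget_percent "$period" "$runtime") (percent of each ${period} us period)"
fi
```

The same two values should also show up with "sysctl -a | grep sched_rt".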


Re: [sched_delayed] sched: RT throttling activated

2013-08-23 Thread Martin Mokrejs



Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 01:35:24PM +0200, Martin Mokrejs wrote:
> 
>> # ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['
> 
> This explicitly only lists kernel threads; from your other comment:
> 
>> The shell/python tasks have 'TS' in place of the FF value in the second 
>> column
>> so I guess they are not requiring realtime responsiveness.
> 
> I'll assume you actually inspected the other tasks and found none.

Yes, the other (false) matches were in the third or later columns, so I wanted
to match only the true matches and cut the rest. I admit this is not a
general-purpose regexp and is misleading.
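
A less fragile variant, as a sketch: compare the scheduling-class column itself
instead of grepping the whole line, so that "RR" or "FF" appearing inside a
command line cannot produce false matches, and user-space tasks are not
filtered out along with them. This assumes a procps-style ps that understands
the pid,cls,cmd format used above; the function name list_rt_tasks is invented
here.

```shell
#!/bin/sh
# Sketch: keep only rows whose second column (CLS) is exactly RR or FF.
# list_rt_tasks is a hypothetical helper; it filters "pid cls cmd" rows
# read from standard input.
list_rt_tasks() {
    awk '$2 == "RR" || $2 == "FF"'
}

# Typical use (kernel threads and user-space tasks alike):
#   ps -eo pid,cls,cmd | list_rt_tasks
```

Because the comparison is on the column value, the header row and the grep
process itself drop out without any extra pattern tricks.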

> 
>> 7  FF [migration/0]
>>10  FF [watchdog/0]
>>11  FF [watchdog/1]
>>12  FF [migration/1]
>>17  FF [migration/2]
>>22  FF [migration/3]
> 
> The 'migration' threads only look like FIFO threads but they're secretly
> not and don't count to the limit. The watchdog threads shouldn't run
> much either.
> 
>>  2161  FF [irq/50-iwlwifi]
> 
> Oh a threaded interrupt, I presume you're not using "threadirqs" since

Is this what you are talking about?

CONFIG_IRQ_FORCED_THREADING=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y

> this is the only interrupt thread around and I see a
> 'request_threaded_irq()' call in
> drivers/net/wireless/iwlwifi/pcie/trans.c
> 
> And wow, why would that thing consume that much cpu. 
> 
> Johill, ever seen the iwlwifi interrupt go 'funny' and consume gobs of
> cpu-time?

I am not sure if I understand you, but in case it helps somebody:

Current values:

# cat /proc/interrupts 
   CPU0   CPU1   
  0: 23  0   IO-APIC-edge  timer
  1: 42  0   IO-APIC-edge  i8042
  8: 36  0   IO-APIC-edge  rtc0
  9:  3  0   IO-APIC-fasteoi   acpi
 12: 404650  0   IO-APIC-edge  i8042
 16:109  0   IO-APIC-fasteoi   ehci_hcd:usb1
 23: 583646  0   IO-APIC-fasteoi   ehci_hcd:usb2
 40:  0  0   PCI-MSI-edge  pciehp
 41:  54319  0   PCI-MSI-edge  i915
 42: 553802  0   PCI-MSI-edge  ahci
 43:  0  0   PCI-MSI-edge  enp5s0
 44: 257268  0   PCI-MSI-edge  xhci_hcd
 45:  0  0   PCI-MSI-edge  xhci_hcd
 46:  0  0   PCI-MSI-edge  xhci_hcd
 47:  0  0   PCI-MSI-edge  xhci_hcd
 48:  0  0   PCI-MSI-edge  xhci_hcd
 49: 465462  0   PCI-MSI-edge  snd_hda_intel
 50:3895788  0   PCI-MSI-edge  iwlwifi
NMI:   8687   9483   Non-maskable interrupts
LOC:   17531664   16978131   Local timer interrupts
SPU:  0  0   Spurious interrupts
PMI:   8687   9483   Performance monitoring interrupts
IWI: 213009 205171   IRQ work interrupts
RTR:  3  0   APIC ICR read retries
RES:19226514491695   Rescheduling interrupts
CAL:  73741 348678   Function call interrupts
TLB:  98634 73   TLB shootdowns
TRM:  0  0   Thermal event interrupts
THR:  0  0   Threshold APIC interrupts
MCE:  0  0   Machine check exceptions
MCP:286286   Machine check polls
ERR:  0
MIS:  0
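
To rank the sources in a table like the one above without reading it by eye, a
small filter can sum the per-CPU columns and sort by the total. This is only a
sketch: top_irqs is a made-up helper name, and it assumes the usual
/proc/interrupts layout (a header row naming the CPUs, then one row per source
with one count per CPU followed by a description).

```shell
#!/bin/sh
# Sketch: print the N busiest rows of a /proc/interrupts style table
# (default N = 5) as "total label description".
# top_irqs is a hypothetical helper reading the table from standard input.
top_irqs() {
    awk '
        NR == 1 { ncpu = NF; next }      # header row: one field per CPU
        {
            label = $1
            sub(/:$/, "", label)         # "50:" becomes "50"
            total = 0
            # Sum up to ncpu numeric fields following the label.
            for (i = 2; i <= NF && i <= ncpu + 1; i++) {
                if ($i !~ /^[0-9]+$/) break
                total += $i
            }
            # Everything after the counts is the description.
            desc = ""
            for (; i <= NF; i++) desc = desc " " $i
            print total, label desc
        }' | sort -rn | head -n "${1:-5}"
}

# Typical use:
#   top_irqs 3 < /proc/interrupts
```

Running it twice a few seconds apart and comparing the totals gives a rough
idea of the interrupt rate per source.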

# ifconfig wlp9s0
wlp9s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
inet 192.168.0.24  netmask 255.255.255.0  broadcast 192.168.0.255
inet6 fe80::4e80:93ff:fe15:e6c7  prefixlen 64  scopeid 0x20<link>
ether 4c:80:93:15:e6:c7  txqueuelen 1000  (Ethernet)
RX packets 811806  bytes 992611146 (946.6 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 490006  bytes 71390887 (68.0 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# dmesg
...
[   11.789302] Intel(R) Wireless WiFi driver for Linux, in-tree:d
[   11.789310] Copyright(c) 2003-2013 Intel Corporation
[   11.791626] iwlwifi :09:00.0: irq 50 for MSI/MSI-X
[   12.044905] iwlwifi :09:00.0: loaded firmware version 18.168.6.1 op_mode 
iwldvm
[   13.896033] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEBUG enabled
[   13.896041] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEBUGFS disabled
[   13.896044] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEVICE_TRACING disabled
[   13.896047] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEVICE_TESTMODE disabled
[   13.896049] iwlwifi :09:00.0: CONFIG_IWLWIFI_P2P disabled
[   13.896054] iwlwifi :09:00.0: Detected Intel(R) Centrino(R) Wireless-N 
1030 BGN, REV=0xB0
[   13.896173] iwlwifi :09:00.0: L1 Disabled; Enabling L0S
[   13.917705] ieee80211 phy0: Selected rate control algorithm 'iwl-agn-rs'


> 
> 
>>> Nope, you get that message once to tell you that we throttle RT tasks.
>>
>> I think the message could be improved to explain this is a warn ONCE message and
>>

Re: [sched_delayed] sched: RT throttling activated

2013-08-23 Thread Martin Mokrejs



Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
>>> It means you have (a) real-time task(s) that consume significant amount
>>
>> How can I find them? 
> 
> ps -deo pid,cls,cmd | grep -e RR -e FF

# ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['
7  FF [migration/0]
   10  FF [watchdog/0]
   11  FF [watchdog/1]
   12  FF [migration/1]
   17  FF [migration/2]
   22  FF [migration/3]
 2161  FF [irq/50-iwlwifi]
#

The shell/python tasks have 'TS' in place of the FF value in the second column
so I guess they are not requiring realtime responsiveness.

> 
> Should do I suppose
> 
>> I don't think I need the RT, I have two CPU-bound
>> processes and want to run them at max speed. Rest of the system is 
>> unimportant.
>>
>> I still don't understand what the $subj message actually says. Does it say
>> the RT-requiring task was slowed down? I am a bit lost here.
> 
> Yeah, they were forcibly stopped from running for a little while.
> 
>>> of time. At some point we throttle them in an attempt to keep the system
>>> from falling over.
>>
>> Will I get companion "[sched_delayed] sched: RT throttling deactivated"
>> at some point?
> 
> Nope, you get that message once to tell you that we throttle RT tasks.

I think the message could be improved to explain that this is a warn ONCE
message and that there is no "[sched_delayed] sched: RT throttling deactivated"
counterpart message to be anticipated.

> 
>> Are python-based apps requiring the realtime features?
> 
> I'm fairly sure python could use the relevant scheduling classes, but I
> don't speak snake so I really wouldn't know.
> 
>> I used to get the messages below which are now gone with my CPU cooler being 
>> replaced yesterday:
>>
>> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled 
>> (total events = 153727)
> 
>> mcelog report in such cases:
>>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 1 THERMAL EVENT TSC 1bf82e2a146 
>> TIME 1375536062 Sat Aug  3 15:21:02 2013
>> Processor 1 heated above trip temperature. Throttling enabled.
>> Please check your system cooling. Performance will be impacted
>> STATUS 880003c3 MCGSTATUS 0
>> MCGCAP c07 APICID 2 SOCKETID 0 
>> CPUID Vendor Intel Family 6 Model 42
> 
> Right, those are thermal events throttling the speed of your CPU to keep
> the thing from heat damaging itself.
> 
>> While my CPU cooler got replaced even now I still get (hence this email 
>> thread):
>>
>> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 7ff67badff00 
>> sp 7fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
>> [44520.259205] [sched_delayed] sched: RT throttling activated
>> [48956.057816] blah.py[16623]: segfault at 2f ip 7fd462e5d046 sp 
>> 7fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
>> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 7fe255715f00 
>> sp 7fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
>> [49942.020084] blah.py[6950]: segfault at d0 ip 7f3e8a9acf9c sp 
>> 7fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
>> [66696.443342] blah.py[8015]: segfault at cf ip 7f798f708f9c sp 
>> 7fff420336e0 error 4 in libpython2.7.so.1.0[7f798f66+173000]
>> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 7f7b17a85f00 
>> sp 7fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
>> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 7fc54cd17f00 sp 
>> 7fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
>>
>>
>> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?
> 
> That of the past 1s, 0.95s were spent running RR/FIFO tasks. It is a
> warning that comes only once per boot and should prompt you to
> investigate.

Could the kernel itself log some equivalent of the
"ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['" command?

> 
> You can turn the throttle off, but be advised that running a RR/FIFO
> task at 100% can (and generally does) negatively affect the running of
> your system (as in, these tasks can prevent system duties from taking
> place and eventually make the system come to a halt).

Given that I have in my .config:

# grep EMPT .config.current 
# CONFIG_PREEMPT_RCU is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

does that mean that I can't do much about those kernel tasks reported by the ps
command above? Or could kernel be tuned to be even less dema

Re: [sched_delayed] sched: RT throttling activated

2013-08-23 Thread Martin Mokrejs

Hi Peter,

Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 10:53:02AM +0200, Martin Mokrejs wrote:
>> Hi,
>>   I tried to figure out what this message really means. I came to 
>> https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions
>> but I am still lost. I lack in the FAQ some user-related information.
>> The first paragraph is still unclear to me. I have a i7-2640M based
>> laptop, hyperthreading is enabled by BIOS but I shut down the two
>> emulated cores by (no BIOS option to disable HT):
>>
>> Would you please clarify what the "[sched_delayed] sched: RT throttling 
>> activated"
>> really means? 
> 
> It means you have (a) real-time task(s) that consume significant amount

How can I find them? I don't think I need the RT, I have two CPU-bound
processes and want to run them at max speed. Rest of the system is unimportant.

I still don't understand what the $subj message actually says. Does it say
the RT-requiring task was slowed down? I am a bit lost here.

> of time. At some point we throttle them in an attempt to keep the system
> from falling over.

Will I get companion "[sched_delayed] sched: RT throttling deactivated"
at some point?

> 
>> Is that because there is some RT-requiring application on my system?
> 
> Yep.

Which? How can I find them and turn that requirement off (if I understand
correctly, they interrupt my long-running compute processes)?

> 
>> I don't know of any (or don't care about real-time responsiveness except
>> that ALSA drivers require me to have CONFIG_SND_HRTIMER=y). Per Google
>> answers, could the culprit be nfsd? Then I will recompile it as a module.
> 
> Unlikely, I don't think I've ever seen anybody run their nfsd with RT

Maybe false info in that thread, I don't know:
http://forums.opensuse.org/english/get-technical-help-here/applications/482756-kernel-panic-rt-throttling-activated.html

> priority. Also, you can run RT tasks regardless of the config options.
> SCHED_RR and SCHED_FIFO are POSIX specified and always available.

Are python-based apps requiring the realtime features?


I used to get the messages below which are now gone with my CPU cooler being 
replaced yesterday:

[ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled 
(total events = 153727)
[ 4172.717277] CPU1: Package temperature above threshold, cpu clock throttled 
(total events = 158008)
[ 4172.717348] CPU0: Package temperature above threshold, cpu clock throttled 
(total events = 158008)
[ 4172.718291] CPU1: Core temperature/speed normal
[ 4172.718293] CPU1: Package temperature/speed normal
[ 4172.718347] CPU0: Package temperature/speed normal
[ 4205.336883] mce: [Hardware Error]: Machine check events logged
...
[ 8966.052786] CPU1: Core temperature/speed normal
[ 8966.052788] CPU0: Package temperature/speed normal
[ 8966.052791] CPU1: Package temperature/speed normal
[ 9266.421068] CPU1: Core temperature above threshold, cpu clock throttled 
(total events = 530778)
[ 9266.421070] CPU0: Package temperature above threshold, cpu clock throttled 
(total events = 547228)
[ 9266.421075] CPU1: Package temperature above threshold, cpu clock throttled 
(total events = 547228)
[ 9266.422076] CPU1: Core temperature/speed normal
[ 9266.422078] CPU0: Package temperature/speed normal
[ 9266.422081] CPU1: Package temperature/speed normal
[ 9445.150679] [sched_delayed] sched: RT throttling activated
[ 9566.792369] CPU1: Core temperature above threshold, cpu clock throttled 
(total events = 559429)
[ 9566.792372] CPU0: Package temperature above threshold, cpu clock throttled 
(total events = 576882)
[ 9566.792378] CPU1: Package temperature above threshold, cpu clock throttled 
(total events = 576882)
[ 9566.793377] CPU1: Core temperature/speed normal
[ 9566.793380] CPU0: Package temperature/speed normal
[ 9566.793382] CPU1: Package temperature/speed normal
[ 9872.630811] CPU1: Core temperature above threshold, cpu clock throttled 
(total events = 583223)
[ 9872.630813] CPU0: Package temperature above threshold, cpu clock throttled 
(total events = 601532)
[ 9872.630817] CPU1: Package temperature above threshold, cpu clock throttled 
(total events = 601532)
[ 9872.631818] CPU1: Core temperature/speed normal
[ 9872.631820] CPU0: Package temperature/speed normal
[ 9872.631823] CPU1: Package temperature/speed normal

mcelog report in such cases:

Hardware event. This is not a software error.
MCE 0
CPU 1 THERMAL EVENT TSC 1bf82e2a146 
TIME 1375536062 Sat Aug  3 15:21:02 2013
Processor 1 heated above trip temperature. Throttling enabled.
Please check your system cooling. Performance will be impacted
STATUS 880003c3 MCGSTATUS 0
MCGCAP c07 APICID 2 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 42





While my CPU cooler got replaced even now I still get (hence this email thread):

[39564.452795] blah.py[14396]: segfault

[sched_delayed] sched: RT throttling activated

2013-08-23 Thread Martin Mokrejs

Hi,
  I tried to figure out what this message really means. I came to 
https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions
but I am still lost. The FAQ lacks some user-oriented information.
The first paragraph is still unclear to me. I have an i7-2640M based
laptop; hyperthreading is enabled by the BIOS, but I shut down the two
emulated cores with (there is no BIOS option to disable HT):

echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online
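
One way to tell beforehand which logical CPUs share a physical core is the
topology information in sysfs. The sketch below assumes the kernel exposes
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list (with contents such
as "0,2" or "0-1"); is_secondary_sibling is an invented helper that reports
whether a CPU is listed after the first sibling, which would make it a
candidate for offlining.

```shell
#!/bin/sh
# is_secondary_sibling CPU LIST
# Hypothetical helper: succeeds when CPU is not the first entry of LIST,
# where LIST looks like "0,2" or "0-1" (or just "0" when there is no sibling).
is_secondary_sibling() {
    sib_cpu=$1
    sib_list=$2
    sib_first=${sib_list%%[,-]*}    # drop everything from the first "," or "-"
    [ "$sib_cpu" != "$sib_first" ]
}

# Walk the online CPUs and print the ones that are second threads of a core.
for dir in /sys/devices/system/cpu/cpu[0-9]*; do
    f=$dir/topology/thread_siblings_list
    [ -r "$f" ] || continue         # offline CPUs may not expose topology
    n=${dir##*cpu}
    if is_secondary_sibling "$n" "$(cat "$f")"; then
        echo "cpu$n is a hyperthread sibling (siblings: $(cat "$f"))"
    fi
done
```

The check has to run while all logical CPUs are still online; once a sibling
has been offlined, the remaining CPU typically lists only itself.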

At least I hope I shut down the emulated ones. i7z claims I did the
right thing, and the IntelPerformanceCounterMonitorV2.5.1/pcm.x application
says the same:

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant 
timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in 
power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in 
C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 cache misses 
 L2MISS: L2 cache misses (including other core's L2 cache *hits*) 
 L3HIT : L3 cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some 
cases could be >1.0 due to a higher memory latency
 L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 
cache (0.00-1.00)
 READ  : bytes read from memory controller (in GBytes)
 WRITE : bytes written to memory controller (in GBytes)
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax 
temperature (thermal headroom): 0 corresponds to the max temperature


 Core (SKT) | EXEC | IPC  | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE | TEMP

   0    0     1.78   1.51   1.18   1.18    1595 K   3363 K   0.53    0.00    0.09    0.02    N/A    N/A     23
   1    0     1.21   1.03   1.18   1.18    9359 K     13 M   0.31    0.00    0.51    0.04    N/A    N/A     24
---
 SKT    0     1.50   1.27   1.18   1.18      10 M     16 M   0.35    0.00    0.30    0.03    1.32   0.37    24
---
 TOTAL  *     1.50   1.27   1.18   1.18      10 M     16 M   0.35    0.00    0.30    0.03    1.32   0.37    N/A

 Instructions retired: 8368 M ; Active cycles: 6594 M ; Time (TSC): 2797 Mticks 
; C0 (active,non-halted) core residency: 100.00 %

 C1 core residency: 0.00 %; C3 core residency: 0.00 %; C6 core residency: 0.00 
%; C7 core residency: 0.00 %
 C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package 
residency: 0.00 %; C7 package residency: 0.00 %

 PHYSICAL CORE IPC : 1.27 => corresponds to 31.73 % utilization 
for cores in active state
 Instructions per nominal CPU cycle: 1.50 => corresponds to 37.40 % core 
utilization over time interval
--

--
 SKT0 package consumed 28.18 Joules
--
 TOTAL:28.18 Joules





Why do I get the message at all? My 3.10.9 kernel config has:

...
CONFIG_IOSCHED_DEADLINE=y
CONFIG_DEFAULT_IOSCHED="deadline"
...
CONFIG_NR_CPUS=4
...
# CONFIG_PREEMPT_RCU is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
...
# CONFIG_SCHED_MC is not set
CONFIG_SCHED_HRTICK=y

I fear this is about the CPU being overloaded (both cores are loaded
by user processes), but why do I get the message at all?



Cpu speed from cpuinfo 2796.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating 
via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 2796 MHz
  CPU Multiplier 28x || Bus clock frequency (BCLK) 99.86 MHz

Socket [0] - [physical cores=2, logical cores=2, max online cores ever=2]
  TURBO ENABLED on 2 Cores, Hyper Threading OFF
  Max Frequency without considering Turbo 2895.86 MHz (99.86 x [29])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is  35x/33x/33x/33x
  Real Current Frequency 3295.29 MHz [99.86 x 33.00] (Max of below)
  Core [core-id]  :Actual Freq (Mult.)   C0%   Halt(C1)%   C3 %   C6 %   C7 %   Temp
  Core 1 [0]:      3295.28 (33.00x)      100       0         0      0      0     76
  Core 2 [1]:      3295.29 (33.00x)      100       0         0      0      0     76



Would you please clarify what the "[sched_delayed] sched: RT throttling
activated" really means? Is that because there is some RT-requiring
application on my system?
I don't know of any (or don't care about

Re: [sched_delayed] sched: RT throttling activated

2013-08-23 Thread Martin Mokrejs

Hi Peter,

Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 10:53:02AM +0200, Martin Mokrejs wrote:
>> Hi,
>>   I tried to figure out what this message really means. I came to
>> https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions
>> but I am still lost. I lack in the FAQ some user-related information.
>> The first paragraph is still unclear to me. I have a i7-2640M based
>> laptop, hyperthreading is enabled by BIOS but I shut down the two
>> emulated cores by (no BIOS option to disable HT):
>>
>> Would you please clarify what the "[sched_delayed] sched: RT throttling
>> activated" really means?
>
> It means you have (a) real-time task(s) that consume significant amount

How can I find them? I don't think I need the RT, I have two CPU-bound
processes and want to run them at max speed. Rest of the system is unimportant.

I still don't understand what the $subj message actually says. Does it say
the RT-requiring task was slowed down? I am a bit lost here.

> of time. At some point we throttle them in an attempt to keep the system
> from falling over.

Will I get a companion "[sched_delayed] sched: RT throttling deactivated"
at some point?

>
>> Is that because there is some RT-requiring application on my system?
>
> Yep.

Which? How can I find them and turn that requirement off (if I understand
right, they interrupt my long-living computing processes)?

>
>> I don't know of any (or don't care about real-time responsiveness except
>> that ALSA drivers require me to have CONFIG_SND_HRTIMER=y). Per Google
>> answers could the culprit be nfsd? Then I will recompile it as a module.
>
> Unlikely, I don't think I've ever seen anybody run their nfsd with RT

Maybe false info in that thread, I don't know:
http://forums.opensuse.org/english/get-technical-help-here/applications/482756-kernel-panic-rt-throttling-activated.html

> priority. Also, you can run RT tasks regardless of the config options.
> SCHED_RR and SCHED_FIFO are POSIX specified and always available.

Are python-based apps requiring the realtime features?
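
A minimal sketch of how a Python program could report its own scheduling class. It assumes Python 3.3 or newer on Linux, where the standard `os` module offers `os.sched_getscheduler()`; the Python 2.7 runtime mentioned in this thread does not have it. The helper names are hypothetical.

```python
import os

# Policy numbers come from the os module where available; the fallbacks are
# the values Linux uses (SCHED_OTHER=0, SCHED_FIFO=1, SCHED_RR=2).
POLICY_NAMES = {
    getattr(os, "SCHED_OTHER", 0): "SCHED_OTHER",
    getattr(os, "SCHED_FIFO", 1): "SCHED_FIFO",
    getattr(os, "SCHED_RR", 2): "SCHED_RR",
}

# The kernel may OR this flag into the value returned by sched_getscheduler().
RESET_ON_FORK = getattr(os, "SCHED_RESET_ON_FORK", 0x40000000)


def policy_name(policy):
    """Translate a numeric scheduling policy into its symbolic name."""
    base = policy & ~RESET_ON_FORK
    return POLICY_NAMES.get(base, "unknown(%d)" % base)


def is_realtime(policy):
    """True for the two POSIX real-time policies subject to RT throttling."""
    return policy_name(policy) in ("SCHED_FIFO", "SCHED_RR")


if __name__ == "__main__":
    if hasattr(os, "sched_getscheduler"):
        current = os.sched_getscheduler(0)  # 0 means the calling process
        state = "realtime" if is_realtime(current) else "not realtime"
        print(policy_name(current), state)
```

A process that never asked for SCHED_FIFO or SCHED_RR (and did not inherit one from its parent) is expected to print SCHED_OTHER, which ps shows as class TS.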


I used to get the messages below which are now gone with my CPU cooler being 
replaced yesterday:

[ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled 
(total events = 153727)
[ 4172.717277] CPU1: Package temperature above threshold, cpu clock throttled 
(total events = 158008)
[ 4172.717348] CPU0: Package temperature above threshold, cpu clock throttled 
(total events = 158008)
[ 4172.718291] CPU1: Core temperature/speed normal
[ 4172.718293] CPU1: Package temperature/speed normal
[ 4172.718347] CPU0: Package temperature/speed normal
[ 4205.336883] mce: [Hardware Error]: Machine check events logged
...
[ 8966.052786] CPU1: Core temperature/speed normal
[ 8966.052788] CPU0: Package temperature/speed normal
[ 8966.052791] CPU1: Package temperature/speed normal
[ 9266.421068] CPU1: Core temperature above threshold, cpu clock throttled 
(total events = 530778)
[ 9266.421070] CPU0: Package temperature above threshold, cpu clock throttled 
(total events = 547228)
[ 9266.421075] CPU1: Package temperature above threshold, cpu clock throttled 
(total events = 547228)
[ 9266.422076] CPU1: Core temperature/speed normal
[ 9266.422078] CPU0: Package temperature/speed normal
[ 9266.422081] CPU1: Package temperature/speed normal
[ 9445.150679] [sched_delayed] sched: RT throttling activated
[ 9566.792369] CPU1: Core temperature above threshold, cpu clock throttled 
(total events = 559429)
[ 9566.792372] CPU0: Package temperature above threshold, cpu clock throttled 
(total events = 576882)
[ 9566.792378] CPU1: Package temperature above threshold, cpu clock throttled 
(total events = 576882)
[ 9566.793377] CPU1: Core temperature/speed normal
[ 9566.793380] CPU0: Package temperature/speed normal
[ 9566.793382] CPU1: Package temperature/speed normal
[ 9872.630811] CPU1: Core temperature above threshold, cpu clock throttled 
(total events = 583223)
[ 9872.630813] CPU0: Package temperature above threshold, cpu clock throttled 
(total events = 601532)
[ 9872.630817] CPU1: Package temperature above threshold, cpu clock throttled 
(total events = 601532)
[ 9872.631818] CPU1: Core temperature/speed normal
[ 9872.631820] CPU0: Package temperature/speed normal
[ 9872.631823] CPU1: Package temperature/speed normal

mcelog report in such cases:

Hardware event. This is not a software error.
MCE 0
CPU 1 THERMAL EVENT TSC 1bf82e2a146 
TIME 1375536062 Sat Aug  3 15:21:02 2013
Processor 1 heated above trip temperature. Throttling enabled.
Please check your system cooling. Performance will be impacted
STATUS 880003c3 MCGSTATUS 0
MCGCAP c07 APICID 2 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 42
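
The TIME field in the mcelog record is a Unix epoch value followed by its rendering in local time. A small sketch that reproduces the conversion, assuming the machine ran at UTC+2 (the +0200 offset seen in the mail headers of this thread); the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def mce_time(epoch_seconds, utc_offset_hours=2):
    """Convert the epoch value from an mcelog TIME field to an aware datetime.

    utc_offset_hours=2 is an assumption taken from the +0200 mail headers.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz)


if __name__ == "__main__":
    # Value copied from the mcelog record in this thread.
    print(mce_time(1375536062).ctime())
```

Comparing such timestamps with the dmesg uptime stamps is one way to line up thermal events with other log entries.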





While my CPU cooler got replaced even now I still get (hence this email thread):

[39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 7ff67badff00 sp 
7fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
[44520.259205] [sched_delayed] sched: RT throttling activated
[48956.057816] blah.py[16623]: segfault

Re: [sched_delayed] sched: RT throttling activated

2013-08-23 Thread Martin Mokrejs



Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
>>> It means you have (a) real-time task(s) that consume significant amount
>>
>> How can I find them?
>
> ps -deo pid,cls,cmd | grep -e RR -e FF

# ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['
    7  FF [migration/0]
   10  FF [watchdog/0]
   11  FF [watchdog/1]
   12  FF [migration/1]
   17  FF [migration/2]
   22  FF [migration/3]
 2161  FF [irq/50-iwlwifi]
#

The shell/python tasks have 'TS' in place of the FF value in the second column
so I guess they are not requiring realtime responsiveness.
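
A minimal sketch of a stricter filter than the grep above: it compares only the scheduling-class column, so "RR" or "FF" appearing inside a command line cannot produce false matches, and user processes are not excluded. It assumes the procps `ps` with the `pid,cls,cmd` columns used in this thread; `ps -eo` is used here instead of `ps -deo` so that session leaders are listed too. The function name is hypothetical.

```python
import subprocess


def realtime_tasks(ps_output):
    """Return (pid, cls, cmd) for rows whose class column is FF or RR."""
    tasks = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # pid, class, rest of the command line
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skips the header row and blank lines
        pid, cls, cmd = fields
        if cls in ("FF", "RR"):
            tasks.append((int(pid), cls, cmd))
    return tasks


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["ps", "-eo", "pid,cls,cmd"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        out = ""  # ps missing or not the procps flavour
    for pid, cls, cmd in realtime_tasks(out):
        print(pid, cls, cmd)
```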

 
> Should do I suppose
>
>> I don't think I need the RT, I have two CPU-bound
>> processes and want to run them at max speed. Rest of the system is
>> unimportant.
>>
>> I still don't understand what the $subj message actually says. Does it say
>> the RT-requiring task was slowed down? I am a bit lost here.
>
> Yeah, they were forcibly stopped from running for a little while.
>
>>> of time. At some point we throttle them in an attempt to keep the system
>>> from falling over.
>>
>> Will I get companion "[sched_delayed] sched: RT throttling deactivated"
>> at some point?
>
> Nope, you get that message once to tell you that we throttle RT tasks.

I think the message could be improved to explain that this is a warn-ONCE
message and that there is no "[sched_delayed] sched: RT throttling deactivated"
counterpart message to be anticipated.

 
>> Are python-based apps requiring the realtime features?
>
> I'm fairly sure python could use the relevant scheduling classes, but I
> don't speak snake so I really wouldn't know.
>
>> I used to get the messages below which are now gone with my CPU cooler being
>> replaced yesterday:
>>
>> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled
>> (total events = 153727)
>
>> mcelog report in such cases:
>>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 1 THERMAL EVENT TSC 1bf82e2a146
>> TIME 1375536062 Sat Aug  3 15:21:02 2013
>> Processor 1 heated above trip temperature. Throttling enabled.
>> Please check your system cooling. Performance will be impacted
>> STATUS 880003c3 MCGSTATUS 0
>> MCGCAP c07 APICID 2 SOCKETID 0
>> CPUID Vendor Intel Family 6 Model 42
>
> Right, those are thermal events throttling the speed of your CPU to keep
> the thing from heat damaging itself.
>
>> While my CPU cooler got replaced even now I still get (hence this email
>> thread):
>>
>> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 7ff67badff00
>> sp 7fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
>> [44520.259205] [sched_delayed] sched: RT throttling activated
>> [48956.057816] blah.py[16623]: segfault at 2f ip 7fd462e5d046 sp
>> 7fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
>> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 7fe255715f00
>> sp 7fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
>> [49942.020084] blah.py[6950]: segfault at d0 ip 7f3e8a9acf9c sp
>> 7fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
>> [66696.443342] blah.py[8015]: segfault at cf ip 7f798f708f9c sp
>> 7fff420336e0 error 4 in libpython2.7.so.1.0[7f798f66+173000]
>> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 7f7b17a85f00
>> sp 7fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
>> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 7fc54cd17f00 sp
>> 7fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
>>
>>
>> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?
>
> That of the past 1s, 0.95s were spent running RR/FIFO tasks. It is a
> warning that comes only once per boot and should prompt you to
> investigate.

Could the kernel itself log some kind of equivalent of the
ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \[' command?

>
> You can turn the throttle off, but be advised that running a RR/FIFO
> task at 100% can (and generally does) negatively affect the running of
> your system (as in, these tasks can prevent system duties from taking
> place and eventually make the system come to a halt).

Provided I have in my .config:

# grep EMPT .config.current 
# CONFIG_PREEMPT_RCU is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

does that mean that I can't do much about those kernel tasks reported by the ps
command above? Or could the kernel be tuned to be even less demanding and not
interrupt my tasks that often (I have no idea how often that happens if the
message is logged only once, or how much harm it causes)?

>
>
> As to those faults, investigate if your python prog does something
> particularly weird or your runtime is in order. Otherwise I would advise
> you to run memtest for a while to make sure your machine is in proper
> working order.

I will re-check the stacktraces but last time I did I did not come to a single
place where it crashes. OK, will re-test the memory again but I think it is 
fine.
It seemed those results of the overheated CPU and thermal

Re: [sched_delayed] sched: RT throttling activated

2013-08-23 Thread Martin Mokrejs



Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 01:35:24PM +0200, Martin Mokrejs wrote:
>
>> # ps -deo pid,cls,cmd | grep -e 'RR \[' -e 'FF \['
>
> This explicitly only lists kernel threads; from your other comment:
>
>> The shell/python tasks have 'TS' in place of the FF value in the second
>> column so I guess they are not requiring realtime responsiveness.
>
> I'll assume you actually inspected the other tasks and found none.

Yes, the other (false) matches were in the third or later columns, so I wanted
to match just the true matches and cut and paste them. I admit this is not a
general-purpose regexp and is misleading.

>
>>     7  FF [migration/0]
>>    10  FF [watchdog/0]
>>    11  FF [watchdog/1]
>>    12  FF [migration/1]
>>    17  FF [migration/2]
>>    22  FF [migration/3]
>
> The 'migration' threads only look like FIFO threads but they're secretly
> not and don't count to the limit. The watchdog threads shouldn't run
> much either.
>
>>  2161  FF [irq/50-iwlwifi]
>
> Oh a threaded interrupt, I presume you're not using threadirqs since

Is this what you are talking about?

CONFIG_IRQ_FORCED_THREADING=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y

> this is the only interrupt thread around and I see a
> 'request_threaded_irq()' call in
> drivers/net/wireless/iwlwifi/pcie/trans.c
>
> And wow, why would that thing consume that much cpu.
>
> Johill, ever seen the iwlwifi interrupt go 'funny' and consume gobs of
> cpu-time?

I am not sure if I understand you but in case it helps somebody

Current values:

# cat /proc/interrupts 
   CPU0   CPU1   
  0: 23  0   IO-APIC-edge  timer
  1: 42  0   IO-APIC-edge  i8042
  8: 36  0   IO-APIC-edge  rtc0
  9:  3  0   IO-APIC-fasteoi   acpi
 12: 404650  0   IO-APIC-edge  i8042
 16:109  0   IO-APIC-fasteoi   ehci_hcd:usb1
 23: 583646  0   IO-APIC-fasteoi   ehci_hcd:usb2
 40:  0  0   PCI-MSI-edge  pciehp
 41:  54319  0   PCI-MSI-edge  i915
 42: 553802  0   PCI-MSI-edge  ahci
 43:  0  0   PCI-MSI-edge  enp5s0
 44: 257268  0   PCI-MSI-edge  xhci_hcd
 45:  0  0   PCI-MSI-edge  xhci_hcd
 46:  0  0   PCI-MSI-edge  xhci_hcd
 47:  0  0   PCI-MSI-edge  xhci_hcd
 48:  0  0   PCI-MSI-edge  xhci_hcd
 49: 465462  0   PCI-MSI-edge  snd_hda_intel
 50:3895788  0   PCI-MSI-edge  iwlwifi
NMI:   8687   9483   Non-maskable interrupts
LOC:   17531664   16978131   Local timer interrupts
SPU:  0  0   Spurious interrupts
PMI:   8687   9483   Performance monitoring interrupts
IWI: 213009 205171   IRQ work interrupts
RTR:  3  0   APIC ICR read retries
RES:19226514491695   Rescheduling interrupts
CAL:  73741 348678   Function call interrupts
TLB:  98634 73   TLB shootdowns
TRM:  0  0   Thermal event interrupts
THR:  0  0   Threshold APIC interrupts
MCE:  0  0   Machine check exceptions
MCP:286286   Machine check polls
ERR:  0
MIS:  0
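
A minimal sketch for ranking interrupt sources from /proc/interrupts, which makes a busy line such as iwlwifi stand out without reading the table by eye. It assumes the usual layout (a header row naming the CPUs, then one row per interrupt with one count per online CPU); the function names are hypothetical.

```python
from pathlib import Path


def parse_interrupts(text):
    """Map each interrupt label to (total count over all CPUs, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break  # rows such as ERR/MIS carry fewer counters
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def busiest(text, n=3):
    """Return the n interrupt rows with the highest total count."""
    table = parse_interrupts(text)
    return sorted(table.items(), key=lambda kv: kv[1][0], reverse=True)[:n]


if __name__ == "__main__":
    path = Path("/proc/interrupts")
    if path.exists():
        for label, (total, description) in busiest(path.read_text()):
            print(label, total, description)
```

Taking two snapshots some seconds apart and subtracting the totals gives an interrupt rate, which says more about a misbehaving device than the counts accumulated since boot.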

# ifconfig wlp9s0
wlp9s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.24  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::4e80:93ff:fe15:e6c7  prefixlen 64  scopeid 0x20<link>
        ether 4c:80:93:15:e6:c7  txqueuelen 1000  (Ethernet)
        RX packets 811806  bytes 992611146 (946.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 490006  bytes 71390887 (68.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# dmesg
...
[   11.789302] Intel(R) Wireless WiFi driver for Linux, in-tree:d
[   11.789310] Copyright(c) 2003-2013 Intel Corporation
[   11.791626] iwlwifi :09:00.0: irq 50 for MSI/MSI-X
[   12.044905] iwlwifi :09:00.0: loaded firmware version 18.168.6.1 op_mode 
iwldvm
[   13.896033] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEBUG enabled
[   13.896041] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEBUGFS disabled
[   13.896044] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEVICE_TRACING disabled
[   13.896047] iwlwifi :09:00.0: CONFIG_IWLWIFI_DEVICE_TESTMODE disabled
[   13.896049] iwlwifi :09:00.0: CONFIG_IWLWIFI_P2P disabled
[   13.896054] iwlwifi :09:00.0: Detected Intel(R) Centrino(R) Wireless-N 
1030 BGN, REV=0xB0
[   13.896173] iwlwifi :09:00.0: L1 Disabled; Enabling L0S
[   13.917705] ieee80211 phy0: Selected rate control algorithm 'iwl-agn-rs'


 
 
>>> Nope, you get that message once to tell you that we throttle RT tasks.
>>
>> I think the message could improved to explain this is a warn ONCE message and
>> that there is no "[sched_delayed] sched: RT throttling deactivated"
>> counterpart message to be anticipated.
>
> Would something like:
>
>   sched: [ONCE] RT throttle hit -- inspect system configuration

Re: [sched_delayed] sched: RT throttling activated

2013-08-23 Thread Martin Mokrejs



Martin Mokrejs wrote:

>>>> Nope, you get that message once to tell you that we throttle RT tasks.
>>>
>>> I think the message could improved to explain this is a warn ONCE message
>>> and that there is no "[sched_delayed] sched: RT throttling deactivated"
>>> counterpart message to be anticipated.
>>
>> Would something like:
>>
>>   sched: [ONCE] RT throttle hit -- inspect system configuration.
>>
>> Be a better message?
>
> Not really. I would prefer something like:
>
> [sched_delayed] sched: stopped running $cmd on CPU%d in favor of RR/FIFO task
> $psname

Actually, to retain the message text that appears in the current kernel, so
that people searching e.g. Google can still find the newer wording and possibly
this thread, maybe much better would be:

[sched_delayed] sched: RT throttling limit $d hit. Stopped running $cmd on
CPU%d in favor of RR/FIFO task $psname. Will not issue any more of these
messages until reboot.

I know, looong line.


I just realized this is about some threshold limit value, and that you mean
iwlwifi contributed the highest increase compared to the other kernel threads
on my system.

sysctl -q -a |  grep -i limit

does not show what is the actual value. Am probably looking into a wrong place. 
;-)
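
The limit in question lives in two sysctls whose names do not contain the word "limit", which is why the grep finds nothing: `kernel.sched_rt_runtime_us` and `kernel.sched_rt_period_us`, with kernel defaults of 950000 and 1000000 microseconds. A minimal sketch that reads them and computes the budget, with hypothetical helper names:

```python
from pathlib import Path


def rt_budget_fraction(runtime_us, period_us):
    """Fraction of each period that RR/FIFO tasks may consume.

    A negative runtime (-1) means throttling is switched off, reported as None.
    """
    if runtime_us < 0:
        return None
    return runtime_us / period_us


def read_rt_limits(root="/proc/sys/kernel"):
    """Read (sched_rt_runtime_us, sched_rt_period_us) from procfs."""
    runtime = int(Path(root, "sched_rt_runtime_us").read_text())
    period = int(Path(root, "sched_rt_period_us").read_text())
    return runtime, period


if __name__ == "__main__":
    if Path("/proc/sys/kernel/sched_rt_runtime_us").exists():
        runtime, period = read_rt_limits()
        print(runtime, period, rt_budget_fraction(runtime, period))
```

With the default values the fraction is 0.95, which matches the "of the past 1s, 0.95s" figure quoted in this thread. Writing -1 to sched_rt_runtime_us turns the throttle off, with the risks already described.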

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [sched_delayed] sched: RT throttling activated

2013-08-23 Thread Martin Mokrejs

While you are probably thinking about the iwlwifi issue causing RT throttling
I have one more interesting followup below.

Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
>>> It means you have (a) real-time task(s) that consume significant amount
>>
>> How can I find them?
>
> ps -deo pid,cls,cmd | grep -e RR -e FF
>
> Should do I suppose
>
>> I don't think I need the RT, I have two CPU-bound
>> processes and want to run them at max speed. Rest of the system is
>> unimportant.
>>
>> I still don't understand what the $subj message actually says. Does it say
>> the RT-requiring task was slowed down? I am a bit lost here.
>
> Yeah, they were forcibly stopped from running for a little while.
>
>>> of time. At some point we throttle them in an attempt to keep the system
>>> from falling over.
>>
>> Will I get companion "[sched_delayed] sched: RT throttling deactivated"
>> at some point?
>
> Nope, you get that message once to tell you that we throttle RT tasks.
>
>> Are python-based apps requiring the realtime features?
>
> I'm fairly sure python could use the relevant scheduling classes, but I
> don't speak snake so I really wouldn't know.
>
>> I used to get the messages below which are now gone with my CPU cooler being
>> replaced yesterday:
>>
>> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled
>> (total events = 153727)
>
>> mcelog report in such cases:
>>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 1 THERMAL EVENT TSC 1bf82e2a146
>> TIME 1375536062 Sat Aug  3 15:21:02 2013
>> Processor 1 heated above trip temperature. Throttling enabled.
>> Please check your system cooling. Performance will be impacted
>> STATUS 880003c3 MCGSTATUS 0
>> MCGCAP c07 APICID 2 SOCKETID 0
>> CPUID Vendor Intel Family 6 Model 42
>
> Right, those are thermal events throttling the speed of your CPU to keep
> the thing from heat damaging itself.
>
>> While my CPU cooler got replaced even now I still get (hence this email
>> thread):
>>
>> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 7ff67badff00
>> sp 7fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
>> [44520.259205] [sched_delayed] sched: RT throttling activated
>> [48956.057816] blah.py[16623]: segfault at 2f ip 7fd462e5d046 sp
>> 7fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
>> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 7fe255715f00
>> sp 7fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
>> [49942.020084] blah.py[6950]: segfault at d0 ip 7f3e8a9acf9c sp
>> 7fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
>> [66696.443342] blah.py[8015]: segfault at cf ip 7f798f708f9c sp
>> 7fff420336e0 error 4 in libpython2.7.so.1.0[7f798f66+173000]
>> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 7f7b17a85f00
>> sp 7fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
>> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 7fc54cd17f00 sp
>> 7fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
>>
>>
>> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?
>
> That of the past 1s, 0.95s were spent running RR/FIFO tasks. It is a
> warning that comes only once per boot and should prompt you to
> investigate.
>
> You can turn the throttle off, but be advised that running a RR/FIFO
> task at 100% can (and generally does) negatively affect the running of
> your system (as in, these tasks can prevent system duties from taking
> place and eventually make the system come to a halt).
>
>
> As to those faults, investigate if your python prog does something
> particularly weird or your runtime is in order. Otherwise I would advise
> you to run memtest for a while to make sure your machine is in proper
> working order.

Hmm, meanwhile the core dumps filled up my /var/dumps/ directory on the /
filesystem. I do not know at what time since bootup that happened. I deleted
some files on the disk and thought I was done. Now, a few hours later, I
noticed:

[85451.247130] traps: blah.py[30787] general protection ip:7faf7b57a046 
sp:7fffd9f7b1d0 error:0 in libpython2.7.so.1.0[7faf7b499000+173000]
[87125.493730] nr_pdflush_threads exported in /proc is scheduled for removal
[87125.494238] sysctl: The scan_unevictable_pages sysctl/node-interface has 
been disabled for lack of a legitimate use case.  If you have one, please send 
an email to linux...@kvack.org.
[97959.812943] blah.py[13069]: segfault at 7f1f2cfdca58 ip 7f1f2db87f00 sp 
7fffade41768 error 4 in libpython2.7.so.1.0[7f1f2da77000+173000]
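
A minimal sketch for comparing such segfault lines: since the library is mapped at a different base address in every run, subtracting the mapping base from the instruction pointer gives an offset that can be compared across crashes. The regular expression assumes the x86 "segfault at ... ip ... sp ... error ... in lib[base+size]" format shown above, with wrapped lines joined first; the function names are hypothetical.

```python
import re
from collections import Counter

SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (library, ip - mapping base) for a segfault line, else None."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    offset = int(match.group("ip"), 16) - int(match.group("base"), 16)
    return match.group("lib"), offset


def count_offsets(lines):
    """Count how often each (library, hex offset) pair occurs."""
    counts = Counter()
    for line in lines:
        hit = fault_offset(line)
        if hit is not None:
            counts[(hit[0], hex(hit[1]))] += 1
    return counts


if __name__ == "__main__":
    # Line copied from this thread, with the mail wrapping undone.
    line = ("[97959.812943] blah.py[13069]: segfault at 7f1f2cfdca58 "
            "ip 7f1f2db87f00 sp 7fffade41768 error 4 in "
            "libpython2.7.so.1.0[7f1f2da77000+173000]")
    print(fault_offset(line))
```

Applied to the lines posted in this thread, several faults land on the same offset (0x110f00) inside libpython2.7.so.1.0, and others share 0xe1046. Which function those offsets belong to can only be told from the exact library binary, so treat this as a starting point for the stack trace comparison.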


I bet at about the time 87125 the disk was full. The laptop has 16GB of RAM
and the coredump files are really big, 300MB to 8GB. However, the
nr_pdflush_threads message sounds scary. Does linux 3.10.9 want to delete
/proc on the fly? ;-)


Martin

Re: [PATCH] pciehp: Add pciehp_surprise module option

2013-06-06 Thread Martin Mokrejs

Takashi Iwai wrote:
> At Wed, 20 Mar 2013 19:41:38 +0100,
> Martin Mokrejs wrote:
>>
>> Hi Takashi,
>>   would you please describe your test system in more detail? How
>> about 'lspci -tv'? And 'lsusb -v' of the broken device?
> 
> I left the machine in my office, so I'll give details tomorrow.
> It's a Realtek 5249 PCI-e card reader, and this appears as a PCI
> device once when registered by pciehp.  At cold boot, it doesn't
> appear in lspci.  It appears only when you insert the card.  Also,
> this device is no USB.  It's supported by mfd/rtsx_pci driver in 3.9.
> 

>> If I have Realtek MediaCardReader enabled in BIOS, no card in it, coldboot, 
>> and hot
>> insert an ExpressCard into the slot, the Realtek MediaCardReader pops up in 
>> dmesg as
>> a new PCI device. How about you?
> 
> The device is hotplugged only when the option of my patch is enabled,
> i.e. overriding the surprise capability check.
> 
>> My card does NOT show in lspci (maybe because I never plugged in a data card 
>> into it) but does show in lsusb:
> 
> So, it's a completely different case...
> 
>>
>> Bus 002 Device 005: ID 0bda:0138 Realtek Semiconductor Corp. RTS5138 Card 
>> Reader Controller
>> Device Descriptor:
>>   bLength                18
>>   bDescriptorType         1
>>   bcdUSB               2.00
>>   bDeviceClass            0 (Defined at Interface level)
>>   bDeviceSubClass         0
>>   bDeviceProtocol         0
>>   bMaxPacketSize0        64
>>   idVendor           0x0bda Realtek Semiconductor Corp.
>>   idProduct          0x0138 RTS5138 Card Reader Controller
>>   bcdDevice           38.82
>>   iManufacturer           1 Generic
>>   iProduct                2 USB2.0-CRW
>>   iSerial                 3 2009051638820
>>
>>
>> Can you try coldboot without a media card inserted before power up without
>> your patch and check whether the CardReader pops up after you plugin some
>> ExpressCard into an ExpressCardSlot (not the CardReader)? I presume it is
>> a laptop. ;-)
> 
> When you boot without the card, there is no PCI device.  Triggering
> PCI bus rescan also doesn't expose it.  But, when you insert the card,
> you'll get the notification in pciehp (seeing "Card present on Slot"
> message), but pciehp doesn't do anything right now unless the
> surprising bit is set.  The device may appear if you trigger the PCI
> bus rescan at this moment, too, though.
> 
>> 2. Is the hotplug broken also under acpiphp? And again, does it get detected
>> once you plugin some card into an ExpressCard slot?
> 
> acpiphp doesn't load on this machine.

While we concluded above that I have a different card (a USB-attached Realtek
reader), since about kernel 3.5 I need pcie_aspm=off to get acpiphp working. It
does not work for all ExpressCards, but maybe it will help you to get *acpiphp*
to recognize the slot? Note: the same kernel command line option pcie_aspm=off
breaks *pciehp* on my laptop, so don't forget to delete it from grub.conf if you
want to stick with *pciehp* (in case your hardware is prone to hit the same bug
as mine: https://bugzilla.kernel.org/show_bug.cgi?id=59391).
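
A minimal sketch for checking whether the running kernel was actually booted with pcie_aspm=off, by looking at /proc/cmdline rather than at grub.conf. The helper name is hypothetical, and the parser deliberately ignores quoted values containing spaces.

```python
from pathlib import Path


def kernel_param(cmdline, name):
    """Look up a parameter on a kernel command line.

    Returns the value for name=value, True for a bare flag, None if absent.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else True
    return None


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        print("pcie_aspm =", kernel_param(path.read_text(), "pcie_aspm"))
```

A result of None means the option is not set for the current boot, which is the state pciehp needs on the laptop described here.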

Just in case you could switch to acpiphp. ;)
Martin

Re: [PATCH] ACPI: Fix potential NULL pointer dereference in acpi_processor_add()

2013-05-29 Thread Martin Mokrejs

Hanjun Guo wrote:
> On 2013-5-29 7:30, Rafael J. Wysocki wrote:
>> On Thursday, May 23, 2013 08:44:26 PM Hanjun Guo wrote:
>>> In acpi_processor_add(), get_cpu_device() will return NULL sometimes,
>>> although the chances are small, I think it should be fixed.
>>>
>>> Signed-off-by: Hanjun Guo 
>>
>> This patch isn't necessary any more after the changes queued up for 3.11
>> in the acpi-hotplug branch of the linux-pm.git tree.
> 
> Ok, I noticed your patch set, just drop my patch.

But shouldn't this go to stable at least? I checked linux-3.9.4
and it applies fine. Whether this is relevant for other stable
series I will leave up to somebody else. ;)
Martin

> 
> Thanks
> Hanjun
> 
>>
>> Thanks,
>> Rafael
>>
>>
>>> ---
>>>  drivers/acpi/processor_driver.c |4 
>>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
>>> index bec717f..dd64f23 100644
>>> --- a/drivers/acpi/processor_driver.c
>>> +++ b/drivers/acpi/processor_driver.c
>>> @@ -579,6 +579,10 @@ static int __cpuinit acpi_processor_add(struct acpi_device *device)
>>>  	per_cpu(processors, pr->id) = pr;
>>>
>>>  	dev = get_cpu_device(pr->id);
>>> +	if (!dev) {
>>> +		result = -ENODEV;
>>> +		goto err_clear_processor;
>>> +	}
>>>  	if (sysfs_create_link(&device->dev.kobj, &dev->kobj, "sysdev")) {
>>>  		result = -EFAULT;
>>>  		goto err_clear_processor;
>>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

Re: PROBLEM: modprobe hang at startup (3.8.x, 3.9.x, IBM x3550)

2013-05-17 Thread Martin Mokrejs

Hi,
  while you are chasing some problem with i2c-i801, I would like to mention
that I never got an answer on the thread https://lkml.org/lkml/2013/1/23/405
about a kmemleak reported by the kernel. Maybe this could give you a hint?
If these do not overlap, I would anyway be glad to receive an answer via
the original thread I started.
Thank you,
Martin

Jean Delvare wrote:
> Hi Robert,
> 
> On Thu, 16 May 2013 13:44:55 +1000, Robert Norris wrote:
>> On Wed, May 15, 2013 at 09:49:23PM +0200, Jean Delvare wrote:
>>>> Interrupt: pin B routed to IRQ 0
>>>
>>> Hmm, this "IRQ 0" is quite odd. I'm wondering if this could be the
>>> reason for this hang. Was it with the i2c-i801 driver loaded, or
>>> blacklisted? Please check if it makes a difference.
>>
>> That was without the driver loaded (blacklisted). After loading (with
>> interrupts enabled) we get:
>>
>> Interrupt: pin B routed to IRQ 20
> 
> For the record, I also see the IRQ value change after loading the
> i2c-i801 driver on my system (with an ICH10 south bridge), from 14 to
> 22 in my case. So it's a bit different (no IRQ 0) but still
> somewhat similar, so I'm still not sure if this has anything to do with
> your issue.
> 
>>
>>> Do you see the same (and more generally, this issue) on one, some or
>>> all of your x3550 servers?
>>
>> The issue has occurred on at least three x3550s (we have 11). I haven't
>> tested more, because knowingly crashing production machines sucks.
> 
> Yes of course, I understand, I did not expect you to do that ;) 
> 
>> This appears to be the case on other machines. With the module
>> blacklisted (never loaded), lspci shows IRQ 0. After load, IRQ 20.
>> (tested on 3.4 and 3.9).
> 
> OK.
> 
>>> Are you using IPMI on these machines?
>>
>> Yes, but only for monitoring/sensors, if that makes a difference.
> 
> IPMI is still likely to access the SMBus controller. If there's a BMC
> in the machine, it can also access the SMBus slave with its own
> controller. It would be good to rule this out by disabling IPMI
> completely, removing the BMC from the machine if it has one, and
> checking if it makes the issue go away or not.
> 
>>> I would appreciate if you could test the following:
>>> * Blacklist i2c-i801 and ics932s401 so that none of them get
>>>   auto-loaded.
>>
>> Done.
>>
>>> * Manually load i2c-i801 with interrupts enabled, and see what
>>>   happens.
>>
>> Returned immediately:
>>
>> [   60.527140] i801_smbus :00:1f.3: SMBus using PCI Interrupt
> 
> This confirms that the i2c-i801 driver loading itself isn't the problem.
> 
>>> * If no hang happens, load i2c-dev, find the i801 bus number with
>>>   i2cdetect -l (from the i2c-tools package - it should be 4 according
>>>   to what you reported so far but there is no guarantee that it won't
>>>   change across reboots.)
>>
>> $ i2cdetect -l
>> i2c-0   i2c Radeon i2c bit bus DVI_DDC  I2C adapter
>> i2c-1   i2c Radeon i2c bit bus VGA_DDC  I2C adapter
>> i2c-2   i2c Radeon i2c bit bus MONID    I2C adapter
>> i2c-3   i2c Radeon i2c bit bus CRT2_DDC I2C adapter
>> i2c-4   smbus   SMBus I801 adapter at 0440  SMBus adapter
>>
>>> Then do a simple read from a random address
>>>   with:
>>>   # i2cget 4 0x50 0x00
>>>   (Adjust the bus number as needed.)
>>>   I am curious if this will hang as well or only when accessing the
>>>   clock chip at address 0x69.
>>
>> Yep, that one hangs. The hung task handler picked it up after a few
>> minutes.
> 
> OK, this means that any transaction request to the SMBus controller
> causes the hang.
> 
> The i2c-i801 driver is optimistically using wait_event() when waiting
> for an interrupt to arrive. I suppose that the interrupt is never
> delivered in your case (all 0 in /proc/interrupts.)
> 
> Daniel, shouldn't we use wait_event_timeout() instead to catch issues
> like this and fail cleanly? Maybe even fallback to polling
> automatically?
> 
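
The "find the i801 bus number" step quoted above can be scripted rather than
read off by eye. This is only a sketch: the function name is made up for this
example, and it assumes the adapter description contains the string "I801",
as it does in the listing quoted above.

```shell
#!/bin/sh
# Sketch: print the bus number of the I801 SMBus adapter, given the output
# of "i2cdetect -l" on stdin. i801_bus_number is a hypothetical helper name.
i801_bus_number() {
    # The first column looks like "i2c-4"; strip the prefix and stop at the
    # first matching adapter.
    awk '/I801/ { sub(/^i2c-/, "", $1); print $1; exit }'
}

# Usage (needs i2c-tools and the i2c-dev module, as in the quoted steps):
#   bus=$(i2cdetect -l | i801_bus_number)
#   i2cget "$bus" 0x50 0x00
```

Looking the number up each time avoids relying on the bus staying at 4 across
reboots.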

3.9-linux-next-20130501: OOPS in intel_pstate_sample

2013-05-01 Thread Martin Mokrejs

Hi,
  I opened yet another bug: https://bugzilla.kernel.org/show_bug.cgi?id=57411 .

This may be a dupe of bug https://bugzilla.kernel.org/show_bug.cgi?id=57401
(which is on vanilla 3.9), but it happened on linux-next-20130501 after I ran
"dmesg | less".

? pid_param_set
intel_pstate_timer_func
call_timer_fn
? __internal_add_timer
? pid_param_set
run_timer_softirq
__do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
? sysret_check

A camera picture of the stacktrace is attached to the bug 
https://bugzilla.kernel.org/show_bug.cgi?id=57411
Please forward this to the appropriate person.
Thanks,
Martin

linux-3.9: OOPS in intel_timer_pstate_func

2013-05-01 Thread Martin Mokrejs

Hi,
  I just got this kernel crash on my laptop, which has run fine so far on 3.7.10
(and 3.8.5 if really necessary). 3.9 was running for maybe 2 hours at most. :(

? cpumask_weight
call_timer_fn.clone
? init_timer_key
run_timer_softirq
? cpumask_weight
__do_softirq
smp_apic_timer_interrupt
apic_timer_interrupt
? cpuidle_wrap_enter
? cpuidle_wrap_enter
cpuidle_enter_tk
cpuidle_enter_state
cpuidle_call
cpu_idle
rest_init
? csum_partial_copy_generic
start_kernel
? repair_env_string
x86_64_start_reservations
x86_64_start_kernel

  I just opened a bug at https://bugzilla.kernel.org/show_bug.cgi?id=57401
with a camera picture of the screen with the stacktrace. I failed to find
a component like CPU or IRQ so please forward this to the appropriate person.
Thank you,
Martin

Re: [Update][PATCH] PCI / PM: Disable runtime PM of PCIe ports

2013-04-01 Thread Martin Mokrejs

Bjorn Helgaas wrote:
> On Mon, Apr 1, 2013 at 2:51 PM, Rafael J. Wysocki  wrote:
>> On Monday, April 01, 2013 11:34:46 AM Bjorn Helgaas wrote:
>>> [+cc Zheng, who added this with 71a83bd727]
>>>
>>> On Sat, Mar 30, 2013 at 4:38 PM, Rafael J. Wysocki  wrote:
>>>> From: Rafael J. Wysocki
>>>>
>>>> The runtime PM of PCIe ports turns out to be quite fragile, as in
>>>> some cases things work while in some other cases they don't and we
>>>> don't seem to have a good way to determine whether or not they are
>>>> going to work in advance.
>>>
>>> Do you have any references to problems encountered when enabling
>>> runtime PM for PCIe ports?  That information will be useful to anybody
>>> who wants to take another crack at getting this working.
>>
>> Well, bug 53811 is one example and problems recently reported by
>> Martin are another.  Do you want me to dig deeper?
> 
> OK, I got this one:
> 
>   https://bugzilla.kernel.org/show_bug.cgi?id=53811
> 
> Martin has reported a lot of problems lately, and I don't know which
> are related to runtime PM for PCIe ports.  I was hoping for a couple
> URLs to put in the changelog so that when somebody gets the itch to
> make this work, they have some useful info to start from.  If you
> point me at a specific message, I'll dig up an archive URL for it.

In the thread

Re: 3.8.2: xhci port is dead until pcieport PME# goes to disabled
http://marc.info/?t=13632822262=1=2

I reported that if the upstream PCI Express root port 1c.4 of the xHCI controller
at 0b:00 is suspended, the USB3 socket on the laptop appears dead.
Initially I found that 'lsusb -v' rescues the dead socket and is accompanied
by these messages in the logs:

[ 1445.597641] pcieport :00:1c.4: PME# disabled
[ 1445.617667] xhci_hcd :0b:00.0: PME# disabled

Ying Huang then realized elsewhere that I am running laptop-mode-tools, although
in their config file I set that they should NOT run when on AC power.
It looks like they do enable the 'auto' power mode, as seen in the
/sys/bus/pci/devices/*/power/control files, already upon bootup.
BTW, even worse, if I do /etc/init.d/laptop-mode-tools stop
they restore some initial values. :(( So, if I meanwhile forced
'on' for some device, they will return it back to 'auto' and the device
will immediately suspend. ;-)

Provided I uninstall laptop-mode-tools and make sure all control
files say 'on' (and hence the runtime_status files say 'active'),
my problem with a dead xHCI port goes away.
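
To verify that state after boot, or after some tool has touched it, the control
and runtime_status files can be dumped side by side. The following is a rough
sketch only: the function name and the optional sysfs root argument are made up
for this example (the argument exists so it can be tried on a copy of the tree),
while power/control and power/runtime_status are the files mentioned above.

```shell
#!/bin/sh
# Sketch: print "<device> <control> <runtime_status>" for every PCI device.
# pci_runtime_pm_report and its optional root argument are hypothetical;
# the default root is the real /sys.
pci_runtime_pm_report() {
    root="${1:-/sys}"
    for ctl in "$root"/bus/pci/devices/*/power/control; do
        # Skip the unexpanded glob if there are no matching files.
        [ -r "$ctl" ] || continue
        dev=${ctl%/power/control}
        status=unknown
        if [ -r "$dev/power/runtime_status" ]; then
            status=$(cat "$dev/power/runtime_status")
        fi
        printf '%s %s %s\n' "${dev##*/}" "$(cat "$ctl")" "$status"
    done
}

# Usage:
#   pci_runtime_pm_report | grep ' auto '    # devices allowed to runtime-suspend
```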

I find it weird that the port suspends only upon USB device unplug.
It does not suspend by itself if unused.

What is not clear to me is how the kernel is going to handle laptop-mode-tools,
which enabled power saving on 1c.4. In my naive user's view, the kernel does
not realize, and does not *check*, that a user tool or a desperate user tried
to suspend an upstream port while there is something bound to it, and it
does not apply a check for cascaded devices (1c.4 -> 0b:00 and
1c.7 -> 11:00 in my case).
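
From user space, at least, the cascade can be inspected by walking up the sysfs
device tree. This is a sketch under assumptions, not something the kernel or
laptop-mode-tools provides: the function name and the optional root argument
are made up, and it relies on each entry under /sys/bus/pci/devices being a
symlink into the device tree, with bridges as parent directories.

```shell
#!/bin/sh
# Sketch: print "<device> <control>" for a PCI device and each PCI bridge
# above it, innermost first. pci_chain_control and its optional root
# argument are hypothetical; "readlink -f" is the GNU coreutils form.
pci_chain_control() {
    dev="$1"
    root="${2:-/sys}"
    devpath=$(readlink -f "$root/bus/pci/devices/$dev") || return 1
    while :; do
        name=${devpath##*/}
        # Stop at the host bridge directory (named "pciDDDD:BB") or at the top.
        case "$name" in
            pci*|'') break ;;
        esac
        [ -r "$devpath/power/control" ] || break
        printf '%s %s\n' "$name" "$(cat "$devpath/power/control")"
        devpath=${devpath%/*}
    done
}

# Usage (the argument is the full device name as listed in
# /sys/bus/pci/devices):
#   pci_chain_control <device>
```

A device set to 'on' below a port set to 'auto' is the combination described
above.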

I am writing this without a reference, but modprobe of a driver can overcome
a suspended root port. In this particular case I mean my 1c.7 port
and its downstream 11:00 ExpressCard device. Off the top of my head,
I am not sure whether modprobe overcame both 1c.7 and 11:00 being initially
suspended. I could dig it out from the

Re: 3.9-rc1: pciehp and eSATA card SiI 3132, no XHCI
http://marc.info/?t=13630500881=1=2

thread if you want. Or it might be easier for you to test it yourself.

So, for me the issue is not fixed, but if you decide to disable runtime
power saving for devices under pcieport I don't mind. Their mishandling
definitely causes my acpiphp hotplug issues under 3.7-3.8 kernels
(3.9-rc not tested), whereas these PM issues do not explain why pciehp
is broken on 3.7-3.9-rc1.

Anyway, this patch can only be good for me, because I would like to use
laptop-mode-tools, and they will surely put one of the devices into 'auto',
and it will likely fall into suspend.
Martin

> 
> Otherwise, I'm afraid we'll just oscillate between "enable PM, find
> bug, disable PM, enable PM, find same bug, disable PM, etc..."
> 
> Bjorn
> 
>>>> For this reason, avoid enabling runtime PM for PCIe ports by
>>>> keeping their runtime PM reference counters always above 0 for the
>>>> time being.
>>>>
>>>> Signed-off-by: Rafael J. Wysocki
>>>> ---
>>>>
>>>> This version also removes the no longer necessary (and empty anyway)
>>>> port_runtime_pm_black_list[] table.
>>>>
>>>> Thanks,
>>>> Rafael
>>>>
>>>> ---
>>>>  drivers/pci/pcie/portdrv_pci.c |   13 -
>>>>  1 file changed, 13 deletions(-)
>>>>
>>>> Index: linux-pm/drivers/pci/pcie/portdrv_pci.c
>>>> ===
>>>> --- linux-pm.orig/drivers/pci/pcie/portdrv_pci.c
>>>> +++ linux-pm/drivers/pci/pcie/portdrv_pci.c
>>>> @@ -185,14 +185,6 @@ static const struct dev_pm_ops pcie_port
>>>>  #endif /* !PM */
>>>>
>>>>  /*
>>>> - * PCIe port runtime suspend is broken for some chipsets, so use a
>>>> - * black list to disable runtime PM for these chipsets.
>>>> - */
>>>> -static const struct pci_device_id

Re: [Update][PATCH] PCI / ACPI: Always resume devices on ACPI wakeup notifications

2013-03-29 Thread Martin Mokrejs

So, I re-tested with the patch on 3.8.3 but without laptop-mode-tools.
The xHCI port works fine provided
/sys/bus/pci/devices/:0b:00.0/power/control
is set to on and /sys/bus/pci/devices/:00:1c.4/power/control is also set to on.
If I set the parent 1c.4 to auto, it gets suspended and the port seems dead until
a device is plugged in and I wake it using lsusb -vv. There must be a bug in Linux
such that it cannot cope with the upstream 1c.4 sleeping when it wants to access
0b:00. Or, more likely, the upstream root port should be prevented from falling
asleep, right?


# lspci -tv
-[:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
       +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
       +-16.0  Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1
       +-1a.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2
       +-1b.0  Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller
       +-1c.0-[03-04]--
       +-1c.1-[05-06]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller
       +-1c.3-[09-0a]----00.0  Intel Corporation Centrino Wireless-N 1030 [Rainbow Peak]
       +-1c.4-[0b-0c]----00.0  Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller
       +-1c.7-[11-16]----00.0  Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller
       +-1d.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1
       +-1f.0  Intel Corporation HM67 Express Chipset Family LPC Controller
       +-1f.2  Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller
       \-1f.3  Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller
#



I have attached the lspci -vvv -n.

Interestingly, maybe, the state around the TI xHCI controller ended up changed
after my tests. I booted up with power/control set to on for all devices,
because laptop-mode-tools was uninstalled. I fiddled with echo commands
tweaking 1c.4 and 0b:00, but in the end set both back to "on". However,
below is a diff of the lspci output. I don't know what it means. Maybe it is
because I tried to write '0', 'off', 'none' to the control file? ;-)

 00:1c.4 0604: 8086:1c18 (rev b5) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #5, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us
ClockPM- Surprise- LLActRep+ BwNot-
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #4, PowerLimit 10.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet- LinkState+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
-   RootSta: PME ReqID , PMEStatus- PMEPending-
+   RootSta: PME ReqID 0b00, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
 Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [80] MSI: Enable-
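
On the question of writing '0', 'off' or 'none' to the control file: as far as I know, the runtime PM power/control attribute accepts only "on" and "auto" and the kernel rejects any other string, so those writes should not have changed anything. Below is a minimal sketch of a wrapper that refuses other values before touching sysfs. The function name set_pm_control and the SYSFS_PCI variable are made up for illustration; SYSFS_PCI exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only. SYSFS_PCI is a hypothetical override for testing; by default
# the real sysfs PCI device directory is used.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

# set_pm_control <pci-address> <on|auto>
# Writes the runtime PM policy for one device, refusing any other value.
set_pm_control() {
    dev=$1
    value=$2
    case $value in
        on|auto) ;;
        *)
            echo "refusing to write '$value': expected 'on' or 'auto'" >&2
            return 1
            ;;
    esac
    ctl=$SYSFS_PCI/$dev/power/control
    if [ ! -w "$ctl" ]; then
        echo "$ctl is not writable (wrong address, or not root?)" >&2
        return 1
    fi
    printf '%s\n' "$value" > "$ctl"
}
```

Calling it once with the address of the root port and once with the address of the xHCI controller, both with "on", would give the state both devices were set back to above.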

Re: [Update][PATCH] PCI / ACPI: Always resume devices on ACPI wakeup notifications

2013-03-29 Thread Martin Mokrejs

Sarah,
  please let me know if you feel the test was screwed up by laptop-mode-tools
kicking in, although I believe they were not running while I was on AC power.
I was testing under these conditions:

vostro ~ # grep . /sys/bus/pci/devices/*/power/control
/sys/bus/pci/devices/:00:00.0/power/control:auto
/sys/bus/pci/devices/:00:02.0/power/control:auto
/sys/bus/pci/devices/:00:16.0/power/control:auto
/sys/bus/pci/devices/:00:1a.0/power/control:auto
/sys/bus/pci/devices/:00:1b.0/power/control:auto
/sys/bus/pci/devices/:00:1c.0/power/control:auto
/sys/bus/pci/devices/:00:1c.1/power/control:auto
/sys/bus/pci/devices/:00:1c.3/power/control:auto
/sys/bus/pci/devices/:00:1c.4/power/control:auto
/sys/bus/pci/devices/:00:1c.7/power/control:auto
/sys/bus/pci/devices/:00:1d.0/power/control:auto
/sys/bus/pci/devices/:00:1f.0/power/control:auto
/sys/bus/pci/devices/:00:1f.2/power/control:auto
/sys/bus/pci/devices/:00:1f.3/power/control:auto
/sys/bus/pci/devices/:05:00.0/power/control:auto
/sys/bus/pci/devices/:09:00.0/power/control:auto
/sys/bus/pci/devices/:0b:00.0/power/control:auto
/sys/bus/pci/devices/:11:00.0/power/control:auto
vostro ~ # grep . /sys/bus/pci/devices/*/power/runtime_status
/sys/bus/pci/devices/:00:00.0/power/runtime_status:suspended
/sys/bus/pci/devices/:00:02.0/power/runtime_status:active
/sys/bus/pci/devices/:00:16.0/power/runtime_status:suspended
/sys/bus/pci/devices/:00:1a.0/power/runtime_status:active
/sys/bus/pci/devices/:00:1b.0/power/runtime_status:active
/sys/bus/pci/devices/:00:1c.0/power/runtime_status:suspended
/sys/bus/pci/devices/:00:1c.1/power/runtime_status:active
/sys/bus/pci/devices/:00:1c.3/power/runtime_status:active
/sys/bus/pci/devices/:00:1c.4/power/runtime_status:active
/sys/bus/pci/devices/:00:1c.7/power/runtime_status:active
/sys/bus/pci/devices/:00:1d.0/power/runtime_status:active
/sys/bus/pci/devices/:00:1f.0/power/runtime_status:active
/sys/bus/pci/devices/:00:1f.2/power/runtime_status:active
/sys/bus/pci/devices/:00:1f.3/power/runtime_status:suspended
/sys/bus/pci/devices/:05:00.0/power/runtime_status:active
/sys/bus/pci/devices/:09:00.0/power/runtime_status:active
/sys/bus/pci/devices/:0b:00.0/power/runtime_status:active
/sys/bus/pci/devices/:11:00.0/power/runtime_status:active
vostro ~ # 
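
The two listings above are easier to compare when each device's control setting and runtime status sit on one line. A minimal sketch of that, not something that was run for this test: show_pm_state and SYSFS_PCI are hypothetical names, and SYSFS_PCI is only there so the function can be pointed at a scratch directory.

```shell
#!/bin/sh
# Sketch only. SYSFS_PCI is a hypothetical override for testing; by default
# the real sysfs PCI device directory is used.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

# Print one line per PCI device: address, power/control, power/runtime_status.
show_pm_state() {
    for dev in "$SYSFS_PCI"/*; do
        # Skip entries without a readable power/control attribute.
        [ -r "$dev/power/control" ] || continue
        control=$(cat "$dev/power/control")
        # runtime_status may be missing or unreadable; print "?" in that case.
        status=$(cat "$dev/power/runtime_status" 2>/dev/null) || status='?'
        printf '%s control=%s runtime_status=%s\n' "${dev##*/}" "$control" "$status"
    done
}
```

With this, a device that is set to auto but still active (like 0b:00.0 above) shows up on a single line.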

My apologies if that distorted the test, and thanks for your detailed explanations.

I do, however, have a few questions below.

Sarah Sharp wrote:
> On Fri, Mar 29, 2013 at 04:05:54PM +0100, Martin Mokrejs wrote:

> 
>> Nevertheless, I went to check if if the USB3 socket dies after first unplug 
>> of device
>> or not anymore thanks to the patch being tested:
>>
>> I plugged into the USB3.0 socket a mouse, it worked. Around its unplug I got:
>>
>> [   94.954779] hub 3-0:1.0: debounce: port 2: total 100ms stable 100ms 
>> status 0x100
>> [   94.954795] hub 3-0:1.0: hub_suspend
>> [   94.954802] usb usb3: bus auto-suspend, wakeup 1
>> [   94.954817] xhci_hcd :0b:00.0: xhci_hub_status_data: stopping port 
>> polling.
>> [   94.954835] xhci_hcd :0b:00.0: xhci_suspend: stopping port polling.
>> [   94.954857] xhci_hcd :0b:00.0: // Setting command ring address to 
>> 0xd6007001
>> [   94.954898] xhci_hcd :0b:00.0: hcd_pci_runtime_suspend: 0
>> [   94.954983] xhci_hcd :0b:00.0: PME# enabled
>> [  169.622513] hub 2-1:1.0: state 7 ports 8 chg  evt 0004
>> [  169.623057] hub 2-1:1.0: port 2, status 0101, change 0001, 12 Mb/s
>> [  169.777012] hub 2-1:1.0: debounce: port 2: total 100ms stable 100ms 
>> status 0x101
>> [  169.856992] usb 2-1.2: new low-speed USB device number 4 using ehci-pci
>>
>> and the port was dead, no matter what "lsusb -v or -vv" options I tried. At 
>> about
>> [  169.622513] I plugged the mouse into a USB2.0 socket (do not know if that 
>> is 1a.0 or 1d.0).
> 
> All right, I wonder if the USB core/xHCI driver is forgetting to clear a
> port status change bit after the device is unplugged.  That can cause
> the xHCI host to not give us a port status change event later (and thus
> no PME).  Looking at the logs later, it doesn't seem like we do this
> though.
> 
>> If I run lsusb -vv it does (with the problematic patch):
>>
>> [ 1760.414086] pcieport :00:1c.4: PME# disabled
>> [ 1760.434314] xhci_hcd :0b:00.0: PME# disabled
>> [ 1760.434327] xhci_hcd :0b:00.0: enabling bus mastering
>> [ 1760.434338] xhci_hcd :0b:00.0: // Setting command ring address to 
>> 0xd6007001
>> [ 1760.434360] xhci_hcd :0b:00.0: Port Status Change Event for port 2
> 
> Ok, so the xHCI driver *is* getting a port status change event, and thus
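> must have gotten a PME.  So the PCI layer is doing its job.
> 
>> [ 1760.434363] xhci_hcd :0b:00.0: resume root hub
>> [ 1760.434367] xhci_hcd :0b:00.0: handle_port_status: starting port 
>> polling.
>> [ 1760.434378] xhci_hcd :0b:00.0: xhci_resume: starting port polling.
>> [ 1760.434383] xhci_hcd :0b:00.0: hcd_pci_runtime_resume: 0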

Re: [Update][PATCH] PCI / ACPI: Always resume devices on ACPI wakeup notifications

2013-03-29 Thread Martin Mokrejs

Hi,
  I applied this patch on top of 3.8.3, hoping it would fix my issue from the
thread "Re: 3.8.2: xhci port is dead until pcieport PME# goes to disabled",
but unfortunately things are even worse! Now lsusb -v or lsusb -vv does wake
up the xHCI port, but it falls asleep again immediately, more quickly than I
am able to plug a device into the socket. To get a device working in the USB3
socket I need to plug it in first and then run lsusb -vv; only then is it
recognized.

Without the patch, the 'lsusb -vv' woke up the port (PME# disabled happened
on both 1c.4 and 0b:00.0) and I had unlimited time to find some USB device
around and to plug it into the slot.


  I noticed the following messages a while after bootup (no external USB
devices were connected to the laptop, neither to the USB2 sockets nor to the
USB3.0 sockets), before I started the tests:

[   36.594171] xhci_hcd :0b:00.0: xhci_suspend: stopping port polling.
[   36.594202] xhci_hcd :0b:00.0: // Setting command ring address to 
0xd6007001
[   36.594247] xhci_hcd :0b:00.0: hcd_pci_runtime_suspend: 0
[   36.594349] xhci_hcd :0b:00.0: PME# enabled
[   36.703695] r8169 :05:00.0 eth0: link down
[   37.098299] microcode: CPU0 updated to revision 0x28, date = 2012-04-24
[   37.098941] microcode: CPU1 updated to revision 0x28, date = 2012-04-24
[   37.098944] perf_event_intel: PEBS enabled due to microcode update
[   38.343029] r8169 :05:00.0 eth0: link up
[   39.094944] r8169 :05:00.0 eth0: link down
[   41.492768] r8169 :05:00.0 eth0: link up
[   62.782910] xhci_hcd :0b:00.0: Poll event ring: 4294943584
[   62.782938] xhci_hcd :0b:00.0: op reg status = 0x
[   62.782939] xhci_hcd :0b:00.0: HW died, polling stopped.
[   88.754183] pcieport :00:1c.0: PME# enabled
[   88.764182] xhci_hcd :0b:00.0: PME# disabled
[   88.764192] xhci_hcd :0b:00.0: enabling bus mastering
[   88.764206] xhci_hcd :0b:00.0: // Setting command ring address to 
0xd6007001
[   88.764242] xhci_hcd :0b:00.0: Port Status Change Event for port 2
[   88.764246] xhci_hcd :0b:00.0: resume root hub
[   88.764259] xhci_hcd :0b:00.0: handle_port_status: starting port polling.
[   88.764276] xhci_hcd :0b:00.0: xhci_resume: starting port polling.
[   88.764281] xhci_hcd :0b:00.0: hcd_pci_runtime_resume: 0
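
To see the order of the PME# transitions without the surrounding noise, the log can be filtered. This is a minimal sketch, assuming log lines shaped like the ones above, that is, ending in "<driver> <device>: PME# enabled" or "... PME# disabled". The name pme_transitions is made up for illustration.

```shell
#!/bin/sh
# Sketch only: read kernel log text on stdin and print
# "<driver> <device> <enabled|disabled>" for every PME# transition.
pme_transitions() {
    awk 'NF >= 4 && $(NF-1) == "PME#" && ($NF == "enabled" || $NF == "disabled") {
        dev = $(NF-2)
        sub(/:$/, "", dev)      # drop the colon that follows the device name
        print $(NF-3), dev, $NF
    }'
}
```

It would typically be fed from the kernel log, for example `dmesg | pme_transitions`.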


What does "HW died" mean? Why does 1c.0 show up here? What is this device actually doing?

00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI 
Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
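Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=03, subordinate=04, sec-latency=0
I/O behind bridge: f000-0fff
Memory behind bridge: fff0-000f
Prefetchable memory behind bridge: fff0-000f
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-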
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, 
L1 <1us
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- 
Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ 
TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 
<1us, L1 <16us
ClockPM- Surprise- LLActRep+ BwNot-
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- 
CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ 
DLActive- BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- 
Surprise-
Slot #0, PowerLimit 10.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- 
LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- 
Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- 
Interlock-
Changed: MRL- PresDet- LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- 
CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID , PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF 
Not Supported ARIFwd-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, 
OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
 Transmit Margin: Normal Operating Range, 
EnterModifiedCompliance- ComplianceSOS-
 Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, 
EqualizationComplete-, EqualizationPhase1-
 EqualizationPhase2-, EqualizationPhase3-,

Re: 3.9-rc3+: reports battery as 0 mWh capacity on thinkpad x60

2013-03-24 Thread Martin Mokrejs



Pavel Machek wrote:
> Hi!
> 
>> pavel@amd:~$ cat /proc/acpi/battery/BAT0/info 
>> present: yes
>> design capacity: 0 mWh
>> last full capacity:  0 mWh
>> battery technology:  rechargeable
>> design voltage:  14400 mV
>>
>> This worked before... at least it works in 2.6 kernel used by debian.
>
> This works for me in 3.9-rc3.  May I see your .config?
> ...
>>> But problem is not in /proc, /sys has zeros, too.
>>>
>>> pavel@amd:~$ cat /sys/class/power_supply/BAT0/energy_full
>>> 0
>>> pavel@amd:~$ cat /sys/class/power_supply/BAT0/energy_full_design 
>>> 0
>>> pavel@amd:~$ cat /sys/class/power_supply/BAT0/model_name 
>>> 93P5030
>>> pavel@amd:~$
>>
>> Can you narrow the time frame when it stopped working a bit?
> 
> Well, 2.6.32 from debian works ok, and self-compiled 3.1+ kernel also
> seems to work ok.
> 
> I'm not sure if 3.7+ kernels worked, actually... I'd have to do some
> compiling to check.

FYI, on 3.7.10 I don't have the above files. See below what I do have:

# for f in /sys/class/power_supply/BAT0/*; do echo $f; cat $f; done
/sys/class/power_supply/BAT0/alarm
0
/sys/class/power_supply/BAT0/capacity
106
/sys/class/power_supply/BAT0/charge_full
4126000
/sys/class/power_supply/BAT0/charge_full_design
440
/sys/class/power_supply/BAT0/charge_now
440
/sys/class/power_supply/BAT0/current_now
1000
/sys/class/power_supply/BAT0/cycle_count
0
/sys/class/power_supply/BAT0/device
cat: /sys/class/power_supply/BAT0/device: Is a directory
/sys/class/power_supply/BAT0/manufacturer
SMP
/sys/class/power_supply/BAT0/model_name
DELL 8NH551B
/sys/class/power_supply/BAT0/power
cat: /sys/class/power_supply/BAT0/power: Is a directory
/sys/class/power_supply/BAT0/present
1
/sys/class/power_supply/BAT0/serial_number
 2630
/sys/class/power_supply/BAT0/status
Full
/sys/class/power_supply/BAT0/subsystem
cat: /sys/class/power_supply/BAT0/subsystem: Is a directory
/sys/class/power_supply/BAT0/technology
Li-ion
/sys/class/power_supply/BAT0/type
Battery
/sys/class/power_supply/BAT0/uevent
POWER_SUPPLY_NAME=BAT0
POWER_SUPPLY_STATUS=Full
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_CYCLE_COUNT=0
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=1110
POWER_SUPPLY_VOLTAGE_NOW=12294000
POWER_SUPPLY_CURRENT_NOW=1000
POWER_SUPPLY_CHARGE_FULL_DESIGN=440
POWER_SUPPLY_CHARGE_FULL=4126000
POWER_SUPPLY_CHARGE_NOW=440
POWER_SUPPLY_CAPACITY=106
POWER_SUPPLY_MODEL_NAME=DELL 8NH551B
POWER_SUPPLY_MANUFACTURER=SMP
POWER_SUPPLY_SERIAL_NUMBER= 2630
/sys/class/power_supply/BAT0/voltage_min_design
1110
/sys/class/power_supply/BAT0/voltage_now
12294000
#
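
The loop above also runs cat on device, power and subsystem, which are directories, hence the "Is a directory" errors. Here is a minimal sketch of a variant that only prints regular, readable files. The name dump_supply is hypothetical, and the default path is simply the BAT0 directory used above.

```shell
#!/bin/sh
# Sketch only: print each attribute file name followed by its content,
# skipping directories (and symlinks to directories).
dump_supply() {
    dir=${1:-/sys/class/power_supply/BAT0}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue     # skips device, power, subsystem
        [ -r "$f" ] || continue     # skips attributes we may not read
        printf '%s\n' "$f"
        cat "$f"
    done
}
```

Run without arguments it walks BAT0; a different supply directory can be passed as the first argument.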
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] pciehp: Add pciehp_surprise module option

2013-03-20 Thread Martin Mokrejs

Martin Mokrejs wrote:
> Hi Takashi,
>   would you please describe your test system in more detail? How
> about 'lspci -tv'? And 'lsusb -v' of the broken device?
> 
> 1. For me on Dell Vostro 3550 with a SandyBridge chip doing all 
> SATA+USB2+ExpressCardSlot:
> 
> 00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family 
> DRAM Controller (rev 09)
> 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core 
> Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA 
> controller])
> 00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series 
> Chipset Family MEI Controller #1 (rev 04)
> 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family 
> USB Enhanced Host Controller #2 (rev 05) (prog-if 20 [EHCI])
> 00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family 
> High Definition Audio Controller (rev 05)
> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI 
> Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
> 00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI 
> Express Root Port 2 (rev b5) (prog-if 00 [Normal decode])
> 00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI 
> Express Root Port 4 (rev b5) (prog-if 00 [Normal decode])
> 00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI 
> Express Root Port 5 (rev b5) (prog-if 00 [Normal decode])
> 00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI 
> Express Root Port 8 (rev b5) (prog-if 00 [Normal decode])
> 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family 
> USB Enhanced Host Controller #1 (rev 05) (prog-if 20 [EHCI])
> 00:1f.0 ISA bridge: Intel Corporation HM67 Express Chipset Family LPC 
> Controller (rev 05)
> 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset 
> Family 6 port SATA AHCI Controller (rev 05) (prog-if 01 [AHCI 1.0])
> 00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus 
> Controller (rev 05)
> 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI 
> Express Gigabit Ethernet controller (rev 06)
> 09:00.0 Network controller: Intel Corporation Centrino Wireless-N 1030 
> [Rainbow Peak] (rev 34)
> 0b:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI 
> Host Controller (rev 02) (prog-if 30 [XHCI])
> 11:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid 
> II Controller (rev 01)
> #
> 
> If I have Realtek MediaCardReader enabled in BIOS, no card in it, coldboot,
> and hot insert an ExpressCard into the slot, the Realtek MediaCardReader pops
> up in dmesg as a new PCI device. How about you?

Err, sorry, not as a PCI device as I said; it gets re-detected as a USB device:

[4.220009] hub 2-1:1.0: port 6, status 0101, change , 12 Mb/s
[4.291831] usb 2-1.6: new high-speed USB device number 5 using ehci_hcd
[4.409353] usb 2-1.6: default language 0x0409
[4.414740] usb 2-1.6: udev 5, busnum 2, minor = 132
[4.414745] usb 2-1.6: New USB device found, idVendor=0bda, idProduct=0138
[4.414858] usb 2-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[4.414967] usb 2-1.6: Product: USB2.0-CRW
[4.415069] usb 2-1.6: Manufacturer: Generic
[4.415172] usb 2-1.6: SerialNumber: 2009051638820
[4.416956] usb 2-1.6: usb_probe_device
[4.416962] usb 2-1.6: configuration #1 chosen from 1 choice
[4.419477] usb 2-1.6: adding 2-1.6:1.0 (config #1, interface 0)
[4.424094] usb-storage 2-1.6:1.0: usb_probe_interface
[4.424103] usb-storage 2-1.6:1.0: usb_probe_interface - got id
[4.424276] ums-realtek 2-1.6:1.0: usb_probe_interface
[4.424279] ums-realtek 2-1.6:1.0: usb_probe_interface - got id
[4.440838] scsi6 : usb-storage 2-1.6:1.0

cut

[  222.748820] pci :11:00.0: [1095:3132] type 00 class 0x018000
[  222.748865] pci :11:00.0: reg 10: [mem 0x-0x007f 64bit]
[  222.748898] pci :11:00.0: reg 18: [mem 0x-0x3fff 64bit]
[  222.748919] pci :11:00.0: reg 20: [io  0x-0x007f]
[  222.748960] pci :11:00.0: reg 30: [mem 0x-0x0007 pref]
[  222.749095] pci :11:00.0: supports D1 D2
[  222.769438] pci :11:00.0: BAR 6: assigned [mem 0xf000-0xf007 pref]
[  222.769442] pci :11:00.0: BAR 2: assigned [mem 0xf6c0-0xf6c03fff 64bit]
[  222.769464] pci :11:00.0: BAR 2: set to [mem 0xf6c0-0xf6c03fff 64bit] (PCI address [0xf6c0-0xf6c03fff])
[  222.769466] pci :11:00.0: BAR 0: assigned [mem 0xf6c04000-0xf6c0407f 64bit]
[  222.769487] pci :11:00.0: BAR 0: set to [mem 0xf6c04000-0xf6c0407f 64bit] (PCI address [0xf6c04000-0xf6c0407f])
[  222.769489] pci :11:00.0: BAR 4: assigned [io  0xc000-0xc07f]
[  222.769496] pci :11:00.0: BAR 4: set to [io  0xc000-0xc07f] (PCI address [0xc000-0xc07f])
[  222.891588] sata_sil24 :11:00.0: version 1.1
[  222.891606] sata_sil24

Re: [PATCH] pciehp: Add pciehp_surprise module option

2013-03-20 Thread Martin Mokrejs

Hi Takashi,
  would you please describe your test system in more detail? How
about 'lspci -tv'? And 'lsusb -v' of the broken device?

1. For me, on a Dell Vostro 3550 with a Sandy Bridge chipset handling all of
SATA + USB2 + ExpressCard slot:

00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family 
DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core 
Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA 
controller])
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series 
Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family 
USB Enhanced Host Controller #2 (rev 05) (prog-if 20 [EHCI])
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family 
High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI 
Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI 
Express Root Port 2 (rev b5) (prog-if 00 [Normal decode])
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI 
Express Root Port 4 (rev b5) (prog-if 00 [Normal decode])
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI 
Express Root Port 5 (rev b5) (prog-if 00 [Normal decode])
00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI 
Express Root Port 8 (rev b5) (prog-if 00 [Normal decode])
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family 
USB Enhanced Host Controller #1 (rev 05) (prog-if 20 [EHCI])
00:1f.0 ISA bridge: Intel Corporation HM67 Express Chipset Family LPC 
Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 
6 port SATA AHCI Controller (rev 05) (prog-if 01 [AHCI 1.0])
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus 
Controller (rev 05)
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI 
Express Gigabit Ethernet controller (rev 06)
09:00.0 Network controller: Intel Corporation Centrino Wireless-N 1030 [Rainbow 
Peak] (rev 34)
0b:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host 
Controller (rev 02) (prog-if 30 [XHCI])
11:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid 
II Controller (rev 01)
#

If I have the Realtek MediaCardReader enabled in BIOS, no card in it, cold boot,
and hot-insert an ExpressCard into the slot, the Realtek MediaCardReader pops up
in dmesg as a new PCI device. How about you?

My card reader does NOT show up in lspci (maybe because I never plugged a data
card into it) but does show up in lsusb:

Bus 002 Device 005: ID 0bda:0138 Realtek Semiconductor Corp. RTS5138 Card Reader Controller
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x0bda Realtek Semiconductor Corp.
  idProduct          0x0138 RTS5138 Card Reader Controller
  bcdDevice           38.82
  iManufacturer           1 Generic
  iProduct                2 USB2.0-CRW
  iSerial                 3 2009051638820

Can you try a cold boot with no media card inserted before power-up and without
your patch, and check whether the CardReader pops up after you plug some
ExpressCard into an ExpressCard slot (not the CardReader)? I presume it is
a laptop. ;-)

2. Is the hotplug also broken under acpiphp? And again, does the device get
detected once you plug some card into an ExpressCard slot?

3. Does the device appear under lsusb in addition to lspci?

4. How does the 'lack of the hotplug surprise (PCI_EXP_SLTCAP_HPS) capability
bit' manifest in 'lspci -vvv' output? Could you post a diff from before and
after the patch?

5. Where is the *real* bug in the code, such that "linux" ignores the fact that
one of the PCIe Root Ports (or the whole PCI bridge?) does not support 'hotplug
surprise'? Or is this about the hooked-up "third-party" PCI devices? Why does it
affect other PCIe ports of the bridge?
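
[Editor's sketch, not part of the original mail.] Question 4 is about two bits of the PCIe Slot Capabilities register, which 'lspci -vv' flags on its SltCap line as "HotPlug+/-" and "Surprise+/-". The snippet below is a minimal userspace illustration of how those two flags are derived from a raw register value. The mask values are the ones defined in the kernel's include/uapi/linux/pci_regs.h; the helper names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Slot Capabilities bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTCAP_HPS 0x00000020 /* Hot-Plug Surprise */
#define PCI_EXP_SLTCAP_HPC 0x00000040 /* Hot-Plug Capable */

/* Hypothetical helper: does the slot advertise hotplug capability? */
bool slot_hotplug_capable(uint32_t sltcap)
{
    return (sltcap & PCI_EXP_SLTCAP_HPC) != 0;
}

/* Hypothetical helper: does the slot advertise surprise removal? */
bool slot_hotplug_surprise(uint32_t sltcap)
{
    return (sltcap & PCI_EXP_SLTCAP_HPS) != 0;
}

/* Print the two flags with '+' for set and '-' for clear. */
void print_hotplug_caps(FILE *out, uint32_t sltcap)
{
    assert(out != NULL);
    fprintf(out, "HotPlug%c Surprise%c\n",
            slot_hotplug_capable(sltcap) ? '+' : '-',
            slot_hotplug_surprise(sltcap) ? '+' : '-');
}
```

A slot that is hotplug capable but lacks the surprise bit would show up here as "HotPlug+ Surprise-", which is the difference a before/after 'lspci -vvv' diff of the SltCap line should reveal, assuming the patch changes what the port reports rather than only how pciehp interprets it.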

It would be nice if you could look into my previous emails to linux-pci and,
with your current knowledge, comment on whether I faced the same problem here
or there. It looks like it. Disabling hotplug is a no-go for me; I need hotplug
for my ExpressCards. So far I have rather kept the MediaCardReader disabled in
BIOS. But thank you, I did not know that inserting a data card into a CardReader
is supposed to give me an lspci entry for it. So far I saw only the one in lsusb.

Thank you,
Martin

Takashi Iwai wrote:
> We encountered a problem that on some HP machines the Realtek PCI-e
> card reader device appears only when you inserted a card before the
> cold boot.  While debugging, it turned out that the device is actually
> handled via PCI-e hotplug in

Re: [PATCH] pci: Disable slot presence detection around bus reset

2013-02-14 Thread Martin Mokrejs

Hi Alex,
  I was just going to ask you whether your patch would "explain" why, in my
experience, pciehp has broken presence detection while acpiphp does not (on a
3.7 kernel), and whether the patch will fix it.
  Some testing I did in the past was on the 3.2 kernel and on 3.7.1, with no
fixes. Maybe you are interested in these threads? Actually, another user
confirmed that pciehp is broken on 3.7; luckily, he could also switch to acpiphp.
Still, it is weird that the behavior differs between ExpressCards
(USB3 vs. SATA vs. RS232 vs. FireWire).

Five thread subjects on card presence detection:

Re: 3.2.11: PCI Express card cannot be re-detected withing cca 60sec timeframe
Re: linux-3.4-rc5: eSATA Sil3132 ExpressCard removal results in warn_slowpath_common
Re: Dell Vostro 3550: pci_hotplug+acpiphp require 'pcie_aspm=force' on kernel command-line for hotplug to work
Re: [PATCH v3 3/6] ACPI/pci_slot: update PCI slot information when PCI hotplug event happens
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

Maybe you will crack it? ;-)
Thanks,
Martin

Alex Williamson wrote:
> On Thu, 2013-02-14 at 11:37 -0700, Alex Williamson wrote:
>> A bus reset can trigger a presence detection change and result in a
>> surprise hotplug.  This is generally not what we want to happen when
>> trying to reset a device.  Disable the presence detection control on
>> bridges around bus reset.
>>
>> Signed-off-by: Alex Williamson 
>> ---
>>  drivers/pci/pci.c |   29 ++++++++++++++++++++++++-----
>>  1 file changed, 24 insertions(+), 5 deletions(-)
> 
> 
> Hmm, this doesn't seem to be sufficient, still seeing it
> occasionally :-\
> 
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 5cb5820..c1f7d77 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -3229,8 +3229,8 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
>>  
>>  static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
>>  {
>> -	u16 ctrl;
>> -	struct pci_dev *pdev;
>> +	u16 ctrl, flags, sltctl = 0;
>> +	struct pci_dev *pdev, *bridge;
>>  
>>  	if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
>>  		return -ENOTTY;
>> @@ -3242,15 +3242,34 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
>>  	if (probe)
>>  		return 0;
>>  
>> -	pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
>> +	bridge = dev->bus->self;
>> +
>> +	/*
>> +	 * If the parent device supports a slot with presence detection
>> +	 * change enabled, holding the bus in reset can trigger that and
>> +	 * cause an unwanted surprise removal.  Disable presence detection
>> +	 * around the bus reset.
>> +	 */
>> +	pcie_capability_read_word(bridge, PCI_EXP_FLAGS, &flags);
>> +	if (flags & PCI_EXP_FLAGS_SLOT) {
>> +		pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &sltctl);
>> +		if (sltctl & PCI_EXP_SLTCTL_PDCE)
>> +			pcie_capability_write_word(bridge, PCI_EXP_SLTCTL,
>> +						   sltctl & ~PCI_EXP_SLTCTL_PDCE);
>> +	}
>> +
>> +	pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &ctrl);
>>  	ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
>> -	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
>> +	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
>>  	msleep(100);
>>  
>>  	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>> -	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
>> +	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
>>  	msleep(100);
>>  
>> +	if (sltctl & PCI_EXP_SLTCTL_PDCE)
>> +		pcie_capability_write_word(bridge, PCI_EXP_SLTCTL, sltctl);
>> +
>>  	return 0;
>>  }
>>  
>>  
>>
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-31 Thread Martin Mokrejs

Hi Chris,

Chris Clayton wrote:
> Hi Martin,
> 
> On 01/28/13 21:02, Martin Mokrejs wrote:
>> Hi Chris,
>>
>> Chris Clayton wrote:
>>> Hi Martin,
>>>
>>> On 01/28/13 12:12, Martin Mokrejs wrote:
>>>> Chris Clayton wrote:

>>>
>>> I've struggled with this a little. For some reason, the expresscard
>>> doesn't always stay properly inserted in the slot when I insert it.
>>> Now that hotplug is working, the modules are being loaded and when
>>> the card pops out again, I get an oops because, of course, the driver
>>> is running and the card disappears. Perhaps the driver can be made a
>>> bit more robust to sudden disappearance of the card. I'll report the
>>
>> Yes, I had or maybe still have same issues here. I used to get an Oops
>> for sata_sil24 card weird behavior for USB3.0 NEC-based card. It was
>> fine always for a VIA-based firewire card and serial PL2303-based one.
>> I found out it is better if a usb device is connected to the USB card
>> because if that slips out then the libata layer quickly realizes that.
>> If there was no device connected, the usb waits too long before it removes
>> the usb hub from the system. And if you plugin the card meanwhile
>> back into the slot, weird thing happen.
>>
> My usb3 expresscard device has arrived and I get an oops with that
> too, if I remove it without unloading the driver first. I guess it
> shouldn't be a surprise that the driver isn't expecting the device to
> disappear.

I avoided the oopses when a USB device was connected to the ExpressCard.
Nevertheless, you should report it to the linux-usb and linux-pci mailing lists,
along with the oops stack trace (under a new thread). Maybe you are hitting a
different oops.

> 
> As I mentioned, I have some trouble with the WinTV-HVR-1400 card,
> which sometimes pops out again, if I push it into the slot too hard
> (but I'm getting better at that with practice). So what I've done
> (with the usb3 card too) to avoid the oopsen is blacklist the driver
> in /etc/modprobe.d/blacklist.conf and then load them when I'm sure
> the card is properly inserted. Not exactly hotplug, but at least I
> don't have to reboot because of an oops- and it's not something I'm
> doing several times an hour.

Yeah, I also found my way around it: do not fiddle much with the cards, and if
they slip out during insertion, do not re-plug them too quickly (I had problems
at least with the USB3 card and the SATA card).

BTW, to remove a card you are supposed to push it into the slot so that it gets
ejected. Do not just pull it out (which is what I did in the beginning).
I was told that is not the right way (it probably affects the PresDet status).

Martin

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-28 Thread Martin Mokrejs

Hi Chris,

Chris Clayton wrote:
> Hi Martin,
> 
> On 01/28/13 12:12, Martin Mokrejs wrote:
>> Chris Clayton wrote:
>>>
>>> [snip]
>>>
>>>> [chris:~]$ cat /proc/cmdline
>>>> root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6
>>>                     ^^
>>>                  **typo**
>>> I've run the test again with pcie_ports=native and the directories now get 
>>> populated. Even better though, is that when I plug in the card, hotplug 
>>> **works** and the card's drivers are loaded.
>>
>> BTW, I have with acpiphp on 3.7.4:
>>
>> ls -la /sys/bus/pci_express/devices
>> total 0
>> drwxr-xr-x 2 root root 0 Jan 28 13:07 .
>> drwxr-xr-x 4 root root 0 Jan 28 13:07 ..
>> $ ls -la /sys/bus/pci/devices/slots
>
> **typo**
> It should be /sys/bus/pci/slots.
> 
>> ls: cannot access /sys/bus/pci/devices/slots: No such file or directory
>> $
>>
> With acpiphp, I get /sys/bus/pci_express/devices populated but 
> /sys/bus/pci/slots is empty.

OK, I had not noticed the typo. Here is what I have with acpiphp:

# ls -laR /sys/bus/pci/slots
/sys/bus/pci/slots:
total 0
drwxr-xr-x 3 root root 0 Jan 27 17:14 .
drwxr-xr-x 5 root root 0 Jan 25 15:56 ..
drwxr-xr-x 2 root root 0 Jan 27 17:14 1

/sys/bus/pci/slots/1:
total 0
drwxr-xr-x 2 root root0 Jan 27 17:14 .
drwxr-xr-x 3 root root0 Jan 27 17:14 ..
-r--r--r-- 1 root root 4096 Jan 28 21:31 adapter
-r--r--r-- 1 root root 4096 Jan 27 17:14 address
-rw-r--r-- 1 root root 4096 Jan 28 21:31 attention
-r--r--r-- 1 root root 4096 Jan 28 21:31 cur_bus_speed
-r--r--r-- 1 root root 4096 Jan 28 21:31 latch
-r--r--r-- 1 root root 4096 Jan 28 21:31 max_bus_speed
lrwxrwxrwx 1 root root0 Jan 28 21:31 module -> ../../../../module/acpiphp
-rw-r--r-- 1 root root 4096 Jan 28 21:31 power
#

> 
>> And for me hotplug also works (as far as I can tell). ;-)
>>
>>>
>>> Excellent! Thank you so much for your help (and patience) Martin and Yijing.
>>>
>>> Now to solving why running scandvb doesn't find any TV channels.
>>
>> Would be fine if you could re-do the PresDet checks and confirm whether it 
>> is also broken
>> for you under pciehp.
>
> I've struggled with this a little. For some reason, the expresscard
> doesn't always stay properly inserted in the slot when I insert it.
> Now that hotplug is working, the modules are being loaded and when
> the card pops out again, I get an oops because, of course, the driver
> is running and the card disappears. Perhaps the driver can be made a
> bit more robust to sudden disappearance of the card. I'll report the

Yes, I had or maybe still have same issues here. I used to get an Oops
for sata_sil24 card weird behavior for USB3.0 NEC-based card. It was
fine always for a VIA-based firewire card and serial PL2303-based one.
I found out it is better if a usb device is connected to the USB card
because if that slips out then the libata layer quickly realizes that.
If there was no device connected, the usb waits too long before it removes
the usb hub from the system. And if you plugin the card meanwhile
back into the slot, weird thing happen.


> oops later. Anyway, to run these tests I built a kernel without the
> dvb card's drivers, effectively simulating the situation I had before
> Yijing got hotplug working for me. The card popping out may also have
> affected these diffs a bit because, for example, the first one has
> the CorrErr flag changed, possibly because I had to have two or more
> goes at getting the card to lock in the slot. Yesterday that diff
> showed no changes. Anyway, here are the diffs:
> 
> diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
> 262c262
> <   DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ 
> TransPend-
> ---
>>   DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ 
>> TransPend-
> 295c295
> < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
> ---
>> 40: 10 80 42 01 00 80 00 00 00 00 11 00 12 3c 12 04
> 
> 
> diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt
> 
> 

BTW, with the NEC-based card only after every second removal of the card I got
into PresDet- state. So, on every other diff attempt you won't see a difference!
But we are talking about acpiphp here (unlike pciehp) and with that I also 
have no problems.

> 
> =
> diff lspci.before_insertion.txt lspci.after_1st_removal.txt
> 112c112
> < 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 08 c0
> ---
>> 60: 20 20 ff 07 00 00 0

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-28 Thread Martin Mokrejs

Chris Clayton wrote:
> 
> [snip]
> 
>> [chris:~]$ cat /proc/cmdline
>> root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6
>                     ^^
>                  **typo**
> I've run the test again with pcie_ports=native and the directories now get 
> populated. Even better though, is that when I plug in the card, hotplug 
> **works** and the card's drivers are loaded.

BTW, this is what I have with acpiphp on 3.7.4:

$ ls -la /sys/bus/pci_express/devices
total 0
drwxr-xr-x 2 root root 0 Jan 28 13:07 .
drwxr-xr-x 4 root root 0 Jan 28 13:07 ..
$ ls -la /sys/bus/pci/devices/slots
ls: cannot access /sys/bus/pci/devices/slots: No such file or directory
$

And for me hotplug also works (as far as I can tell). ;-)

> 
> Excellent! Thank you so much for your help (and patience) Martin and Yijing.
> 
> Now to solving why running scandvb doesn't find any TV channels.

It would be good if you could redo the PresDet checks and confirm whether
it is also broken for you under pciehp.
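
For the PresDet check, a small filter saves reading the whole dump. This is only a sketch: it assumes the `lspci -vv` text layout seen in this thread (a `SltSta:` line carrying `PresDet+` or `PresDet-`), it relies on `grep -o` (available in GNU and BSD grep), and the function name `presdet_states` is made up for illustration:

```shell
#!/bin/sh
# Print the PresDet flag from every "SltSta:" line of a saved lspci dump.
# Assumption: the dump was produced by `lspci -vv` (or -vvv) run as root,
# so that the slot status lines are present. The "Changed:" line below
# SltSta also mentions PresDet, which is why only SltSta lines are kept.
presdet_states() {
    # $1: path to a saved lspci dump
    grep 'SltSta:' "$1" | grep -o 'PresDet[+-]'
}

# Example (as root):
#   lspci -vvv > lspci.now.txt && presdet_states lspci.now.txt
```

With a card in the slot one would expect `PresDet+`, and `PresDet-` after removal.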

Martin

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-28 Thread Martin Mokrejs



Chris Clayton wrote:
> Hi Yijing,
> 
> On 01/28/13 02:40, Yijing Wang wrote:
>> Hi Chris,
>> Sorry for the delay reply. It seems like my reply last night was missed.
>>
>>  From the sysinfo you provide, there are no pcie port devices under 
>> /sys/bus/pci_express/devices.
>> Maybe because there are some problems with _OSC in your laptop, so pcie port 
>> driver won't create pcie port device
>> for hotplug, aer and so on.
>>
>> Maybe you can add boot parameter "pcie_ports=native" and reboot your laptop.
>> Then use #modprobe pciehp pciehp_force=1 pciehp_debug=1 to load pciehp 
>> modules.
>> After above actions, enter /sys/bus/pci_express/devices/ directory and 
>> /sys/bus/pci/slots/
>> Some slots and pcie port devices should be there now.
>>
> Sorry, I've tried your suggestion, but the two directories are still empty.
> 
> I verified the test environment as follows:
> 
> [chris:~]$ uname -a
> Linux laptop 3.7.4 #15 SMP PREEMPT Mon Jan 28 09:43:57 GMT 2013 i686 GNU/Linux
> [chris:~]$ grep acpiphp /boot/System.map-3.7.4
> [chris:~]$ modinfo acpiphp
> modinfo: ERROR: Module acpiphp not found.
> [chris:~]$ modinfo pciehp
> filename:       /lib/modules/3.7.4/kernel/drivers/pci/hotplug/pciehp.ko
> license:        GPL
> description:    PCI Express Hot Plug Controller Driver
> author:         Dan Zink <dan.z...@compaq.com>, Greg Kroah-Hartman <g...@kroah.com>, Dely Sy <dely.l...@intel.com>
> depends:
> intree:         Y
> vermagic:       3.7.4 SMP preempt mod_unload CORE2
> parm:           pciehp_detect_mode:Slot detection mode: pcie, acpi, auto
>   pcie          - Use PCIe based slot detection
>   acpi          - Use ACPI for slot detection
>   auto(default) - Auto select mode. Use acpi option if duplicate
>                   slot ids are found. Otherwise, use pcie option
>  (charp)
> parm:           pciehp_debug:Debugging mode enabled or not (bool)
> parm:           pciehp_poll_mode:Using polling mechanism for hot-plug events or not (bool)
> parm:           pciehp_poll_time:Polling mechanism frequency, in seconds (int)
> parm:           pciehp_force:Force pciehp, even if OSHP is missing (bool)
> [chris:~]$ cat /proc/cmdline
> root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6
> [chris:~]$ sudo modprobe pciehp pciehp_force=1 pciehp_debug=1
> [chris:~]$ lsmod
> Module  Size  Used by
> pciehp 19907  0
> [...]
> 
> You will notice that the kernel I have used is 3.7.4. I hope that's a 
> suitable kernel for your tests. I've moved away from the 3.8 development 
> kernel onto one that's stable and on which Martin has identified a solution. 
> I see Greg KH released 3.7.5 yesterday and it includes a pciehp change. I'll 
> upgrade to that, run the tests again and report back.
> 
> One question - should I include the (acpi) pci_slot driver in the kernel 
> build or does pciehp populate the directories without pci_slot?

Hi Chris,
  I am not a kernel developer, but from other threads on linux-pci I
gathered that in some scenarios there are problems with the hotplug
modules being loaded improperly. The patches now floating around are
therefore meant to make the hotplug drivers unavailable as modules. That
is why I suggested that you try only static (built-in) kernel support for
hotplug. That way you don't hit the issue. It is certainly not addressed
in 3.7.5; it is probably in -next.
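
To see whether the hotplug drivers are built in or modular, the kernel config can be checked. A sketch only: it assumes you have the config file of the running kernel at hand (not every distribution installs one), and the function name `hotplug_config` is made up for illustration:

```shell
#!/bin/sh
# Report how the PCI hotplug drivers are configured in a kernel .config:
# "=y" means built in statically, "=m" means built as a module.
# Assumption: the caller supplies the config path, for example
# /boot/config-$(uname -r); the function name is hypothetical.
hotplug_config() {
    grep -E '^CONFIG_HOTPLUG_PCI(_PCIE|_ACPI)?=' "$1"
}

# Example: hotplug_config "/boot/config-$(uname -r)"
```

A symbol that is not set does not appear in the output at all, so a missing CONFIG_HOTPLUG_PCI_ACPI line means acpiphp was not enabled.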
Martin


Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-28 Thread Martin Mokrejs

Hi Chris,

Chris Clayton wrote:
> Hi Martin,
>
> On 01/28/13 12:12, Martin Mokrejs wrote:
>> Chris Clayton wrote:
>>>
>>> [snip]
>>>
>>>> [chris:~]$ cat /proc/cmdline
>>>> root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6
>>>                     ^^
>>>                  **typo**
>>> I've run the test again with pcie_ports=native and the directories now get
>>> populated. Even better though, is that when I plug in the card, hotplug
>>> **works** and the card's drivers are loaded.
>>
>> BTW, I have with acpiphp on 3.7.4:
>>
>> ls -la /sys/bus/pci_express/devices
>> total 0
>> drwxr-xr-x 2 root root 0 Jan 28 13:07 .
>> drwxr-xr-x 4 root root 0 Jan 28 13:07 ..
>> $ ls -la /sys/bus/pci/devices/slots
>
> **typo**
> It should be /sys/bus/pci/slots.
>
>> ls: cannot access /sys/bus/pci/devices/slots: No such file or directory
>> $
>>
> With acpiphp, I get /sys/bus/pci_express/devices populated but
> /sys/bus/pci/slots is empty.

OK, I hadn't noticed the typo, but this is what I have here with acpiphp:

# ls -laR /sys/bus/pci/slots
/sys/bus/pci/slots:
total 0
drwxr-xr-x 3 root root 0 Jan 27 17:14 .
drwxr-xr-x 5 root root 0 Jan 25 15:56 ..
drwxr-xr-x 2 root root 0 Jan 27 17:14 1

/sys/bus/pci/slots/1:
total 0
drwxr-xr-x 2 root root    0 Jan 27 17:14 .
drwxr-xr-x 3 root root    0 Jan 27 17:14 ..
-r--r--r-- 1 root root 4096 Jan 28 21:31 adapter
-r--r--r-- 1 root root 4096 Jan 27 17:14 address
-rw-r--r-- 1 root root 4096 Jan 28 21:31 attention
-r--r--r-- 1 root root 4096 Jan 28 21:31 cur_bus_speed
-r--r--r-- 1 root root 4096 Jan 28 21:31 latch
-r--r--r-- 1 root root 4096 Jan 28 21:31 max_bus_speed
lrwxrwxrwx 1 root root    0 Jan 28 21:31 module -> ../../../../module/acpiphp
-rw-r--r-- 1 root root 4096 Jan 28 21:31 power
-rw-r--r-- 1 root root 4096 Jan 28 21:31 power
#
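
The attributes in the listing above can be read in one go. A rough sketch, assuming the attribute names shown there (`address`, `power`, `adapter`); the function name `list_slots` is made up, and the optional argument exists only so it can be pointed at a test directory:

```shell
#!/bin/sh
# Print "slot address power adapter" for each registered hotplug slot.
# Assumptions: attribute names are those from the listing above; the
# function name and the optional base-directory argument are hypothetical.
list_slots() {
    base=${1:-/sys/bus/pci/slots}
    for slot in "$base"/*/; do
        [ -d "$slot" ] || continue      # nothing registered: glob did not match
        name=$(basename "$slot")
        printf '%s address=%s power=%s adapter=%s\n' "$name" \
            "$(cat "$slot/address" 2>/dev/null)" \
            "$(cat "$slot/power" 2>/dev/null)" \
            "$(cat "$slot/adapter" 2>/dev/null)"
    done
}

# Example: list_slots
```

An empty result corresponds to the empty /sys/bus/pci/slots directory reported earlier in the thread.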

 
>> And for me hotplug also works (as far as I can tell). ;-)
>>
>>>
>>> Excellent! Thank you so much for your help (and patience) Martin and Yijing.
>>>
>>> Now to solving why running scandvb doesn't find any TV channels.
>>
>> Would be fine if you could re-do the PresDet checks and confirm whether it
>> is also broken for you under pciehp.
>
> I've struggled with this a little. For some reason, the expresscard
> doesn't always stay properly inserted in the slot when I insert it.
> Now that hotplug is working, the modules are being loaded and when
> the card pops out again, I get an oops because, of course, the driver
> is running and the card disappears. Perhaps the driver can be made a
> bit more robust to sudden disappearance of the card. I'll report the

Yes, I had, or maybe still have, the same issues here. I used to get an
Oops with the sata_sil24 card and weird behavior with the NEC-based USB 3.0
card. The VIA-based FireWire card and the PL2303-based serial card always
worked fine. I found that it is better to have a USB device connected to the
USB card, because if the card slips out then the libata layer quickly
realizes that. If no device is connected, the USB layer waits too long
before it removes the USB hub from the system. And if you plug the card
back into the slot in the meantime, weird things happen.


> oops later. Anyway, to run these tests I built a kernel without the
> dvb card's drivers, effectively simulating the situation I had before
> Yijing got hotplug working for me. The card popping out may also have
> affected these diffs a bit because, for example, the first one has
> the CorrErr flag changed, possibly because I had to have two or more
> goes at getting the card to lock in the slot. Yesterday that diff
> showed no changes. Anyway, here are the diffs:
>
> diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
> 262c262
> <   DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> ---
> >   DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> 295c295
> < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
> ---
> > 40: 10 80 42 01 00 80 00 00 00 00 11 00 12 3c 12 04
>
>
> diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt
>
> <no difference>

BTW, with the NEC-based card I got into the PresDet- state only after every
second removal of the card. So on every other diff attempt you won't see a
difference! But we are talking about acpiphp here (not pciehp), and with
that I also have no problems.

 
> =
> diff lspci.before_insertion.txt lspci.after_1st_removal.txt
> 112c112
> < 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 08 c0
> ---
> > 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 00 c0
> 262,263c262,263
> <   DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> <   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us
> ---
> >   DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> >   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us
> 265c265
> <   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-27 Thread Martin Mokrejs

Hi Chris,

Chris Clayton wrote:
> Thanks again, Martin.
> 
> Firstly, maybe we should remove the linux-media list from the copy list. I 
> imagine this hotplug stuff is just noise to them.
> 
> [snip]
>> Do you have any other express card around to try if it works at all? Try 
>> that always after a cold boot.
>>
> Not at the moment, but I ordered at USB3 expresscard yesterday, so I will 
> have one soon.
> 
>> Posting a diff result of the below procedure might help:
>>
>> # lspci -vvvxxx > lspci.before_insertion.txt
>>
>> [plug your card into the slot]
>>
>> # lspci -vvvxxx > lspci.after_insertion.txt
>>
>> [ unplug your card]
>>
>> # lspci -vvvxxx > lspci.after_1st_removal.txt
>>
>> [re-plug your card into the slot]
>>
>> # lspci -vvvxxx > lspci.after_1st_re-insertion.txt
>>
>> [ unplug your card]
>>
>> # lspci -vvvxxx > lspci.after_2nd_removal.txt
>>
> 
> OK, I've been using kernel 3.8.0-rc kernels so far, but given that is still 
> under development, I've switched to 3.7.4, mainly because you are having 
> success with 3.7.x, acpiphp and pcie_aspm=off. I verified the environment as 
> follows:
> 
> [chris:~]$ cat /proc/cmdline
> root=/dev/sda5 pcie_aspm=off ro resume=/dev/sda6
> [chris:~]$ dmesg | grep ASPM
> [0.00] PCIe ASPM is disabled
> [0.348959]  pci:00: ACPI _OSC support notification failed, disabling 
> PCIe ASPM
> [chris:~]$ dmesg | grep acpiphp
> [0.400846] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [chris:~]$ dmesg | grep pciehp
> [chris:~]$ uname -a
> Linux laptop 3.7.4 #13 SMP PREEMPT Sun Jan 27 18:39:39 GMT 2013 i686 GNU/Linux
> 

vostro ~ # cat /proc/cmdline
root=/dev/sda5 pciehp.pciehp_debug=1 slub_debug=AFPZ pcie_aspm=off
vostro ~ # dmesg | grep ASPM
[0.00] PCIe ASPM is disabled
vostro ~ # dmesg | grep acpiphp
[2.449038] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[2.453757] acpiphp: Slot [1] registered
vostro ~ # uname -a
Linux vostro 3.7.4-default #2 SMP Mon Jan 21 22:45:22 MET 2013 x86_64 Intel(R) 
Core(TM) i7-2640M CPU @ 2.80GHz GenuineIntel GNU/Linux
vostro ~ # 
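
The snapshot procedure quoted above can be wrapped in two small helpers. This is only a sketch: the function names `snapshot` and `compare_snapshots` are made up, and `snapshot` just runs the same `lspci -vvvxxx` redirect as in the procedure (run it as root so the capability registers are readable):

```shell
#!/bin/sh
# Save one lspci dump under the naming scheme used in this thread,
# e.g. `snapshot before_insertion` writes lspci.before_insertion.txt.
snapshot() {
    lspci -vvvxxx > "lspci.$1.txt"
}

# Compare two saved dumps: print a one-line note if they are identical,
# otherwise print the diff itself.
compare_snapshots() {
    if diff "$1" "$2" > /dev/null; then
        echo "no difference: $1 $2"
    else
        diff "$1" "$2"
    fi
}

# Example:
#   snapshot before_insertion    # then plug the card in
#   snapshot after_insertion
#   compare_snapshots lspci.before_insertion.txt lspci.after_insertion.txt
```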

> 
>> Then compare them using diff. These should have no difference:
>>
>> diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
>> diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt
>>
> Correct, there were no differences.
> 
>>
>> These may have only little difference, or none:
>>
>> diff lspci.before_insertion.txt lspci.after_1st_removal.txt
> 
> 263c263
> <   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency 
> L0 <1us, L1 <16us
> ---
>  >   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency 
> L0 <512ns, L1 <16us
> 265c265
> <   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- 
> CommClk-
> ---
>  >   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- 
> CommClk+
> 267c267
> <   LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ 
> DLActive- BWMgmt- ABWMgmt-
> ---
>  >   LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ 
> DLActive- BWMgmt+ ABWMgmt-
> 273c273
> <   Changed: MRL- PresDet- LinkState-
> ---
>  >   Changed: MRL- PresDet- LinkState+
> 295,296c295,296
> < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04
> < 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00
> ---
>  > 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
>  > 50: 40 00 11 50 60 b2 1c 00 08 00 00 01 00 00 00 00
> 
>> diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt
>>
> No difference.
>>
>>
>> Finally, these should confirm whether the PresDet works for you (for me NOT 
>> with pciehp but does work with acpiphp).
>> You should see PresDet- to PresDet+ changes in:
>>
> Yes, I do see the PresDet- to PresDet+ changes
> 
>> diff lspci.before_insertion.txt lspci.after_insertion.txt
> 
> 263c263
> <   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency 
> L0 <1us, L1 <16us
> ---
>  >   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency 
> L0 <512ns, L1 <16us
> 265c265
> <   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- 
> CommClk-
> ---
>  >   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- 
> CommClk+
> 267c267
> <   LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ 
> DLActive- BWMgmt- ABWMgmt-
> ---
>  >   LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ 
> DLActive+ BWMgmt+ ABWMgmt-
> 272,273c272,273
> <   SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- 
> Interlock-
> <   Changed: MRL- PresDet- LinkState-
> ---
>  >   SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ 
> Interlock-
>  >   Changed: MRL- PresDet- LinkState+
> 295,296c295,296
> < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04
> < 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00
> ---
>  > 40: 10 80 42 01 00 80

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-27 Thread Martin Mokrejs

Chris Clayton wrote:
> 
> 
> On 01/27/13 14:26, Martin Mokrejs wrote:
>> Chris Clayton wrote:
>>>
>>>
>>> On 01/27/13 12:18, Yijing Wang wrote:
>>>> On 2013-01-27 19:19, Chris Clayton wrote:
>>>>> Hi Yijing
>>>>>
>>>>> On 01/27/13 02:45, Yijing Wang wrote:
>>>>>> On 2013-01-27 4:54, Chris Clayton wrote:
>>>>>>> Hi Martin,
>>>>>>>
>>>>>>> On 01/24/13 19:21, Martin Mokrejs wrote:
>>>>>>>> Hi Chris,
>>>>>>>>   try to include in kernel only acpiphp and omit pciehp. Don't use 
>>>>>>>> modules but include
>>>>>>>> them statically. And try, in addition, check whether "pcie_aspm=off" 
>>>>>>>> in grub.conf helped.
>>>>>>>>
>>>>>>>
>>>>>>> Thanks for the tip. I had the pciehp driver installed, but it was a 
>>>>>>> module and not loaded. I didn't have acpiphp enabled at all. Building 
>>>>>>> them both in statically, appears to have papered over the cracks of the 
>>>>>>> oops :-)
>>>>>>
>>>>>> Not loaded pciehp driver? Remove the device from this slot without 
>>>>>> poweroff ?
>>>>>>
>>>>>
>>>>> That's correct. When I first encountered the oops, I did not have the 
>>>>> pciehp driver loaded and removing the device from the slot whilst the 
>>>>> laptop was powered on resulted in the oops.
>>>>
>>>> Hmm, that's unsafe and dangerous, because device now may be running.
>>>> There are two ways to trigger pci hot-add or hot-remove in linux, after 
>>>> loaded pciehp or acpiphp module
>>>> (the two modules only one can loaded into system at the same time). You 
>>>> can trigger hot-add/hot-remove by
>>>> sysfs interface under /sys/bus/pci/slots/[slot-name]/power or attention 
>>>> button on hardware (if your laptop supports that).
>>>>
>>>
>>> OK, thanks for the advice.
>>>
>>>>>>>
>>>>>>>>   The best would if you subscribe to linux-pci, and read my recent 
>>>>>>>> threads
>>>>>>>> about similar issues I had with express cards with Dell Vostro 3550. 
>>>>>>>> Further, there is
>>>>>>>> a lot of changes to PCI hotplug done by Yingahi Liu and Rafael 
>>>>>>>> Wysockij, just browse the
>>>>>>>> archives of linux-pci and see the pacthes and the discussion.
>>>>>>>
>>>>>>> Those discussions are way above my level of knowledge. I guess all this 
>>>>>>> work will be merged into mainline in due course, so I'll watch for them 
>>>>>>> in 3.9 or later. Unless, of course, there is a tree I could clone and 
>>>>>>> help test the changes with my laptop and expresscard.
>>>>>>>
>>>>>>> Hotplug isn't working at all on my Fujitsu laptop, so I can only get 
>>>>>>> the card recognised by rebooting with the card inserted (or by writing 
>>>>>>> 1 to/sys/bus/pci/rescan). There seem to be a few reports on this in the 
>>>>>>> kernel bugzilla, so I'll look through them and see what's being done.
>>>>>>
>>>>>> Hi Chris,
>>>>>>   What about use #modprobe pciehp pciehp_debug=1 pciehp_poll_mode=1 
>>>>>> pciehp_poll_time=1 ?
>>>>>>
>>>>>> Can you resend the dmesg log and "lspci -vvv" info after hotplug device 
>>>>>> from your Fujitsu laptop with above module parameters?
>>>>>>
>>>>>
>>>>> I wasn't sure whether or not the pciehp driver should be loaded on its 
>>>>> own or with the acpiphp driver also loaded. So I built them both as 
>>>>> modules and planned to try both, pciehp only and acpiphp only. However, 
>>>>> I've found that acpiphp will not load (regardless of whether or not 
>>>>> pciehp is already loaded). What I get is:
>>>>>
>>>>> [chris:~]$ sudo modprobe acpiphp debug=1
>>>>> modprobe: ERROR: could not insert 'acpiphp': No such device
>>
>> Are you sure you had pciehp already loaded?
>>
> Yes, I'm sure it was.

Ah, sorry, wanted t

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-27 Thread Martin Mokrejs

Chris Clayton wrote:
 
 
> On 01/27/13 12:18, Yijing Wang wrote:
>> On 2013-01-27 19:19, Chris Clayton wrote:
>>> Hi Yijing
>>>
>>> On 01/27/13 02:45, Yijing Wang wrote:
>>>> On 2013-01-27 4:54, Chris Clayton wrote:
>>>>> Hi Martin,
>>>>>
>>>>> On 01/24/13 19:21, Martin Mokrejs wrote:
>>>>>> Hi Chris,
>>>>>>  try to include in kernel only acpiphp and omit pciehp. Don't use
>>>>>> modules but include them statically. And try, in addition, check
>>>>>> whether "pcie_aspm=off" in grub.conf helped.
>>>>>>
>>>>>
>>>>> Thanks for the tip. I had the pciehp driver installed, but it was a
>>>>> module and not loaded. I didn't have acpiphp enabled at all. Building
>>>>> them both in statically, appears to have papered over the cracks of the
>>>>> oops :-)
>>>>
>>>> Not loaded pciehp driver? Remove the device from this slot without
>>>> poweroff ?
>>>>
>>>
>>> That's correct. When I first encountered the oops, I did not have the
>>> pciehp driver loaded and removing the device from the slot whilst the
>>> laptop was powered on resulted in the oops.
>>
>> Hmm, that's unsafe and dangerous, because device now may be running.
>> There are two ways to trigger pci hot-add or hot-remove in linux, after
>> loaded pciehp or acpiphp module (the two modules only one can loaded into
>> system at the same time). You can trigger hot-add/hot-remove by sysfs
>> interface under /sys/bus/pci/slots/[slot-name]/power or attention
>> button on hardware (if your laptop supports that).
>>
>
> OK, thanks for the advice.
>
>>>>>
>>>>>>  The best would if you subscribe to linux-pci, and read my recent
>>>>>> threads about similar issues I had with express cards with Dell Vostro
>>>>>> 3550. Further, there is a lot of changes to PCI hotplug done by Yingahi
>>>>>> Liu and Rafael Wysockij, just browse the archives of linux-pci and see
>>>>>> the pacthes and the discussion.
>>>>>
>>>>> Those discussions are way above my level of knowledge. I guess all this
>>>>> work will be merged into mainline in due course, so I'll watch for them
>>>>> in 3.9 or later. Unless, of course, there is a tree I could clone and
>>>>> help test the changes with my laptop and expresscard.
>>>>>
>>>>> Hotplug isn't working at all on my Fujitsu laptop, so I can only get the
>>>>> card recognised by rebooting with the card inserted (or by writing 1
>>>>> to /sys/bus/pci/rescan). There seem to be a few reports on this in the
>>>>> kernel bugzilla, so I'll look through them and see what's being done.
>>>>
>>>> Hi Chris,
>>>>  What about use #modprobe pciehp pciehp_debug=1 pciehp_poll_mode=1
>>>> pciehp_poll_time=1 ?
>>>>
>>>> Can you resend the dmesg log and "lspci -vvv" info after hotplug device
>>>> from your Fujitsu laptop with above module parameters?
>>>>
>>>
>>> I wasn't sure whether or not the pciehp driver should be loaded on its own
>>> or with the acpiphp driver also loaded. So I built them both as modules and
>>> planned to try both, pciehp only and acpiphp only. However, I've found that
>>> acpiphp will not load (regardless of whether or not pciehp is already
>>> loaded). What I get is:
>>>
>>> [chris:~]$ sudo modprobe acpiphp debug=1
>>> modprobe: ERROR: could not insert 'acpiphp': No such device

Are you sure you had pciehp already loaded?



 Currently, If your hardware support pciehp native hotplug, acpiphp driver 
 will be rejected when loading it in system
 (you can force loading it by add boot parameter pcie_aspm=off as Martin 
 said).

 
 OK, thanks again for the advice. I've disabled the acpiphp driver.

Pitty. For me only with acpiphp works detection of express card in the slot. 
With pciehp
the PresDet is not updated properly upon removal/insertion and sometimes, 
probably as a result
of the previous, PresDet on the SltSta: line of lspci is not correct. So I 
moved away from pciehp.
I have a SandyBridge based laptop so I was hoping with your i5-based laptop you 
have also great
chance to get rid of pciehp issues.

Martin
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-27 Thread Martin Mokrejs

Chris Clayton wrote:
 
 
 On 01/27/13 14:26, Martin Mokrejs wrote:
 Chris Clayton wrote:


 On 01/27/13 12:18, Yijing Wang wrote:
 于 2013-01-27 19:19, Chris Clayton 写道:
 Hi Yijing

 On 01/27/13 02:45, Yijing Wang wrote:
 于 2013-01-27 4:54, Chris Clayton 写道:
 Hi Martin,

 On 01/24/13 19:21, Martin Mokrejs wrote:
 Hi Chris,
    try to include only acpiphp in the kernel and omit pciehp. Don't use 
 modules but include
 them statically. In addition, check whether pcie_aspm=off 
 in grub.conf helps.


 Thanks for the tip. I had the pciehp driver installed, but it was a 
 module and not loaded. I didn't have acpiphp enabled at all. Building 
 them both in statically, appears to have papered over the cracks of the 
 oops :-)

 Not loaded pciehp driver? Remove the device from this slot without 
 poweroff ?


 That's correct. When I first encountered the oops, I did not have the 
 pciehp driver loaded and removing the device from the slot whilst the 
 laptop was powered on resulted in the oops.

 Hmm, that's unsafe and dangerous, because the device may still be running.
 There are two ways to trigger PCI hot-add or hot-remove in Linux once the
 pciehp or acpiphp module is loaded (only one of the two modules can be
 loaded at a time): through the sysfs interface at
 /sys/bus/pci/slots/[slot-name]/power, or with the attention button on the
 hardware (if your laptop has one).
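
A minimal sketch of the sysfs route described above. The function names slot_power and pci_rescan and the PCI_SYSFS variable are made up for illustration; only the two sysfs paths come from this thread. On a real system this needs root, and the slot name has to be one that actually appears under /sys/bus/pci/slots.

```shell
#!/bin/sh
# Illustrative helpers, not part of any kernel tool.
# PCI_SYSFS can be pointed at a scratch directory for a dry run.
PCI_SYSFS=${PCI_SYSFS:-/sys/bus/pci}

slot_power() {
    # $1: slot name as listed under $PCI_SYSFS/slots
    # $2: 1 to power the slot on, 0 to power it off before removal
    echo "$2" > "$PCI_SYSFS/slots/$1/power"
}

pci_rescan() {
    # Ask the PCI core to rescan all buses for new devices.
    echo 1 > "$PCI_SYSFS/rescan"
}
```

For example, "slot_power 1 0" before pulling the card and "slot_power 1 1" after re-inserting it, assuming the slot is registered under the name 1.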


 OK, thanks for the advice.


    It would be best to subscribe to linux-pci and read my recent threads
 about similar issues I had with express cards on a Dell Vostro 3550.
 Further, Yinghai Lu and Rafael Wysocki have made a lot of changes to PCI
 hotplug; just browse the linux-pci archives to see the patches and the
 discussion.

 Those discussions are way above my level of knowledge. I guess all this 
 work will be merged into mainline in due course, so I'll watch for them 
 in 3.9 or later. Unless, of course, there is a tree I could clone and 
 help test the changes with my laptop and expresscard.

 Hotplug isn't working at all on my Fujitsu laptop, so I can only get 
 the card recognised by rebooting with the card inserted (or by writing 
 1 to /sys/bus/pci/rescan). There seem to be a few reports on this in the 
 kernel bugzilla, so I'll look through them and see what's being done.

 Hi Chris,
    What about loading the driver with
 # modprobe pciehp pciehp_debug=1 pciehp_poll_mode=1 pciehp_poll_time=1 ?

 Can you resend the dmesg log and the lspci -vvv output from your Fujitsu
 laptop after hotplugging the device with the above module parameters?


 I wasn't sure whether or not the pciehp driver should be loaded on its 
 own or with the acpiphp driver also loaded. So I built them both as 
 modules and planned to try both, pciehp only and acpiphp only. However, 
 I've found that acpiphp will not load (regardless of whether or not 
 pciehp is already loaded). What I get is:

 [chris:~]$ sudo modprobe acpiphp debug=1
 modprobe: ERROR: could not insert 'acpiphp': No such device

 Are you sure you had pciehp already loaded?

 Yes, I'm sure it was.

Ah, sorry, I meant to ask: are you sure you had NOT already loaded pciehp
(at any point before)? If you retry without ever loading it, you might succeed
with acpiphp.



 Currently, if your hardware supports native pciehp hotplug, the acpiphp driver 
 will be rejected when you try to load it
 (you can force it to load by adding the boot parameter pcie_aspm=off, as Martin 
 said).


 OK, thanks again for the advice. I've disabled the acpiphp driver.

 Pity. For me, detection of an express card in the slot works only with acpiphp.
 With pciehp, PresDet is not updated properly upon removal/insertion, and
 sometimes, probably as a result, the PresDet flag on the SltSta: line of lspci
 is not correct. So I moved away from pciehp.
 I have a SandyBridge-based laptop, so I was hoping that with your i5-based
 laptop you would also have a good chance of getting rid of the pciehp issues.

 
 I've just (very carefully) set this up again, i.e. no pciehp driver (module 
 or builtin), acpiphp driver built in and pcie_aspm=off on the kernel command 
 line (via grub). My card is not detected on insertion. :-(

Do you have any other express card around to check whether hotplug works at all?
Always try that after a cold boot.

Posting the diffs produced by the procedure below might help:

# lspci -vvvxxx > lspci.before_insertion.txt

[plug your card into the slot]

# lspci -vvvxxx > lspci.after_insertion.txt

[unplug your card]

# lspci -vvvxxx > lspci.after_1st_removal.txt

[re-plug your card into the slot]

# lspci -vvvxxx > lspci.after_1st_re-insertion.txt

[unplug your card]

# lspci -vvvxxx > lspci.after_2nd_removal.txt

Then compare them using diff. These should show no differences:

diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt


These may show only small differences, or none:

diff lspci.before_insertion.txt lspci.after_1st_removal.txt
diff
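
The snapshot-and-compare procedure above could be wrapped in two small helpers. This is only a sketch: snapshot, compare and the LSPCI variable are names invented here, and the stage labels simply mirror the file names used in the procedure. Run it as root so that lspci can dump the full config space.

```shell
#!/bin/sh
# Illustrative helpers; LSPCI is overridable so the logic can be tried
# without the real tool.
LSPCI=${LSPCI:-lspci}

snapshot() {
    # $1: stage label, e.g. before_insertion or after_1st_removal
    "$LSPCI" -vvvxxx > "lspci.$1.txt"
}

compare() {
    # $1, $2: stage labels of two earlier snapshots
    if diff "lspci.$1.txt" "lspci.$2.txt" > /dev/null; then
        echo "$1 vs $2: same"
    else
        echo "$1 vs $2: differ"
    fi
}
```

Usage would be "snapshot before_insertion", plug the card, "snapshot after_insertion", and so on, followed by "compare after_insertion after_1st_re-insertion".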

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-27 Thread Martin Mokrejs

Hi Chris,

Chris Clayton wrote:
 Thanks again, Martin.
 
 Firstly, maybe we should remove the linux-media list from the copy list. I 
 imagine this hotplug stuff is just noise to them.
 
 [snip]
 Do you have any other express card around to try if it works at all? Try 
 that always after a cold boot.

 Not at the moment, but I ordered a USB3 expresscard yesterday, so I will 
 have one soon.
 
 Posting a diff result of the below procedure might help:

 # lspci -vvvxxx > lspci.before_insertion.txt

 [plug your card into the slot]

 # lspci -vvvxxx > lspci.after_insertion.txt

 [ unplug your card]

 # lspci -vvvxxx > lspci.after_1st_removal.txt

 [re-plug your card into the slot]

 # lspci -vvvxxx > lspci.after_1st_re-insertion.txt

 [ unplug your card]

 # lspci -vvvxxx > lspci.after_2nd_removal.txt

 
 OK, I've been using 3.8.0-rc kernels so far, but given that 3.8 is still 
 under development, I've switched to 3.7.4, mainly because you are having 
 success with 3.7.x, acpiphp and pcie_aspm=off. I verified the environment as 
 follows:
 
 [chris:~]$ cat /proc/cmdline
 root=/dev/sda5 pcie_aspm=off ro resume=/dev/sda6
 [chris:~]$ dmesg | grep ASPM
 [0.000000] PCIe ASPM is disabled
 [0.348959]  pci0000:00: ACPI _OSC support notification failed, disabling 
 PCIe ASPM
 [chris:~]$ dmesg | grep acpiphp
 [0.400846] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
 [chris:~]$ dmesg | grep pciehp
 [chris:~]$ uname -a
 Linux laptop 3.7.4 #13 SMP PREEMPT Sun Jan 27 18:39:39 GMT 2013 i686 GNU/Linux
 

vostro ~ # cat /proc/cmdline
root=/dev/sda5 pciehp.pciehp_debug=1 slub_debug=AFPZ pcie_aspm=off
vostro ~ # dmesg | grep ASPM
[0.000000] PCIe ASPM is disabled
vostro ~ # dmesg | grep acpiphp
[2.449038] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[2.453757] acpiphp: Slot [1] registered
vostro ~ # uname -a
Linux vostro 3.7.4-default #2 SMP Mon Jan 21 22:45:22 MET 2013 x86_64 Intel(R) 
Core(TM) i7-2640M CPU @ 2.80GHz GenuineIntel GNU/Linux
vostro ~ # 
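
The checks in the two transcripts above can be scripted roughly as follows. This is a sketch under assumptions: has_cmdline_param and hotplug_drivers_in_log are hypothetical helper names, and the second one expects dmesg output that was saved to a file beforehand.

```shell
#!/bin/sh
has_cmdline_param() {
    # $1: exact boot parameter, e.g. pcie_aspm=off
    # $2: file to read (defaults to /proc/cmdline)
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -Fqx -- "$1"
}

hotplug_drivers_in_log() {
    # $1: file holding saved dmesg output
    # Prints the name of each hotplug driver that logged at least one line.
    for drv in acpiphp pciehp; do
        grep -q "$drv" "$1" && echo "$drv"
    done
    return 0
}
```

With the setup discussed here, "has_cmdline_param pcie_aspm=off" should succeed and the log check should print only acpiphp.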

 
 Then compare them using diff. These should have no difference:

 diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
 diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt

 Correct, there were no differences.
 

 These may have only little difference, or none:

 diff lspci.before_insertion.txt lspci.after_1st_removal.txt
 
 263c263
 <   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 1us, L1 16us
 ---
 >   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 512ns, L1 16us
 265c265
 <   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
 ---
 >   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
 267c267
 <   LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
 ---
 >   LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
 273c273
 <   Changed: MRL- PresDet- LinkState-
 ---
 >   Changed: MRL- PresDet- LinkState+
 295,296c295,296
 < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04
 < 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00
 ---
 > 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
 > 50: 40 00 11 50 60 b2 1c 00 08 00 00 01 00 00 00 00
 
 diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt

 No difference.


 Finally, these should confirm whether the PresDet works for you (for me NOT 
 with pciehp but does work with acpiphp).
 You should see PresDet- to PresDet+ changes in:

 Yes, I do see the PresDet- to PresDet+ changes
 
 diff lspci.before_insertion.txt lspci.after_insertion.txt
 
 263c263
 <   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 1us, L1 16us
 ---
 >   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 512ns, L1 16us
 265c265
 <   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
 ---
 >   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
 267c267
 <   LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
 ---
 >   LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
 272,273c272,273
 <   SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
 <     Changed: MRL- PresDet- LinkState-
 ---
 >   SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
 >     Changed: MRL- PresDet- LinkState+
 295,296c295,296
 < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04
 < 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00
 ---
 > 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
 > 50: 40 00 11 70 60 b2 1c 00 08 00 40 01 00 00 00 00
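
Instead of reading the PresDet flag out of a full diff, it can be pulled straight from a saved lspci dump. A small sketch, assuming the dump was written by lspci -vvv or -vvvxxx as in the procedure discussed here; presdet_flags is a made-up name.

```shell
#!/bin/sh
presdet_flags() {
    # $1: file with saved lspci -vvv output
    # Prints PresDet+ or PresDet- once for every "SltSta:" line, i.e. once
    # per hotplug-capable slot. The "Changed:" continuation line is ignored.
    sed -n 's/.*SltSta:.*\(PresDet[+-]\).*/\1/p' "$1"
}
```

Running it on lspci.before_insertion.txt and lspci.after_insertion.txt should show the flag for the expresscard slot flipping from PresDet- to PresDet+ when presence detection works.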
 
 diff lspci.after_1st_removal.txt lspci.after_1st_re-insertion.txt
 267c267
LnkSta: Speed 2.5GT/s, Width x1,

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-26 Thread Martin Mokrejs

Hi Chris,

Chris Clayton wrote:
> Hi Martin,
> 
> On 01/24/13 19:21, Martin Mokrejs wrote:
>> Hi Chris,
>>   try to include only acpiphp in the kernel and omit pciehp. Don't use modules 
>> but include
>> them statically. In addition, check whether "pcie_aspm=off" in 
>> grub.conf helps.
>>
> 
> Thanks for the tip. I had the pciehp driver installed, but it was a module 
> and not loaded. I didn't have acpiphp enabled at all. Building them both in 
> statically, appears to have papered over the cracks of the oops :-)
> 
>>   It would be best to subscribe to linux-pci and read my recent threads
>> about similar issues I had with express cards on a Dell Vostro 3550.
>> Further, Yinghai Lu and Rafael Wysocki have made a lot of changes to PCI
>> hotplug; just browse the linux-pci archives to see the patches and the
>> discussion.
> 
> Those discussions are way above my level of knowledge. I guess all this work 
> will be merged into mainline in due course, so I'll watch for them in 3.9 or 
> later. Unless, of course, there is a tree I could clone and help test the 
> changes with my laptop and expresscard.
> 
> Hotplug isn't working at all on my Fujitsu laptop, so I can only get the card 
> recognised by rebooting with the card inserted (or by writing 1 
> to /sys/bus/pci/rescan). There seem to be a few reports on this in the kernel 
> bugzilla, so I'll look through them and see what's being done.

That's what I suspected. Compile acpiphp in statically, with no pciehp at all
(not even as a module).
Then it might work for you; at least it does for me, provided I use 
"pcie_aspm=off".
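
One way to double-check such a build is to look at the kernel .config. The sketch below assumes the usual Kconfig symbols for the two drivers (CONFIG_HOTPLUG_PCI_ACPI for acpiphp, CONFIG_HOTPLUG_PCI_PCIE for pciehp); acpiphp_only is a hypothetical helper name.

```shell
#!/bin/sh
acpiphp_only() {
    # $1: path to the kernel .config
    # Succeeds only if acpiphp is built in (=y) and pciehp is neither
    # built in nor built as a module.
    grep -q '^CONFIG_HOTPLUG_PCI_ACPI=y$' "$1" &&
        ! grep -q '^CONFIG_HOTPLUG_PCI_PCIE=[ym]$' "$1"
}
```

For example: acpiphp_only /usr/src/linux/.config && echo "config looks right", where the path is just a placeholder for wherever the kernel tree lives.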

Martin

> 
> Thanks again.
> 
> Chris
> 
>> Martin
>>
>> Chris Clayton wrote:
>>> Hi,
>>>
>>> I've today taken delivery of a WinTV-HVR-1400 expresscard TV Tuner and got 
>>> an Oops when I removed it from the expresscard slot in my laptop. I will quite 
>>> understand if the response to this report is "don't do that!", but in that 
>>> case, how should one remove one of these cards?
>>>
>>> I have attached three files:
>>>
>>> 1. the dmesg output from when I rebooted the machine after the oops. I have 
>>> turned debugging on in the dib700p and cx23885 modules via modules options 
>>> in /etc/modprobe.d/hvr1400.conf;
>>>
>>> 2. the .config file for the kernel that oopsed.
>>>
>>> 3. the text of the oops message. I've typed this up from a photograph of 
>>> the screen because the laptop was locked up and there was nothing in the 
>>> log files. Apologies for any typos, but I have tried to be careful.
>>>
>>> Assuming the answer isn't "don't do that", let me know if I can provide any 
>>> additional diagnostics, test any patches, etc. Please, however, cc me as 
>>> I'm not subscribed.
>>>
>>> Chris
> 
> 

Re: [PATCH] USB: XHCI: fix memory leak of URB-private data

2013-01-24 Thread Martin Mokrejs

Greg KH wrote:
> On Thu, Jan 24, 2013 at 10:53:25PM +0100, Martin Mokrejs wrote:
>> Hi Sarah and Alan,
>>   I just saw the 3.7.5 patches announced by Greg but I don't see this patch in 
>> there.
>> And, I don't know, but maybe this applies to older stable kernels as well?
>> Where will this patch, originally posted to linux-usb, land?
>>
>> Ah, is that because the email was actually NOT sent to "stable@"? ;-)
> 
> No.  It's because the patch isn't in Linus's tree yet, which is one of
> the requirements for a patch to be able to get into the stable kernel
> releases.
> 
> Please read the kernel file, Documentation/stable_kernel_rules.txt for
> more details if you are curious.

Thank you Greg!

I still do not know how serious a memleak is considered to be or whether
it is eligible for -stable, but other than that it was helpful to read the file.
We will see what happens. Meanwhile I will continue running my patched kernel. ;-)
Martin

Re: [PATCH] USB: XHCI: fix memory leak of URB-private data

2013-01-24 Thread Martin Mokrejs

Hi Sarah and Alan,
  I just saw the 3.7.5 patches announced by Greg but I don't see this patch in there.
And, I don't know, but maybe this applies to older stable kernels as well?
Where will this patch, originally posted to linux-usb, land?

Ah, is that because the email was actually NOT sent to "stable@"? ;-)

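Date:   Thu, 17 Jan 2013 10:32:16 -0500 (EST)
From: Alan Stern <st...@rowland.harvard.edu>
To: Sarah Sharp <sarah.a.sh...@linux.intel.com>
cc: Martin Mokrejs <mmokr...@fold.natur.cuni.cz>,
  USB list <linux-...@vger.kernel.org>
Subject: [PATCH] USB: XHCI: fix memory leak of URB-private data
Message-ID: <pine.lnx.4.44l0.1301171031260.1339-100...@iolanthe.rowland.org>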

Thank you,
Martin

Alan Stern wrote:
> This patch (as1640) fixes a memory leak in xhci-hcd.  The urb_priv
> data structure isn't always deallocated in the handle_tx_event()
> routine for non-control transfers.  The patch adds a kfree() call so
> that all paths end up freeing the memory properly.
> 
> Signed-off-by: Alan Stern <st...@rowland.harvard.edu>
> Reported-and-tested-by: Martin Mokrejs <mmokr...@fold.natur.cuni.cz>
> CC: <sta...@vger.kernel.org>
> 
> ---
> 
>  drivers/usb/host/xhci-ring.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: usb-3.7/drivers/usb/host/xhci-ring.c
> ===
> --- usb-3.7.orig/drivers/usb/host/xhci-ring.c
> +++ usb-3.7/drivers/usb/host/xhci-ring.c
> @@ -2580,6 +2580,8 @@ cleanup:
>   (trb_comp_code != COMP_STALL &&
>   trb_comp_code != COMP_BABBLE))
>   xhci_urb_free_priv(xhci, urb_priv);
> + else
> + kfree(urb_priv);
>  
>   usb_hcd_unlink_urb_from_ep(bus_to_hcd(urb->dev->bus), 
> urb);
>   if ((urb->actual_length != urb->transfer_buffer_length 
> &&

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-24 Thread Martin Mokrejs

Hi Chris,
  try to include only acpiphp in the kernel and omit pciehp. Don't use modules but 
include
them statically. In addition, check whether "pcie_aspm=off" in 
grub.conf helps.

  It would be best to subscribe to linux-pci and read my recent threads
about similar issues I had with express cards on a Dell Vostro 3550. Further,
Yinghai Lu and Rafael Wysocki have made a lot of changes to PCI hotplug; just
browse the linux-pci archives to see the patches and the discussion.
Martin

Chris Clayton wrote:
> Hi,
> 
> I've today taken delivery of a WinTV-HVR-1400 expresscard TV Tuner and got an 
> Oops when I removed it from the expresscard slot in my laptop. I will quite 
> understand if the response to this report is "don't do that!", but in that 
> case, how should one remove one of these cards?
> 
> I have attached three files:
> 
> 1. the dmesg output from when I rebooted the machine after the oops. I have 
> turned debugging on in the dib700p and cx23885 modules via modules options in 
> /etc/modprobe.d/hvr1400.conf;
> 
> 2. the .config file for the kernel that oopsed.
> 
> 3. the text of the oops message. I've typed this up from a photograph of the 
> screen because the laptop was locked up and there was nothing in the log 
> files. Apologies for any typos, but I have tried to be careful.
> 
> Assuming the answer isn't "don't do that", let me know if I can provide any 
> additional diagnostics, test any patches, etc. Please, however, cc me as I'm 
> not subscribed.
> 
> Chris

linux-3.7.4: kmemleak in sctp_sysctl_net_register()?

2013-01-23 Thread Martin Mokrejs

Hi,
  today I got the following report from the kernel; it looks like it happened when
I started, used or quit the chromium browser. I haven't seen this with 3.7.1, but
I have only been using the built-in kmemleak detector for 2-3 weeks.

unreferenced object 0x880402d08000 (size 2048):
  comm "chrome_sandbox", pid 18437, jiffies 4310887172 (age 9097.630s)
  hex dump (first 32 bytes):
b2 68 89 81 ff ff ff ff 20 04 04 f8 01 88 ff ff  .h.. ...
04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[] kmemleak_alloc+0x21/0x3e
[] slab_post_alloc_hook+0x28/0x2a
[] __kmalloc_track_caller+0xf1/0x104
[] kmemdup+0x1b/0x30
[] sctp_sysctl_net_register+0x1f/0x72
[] sctp_net_init+0x100/0x39f
[] ops_init+0xc6/0xf5
[] setup_net+0x4c/0xd0
[] copy_net_ns+0x6d/0xd6
[] create_new_namespaces+0xd7/0x147
[] copy_namespaces+0x63/0x99
[] copy_process+0xa65/0x1233
[] do_fork+0x10b/0x271
[] sys_clone+0x23/0x25
[] stub_clone+0x13/0x20
[] 0x


Please let me know if you need more info, like dmesg, .config or other.
Hope this helps.
Martin
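
For anyone trying to reproduce this, reports like the one above are read from the kmemleak debugfs file. A rough sketch, assuming debugfs is mounted at /sys/kernel/debug and the kernel was built with kmemleak support; the helper names and the KMEMLEAK variable are invented here.

```shell
#!/bin/sh
# Location of the kmemleak interface; overridable so that the counting
# helper can also be run on a report saved to a regular file.
KMEMLEAK=${KMEMLEAK:-/sys/kernel/debug/kmemleak}

kmemleak_rescan() {
    # Trigger an immediate scan instead of waiting for the periodic one
    # (needs root).
    echo scan > "$KMEMLEAK"
}

kmemleak_count() {
    # $1: optional saved report; defaults to the live debugfs file.
    # Every entry in a report starts with "unreferenced object".
    grep -c '^unreferenced object' "${1:-$KMEMLEAK}"
}
```

Typical use would be "kmemleak_rescan" followed by "cat /sys/kernel/debug/kmemleak > report.txt" and "kmemleak_count report.txt".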

Re: linux-3.7.[1,4]: kmemleak in i801_probe

2013-01-23 Thread Martin Mokrejs

Hi Jean,

Jean Delvare wrote:
> Hi Martin,
> 
> On Wed, 23 Jan 2013 12:15:37 +0100, Martin Mokrejs wrote:
>> Hi,
>>   I already reported this to lkml recently with linux-3.7.1 but this is to 
>> let you know
>> that with 3.7.4 I am still getting this kmemleak reported by the kernel.
> 
> I don't read LKML.
> 
>> unreferenced object 0x88040b614690 (size 256):
>>   comm "swapper/0", pid 1, jiffies 4294937573 (age 133834.550s)
>>   hex dump (first 32 bytes):
>> 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .N..
>> ff ff ff ff ff ff ff ff 08 7f 5d 82 ff ff ff ff  ..].
>>   backtrace:
>> [] kmemleak_alloc+0x21/0x3e
>> [] slab_post_alloc_hook+0x28/0x2a
>> [] __kmalloc+0xf2/0x104
>> [] kzalloc.constprop.14+0xe/0x10
>> [] device_private_init+0x14/0x63
>> [] dev_set_drvdata+0x19/0x2f
>> [] i801_probe+0x5e/0x451
>> [] local_pci_probe+0x39/0x61
>> [] pci_device_probe+0xc6/0xf3
>> [] driver_probe_device+0xa9/0x1c1
>> [] __driver_attach+0x5a/0x7e
>> [] bus_for_each_dev+0x57/0x83
>> [] driver_attach+0x19/0x1b
>> [] bus_add_driver+0xa8/0x1fa
>> [] driver_register+0x8c/0x106
>> [] __pci_register_driver+0x59/0x5d
> 
> I am using the i2c-i801 driver, enabled kmemleak, but I don't get this
> leak. Did you have to do anything special to get it? Didn't you get a

Based on the dmesg timestamp, I think I had just logged in through xdm. Well,
actually xdm crashes for me, so I have to do this in the framebuffer VT console:
root # /etc/init.d/xdm stop
user $ startx

and then I can happily use X. I have a bug report open at
https://bugs.freedesktop.org/show_bug.cgi?id=56608
but I doubt it is related to the i2c-i801 driver. It is not clear
why I cannot just use xdm but can always start X11 via startx.
Rarely (1 in 20 attempts?), without reinstalling my kernel, X11
server or drivers, I can log in through xdm. But comparing the
Xorg.log files from successful xdm logins against the unsuccessful ones
has not helped so far; only the order of items differs, probably due to autoconfig.
So I don't think this helps you isolate the i2c-i801 driver memleak.

> similar leak with older kernels? Do you get a similar leak (with
> reference to dev_set_drvdata)?

With 3.7.1 I was getting the same stack trace:

unreferenced object 0x88040b1c5230 (size 256):
  comm "swapper/0", pid 1, jiffies 4294937570 (age 182492.630s)
  hex dump (first 32 bytes):
00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .N..
ff ff ff ff ff ff ff ff 38 3f 5d 82 ff ff ff ff  8?].
  backtrace:
[815b1dbd] kmemleak_alloc+0x21/0x3e
[81110536] slab_post_alloc_hook+0x28/0x2a
[81112a6e] __kmalloc+0xf2/0x104
[81302bd5] kzalloc.constprop.14+0xe/0x10
[81303036] device_private_init+0x14/0x63
[81305110] dev_set_drvdata+0x19/0x2f
[815c1ed4] i801_probe+0x5e/0x451
[81280fb3] local_pci_probe+0x5b/0xa2
[81282074] pci_device_probe+0xc8/0xf7
[813056cd] driver_probe_device+0xa9/0x1c1
[8130583f] __driver_attach+0x5a/0x7e
[81303f7a] bus_for_each_dev+0x57/0x83
[81305276] driver_attach+0x19/0x1b
[81304e48] bus_add_driver+0xa8/0x1fa
[81305cb1] driver_register+0x8c/0x106
[81281c6d] __pci_register_driver+0x5a/0x5e
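
To check that both kernels really report the same allocation path, the two
backtraces can be compared frame by frame. What follows is a minimal Python
sketch and not part of the original report: the helper names parse and compare
are made up for illustration, and the frames are copied from the two traces in
this message.

```python
import re

# Frames from the 3.7.4 report, innermost first.
TRACE_3_7_4 = """\
kmemleak_alloc+0x21/0x3e
slab_post_alloc_hook+0x28/0x2a
__kmalloc+0xf2/0x104
kzalloc.constprop.14+0xe/0x10
device_private_init+0x14/0x63
dev_set_drvdata+0x19/0x2f
i801_probe+0x5e/0x451
local_pci_probe+0x39/0x61
pci_device_probe+0xc6/0xf3
driver_probe_device+0xa9/0x1c1
__driver_attach+0x5a/0x7e
bus_for_each_dev+0x57/0x83
driver_attach+0x19/0x1b
bus_add_driver+0xa8/0x1fa
driver_register+0x8c/0x106
__pci_register_driver+0x59/0x5d
"""

# Frames from the 3.7.1 report, innermost first.
TRACE_3_7_1 = """\
kmemleak_alloc+0x21/0x3e
slab_post_alloc_hook+0x28/0x2a
__kmalloc+0xf2/0x104
kzalloc.constprop.14+0xe/0x10
device_private_init+0x14/0x63
dev_set_drvdata+0x19/0x2f
i801_probe+0x5e/0x451
local_pci_probe+0x5b/0xa2
pci_device_probe+0xc8/0xf7
driver_probe_device+0xa9/0x1c1
__driver_attach+0x5a/0x7e
bus_for_each_dev+0x57/0x83
driver_attach+0x19/0x1b
bus_add_driver+0xa8/0x1fa
driver_register+0x8c/0x106
__pci_register_driver+0x5a/0x5e
"""

# symbol+offset/size, e.g. "i801_probe+0x5e/0x451"
FRAME = re.compile(r"(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)")


def parse(trace):
    """Return a list of (symbol, offset, size) tuples for one backtrace."""
    return [(m["sym"], m["off"], m["size"]) for m in FRAME.finditer(trace)]


def compare(a, b):
    """Return (same_symbols, moved): whether the symbol sequences match, and
    which symbols have a different offset or size between the two traces."""
    same_symbols = [f[0] for f in a] == [f[0] for f in b]
    moved = [x[0] for x, y in zip(a, b) if x[0] == y[0] and x[1:] != y[1:]]
    return same_symbols, moved


same, moved = compare(parse(TRACE_3_7_4), parse(TRACE_3_7_1))
print("same call chain:", same)
print("frames with different offsets:", moved)
```

On these two traces the symbol sequences are identical; only the offsets in
local_pci_probe, pci_device_probe and __pci_register_driver differ, which is
consistent with the same code path in two differently built kernels.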

Before 3.7.1 I did not use the kmemleak detector. While searching my older
emails/reports, I found only that in the past I loaded both drivers (on a
2.6.32.59 kernel):

Mar 26 11:21:55 vostro kernel: i801_smbus :00:1f.3: PCI INT C -> GSI 18 
(level, low) -> IRQ 18
Mar 26 11:21:55 vostro kernel: ACPI: If an ACPI driver is available for this 
device, you should use it instead of the native driver


And here the relevant line from lspci from that time:
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus 
Controller (rev 05)

00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus 
Controller (rev 05)
Subsystem: Dell Device 04b3
Flags: medium devsel, IRQ 18
Memory at f7f05000 (64-bit, non-prefetchable) [size=256]
I/O ports at f040 [size=32]
Kernel modules: i2c-i801


I don't think this will help you now. :(


> 
> I can see that dev_set_drvdata may allocate memory (which I didn't
> know) and I admit I don't see where it gets released, however this is
> all happening in the driver core and isn't specific to the i2c-i801
> driver, so if there really is a leak there, you should see it in all
> drivers.

I am not a kernel developer at all, but maybe it is a hint that the
kmemleak was reported while I was plugging and unplugging my external USB drives?

Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

linux-3.7.4: kmemleak in sctp_sysctl_net_register()?

2013-01-23 Thread Martin Mokrejs

Hi,
  today I got the following report from the kernel; it looks like it happened
when I started/used/quit the chromium browser. I haven't seen this with 3.7.1,
but I have been using the built-in kmemleak detector for only 2-3 weeks.

unreferenced object 0x880402d08000 (size 2048):
  comm chrome_sandbox, pid 18437, jiffies 4310887172 (age 9097.630s)
  hex dump (first 32 bytes):
b2 68 89 81 ff ff ff ff 20 04 04 f8 01 88 ff ff  .h.. ...
04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[815b4aad] kmemleak_alloc+0x21/0x3e
[81110352] slab_post_alloc_hook+0x28/0x2a
[81113fad] __kmalloc_track_caller+0xf1/0x104
[810f10c2] kmemdup+0x1b/0x30
[81571e9f] sctp_sysctl_net_register+0x1f/0x72
[8155d305] sctp_net_init+0x100/0x39f
[814ad53c] ops_init+0xc6/0xf5
[814ad5b7] setup_net+0x4c/0xd0
[814ada5e] copy_net_ns+0x6d/0xd6
[810938b1] create_new_namespaces+0xd7/0x147
[810939f4] copy_namespaces+0x63/0x99
[81076733] copy_process+0xa65/0x1233
[81077030] do_fork+0x10b/0x271
[8100a0e9] sys_clone+0x23/0x25
[815dda73] stub_clone+0x13/0x20
[] 0x


Please let me know if you need more info, like dmesg, .config or other.
Hope this helps.
Martin
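
The subject line names sctp_sysctl_net_register() because it is the first frame
in the backtrace that is not a generic allocation helper. A minimal Python
sketch of that reading is below; the list of helper names and the function
allocation_site are assumptions made for illustration, not something the kernel
provides.

```python
# Frames from the backtrace above, innermost first.
FRAMES = [
    "kmemleak_alloc",
    "slab_post_alloc_hook",
    "__kmalloc_track_caller",
    "kmemdup",
    "sctp_sysctl_net_register",
    "sctp_net_init",
    "ops_init",
    "setup_net",
    "copy_net_ns",
    "create_new_namespaces",
    "copy_namespaces",
    "copy_process",
    "do_fork",
    "sys_clone",
    "stub_clone",
]

# Hypothetical, hand-picked list of generic helpers to skip over.
ALLOCATOR_HELPERS = {
    "kmemleak_alloc",
    "slab_post_alloc_hook",
    "__kmalloc",
    "__kmalloc_track_caller",
    "kmemdup",
    "kzalloc",
}


def allocation_site(frames):
    """Return the first frame that is not a generic allocation helper."""
    for name in frames:
        base = name.split(".")[0]  # drop compiler suffixes like .constprop.14
        if base not in ALLOCATOR_HELPERS:
            return name
    return None


print(allocation_site(FRAMES))
```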

Re: Use of memmap= to forcibly recover memory in 3GB-4GB range - is this safe?

2013-01-17 Thread Martin Mokrejs

Yinghai Lu wrote:
> On Wed, Jan 16, 2013 at 4:24 PM, Alex Villacís Lasso
> <a_villa...@palosanto.com> wrote:
>> On 16/01/13 02:11, Yinghai Lu wrote:
>>
>>> On Tue, Jan 15, 2013 at 5:47 PM, Alex Villacís Lasso
>>> <a_villa...@palosanto.com> wrote:

>>>> [0.00] e820: BIOS-provided physical RAM map:
>>>> [0.00] BIOS-e820: [mem 0x-0x0009f3ff] usable
>>>> [0.00] BIOS-e820: [mem 0x0009f400-0x0009] reserved
>>>> [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
>>>> [0.00] BIOS-e820: [mem 0x0010-0xcf58] usable
>>>> [0.00] BIOS-e820: [mem 0xcf59-0xcf5e2fff] ACPI NVS
>>>> [0.00] BIOS-e820: [mem 0xcf5e3000-0xcf5e] ACPI data
>>>> [0.00] BIOS-e820: [mem 0xcf5f-0xcf5f] reserved
>>>> [0.00] BIOS-e820: [mem 0xe000-0xefff] reserved
>>>> [0.00] BIOS-e820: [mem 0xfec0-0x] reserved
>>>> [0.00] NX (Execute Disable) protection: active
>>>
>>> ..

>>>> [0.00] original variable MTRRs
>>>> [0.00] reg 0, base: 4GB, range: 512MB, type WB
>>>> [0.00] reg 1, base: 4608MB, range: 256MB, type WB
>>>> [0.00] reg 2, base: 0GB, range: 2GB, type WB
>>>> [0.00] reg 3, base: 2GB, range: 1GB, type WB
>>>> [0.00] reg 4, base: 3GB, range: 256MB, type WB
>>>> [0.00] reg 5, base: 3319MB, range: 1MB, type UC
>>>> [0.00] reg 6, base: 3320MB, range: 8MB, type UC
>>>> [0.00] reg 7, base: 3318MB, range: 1MB, type UC
>>>> [0.00] total RAM covered: 4086M
>>>
>>> Can you apply attached debug patch to see if the raw e820 is right from
>>> BIOS ?
> 
>> Done. The output is attached. I see no difference between raw and sanitized
>> maps.
> 
> yeah, it is BIOS problem.
> 
> you may either live with memmap= or try to get one BIOS update.
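
The "total RAM covered" figure in the quoted MTRR dump can be reproduced with
simple arithmetic: the write-back (WB) ranges minus the uncached (UC) holes.
The Python sketch below is only an illustration of that arithmetic. The name
ram_covered_mb is hypothetical and this is not claimed to be how the kernel
computes the value; the bases and sizes are the ones from the log, in MB.

```python
# (base_mb, size_mb, type) for reg 0..7 as listed in the quoted log.
MTRRS = [
    (4096, 512, "WB"),   # reg 0: base 4GB, range 512MB
    (4608, 256, "WB"),   # reg 1
    (0, 2048, "WB"),     # reg 2: base 0GB, range 2GB
    (2048, 1024, "WB"),  # reg 3: base 2GB, range 1GB
    (3072, 256, "WB"),   # reg 4: base 3GB, range 256MB
    (3319, 1, "UC"),     # reg 5
    (3320, 8, "UC"),     # reg 6
    (3318, 1, "UC"),     # reg 7
]


def ram_covered_mb(mtrrs):
    """Count the megabytes that are covered by WB and not overridden by UC."""
    wb, uc = set(), set()
    for base, size, mtype in mtrrs:
        target = wb if mtype == "WB" else uc
        target.update(range(base, base + size))
    return len(wb - uc)


print(ram_covered_mb(MTRRS), "MB")
```

The WB ranges add up to 4096 MB and the three UC ranges remove 10 MB just below
3328 MB, which gives the 4086M printed in the log.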

Hi Yinghai,
  wouldn't it be useful for others to include this patch in the kernel? It
might help someone else. Provided it is printed only when extra debugging is
enabled in the kernel, I don't think it hurts. Right?
  Actually, if it could check for differences automatically and print a
warning, it would be even better.
Martin
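
The automatic check suggested above amounts to comparing the raw BIOS map with
the sanitized one and warning when they differ. A rough user-space sketch in
Python, not kernel code, is shown below. The function names and the sample
entries are hypothetical and invented for illustration; they are not taken from
the debug patch or from the log.

```python
# Hypothetical sample maps: (start, end, type). Values are made up.
RAW = [
    (0x00000000, 0x0009F3FF, "usable"),
    (0x0009F400, 0x0009FFFF, "reserved"),
]
SANITIZED = [
    (0x00000000, 0x0009F3FF, "usable"),
    (0x0009F400, 0x0009FFFF, "reserved"),
]


def map_differences(raw, sanitized):
    """Return (removed, added): entries found in only one of the two maps."""
    removed = sorted(set(raw) - set(sanitized))
    added = sorted(set(sanitized) - set(raw))
    return removed, added


def check(raw, sanitized):
    """Return a warning string if the maps differ, otherwise None."""
    removed, added = map_differences(raw, sanitized)
    if removed or added:
        return ("warning: sanitized e820 map differs from the BIOS map "
                "(%d removed, %d added)" % (len(removed), len(added)))
    return None


print(check(RAW, SANITIZED))
```

With identical maps, as in the case reported in this thread, the check stays
silent.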

Re: [PATCH v3 3/6] ACPI/pci_slot: update PCI slot information when PCI hotplug event happens

2013-01-11 Thread Martin Mokrejs

Hi,
  I just hit this thread in my bloated Inbox.

Rafael J. Wysocki wrote:
> On Thursday, January 10, 2013 03:03:53 PM Yinghai Lu wrote:
>> On Thu, Jan 10, 2013 at 1:50 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
>>> Well, I don't see what functional problems that can bring.
>>>
>>> In theory people may want to have them as modules to avoid loading them on
>>> systems that don't use PCI hotplug, but honestly I think that the complexity
>>> this causes us to deal with is not worth it.
>>>
>>> Moreover, removing the modularity may actually allow us to solve some 
>>> ordering
>>> issues once and for good.
>>
>> No, the world is not really ideal yet.
>>
>> looks like laptops have problem with pci express cards.
>>
>> when pciehp is used, surprise insert/removal does not work because
>> PresDect does not change properly, so no interrupt is generated.
>> --- i suspects that is silicon problem.

That's what seemed to be the conclusion half a year ago around 3.2.x/3.3.x
for my issues as well (SandyBridge C6/C200 chipset).

>>
>> but when acpiphp is used, that surprise  insert/removal is working.

That's what I discovered a few days ago as well. However, there are still some
differences between individual express cards and I just need to find some time
to dig through the data I collected.

>>
>> some laptop like thinkpad, just don't give osc to kernel..
>> [0.505117]  pci:00: Requesting ACPI _OSC control (0x1d)
>> [0.505413]  pci:00: ACPI _OSC request failed (AE_SUPPORT),
>> returned control mask: 0x0d
>> [0.505517] ACPI _OSC control for PCIe not granted, disabling ASPM
>>
>> and other laptop give that to kernel, in recent kernel will not give
>> acpiphp to have that slot, because it want to hold that for pciehp.
>> poor user have to pass 'pci_aspm=off" to disable _OSC for all.
>> --- please check the mail that i forward to you yesterday.
> 
> Yes, this is a bug, but I'm not sure how to fix it yet.

Looks like what I see with Dell Vostro 3550 as well.
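
The requested and returned _OSC masks in the quoted ThinkPad log (0x1d and
0x0d) differ in a single bit. A minimal Python sketch for decoding them
follows. The bit names are an assumption based on the kernel's OSC_PCI_*
control definitions, and the helper names decode and denied are hypothetical.

```python
# Assumed meaning of the _OSC control bits (after the kernel's OSC_PCI_* names).
OSC_CONTROL_BITS = {
    0x01: "PCIe native hotplug",
    0x02: "SHPC native hotplug",
    0x04: "PCIe PME",
    0x08: "PCIe AER",
    0x10: "PCIe capability structure",
}


def decode(mask):
    """List the control features whose bit is set in the mask."""
    return [name for bit, name in OSC_CONTROL_BITS.items() if mask & bit]


def denied(requested, granted):
    """List the features that were requested but not granted."""
    return decode(requested & ~granted)


print("requested:", decode(0x1D))
print("granted:  ", decode(0x0D))
print("denied:   ", denied(0x1D, 0x0D))
```

Under that assumption, the firmware in the quoted log kept back only the bit
for control of the PCIe capability structure.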

> 
>> Anyway, we do need to let the user to have choice to use acpiphp and pciehp.
>> and it should be first come and first serve policy.
> 
> And that's why you think they should be modules?  I disagree if so.

For me it is easier to cold boot with a card plugged in and fiddle with
hotplug later if I want to remove the card. Until then, I can inspect whether
PresDet really reports the card as present, and then see whether the system
reports the same after loading acpiphp or pciehp. I wouldn't drop the
possibility of having them as modules, at least for now, when we finally have
some clue about what is going on and can load the modules as we want while
chasing the bugs.

But sorry for hijacking this thread; maybe I managed to delete your answers on
my thread ("Re: Dell Vostro 3550: pci_hotplug+acpiphp require 'pcie_aspm=force'
on kernel command-line for hotplug to work").
I will go through the web archives to make sure I did not miss something.

Martin

Re: linux-3.7.1: OOPS in page_lock_anon_vma

2013-01-11 Thread Martin Mokrejs

Hugh Dickins wrote:
> On Sun, 6 Jan 2013, Martin Mokrejs wrote:
>
>> I was running 3.7.1 kernel quite fine for a while but I realized that it is 
>> slow and that
>> I should go and drop useless kernel drivers from my kernel. I have a 
>> SandyBridge-based
>> laptop and I found that I gain speed while setting CONFIG_NO_HZ=y, 
>> CONFIG_PREEMPT_NONE=y,
>> removing multicore scheduler, asking configurator set set maximum amount of 
>> CPUs for my
>> system (and not blindly specifying 4 for my dual-core i7 processor).
>> Further I get faster system while removing IOMMU and DMA redirects while it 
>> still
>> emulates NUMA. And, I switched away from CFQ scheduler to deadline and from 
>> SLAB to SLUB.
>> Finally, to make sure my CPU cores do not go back and forth between C0 and 
>> C7 states and
>> shutdown dynamically the 2 hyperthreaded cores. So I have really only two, 
>> physical cores
>> accessible. With performance CPU governor I have 1/2 of context switches and 
>> both cores
>> can be satured by whatever jobs (kernel compile or some computational jobs). 
>> It was not
>> possible to get the CPU running at turbo speed for a long while as it always 
>> went down
>> time to time. With ondemand governor I had cores in C7 for 50-70% of the 
>> time, that was
>> a bit better with performance governor but having the two hyperthreaded 
>> cores disabled
>> reduced the context switches by half, rescheduling interrupts went down by 
>> several orders
>> of magnitute. So it is crunching at max turbo speed on both cores, temp 
>> about 80 oC.
>>
>> I think none of the changes relates to the kernel crash directly but I had 
>> not a single crash
>> with 3.7.1 for few weeks. After the tweaks I had 3-4 crashes this afternoon. 
>> The system always
>> locked up so I could not see anything. Luckily, be it actually the same 
>> crash or not, now my X11
>> screen was dropped and to my framebuffer console and I got to see a kernel 
>> stacktrace. Here
>> is the first, fished out from /var/log/messages upon next bootup:
>>
>>
>> Jan  6 22:37:29 vostro kernel: [ 7663.251110] general protection fault:  
>> [#1] SMP
>> Jan  6 22:37:29 vostro kernel: [ 7663.251135] Modules linked in: i915 fbcon 
>> bitblit cfbfillrect softcursor cfbimgblt i2c_algo_bit font cfbcopyarea 
>> drm_kms_helper drm fb iwldvm iwlwifi fbdev sata_sil24
>> Jan  6 22:37:29 vostro kernel: [ 7663.251197] CPU 1 
>> Jan  6 22:37:29 vostro kernel: [ 7663.251206] Pid: 795, comm: kswapd0 Not 
>> tainted 3.7.1-default #22 Dell Inc. Vostro 3550/
>> Jan  6 22:37:29 vostro kernel: [ 7663.251229] RIP: 0010:[815d3dee] 
>>  [815d3dee] mutex_trylock+0xb/0x26
>> Jan  6 22:37:29 vostro kernel: [ 7663.251257] RSP: 0018:88040d25bbb8  
>> EFLAGS: 00010246
>> Jan  6 22:37:29 vostro kernel: [ 7663.251273] RAX: 0001 RBX: 
>> 88040bfdc000 RCX: 88040d25bce8
>> Jan  6 22:37:29 vostro kernel: [ 7663.251293] RDX:  RSI: 
>>  RDI: 0720072007200728
>> Jan  6 22:37:29 vostro kernel: [ 7663.251313] RBP: 88040d25bbb8 R08: 
>> dead00200200 R09: dead00100100
>> Jan  6 22:37:29 vostro kernel: [ 7663.251333] R10: 88040d25bc38 R11: 
>> 8804078acec0 R12: 88040bfdc001
>> Jan  6 22:37:29 vostro kernel: [ 7663.251354] R13: ea0010137440 R14: 
>> 0720072007200728 R15: 0001
>> Jan  6 22:37:29 vostro kernel: [ 7663.251374] FS:  () 
>> GS:88041fa8() knlGS:
>> Jan  6 22:37:29 vostro kernel: [ 7663.251396] CS:  0010 DS:  ES:  
>> CR0: 80050033
>> Jan  6 22:37:29 vostro kernel: [ 7663.251413] CR2: 2b876c545978 CR3: 
>> 018f6000 CR4: 000407e0
>> Jan  6 22:37:29 vostro kernel: [ 7663.251432] DR0:  DR1: 
>>  DR2: 
>> Jan  6 22:37:29 vostro kernel: [ 7663.251452] DR3:  DR6: 
>> 0ff0 DR7: 0400
>> Jan  6 22:37:29 vostro kernel: [ 7663.251472] Process kswapd0 (pid: 795, 
>> threadinfo 88040d25a000, task 88040d07ce30)
>> Jan  6 22:37:29 vostro kernel: [ 7663.251494] Stack:
>> Jan  6 22:37:29 vostro kernel: [ 7663.251501]  88040d25bbe8 
>> 810f6994 ea0010137440 
>> Jan  6 22:37:29 vostro kernel: [ 7663.251527]  88040d25bde8 
>> 88041fddad00 88040d25bc58 810f6b9e
>> Jan  6 22:37:29 vostro kernel: [ 7663.251551]   
>> 8804046d2dc0 810dee97 88040d25bce8
>> Jan  6 22:37:29 vostro kernel: [ 7663.251576] Call Trace:
>> Jan  6 22:37:29 vostro kernel: [ 7663.251587]  [810f6994] 
>> page_lock_anon_vma+0x40/0xaf
>> Jan  6 22:37:29 vostro kernel: [ 7663.251605]  [810f6b9e] 
>> page_referenced+0x78/0x1b7
>> Jan  6 22:37:29 vostro kernel: [ 7663.251623]  [810e026a] 
>> shrink_active_list+0x209/0x305
>> Jan  6 22:37:29 vostro kernel: [ 7663.251641]  [810e1269] 
>> kswapd+0x3fe/0x8ea
>> Jan  6 22:37:29 vostro kernel: [ 7663.251658]  [81091697] ? 
>> wake_up_bit+0x25/0x25
>> Jan  6 22:37:29 vostro kernel: [ 7663.251675]  [810e0e6b] ? 
>> try_to_free_pages+0x8c/0x8c
>> Jan  6 22:37:29 vostro kernel

3.7.1: BUG filp (Not tainted): Poison overwritten

2013-01-09 Thread Martin Mokrejs

Hi,
  today I received the following.

[  124.927854] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[  124.987250] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[  124.992228] pci_bus :11: dev 00, created physical slot 1
[  124.992448] acpiphp: Slot [1] registered
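[  233.258244] =============================================================================
[  233.258247] BUG filp (Not tainted): Poison overwritten
[  233.258248] -----------------------------------------------------------------------------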

[  233.258248] Disabling lock debugging due to kernel taint
[  233.258250] INFO: 0x88040102-0x88040102001d. First byte 0x20 instead of 0x6b
[  233.258253] INFO: Slab 0xea0010040800 objects=21 used=21 fp=0x  (null) flags=0x204080
[  233.258254] INFO: Object 0x88040102 @offset=0 fp=0x880401021e00

[  233.258255] Object 88040102: 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07   . . . . . . . .
[  233.258256] Object 880401020010: 20 07 20 07 20 07 20 07 20 07 20 07 20 07 6b 6b   . . . . . . .kk
[  233.258257] Object 880401020020: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258258] Object 880401020030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258259] Object 880401020040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258260] Object 880401020050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258260] Object 880401020060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258261] Object 880401020070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258262] Object 880401020080: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258263] Object 880401020090: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258264] Object 8804010200a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258265] Object 8804010200b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258265] Object 8804010200c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258266] Object 8804010200d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258267] Object 8804010200e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258268] Object 8804010200f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258269] Object 880401020100: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258269] Object 880401020110: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[  233.258270] Object 880401020120: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkk.
[  233.258271] Redzone 880401020130: bb bb bb bb bb bb bb bb
[  233.258272] Padding 880401020140: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[  233.258273] Padding 880401020150: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[  233.258274] Padding 880401020160: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[  233.258275] Padding 880401020170: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
[  233.258277] Pid: 4440, comm: lspci Tainted: G B 3.7.1-default #30
[  233.258277] Call Trace:
[  233.258283]  [8111085b] ? print_section+0x38/0x3a
[  233.258285]  [81110d19] print_trailer+0x105/0x10e
[  233.258287]  [81110fe9] check_bytes_and_report+0xac/0xe5
[  233.258290]  [80e1] check_object+0xbf/0x1ad
[  233.258291]  [897f] ? check_slab+0xaf/0xbd
[  233.258294]  [81119b04] ? get_empty_filp+0x6f/0x155
[  233.258297]  [815d2a31] alloc_debug_processing+0x61/0xed
[  233.258299]  [815d34dd] __slab_alloc+0x344/0x3ba
[  233.258301]  [81119b04] ? get_empty_filp+0x6f/0x155
[  233.258303]  [8100536b] ? print_context_stack+0xa2/0xbe
[  233.258305]  [81119b04] ? get_empty_filp+0x6f/0x155
[  233.258307]  [81119b04] ? get_empty_filp+0x6f/0x155
[  233.258309]  [81112f50] kmem_cache_alloc+0x50/0xb6
[  233.258310]  [81119b04] get_empty_filp+0x6f/0x155
[  233.258313]  [81123e4b] path_openat+0x35/0x313
[  233.258315]  [8112440b] do_filp_open+0x33/0x81
[  233.258317]  [815d9b93] ? _raw_spin_unlock+0x23/0x27
[  233.258320]  [8112e4cb] ? __alloc_fd+0xe4/0xf6
[  233.258322]  [81118403] do_sys_open+0x68/0xfa
[  233.258323]  [811184b1] sys_open+0x1c/0x1e
[  233.258325]  [815da756] system_call_fastpath+0x1a/0x1f
[  233.258327] FIX filp: Restoring 0x88040102-0x88040102001d=0x6b

[  233.258327] FIX filp: Marking all objects used


If you need the .config or the full dmesg, please let me know, and ideally Cc: me.
Martin
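
For readers of the dump: the byte values are the slab debug patterns from include/linux/poison.h. 0x6b fills a freed object, 0xa5 marks its last byte, 0xbb is the inactive red zone and 0x5a is padding. Below is a minimal sketch, in Python rather than kernel C, of the kind of comparison behind the "Restoring ...-...001d=0x6b" line. The function name and the shortened sample object are illustrative assumptions, not the SLUB code itself.

```python
POISON_FREE = 0x6B  # fill byte of a freed object
POISON_END = 0xA5   # last byte of a freed object


def find_poison_damage(obj: bytes):
    """Return (first, last) offsets of bytes that break the free-object
    poison pattern, or None if the pattern is intact."""
    expected = bytes([POISON_FREE]) * (len(obj) - 1) + bytes([POISON_END])
    bad = [i for i, (got, want) in enumerate(zip(obj, expected)) if got != want]
    return (bad[0], bad[-1]) if bad else None


# Shortened stand-in for the reported object: 15 pairs of "20 07",
# then intact poison up to the end marker.
sample = bytes([0x20, 0x07]) * 15 + bytes([POISON_FREE] * 17) + bytes([POISON_END])
print(find_poison_damage(sample))  # (0, 29), i.e. offsets 0x00-0x1d
```

The reported range ends at offset 0x1d, which matches 30 overwritten bytes at the start of the object.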
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-3.7.1: OOPS in page_lock_anon_vma

2013-01-07 Thread Martin Mokrejs

Hi Hillf,
   thank you for your answer on this, although I am not sure I understood your
point well.

Hillf Danton wrote:
> Hello Martin
> 
> On Mon, Jan 7, 2013 at 6:59 AM, Martin Mokrejs
>  wrote:
>> time to time. With the ondemand governor I had cores in C7 for 50-70% of the
>> time; that was a bit better with the performance governor, but having the two
>> hyperthreaded cores disabled reduced the context switches by half, and
>> rescheduling interrupts went down by several orders of magnitude. So it is
>> crunching at max turbo speed on both cores, temp about 80 °C.
>>
> Your boxen could be used to cook pizza, and check the
> recommended working temperature in the manual please.

I meant CPU temperature, not environment temperature. ;-) This is a laptop with
a dual-core i7.


# dmesg | grep -i temp  
[2.233856] coretemp coretemp.0: TjMax is 100 degrees C
[2.233882] coretemp coretemp.0: TjMax is 100 degrees C
#


I am a bit worried about whether I really disabled the two hyperthreaded cores
(cpu2 and cpu3). Per the stats below, it looks like I inadvertently disabled the
second core and its hyperthreaded sibling instead? Or why are the counters not
updated for CPU1 below?

# cat /proc/interrupts 
           CPU0       CPU1
  0:         30          0   IO-APIC-edge      timer
  1:         15          0   IO-APIC-edge      i8042
  8:         33          0   IO-APIC-edge      rtc0
  9:          2          0   IO-APIC-fasteoi   acpi
 12:        241          0   IO-APIC-edge      i8042
 16:         50          0   IO-APIC-fasteoi   ehci_hcd:usb1
 19:     464445          0   IO-APIC-fasteoi   sata_sil24
 23:      17324          0   IO-APIC-fasteoi   ehci_hcd:usb2
 40:          0          0   PCI-MSI-edge      pciehp
 41:         14          0   PCI-MSI-edge      mei
 42:     137666          0   PCI-MSI-edge      ahci
 43:      13901          0   PCI-MSI-edge      eth0
 44:      36022          0   PCI-MSI-edge      xhci_hcd
 45:          0          0   PCI-MSI-edge      xhci_hcd
 46:          0          0   PCI-MSI-edge      xhci_hcd
 47:          0          0   PCI-MSI-edge      xhci_hcd
 48:          0          0   PCI-MSI-edge      xhci_hcd
 49:        810          0   PCI-MSI-edge      snd_hda_intel
 50:          1          0   PCI-MSI-edge      iwlwifi
 51:        461          0   PCI-MSI-edge      i915
NMI:       6496       6111   Non-maskable interrupts
LOC:     526765     521983   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:       6496       6111   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RTR:          2          0   APIC ICR read retries
RES:     197262     220079   Rescheduling interrupts
CAL:         33     299572   Function call interrupts
TLB:       3302      19119   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         20         20   Machine check polls
ERR:          0
MIS:          0
#
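
One way to read the table above is to total the counters per CPU and to separate device IRQs (numeric labels) from per-CPU kernel counters such as LOC and RES. This is a minimal sketch, assuming the usual /proc/interrupts layout (a header row of CPU names, then "label: count count description"). The function name and the shortened sample are illustrative.

```python
def per_cpu_totals(text: str):
    """Sum /proc/interrupts counters per CPU, split into device IRQs
    (numeric labels) and kernel counters (named labels)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals = {"device": [0] * ncpu, "kernel": [0] * ncpu}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()[:ncpu]
        if len(fields) < ncpu or not all(f.isdigit() for f in fields):
            continue                      # ERR/MIS rows carry a single total
        kind = "device" if label.strip().isdigit() else "kernel"
        for cpu, value in enumerate(fields):
            totals[kind][cpu] += int(value)
    return totals


sample = """\
           CPU0       CPU1
 19:     464445          0   IO-APIC-fasteoi   sata_sil24
 42:     137666          0   PCI-MSI-edge      ahci
LOC:     526765     521983   Local timer interrupts
RES:     197262     220079   Rescheduling interrupts
ERR:          0
"""
print(per_cpu_totals(sample))
```

In the output above all device IRQs are counted on CPU0, while LOC, RES, CAL and TLB also advance on CPU1. That suggests CPU1 is online and doing work; which CPU services device IRQs is a matter of IRQ affinity, not of whether the CPU is running.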



i7z reports at the moment:


Cpu speed from cpuinfo 2793.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 2793 MHz
  CPU Multiplier 28x || Bus clock frequency (BCLK) 99.75 MHz

Socket [0] - [physical cores=2, logical cores=2, max online cores ever=2]
  TURBO ENABLED on 2 Cores, Hyper Threading OFF
  Max Frequency without considering Turbo 2892.75 MHz (99.75 x [29])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is  35x/33x/33x/33x
  Real Current Frequency 3291.75 MHz [99.75 x 33.00] (Max of below)
Core [core-id]  :Actual Freq (Mult.)  C0%   Halt(C1)%  C3 %   C6 %   C7 %  Temp
Core 1 [0]:       3291.75 (33.00x)    100   0          0      0      0     87
Core 2 [1]:       3291.75 (33.00x)    100   0          0      0      0     81
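
The frequencies i7z prints are the bus clock multiplied by the active multiplier. Here is a small check of the numbers above; the helper name is illustrative.

```python
def core_freq_mhz(bclk_mhz: float, multiplier: int) -> float:
    """Core frequency as bus clock (BCLK) times multiplier."""
    return bclk_mhz * multiplier


BCLK = 99.75  # MHz, as reported by i7z

for mult in (28, 29, 33):
    print(f"{mult}x -> {core_freq_mhz(BCLK, mult):.2f} MHz")
```

The three results correspond to the 2793 MHz base figure, the 2892.75 MHz non-turbo maximum and the 3291.75 MHz current turbo frequency.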


# cat /proc/schedstat 
version 15
timestamp 4295525245
cpu0 0 0 4348066 350860 2727228 2499580 4026361745866 2434254688153 3965236
domain0 3 25687 19018 2642 7492049 4293 7 0 19018 22219 21471 43 1338108 709 0 0 21471 342087 288140 40648 58479699 14067 33 5 288135 0 0 0 0 0 0 0 0 0 223256 24270 0
cpu1 0 0 4297136 324961 2565709 2361951 3810969849763 2437183692947 3941014
domain0 3 24296 17512 2837 7768706 4218 15 1 17511 22994 22053 48 1636623 896 0 0 22053 313125 260913 38828 58232101 14403 37 2 260911 0 0 0 0 0 0 0 0 0 198332 23230 0
# cat /proc/sched_debug 
Sched Debug Version: v0.10, 3.7.1-default #24
ktime   : 5888049.840626
sched_clk   : 5878999.320221
cpu_clk : 5878999.320272
jiffies : 4295526100
sched_clock_stable  : 1

sysctl_sched
  .sysctl_sched_latency

Re: [QUESTION ON BUG] the rcu stall issue could not be reproduced

2012-07-23 Thread Martin Mokrejs

Hi,
  I see a few more RCU bugs reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=43028
https://bugzilla.kernel.org/show_bug.cgi?id=40092
https://bugzilla.kernel.org/show_bug.cgi?id=42997

And, I placed my previous long email with logs at
https://bugzilla.kernel.org/show_bug.cgi?id=45091

Hope this eventually helps.
Martin

Mike Galbraith wrote:
> On Fri, 2012-07-20 at 11:09 +0800, Michael Wang wrote: 
>> Hi, Mike, Martin, Dan
>>
>> I'm currently taking an eye on the rcu stall issue which was reported by
>> you in the mail:
>>
>> rcu: endless stalls
>>  From: Mike Galbraith
>> linux-3.4-rc7: rcu_sched self-detected stall on CPU
>>  From: Martin Mokrejs
>> RCU stalls in linux-next
>>  From: Dan Carpenter
>>
>> I try to reproduce the issue on my X86 server with 12 cpu
> 
> The 'endless stalls' box was 341.3 times larger.  Dunno if you can
> even set a serial port slow enough to approximate all cores trying to
> gripe through a single pinhole simultaneously.
> 
> -Mike
> 
> 



Re: 2.6.22-rc1 killed my ext3 filesystem cleanly unmounted

2007-05-18 Thread Martin Mokrejs

On Fri, May 18, 2007 at 04:20:39PM +0200, Jesper Juhl wrote:
>  On 18/05/07, Martin Mokrejs <[EMAIL PROTECTED]> wrote:
> > Hi,
> >   I just tried the 2.6.22-r1 candidate to test whether some bug I have
> > hit in the past still exists. I did use 2.6.20.6 so far. So, I have
> > cleanly rebooted to use the new kernel, after the machine came up I
> > tried to mess with the bug, and had to reboot again to play with kernel
> > commandline parameters. Unfortunately, on the next reboot fsck was
> > schedules on my filesystem after 38 clean mounts. :( And the problem
> > started. The fsck found some unused inodes, but probably did not know
> > where do they belong to, but it deleted them automagically. Finally, the
> > fsck died because it cannot fine some '..' entry.
> >
> 
>  How do you know that the corruption was caused by 2.6.21-rc1 ?
>  Isn't it possible that the corruption was created by an earlier
>  kernel, but only detected when a forced fsck was run - which just
>  happened to be while you were running 2.6.21-rc1 ...
> 
>  My point is that, as far as I can see, there's nothing tying
>  2.6.21-rc1 specifically to this corruption... or?

You might be right, but I thought the cause was more probably in the kernel,
as that is what I changed recently. ;) Or maybe someone can at least say
"No, no changes worth considering between 2.6.20.6 and 2.6.22-rc1." ;)

Martin

Re: 2.6.22-rc1 killed my ext3 filesystem cleanly unmounted

2007-05-18 Thread Martin Mokrejs

On Fri, May 18, 2007 at 07:38:18PM +0530, Kalpak Shah wrote:
> On Fri, 2007-05-18 at 15:51 +0200, Martin Mokrejs wrote:
> > On Fri, May 18, 2007 at 05:17:06PM +0530, Kalpak Shah wrote:
> > > On Fri, 2007-05-18 at 11:06 +0200, Martin Mokrejs wrote:
> > > > Hi,
> > > >   I just tried the 2.6.22-r1 candidate to test whether some bug I have 
> > > > hit in the past still exists. I did use 2.6.20.6 so far. So, I have 
> > > > cleanly rebooted to use the new kernel, after the machine came up I 
> > > > tried to mess with the bug, and had to reboot again to play with kernel 
> > > > commandline parameters. Unfortunately, on the next reboot fsck was 
> > > > schedules on my filesystem after 38 clean mounts. :( And the problem 
> > > > started. The fsck found some unused inodes, but probably did not know 
> > > > where do they belong to, but it deleted them automagically. Finally, 
> > > > the 
> > > > fsck died because it cannot fine some '..' entry.
> > > > 
> > > > /dev/hda3: Entry '..' in .../??? (5701636) has deleted/unused inode
> > > > 5570561. CLEARED.
> > > > Unconnected directory inode 5570567 (...)
> > > > 
> > > > /dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> > > >   (i.e., without -a or -p options)
> > > > 
> > > 
> > > This means that e2fsck has reached a point where it needs user
> > > intervention. So you should not run e2fsck with -p, -a or -y options.
> > > Look up the e2fsck man page for more on this.
> > 
> > Yeah, stupid init.d script in Gentoo. I will report at Gentoo as well but
> > how can I revert the changes? Can you say which directories were affected?
> 
> No there is nothing wrong with your script, most problems get solved by
> -a or -p and hence your init.d script is correct in using these options.
> 
> I don't understand what you mean by reverting your changes. 

I would like to boot with another, previously tested kernel and run another,
stable fsck version. Yes, I cannot say how it happened that ext3 had a broken
directory, but before making changes to the filesystem I would certainly
boot with a tested kernel and tools.

> 
> An unconnected directory inode means that this directory (inode 5570567)
> does not have a valid ".." entry (which is the backpointer to its
> parent). So this directory will be moved to lost+found.

And those original "errors"? Did not those modifications cause this in turn?


/dev/hda3 has been mounted 38 times without being checked, check forced
HTREE directory inode 1163319 has an invalid root node.
HTREE INDEX CLEARED
Entry '..' in .../??? (5570587) has deleted/unused inode 5570561. 
CLEARED.
/dev/hda3: Entry '..' in .../??? (5570620) has deleted/unused inode 
5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570625) has deleted/unused inode
5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570567) has deleted/unused inode
5570561. CLEARED.
[cut]

Martin

Re: 2.6.22-rc1 killed my ext3 filesystem cleanly unmounted

2007-05-18 Thread Martin Mokrejs

On Fri, May 18, 2007 at 05:17:06PM +0530, Kalpak Shah wrote:
> On Fri, 2007-05-18 at 11:06 +0200, Martin Mokrejs wrote:
> > Hi,
> >   I just tried the 2.6.22-r1 candidate to test whether some bug I have 
> > hit in the past still exists. I did use 2.6.20.6 so far. So, I have 
> > cleanly rebooted to use the new kernel, after the machine came up I 
> > tried to mess with the bug, and had to reboot again to play with kernel 
> > commandline parameters. Unfortunately, on the next reboot fsck was 
> > schedules on my filesystem after 38 clean mounts. :( And the problem 
> > started. The fsck found some unused inodes, but probably did not know 
> > where do they belong to, but it deleted them automagically. Finally, the 
> > fsck died because it cannot fine some '..' entry.
> > 
> > /dev/hda3: Entry '..' in .../??? (5701636) has deleted/unused inode
> > 5570561. CLEARED.
> > Unconnected directory inode 5570567 (...)
> > 
> > /dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> >   (i.e., without -a or -p options)
> > 
> 
> This means that e2fsck has reached a point where it needs user
> intervention. So you should not run e2fsck with -p, -a or -y options.
> Look up the e2fsck man page for more on this.

Yeah, stupid init.d script in Gentoo. I will report it to Gentoo as well, but
how can I revert the changes? Can you say which directories were affected?
Thanks,
Martin

2.6.22-rc1 killed my ext3 filesystem cleanly unmounted

2007-05-18 Thread Martin Mokrejs

Hi,
  I just tried the 2.6.22-rc1 candidate to test whether some bug I have 
hit in the past still exists. I have used 2.6.20.6 so far. So, I 
cleanly rebooted to use the new kernel; after the machine came up I 
tried to mess with the bug, and had to reboot again to play with kernel 
command-line parameters. Unfortunately, on the next reboot fsck was 
scheduled on my filesystem after 38 clean mounts. :( And the problem 
started. The fsck found some unused inodes, but probably did not know 
where they belong, and it deleted them automagically. Finally, the 
fsck died because it could not find some '..' entry.

  Here is what happened, retyped from what my camera recorded. ;)


/dev/hda3 has been mounted 38 times without being checked, check forced
HTREE directory inode 1163319 has an invalid root node.
HTREE INDEX CLEARED
Entry '..' in .../??? (5570587) has deleted/unused inode 5570561. 
CLEARED.
/dev/hda3: Entry '..' in .../??? (5570620) has deleted/unused inode 
5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570625) has deleted/unused inode
5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570567) has deleted/unused inode
5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570614) has deleted/unused inode
5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570603) has deleted/unused inode
5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5586948) has deleted/unused inode
5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5586957) has deleted/unused inode
5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5701636) has deleted/unused inode
5570561. CLEARED.
Unconnected directory inode 5570567 (...)

/dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
  (i.e., without -a or -p options)




  Turning off the power and booting back with 2.6.20.6 and obviously 
running the same fsck gives me:

/dev/hda3 contains a file system with errors, check forced.
Missing '..' in directory inode 5570587.

/dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
  (i.e., without -a or -p options)


  What do you recommend I do now?

  I cannot say what the fsck version is, but I can tell you this is a 
Gentoo Linux box in the ~x86 tree, so whatever is in the "unstable" 
branch. :(

  I do use the ext2/ext3 Windows driver from http://www.fs-driver.org/ to 
access the filesystem. Even now, when the filesystem should be marked as 
dirty, I can access it from Windows and see the files. Does the extfs.sys 
ignore the mark? ;) Anyway, since that time there is a directory 
'Recycled' at the top level of the filesystem. ;-)

  I do remember that recently one of the system packages in 
Gentoo possibly installed some kind of a hash into the filesystem, or hashing 
support, something like that. Sorry, I do not remember the details.
I am just thinking about what could have made the fsck think there is something 
wrong.

  I think I would like to restore the filesystem to the state it was in 
before running the fsck. How can I do it? No, I do not have a backup of 
the filesystem. :(

I subscribed to the email lists but please send me Cc: anyway. Many thanks.
Martin
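
The two "check forced" messages in the logs above come from different conditions: the mount counter in the superblock reaching its configured maximum, and the filesystem being flagged as having errors. This is a simplified sketch of that decision in Python. The function name is hypothetical, the real e2fsck checks more conditions than these two, and the maximum of 38 is an assumption chosen to match the log.

```python
def forced_check_reason(mount_count: int, max_mount_count: int, has_errors: bool):
    """Return why a full check would be forced, or None if it is skipped."""
    if has_errors:
        return "contains a file system with errors"
    if max_mount_count > 0 and mount_count >= max_mount_count:
        return f"has been mounted {mount_count} times without being checked"
    return None


print(forced_check_reason(38, 38, False))  # first boot: counter reached the limit
print(forced_check_reason(1, 38, True))    # second boot: error flag left set
print(forced_check_reason(5, 38, False))   # ordinary boot: no forced check
```

So a forced check after 38 clean mounts does not by itself say when the damage happened, only when it was first looked for.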

Re: 2.6.22-rc1 killed my ext3 filesystem cleanly unmounted

2007-05-18 Thread Martin Mokrejs

On Fri, May 18, 2007 at 07:38:18PM +0530, Kalpak Shah wrote:
> On Fri, 2007-05-18 at 15:51 +0200, Martin Mokrejs wrote:
> > On Fri, May 18, 2007 at 05:17:06PM +0530, Kalpak Shah wrote:
> > > On Fri, 2007-05-18 at 11:06 +0200, Martin Mokrejs wrote:
> > > > Hi,
> > > >   I just tried the 2.6.22-rc1 candidate to test whether some bug I have
> > > > hit in the past still exists. I did use 2.6.20.6 so far. So, I have
> > > > cleanly rebooted to use the new kernel, after the machine came up I
> > > > tried to mess with the bug, and had to reboot again to play with kernel
> > > > commandline parameters. Unfortunately, on the next reboot fsck was
> > > > scheduled on my filesystem after 38 clean mounts. :( And the problem
> > > > started. The fsck found some unused inodes, but probably did not know
> > > > where they belong, but it deleted them automagically. Finally, the
> > > > fsck died because it cannot find some '..' entry.
> > > >
> > > > /dev/hda3: Entry '..' in .../??? (5701636) has deleted/unused inode
> > > > 5570561. CLEARED.
> > > > Unconnected directory inode 5570567 (...)
> > > >
> > > > /dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> > > >   (i.e., without -a or -p options)
> > >
> > > This means that e2fsck has reached a point where it needs user
> > > intervention. So you should not run e2fsck with -p, -a or -y options.
> > > Look up the e2fsck man page for more on this.
> >
> > Yeah, stupid init.d script in Gentoo. I will report at Gentoo as well but
> > how can I revert the changes? Can you say which directories were affected?
>
> No, there is nothing wrong with your script; most problems get solved by
> -a or -p and hence your init.d script is correct in using these options.
>
> I don't understand what you mean by reverting your changes.

I would like to boot with another/previous/tested kernel and run another,
stable fsck version. Yes, I cannot say how it happened that ext3 ended up
with a broken directory, but for sure before making changes to the
filesystem I would boot with a tested kernel and tools.

> An unconnected directory inode means that this directory (inode 5570567)
> does not have a valid .. entry (which is the backpointer to its
> parent). So this directory will be moved to lost+found.

And those original errors? Did those modifications not cause this in turn?


/dev/hda3 has been mounted 38 times without being checked, check forced
HTREE directory inode 1163319 has an invalid root node.
HTREE INDEX CLEARED
Entry '..' in .../??? (5570587) has deleted/unused inode 5570561. 
CLEARED.
/dev/hda3: Entry '..' in .../??? (5570620) has deleted/unused inode 
5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570625) has deleted/unused inode
5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570567) has deleted/unused inode
5570561. CLEARED.
[cut]

Martin

Re: 2.6.22-rc1 killed my ext3 filesystem cleanly unmounted

2007-05-18 Thread Martin Mokrejs

On Fri, May 18, 2007 at 04:20:39PM +0200, Jesper Juhl wrote:
> On 18/05/07, Martin Mokrejs [EMAIL PROTECTED] wrote:
> > Hi,
> >   I just tried the 2.6.22-rc1 candidate to test whether some bug I have
> > hit in the past still exists. I did use 2.6.20.6 so far. So, I have
> > cleanly rebooted to use the new kernel, after the machine came up I
> > tried to mess with the bug, and had to reboot again to play with kernel
> > commandline parameters. Unfortunately, on the next reboot fsck was
> > scheduled on my filesystem after 38 clean mounts. :( And the problem
> > started. The fsck found some unused inodes, but probably did not know
> > where they belong, but it deleted them automagically. Finally, the
> > fsck died because it cannot find some '..' entry.
>
> How do you know that the corruption was caused by 2.6.22-rc1?
> Isn't it possible that the corruption was created by an earlier
> kernel, but only detected when a forced fsck was run - which just
> happened to be while you were running 2.6.22-rc1 ...
>
> My point is that, as far as I can see, there's nothing tying
> 2.6.22-rc1 specifically to this corruption... or?

You might be right, but I thought the kernel was the more probable cause,
as that is what I have changed recently. ;) Or maybe someone can at least say
"No, no changes to be considered between 2.6.20.6 and 2.6.22-rc1." ;)

Martin

Re: init 0 stopped working

2005-07-16 Thread Martin Mokrejs

>   I used to shut down my P4 machine based on ASUS P4C800E-Deluxe
> with a simple "init 0" command. That somehow broke between 2.6.12-rc6-git2
> and 2.6.13-rc1. The machine makes a sound like a shutdown but it
> immediately turns the power on again. I used acpi and the kernel
> configs should be almost identical in all cases, as I just recopy
> the previously used .config and run "make oldconfig".
>
>   Any clues? It still happens even with 2.6.13-rc3-git2.

It was introduced after 2.6.12 but before or with 2.6.13-rc1.
It is not fixed by the acpi-20050708 patch for the 2.6.13 series.
I tried with KEXEC both enabled and disabled, but the problem still persists.

init 0 stopped working

2005-07-15 Thread Martin Mokrejs

Hi,
  I used to shut down my P4 machine based on ASUS P4C800E-Deluxe
with a simple "init 0" command. That somehow broke between 2.6.12-rc6-git2
and 2.6.13-rc1. The machine makes a sound like a shutdown but it
immediately turns the power on again. I used acpi and the kernel
configs should be almost identical in all cases, as I just recopy
the previously used .config and run "make oldconfig".

  Any clues? It still happens even with 2.6.13-rc3-git2.
Please Cc: me in replies.
Martin

Two 2.6.13-rc1 kernel crashes

2005-07-04 Thread Martin Mokrejs

Hi,
  I use Gentoo Linux with the XFS filesystem on the i686 architecture.
Recently it happened to me 3 times that the machine locked up,
although at least once sys-rq+b worked. Here is the log
from the remote console. I don't remember having such problems
with 2.6.12-rc6-git2, which was my previous testing kernel.
The problems appear under heavy load when I compile/install
some packages and, maybe just by bad coincidence,
when I move my USB mouse in the fvwm2 environment. The machine
locks up.
Any clues? Please Cc: me in replies.
Martin

Linux version 2.6.13-rc1 ([EMAIL PROTECTED]) (gcc version 3.4.4 (Gentoo 3.4.4, ssp-3.4.4-1.0, pie-8.7.8)) #2 Mon Jul 4 01:13:46 CEST 2005
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e8000 - 0010 (reserved)
 BIOS-e820: 0010 - bff3 (usable)
 BIOS-e820: bff3 - bff4 (ACPI data)
 BIOS-e820: bff4 - bfff (ACPI NVS)
 BIOS-e820: bfff - c000 (reserved)
 BIOS-e820: ffb8 - 0001 (reserved)
2175MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000ff780
DMI 2.3 present.
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] disabled)
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at c000 (gap: c000:3fb8)
Built 1 zonelists
Kernel command line: root=/dev/sda2 ide=reverse agp=try_unsupported console=ttyS0,57600n8 console=tty0 vga=792 idebus=66
ide_setup: ide=reverse : Enabled support for IDE inverse scan order.
ide_setup: idebus=66
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 3228.252 MHz processor.
Using tsc for high-res timesource
Console: colour dummy device 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 3112324k/3144896k available (2926k kernel code, 31420k reserved, 1612k data, 172k init, 2227392k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 6464.39 BogoMIPS (lpj=12928798)
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 09
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=3
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050309
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller :00:1f.1
PCI: Transparent bridge - :00:1e.0
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 *7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 10 *11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 12 devices
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
pnp: 00:08: ioport range 0x680-0x6ff has been reserved
pnp: 00:08: ioport range 0x290-0x297 has been reserved
Machine check exception polling timer started.
IA-32 Microcode Update Driver: v1.14 <[EMAIL PROTECTED]>
highmem bounce pool size: 64 pages
SGI XFS with no debug enabled
Initializing Cryptographic API
ACPI: PCI Interrupt :01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
radeonfb: Found Intel x86 BIOS ROM Image
radeonfb: Retreived PLL infos from BIOS
radeonfb: Reference=27.00 MHz

Re: find: /usr/src/linux-2.4.30/include/asm: Too many levels of symbolic links

2005-04-07 Thread Martin MOKREJS

DervishD wrote:
> > again I've hit some weird problem doing "make dep" for 2.4 kernel:
>
> Not a kernel problem but a findutils problem. Fixed in 4.2.19,
> but 4.2.20 was released recently. Upgrade.

You were right. Thanks!
M.
