Re: [BUG] ohci_enable() fails during resume

2015-06-23 Thread Clemens Ladisch
Lukasz Stelmach wrote:
> Rather suddenly, my desktop PC started failing to resume. [...]
> The failing code is somewhere around line 2400 of
> drivers/firewire/ohci.c (the latest mainline).

>    0x003f <+31>: callq  0xb037 <copy_config_rom>
>    0x0044 <+36>: mov    0x898(%rbx),%rax
> -->0x004b <+43>: mov    (%rax),%edx   <--

(The copy_config_rom call was not actually executed; the else branch
jumped to 44.)

ohci->next_config_rom is NULL because ohci->config_rom is NULL.
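
To make the failure concrete, here is a simplified user-space model of
that branch. It is only a sketch: struct model_ohci, model_enable(), the
staging buffer and the return codes are hypothetical stand-ins, not the
driver's code. Only the field names config_rom and next_config_rom come
from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MODEL_ROM_QUADLETS = 4 };  /* arbitrary size for this model */

/* Hypothetical stand-in for the driver's per-controller state. */
struct model_ohci {
    uint32_t *config_rom;       /* ROM in use; set once a bus reset completes */
    uint32_t *next_config_rom;  /* ROM staged for the next bus reset */
    uint32_t next_header;       /* first quadlet of the staged ROM */
    uint32_t staging[MODEL_ROM_QUADLETS];
};

/*
 * new_rom != NULL models the initial enable path (copy the supplied ROM).
 * new_rom == NULL models the resume path (reuse the current ROM).
 * Returns 0 on success, or -1 at the point where unguarded code would
 * read through a NULL pointer.
 */
static int model_enable(struct model_ohci *ohci,
                        const uint32_t *new_rom, size_t quadlets)
{
    assert(ohci != NULL);

    if (new_rom) {
        if (quadlets > MODEL_ROM_QUADLETS)
            quadlets = MODEL_ROM_QUADLETS;
        memcpy(ohci->staging, new_rom, quadlets * sizeof(uint32_t));
        ohci->next_config_rom = ohci->staging;
    } else {
        /* Resume: fall back to whatever ROM is current, possibly NULL. */
        ohci->next_config_rom = ohci->config_rom;
    }

    if (ohci->next_config_rom == NULL)
        return -1;  /* corresponds to the faulting mov (%rax),%edx */

    ohci->next_header = ohci->next_config_rom[0];
    return 0;
}
```

Calling model_enable() with new_rom == NULL on a zero-initialized
struct model_ohci returns -1, which corresponds to the state the
controller is left in when no bus reset ever completed.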

> The code around the line 2400 appears to handle multiple
> firewire ports (if I recognise variable names correctly, e.g.
> next_config_rom).

No, this code handles multiple versions of the same data structure.
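
As a rough sketch of what "multiple versions" means here: one pointer
refers to the config ROM currently in effect, and the other to a version
staged to replace it at the next bus reset. The names below
(rom_versions, stage_rom, commit_on_bus_reset) are hypothetical and the
logic is a simplification, not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Two versions of the same data structure, not two ports. */
struct rom_versions {
    const uint32_t *current;  /* version in effect now */
    const uint32_t *next;     /* version staged for the next bus reset */
};

/* Stage a new version; it does not take effect yet. */
static void stage_rom(struct rom_versions *v, const uint32_t *rom)
{
    v->next = rom;
}

/*
 * Meant to run when a bus reset completes successfully: promote the
 * staged version to current. Returns 1 if a version was promoted,
 * 0 if nothing was staged.
 */
static int commit_on_bus_reset(struct rom_versions *v)
{
    if (v->next == NULL)
        return 0;
    v->current = v->next;
    v->next = NULL;
    return 1;
}
```

In this model, if commit_on_bus_reset() never runs, current stays NULL
even though a version was staged.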

> Hardware bug in the on-board firewire controller *and* a bug in the
> driver.

Indeed; this appears to be the culprit:
> [  232.855042] firewire_ohci :04:03.0: added OHCI v1.0 device as card 0, 8 IR + 8 IT contexts, quirks 0x0
> [  232.864724] firewire_ohci :04:03.0: bad self ID 0/1 ( != ~)

With the "bad self ID", bus_reset_work() just aborts, and the controller
is never completely initialized (therefore the unexpected NULL).
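
The shape of the log message ("x != ~y") suggests a consistency check in
which every self-ID quadlet must be accompanied by its bitwise inverse.
A minimal sketch of such a check follows. The function name and the
buffer layout (pairs of quadlet and inverse, with no header) are
assumptions made for illustration, not the driver's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * buf holds `count` pairs laid out as: quadlet, inverse, quadlet, ...
 * Returns the index of the first inconsistent pair, or -1 if every
 * quadlet matches the bitwise inverse of its partner.
 */
static int first_bad_self_id(const uint32_t *buf, size_t count)
{
    for (size_t j = 0; j < count; j++) {
        uint32_t id = buf[2 * j];
        uint32_t inverse = buf[2 * j + 1];

        if (id != (uint32_t)~inverse)
            return (int)j;  /* a caller would log this and abort */
    }
    return -1;
}
```

A caller that aborts on the first bad pair never reaches the steps that
would finish initialization, which matches the behaviour described above.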

Try unloading and reloading the firewire-ohci module to see if you can
ever avoid the "bad self ID" error.  But if it stays, your hardware
indeed appears to be broken.


Regards,
Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[BUG] ohci_enable() fails during resume

2015-06-21 Thread Lukasz Stelmach
Hi,

Rather suddenly, my desktop PC started failing to resume. I have
redirected the console to ttyS0 and managed to capture the oops
(attached). I am not a disassembly expert and I have built my
kernel without full debugging symbols, but here is what I found
(at least for the first trace in the attached oops.txt).

The failing code is somewhere around line 2400 of
drivers/firewire/ohci.c (the latest mainline). There is a note
about some values being NULL during the resume process, but it
appears there are more NULLs than expected.

%rbx points to the ohci structure.

I attach:

oops.txt - full dump of oops from console.

oops_code.txt - disassembled Code from the oops.

ohci_enable_disassembled.txt - disassembled ohci_enable function
from my kernel (gentoo v3.18.8, but as far as I can tell there
haven't been many changes in this area).

I have marked the failing instruction in the disassembler dumps
with "-->".

There are two conditions I *suspect* of being responsible for this
situation.

Hardware failure. There was a storm about a week ago which might
have damaged the hardware. It appears to have hit my SB Audigy slightly
(the card's PCI interface appears OK, but the AC97 codec glitches
when setting mixer registers).

Hardware bug in the on-board firewire controller *and* a bug in the
driver. The code around the line 2400 appears to handle multiple
firewire ports (if I recognise variable names correctly, e.g.
next_config_rom). Now, without the SB card, I've got only one
firewire port so this is what has changed.

Please tell me how I can help further to debug this problem. (I may
have some trouble exercising the firewire port because I don't have any
firewire devices.)

Kind regards,
-- 
Było mi bardzo miło.   Twoje oczy lubią mnie
>Łukasz< i to mnie zgubi  (c)SNL
root@kotik ~ # dmesg -n8
root@kotik ~ # modprobe firewire_ohci
[  232.783281] calling  fw_core_init+0x0/0xfb [firewire_core] @ 1944
[  232.789425] initcall fw_core_init+0x0/0xfb [firewire_core] returned 0 after 39 usecs
[  232.798059] calling  fw_ohci_init+0x0/0x4d [firewire_ohci] @ 1944
[  232.855042] firewire_ohci :04:03.0: added OHCI v1.0 device as card 0, 8 IR + 8 IT contexts, quirks 0x0
[  232.864724] firewire_ohci :04:03.0: bad self ID 0/1 ( != ~)
[  232.864862] initcall fw_ohci_init+0x0/0x4d [firewire_ohci] returned 0 after 59285 usecs
root@kotik ~ # systemctl suspend
[  311.223312] PM: Syncing filesystems ... done.
[  311.301192] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  311.309337] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  311.318148] wlan0: deauthenticating from 28:be:9b:d2:ce:ce by local choice (Reason: 3=DEAUTH_LEAVING)
[  311.416471] cfg80211: Calling CRDA to update world regulatory domain

[  <some RS-232 garbage> usb usb5: root hub lost power or was reset
[  340.411105] snd_hda_intel :00:1b.0: irq 28 for MSI/MSI-X
[  340.411222] usb usb6: root hub lost power or was reset
[  340.411302] usb usb7: root hub lost power or was reset
[  340.411380] usb usb8: root hub lost power or was reset
[  340.411711] usb usb4: root hub lost power or was reset
[  340.411737] rtc_cmos 00:01: System wakeup disabled by ACPI
[  340.412285] serial 00:05: activated
[  341.759829] BUG: unable to handle kernel NULL pointer dereference at (null)
[  341.759834] IP: [] ohci_enable+0x274/0x570 [firewire_ohci]
[  341.759836] PGD 0
[  341.759837] Oops:  [#1] PREEMPT SMP
[  341.759859] Modules linked in: firewire_ohci firewire_core crc_itu_t ctr ccm snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device snd_hda_codec_analog snd_hda_codec_generic arc4 iTCO_wdt coretemp kvm_intel kvm microcode sr_mod rtl8187 serio_raw cdrom eeprom_93cx6 mac80211 mousedev gspca_zc3xx gspca_main uas videodev cfg80211 btusb usb_storage media bluetooth rfkill i2c_i801 i2c_core sky2 lpc_ich mfd_core snd_hda_intel snd_hda_controller floppy snd_hda_codec snd_hwdep snd_pcm snd_timer asus_atk0110 acpi_cpufreq snd processor soundcore button thermal_sys hwmon binfmt_misc dm_mod ext4 crc16 mbcache jbd2 usbhid hid_generic hid
[  341.759861] CPU: 0 PID: 2033 Comm: kworker/u8:84 Not tainted 3.18.7-gentoo #3
[  341.759862] Hardware name: System manufacturer P5K-E/P5K-E, BIOS 1305 06/19/2009
[  341.759866] Workqueue: events_unbound async_run_entry_fn
[  341.759867] task: 88019262e010 ti: 88019177 task.ti: 88019177
[  341.759869] RIP: 0010:[]  [] ohci_enable+0x274/0x570 [firewire_ohci]
[  341.759870] RSP: 0018:880191773cb8  EFLAGS: 00010246
[  341.759871] RAX:  RBX: 88019140a000 RCX: 00c0
[  341.759872] RDX:  RSI: 0004 RDI: 88019140a5f8
[  341.759873] RBP: 880191773ce8 R08: 88019177 R09: 8800c9bb97e0
[  341.759873] R10: 000f R11: 0001 R12: 
[  341.759874] R13:  R14:  R15: 0001
[  
