Re: kernel BUG at mm/rmap.c:631!

2008-02-17 Thread Ignacy Gawedzki
On Sun, Feb 17, 2008 at 09:16:36PM +0100, thus spake Rafael J. Wysocki:
> On Sunday, 17 of February 2008, Ignacy Gawedzki wrote:
> > Hi,
> 
> Hi,
> 
> > I was printing on the parallel port and suddenly the "parallel" CUPS backend
> > went 50% CPU (obviously endless-looping), while the other 50% were eaten by
> > ghostscript (strace didn't show anything, so this might be an "internal"
> > loop).  When I eventually killed the latter, I got this:
> 
> Which kernel is this?

As is shown in the dmesg, it is 2.6.24.1.
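The version, and the taint state, can be read off the "Pid:" line of the oops. Below is a minimal Python sketch that extracts both. The flag table summarizes the letters printed by 2.6.2x kernels and is an assumption to check against kernel/panic.c for the exact version. parse_pid_line is a hypothetical helper. In this report the F letter says a module had been force-loaded at some point, which is usually worth ruling out before suspecting mm/ itself.

```python
import re

# Taint letters printed by 2.6.2x-era kernels (summary for illustration only;
# check kernel/panic.c of the exact version). 'G' merely means "no proprietary
# module" and carries no meaning of its own, so it is not listed.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "F": "module was force-loaded",
    "S": "SMP with CPUs not designed for SMP",
    "R": "module was force-unloaded",
    "M": "machine check exception occurred",
    "B": "bad page referenced",
    "U": "taint requested by userspace",
    "D": "kernel died recently (earlier oops/BUG)",
}


def parse_pid_line(line):
    """Split an oops 'Pid:' line into (comm, taint meanings, kernel version)."""
    m = re.search(r"comm: (\S+) (?:Tainted: (\S+)|Not tainted)\s+\((\S+)", line)
    if m is None:
        raise ValueError("not an oops Pid: line")
    comm, flags, version = m.group(1), m.group(2) or "", m.group(3)
    meanings = [TAINT_FLAGS[c] for c in flags if c in TAINT_FLAGS]
    return comm, meanings, version
```

For example, parse_pid_line("Pid: 5098, comm: gs Tainted: GF   (2.6.24.1 #9)") returns ("gs", ["module was force-loaded"], "2.6.24.1").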

> Is it a regression?

Can't really say for sure.  At least it already happened with 2.6.23.9.

> If so, what's the last known
> working kernel?

This is really difficult to determine, since the event is pretty hard to
reproduce.  I'll try to investigate more, then. :/  This one happened pretty
much right after a reboot due to a completely frozen machine (no Oops or Eeek
whatsoever) apparently due to intensive writing to the parallel port (the
kernel complained that "FIFO write timed out" twice before locking up).

Of course I do suspect a hardware problem, but since the last time I saw
similarly strange things it turned out to be a misconfiguration, I still hope
someone will tell me this is also the case here.

-- 
:wq!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


kernel BUG at mm/rmap.c:631!

2008-02-17 Thread Ignacy Gawedzki
Hi,

I was printing on the parallel port and suddenly the "parallel" CUPS backend
went 50% CPU (obviously endless-looping), while the other 50% were eaten by
ghostscript (strace didn't show anything, so this might be an "internal"
loop).  When I eventually killed the latter, I got this:

Eeek! page_mapcount(page) went negative! (-1)
  page pfn = 3
  page->flags = 80014
  page->count = 0
  page->mapping = 
  vma->vm_ops = _stext+0x3feff000/0x14
[ cut here ]
kernel BUG at mm/rmap.c:631!
invalid opcode:  [#1] 
Modules linked in: cls_fw sch_prio sch_htb iptable_nat xt_limit xt_state
ipt_REJECT xt_tcpudp ipt_LOG xt_DSCP xt_dscp xt_mark nf_conntrack_ipv4
xt_CONNMARK xt_MARK iptable_mangle iptable_filter ip_tables x_tables aes_i586
geode_aes aes_generic ieee80211_crypt_ccmp lirc_dev nf_nat_ftp nf_nat
nf_conntrack_ftp nf_conntrack ipv6 evdev hostap_pci hostap ieee80211_crypt
i2c_viapro via686a ide_cd

Pid: 5098, comm: gs Tainted: GF   (2.6.24.1 #9)
EIP: 0060:[<c01443f1>] EFLAGS: 00010246 CPU: 0
EIP is at page_remove_rmap+0xe4/0x111
EAX:  EBX: c160 ECX: 0046 EDX: 5a52
ESI: e90d3f44 EDI: ea61b720 EBP: b700 ESP: eefc7e00
 DS: 007b ES: 007b FS:  GS:  SS: 0068
Process gs (pid: 5098, ti=eefc6000 task=ea6fd570 task.ti=eefc6000)
Stack: c0398eb6  c160 b6dc8000 c013f6c8 326b  e90d3f44 
   eefc7e74  0001 ef371b6c ef1073a0 c0454f98 ffa0  
   ef371b6c 000eaeb1 b70bc000  eefc7e74 e90d3860 ef1073a0 eefc7f10 
Call Trace:
 [<c013f6c8>] unmap_vmas+0x23e/0x403
 [<c0141c3e>] exit_mmap+0x5f/0xc9
 [<c0116a82>] mmput+0x1b/0x5e
 [<c011a8f8>] do_exit+0x1ad/0x5ae
 [<c011ad4a>] sys_exit_group+0x0/0xd
 [<c0120962>] get_signal_to_deliver+0x370/0x380
 [<c02cf89e>] net_rx_action+0x70/0x144
 [<c026d8bb>] intr_handler+0x9c/0xcf
 [<c0111f67>] do_page_fault+0x0/0x52d
 [<c0103326>] do_notify_resume+0x81/0x5c0
 [<c013ff5d>] handle_mm_fault+0x70/0x49d
 [<c010455b>] common_interrupt+0x23/0x28
 [<c01120f3>] do_page_fault+0x18c/0x52d
 [<c0336dc2>] schedule+0x1f3/0x20d
 [<c0111f67>] do_page_fault+0x0/0x52d
 [<c0103c4e>] work_notifysig+0x13/0x19
 [] rpc_info_open+0x17/0x6a
 ===
Code: 8b 46 40 8b 50 08 b8 05 8f 39 c0 e8 08 df fe ff 8b 46 48 85 c0 74 14 8b
40 10 85 c0 74 0d 8b 50 2c b8 23 8f 39 c0 e8 ed de fe ff <0f> 0b eb fe 8b 53
10 8b 03 83 e2 01 f7 da c1 e8 1e 83 c2 04 69 
EIP: [<c01443f1>] page_remove_rmap+0xe4/0x111 SS:ESP 0068:eefc7e00
---[ end trace 42d12388f65d0f6f ]---
Fixing recursive fault but reboot is needed!
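The "Eeek!" line comes from an underflow check in page_remove_rmap(). The page's mapcount is stored biased by -1, so an unmapped page holds -1 and page_mapcount() reports the stored value plus one. A reported value of -1 therefore means this page was unmapped once more than it was mapped. The Python below is a toy model of that accounting, written from memory of 2.6.2x mm/rmap.c. It is not kernel code, and the class and method names are hypothetical.

```python
class MapcountUnderflow(Exception):
    """Stands in for BUG() in this toy model."""


class ToyPage:
    """Toy model of struct page's mapcount (illustration only).

    The raw counter starts at -1 for an unmapped page, and page_mapcount()
    reports raw + 1, so a single mapping reads as 1.
    """

    def __init__(self):
        self._mapcount = -1

    def page_mapcount(self):
        return self._mapcount + 1

    def add_rmap(self):
        # One more page table entry now maps this page.
        self._mapcount += 1

    def remove_rmap(self):
        # Same shape as page_remove_rmap(): decrement, and only once the raw
        # counter is negative check whether it went past "unmapped".
        self._mapcount -= 1
        if self._mapcount < 0 and self.page_mapcount() < 0:
            raise MapcountUnderflow(
                "Eeek! page_mapcount(page) went negative! (%d)" % self.page_mapcount()
            )
```

Mapping a ToyPage once and unmapping it twice raises MapcountUnderflow on the second remove_rmap() call, with the same "(-1)" as in the log above.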

Apparently this has already happened to me recently, but I didn't have
netconsole enabled at the time to capture the message.

Does anybody have any idea where this might have come from?

-- 
Sex on TV doesn't hurt unless you fall off.


Re: Oops with hostap_pci (?)

2008-02-11 Thread Ignacy Gawedzki
On Mon, Feb 11, 2008 at 04:19:35AM +0100, thus spake Ignacy Gawedzki:
> Hi,
> 
> A few days back I started having strange lockups on a gateway machine so I
> started looking at things.  Then I compiled the 2.6.24.1 kernel and started
> having oopses not long after upping the wlan0 (hostap_pci) interface.
> 
> So I enabled netconsole and got a few logs.  Now the sad point is that I'm
> getting an oops even with my older kernel which used to be fine (2.6.23.9).  I
> also checked with 2.6.24 and the effects are the same: I boot, I up the wlan0
> interface and a few seconds or minutes later, boom!  Sometimes only rmmod'ing
> hostap_pci triggers the oops.  I'm suspecting some hardware problem and have
> already checked the ram with memtest86+ and tested with only one memory module
> out of two plugged: same thing.
> 
> If anybody could take a look at these and shed some light on that issue...

Okay, false alarm... it's all my fault. :/

The cause of the problem was my earlier tampering with the udev rules.  The
udev rules as they stood (on Ubuntu Gutsy) were bad for hostapd, since
persistent rules were written for the wlan0ap interface name created by
hostapd.  So I changed a few things, which had the unexpected effect of
renaming hostap_pci's initial wifi0 interface to wlan0ap.  This in turn made
hostap_pci oops in many cases.
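One plausible way for a rule written for wlan0ap to catch wifi0 is that persistent-net rules are keyed on the MAC address. The hostap interfaces (wifi0, wlan0, wlan0ap) are assumed here to share one MAC. The Python sketch below checks a rules file for that kind of collision. Its regex only understands simplified lines of the ATTRS{address}=="..." ... NAME="..." form, unexpected_renames is a hypothetical helper, and the sample rule and MAC used with it are made up.

```python
import re

# Simplified persistent-net rule: MAC match followed somewhere by NAME="...".
RULE_RE = re.compile(r'ATTRS?\{address\}=="([0-9a-fA-F:]+)".*?NAME="([^"]+)"')


def unexpected_renames(rules_text, interfaces):
    """Return (iface, new_name) pairs that a MAC-keyed rule would rename.

    rules_text: contents of a persistent-net style rules file.
    interfaces: mapping of current interface name -> MAC address.
    """
    hits = []
    for line in rules_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # comment line
        m = RULE_RE.search(line)
        if not m:
            continue
        mac, new_name = m.group(1).lower(), m.group(2)
        for iface, iface_mac in sorted(interfaces.items()):
            # A rule keyed only on the MAC matches every interface sharing it.
            if iface_mac.lower() == mac and iface != new_name:
                hits.append((iface, new_name))
    return hits
```

With a single rule NAME="wlan0ap" keyed on a MAC shared by all three interfaces, the check reports both wifi0 and wlan0 as interfaces that rule would rename.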

Anyway, I've modified my udev rules again and hopefully this will be it. =)

-- 
 "The whole problem with the world is that fools and fanatics are
   always so certain of themselves, and wiser people so full of doubts."
 - Bertrand Russell


Oops with hostap_pci (?)

2008-02-10 Thread Ignacy Gawedzki
Hi,

A few days back I started having strange lockups on a gateway machine so I
started looking at things.  Then I compiled the 2.6.24.1 kernel and started
having oopses not long after upping the wlan0 (hostap_pci) interface.

So I enabled netconsole and got a few logs.  Now the sad point is that I'm
getting an oops even with my older kernel which used to be fine (2.6.23.9).  I
also checked with 2.6.24 and the effects are the same: I boot, I up the wlan0
interface and a few seconds or minutes later, boom!  Sometimes only rmmod'ing
hostap_pci triggers the oops.  I'm suspecting some hardware problem and have
already checked the ram with memtest86+ and tested with only one memory module
out of two plugged: same thing.

If anybody could take a look at these and shed some light on that issue...

Thanks a lot,

Ignacy

-- 
Save the whales. Feed the hungry. Free the mallocs. 
With kernel 2.6.24.1

BUG: unable to handle kernel NULL pointer dereference at virtual address 

printing eip: f08f50c2 *pde =  
Oops:  [#1] 
Modules linked in: lirc_serial(F) lirc_dev cls_fw sch_prio sch_htb iptable_nat 
xt_limit xt_state ipt_REJECT xt_tcpudp ipt_LOG xt_DSCP xt_dscp xt_mark 
nf_conntrack_ipv4 xt_CONNMARK xt_MARK iptable_mangle iptable_filter ip_tables 
x_tables nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack ipv6 evdev hostap_pci 
i2c_viapro hostap via686a ieee80211_crypt ide_cd

Pid: 0, comm: swapper Tainted: GF   (2.6.24.1 #5)
EIP: 0060:[<f08f50c2>] EFLAGS: 00010297 CPU: 0
EIP is at hostap_80211_rx+0x41d/0xecf [hostap]
EAX: eec28460 EBX:  ECX: eec28444 EDX: 
ESI: efbb8434 EDI:  EBP: efbb843e ESP: c0419e74
 DS: 007b ES: 007b FS:  GS:  SS: 0068
Process swapper (pid: 0, ti=c0418000 task=c03e4300 task.ti=c0418000)
Stack:  0080 004c 0001 c0419f2c c0419f30 ef3ab760 0018 
   0100 eec28444 1148 0040 c9c0 0001 ef8d3370 2a40 
   04b1cd93 000a1e00 1148 013a1148 685b0900 ef8d3000 1f714b23 685b0900 
Call Trace:
 [<f090ffca>] hostap_rx_tasklet+0x11f/0x145 [hostap_pci]
 [<c011e399>] run_timer_softirq+0x11/0x12f
 [<c011bbbc>] tasklet_action+0x32/0x52
 [<c011bb24>] __do_softirq+0x35/0x75
 [<c011bb86>] do_softirq+0x22/0x26
 [<c011bdb3>] irq_exit+0x29/0x58
 [<c0105bc0>] do_IRQ+0x58/0x6b
 [<c010455b>] common_interrupt+0x23/0x28
 [<c013007b>] mod_sysfs_init+0x17/0x6d
 [<c011007b>] arch_setup_additional_pages+0x121/0x13a
 [<c023f4a0>] acpi_processor_idle+0x244/0x3c4
 [<c01024fc>] cpu_idle+0x43/0x5d
 [<c041a9ac>] start_kernel+0x237/0x23c
 [<c041a303>] unknown_bootoption+0x0/0x195
 ===
Code: 0a 8b 4c 24 24 8b 59 1c eb 21 83 bb d8 00 00 00 04 75 16 8d 83 dc 00 00 
00 b9 06 00 00 00 89 ea e8 0b d1 91 cf 85 c0 74 18 89 fb <8b> 3b 0f 18 07 90 8b 
44 24 24 83 c0 1c 39 c3 75 ce e9 44 0a 00 
EIP: [<f08f50c2>] hostap_80211_rx+0x41d/0xecf [hostap] SS:ESP 0068:c0419e74
Kernel panic - not syncing: Fatal exception in interrupt
wlan0ap: SW TICK stuck? bits=0x0 EvStat=8001 IntEn=e018
With kernel 2.6.24.1

BUG: unable to handle kernel paging request at virtual address abdb24ce
printing eip: f08ea0c2 *pde =  
Oops:  [#1] 
Modules linked in: cls_fw sch_prio sch_htb iptable_nat xt_limit xt_state 
ipt_REJECT xt_tcpudp ipt_LOG xt_DSCP xt_dscp xt_mark nf_conntrack_ipv4 
xt_CONNMARK xt_MARK iptable_mangle iptable_filter ip_tables x_tables nf_nat_ftp 
nf_nat nf_conntrack_ftp nf_conntrack ipv6 evdev hostap_pci i2c_viapro via686a 
hostap ieee80211_crypt ide_cd

Pid: 0, comm: swapper Not tainted (2.6.24.1 #5)
EIP: 0060:[<f08ea0c2>] EFLAGS: 00010202 CPU: 0
EIP is at hostap_80211_rx+0x41d/0xecf [hostap]
EAX: efa68460 EBX: abdb24ce ECX: efa68444 EDX: 
ESI: ef1e1034 EDI: abdb24ce EBP: ef1e103e ESP: c0419e74
 DS: 007b ES: 007b FS:  GS:  SS: 0068
Process swapper (pid: 0, ti=c0418000 task=c03e4300 task.ti=c0418000)
Stack:  0080 c045358c 0001 c0453570 c0419f30 eec598e0 0018 
   0100 efa68444 1148 0040 1f90 0001 eec29370 5a40 
   0080ce43 000a1e00 1148 013a1148 685b0900 eec29000 1f714b23 685b0900 
Call Trace:
 [<f0904fca>] hostap_rx_tasklet+0x11f/0x145 [hostap_pci]
 [<c011bbbc>] tasklet_action+0x32/0x52
 [<c011bb24>] __do_softirq+0x35/0x75
 [<c011bb86>] do_softirq+0x22/0x26
 [<c011bdb3>] irq_exit+0x29/0x58
 [<c0105bc0>] do_IRQ+0x58/0x6b
 [<c010455b>] common_interrupt+0x23/0x28
 [<c013007b>] mod_sysfs_init+0x17/0x6d
 [<c011007b>] arch_setup_additional_pages+0x121/0x13a
 [<c023f4a0>] acpi_processor_idle+0x244/0x3c4
 [<c01024fc>] cpu_idle+0x43/0x5d
 [<c041a9ac>] start_kernel+0x237/0x23c
 [<c041a303>] unknown_bootoption+0x0/0x195
 ===
Code: 0a 8b 4c 24 24 8b 59 1c eb 21 83 bb d8 00 00 00 04 75 16 8d 83 dc 00 00 
00 b9 06 00 00 00 89 ea e8 0b 81 92 cf 85 c0 74 18 89 fb <8b> 3b 0f 18 07 90 8b 
44 24 24 83 c0 1c 39 c3 75 ce e9 44 0a 00 
EIP: [<f08ea0c2>] hostap_80211_rx+0x41d/0xecf [hostap] SS:ESP 0068:c0419e74
Kernel panic - not syncing: Fatal exception in interrupt
wlan0ap: SW TICK stuck? bits=0x0 EvStat=8001 IntEn=e018
With kernel 2.6.24

BUG: unable to handle kernel paging request at virtual address 630e0021
printing eip: f08f20c2 *pde =  
Oops:  [#1] 
Modules linked in: cls_fw sch_prio sch_htb iptable_nat xt_limit xt_state 
ipt_REJECT 
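Both hostap oopses fault at the same spot, hostap_80211_rx+0x41d, and the Code: line marks the faulting byte in angle brackets. Decoded by hand (my own reading, so treat it as an assumption), <8b> 3b is mov edi, [ebx]. In the second oops both EBX and EDI hold abdb24ce, the address named in the "unable to handle kernel paging request" line, so hostap_80211_rx loaded through a garbage pointer. The preceding 89 fb (mov ebx, edi) and the following 0f 18 07 (a prefetch of the freshly loaded value) look like the usual shape of a kernel linked-list walk, which would hint at a corrupted list entry. That is an interpretation, not something the thread confirms. The small Python helpers below (hypothetical names, hand-written two-entry opcode table) make both checks mechanical. The table also covers the <0f> 0b of the rmap report: ud2, the instruction x86 BUG() executes, which is why that report says "invalid opcode".

```python
import re

# Hand-decoded byte pairs seen after "<..>" on the Code: lines of this thread.
# A tiny lookup table for illustration, not a disassembler.
KNOWN_OPCODES = {
    ("8b", "3b"): "mov edi, [ebx]  (load through the pointer in EBX)",
    ("0f", "0b"): "ud2  (the instruction x86 BUG() executes)",
}


def faulting_bytes(code_line):
    """Return the bytes starting at the <xx>-marked byte of an oops Code: line."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return [tok[1:-1]] + tokens[i + 1:]
    raise ValueError("no <xx> marker on this Code: line")


def describe_fault(code_line):
    """Look up the first two faulting bytes in the hand-written table."""
    first_two = tuple(faulting_bytes(code_line)[:2])
    return KNOWN_OPCODES.get(first_two, "not in table: " + " ".join(first_two))


def regs_equal_to(register_dump, address):
    """Names of 32-bit registers in an oops register dump equal to address."""
    regs = dict(re.findall(r"\b(E[A-Z]{2}): ([0-9a-f]+)", register_dump))
    return sorted(name for name, value in regs.items() if value == address.lower())
```

Feeding regs_equal_to the EAX..ESP lines of the second oops together with abdb24ce returns ["EBX", "EDI"].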

Re: Hot (un)plugging of a SATA drive with sata_nv (CK8S)?

2008-02-03 Thread Ignacy Gawedzki
On Mon, Jan 28, 2008 at 05:35:58PM -0600, thus spake Robert Hancock:
> Any ideas guys? When the drive is plugged in, a stream of this shows up. It 
> would seem like the controller is throwing hotplug interrupts but we never 
> seem to get a "SATA link up". This is on nForce3, btw.

I happened to upgrade to kernel 2.6.24 and the problem is gone.  I only get a
few SError messages that appear to be harmless:

  ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xa frozen
  ata2: SError: { PHYRdyChg CommWake }
  ata2: hard resetting link
  ata2: SATA link down (SStatus 0 SControl 300)
  ata2: EH complete
  ata2: exception Emask 0x10 SAct 0x0 SErr 0x1d0000 action 0xa frozen
  ata2: SError: { PHYRdyChg CommWake 10B8B Dispar }
  ata2: hard resetting link
  ata2: port is slow to respond, please be patient (Status 0x80)
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  ata2.00: ATA-8: ST3500320AS, SD15, max UDMA/133
  ata2.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
  ata2.00: configured for UDMA/133
  ata2: EH complete
  
and then the usual SCSI messages about the newly seen drive.  The scsiadd -r
command works every time and does indeed stop the disk:

  sd 1:0:0:0: [sdb] Synchronizing SCSI cache
  sd 1:0:0:0: [sdb] Stopping disk
  ata2.00: disabled

and then when I switch the drive off:

  ata2: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xa frozen
  ata2: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
  ata2: hard resetting link
  ata2: SATA link down (SStatus 0 SControl 300)
  ata2: EH complete
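The SErr field is the raw SATA SError register, and the names libata prints in braces are its set bits. For example, PHYRdyChg and CommWake are bits 16 and 18, i.e. 0x50000. A small Python decoder follows. The bit-to-name table is written from memory of the libata error-handling code of this era and should be treated as an assumption. PHYRdyChg and CommWake are what a link coming up or going down would be expected to set, which is consistent with these messages being harmless plug/unplug noise.

```python
# SError bit positions and the short names libata's EH prints for them
# (table written from memory of 2.6.2x libata; verify against ata.h).
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return libata-style names for the bits set in an SError value, low bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]
```

decode_serror(0x1990000) gives ['PHYRdyChg', '10B8B', 'Dispar', 'LinkSeq', 'TrStaTrns'], the set reported when the drive is switched off.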

So thanks for the help and sorry for the bother. =)

-- 
Everything is more fun naked except cooking with grease.


Re: Hot (un)plugging of a SATA drive with sata_nv (CK8S)?

2008-01-28 Thread Ignacy Gawedzki
On Fri, Jan 25, 2008 at 09:03:02PM -0600, thus spake Robert Hancock:
> Ignacy Gawedzki wrote:
>> Hi everyone,
>> I'm having trouble determining the cause of the following behavior.  I'm not
>> even sure that I'm supposed to hot plug and unplug a SATA drive from an
>> nForce3 Ultra (apparently CK8S, on a Gigabyte K8NS Ultra 939 mobo) SATA
>> interface, to begin with.  The information is hard to find given that the
>> sata_nv driver supports a range of different hardware.
>> I've recently acquired an external drive with (among others) an eSATA
>> interface, so I also bought an eSATA->SATA bracket and intend to use that
>> drive (Lacie d2 quadra 500G) through eSATA.
>
> BTW, eSATA cannot technically be converted properly to SATA with a simple 
> connector adapter. eSATA is supposed to use higher signalling voltages and 
> so using such an adapter is not guaranteed to work.

Yeah, apparently this shortens the max cable length to 1 meter.  In this case
I've got a 1 meter external cable and approx. 30 cm internal (heavily shielded
though) cable from the bracket to the SATA port.  Anyway, the drive works
perfectly if plugged at boot time.

>
>> The thing is that if I boot the machine with the drive plugged and turned on,
>> it is properly detected and usable.  If, at some point, I want to remove the
>> drive, I unmount any partitions on it and issue the proper scsiadd -r command
>> (usually scsiadd -r 1 0 0 0, since this is the second SATA drive) and
>> everything is fine (I turn the drive off and unplug it), so far.  Next, when
>> I want to use the drive again, it's still detected alright (although it
>> appears as sdc and not sdb anymore), but the SCSI layer issues "scsi 1:0:0:0:
>> rejecting I/O to dead device" from time to time.  Then any scsiadd -r 1 0 0 0
>> command fails with "No such device or address", although it appears in the
>> output of scsiadd -p or even scsiadd -s (always as 1 0 0 0).  If I ignore that
>> detail and switch the drive off, then the kernel eventually notices that the
>> drive is gone and the SCSI layer attempts to stop the device and fails ([sdc]
>> START_STOP FAILED).  From that moment on, any attempt to plug the drive again
>> fails.  The kernel issues "ata2: hard resetting port" and "ata2: port is slow
>> to respond, please be patient (Status 0x80)" periodically, until I switch the
>> drive off.
>> If the drive is not present at boot, then hot plugging it fails.  The kernel
>> first soft resets the port, then issues the "please be patient (Status 0x80)"
>> message, complains that SRST failed (errno=-16) and goes on hard resetting the
>> port, issuing "please be patient (Status 0x80)" and complaining that COMRESET
>> failed (errno=-16), periodically, until the drive is switched off.
>
> Full dmesg output would be useful..
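For reference, scsiadd -r 1 0 0 0 names the device by host, channel, target and LUN, the 1:0:0:0 that appears in the kernel messages. As far as I know, scsiadd does this by writing a "scsi remove-single-device" command to /proc/scsi/scsi, and the same removal is also available through a sysfs "delete" file. The Python helpers below are an illustrative sketch with hypothetical names. They only build those strings, plus one function that performs the sysfs write as root.

```python
def proc_scsi_command(action, host, channel, target, lun):
    """Build the /proc/scsi/scsi command for adding or removing one device.

    action is "add" or "remove"; ("remove", 1, 0, 0, 0) corresponds to
    `scsiadd -r 1 0 0 0`.
    """
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    return "scsi %s-single-device %d %d %d %d" % (action, host, channel, target, lun)


def sysfs_delete_path(host, channel, target, lun):
    """sysfs file that removes the same device when '1' is written to it."""
    return "/sys/class/scsi_device/%d:%d:%d:%d/device/delete" % (host, channel, target, lun)


def remove_device(host, channel, target, lun):
    """Remove the device via sysfs (needs root; not exercised in the tests)."""
    with open(sysfs_delete_path(host, channel, target, lun), "w") as f:
        f.write("1")
```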

I repeated the experiments and dumped as much dmesg as I could.

The dmesg outputs of both experiments are attached and commented.  It seems
that when the drive is plugged in at boot time, it does remain hot-pluggable
later after all (albeit with some strange error messages).  Or is there
another factor that I did not reproduce?

Thank you for any help. =)

-- 
NO CARRIER
### First experiment, the drive is plugged and turned on at boot time.
### The initial full dmesg dump follows.

Linux version 2.6.23.14 ([EMAIL PROTECTED]) (gcc version 4.1.3 20070929
(prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 PREEMPT Thu Jan 24 22:07:54 CET 2008
Command line: root=UUID=84d4c1b4-5602-4364-a583-7913d518b4ab ro quiet splash
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009f400 (usable)
 BIOS-e820: 0009f800 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 7fff (usable)
 BIOS-e820: 7fff - 7fff3000 (ACPI NVS)
 BIOS-e820: 7fff3000 - 8000 (ACPI data)
 BIOS-e820: fec0 - 0001 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 524272) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.3 present.
ACPI: RSDP 000F6C90, 0014 (r0 Nvidia)
ACPI: RSDT 7FFF3000, 002C (r1 Nvidia AWRDACPI 42302E31 AWRD  1010101)
ACPI: FACP 7FFF3040, 0074 (r1 Nvidia AWRDACPI 42302E31 AWRD  1010101)
ACPI: DSDT 7FFF30C0, 4AC4 (r1 NVIDIA AWRDACPI 1000 MSFT  10C)
ACPI: FACS 7FFF, 0040
ACPI: APIC 7FFF7BC0, 007C (r1 Nvidia AWRDACPI 42302E31 AWRD  1010101)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Enterin

Re: Hot (un)plugging of a SATA drive with sata_nv (CK8S)?

2008-01-28 Thread Ignacy Gawedzki
On Fri, Jan 25, 2008 at 09:03:02PM -0600, thus spake Robert Hancock:
 Ignacy Gawedzki wrote:
 Hi everyone,
 I'm having trouble to determine the cause of the following behavior.  I'm 
 not
 even sure that I'm supposed to hot plug and unplug a SATA drive from a 
 nForce3
 Ultra (apparently CK8S, on a Gigabyte K8NS Ultra 939 mobo) SATA interface, 
 to
 begin with.  The information is hard to find given that the sata_nv driver
 supports a range of different hardware.
 I've recently acquired an external drive with (among others) an eSATA
 interface, so I also bought a eSATA-SATA bracket and intend to use that 
 drive
 (Lacie d2 quadra 500G) through eSATA.

 BTW, eSATA cannot technically be converted properly to SATA with a simple 
 connector adapter. eSATA is supposed to use higher signalling voltages and 
 so using such an adapter is not guaranteed to work.

Yeah, apparently this shortens the max cable length to 1 meter.  In this case
I've got a 1 meter external cable and approx. 30 cm internal (heavily shielded
though) cable from the bracket to the SATA port.  Anyway, the drive works
perfectly if plugged at boot time.


> > The thing is that if I boot the machine with the drive plugged and turned on,
> > it is properly detected and usable.  If, at some point, I want to remove the
> > drive, I unmount any partitions on it and issue the proper scsiadd -r command
> > (usually scsiadd -r 1 0 0 0, since this is the second SATA drive) and
> > everything is fine (I turn the drive off and unplug it), so far.  Next, when
> > I want to use the drive again, it's still detected alright (although it appears
> > as sdc and not sdb anymore), but the SCSI layer issues "scsi 1:0:0:0:
> > rejecting I/O to dead device" from time to time.  Then any scsiadd -r 1 0 0 0
> > command fails with "No such device or address", although it appears in the
> > output of scsiadd -p or even scsiadd -s (always as 1 0 0 0).  If I ignore that
> > detail and switch the drive off, then the kernel eventually notices that the
> > drive is gone and the SCSI layer attempts to stop the device and fails ([sdc]
> > START_STOP FAILED).  From that moment on, any attempt to plug the drive again
> > fails.  The kernel issues "ata2: hard resetting port" and "ata2: port is slow
> > to respond, please be patient (Status 0x80)" periodically, until I switch the
> > drive off.
> >
> > If the drive is not present at boot, then hot plugging it fails.  The kernel
> > first soft resets the port, then issues the "please be patient (Status 0x80)"
> > message, complains that SRST failed (errno=-16) and goes on hard resetting the
> > port, issuing "please be patient (Status 0x80)" and complaining that COMRESET
> > failed (errno=-16), periodically, until the drive is switched off.

> Full dmesg output would be useful..

I repeated the experiments and dumped as much dmesg as I could.

The dmesg outputs of both experiments are attached and commented.  It seems
that when the drive is plugged in at boot time, it remains hot-pluggable
later after all (albeit with some strange error messages), or is there
another factor that I did not reproduce?

Thank you for any help. =)

-- 
NO CARRIER
### First experiment, the drive is plugged and turned on at boot time.
### The initial full dmesg dump follows.

Linux version 2.6.23.14 ([EMAIL PROTECTED]) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 PREEMPT Thu Jan 24 22:07:54 CET 2008
Command line: root=UUID=84d4c1b4-5602-4364-a583-7913d518b4ab ro quiet splash
BIOS-provided physical RAM map:
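 BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
 BIOS-e820: 000000007fff0000 - 000000007fff3000 (ACPI NVS)
 BIOS-e820: 000000007fff3000 - 0000000080000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)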
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 524272) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.3 present.
ACPI: RSDP 000F6C90, 0014 (r0 Nvidia)
ACPI: RSDT 7FFF3000, 002C (r1 Nvidia AWRDACPI 42302E31 AWRD  1010101)
ACPI: FACP 7FFF3040, 0074 (r1 Nvidia AWRDACPI 42302E31 AWRD  1010101)
ACPI: DSDT 7FFF30C0, 4AC4 (r1 NVIDIA AWRDACPI 1000 MSFT  10C)
ACPI: FACS 7FFF0000, 0040
ACPI: APIC 7FFF7BC0, 007C (r1 Nvidia AWRDACPI 42302E31 AWRD  1010101)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 524272) 1 entries of 256 used
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->      159
    0:      256 ->   524272
On node 0 totalpages: 524175
  DMA zone: 56 pages used for memmap
  DMA zone: 1312 pages reserved
  DMA zone: 2631 pages, LIFO batch:0
  DMA32 zone: 7111 pages used for memmap
  DMA32 zone: 513065 pages, LIFO batch:31
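The page-frame numbers in this dump can be cross-checked against the e820 map with a little arithmetic. The following is a minimal Python sketch under stated assumptions: 4 KiB pages, a 56-byte struct page (inferred from the "DMA zone: 56 pages used for memmap" line, not printed anywhere in the log), and the usual x86_64 zone limits of 16 MiB for DMA and 4 GiB for DMA32. The helper names are hypothetical.

```python
PAGE_SHIFT = 12        # 4 KiB pages on x86_64
STRUCT_PAGE_SIZE = 56  # assumption: inferred from "DMA zone: 56 pages used for memmap"

# Usable entries of the BIOS-e820 map above, as (start, end) byte addresses.
USABLE_E820 = [(0x0000000000000000, 0x000000000009F400),
               (0x0000000000100000, 0x000000007FFF0000)]

def pfn_range(start, end):
    """Round start up and end down to whole pages, giving [first, last) PFNs."""
    page = 1 << PAGE_SHIFT
    return (start + page - 1) >> PAGE_SHIFT, end >> PAGE_SHIFT

def present_pages(ranges, lo, hi):
    """Count pages of the active PFN ranges that fall inside [lo, hi)."""
    return sum(max(0, min(end, hi) - max(start, lo)) for start, end in ranges)

active = [pfn_range(s, e) for s, e in USABLE_E820]
node_end = max(end for _, end in active)
total = sum(end - start for start, end in active)

# Assumed zone limits: ZONE_DMA below 16 MiB, ZONE_DMA32 below 4 GiB.
DMA_END = (16 << 20) >> PAGE_SHIFT   # 4096
DMA32_END = (4 << 30) >> PAGE_SHIFT  # 1048576

def zone_pages(lo, hi, reserved=0):
    """Return (memmap pages, remaining pages) for the zone [lo, hi).

    The memmap is sized on the zone's spanned range (clipped to the node end),
    then subtracted, together with any reserved pages, from the present pages.
    """
    spanned = min(hi, node_end) - lo
    memmap = (spanned * STRUCT_PAGE_SIZE) >> PAGE_SHIFT
    return memmap, present_pages(active, lo, hi) - memmap - reserved

print(active, total)                              # [(0, 159), (256, 524272)] 524175
print(zone_pages(0, DMA_END, reserved=1312))      # (56, 2631)
print(zone_pages(DMA_END, DMA32_END))             # (7111, 513065)
```

Usable ranges are rounded inward to whole pages, which is why the first range ends at PFN 159. The logged memmap counts only match when they are computed on each zone's spanned size, so the 97-page hole below 1 MiB still costs DMA memmap space.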

Hot (un)plugging of a SATA drive with sata_nv (CK8S) ?

2008-01-25 Thread Ignacy Gawedzki
Hi everyone,

I'm having trouble determining the cause of the following behavior.  I'm not
even sure that I'm supposed to hot plug and unplug a SATA drive from a nForce3
Ultra (apparently CK8S, on a Gigabyte K8NS Ultra 939 mobo) SATA interface, to
begin with.  The information is hard to find given that the sata_nv driver
supports a range of different hardware.

I've recently acquired an external drive with (among others) an eSATA
interface, so I also bought an eSATA->SATA bracket and intend to use that drive
(Lacie d2 quadra 500G) through eSATA.

The thing is that if I boot the machine with the drive plugged and turned on,
it is properly detected and usable.  If, at some point, I want to remove the
drive, I unmount any partitions on it and issue the proper scsiadd -r command
(usually scsiadd -r 1 0 0 0, since this is the second SATA drive) and
everything is fine (I turn the drive off and unplug it), so far.  Next, when
I want to use the drive again, it's still detected alright (although it appears
as sdc and not sdb anymore), but the SCSI layer issues "scsi 1:0:0:0:
rejecting I/O to dead device" from time to time.  Then any scsiadd -r 1 0 0 0
command fails with "No such device or address", although it appears in the
output of scsiadd -p or even scsiadd -s (always as 1 0 0 0).  If I ignore that
detail and switch the drive off, then the kernel eventually notices that the
drive is gone and the SCSI layer attempts to stop the device and fails ([sdc]
START_STOP FAILED).  From that moment on, any attempt to plug the drive again
fails.  The kernel issues "ata2: hard resetting port" and "ata2: port is slow
to respond, please be patient (Status 0x80)" periodically, until I switch the
drive off.
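For reference, the removal step that scsiadd -r 1 0 0 0 performs can also be expressed directly against the kernel's SCSI interfaces. To my understanding, scsiadd drives the legacy /proc/scsi/scsi interface, and sysfs offers an equivalent per-device "delete" attribute and a per-host "scan" attribute. The sketch below is minimal and assumes host 1, channel 0, target 0, LUN 0 as in the report. The helper names are hypothetical, and the functions only build strings and paths; nothing is written.

```python
from pathlib import PurePosixPath

def proc_scsi_remove_cmd(host, channel, target, lun):
    """Line written to the legacy /proc/scsi/scsi interface to detach one device."""
    return f"scsi remove-single-device {host} {channel} {target} {lun}"

def proc_scsi_add_cmd(host, channel, target, lun):
    """Counterpart line asking the SCSI mid-layer to probe one device again."""
    return f"scsi add-single-device {host} {channel} {target} {lun}"

def sysfs_delete_path(host, channel, target, lun):
    """sysfs attribute: writing "1" to it detaches the device H:C:T:L."""
    return PurePosixPath("/sys/class/scsi_device",
                         f"{host}:{channel}:{target}:{lun}", "device", "delete")

def sysfs_scan_path(host):
    """sysfs attribute: writing "- - -" rescans all channels/targets/LUNs of a host."""
    return PurePosixPath("/sys/class/scsi_host", f"host{host}", "scan")

# The device in the report shows up as scsi 1:0:0:0.
print(proc_scsi_remove_cmd(1, 0, 0, 0))  # scsi remove-single-device 1 0 0 0
print(sysfs_delete_path(1, 0, 0, 0))     # /sys/class/scsi_device/1:0:0:0/device/delete
print(sysfs_scan_path(1))                # /sys/class/scsi_host/host1/scan
```

As root, the sysfs route would be echo 1 > /sys/class/scsi_device/1:0:0:0/device/delete. This only reproduces what scsiadd already does; it is not a fix for the "dead device" behaviour.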

If the drive is not present at boot, then hot plugging it fails.  The kernel
first soft resets the port, then issues the "please be patient (Status 0x80)"
message, complains that SRST failed (errno=-16) and goes on hard resetting the
port, issuing "please be patient (Status 0x80)" and complaining that COMRESET
failed (errno=-16), periodically, until the drive is switched off.
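The two numbers in these messages can be decoded mechanically. "Status 0x80" is the ATA Status register with only BSY (bit 7) set, which suggests the port kept reporting busy after each reset. "errno=-16" is a negated kernel errno, -EBUSY on Linux. The sketch below is a small Python decoder. The bit names follow the ATA Status register layout, and the helper names are hypothetical.

```python
import errno

# ATA Status register bits, most significant first (CORR and IDX are obsolete).
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # busy: the other bits are not valid while this is set
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete (later obsoleted/reused)
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error, details in the Error register
]

def decode_status(status):
    """Return the names of the Status register bits set in `status`."""
    return [name for bit, name in ATA_STATUS_BITS if status & bit]

def decode_errno(value):
    """Map a negated kernel errno such as -16 to its symbolic name."""
    return errno.errorcode.get(-value, "unknown")

print(decode_status(0x80))  # ['BSY']
print(decode_status(0x50))  # ['DRDY', 'DSC']
print(decode_errno(-16))    # EBUSY on Linux
```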

If somebody could tell me whether hot-plugging is supposed to work with my
SATA interface, it would be nice. =)  The motherboard happens to offer another
SATA interface (Sil3512A) which is well supported and appears to support
hot-plugging as well, but it conflicts nastily with my PCTV Pro (bttv) card
(which are apparently known to conflict with the Sil SATA interfaces).

Thanks for any help.

Ignacy

-- 
I used to have a sig, but I've stopped smoking now.

