found 628444 linux-2.6/3.2.9-1
tags 628444 + upstream patch moreinfo
quit

Hi Dafydd,

Dafydd Harries wrote:

> I've been seeing similar problems with my "Intel Corporation Centrino
> Ultimate-N 6300".
>
> Like others, the problems seemed to start around 2.6.39.

Odd. What kernel did you use before then?  (/var/log/dpkg.log might
tell.)
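
In case it helps, here is a minimal sketch for pulling the kernel package
history out of that log.  It assumes the usual dpkg log layout (date, time,
action, package, old version, new version) and that the kernel packages are
named linux-image-*; the function name kernel_history is made up for this
example.

```shell
# Print the install/upgrade records for kernel image packages.
# Arguments: one or more dpkg log files (e.g. /var/log/dpkg.log).
kernel_history() {
    # -h suppresses file names when several logs are given;
    # "status installed ..." lines are deliberately not matched.
    grep -hE ' (install|upgrade) linux-image-' "$@"
}
```

Usage would be something like "kernel_history /var/log/dpkg.log
/var/log/dpkg.log.1".  Older rotated logs are typically compressed, so
those would need zgrep instead.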

> Like others, the card flakes out a day or two after booting, and a reboot
> always fixes the problem. Occasionally it stays working for longer.
>
> Like others, I've added RAM. But as far as I can recall the upgrade
> happened well before any problems started appearing.

Interesting and useful.

> Any ASPM settings are at their default.
>
> I'll try wd_disable=1 as a workaround for now.
>
> Meenakshi, will the patch you mentioned be applied in 3.3?

Cc-ing her.  The patch currently seems to be part of the wireless-next
tree but not davem's net tree.

> Below is a syslog excerpt from around the time of failure. It seems to
> support Meenakshi's suggestion that it's related to the queue getting
> stuck.

Well, that can be tested.  Could you try the patch against current
"master"?  It works like this:

0. Prerequisites:
        apt-get install git build-essential

1. Get the kernel history, if you don't already have it:
        git clone \
          git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

2. Configure and build:
        cd linux
        git checkout origin/master
        cp /boot/config-$(uname -r) .config; # current configuration
        make localmodconfig; # optional: minimize configuration
        make deb-pkg; # optionally with -j<num> for parallel build
        dpkg -i ../<name of package>; # as root
        reboot

        ... test test test ...

3. Hopefully that reproduces the problem.  If so, try the attached patch:
        git am -3sc <the patch>
        make deb-pkg; # maybe with -j4
        dpkg -i ../<name of package>; # as root
        reboot

If it works, we can pass this to Dave with information about what
happened and your test result, to get the patch fast-tracked.
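
To tell whether a test kernel hit the problem, one option is to count the
watchdog messages in the log.  This is only a sketch: the pattern is taken
from the "Queue 4 stuck for 2000 ms" line in your excerpt, and the function
name stuck_queue_events is invented for the example.

```shell
# Count log lines that look like the iwlwifi stuck-queue watchdog message.
# Reads the files given as arguments, or standard input if there are none.
stuck_queue_events() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function's exit status clean while still printing the count.
    grep -cE 'iwlwifi .*Queue [0-9]+ stuck for [0-9]+ ms' "$@" || true
}
```

For example, "dmesg | stuck_queue_events" or "stuck_queue_events
/var/log/syslog" (as root if the log is not world-readable).  A count of 0
after a few days of uptime with the patch applied would be a good sign.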

Thanks,
Jonathan

> Below is a syslog excerpt from around the time of failure. It seems to
> support Meenakshi's suggestion that it's related to the queue getting
> stuck.
[...]
> iwlwifi 0000:02:00.0: Queue 4 stuck for 2000 ms.
> iwlwifi 0000:02:00.0: Current read_ptr 112 write_ptr 115
> iwlwifi 0000:02:00.0: On demand firmware reload
> iwlwifi 0000:02:00.0: Command REPLY_QOS_PARAM failed: FW Error
> iwlwifi 0000:02:00.0: Failed to update QoS
> iwlwifi 0000:02:00.0: fw recovery, no hcmd send
> iwlwifi 0000:02:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -5
> iwlwifi 0000:02:00.0: Error clearing ASSOC_MSK on BSS (-5)
> iwlwifi 0000:02:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
> iwlwifi 0000:02:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[...]
> ieee80211 phy0: Hardware restart was requested
> wpa_supplicant[1472]: CTRL-EVENT-DISCONNECTED bssid=00:50:7f:cb:4b:58 reason=4
> ieee80211 phy0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-2)
[...]
> iwlwifi 0000:02:00.0: Could not load the INST uCode section
> iwlwifi 0000:02:00.0: Failed to start RT ucode: -110
[...]
> iwlwifi 0000:02:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[...]
> I get some kind of OOPS but I'm guessing this is just because the driver can't
> communicate with the card when the module is being unloaded:
[...]
> WARNING: at /build/buildd-linux-2.6_3.2.9-1-amd64-KTPapN/linux-2.6-3.2.9/debian/build/source_amd64_none/drivers/net/wireless/iwlwifi/iwl-core.c:1330 iwlagn_mac_remove_interface+0x48/0xdd [iwlwifi]()
> Hardware name: 3249CTO
> Modules linked in: uvcvideo videodev v4l2_compat_ioctl32 media snd_usb_audio snd_usbmidi_lib pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) acpi_cpufreq mperf cpufreq_stats cpufreq_userspace cpu
> Mar 12 13:15:04 localhost kernel: sync_memcpy async_tx raid1 raid0 multipath linear md_mod sd_mod crc_t10dif usbhid hid ahci libahci ehci_hcd libata scsi_mod usbcore thermal thermal_sys usb_common e1000e [last unloaded: scsi_wait_scan]
> Mar 12 13:15:04 localhost kernel: [48290.674508] Pid: 1405, comm: NetworkManager Tainted: G           O 3.2.0-2-amd64 #1
> Mar 12 13:15:04 localhost kernel: [48290.674511] Call Trace:
> Mar 12 13:15:04 localhost kernel: [48290.674520]  [<ffffffff81046879>] ? warn_slowpath_common+0x78/0x8c
> Mar 12 13:15:04 localhost kernel: [48290.674531]  [<ffffffffa03ea9af>] ? iwlagn_mac_remove_interface+0x48/0xdd [iwlwifi]
[...]
> Mar 12 13:15:04 localhost kernel: [48290.674647]  [<ffffffff812a35a5>] ? netlink_rcv_skb+0x36/0x7a
[...]
> iwlwifi 0000:02:00.0: ctx->vif =           (null), vif = ffff8801b1c72df0
> iwlwifi 0000:02:00.0:  ID = 0: ctx = ffff8801b1a834b0  ctx->vif =           (null)
From: Johannes Berg <johannes.b...@intel.com>
Date: Sun, 4 Mar 2012 08:50:46 -0800
Subject: iwlwifi: always monitor for stuck queues

commit 342bbf3fee2fa9a18147e74b2e3c4229a4564912 upstream.

If we only monitor while associated, the following
can happen:
 - we're associated, and the queue stuck check
   runs, setting the queue "touch" time to X
 - we disassociate, stopping the monitoring,
   which leaves the time set to X
 - almost 2s later, we associate, and enqueue
   a frame
 - before the frame is transmitted, we monitor
   for stuck queues, and find the time set to
   X, although it is now later than X + 2000ms,
   so we decide that the queue is stuck and
   erroneously restart the device

It happens more with P2P because there we can
go between associated/unassociated frequently.

Cc: sta...@vger.kernel.org
Reported-by: Ben Cahill <ben.m.cah...@intel.com>
Signed-off-by: Johannes Berg <johannes.b...@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w....@intel.com>
Signed-off-by: John W. Linville <linvi...@tuxdriver.com>
Signed-off-by: Jonathan Nieder <jrnie...@gmail.com>
---
 drivers/net/wireless/iwlwifi/iwl-core.c |   18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-core.c b/drivers/net/wireless/iwlwifi/iwl-core.c
index 7bcfa781e0b9..3abe9ede6990 100644
--- a/drivers/net/wireless/iwlwifi/iwl-core.c
+++ b/drivers/net/wireless/iwlwifi/iwl-core.c
@@ -1465,20 +1465,10 @@ void iwl_bg_watchdog(unsigned long data)
        if (timeout == 0)
                return;
 
-       /* monitor and check for stuck cmd queue */
-       if (iwl_check_stuck_queue(priv, priv->shrd->cmd_queue))
-               return;
-
-       /* monitor and check for other stuck queues */
-       if (iwl_is_any_associated(priv)) {
-               for (cnt = 0; cnt < hw_params(priv).max_txq_num; cnt++) {
-                       /* skip as we already checked the command queue */
-                       if (cnt == priv->shrd->cmd_queue)
-                               continue;
-                       if (iwl_check_stuck_queue(priv, cnt))
-                               return;
-               }
-       }
+       /* monitor and check for stuck queues */
+       for (cnt = 0; cnt < hw_params(priv).max_txq_num; cnt++)
+               if (iwl_check_stuck_queue(priv, cnt))
+                       return;
 
        mod_timer(&priv->watchdog, jiffies +
                  msecs_to_jiffies(IWL_WD_TICK(timeout)));
-- 
1.7.9.2
