Re: questions on NAPI processing latency and dropped network packets
On Jan 10, 2008 9:24 AM, Chris Friesen [EMAIL PROTECTED] wrote:
> After a recent userspace app change, we've started seeing packets being dropped by the ethernet hardware (e1000, NAPI is enabled). The error/dropped/fifo counts are going up in ethtool:

(These are perhaps too obvious, but I didn't see the questions or answers in the thread.)

Can you reproduce it with a simple userspace cpu hog? (Two, really, one per cpu.)
Can you reproduce it with the newer e1000?
Can you reproduce it with git head?

If the answer to the first one is yes and the last is no, then bisect until you get a kernel that doesn't show the problem. Backport the fix, unless the fix happens to be CFS.

However, I suspect that your userspace app is just starving the system from time to time.

--
To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
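The bisect suggestion in the message above hunts for the commit that fixed the problem, so the sense is reversed from the usual "find the breakage" run. A minimal sketch of that search, assuming a linear history (real kernel history is not linear) and a hypothetical `is_fixed` test standing in for "build, boot, run the cpu hogs, check the drop counters":

```python
from typing import Callable, Sequence


def first_fixed(revisions: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Binary-search for the first revision where the problem is gone.

    Assumes revisions[0] still shows the problem, revisions[-1] does not,
    and that once a revision is fixed every later one stays fixed.
    """
    broken, fixed = 0, len(revisions) - 1
    while fixed - broken > 1:
        mid = (broken + fixed) // 2
        if is_fixed(revisions[mid]):
            fixed = mid      # the fix landed at or before mid
        else:
            broken = mid     # the fix landed after mid
    return revisions[fixed]


# Hypothetical history of 16 revisions; pretend the fix landed in rev11.
history = [f"rev{i}" for i in range(16)]
fix = first_fixed(history, lambda rev: int(rev[3:]) >= 11)
print(fix)
```

Each round halves the candidate range, so 16 revisions need four test builds. Current versions of git let a bisect run use its own names for the two states (`git bisect start --term-old ... --term-new ...`), which helps keep the reversed sense straight.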
Re: Regression: Wireshark sees no packets in 2.6.24-rc3
On Dec 14, 2007 11:09 PM, Ray Lee [EMAIL PROTECTED] wrote:
> On Dec 14, 2007 6:41 PM, Gabriel C [EMAIL PROTECTED] wrote:
> Correct, absolutely no traffic. So if it works for you, then either it's something that got fixed between -rc3 and -rc5, or something odd when I did a make oldconfig, I suppose. (Or because I'm on an x86-64 kernel?) Regardless, -rc5 is currently building, and I'll try it in the morning.

-rc5 works great. Really don't know what's different between my -rc3 and -rc5 builds. The diff of .config between the two doesn't show anything obvious, so perhaps it was something fixed in the interim.

I've gone ahead and closed the bugzilla entry, btw.

Thanks, and sorry for the false (or tardy) alarm.

Ray
Re: Regression: Wireshark sees no packets in 2.6.24-rc3
On Dec 14, 2007 6:41 PM, Gabriel C [EMAIL PROTECTED] wrote:
> Rafael J. Wysocki wrote:
>> On Friday, 14 of December 2007, Ray Lee wrote:
>>> tshark -i eth0, eth1, lo are all empty. Works under 2.6.23.0 just fine. A quick scan of the log between 2.6.24-rc3 and current tip (-rc5) doesn't show any obvious fixes, but then again, what do I know. I'll check current tip on the weekend when I'll have the luxury to have my main system down long enough for a test. Right now I'm kinda up against a deadline, but didn't want to leave it unreported. Should be easy for someone else to confirm or deny whether current tip has the problem.
>>
>> FYI, I have created a bugzilla entry for this issue at: http://bugzilla.kernel.org/show_bug.cgi?id=9568
>
> Hmm what do you mean by empty ? it does not capturing anything on that interface ?

Correct, absolutely no traffic. So if it works for you, then either it's something that got fixed between -rc3 and -rc5, or something odd when I did a make oldconfig, I suppose. (Or because I'm on an x86-64 kernel?) Regardless, -rc5 is currently building, and I'll try it in the morning.

> I do run -rc5-git with wireshark-0.99.6 and tshark -i eth0 or lo works here.

Excellent. Thank you for checking!

Rafael: I'll update the bugzilla as warranted after testing.

Ray
Regression: Wireshark sees no packets in 2.6.24-rc3
tshark -i eth0, eth1, lo are all empty. Works under 2.6.23.0 just fine. A quick scan of the log between 2.6.24-rc3 and current tip (-rc5) doesn't show any obvious fixes, but then again, what do I know. I'll check current tip on the weekend when I'll have the luxury to have my main system down long enough for a test. Right now I'm kinda up against a deadline, but didn't want to leave it unreported. Should be easy for someone else to confirm or deny whether current tip has the problem.

Ray
Re: [BUG] New Kernel Bugs
On Nov 13, 2007 7:24 AM, Giacomo A. Catenazzi [EMAIL PROTECTED] wrote:
> As a long time kernel tester, I see some problem with the newer new development model. In the short merge windows, after too much time, there are too many patches.

I think the root issue there is that it's hard to get all testers to run a bisect, but easy to ask them to test snapshots. Right now the snapshots are generated nightly, but I think it would make more sense if they were generated every N patches, for some value of N...

Of course, for that to really work, we have to ensure that the result is always compilable, which has been getting better, but not perfect.

Ray
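To put rough numbers on the "snapshot every N patches" idea from the message above: a tester who reports the first bad snapshot narrows the search to at most N patches, which a developer can then bisect in about log2(N) builds. A small sketch of that arithmetic, with a made-up merge-window size and made-up values of N:

```python
import math
from typing import Tuple


def snapshot_plan(total_patches: int, n: int) -> Tuple[int, int]:
    """Return (snapshots to publish, bisect steps left inside one snapshot gap)."""
    snapshots = math.ceil(total_patches / n)
    # Knowing the first bad snapshot leaves at most n candidate patches;
    # bisecting n candidates takes about log2(n) test builds.
    steps = math.ceil(math.log2(n)) if n > 1 else 0
    return snapshots, steps


# Hypothetical figures: 8000 patches in a merge window, a snapshot every 250.
print(snapshot_plan(8000, 250))
```

A smaller N means more snapshots to build and host but less bisecting afterwards. Either way it only helps if every snapshot compiles, which is the caveat the message ends on.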
Re: Weird network problems with 2.6.23-rc2
Hello there Shish,

On Aug 10, 2007 11:39 PM, Shish [EMAIL PROTECTED] wrote:
> Something seems to have broken in 2.6.23-rc2, and I'm not sure what, or where I should look for further debugging. The info I have:
>
> On my 2.6.23-rc2 desktop, things run fine. On my test server, built from the same source tree, networking goes strange every few minutes, with the following symptoms:
>
> o) running ping against the server, the first ping goes through; further pings go AWOL until about icmp_seq=30, when I get 4-5 icmp replies (marked as DUP!), then no pings for a while, then dups, and so on.
> o) the server doesn't see ARP replies. According to tcpdump, the server will send eg "who has 192.168.0.2? tell 192.168.0.1"; the client in question will receive the packet and send a response, but nothing shows up in the server-side tcpdump.
> o) after a few minutes of random network troubles, everything will work fine again, (ping is normal, arp replies are seen, tcp sessions work) for a few minutes.
> o) The server's dmesg shows lots of "short udp packet" messages
> o) ifdown then ifup'ing the interfaces fixes things, temporarily.
>
> Reverting to 2.6.22, everything seems to be running fine (but no lguest, which is what I came for :( )
>
> I've also tried with the latest code from git, the behaviour is the same as 2.6.23-rc2.

Several questions. What network card do you have on your server? Is this still reproducible with the latest code from git? If so, it would be extremely helpful if you could do a bisect between 2.6.22 and 2.6.23-rc2. Feel free to ask for help if you need it.

Ray
Re: [PATCH 00/23] per device dirty throttling -v8
(adding netdev cc:)

On 8/4/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> On Sat, 4 Aug 2007, Ingo Molnar wrote:
>> * Ingo Molnar [EMAIL PROTECTED] wrote:
>>> There are positive reports in the never-ending "my system crawls like an XT when copying large files" bugzilla entry: http://bugzilla.kernel.org/show_bug.cgi?id=7372
>>
>> i forgot this entry:
>>
>> We recently upgraded our office to gigabit Ethernet and got some big AMD64 / 3ware boxes for file and vmware servers... only to find them almost useless under any kind of real load. I've built some patched 2.6.21.6 kernels (using the bdi throttling patch you mentioned) to see if our various Debian Etch boxes run better. So far my testing shows a *great* improvement over the stock Debian 2.6.18 kernel on our configurations.
>>
>> and bdi has been in -mm in the past i think, so we also know (to a certain degree) that it does not hurt those workloads that are fine either.
>>
>> [ my personal interest in this is the following regression: every time i start a large kernel build with DEBUG_INFO on a quad-core 4GB RAM box, i get up to 30 seconds complete pauses in Vim (and most other tasks), during plain editing of the source code. (which happens when Vim tries to write() to its swap/undo-file.) ]
>
> I have an issue that sounds like it's related. I've got a syslog server that's got two Opteron 246 cpu's, 16G ram, 2x140G 15k rpm drives (fusion MPT hardware mirroring), 16x500G 7200rpm SATA drives on 3ware 9500 cards (software raid6) running 2.6.20.3 with hz set at default and preempt turned off.
>
> I have syslog doing buffered writes to the SCSI drives and every 5 min a cron job copies the data to the raid array.
>
> I've found that if I do anything significant on the large raid array that the system loses a significant amount of the UDP syslog traffic, even though there should be plenty of ram and cpu (and the spindles involved in the writes are not being touched), even a grep can cause up to 40% losses in the syslog traffic.
> I've experimented with nice levels (nicing down the grep and nicing up the syslogd) without a noticeable effect on the losses.
>
> I've been planning to try a new kernel with hz=1000 to see if that would help, and after that experiment with the various preempt settings, but it sounds like the per-device queues may actually be more relevant to the problem.
>
> what would you suggest I test, and in what order and combination?

At least on a surface level, your report has some similarities to http://lkml.org/lkml/2007/5/21/84 . In that message, John Miller mentions several things he tried without effect:

- I increased the max allowed receive buffer through /proc/sys/net/core/rmem_max and the application calls the right syscall. netstat -su does not show any packet receive errors.
- After getting "kernel: swapper: page allocation failure. order:0, mode:0x20", I increased /proc/sys/vm/min_free_kbytes
- ixgb.txt in kernel network documentation suggests to increase net.core.netdev_max_backlog to 30. This did not help.
- I also had to increase net.core.optmem_max, because the default value was too small for 700 multicast groups.

As they're all pretty simple to test, it may be worthwhile to give them a shot just to rule things out.

Ray
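Before changing any of the four knobs named in the quoted list, it helps to record their current values so each experiment can be undone. A minimal read-only sketch, assuming a Linux box with /proc mounted; the helper names are made up for illustration, and the simple dot-to-slash mapping is only claimed to hold for these four names:

```python
from pathlib import Path
from typing import Optional

# The four sysctls mentioned in the quoted message.
KNOBS = [
    "net.core.rmem_max",
    "vm.min_free_kbytes",
    "net.core.netdev_max_backlog",
    "net.core.optmem_max",
]


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_knob(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the current value as text, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


for knob in KNOBS:
    print(f"{knob} = {read_knob(knob)}")
```

Writing new values needs root and is left out on purpose. Changing one knob at a time and re-measuring the syslog loss keeps the results attributable.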
Re: ieee80211 sleeping in invalid context
Michael Buesch wrote:
> Congratulations to your decision ;)

Sometimes making decisions via Brownian motion has its advantages.

> Which kernel are you using?

Hmm, I'm using the mercurial repository, let me see if I can translate that to a git head... Looks like git tree c2bb88baa52429b6b76e3ba4272cb2b29713c5a8 . (Which is from less than 24 hours ago.)

> There is some locking breakage in latest kernels with softmac. I attached the fixes for the known bugs.

Okay, I'll apply to my local copy, rebuild, and try again. I'll let you know what happens.

Ray
ieee80211 sleeping in invalid context
Hey all, more data on my bcm43xx problem report from a few weeks back.

By random chance I acquired a brain, and decided to rebuild my latest kernel pull with as many debugging options on as I could stand. Got the below, plus a dead keyboard (except for Magic SysRq) (but only if I let userspace come up fully -- booting with init=/bin/bash is fine). Since the trace below mentions scans, I'm hoping it's related to my problem.

In other news, now that I've moved my laptop back to my home office, I'm able to recreate the dead-keyboard lockups I've been having again, about once every day or two. What fun. So if there are patches I should try on top of the latest git, let me know. (Though I'm hoping the below will be a smoking gun to someone who has a clue, i.e., not me.)

Ray

Dec 11 19:34:18 phoenix syslogd 1.4.1#18ubuntu6: restart.
Dec 11 19:34:18 phoenix kernel: Inspecting /boot/System.map-2.6.19
Dec 11 19:34:19 phoenix kernel: Loaded 26330 symbols from /boot/System.map-2.6.19.
Dec 11 19:34:19 phoenix kernel: Symbols match kernel version 2.6.19.
Dec 11 19:34:19 phoenix kernel: No module symbols loaded - kernel modules not enabled.
Dec 11 19:34:19 phoenix kernel: [0.00] Linux version 2.6.19 ([EMAIL PROTECTED]) (gcc version 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)) #1 PREEMPT Mon Dec 11 12:52:41 PST 2006
Dec 11 19:34:19 phoenix kernel: [0.00] Command line: root=UUID=bf7dc35f-5eff-4a85-b398-590f37c5679e ro noapic
Dec 11 19:34:19 phoenix kernel: [0.00] BIOS-provided physical RAM map:
Dec 11 19:34:19 phoenix kernel: [0.00] BIOS-e820: - 0009fc00 (usable)
Dec 11 19:34:19 phoenix kernel: [0.00] BIOS-e820: 0009fc00 - 000a (reserved)
Dec 11 19:34:19 phoenix kernel: [0.00] BIOS-e820: 000e - 0010 (reserved)
Dec 11 19:34:20 phoenix kernel: [0.00] BIOS-e820: 0010 - 37fd (usable)
Dec 11 19:34:21 phoenix kernel: [0.00] BIOS-e820: 37fd - 37fefc00 (reserved)
Dec 11 19:34:21 phoenix kernel: [0.00] BIOS-e820: 37fefc00 - 37ffb000 (ACPI NVS)
Dec 11 19:34:21 phoenix kernel: [0.00] BIOS-e820: 37ffb000 - 4000 (reserved)
Dec 11 19:34:21 phoenix kernel: [0.00] BIOS-e820: e000 - f000 (reserved)
Dec 11 19:34:21 phoenix kernel: [0.00] BIOS-e820: fec0 - fec02000 (reserved)
Dec 11 19:34:21 phoenix kernel: [0.00] BIOS-e820: ffb8 - ffc0 (reserved)
Dec 11 19:34:21 phoenix kernel: [0.00] BIOS-e820: fff8 - 0001 (reserved)
Dec 11 19:34:21 phoenix kernel: [0.00] end_pfn_map = 1048576
Dec 11 19:34:21 phoenix kernel: [0.00] DMI 2.3 present.
Dec 11 19:34:23 phoenix kernel: [0.00] No mptable found.
Dec 11 19:34:23 phoenix kernel: [0.00] Zone PFN ranges:
Dec 11 19:34:23 phoenix kernel: [0.00] DMA 0 - 4096
Dec 11 19:34:23 phoenix kernel: [0.00] DMA32 4096 - 1048576
Dec 11 19:34:24 phoenix kernel: [0.00] Normal 1048576 - 1048576
Dec 11 19:34:24 phoenix kernel: [0.00] early_node_map[2] active PFN ranges
Dec 11 19:34:24 phoenix kernel: [0.00] 0: 0 - 159
Dec 11 19:34:24 phoenix kernel: [0.00] 0: 256 - 229328
Dec 11 19:34:24 phoenix hpiod: 1.6.9 accepting connections at 2208...
Dec 11 19:34:25 phoenix kernel: [0.00] ACPI: PM-Timer IO Port: 0x8008
Dec 11 19:34:25 phoenix kernel: [0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Dec 11 19:34:25 phoenix kernel: [0.00] Processor #0 (Bootup-CPU)
Dec 11 19:34:25 phoenix kernel: [0.00] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
Dec 11 19:34:25 phoenix kernel: [0.00] ACPI: Skipping IOAPIC probe due to 'noapic' option.
Dec 11 19:34:25 phoenix kernel: [0.00] arch/x86_64/mm/init.c:145: bad pte 810001c58fe8(8000fec01173).
Dec 11 19:34:25 phoenix kernel: [0.00] Nosave address range: 0009f000 - 000a
Dec 11 19:34:25 phoenix kernel: [0.00] Nosave address range: 000a - 000e
Dec 11 19:34:25 phoenix kernel: [0.00] Nosave address range: 000e - 0010
Dec 11 19:34:25 phoenix kernel: [0.00] Allocating PCI resources starting at 5000 (gap: 4000:a000)
Dec 11 19:34:25 phoenix kernel: [0.00] Built 1 zonelists. Total pages: 223940
Dec 11 19:34:25 phoenix kernel: [0.00] Kernel command line: root=UUID=bf7dc35f-5eff-4a85-b398-590f37c5679e ro noapic
Dec 11 19:34:25 phoenix kernel: [0.00] Initializing CPU#0
Dec 11 19:34:25 phoenix kernel: [0.00] PID hash table entries: 4096 (order: 12, 32768 bytes)
Dec 11 19:34:25 phoenix kernel: [ 13.705535] time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC
Re: bcm43xx regression 2.6.19rc3 - rc5, rtnl_lock trouble?
Larry Finger wrote:
> Johannes Berg wrote:
>> Hah, that's a lot more plausible than bcm43xx's drain patch actually causing this. So maybe somehow interrupts for bcm43xx aren't routed properly or something... Ray, please check /proc/interrupts when this happens.

When it happens, I can't. The keyboard is entirely dead (I'm in X, perhaps at a console it would be okay). The only thing that works is magic SysRq. Even ctrl-alt-f1 to get to a console doesn't work. That said, /proc/interrupts doesn't show MSI routed things on my AMD64 laptop.

>> I am convinced that the patch in question (drain tx status) is not causing this -- the patch should be a no-op in most cases anyway, and in those cases where it isn't a no-op it'll run only once at card init and remove some things from a hardware-internal FIFO.

Okay, I can buy that.

> I agree that drain tx status should not cause the problem. Ray, does -rc6 solve your problem as it did for Joseph?

I can't get it to repeat other than the first two times. However, I accidentally stopped NetworkManager from handling my wireless a few days ago, and haven't restarted it, so that may play into this.

Humor me one last time, I beg. Did you look at the messages file I posted? (Or maybe I didn't include this second bit... Damn, I need to be more careful with cutting and pasting...) The second sysrq-t shows locking stuff going on, can you tell me if it looks reasonable? It still seems to me that something acquiring and not releasing rtnl_lock explains what I was seeing (rtnl lock is implicated in both sysrq-t backtraces). I don't know if that thing is bcm43xx, though.
Is this part reasonable?:

1 lock held by events/0/4:
 #0: (bcm-mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
2 locks held by NetworkManager/4837:
 #0: (rtnl_mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
 #1: (bcm-mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
1 lock held by wpa_supplicant/5953:
 #0: (rtnl_mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10

(So locks A, AB, B)

...of the below...

Showing all locks held in the system:
1 lock held by events/0/4:
 #0: (bcm-mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
1 lock held by getty/4224:
 #0: (tty-atomic_read_lock){--..}, at: [mutex_lock_interruptible+9/16] mutex_lock_interruptible+0x9/0x10
1 lock held by getty/4225:
 #0: (tty-atomic_read_lock){--..}, at: [mutex_lock_interruptible+9/16] mutex_lock_interruptible+0x9/0x10
1 lock held by getty/4226:
 #0: (tty-atomic_read_lock){--..}, at: [mutex_lock_interruptible+9/16] mutex_lock_interruptible+0x9/0x10
1 lock held by getty/4227:
 #0: (tty-atomic_read_lock){--..}, at: [mutex_lock_interruptible+9/16] mutex_lock_interruptible+0x9/0x10
1 lock held by getty/4228:
 #0: (tty-atomic_read_lock){--..}, at: [mutex_lock_interruptible+9/16] mutex_lock_interruptible+0x9/0x10
1 lock held by getty/4229:
 #0: (tty-atomic_read_lock){--..}, at: [mutex_lock_interruptible+9/16] mutex_lock_interruptible+0x9/0x10
2 locks held by NetworkManager/4837:
 #0: (rtnl_mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
 #1: (bcm-mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
1 lock held by wpa_supplicant/5953:
 #0: (rtnl_mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
1 lock held by less/29492:
 #0: (tty-atomic_read_lock){--..}, at: [mutex_lock_interruptible+9/16] mutex_lock_interruptible+0x9/0x10
1 lock held by bash/9871:
 #0: (tty-atomic_read_lock){--..}, at: [mutex_lock_interruptible+9/16] mutex_lock_interruptible+0x9/0x10
=

Regardless, I'm going to withdraw my regression report until I can reproduce this.
I can't justify holding anything up if we can't even finger a culprit to look at. In the meantime I'll try running with rc6.

Ray
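The "locks A, AB, B" reading of the lock dump in the message above can be checked mechanically. A minimal sketch that groups the dump by task and then by lock, assuming the text format shown in the message (the regular expression is a guess fitted to that excerpt, not a parser for every kernel's output):

```python
import re
from collections import defaultdict
from typing import Dict, List

# Either a "N lock(s) held by task/pid:" header or a "#k: (lockname)" entry.
TOKEN = re.compile(r"\d+ locks? held by (?P<task>\S+):|#\d+:\s+\((?P<lock>[^)]+)\)")


def locks_by_task(dump: str) -> Dict[str, List[str]]:
    """Group lock names under the task header that precedes them."""
    held: Dict[str, List[str]] = defaultdict(list)
    current = None
    for m in TOKEN.finditer(dump):
        if m.group("task"):
            current = m.group("task")
        elif current is not None:
            held[current].append(m.group("lock"))
    return dict(held)


def tasks_by_lock(held: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the mapping: which tasks are listed against each lock."""
    out: Dict[str, List[str]] = defaultdict(list)
    for task, locks in held.items():
        for lock in locks:
            out[lock].append(task)
    return dict(out)


# Excerpt copied from the message (three tasks only).
DUMP = """
1 lock held by events/0/4:
 #0: (bcm-mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
2 locks held by NetworkManager/4837:
 #0: (rtnl_mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
 #1: (bcm-mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
1 lock held by wpa_supplicant/5953:
 #0: (rtnl_mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
"""

held = locks_by_task(DUMP)
for lock, tasks in tasks_by_lock(held).items():
    print(lock, "<-", ", ".join(tasks))
```

Two tasks are listed against each mutex here. As far as I know, lockdep records a lock at the point a task starts acquiring it, so a task that is still blocked can appear in this listing alongside the real owner; treat that as an assumption to verify against the lockdep documentation rather than as settled.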
Re: bcm43xx regression 2.6.19rc3 - rc5, rtnl_lock trouble?
First off, thanks for all your help. Second off,

On 11/16/06, Larry Finger [EMAIL PROTECTED] wrote:
> Ray Lee wrote:
>> If I could figure out a way to make it repeatable, I'd happily do a blind bisect. [...] I'm open to suggestions on how to make the problem trigger more than once every two days...
>
> I don't know what might be causing the lock problems. I'm more concerned with the NETDEV WATCHDOG timeouts. AFAIK, you are the only one still reporting this error. On my system, I get an occasional MAC suspend failure, sometimes followed by a BCM43xx_IRQ_XMIT_ERROR.

Last time I had trouble with 2.6.18-rcX, I wasn't the only one, just the only one reporting it. Can you tell me why reverting the likely culprit isn't an option? rc6 is out, and Linus is really pushing to finalize 2.6.19 here soon.

> From what I read in your post, the timeouts happen a lot more often than once every two days. Once we get those fixed, then we can concentrate on the locking.

It's becoming clear that I wasn't so clear :-). No, it doesn't happen more than once every two (three, now) days. I'm saying that it's only happened twice, as once the first timeout message starts, the timeouts don't stop short of a reboot.

Or, in other words, it happened occasionally under 2.6.19-rc3, but fixed itself. Under 2.6.19-rc5, it's happened less frequently (maybe), but once it starts, it goes on solid until I reboot the computer. Until I reboot, the laptop is fully unusable as things start hanging on the rtnl_lock (X, apparently).

Please see http://madrabbit.org/~ray/messages.gz for the /var/log/messages to understand what I mean by that. (Though, that was captured before I'd rebuilt the module with debugging, unfortunately. Regardless, it may help clarify what I mean here.)

So all the NETDEV WATCHDOG timeouts other than the first (of each of the two events) appear to be bogus, or side effects of rtnl_lock being held after the first time, and not clearing out.

thinks... Maybe I've got the culprit backward here.
Perhaps something else in my system is locking on rtnl_lock, and bcm43xx can't acquire it? Could the NETDEV WATCHDOG timeouts be a side effect of someone acquiring and not releasing the rtnl_lock()? Is that possible? (i.e., would it cause the effect I'm seeing?)

Ray
bcm43xx regression 2.6.19rc3 - rc5, rtnl_lock trouble?
Hey all,

I ran 2.6.19-rc3 for almost two weeks or so with no difficulties (none related to the bcm43xx driver, at least). However, Andrew asked me to double check the latest release to see if my problem report against 2.6.18 (hard locks) was fixed. Good news is that it still is fixed.

Bad news is that 2.6.19-rc5 is worse than rc3 in other ways. I've come back to my laptop being mostly dead after hours of it being off on its own (twice now). Mostly dead meaning the keyboard is nearly non-responsive, but the mouse works great (I'm in X, of course). I say 'nearly dead' as sysrq-t,b works, so I'm sorta stumped there. (x-session seems to use netlink, so perhaps that's the connection? ctrl-alt-f[1-7] don't do anything, however.)

It seems to be a locking problem, though lockdep isn't catching it. I'll let you guys decide though. Regardless, here's what I can see. My logs start filling with:

$ grep 'NETDEV WATCHDOG:' /var/log/messages | cut -d '[' -f 2- | head
50025.388173] NETDEV WATCHDOG: eth1: transmit timed out
50029.019574] NETDEV WATCHDOG: eth1: transmit timed out
50030.835313] NETDEV WATCHDOG: eth1: transmit timed out
50032.651049] NETDEV WATCHDOG: eth1: transmit timed out
50034.466785] NETDEV WATCHDOG: eth1: transmit timed out
50036.282523] NETDEV WATCHDOG: eth1: transmit timed out
50038.098237] NETDEV WATCHDOG: eth1: transmit timed out
50039.913974] NETDEV WATCHDOG: eth1: transmit timed out
50041.729709] NETDEV WATCHDOG: eth1: transmit timed out
50043.545447] NETDEV WATCHDOG: eth1: transmit timed out
(...1249 of these, so it doesn't fix itself.)

and then the system becomes pretty worthless. (Full /var/log/messages with sysrq-t at: http://madrabbit.org/~ray/messages.gz ).
Interesting bits of that:

$ grep -B5 -A10 'Nov 13 01:5.*mutex' /var/log/messages | cut -d ']' -f2-
DWARF2 unwinder stuck at child_rip+0xa/0x12
Leftover inexact backtrace:
[restore_args+0/48] restore_args+0x0/0x30
[mutex_lock+9/16] mutex_lock+0x9/0x10
[kthread+0/272] kthread+0x0/0x110
[child_rip+0/18] child_rip+0x0/0x12
khelper S 810037fbe318 0 5 1 6 4 (L-TLB)
810037907e60 0046 810037907e70 810037fbe140 81001095f140 3b5d 810001e3e668 0286 810037907e40 8026bbb2 810037907e70 810001e3e600
Call Trace:
[worker_thread+236/352] worker_thread+0xec/0x160
[kthread+211/272] kthread+0xd3/0x110
--
DWARF2 unwinder stuck at child_rip+0xa/0x12
Leftover inexact backtrace:
[restore_args+0/48] restore_args+0x0/0x30
[mutex_lock+9/16] mutex_lock+0x9/0x10
[kthread+0/272] kthread+0x0/0x110
[child_rip+0/18] child_rip+0x0/0x12
kthread S 810037fad218 0 6 1252129 5 (L-TLB)
810037f01e60 0046 810037f01e70 810037fad040 81002b3df140 062b 810001e3e468 0286 810037f01e40 8026bbb2 810037f01e70 810001e3e400
Call Trace:
[worker_thread+236/352] worker_thread+0xec/0x160
[kthread+211/272] kthread+0xd3/0x110
--
DWARF2 unwinder stuck at child_rip+0xa/0x12
Leftover inexact backtrace:
[restore_args+0/48] restore_args+0x0/0x30
[mutex_lock+9/16] mutex_lock+0x9/0x10
[kthread+0/272] kthread+0x0/0x110
[child_rip+0/18] child_rip+0x0/0x12
kblockd/0 S 810037989318 025 626 (L-TLB)
81003798fe60 0046 81003798fe70 810037989140 8100379a5100 078b 810037fa2468 0286 81003798fe40 8026bbb2 81003798fe70 810037fa2400
Call Trace:
[worker_thread+236/352] worker_thread+0xec/0x160
[kthread+211/272] kthread+0xd3/0x110
--
NetworkManage D 810037943258 0 4833 1 4853 4809 (NOTLB)
81002bfefbe8 0046 81002bfefb98 810037943080 81002e6d2100 000122a6 8062ce80 0046 0246 810037943080 81002e47b3f0 81002e47b3a0
Call Trace:
[__mutex_lock_slowpath+344/624] __mutex_lock_slowpath+0x158/0x270
[mutex_lock+9/16] mutex_lock+0x9/0x10
[_end+126343345/2126632680] :bcm43xx:bcm43xx_wx_get_mode+0x29/0x60
[ioctl_standard_call+139/944] ioctl_standard_call+0x8b/0x3b0
[wireless_process_ioctl+260/976] wireless_process_ioctl+0x104/0x3d0
[dev_ioctl+854/944] dev_ioctl+0x356/0x3b0
[sock_ioctl+576/624] sock_ioctl+0x240/0x270
[do_ioctl+49/160] do_ioctl+0x31/0xa0
[vfs_ioctl+683/720] vfs_ioctl+0x2ab/0x2d0
[sys_ioctl+106/160] sys_ioctl+0x6a/0xa0
[system_call+126/131] system_call+0x7e/0x83
DWARF2 unwinder stuck at system_call+0x7e/0x83
--
x-session-man D 81002ef02298 0 5625 4565 5672 4586 (NOTLB)
810028a1fad8 0046 8062c500 81002ef020c0 8100249a6040 8c5d 0046 0246 81002ef020c0 805505b0 80550560
Call Trace:
[__mutex_lock_slowpath+344/624]
Re: bcm43xx regression 2.6.19rc3 - rc5, rtnl_lock trouble?
Larry Finger wrote:
> Ray Lee wrote:
>> Michael Buesch wrote:
>>> On Wednesday 15 November 2006 20:01, Ray Lee wrote:
>>>> Suggestions? Requests for shudder even more info?
>>> Yeah, enable bcm43xx debugging.
>> Sigh, didn't even think to look for that. Okay, enabled and compiling a new kernel. This will take a few days to trigger, if the pattern holds, so in the meantime, any *other* thoughts?
> Which chip and revision do you have? Send me your equivalent of the line "bcm43xx: Chip ID 0x4306, rev 0x2".

bcm43xx: Chip ID 0x4306, rev 0x3

Also, another thing I wasn't clear about in my first email was that the netdev watchdog timeouts are new with rc5:

$ zgrep 'NETDEV WATCH' /var/log/messages{,.0,.1.gz} | cut -d: -f2| cut -c 1-6 | uniq -c
   1249 Nov 13
      6 Nov 6
      1 Nov 7
      3 Nov 8
      2 Nov 9
   5717 Nov 10
   5652 Nov 11
      5 Oct 29
      3 Oct 30
      3 Oct 31
      4 Nov 1
      1 Nov 2
      1 Nov 3

I booted into 2.6.19-rc5 on November 10th. Previous to that was 2.6.19-rc3. There really does seem to be something suspicious with that patch, yes?

Thanks,
Ray
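The per-day tally in the message above comes from a zgrep/cut/uniq pipeline. A rough Python equivalent of the counting step, assuming classic syslog lines whose first six characters are the month and day; the sample lines are invented for illustration and are not taken from the real log:

```python
from collections import Counter
from typing import Iterable


def watchdog_counts(lines: Iterable[str]) -> Counter:
    """Count NETDEV WATCHDOG messages per syslog day (first 6 characters)."""
    counts: Counter = Counter()
    for line in lines:
        if "NETDEV WATCHDOG" in line:
            counts[line[:6]] += 1
    return counts


# Made-up sample in the same shape as /var/log/messages entries.
sample = [
    "Nov  9 08:00:01 phoenix kernel: [  100.000000] NETDEV WATCHDOG: eth1: transmit timed out",
    "Nov 10 09:15:42 phoenix kernel: [  200.000000] NETDEV WATCHDOG: eth1: transmit timed out",
    "Nov 10 09:15:44 phoenix kernel: [  201.815000] NETDEV WATCHDOG: eth1: transmit timed out",
    "Nov 10 09:16:00 phoenix kernel: [  217.000000] eth1: link up",
]

for day, count in watchdog_counts(sample).items():
    print(f"{count:7d} {day}")
```

Unlike `uniq -c`, a Counter does not depend on matching lines being adjacent, so rotated log files can be fed in any order.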
Re: Bcm43xx softMac Driver in 2.6.18
(re-adding linux-kernel.)

Larry Finger wrote:
> Would you please test the attached patch that should be applied to a vanilla 2.6.18? I'm currently running it, but only for a few minutes. It comes up fine and I ran it through several ifdown/ifup cycles without any problem.

Okay, this is far better than vanilla 2.6.18 (or your other patch). I've been running this for six hours so far with no troubles, when before I'd have a hard system freeze within a minute or two of associating (or trying to associate) with an access point.

As for -stable, the patch is sorta, y'know, ginormous:

 bcm43xx.h         | 181 +-
 bcm43xx_debugfs.c |  80
 bcm43xx_debugfs.h |   1
 bcm43xx_dma.c     | 583 +++---
 bcm43xx_dma.h     | 296 +
 bcm43xx_leds.c    |  10
 bcm43xx_main.c    | 905 +++---
 bcm43xx_main.h    |   6
 bcm43xx_phy.c     |  48 +-
 bcm43xx_pio.c     |   4
 bcm43xx_sysfs.c   |  46 +-
 bcm43xx_wx.c      | 121 +++
 12 files changed, 1426 insertions(+), 855 deletions(-)

OTOH, the current version is completely unusable on this system, so I don't know if the right path is to revert the driver to 2.6.17's version, or to try to move forward with the patch when it's had hard review and testing.

I'm heading out on vacation for the next two weeks. I'll catch up with any mail directed to me for more things to try (or report about this specific system), if requested, when I get back. (Or catch me today.)

Thank you very much for your help,
Ray
Re: Bcm43xx softMac Driver in 2.6.18
On 9/22/06, Larry Finger [EMAIL PROTECTED] wrote:
> When we found the cause of NETDEV watchdog timeouts in the wireless-2.6 code, I knew that the 2.6.18 release code would cause a serious regression.

I don't know if this is the lockup you're trying to address, but 2.6.18's bcm43xx has definitely regressed for me versus 2.6.17.x.

2.6.18 vanilla and 2.6.18 with your patch both lock my system hard with bcm43xx. I've got an HP/Compaq nx6125 laptop. Symptoms are that it will associate fine on its own and send traffic to/fro upon ifup, but when I do an iwconfig, ifdown, ifup to change the access point, the system locks (somewhat randomly) during one of those operations. Well, the iwconfig or the ifup, actually.

lspci -v:

02:02.0 Network controller: Broadcom Corporation BCM4309 802.11a/b/g (rev 03)
        Subsystem: Hewlett-Packard Company Unknown device 12f9
        Flags: bus master, fast devsel, latency 64, IRQ 11
        Memory at d001 (32-bit, non-prefetchable) [size=8K]

./bcm43xx-fwcutter -i BCMWL5.SYS

  filename : bcmwl5.sys
  version  : 4.10.40.1
  MD5      : 69f940672be0ecee5bd1e905706ba8ce

Wireless tools are Version: 28-1ubuntu2.

I've got multiple access points in view of the laptop, a g (54Mb), and a b (11Mb). Neither with encryption enabled, if that makes a difference (we live in the boonies).

It's 2.6.18 + your patch, compiled for x86_64, ubuntu devel. Any suggestions or requests for tests?

Ray
Re: Bcm43xx softMac Driver in 2.6.18
Rafael J. Wysocki wrote:
>> 2.6.18 vanilla and 2.6.18 with your patch both lock my system hard with bcm43xx. I've got an HP/Compaq nx6125 laptop. Symptoms are that it will associate fine on its own and send traffic to/fro upon ifup, but when I do an iwconfig, ifdown, ifup to change the access point, the system locks (somewhat randomly) during one of those operations. Well, the iwconfig or the ifup, actually.
>
> I have observed similar symptoms on HPC nx6325, although I haven't managed to get the adapter to associate with an AP.

Yeah, I'm having the same troubles. Carefully watching the iwconfig results showed me that only half of the time did my `iwconfig eth1 essid AccessPointName` actually take. (It listed the essid of the ap I told it to associate with, but then showed "Access Point: Invalid" or words to that effect, until I issued the exact same iwconfig again.)

So, try it twice, double check the iwconfig output, then try bringing up the interface. Though that seems awfully difficult to do as well (DHCP is just sending out stuff with nothing coming back). When I switch consoles while DHCP is plaintively asking for an IP, and issue *another* iwconfig with the same essid, then it seems to kick something in the driver and DHCP immediately associates. Happened twice for me so far, though that could merely be a coincidence.

Ray
Re: [RFC][PATCH 2/9] deadlock prevention core
On 8/18/06, Andrew Morton [EMAIL PROTECTED] wrote:
> I assert that this can be solved by putting swap on local disks. Peter asserts that this isn't acceptable due to disk unreliability. I point out that local disk reliability can be increased via MD, all goes quiet.
>
> A good exposition which helps us to understand whether and why a significant proportion of the target user base still wishes to do swap-over-network would be useful.

Adding a hard drive adds $low per system, another failure point, and more importantly ~3-10 Watts which then has to be paid for twice (once to power it, again to cool it). For a hundred seats, that's significant. For 500, it's ranging toward fully painful.

I'm in the process of designing the next upgrade for a VoIP call center, and we want to go entirely diskless in the agent systems. We'd also rather not swap over the network, but 'swap is as swap does.'

That said, it in no way invalidates using /proc/sys/vm/min_free_kbytes...

Ray
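The power argument in the message above is simple arithmetic: each disk draws roughly 3 to 10 W, and that draw is "paid for twice" once cooling is counted. A worked sketch of the totals for the two seat counts mentioned, where the factor of 2 is taken literally from the message and the function name is made up:

```python
def extra_watts(seats: int, disk_watts: float, cooling_factor: float = 2.0) -> float:
    """Total added load: one disk per seat, counted once to power and once to cool."""
    return seats * disk_watts * cooling_factor


for seats in (100, 500):
    low = extra_watts(seats, 3)
    high = extra_watts(seats, 10)
    print(f"{seats} seats: {low:.0f} to {high:.0f} W")
```

Under those assumptions a hundred seats add somewhere between 0.6 and 2 kW of continuous load, and 500 seats between 3 and 10 kW. Real cooling overhead varies by site, so the factor of 2 is an approximation rather than a measured figure.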