Re: [GIT] Networking
On Wed, Oct 24, 2018 at 2:24 PM Andy Gross wrote: > > Yes this will conflict with Niklas's patch which is part of the 4.20 > pull requests. I would prefer that we revert Linus's and take Niklas's > unless there is a compelling argument to have it fixed before -rc1. I have no objection to just reverting my patch when I get the real fix. I just don't want my tree to have warnings that I see, and that may hide new warnings coming in when I do my next pull request.. Linus
Re: [PATCH v5 00/12] spectre variant1 mitigations for tip/x86/pti
On Fri, Jan 26, 2018 at 11:55 PM, Dan Williams wrote: > > Here's another spin of the spectre-v1 mitigations for 4.16. I see nothing really objectionable here. And unlike Spectre-v2 and Meltdown, I expect Spectre-v1 to be with us for a long time. It's not a "CPU did a bad job with checking the cached information it had" (whether it be from the TLB, BTB or RSB), it's pretty fundamental to just regular conditional branch prediction. So ack from me, and I don't expect this to be behind any config options. I still haven't really seen any numbers for this, but I _assume_ it's basically not measurable. Linus
Re: [PATCH v2 00/19] prevent bounds-check bypass via speculative execution
On Fri, Jan 12, 2018 at 4:15 PM, Tony Luck wrote: > > Here there isn't any reason for speculation. The core has the > value of 'x' in a register and the upper bound encoded into the > "cmp" instruction. Both are right there, no waiting, no speculation. So this is an argument I haven't seen before (although it was brought up in private long ago), but that is very relevant: the actual scope and depth of speculation. Your argument basically depends on just what gets speculated, and on the _actual_ order of execution. So your argument depends on "the uarch will actually run the code in order if there are no events that block the pipeline". Or at least it depends on a certain latency of the killing of any OoO execution being low enough that the cache access doesn't even begin. I realize that that is very much a particular microarchitectural detail, but it's actually a *big* deal. Do we have a set of rules for what is not a worry, simply because the speculated accesses get killed early enough? Apparently "test a register value against a constant" is good enough, assuming that register is also needed for the address of the access. Linus
Re: [PATCH v2 00/19] prevent bounds-check bypass via speculative execution
On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams wrote: > > This series incorporates Mark Rutland's latest ARM changes and adds > the x86 specific implementation of 'ifence_array_ptr'. That ifence > based approach is provided as an opt-in fallback, but the default > mitigation, '__array_ptr', uses a 'mask' approach that removes > conditional branches instructions, and otherwise aims to redirect > speculation to use a NULL pointer rather than a user controlled value. Do you have any performance numbers and perhaps example code generation? Is this noticeable? Are there any microbenchmarks showing the difference between lfence use and the masking model? Having both seems good for testing, but wouldn't we want to pick one in the end? Also, I do think that there is one particular array load that would seem to be pretty obvious: the system call function pointer array. Yes, yes, the actual call is now behind a retpoline, but that protects against a speculative BTB access, it's not obvious that it protects against the mispredict of the __NR_syscall_max comparison in arch/x86/entry/entry_64.S. The act of fetching code is a kind of read too. And retpoline protects against BTB stuffing etc, but what if the _actual_ system call function address is wrong (due to mis-prediction of the system call index check)? Should the array access in entry_SYSCALL_64_fastpath be made to use the masking approach? Linus
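For readers unfamiliar with the 'mask' approach discussed above, here is a minimal userspace sketch of the idea. It is not the `__array_ptr` implementation from the series; the helper names are hypothetical, and it assumes `size <= LONG_MAX` and a compiler that does an arithmetic right shift on negative signed values (GCC and Clang do). The point is only that the index gets clamped by data flow rather than by a conditional branch that could be mispredicted.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the API from the patch series: returns a mask
 * of all ones when index < size and zero otherwise, with no conditional
 * branch in the source. Assumes size <= LONG_MAX and an arithmetic right
 * shift for negative signed values.
 */
static unsigned long index_mask_sketch(unsigned long index, unsigned long size)
{
	/* the top bit of (index | (size - 1 - index)) is set iff index >= size */
	long v = (long)~(index | (size - 1UL - index));

	return (unsigned long)(v >> (sizeof(long) * CHAR_BIT - 1));
}

/* Force an out-of-bounds index to 0 instead of trusting the bounds check. */
static unsigned long index_clamp_sketch(unsigned long index, unsigned long size)
{
	return index & index_mask_sketch(index, size);
}
```

A caller would still do the normal bounds check and then index the table with `index_clamp_sketch(idx, size)`. Whether a given compiler keeps this branch-free is not guaranteed by C itself, which is presumably why the real mitigation uses architecture-specific code.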
Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007
On Wed, Nov 8, 2017 at 9:19 PM, Fengguang Wu wrote:
>
> Yes it's accessing the list. Here is the faddr2line output.

Ok, so it's a corrupted timer list. Which is not a big surprise.

It's

        next->pprev = pprev;

in __hlist_del(), and the trapping instruction decodes as

        mov    %rdx,0x8(%rax)

with %rax having the value dead0200, which is just LIST_POISON2.

So we've deleted that entry twice - LIST_POISON2 is what hlist_del() sets pprev to after already deleting it once.

Although in this case it might not be hlist_del(), because detach_timer() also sets entry->next to LIST_POISON2.

Which is pretty bogus, we are supposed to use LIST_POISON1 for the "next" pointer. Oh well. Nobody cares, except for the list entry debugging code, which isn't run on the hlist cases.

Adding Thomas Gleixner to the cc. It should not be possible to delete the same timer twice.

Linus
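To make the double-delete reasoning above concrete, here is a small userspace sketch. The struct, the function names and the poison constant are all hypothetical stand-ins (the kernel's real LIST_POISON2 includes an architecture-specific offset), and only the poisoning of `next` is taken from the message. It shows that once `next` holds the poison value, a second unlink would store to poison + offsetof(pprev), which is the `0x8(%rax)` access in the decoded instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative poison value; the kernel's real constant is an assumption here. */
#define POISON2_SKETCH ((uintptr_t)0x200)

struct hnode_sketch {
	struct hnode_sketch *next;
	struct hnode_sketch **pprev;
};

/* Unlink n, mirroring the two stores in the quoted list code. */
static void hnode_unlink_sketch(struct hnode_sketch *n)
{
	struct hnode_sketch *next = n->next;
	struct hnode_sketch **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;	/* the store that traps on a second delete */
}

/* Unlink and poison "next", as the message says detach_timer() does. */
static void hnode_detach_sketch(struct hnode_sketch *n)
{
	hnode_unlink_sketch(n);
	n->next = (struct hnode_sketch *)POISON2_SKETCH;
}

/* Address a second unlink would write to: &next->pprev with next poisoned. */
static uintptr_t second_delete_fault_addr_sketch(const struct hnode_sketch *n)
{
	return (uintptr_t)n->next + offsetof(struct hnode_sketch, pprev);
}
```

The sketch deliberately computes the would-be faulting address instead of performing the second delete, since actually dereferencing the poison pointer is the crash being described.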
Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007
On Sun, Oct 29, 2017 at 4:48 PM, Fengguang Wu wrote:
>
> Here are 3 dmesgs related to wireless and 1 from ethernet.

Fengguang, these would be lovelier still _if_ you have DEBUG_INFO enabled on the kernel, and your script were to find things like "symbol+0xhex/0xhex", and run "./scripts/faddr2line" on them.

So

> [ 235.425464] BUG: unable to handle kernel paging request at 00010007
> [ 235.425470] IP: run_timer_softirq+0x13a/0x470

would also then have

        run_timer_softirq at timer.c:XYZ

which would make it easier to see exactly _what_ it is that faults. As it is, I think there's a fair amount of inlining that makes it hard to see the cause, but that faddr2line would make very obvious.

Finding that "symbol+xyz/abc" pattern should be fairly easy to automate, and should fit the 0day model fairly well. No?

In this case, the trapping instruction ends up decoding to

   0:   4c 8d 6c c5 90          lea    -0x70(%rbp,%rax,8),%r13
   5:   49 8b 45 00             mov    0x0(%r13),%rax
   9:   48 85 c0                test   %rax,%rax
   c:   74 de                   je     0xffec
   e:   4d 8b 7d 00             mov    0x0(%r13),%r15
  12:   4d 89 7c 24 08          mov    %r15,0x8(%r12)
  17:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  1c:   49 8b 07                mov    (%r15),%rax
  1f:   49 8b 57 08             mov    0x8(%r15),%rdx
  23:   48 85 c0                test   %rax,%rax
  26:   48 89 02                mov    %rax,(%rdx)
  29:   74 04                   je     0x2f
  2b:*  48 89 50 08             mov    %rdx,0x8(%rax)   <-- trapping instruction
  2f:   41 f6 47 2a 20          testb  $0x20,0x2a(%r15)
  34:   49 c7 47 08 00 00 00    movq   $0x0,0x8(%r15)

and %rax has the value 0x, so yes, it will trap at 0x10007.

It's not trivial to see just *what* access it is. I *think* that "testb $32" is checking for TIMER_IRQSAFE in expire_timers(), and that the oops is due to the list operations in detach_timer() (inlined).

Which doesn't really help: it looks like the timer lists are corrupt.

With some luck, some register state could have the timer function pointer in it, and we'd get a hint of *which* timer this is, but that doesn't look to be the case here either.

I'm not seeing anything to really help debug this here.

Linus
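As a rough illustration of how the "symbol+0xhex/0xhex" matching suggested above could be automated, here is a small C sketch. It is not part of the 0day tooling; the struct and function names are hypothetical, and it only finds the first such token on a line.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for one "symbol+0xoff/0xlen" match. */
struct sym_ref_sketch {
	char sym[128];
	unsigned long off;
	unsigned long len;
};

static int is_sym_char_sketch(int c)
{
	return isalnum(c) || c == '_' || c == '.';
}

/*
 * Scan a log line for the first "symbol+0xHEX/0xHEX" token.
 * Returns 1 and fills *out on success, 0 if the line has no such token.
 */
static int find_sym_ref_sketch(const char *line, struct sym_ref_sketch *out)
{
	const char *plus = line;

	while ((plus = strstr(plus, "+0x")) != NULL) {
		const char *start = plus;
		const char *lenstr;
		char *end;
		size_t n;

		/* walk back over the symbol name that precedes "+0x" */
		while (start > line && is_sym_char_sketch((unsigned char)start[-1]))
			start--;
		n = (size_t)(plus - start);

		/* parse the offset, then require "/0x" followed by the size */
		out->off = strtoul(plus + 3, &end, 16);
		if (n == 0 || n >= sizeof(out->sym) || end == plus + 3 ||
		    strncmp(end, "/0x", 3) != 0) {
			plus += 3;
			continue;
		}
		lenstr = end + 3;
		out->len = strtoul(lenstr, &end, 16);
		if (end == lenstr) {
			plus += 3;
			continue;
		}

		memcpy(out->sym, start, n);
		out->sym[n] = '\0';
		return 1;
	}
	return 0;
}
```

Each match could then be handed, in its original "symbol+0xoff/0xlen" form, to ./scripts/faddr2line together with a vmlinux built with DEBUG_INFO. How the 0day scripts would actually wire that up is left open here.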
Re: iwlwifi firmware load broken in current -git
On Fri, Sep 15, 2017 at 12:43 PM, Luca Coelho wrote: > On Fri, 2017-09-15 at 12:38 -0700, Linus Torvalds wrote: >> >> From some of the context it looks like commit 40f11adc7cd9 ("PCI: >> Avoid race while enabling upstream bridges"), is that correct? > > Yes, that's the one. And Bjorn already sent a revert: > > https://lkml.org/lkml/2017/9/15/46 Well, he may have sent a revert to lkml, but not to me. I'm assuming it's in his tree and I'll get a pull request. Hopefully soon, so that it makes rc. Jens, you were actually cc'd on that revert according to the email headers, so check your spam-box. Linus
Re: iwlwifi firmware load broken in current -git
On Fri, Sep 15, 2017 at 12:32 PM, Jens Axboe wrote: >> >> In any case, your patch introduces a regression on systems. Please get >> it reverted now, and then you can come up with a new approach to fix the >> double enable of the upstream bridge. > > Who's sending in the revert? I can certainly do it if no one else does, > but it needs to be done. > > I'm not seeing any patches coming out of Srinath to fix up the > situation, so we should revert the broken patch until a better solution > exists. Hmm. I don't have the history here (apparently it never made lkml, for example), so I don't even know which commit you're talking about. From some of the context it looks like commit 40f11adc7cd9 ("PCI: Avoid race while enabling upstream bridges"), is that correct? Linus
Re: [PATCH] iwlwifi: mvm: only send LEDS_CMD when the FW supports it
On Thu, Sep 7, 2017 at 5:39 AM, Kalle Valo wrote: > > Linus, do you want to apply this directly or should we take it via the > normal route (wireless-drivers -> net)? If your prefer the latter when > I'm planning to submit this to Dave in a day or two and expecting it to > get to your tree in about a week, depending of course what is Dave's > schedule. Since we have a workaround for the problem, let's just go through the regular channels. As long as I get the fix through David before the merge window closes, I'm happy. Linus
Re: [GIT] Networking
On Wed, Sep 6, 2017 at 10:40 PM, Luca Coelho wrote: > > This patch is not very important (unless you really like blinking lights > -- maybe I'll change my mind when the holidays approach :P). so it is > fine if you just want to revert it for now. > > In any case, I'll send a patch fixing this problem soon. No need to revert if we can get this fixed quickly enough. I'll leave the fw-31 on my laptop, so that I can continue to use it for now. Thanks, Linus
Re: [GIT] Networking
On Wed, Sep 6, 2017 at 9:11 PM, Coelho, Luciano wrote: > > This seems to be a problem with backwards-compatibility with FW version > 27. We are now in version 31[1] and upgrading will probably fix that. I can confirm that fw version 31 works. > But obviously the driver should not fail miserably like this with > version 27, because it claims to support it still. Not just "claims to support it", but if it's what is shipped with a fairly recent distro like an up-to-date version of F26, I would really hope that the driver can still work with it. > I'm looking into this now and will provide a fix asap. Thanks, Linus
Re: [GIT] Networking
On Wed, Sep 6, 2017 at 4:27 PM, Linus Torvalds wrote: > > The firmware is iwlwifi-8000C-28.ucode from > iwl7260-firmware-25.30.13.0-75.fc26.noarch, and the kernel reports > > iwlwifi :3a:00.0: loaded firmware version 27.455470.0 op_mode iwlmvm And when I said "iwlwifi-8000C-28.ucode" I obviously meant "iwlwifi-8000C-27.ucode". At least it was _hopefully_ obvious from that "27" in the actual version number it reports. Linus
Re: [GIT] Networking
This pull request completely breaks Intel wireless for me.

This is my trusty old XPS 13 (9350), using Intel Wireless 8260 (rev 3a). That remains a very standard Intel machine with absolutely zero odd things going on.

The firmware is iwlwifi-8000C-28.ucode from iwl7260-firmware-25.30.13.0-75.fc26.noarch, and the kernel reports

  iwlwifi :3a:00.0: loaded firmware version 27.455470.0 op_mode iwlmvm

the thing starts acting badly with this:

  iwlwifi :3a:00.0: FW Error notification: type 0x cmd_id 0x04
  iwlwifi :3a:00.0: FW Error notification: seq 0x service 0x0004
  iwlwifi :3a:00.0: FW Error notification: timestamp 0x5D84
  iwlwifi :3a:00.0: Microcode SW error detected. Restarting 0x200.
  iwlwifi :3a:00.0: Start IWL Error Log Dump:
  iwlwifi :3a:00.0: Status: 0x0100, count: 6
  iwlwifi :3a:00.0: Loaded firmware version: 27.455470.0
  iwlwifi :3a:00.0: 0x0038 | BAD_COMMAND
  iwlwifi :3a:00.0: 0x00A002F0 | trm_hw_status0
  ...
  iwlwifi :3a:00.0: 0x | isr status reg
  ieee80211 phy0: Hardware restart was requested
  iwlwifi :3a:00.0: FW error in SYNC CMD MAC_CONTEXT_CMD
  CPU: 2 PID: 993 Comm: NetworkManager Not tainted 4.13.0-06466-g80cee03bf1d6 #4
  Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
  Call Trace:
   dump_stack+0x4d/0x70
   iwl_trans_pcie_send_hcmd+0x4e7/0x530 [iwlwifi]
   ? wait_woken+0x80/0x80
   iwl_trans_send_cmd+0x5c/0xc0 [iwlwifi]
   iwl_mvm_send_cmd+0x32/0x90 [iwlmvm]
   iwl_mvm_send_cmd_pdu+0x58/0x80 [iwlmvm]
   iwl_mvm_mac_ctxt_send_cmd+0x2a/0x60 [iwlmvm]
   ? iwl_mvm_mac_ctxt_send_cmd+0x2a/0x60 [iwlmvm]
   iwl_mvm_mac_ctxt_cmd_sta+0x140/0x1e0 [iwlmvm]
   iwl_mvm_mac_ctx_send+0x2d/0x60 [iwlmvm]
   iwl_mvm_mac_ctxt_add+0x43/0xc0 [iwlmvm]
   iwl_mvm_mac_add_interface+0x139/0x2b0 [iwlmvm]
   ? iwl_led_brightness_set+0x1f/0x30 [iwlmvm]
   drv_add_interface+0x4a/0x120 [mac80211]
   ieee80211_do_open+0x33d/0x820 [mac80211]
   ieee80211_open+0x52/0x60 [mac80211]
   __dev_open+0xae/0x120
   __dev_change_flags+0x17b/0x1c0
   dev_change_flags+0x29/0x60
   do_setlink+0x2f7/0xe60
   ? __nla_put+0x20/0x30
   ? _raw_read_unlock_bh+0x20/0x30
   ? inet6_fill_ifla6_attrs+0x4be/0x4e0
   ? __kmalloc_node_track_caller+0x35/0x2b0
   ? nla_parse+0x35/0x100
   rtnl_newlink+0x5d2/0x8f0
   ? __netlink_sendskb+0x3b/0x60
   ? security_capset+0x40/0x80
   ? ns_capable_common+0x68/0x80
   ? ns_capable+0x13/0x20
   rtnetlink_rcv_msg+0x1f9/0x280
   ? rtnl_calcit.isra.26+0x110/0x110
   netlink_rcv_skb+0x8e/0x130
   rtnetlink_rcv+0x15/0x20
   netlink_unicast+0x18b/0x220
   netlink_sendmsg+0x2ad/0x3a0
   sock_sendmsg+0x38/0x50
   ___sys_sendmsg+0x269/0x2c0
   ? addrconf_sysctl_forward+0x114/0x280
   ? dev_forward_change+0x140/0x140
   ? sysctl_head_finish.part.22+0x32/0x40
   ? lockref_put_or_lock+0x5e/0x80
   ? dput.part.22+0x13e/0x1c0
   ? mntput+0x24/0x40
   __sys_sendmsg+0x54/0x90
   ? __sys_sendmsg+0x54/0x90
   SyS_sendmsg+0x12/0x20
   entry_SYSCALL_64_fastpath+0x13/0x94
  RIP: 0033:0x7ff1f9933134
  RSP: 002b:7ffe7419b460 EFLAGS: 0293 ORIG_RAX: 002e
  RAX: ffda RBX: 55604b6d82b9 RCX: 7ff1f9933134
  RDX: RSI: 7ffe7419b4b0 RDI: 0007
  RBP: 7ffe7419b940 R08: R09: 55604d16b400
  R10: 7ff1f7cf8b38 R11: 0293 R12: 0001
  R13: 0001 R14: 7ffe7419b670 R15: 55604b9515a0
  iwlwifi :3a:00.0: Failed to send MAC context (action:1): -5

and it doesn't get any better from there. The next error seems to be

  Timeout waiting for hardware access (CSR_GP_CNTRL 0x0808)
  [ cut here ]
  WARNING: CPU: 3 PID: 1075 at drivers/net/wireless/intel/iwlwifi/pcie/trans.c:1874 iwl_trans_pcie_grab_nic_access+0xdf/0xf0 [iwlwifi]

and it will continue with those microcode failure errors and various other warnings about how nothing is working.

And no, nothing works. A lot of log output, no actual network access..

Linus
Re: [PATCH V2] brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
On Fri, Jul 7, 2017 at 1:09 PM, Arend van Spriel wrote: > Now I signed off on the patch although formally I suppose Linus should > sign it off. You can certainly consider it Signed-off-by: Linus Torvalds but I really don't need the authorship (or resulting sign-off requirement) because multiple people ended up sending in very similar patches. All the real work was in actually finding the issue. Linus
Re: [PATCH] brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
On Fri, Jul 7, 2017 at 6:17 AM, Johannes Berg wrote: > > Linus, since you were involved already, will you apply this directly? I don't think it's _that_ urgent, since it's specific to one particular driver anyway. I'd suggest just going through the normal channels, and be cc'd to netdev. > I guess it should also have a Cc: stable tag, and perhaps a Fixes? The fixes tag would be Fixes: 18e2f61db3b70 ("brcmfmac: P2P action frame tx.") which is 3.9 in case anybody cares. I assume that didn't get backported any further. Linus
Re: [PATCH] brcmfmac: buffer overflow in brcmf_cfg80211_mgmt_tx()
On Thu, Jul 6, 2017 at 10:11 AM, Arend van Spriel wrote: > > Looks fine to me so ... I really think that if we can't trust 'len', then we have to check against the lower bound of DOT11_MGMT_HDR_LEN too, because otherwise we'll just have a big 16-bit number instead. And we should do that brcmf_err() that I had in my version, which also lets people know they are being attacked. Linus
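The "big 16-bit number" point can be shown with a small standalone sketch. This is not the driver code: the function name is hypothetical, and both constants are assumed values chosen for illustration, so check the brcmfmac headers for the real ones. It shows a length check with both a lower and an upper bound before the header size is subtracted.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the message. */
#define MGMT_HDR_LEN_SKETCH	24u
#define ACTION_FRAME_MAX_SKETCH	1800u

/*
 * Validate an untrusted frame length before subtracting the header size.
 * Returns 0 and stores the payload length on success, -1 otherwise.
 */
static int mgmt_payload_len_sketch(size_t len, uint16_t *payload_len)
{
	if (len < MGMT_HDR_LEN_SKETCH)
		return -1;	/* len - header would wrap to a huge 16-bit value */
	if (len > MGMT_HDR_LEN_SKETCH + ACTION_FRAME_MAX_SKETCH)
		return -1;	/* payload would not fit the fixed-size buffer */

	*payload_len = (uint16_t)(len - MGMT_HDR_LEN_SKETCH);
	return 0;
}
```

Without the first check, a length of 10 would turn into (uint16_t)(10 - 24), which is 65522. A real driver would presumably also log the rejection, which is the brcmf_err() call mentioned above.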
Re: [PATCH] cfg80211: make RATE_INFO_BW_20 the default
On Thu, May 4, 2017 at 8:22 AM, David Miller wrote: > From: Johannes Berg >> >> I figured I'd give Linus to a chance to try or even apply it, but I >> have no objection to you applying it either. I don't have anything else >> yet right now, and sending a pull request for just a single patch >> would be quite pointless. > > Ok, let's give Linus a chance to test the patch. I'm having trouble recreating the warning. I have no idea why. It only happened during ten minutes yesterday, and nothing in my wireless setup has changed. I wonder if *normally* my setup ends up connecting with a 40MHz band or something, and I just happened to see the default uninitialized case once. I see that Jens reported that the patch works, although I'm wondering how repeatable it was for him. The patch obviously looks simple and seems like an obviously GoodThing(tm) regardless. Linus
new warning at net/wireless/util.c:1236
So my Dell XPS 13 seems to have grown a new warning as of the networking merge yesterday.

Things still work, but when it starts warning, it generates a *lot* of noise (I got 36 of these within about ten minutes).

I have no idea what triggered it, because when I rebooted (not because of this issue, but just to reboot into a newer kernel) I don't see it again.

This is all pretty regular wireless - it's intel 8260 wireless in a fairly normal laptop.

Things still seem to *work* ok, so the only problem here is the overly verbose and useless WARN_ON. It doesn't even print out *which* rate it is warning about, it just does that stupid unconditional WARN_ON() without ever shutting up about it..

The WARN_ON() seems to be old, but my logs don't seem to have any mention of this until today, so there's something that has changed that now triggers it.

Ideas?

Linus

---

  WARNING: CPU: 3 PID: 1138 at net/wireless/util.c:1236 cfg80211_calculate_bitrate+0x139/0x170 [cfg80211]
  Modules linked in: rfcomm fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_security ip6table_mangle ip6table_nat nf_con snd_hda_codec iwlmvm irqbypass snd_hwdep snd_hda_core intel_cstate mac80211 snd_seq intel_rapl_perf snd_seq_device snd_pcm iwlwifi rtsx_pci_ms snd_timer cfg80211 memstick snd soundcore i2c_i801 shpchp i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_hid video
  CPU: 3 PID: 1138 Comm: NetworkManager Tainted: GW 4.11.0-06543-g2f34c1231bfc #60
  Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.13 12/28/2016
  task: 9c1d1bfcbb80 task.stack: bb95c337c000
  RIP: 0010:cfg80211_calculate_bitrate+0x139/0x170 [cfg80211]
  RSP: 0018:bb95c337f5b8 EFLAGS: 00010293
  RAX: RBX: 9c1cb080cc00 RCX:
  RDX: 0005 RSI: 0002 RDI: bb95c337f76e
  RBP: bb95c337f5b8 R08: 0004 R09: 9c1cc36fe0c4
  R10: f7c00472 R11: c0682000 R12: 9c1cc36fe0c0
  R13: bb95c337f76e R14: R15: 9c1cc36fe030
  FS: 7f15f980() GS:9c1d3ed8() knlGS:
  CS: 0010 DS: ES: CR0: 80050033
  CR2: 7ffd1ae62748 CR3: 000469f4f000 CR4: 003406e0
  Call Trace:
   nl80211_put_sta_rate+0x56/0x210 [cfg80211]
   nl80211_send_station.isra.63+0x639/0xd60 [cfg80211]
   nl80211_get_station+0x1e4/0x250 [cfg80211]
   genl_family_rcv_msg+0x1fa/0x3e0
   genl_rcv_msg+0x4c/0x90
   netlink_rcv_skb+0xde/0x110
   genl_rcv+0x28/0x40
   netlink_unicast+0x189/0x220
   netlink_sendmsg+0x2ba/0x3b0
   sock_sendmsg+0x38/0x50
   ___sys_sendmsg+0x2b6/0x2d0
   __sys_sendmsg+0x54/0x90
   SyS_sendmsg+0x12/0x20
   entry_SYSCALL_64_fastpath+0x13/0x94
  RIP: 0033:0x7efffebf82fd
  RSP: 002b:7fff97d4af30 EFLAGS: 0293 ORIG_RAX: 002e
  RAX: ffda RBX: RCX: 7efffebf82fd
  RDX: RSI: 7fff97d4afc0 RDI: 0010
  RBP: 7fff97d4b070 R08: R09: 7efffcc7b168
  R10: 55eb69e6d110 R11: 0293 R12: 7fff97d4b0e0
  R13: 0001 R14: R15: 55eb68a32760
  Code: 89 d0 f7 e1 d1 ea 8d 14 92 01 d2 81 c2 50 c3 00 00 b9 c5 5a 7c 0a c1 ea 05 89 d0 f7 e1 5d 89 d0 c1 e8 07 c3 31 c0 80 f9 02 74 b7 <0f> ff 31 c0 eb b1 0f ff 31 c0 5d c3 0f ff 31 c0 5d c3 8d 04 40
Re: ath10k regression on XPS13
On Tue, Feb 21, 2017 at 10:18 AM, David Miller wrote: > > Kalle I really wanted to send my net-next pull request to Linus later > today. But I guess I have to wait for this ath10k first. Feel free to send it to me - it sounds like the regression is (a) easy to work around and (b) has a fix coming up. And it won't even be something that I personally notice, since I have the prev-gen XPS13 that has intel wireless. Linus
Re: [RFC (v7)] add basic register-field manipulation macros
On Thu, Aug 18, 2016 at 10:11 AM, Jakub Kicinski wrote: > Hi! > > This is what I came up with. Changes: I can live with this, certainly. I'm not really sure how many drivers (or perhaps core code, for that matter) will actually start using it, but it at least _looks_ like a usable interface that seems to be quite resistant to people doing stupid things with it that would result in surprising results (either performance or semantics). So I'm ok with something like this coming through (for example) the wireless tree if the drivers there are the first ones to start using this. Let's see if anybody else objects. Linus
Re: [PATCHv6 1/2] add basic register-field manipulation macros
On Wed, Aug 17, 2016 at 10:11 AM, Jakub Kicinski wrote: > On Wed, 17 Aug 2016 09:33:26 -0700, Linus Torvalds wrote: >> >> I'm not a huge fan, since the interface fundamentally seems to be >> oddly designed (why pass in the mask rather than "start bit + >> length"?). > > Would that not require start and length to have separate defines? Yeah. > I assume doing: > > #define REG_BLA_FIELD_FOO 3, 4 > val = FIELD_GET(REG_BLA_FIELD_FOO, reg); > > is not acceptable. Attempts to define a single value brought us to the > shifted mask. Agreed. Maybe the mask with the complexity to then undo it (at compile time) is the better approach. Linus
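To illustrate the trade-off being discussed (a single shifted mask versus a separate start bit and length), here is a minimal sketch of converting between the two forms. The helper names are hypothetical and not part of the proposed patch; the reverse direction relies on GCC/Clang builtins.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build a shifted mask from a start bit and a length.
 * Valid for len >= 1 and start + len <= 64.
 */
static inline uint64_t field_mask_sketch(unsigned start, unsigned len)
{
	return (~UINT64_C(0) >> (64 - len)) << start;
}

/* Undo it: recover the start bit from a contiguous, non-zero mask. */
static inline unsigned field_start_sketch(uint64_t mask)
{
	return (unsigned)__builtin_ctzll(mask);	/* GCC/Clang builtin */
}

/* Recover the field length from a contiguous mask. */
static inline unsigned field_len_sketch(uint64_t mask)
{
	return (unsigned)__builtin_popcountll(mask);
}
```

With the "3, 4" example from the quoted text, the single define would be the mask 0x78, and both the start bit and the length can be recovered from it when the mask is a compile-time constant.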
Re: [PATCHv6 1/2] add basic register-field manipulation macros
On Wed, Aug 17, 2016 at 3:31 AM, Kalle Valo wrote:
>
> Are people ok with this? I think they are useful and I can take these
> through my tree, but I would prefer to get an ack from other maintainers
> first. Dave? Andrew?

I'm not a huge fan, since the interface fundamentally seems to be oddly designed (why pass in the mask rather than "start bit + length"?).

I also don't like how this very much would match the GENMASK() macros we have, but then it clashes with them in other ways

 - it's in a different header file

 - it has completely different naming (GENMASK_ULL vs FIELD_GET64).

I actually think the naming could/should be fixed by just automatically doing the right thing based on sizes. For example, GENMASK could just have something like

  #define GENMASK(end,start) __builtin_choose_expr((end)>31, __GENMASK_64(end,start), __GENMASK_32(end,start))

and doing similar things with the FIELD_GET/SET ops.

So I'm not entirely happy about this all. But if people love the interface, and think the above kind of cleanups might be possible, I'd just want to make sure that there is also a

  BUILD_BUG_ON(!__builtin_constant_p(_mask));

because if the mask isn't a build-time constant, the code ends up being *complete* shit.

Also preferably something like

  BUILD_BUG_ON((_mask) > (typeof(_val))~0ull);

to make sure you can't try to mask bits that don't exist in the value. Or something like that to make mis-uses *really* obvious.

The FIELD_PUT macro also seems misnamed. It doesn't "put" anything (unlike the GET macro). It just prepares the field for inserting later. As exemplified by how you actually have to put things:

First, "GET" - yeah, that looks like a "get" operation:

 * Get:
 *   a = FIELD_GET(REG_FIELD_A, reg);

But then "PUT" isn't actually a PUT operation at all, but the comments kind of gloss over it by talking about "Modify" instead:

 * Modify:
 *   reg &= ~REG_FIELD_C;
 *   reg |= FIELD_PUT(REG_FIELD_C, c);

so I'm not entirely sure about the naming.

I can live with the FIELD_PUT naming, because I see how it comes about, even if I think it's a bit odd. I might have called it "FIELD_PREP" instead, but I'm not really sure that's all that much better.

Am I being a bit anal? Yeah. But when we add whole new abstractions that we haven't used historically, I'd really like those to be obvious and easy to use (or rather, really _hard_ to get wrong by mistake).

Hmm?

Linus
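The get/prepare distinction discussed in this message can be sketched in a few lines of userspace C. These macros are illustrative stand-ins with hypothetical names, not the macros from the patch. They assume a contiguous, non-zero, compile-time-constant mask and a GCC/Clang compiler, and they use the "PREP" name only because the message floats it.

```c
#include <stdint.h>

/* Shift count of a contiguous, non-zero mask (GCC/Clang builtin). */
#define FIELD_SHIFT_SKETCH(mask)	__builtin_ctzll(mask)

/* Extract the field selected by mask from reg, shifted down to bit 0. */
#define FIELD_GET_SKETCH(mask, reg) \
	((uint64_t)(((reg) & (mask)) >> FIELD_SHIFT_SKETCH(mask)))

/* Shift val into position; the caller still has to merge it into reg. */
#define FIELD_PREP_SKETCH(mask, val) \
	((uint64_t)(((uint64_t)(val) << FIELD_SHIFT_SKETCH(mask)) & (mask)))

#define REG_FIELD_C_SKETCH	0x78u	/* hypothetical field: bits 3..6 */

/* The "Modify" sequence from the quoted comment, wrapped in a function. */
static inline uint64_t reg_set_field_c_sketch(uint64_t reg, uint64_t c)
{
	reg &= ~(uint64_t)REG_FIELD_C_SKETCH;
	reg |= FIELD_PREP_SKETCH(REG_FIELD_C_SKETCH, c);
	return reg;
}
```

Note that the "prep" macro on its own never touches the register, which is the naming objection above: the actual put is the clear-and-OR pair in the wrapper.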
Re: [PATCH] remove lots of IS_ERR_VALUE abuses
On Fri, May 27, 2016 at 2:23 PM, Arnd Bergmann wrote: > > This patch changes all users of IS_ERR_VALUE() that I could find > on 32-bit ARM randconfig builds and x86 allmodconfig. For the > moment, this doesn't change the definition of IS_ERR_VALUE() > because there are probably still architecture specific users > elsewhere. Patch applied with the fixups from Al Viro edited in. I also ended up removing a few other users (due to the vm_brk() interface), and then made IS_ERR_VALUE() do the "void *" cast so that integer use of a non-pointer size should now complain. It works for me and has no new warnings in my allmodconfig build, and with your ARM work that is presumably clean too. But other architectures may see new warnings. People who got affected by this should check their subsystem code for the changes. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove lots of IS_ERR_VALUE abuses
On Fri, May 27, 2016 at 2:46 PM, Andrew Morton wrote: > > So you do plan to add some sort of typechecking into IS_ERR_VALUE()? The easiest way to do it is to just turn the (x) into (unsigned long)(void *)(x), which then complains about casting an integer to a pointer if the integer has the wrong size. But if we get rid of the bogus cases, there's just a few left, and we should probably just rename the whole thing (the initial double underscore). It really isn't something normal people should use. Linus
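The cast trick described above can be tried out in userspace with a sketch like the following. The macro names are hypothetical, and the 4095 limit is an assumption mirroring the kernel's MAX_ERRNO; the kernel's actual definition may differ in details.

```c
/* Assumed to mirror the kernel's MAX_ERRNO; treat the value as illustrative. */
#define MAX_ERRNO_SKETCH 4095

/*
 * The extra (void *) step is the type check: passing an integer that is
 * narrower than a pointer makes the compiler warn about an
 * integer-to-pointer cast of different size.
 */
#define IS_ERR_VALUE_SKETCH(x) \
	((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO_SKETCH)
```

On an LP64 target, `IS_ERR_VALUE_SKETCH((long)-12)` compiles cleanly, while passing a plain `int` should draw a warning from GCC or Clang, which is the complaint the message is after.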
Re: [GIT] Networking
On Wed, May 18, 2016 at 11:58 AM, Kalle Valo wrote: > > It would be best if you could send a patch either directly to Dave or > Linus to resolve this quickly. I'm committing my patch myself right now, since this bug makes my laptop useless, and I will take credit for finding and testing it on my own even if it was apparently also discussed independently on the networking list ;) Linus
Re: [GIT] Networking
On Wed, May 18, 2016 at 11:45 AM, Linus Torvalds wrote:
>
> From what I can tell, there's a merge bug in commit 909b27f70643,
> where David seems to have lost some of the changes to
> iwl_mvm_set_tx_cmd().
>
> I do not know if that's the reason for the problem I see. But I will test.

Yes. The attached patch that fixes the incorrect merge seems to fix things for me.

That should mean that the assumption that this problem existed in v4.6 too was wrong, because the incorrect merge came in later. I think Luciano mis-understood "v4.6+" to mean plain v4.6.

Reinoud Koornstra, does this patch fix things for you too?

Linus

 drivers/net/wireless/intel/iwlwifi/mvm/tx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
index 880210917a6f..c53aa0f220e0 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
@@ -211,6 +211,7 @@ void iwl_mvm_set_tx_cmd(struct iwl_mvm *mvm, struct sk_buff *skb,
 			struct iwl_tx_cmd *tx_cmd,
 			struct ieee80211_tx_info *info, u8 sta_id)
 {
+	struct ieee80211_tx_info *skb_info = IEEE80211_SKB_CB(skb);
 	struct ieee80211_hdr *hdr = (void *)skb->data;
 	__le16 fc = hdr->frame_control;
 	u32 tx_flags = le32_to_cpu(tx_cmd->tx_flags);
@@ -294,7 +295,7 @@ void iwl_mvm_set_tx_cmd(struct iwl_mvm *mvm, struct sk_buff *skb,
 	tx_cmd->tx_flags = cpu_to_le32(tx_flags);
 	/* Total # bytes to be transmitted */
 	tx_cmd->len = cpu_to_le16((u16)skb->len +
-		(uintptr_t)info->driver_data[0]);
+		(uintptr_t)skb_info->driver_data[0]);
 	tx_cmd->life_time = cpu_to_le32(TX_CMD_LIFE_TIME_INFINITE);
 	tx_cmd->sta_id = sta_id;
Re: [GIT] Networking
On Wed, May 18, 2016 at 7:23 AM, Coelho, Luciano wrote: > > I can confirm that 4.6 contains the same bug. And reverting the patch > I mentioned does solve the problem... > > The same patch works fine in our internal tree. I'll have to figure > out together with Emmanuel what the problem actually is. Hmm. From what I can tell, there's a merge bug in commit 909b27f70643, where David seems to have lost some of the changes to iwl_mvm_set_tx_cmd(). The reason seems to be a conflict with d8fe484470dd, where David missed the fact that "info->driver_data[0]" had become "skb_info->driver_data[0]", and then he removed the skb_info because it was unused. I do not know if that's the reason for the problem I see. But I will test. David, do you happen to recall that merge conflict? I think you must have removed that "skb_info" variable declaration and initialization manually (due to the "unused variable" warning, which in turn was due to the incorrect merge of the actual conflict), because I think git would have merged that line into the result. Linus
Re: [GIT] Networking
On Tue, May 17, 2016 at 12:11 PM, David Miller wrote: > > Highlights: Lowlights: 1) the iwlwifi driver seems to be broken My laptop that uses the intel 7680 iwlwifi module no longer connects to the network. It fails with a "Microcode SW error detected." and spews out register state over and over again. The last thing it says before falling over is: wlp1s0: authenticate with xx:xx:xx:xx:xx:xx wlp1s0: send auth to xx:xx:xx:xx:xx:xx (try 1/3) wlp1s0: send auth to xx:xx:xx:xx:xx:xx (try 2/3) and then it goes all titsup. I thought that it might be because I had downloaded one of the daily firmware versions (it calls itself iwlwifi-7260-17.ucode, but isn't a real release afaik - but it has worked fine for me before), but the problem persists with the ver-16 ucode too, so that wasn't it. I haven't bisected it, but there is absolutely nothing odd in my hardware. I do have a 802.11ac network, which apparently not everybody does, judging by previous bug-reports of mine.. Intel iwlwifi people: please check this out. Linus
iwlwifi incomplete initialization in Linux 4.5?
So I upgraded the firmware on my Intel NUC (NUC6i3SYK), and that made the wireless no longer work with a 4.5 kernel. I could get the occasional packets through, but not many, and it would hang for ten seconds at a time, and then output errors like iwlwifi :01:00.0: Queue 2 stuck for 1 ms. iwlwifi :01:00.0: Current SW read_ptr 60 write_ptr 93 .. which was odd, because that kernel had worked fine before. I booted between two different kernels, going back to an older 4.5-rc3 one that had been running a lot longer on that machine, because initially I thought that this was some recent kernel failure (I didn't initially connect it with the firmware upgrade, because this is my kids machine and I hadn't tested networking after the firmware update). But that older known-good kernel failed the same way. Going all the way back to the 4.4 kernel that Fedora uses made wireless work, and then rebooting back into a 4.5 kernel also worked. Now, it's *possible* that it was just something odd and transient and it just happened to clear up as I rebooted into the Fedora kernel, but it feels more likely that there's some incomplete initialization in recent 4.5 kernels, which isn't normally noticeable, but the full system reset done as part of the firmware upgrade might have shown it. I'm attaching all the iwlwifi debug output that goes along with the stuck queue, in the hopes that it makes sense to somebody. This is from the 4.5-rc3 boot into an older kernel, but final 4.5 showed the same behavior. Googling iwlwifi stuck queues shows a lot of reports over the years, but it might be a common symptom of "something is screwed up". I'm not sure I can reproduce it any more now that it works again (and I'm not really willing to force a firmware downgrade), but if there is something particular to test, I can do that. Ideas? Linus celeste-wifi-problem Description: Binary data
Re: iwlwifi incomplete initialization in Linux 4.5?
On Wed, Mar 16, 2016 at 2:23 PM, Linus Torvalds wrote: > >> Do you use 20Mhz or 40MHz? > > HT20 on 2.4GHz, HT40 on 5GHz. > > At least that's the wireless AP setup. > >> Basically, I'd like to see the output of iw dev > > I'll have to walk over and check. I don't have my machines set up so > you can get into them over the network.. Hmm. "iw dev" seems to say that device is using the 2.4Ghz side, at least in the working configuration. phy#0 Unnamed/non-netdev interface wdev 0x2 addr a4:34:d9:0e:20:d7 type P2P-device Interface wlp1s0 ifindex 3 wdev 0x1 addr a4:34:d9:0e:20:d6 type managed channel 1 (2412 MHz), width: 20 MHz, center1: 2412 MHz I have no idea why it wouldn't connect to the 5GHz network, but it might just be far enough away (a couple of walls, not so much distance) that it is borderline. Both networks have the same essid and password, maybe I should add a separate 5GHz network to make it easier to say "connect to that one" for testing, in case the trouble happens with the 5GHz side only. Linus
Re: iwlwifi incomplete initialization in Linux 4.5?
On Wed, Mar 16, 2016 at 2:13 PM, Grumbach, Emmanuel wrote: > > This ... typically means that the firmware got stuck while sending > packets. Can you tell me on what band your router operates? 2.4GHz or > 5.2GHz? Both. > Do you use 20Mhz or 40MHz? HT20 on 2.4GHz, HT40 on 5GHz. At least that's the wireless AP setup. > Basically, I'd like to see the output of iw dev I'll have to walk over and check. I don't have my machines set up so you can get into them over the network.. > Hmm, this is strange since 4.4 and 4.5 will both load -16.ucode which > you seemed to be running when the have the Queue hang message. Correct. Both cases used the 16 ucode, since that's what F22 comes with. I upgrade the kernel, but intentionally don't touch anything else in the system. Linus
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Fri, Jan 29, 2016 at 11:42 AM, Larry Finger wrote:
> > Thanks for testing.
> > Upon reflection, it really should check the other WIRELESS_MODE_AC_x bits.
> Johannes' patch was indeed correct.

I just retested with this incremental (and whitespace-damaged) patch:

@@ -139,7 +139,9 @@ static void _rtl_rc_rate_set_series(struct rtl_priv *rtlpriv,
 (wireless_mode == WIRELESS_MODE_N_24G)))
 rate->flags |= IEEE80211_TX_RC_MCS;
 if (sta && sta->vht_cap.vht_supported &&
- (wireless_mode == WIRELESS_MODE_AC_5G))
+ ((wireless_mode == WIRELESS_MODE_AC_5G) ||
+(wireless_mode == WIRELESS_MODE_AC_24G) ||
+(wireless_mode == WIRELESS_MODE_AC_ONLY)))
 rate->flags |= IEEE80211_TX_RC_VHT_MCS;
 }
 }

which brings it in line with Johannes' patch, and it does indeed still work.

I think marking it for stable is also the right thing to do - the driver clearly doesn't work well in a wide-channel AC environment otherwise, and I assume it's going to be more and more common..

Linus
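For readers who want the patched condition without the whitespace damage, here is a minimal standalone sketch of the check. It is not the driver code: `struct mock_sta`, `rc_mcs_flags()` and the numeric enum values are stand-ins made up for illustration, and only the identifier names and the shape of the two `if` tests come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock values for illustration only; the real definitions live in the
 * rtlwifi and mac80211 headers and may differ. */
enum wireless_mode {
	WIRELESS_MODE_N_24G   = 0x10,
	WIRELESS_MODE_N_5G    = 0x20,
	WIRELESS_MODE_AC_5G   = 0x40,
	WIRELESS_MODE_AC_24G  = 0x80,
	WIRELESS_MODE_AC_ONLY = 0x100,
};

#define IEEE80211_TX_RC_MCS      (1u << 3)
#define IEEE80211_TX_RC_VHT_MCS  (1u << 8)

/* Hypothetical stand-in for the station capabilities the driver looks at. */
struct mock_sta {
	bool ht_supported;
	bool vht_supported;
};

/* Returns the MCS-related rate flags for a station and wireless mode,
 * following the condition in the incremental patch quoted above. */
static uint16_t rc_mcs_flags(const struct mock_sta *sta, enum wireless_mode mode)
{
	uint16_t flags = 0;

	/* HT: unchanged by the patch */
	if (sta && sta->ht_supported &&
	    (mode == WIRELESS_MODE_N_5G || mode == WIRELESS_MODE_N_24G))
		flags |= IEEE80211_TX_RC_MCS;

	/* VHT: now accepts all three AC modes, not just AC_5G */
	if (sta && sta->vht_supported &&
	    (mode == WIRELESS_MODE_AC_5G ||
	     mode == WIRELESS_MODE_AC_24G ||
	     mode == WIRELESS_MODE_AC_ONLY))
		flags |= IEEE80211_TX_RC_VHT_MCS;

	return flags;
}
```

With a VHT-capable mock station, `rc_mcs_flags(&sta, WIRELESS_MODE_AC_24G)` yields the VHT flag here, whereas a check against `WIRELESS_MODE_AC_5G` alone would have returned 0 for that mode.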
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Fri, Jan 29, 2016 at 9:54 AM, Larry Finger wrote: > > The test patch that Johannes sent earlier was close. The section needed to > add VHT rates is: Hmm. This looks pretty much exactly like what I already tried (I had fixed Johannes' patch to use "vht_cap" already, since it didn't compile otherwise). So the only difference is that it only checks WIRELESS_MODE_AC_5G. But it worked for me this time. I have no idea why. Maybe Johannes' patch actually always worked for me, but I just had a transient problem that made me think it didn't. I think I only booted it once, and went "oh, ok, no network, that didn't work". Linus
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Thu, Jan 28, 2016 at 5:54 PM, Larry Finger wrote: > > I have been running an RTL8821AE since kernel 3.18 without hitting this > problem using a TRENDnet AC1750 dual-band AP. The UniFi may be doing > something that the driver is not expecting. I've had issues with unifi ap's before, but to be honest, I've had issues with lots of hotel and airport wifi too. I don't think the Unifi APs are outside of the normal spectrum.. > Attached is a minimal patch that comments out the "vht_cap->vht_supported = > true;" statement for both RTL8821AE and RTL8812AE in > _rtl_init_hw_vht_capab(). Does that allow your system to work? That works too, yes. > The patch > also logs some information regarding the channelplan and the country code. > Please let me know the values for those. rtlwifi: channelplan 127 rtlwifi: country code 13 > I apparently missed a previous complaint about this issue. If you still have > the reference, please send it to me. So googling for similar issues, I found https://bugzilla.redhat.com/show_bug.cgi?id=1168467 https://bugzilla.redhat.com/show_bug.cgi?id=1293136 where that second one in particular looks very much like my issue ("Association succeeds, and ARP/DHCP work, but no IP frames can be transmitted"). (In both cases you have to go into the dmesg attachment to see that it's rtlwifi.) And there's an ubuntuforum thread http://ubuntuforums.org/showthread.php?t=2226009&page=2 where, if you follow the thing, it's an rtl chip on a PCI card, and it has very similar "connected but no internet" behavior, along with the "net/mac80211/rate.c:526" warning (different line numbers, different kernel version, but it smells similar). Or this one: http://forums.debian.net/viewtopic.php?f=5&t=111781 which is also rtl-wifi, and also has the "associated, connected, got an IP, but no data, not even a ping" behavior. It also has the warning, but it looks different in other ways (2.4GHz only and actually says it's not doing HT/VHT). So I don't know. The warning in net/mac80211/rate.c does seem to be associated with the realtek driver. Linus
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Thu, Jan 28, 2016 at 2:12 PM, Johannes Berg wrote: > > Your best workaround may just be to ignore VHT for now - clearly it's > broken so using "just" HT (which is likely not that much of a penalty > anyway since you're apparently not using 80 MHz) will be much better. > > Go into > > _rtl_init_hw_vht_capab() > > and just remove or stub out the entire contents of that (or you could > just remove the "vht_supported=true" if you feel like it.) > > That should get it to HT only, which is likely tested and working > better. Bingo. That indeed gets me working wireless. It's not super-fast, but I don't think it ever has been.. If somebody has a suggested patch to actually *fix* VHT on this chipset, that would obviously be better. And maybe it works on some other chipsets, but not on mine. I'll happily test patches now that the merge window is over and I have some time again (and I can also make my AP do 80MHz channels if that matters, although as Johannes noted it's not enabled by default). For the realtek driver people, here is what lspci says: 02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter Subsystem: AzureWave Device 2161 Kernel driver in use: rtl8821ae (Numeric PCI ID: 10ec:8821, subsystem 1a3b:2161) Thanks, Linus
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Thu, Jan 28, 2016 at 1:44 PM, Linus Torvalds wrote: > > I will try Johannes' suggestion on that machine to see if it makes a > difference Well, it "makes a difference" in the sense that the warning goes away. But it doesn't make things work. In fact, it might be making things worse. Because with that patch, the wireless still authenticates and associates, but then it doesn't even get an IP address, so now even dhcp doesn't work. Of course, I was surprised that it worked last time, and I'm not 100% sure it did work consistently. I'll re-test without the patch, just to make sure, but it doesn't really seem to improve on anything. Linus
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
Adding the RTL people to the cc, and leaving the whole thing quoted at the bottom.. I will try Johannes' suggestion on that machine to see if it makes a difference, but somebody who knows the rtlwifi rate control code should take a double- or triple-look at this. Please? Some googling shows that this is not a new issue. Or at least I seem to find reports that look very much like this from over a year ago. Linus On Thu, Jan 28, 2016 at 12:40 PM, Johannes Berg wrote: > On Thu, 2016-01-28 at 11:01 -0800, Linus Torvalds wrote: >> >> I used to have the basic original UniFi UAP. I've replaced them with >> the newer "AC Lite" version: >> >> https://www.ubnt.com/unifi/unifi-ap-ac-lite/ >> >> so it's a fairly big jump from a 2.4GHz-only network to a dual-band >> one. >> >> The old 2.4GHz-only AP's showed the problem with minstrel-ht >> incorrectly starting off at the highest rate (on a totally different >> machine). So the Unifi AP's have shown problems in the kernel >> wireless before, but so far it's always been the fault of the kernel >> wireless, not the AP. > > Yeah; I wasn't trying to blame it on this change, I was just trying to > understand the change in the environment. Seems likely that it's simply > the switch to 5 GHz, which is strange, I'd have thought that even that > rtlwifi driver would've been tested with that :) > >> > Could you print out the entire table there when the warning >> > happens? >> >> This is the best I can come up with: printing out the index, and the >> rate and bitrate tables: >> >> rates[i].idx (9) >= sband->n_bitrates (8) >> Rates: >> 0: idx 9 count 1 flags a0 >> 1: idx 8 count 1 flags a0 >> 2: idx 7 count 2 flags a0 >> 3: idx 6 count 3 flags a0 > > Yeah, perfect. See, this is already evidently not making any sense: > > flags a0 is > IEEE80211_TX_RC_40_MHZ_WIDTH | IEEE80211_TX_RC_SHORT_GI > > both of those options *require* IEEE80211_TX_RC_MCS or > IEEE80211_TX_RC_VHT_MCS as well, so the flags really should be a8 or > 1a0. 
> >> Bitrates: >> 0: flags 0002 bitrate 60 (hw: 0004 ) >> 1: flags bitrate 90 (hw: 0005 ) >> 2: flags 0002 bitrate 120 (hw: 0006 ) >> 3: flags bitrate 180 (hw: 0007 ) >> 4: flags 0002 bitrate 240 (hw: 0008 ) >> 5: flags bitrate 360 (hw: 0009 ) >> 6: flags bitrate 480 (hw: 000a ) >> 7: flags bitrate 540 (hw: 000b ) >> >> So it's the very first rate that has index 9, but the bitrate table >> only goes from 0-7. >> >> So I suspect that once the first index has been marked invalid, it >> now will never even look at the later indices, so it has no transmit >> rates at all. Or something. > > Indeed. > >> That bitrate table does seem to match: >> >>static struct ieee80211_rate rtl_ratetable_5g[] = { >> >> in drivers/net/wireless/realtek/rtlwifi/base.c >> > > Yeah, it would, but it's irrelevant since the rate table isn't actually > used with MCS rates. > > I'm not familiar with this code at all, but looking at it suggests that > perhaps the switch to 5 GHz wasn't at fault, but instead the switch to > VHT (802.11ac) - that's more plausible too, not testing with VHT seems > like something that could have happened for this driver. > > And as I figured, the code in _rtl_rc_rate_set_series() is obviously > not handling VHT correctly: it has > > if (sgi_20 || sgi_40 || sgi_80) > rate->flags |= IEEE80211_TX_RC_SHORT_GI; > if (sta && sta->ht_cap.ht_supported && > ((wireless_mode == WIRELESS_MODE_N_5G) || > (wireless_mode == WIRELESS_MODE_N_24G))) > rate->flags |= IEEE80211_TX_RC_MCS; > > but can never set IEEE80211_TX_RC_VHT_MCS. Seems like there should be > something like > > if (sta && sta->ht_cap.vht_supported && > (wireless_mode == WIRELESS_MODE_AC_5G || > wireless_mode == WIRELESS_MODE_AC_24G || > wireless_mode == WIRELESS_MODE_AC_ONLY)) > rate->flags |= IEEE80211_TX_RC_VHT_MCS; > > just after/before the above block. > > But I'm not familiar with this code at all, so that may not really be > the right fix or even work. 
> > johannes
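Johannes' reading of "flags a0" can be checked with a few lines of C. The sketch below assumes the bit positions of the mac80211 rate-control flags are MCS = bit 3, 40 MHz width = bit 5, short GI = bit 7 and VHT MCS = bit 8, which is consistent with the a0 / a8 / 1a0 values quoted in the mail. The helper `rc_flags_consistent()` is a made-up name for illustration, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, chosen to match the hex values in the thread. */
#define IEEE80211_TX_RC_MCS           (1u << 3)  /* 0x008 */
#define IEEE80211_TX_RC_40_MHZ_WIDTH  (1u << 5)  /* 0x020 */
#define IEEE80211_TX_RC_SHORT_GI      (1u << 7)  /* 0x080 */
#define IEEE80211_TX_RC_VHT_MCS       (1u << 8)  /* 0x100 */

/* Hypothetical helper: 40 MHz width and short GI only make sense on an
 * HT or VHT rate, so one of the two MCS flags must accompany them. */
static bool rc_flags_consistent(uint16_t flags)
{
	bool needs_mcs = (flags & (IEEE80211_TX_RC_40_MHZ_WIDTH |
				   IEEE80211_TX_RC_SHORT_GI)) != 0;
	bool has_mcs = (flags & (IEEE80211_TX_RC_MCS |
				 IEEE80211_TX_RC_VHT_MCS)) != 0;

	return !needs_mcs || has_mcs;
}
```

Under those assumptions, 0xa0 is exactly 40 MHz width plus short GI with neither MCS flag set, so the helper rejects it, while 0xa8 (HT) and 0x1a0 (VHT) both pass.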
Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
On Thu, Jan 28, 2016 at 4:13 AM, Johannes Berg wrote: > On Wed, 2016-01-27 at 21:34 -0800, Linus Torvalds wrote: > >> .. except now I upgraded the nearest access point, and now wireless >> on that machine no longer works. > > Can you describe the upgrade a bit more, just for background? I used to have the basic original UniFi UAP. I've replaced them with the newer "AC Lite" version: https://www.ubnt.com/unifi/unifi-ap-ac-lite/ so it's a fairly big jump from a 2.4GHz-only network to a dual-band one. The old 2.4GHz-only AP's showed the problem with minstrel-ht incorrectly starting off at the highest rate (on a totally different machine). So the Unifi AP's have shown problems in the kernel wireless before, but so far it's always been the fault of the kernel wireless, not the AP. > Could you print out the entire table there when the warning happens? This is the best I can come up with: printing out the index, and the rate and bitrate tables: rates[i].idx (9) >= sband->n_bitrates (8) Rates: 0: idx 9 count 1 flags a0 1: idx 8 count 1 flags a0 2: idx 7 count 2 flags a0 3: idx 6 count 3 flags a0 Bitrates: 0: flags 0002 bitrate 60 (hw: 0004 ) 1: flags bitrate 90 (hw: 0005 ) 2: flags 0002 bitrate 120 (hw: 0006 ) 3: flags bitrate 180 (hw: 0007 ) 4: flags 0002 bitrate 240 (hw: 0008 ) 5: flags bitrate 360 (hw: 0009 ) 6: flags bitrate 480 (hw: 000a ) 7: flags bitrate 540 (hw: 000b ) So it's the very first rate that has index 9, but the bitrate table only goes from 0-7. So I suspect that once the first index has been marked invalid, it now will never even look at the later indices, so it has no transmit rates at all. Or something. That bitrate table does seem to match: static struct ieee80211_rate rtl_ratetable_5g[] = { in drivers/net/wireless/realtek/rtlwifi/base.c Does this give you any ideas? 
Linus
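The suspicion that an invalid first index leaves the device with no usable transmit rates can be illustrated with a toy model. This is a simplified sketch of only the legacy-rate path described in the thread, not the kernel's rate_fixup_ratelist(); `struct mock_rate`, `mock_fixup()` and `usable_rates()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one entry of the rate table. */
struct mock_rate {
	int idx;
};

/* Toy model of the fixup loop: entries after an explicit -1 are
 * invalidated, and an out-of-range index is turned into -1 as well
 * (the "RC is busted" case), but without setting 'inval'. */
static void mock_fixup(struct mock_rate *rates, int n, int n_bitrates)
{
	bool inval = false;

	for (int i = 0; i < n; i++) {
		if (inval) {
			rates[i].idx = -1;
			continue;
		}
		if (rates[i].idx < 0) {
			inval = true;
			continue;
		}
		/* "RC is busted": index beyond the bitrate table */
		if (rates[i].idx >= n_bitrates) {
			rates[i].idx = -1;
			continue;
		}
	}
}

/* Number of rates seen by a consumer that stops at the first -1. */
static int usable_rates(const struct mock_rate *rates, int n)
{
	int count = 0;

	while (count < n && rates[count].idx >= 0)
		count++;
	return count;
}
```

Feeding in the reported table (idx 9, 8, 7, 6 against 8 bitrates) leaves -1, -1, 7, 6 in this model: valid entries follow invalid ones, and a consumer that stops at the first -1 sees zero usable rates.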
WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]
Hmm. So my daughter has a little Gigabyte Brix that has rtl8821ae wireless in it. Yeah, nasty, I know, but it has actually worked reasonably well.

.. except now I upgraded the nearest access point, and now wireless on that machine no longer works.

Or rather, it actually *does* work in the sense that it authenticates, it associates, and it actually gets a DHCP lease etc. So the darn thing has an IP address and everything, but then nothing else seems to go through after that. Very odd. My guess is that the auth/assoc/dhcp thing happens at low rates, then it starts trying to up the rates, and things go to hell. But clearly several packets have gotten through. And then absolutely nothing.

Everything else is happy with the new AP, so this is not a problem with the wireless network itself.

I'm appending the warning that gets printed, which may or may not be relevant. This is with a clean and up-to-date Fedora 23 install, so that line 513 is the

    512 /* RC is busted */
    513 if (WARN_ON_ONCE(rates[i].idx >= sband->n_bitrates)) {
    514         rates[i].idx = -1;
    515         continue;
    516 }

thing, which still exists in the same form in current kernels (except in current -git it's line 625).

I do note that that rate_fixup_ratelist() function is a bit odd wrt those rate indexes: it has code to make sure that there are no valid rates following an invalid one:

    /*
     * make sure there's no valid rate following
     * an invalid one, just in case drivers don't
     * take the API seriously to stop at -1.
     */
    if (inval) {
            rates[i].idx = -1;
            continue;
    }
    if (rates[i].idx < 0) {
            inval = true;
            continue;
    }

but then that "RC is busted" case that generates a warning will add one of those invalid rates in the middle anyway. So I get the feeling that if that warning ever triggers, it will basically be screwing up that whole rate table.

I dunno. Is there anything sane I can do to help debug this case?

Linus

--- snip snip, relevant (?)
wireless warning --- IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready r8169 :03:00.0 enp3s0: link down IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready tun: Universal TUN/TAP device driver, 1.6 tun: (C) 1999-2004 Max Krasnyansky device virbr0-nic entered promiscuous mode virbr0: port 1(virbr0-nic) entered listening state virbr0: port 1(virbr0-nic) entered listening state virbr0: port 1(virbr0-nic) entered disabled state wlp2s0: authenticate with 46:d9:e7:92:bf:29 wlp2s0: send auth to 46:d9:e7:92:bf:29 (try 1/3) wlp2s0: authenticated wlp2s0: associate with 46:d9:e7:92:bf:29 (try 1/3) wlp2s0: associate with 46:d9:e7:92:bf:29 (try 2/3) wlp2s0: RX AssocResp from 46:d9:e7:92:bf:29 (capab=0x411 status=0 aid=1) wlp2s0: associated IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready [ cut here ] WARNING: CPU: 2 PID: 0 at net/mac80211/rate.c:513 ieee80211_get_tx_rates+0x243/0x7d0 [mac80211]() Modules linked in: ccm cmac xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_filter ebtable_nat ebtable_broute bridge ebtables ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_raw iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle bnep arc4 rtl8821ae vfat fat btcoexist rtl_pci rtlwifi mac80211 x86_pkg_temp_thermal coretemp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic kvm_intel snd_soc_rt5640 kvm snd_soc_rl6231 snd_hda_intel snd_soc_core iTCO_wdt snd_hda_codec snd_compress btusb snd_pcm_dmaengine snd_hda_core iTCO_vendor_support cfg80211 ac97_bus btrtl snd_hwdep crct10dif_pclmul btbcm snd_seq crc32_pclmul btintel crc32c_intel bluetooth 
snd_seq_device joydev snd_pcm mei_me mei shpchp dw_dmac tpm_tis lpc_ich i2c_i801 snd_timer rfkill snd tpm soundcore snd_soc_sst_acpi dw_dmac_core i2c_designware_platform i2c_designware_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc hid_logitech_hidpp hid_logitech_dj i915 i2c_algo_bit drm_kms_helper 8021q garp drm stp llc mrp r8169 sdhci_acpi mii sdhci mmc_core video i2c_hid CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.2.8-300.fc23.x86_64 #1 Hardware name: GIGABYTE M4HM87P-00/M4HM87P-00, BIOS F2 12/11/2013 aad0aff724c0ea01 88021ea83648 817738ca 000
Re: [PATCH] mac80211: Send EAPOL frames at lowest rate
Johannes, On Thu, Feb 26, 2015 at 5:50 AM, Jouni Malinen wrote: > > Reported-by: Linus Torvalds Also "Tested-by:", and I'd suggest marking it for stable too (although I understand that David generally doesn't use stable tags, and just sends them separately to the stable tree). This fixes both Atheros and brcmsmac for me (with Ubiquiti UniFi APs). My main laptop is iwlwifi, and the only reason that worked is apparently that the iwlwifi driver already basically does something similar on its own. All the other devices I have apparently don't use the 802.11 code even if they are Linux-based (ie mostly android, and presumably they use some vendor driver rather than the minstrel rate-handling code). Linus
Re: [ath9k-devel] AR9462 problems connecting again..
On Wed, Feb 25, 2015 at 10:14 AM, Linus Torvalds wrote: > > I'm talking about the two from Jouni - the "don't encrypt EAPOL > frames" one, and the one-liner that makes all EAPOL frames go at the > lowest data rate. So I just found out and confirmed that this is not Atheros-specific in any way - it looks like it's simply the UniFi AP that does not like high-data-rate authentication frames at all. Because it looks like the brcmsmac driver has *exactly* the same behavior with this AP (in an Apple MacBook Air). I assume brcmsmac uses the net/mac80211/tx.c logic too. And Jouni's one-liner fixes that one too, although as usual, maybe there is some testing noise, and I screwed something up. This time I only did the one-liner, so that's the critical one. It's interesting to note how nothing else has been unhappy with that network (admittedly it's been mainly android devices and an HP printer that I've tested), so it looks like everybody else does low-rate authentication packets anyway. So this actually looks like a Ubiquiti UniFi AP bug to me, but it also looks like presumably everybody else does low-rate initial packets, and our kernel 802.11 layer should just follow suit. The whole robustness principle and "be conservative in what you send, and liberal in what you accept" etc. Linus
Re: [ath9k-devel] AR9462 problems connecting again..
On Wed, Feb 25, 2015 at 6:47 AM, Jouni Malinen wrote: > > There may be something else wrong (say, some kind of interference), but > there is no way we can assume normal users to be able to fix such > issues. If we make EAPOL frames go through more robustly, the connection > can be established in more cases and this can result in relatively > functional network connection and rate control can handle the less > critical data frames through whatever means to get optimal throughput > from the network. As such, I do think we do need to "paper over" this > for EAPOL frames. While I realize that people may disagree about the exact details of how to fix this in the long run, may I suggest that in the meantime we at least get the two workaround patches applied? I'm talking about the two from Jouni - the "don't encrypt EAPOL frames" one, and the one-liner that makes all EAPOL frames go at the lowest data rate. Even if "lowest data rate" is ridiculously low, and even if that might disturb other things going on on the same channel at the same time, those authentication packets shouldn't be so common as to be a problem. No? Jouni has a few packet dumps for me, and he's stumped as to what exactly is going on, but those two patches (well, the one-liner "low data rate EAPOL" in particular, it seems) do seem to make my connections go through reliably. And it seems that other drivers already are working around the EAPOL issue in similar ways, judging by the comments about iwlwifi. Last time I had connection issues with this laptop, nothing ended up happening in the end, and I had people pipe up saying they had had similar problems. I'd hate for the same "nothing" to happen this time just because people aren't 100% sure what the final right thing is to do. So I'd really like people to apply the simple workarounds for now because clearly something is badly wrong, and *if* there is some better resolution later, that's fine. I'll happily test patches. 
It seems to be pretty repeatable for me, even if that "pretty repeatable" seems to be very much about the laptop being in one very particular place (it's right next to another AP, there's random other electronics around, since it's on my messy desk etc). So I wouldn't be at all surprised by horrible interference. And the AP is supposed to be ceiling- or wall-mounted, but because I'm just testing things out it's just sitting on a table in the next room, so for all I know it's in the *exact* worst position for the antennas etc etc. So I'm sure I can improve reception of my laptop, but that's not the point. The point is that bad wireless networks aren't so unusual, and right now things clearly don't work as well as they could. Does anybody hate Jouni's two patches *so* much that they can articulate *why* it would be wrong to apply them as interim patches? And if so, do you have better patches for me to try? Because if not.. Linus
Re: AR9462 problems connecting again..
On Mon, Feb 23, 2015 at 2:43 PM, Jouni Malinen wrote: > > This did not get exactly supportive response when this was proposed last > time (Sep 2013). Anyway, for a quick test, this can be done with the > following one-liner: fwiw, that one-liner seems to work fine for me. Which I guess is not a huge surprise. Side note: I've done the "turn off wifi and turn it back on" several times to test that patch, and it has worked every time. BUT I also see this odd behavior where the logs show that it tries to authenticate twice: the first time it does that "send auth to 20:9f .." thing three times (looks like 100ms apart), and nothing happens so it does "authentication with 20:9f .. timed out". Then it waits three seconds and tries again, and now it succeeds on the first try. The only downside of that seems to be that it takes an extra 3s to connect to the network - but it does now seem to *reliably* connect - so it's not a big problem, but I wonder why it should be that repeatable. Is there some difference between the first and the second time it tries to authenticate? Anyway, even if people don't like that particular patch, it does seem like *something* like that should be done. Linus
Re: AR9462 problems connecting again..
On Mon, Feb 23, 2015 at 1:30 PM, Jouni Malinen wrote: > > How far is the station from the AP? Would it be possible to see whether > the behavior changes if you were within, say, five meters or so? Well, it was pretty much within five meters already, but there was a thin wall in between (and the old AP was right next to the laptop, which might add some noise even if they are on different channels). Going closer does seem to help, but again, it's not like this is 100% reproducible to begin with. So the theory that the driver starts at too high a transmit rate, and then does not handle failures well, might be true. Of course, "not handle failures well" is something of an understatement. > It would be useful if you can capture the 802.11 frame exchange from a > failed connection case with an external wireless sniffer. I will try with my (much more reliable) iwlwifi laptop. At least the merge window is over, so I should have some time. Knock wood. Linus
Re: AR9462 problems connecting again..
On Mon, Feb 23, 2015 at 12:06 PM, Linus Torvalds wrote: > > This machine has a fairly minimal kernel config. Does that "type > monitor" interface perhaps need some debug infrastructure that I > haven't added? Nope. Same behavior with a F21 kernel (which means that the touchpad doesn't work, but that's a separate and known issue with this platform). Linus
Re: AR9462 problems connecting again..
On Mon, Feb 23, 2015 at 9:17 AM, Jouni Malinen wrote: > > mac80211: Do not encrypt EAPOL frames before peer has used the key Hmm. This patch does not seem to make a difference. I thought it did at first, but then removed the wpa_supplicant debugging, and got the same failures. On Sun, Feb 22, 2015 at 10:01 PM, Sujith Manoharan wrote: > > Or 'iw dev wlp1s0 set bitrates ht-mcs-2.4 0' to choose the lowest > HT rate. This *might* have worked. But it's a bit hard to really make sure, since it sometimes does succeed even without debugging when I do nothing at all, but it did work twice in a row after doing that. > Sporadic association problems could be a problem with the chosen rates. > This would show the rates for the EAPOL frames: > > iw dev wlp1s0 interface add mon0 type monitor > ifconfig mon0 up Hmm. I don't actually see a "mon0" interface after the "iw dev interface add" thing. Yes, "iw" sees it when I do "iw dev", but "ifconfig" does not. This machine has a fairly minimal kernel config. Does that "type monitor" interface perhaps need some debug infrastructure that I haven't added? > tshark -i mon0 -Y eapol -T fields -e radiotap.datarate -e wlan -e eapol -e > wlan.sa -e wlan.da .. and then this fails, presumably for similar reasons. Linus
Re: AR9462 problems connecting again..
On Sun, Feb 22, 2015 at 5:55 PM, Adrian Chadd wrote: > > Do you have a 5GHz SSID setup on this access point? Is this kind of > messed up diassociation-to-steer-you-to-another-band thing? Nope. That's the older single-band UniFi UAP - 2.4GHz only. Linus
Re: AR9462 problems connecting again..
On Sun, Feb 22, 2015 at 4:54 PM, Adrian Chadd wrote: > > I /think/ it's okay? The removed stuff is the pre-shared key pieces. Ok. Attached is what seems to be the relevant part of the wpa_supplicant.log file. The datestamp has been changed so that it can be matched up with the dmesg, and I added empty lines for pauses when I was trying to figure out what the heck it was doing, but other than that it's the raw log. > Do you have another laptop with an atheros NIC in it that you could > use in monitor mode to capture all the frames? Nope, everything else I have seems to be intel wireless. I think one of the kids machines is a Mac Mini with an ath5k thing, but I'm hoping the wpa_supplicant.log is sufficient to give somebody an idea. Linus 14:07:10.971480: EAPOL: disable timer tick 14:07:10.971578: EAPOL: Supplicant port status: Unauthorized 14:07:12.886125: nl80211: Event message available 14:07:12.886287: nl80211: Regulatory beacon hint 14:07:12.886318: wlp1s0: Event CHANNEL_LIST_CHANGED (31) received 14:07:12.886608: nl80211: Regulatory information - country=US 14:07:12.886646: nl80211: 2402-2472 @ 40 MHz 14:07:12.886670: nl80211: 5170-5250 @ 80 MHz 14:07:12.886691: nl80211: 5250-5330 @ 80 MHz 14:07:12.886711: nl80211: 5490-5600 @ 80 MHz 14:07:12.886733: nl80211: 5650-5710 @ 40 MHz 14:07:12.886755: nl80211: 5735-5835 @ 80 MHz 14:07:12.886777: nl80211: 57240-63720 @ 2160 MHz 14:07:12.886822: nl80211: Added 802.11b mode based on 802.11g information 14:07:12.886848: P2P: Add operating class 81 14:07:12.886870: P2P: Channels - hexdump(len=11): 01 02 03 04 05 06 07 08 09 0a 0b 14:07:12.886902: P2P: Add operating class 124 14:07:12.886921: P2P: Channels - hexdump(len=1): a1 14:07:12.886948: wlp1s0: P2P: Update channel list 14:07:13.011791: nl80211: Event message available 14:07:13.011987: nl80211: New scan results available 14:07:13.012055: wlp1s0: Event SCAN_RESULTS (3) received 14:07:13.012281: nl80211: Received scan results (3 BSSes) 14:07:13.012464: wlp1s0: BSS: Start 
scan result update 1 14:07:13.012501: wlp1s0: BSS: Add new id 0 BSSID 60:a4:4c:8d:99:24 SSID '1gnoraNT' 14:07:13.012530: dbus: Register BSS object '/fi/w1/wpa_supplicant1/Interfaces/7/BSSs/0' 14:07:13.012745: wlp1s0: BSS: Add new id 1 BSSID 60:a4:4c:8d:99:20 SSID '1gnoraNT' 14:07:13.012767: dbus: Register BSS object '/fi/w1/wpa_supplicant1/Interfaces/7/BSSs/1' 14:07:13.012905: wlp1s0: BSS: Add new id 2 BSSID 20:9f:db:e7:80:80 SSID 'UniFi-home' 14:07:13.012925: dbus: Register BSS object '/fi/w1/wpa_supplicant1/Interfaces/7/BSSs/2' 14:07:13.013080: BSS: last_scan_res_used=3/32 last_scan_full=0 14:07:13.013115: wlp1s0: New scan results available 14:07:13.013191: wlp1s0: No suitable network found 14:07:13.013209: wlp1s0: Short-circuit new scan request since there are no enabled networks 14:07:13.013221: wlp1s0: State: DISCONNECTED -> INACTIVE 14:07:13.013272: wlp1s0: Checking for other virtual interfaces sharing same radio (phy0) in event_scan_results 14:07:13.014248: RTM_NEWLINK: operstate=0 ifi_flags=0x1003 ([UP]) 14:07:13.014265: RTM_NEWLINK, IFLA_IFNAME: Interface 'wlp1s0' added 14:07:13.014412: nl80211: if_removed already cleared - ignore event 14:07:13.017250: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/7 14:07:13.060305: dbus: Register network object '/fi/w1/wpa_supplicant1/Interfaces/7/Networks/0' 14:07:13.072845: wlp1s0: Setting scan request: 0 sec 0 usec 14:07:13.073000: wlp1s0: State: INACTIVE -> SCANNING 14:07:13.073054: Scan SSID - hexdump_ascii(len=10): 55 6e 69 46 69 2d 68 6f 6d 65 UniFi-home 14:07:13.073078: wlp1s0: Starting AP scan for wildcard SSID 14:07:13.073088: WPS: Building WPS IE for Probe Request 14:07:13.073096: WPS: * Version (hardcoded 0x10) 14:07:13.073104: WPS: * Request Type 14:07:13.073111: WPS: * Config Methods (108) 14:07:13.073129: WPS: * UUID-E 14:07:13.073137: WPS: * Primary Device Type 14:07:13.073145: WPS: * RF Bands (3) 14:07:13.073153: WPS: * Association 
State 14:07:13.073160: WPS: * Configuration Error (0) 14:07:13.073167: WPS: * Device Password ID (0) 14:07:13.073175: WPS: * Device Name 14:07:13.073184: P2P: * P2P IE header 14:07:13.073192: P2P: * Capability dev=25 group=00 14:07:13.073200: P2P: * Listen Channel: Regulatory Class 81 Channel 1 14:07:13.077659: Scan requested (ret=0) - scan timeout 30 seconds 14:07:13.077697: nl80211: Event message available 14:07:13.077716: nl80211: Scan trigger 14:07:13.078248: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/7 14:07:16.056351: RTM_NEWLINK: operstate=0 ifi_flags=0x1003 ([UP]) 14:07:16.056498: RTM_NEWLINK, IFLA_IFNAME: Interface 'wlp1s0' added 14:07:16.056652: nl80211: if_removed already cleared - ignore event 14:07:16.056728: nl80211: Event message available 14:07:16.056810: nl80211: New scan results available 14:07:16.056879: wlp1s0: Ev
Re: AR9462 problems connecting again..
On Sun, Feb 22, 2015 at 1:50 PM, Linus Torvalds wrote: > > Ugh. When I add "-dd" to the command line, it has now worked three > times in a row, when before it worked once out of ten tries. > > So my guess is that it's something timing-dependent. So it stays working with -dd, but I do get *occasional* failures that then seem to clear up on retry. So it ends up working in the end, but I think I have a few example failures in the logs. So for example, from my dmesg, I get this: [14:07:15] wlp1s0: authenticate with 20:9f:db:e7:80:80 [14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3) [14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 2/3) [14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 3/3) [14:07:15] wlp1s0: authentication with 20:9f:db:e7:80:80 timed out [14:07:18] wlp1s0: authenticate with 20:9f:db:e7:80:80 [14:07:18] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3) [14:07:18] wlp1s0: authenticated [14:07:18] wlp1s0: associate with 20:9f:db:e7:80:80 (try 1/3) [14:07:18] wlp1s0: RX AssocResp from 20:9f:db:e7:80:80 (capab=0x431 status=0 aid=16) [14:07:18] wlp1s0: associated [14:07:22] wlp1s0: deauthenticated from 20:9f:db:e7:80:80 (Reason: 2=PREV_AUTH_NOT_VALID) with another failure at 14:07:22, but then it ends up working a bit later at 14:07:33. I've got the wpa_supplicant log for this timeframe, but I'd rather not send it out in public. I see "[REMOVED]" for what looks like the key data, but there's a lot of other hex data. Is any of it sensitive? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
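For anybody wanting to count these failures across a longer dmesg capture, here is a minimal sketch that groups the lines above into connection attempts and labels how each one ended. The function name `summarize_attempts` and the outcome labels are made up for illustration, and the parser only assumes the exact message wording visible in the excerpt; it is not a general mac80211 log parser.

```python
import re

# Sample lines copied from the dmesg excerpt in the message above.
SAMPLE = """\
[14:07:15] wlp1s0: authenticate with 20:9f:db:e7:80:80
[14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3)
[14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 2/3)
[14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 3/3)
[14:07:15] wlp1s0: authentication with 20:9f:db:e7:80:80 timed out
[14:07:18] wlp1s0: authenticate with 20:9f:db:e7:80:80
[14:07:18] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3)
[14:07:18] wlp1s0: authenticated
[14:07:18] wlp1s0: associate with 20:9f:db:e7:80:80 (try 1/3)
[14:07:18] wlp1s0: RX AssocResp from 20:9f:db:e7:80:80 (capab=0x431 status=0 aid=16)
[14:07:18] wlp1s0: associated
[14:07:22] wlp1s0: deauthenticated from 20:9f:db:e7:80:80 (Reason: 2=PREV_AUTH_NOT_VALID)
"""

# "[timestamp] interface: message"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<dev>\S+): (?P<msg>.*)$")


def summarize_attempts(text):
    """Group log lines into attempts; each starts at 'authenticate with'."""
    attempts = []
    current = None
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, msg = m.group("ts"), m.group("msg")
        if msg.startswith("authenticate with"):
            current = {"start": ts, "outcome": "incomplete"}
            attempts.append(current)
        elif current is None:
            continue  # line belongs to an attempt that began before the capture
        elif msg.endswith("timed out"):
            current["outcome"] = "auth timeout"
        elif msg == "associated":
            current["outcome"] = "associated"
        elif msg.startswith("deauthenticated"):
            reason = re.search(r"Reason: ([^)]+)\)", msg)
            current["outcome"] = "deauth " + (reason.group(1) if reason else "unknown")
    return attempts


for attempt in summarize_attempts(SAMPLE):
    print(attempt["start"], attempt["outcome"])
```

On the excerpt this reports one attempt that never got past authentication and one that associated and was then deauthenticated, which matches the two failure shapes described in the message.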
Re: AR9462 problems connecting again..
On Sun, Feb 22, 2015 at 11:39 AM, Adrian Chadd wrote: > > Hm, can you enable wpa debugging to log everything whilst it's > associating / reassociating? Ugh. When I add "-dd" to the command line, it has now worked three times in a row, when before it worked once out of ten tries. So my guess is that it's something timing-dependent. Or it's something where once it starts working, it stays working until I reboot. I'll try that. Linus
Re: [ath9k-devel] AR9462 problems connecting again..
On Sun, Feb 22, 2015 at 10:58 AM, Dave Taht wrote: > > Hint: Several unifi (and most ubnt) products are well supported by > openwrt directly, I want Linux to "just work". None of this "oh, you can change something else and it probably works". I want to fix the problem in *linux*. There's clearly something wrong with the AR9462 driver and/or how it uses the wireless infrastructure, and it should be fixed. Not worked around with "use some other AP software". Especially since this has happened before. Besides, the reason I'm trying to use UniFi is because I want to have seamless roaming ("zero-handoff"). And I do *not* want to play the endless openwrt configuration games in the hopes I can get something like that working. I've tried openwrt, and I don't like tinkering with my AP's. I just want things to work out of the box. Linus
Re: AR9462 problems connecting again..
On Sun, Feb 22, 2015 at 10:24 AM, Adrian Chadd wrote: > > Just a wild shot - try disabling fast authentication and see if that > makes a difference? > > wpa_supplicant.conf: > > fast_reauth=0 > > I recall having issues with fast_reauth once, but I never stuck around > that location long enough to debug it. Nope. Did that, killed wpa_supplicant (which restarts it), tried connecting, still failed. Linus
Re: AR9462 problems connecting again..
On Sat, Feb 21, 2015 at 10:50 PM, Sujith Manoharan wrote: > > Can you please post the output of 'iw dev wlp1s0 scan' ? Attached. It's the "UniFi-home" SSID that doesn't work. The 1gnoraNT one is the old working one that I'm obviously associated with, and that has multiple AP's. (The UniFi-home also has two AP's, but they should all show up as a single network) Linus
AR9462 problems connecting again..
So I've had problems connecting to some networks before on my Chromebook Pixel, but now I'm testing a new Ubiquiti network at home, and can see this issue at home too. I know the wireless works, because other devices work fine on that network. Also, I know the AR9462 works, because I still have my old network up and it connects to that. And it *occasionally* connects to the new one. But it's rare, and it clearly has problems. It looks something like this: [ 73.757869] wlp1s0: authenticate with 20:9f:db:e7:80:80 [ 73.771471] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3) [ 73.773706] wlp1s0: authenticated [ 73.775122] wlp1s0: associate with 20:9f:db:e7:80:80 (try 1/3) [ 73.787434] wlp1s0: RX AssocResp from 20:9f:db:e7:80:80 (capab=0x431 status=0 aid=9) [ 73.787573] wlp1s0: associated [ 77.784931] wlp1s0: deauthenticated from 20:9f:db:e7:80:80 (Reason: 2=PREV_AUTH_NOT_VALID) and the password I used definitely is right, and sometimes works. Despite that PREV_AUTH_NOT_VALID thing. Any suggestions for what I should do to give you guys any sane and useful debug output? Linus
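One detail that is easy to miss in the excerpt above is how long the link stays up before the deauth arrives. The sketch below just does that subtraction on the kernel timestamps; `seconds_between` is a made-up helper name, and the code assumes nothing beyond the `[ seconds.micros] iface: message` layout shown in the log. It says nothing about why the AP drops the station.

```python
import re
from decimal import Decimal

# Sample lines copied from the dmesg excerpt in the message above.
SAMPLE = """\
[ 73.757869] wlp1s0: authenticate with 20:9f:db:e7:80:80
[ 73.771471] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3)
[ 73.773706] wlp1s0: authenticated
[ 73.775122] wlp1s0: associate with 20:9f:db:e7:80:80 (try 1/3)
[ 73.787434] wlp1s0: RX AssocResp from 20:9f:db:e7:80:80 (capab=0x431 status=0 aid=9)
[ 73.787573] wlp1s0: associated
[ 77.784931] wlp1s0: deauthenticated from 20:9f:db:e7:80:80 (Reason: 2=PREV_AUTH_NOT_VALID)
"""

# "[ seconds.micros] interface: message"
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\S+: (.*)$")


def seconds_between(text, first_msg, second_prefix):
    """Time from the first line equal to first_msg to the next line
    whose message starts with second_prefix, or None if either is missing."""
    start = end = None
    for line in text.splitlines():
        m = TS_RE.match(line)
        if not m:
            continue
        ts, msg = Decimal(m.group(1)), m.group(2)
        if start is None and msg == first_msg:
            start = ts
        elif start is not None and msg.startswith(second_prefix):
            end = ts
            break
    if start is None or end is None:
        return None
    return end - start


gap = seconds_between(SAMPLE, "associated", "deauthenticated")
print(gap)  # 3.997358
```

So the station is associated for very nearly four seconds before it is kicked off, while the whole authenticate/associate exchange before that takes about 30 ms.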
Re: Wireless scanning while turning off the radio problem..
On Mon, Jan 19, 2015 at 5:48 AM, Arend van Spriel wrote: > > So as you indicated you were in location where none of your configured > networks were available. Flipping the rfkill switch in that situation is the > way to trigger the issue. So you certainly seem to be able to explain the behavior I saw under the circumstances they happened. I suspect the best thing to do is to just apply your patch. I may not be able to really test it much for the next few days anyway. Emmanuel? Linus
Re: Wireless scanning while turning off the radio problem..
On Sun, Jan 18, 2015 at 11:24 PM, Emmanuel Grumbach wrote: > > we have different scan flows based on the firmware version you have, > so it would help if you could tell me what firmware you have. Sure. It's the latest one I could find: iwlwifi :01:00.0: loaded firmware version 23.11.10.0 op_mode iwlmvm with the actual firmware file being 'iwlwifi-7260-10.ucode' from the current linux-firmware tree. In a different email Arend van Spriel wrote: > > The function iwl_trans_pcie_stop_device() put device in low-power and > resets the cpu on the device. So iwl_op_mode_hw_rf_kill ends up in > iwl_mvm_set_hw_rfkill_state which schedules cfg80211_rfkill_sync_work > and returns true if firmware is running. The patch below might work. Any suggestions for how to best try to trigger this for testing? Looking at my logs, it turns out that I actually got this three times, but they were all on the same boot, and I think the first case might just have triggered the later ones. The trigger was turning off wifi from the wifi settings app due to being in an airplane when they were closing the doors. I don't *think* there was actually any wifi around at the time, which may or may not have made the scanning take longer and made it easier to trigger. But I've done it before (although this machine has been upgraded to F21 reasonably recently, and I did update the ucode file before the trip). And I did it afterwards to test. And it happened that one time (and then apparently kept happening during suspend/resume/shutdown, but as mentioned, I blame that on some sticky problem from the first time, and those events in turn happened because I couldn't get wireless to work afterwards). IOW, I'm not at all sure I can recreate it, so your "analyzing the source code for how this could happen" may be the only good way..
Linus
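One way to poke at the "rfkill while scanning" window discussed above would be to start a scan and immediately soft-block the radio, in a loop. The sketch below is only that: an untested guess at a reproducer, not something from this thread. It assumes the `iw` and `rfkill` command-line tools are installed, that the interface is `wlp1s0`, that it runs as root, and (the biggest assumption) that a soft block via `rfkill block wifi` exercises the same path as the RF_KILL toggle in the original report. The helper names `build_plan` and `run_plan` are hypothetical.

```python
import subprocess
import time


def build_plan(iface, rounds):
    """Return the argv lists for `rounds` scan-then-block cycles.

    No delay is placed between the scan trigger and the block on purpose:
    the idea is for the block to land while the scan is still running.
    """
    plan = []
    for _ in range(rounds):
        plan.append(["iw", "dev", iface, "scan", "trigger"])
        plan.append(["rfkill", "block", "wifi"])
        plan.append(["rfkill", "unblock", "wifi"])
    return plan


def run_plan(plan, settle=2.0):
    """Run each command, printing its exit status so failures are visible.

    Needs root. After an unblock we wait `settle` seconds (an arbitrary
    value) so the interface has a chance to come back before the next scan.
    """
    for argv in plan:
        result = subprocess.run(argv, capture_output=True, text=True)
        print(" ".join(argv), "->", result.returncode)
        if argv[:2] == ["rfkill", "unblock"]:
            time.sleep(settle)
```

Calling `run_plan(build_plan("wlp1s0", rounds=5))` and then watching dmesg for the warning would be the intended use. A scan trigger that exits non-zero right after an unblock most likely just means the interface is not back up yet, so the `settle` delay may need tuning. Doing this somewhere with no configured networks in range, as Arend described, is presumably what makes the window wide enough to hit.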
Wireless scanning while turning off the radio problem..
So there seems to be some issue with unlucky timing when turning off wireless while the driver is busy scanning. I can't reproduce this, so it's a one-off, but it's not just ugly warnings, the kernel wouldn't scan any wireless on that device afterwards and I had to reboot to get networking back, so there is some long-term damage. This is with Intel wireless (iwlwifi, it's an iwl N7260 thing, rev 0x144 if anybody cares), but the warning callbacks don't seem to be iwl-specific. This was a recent top-of-git kernel (3.19.0-rc4-00241-gfc7f0dd38172 to be exact). Anybody have any ideas? Anything in particular I should try out to help possibly get more information? Linus --- [ 204.361145] iwlwifi :01:00.0: RF_KILL bit toggled to disable radio. [ 204.362358] [ cut here ] [ 204.362383] WARNING: CPU: 0 PID: 37 at net/wireless/core.c:1011 cfg80211_netdev_notifier_call+0x491/0x500 [cfg80211]() [ 204.362385] Modules linked in: ccm rfcomm fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw bnep arc4 vfat fat x86_pkg_temp_thermal pn544_mei mei_phy pn544 coretemp hci kvm_intel nfc iTCO_wdt iTCO_vendor_support kvm iwlmvm uvcvideo snd_hda_codec_realtek microcode snd_hda_codec_generic snd_hda_codec_hdmi mac80211 videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common snd_hda_intel videodev snd_hda_controller joydev btusb media hid_multitouch i2c_i801 snd_hda_codec serio_raw iwlwifi bluetooth snd_hwdep snd_seq cfg80211 snd_seq_device [ 204.362432] snd_pcm sony_laptop rfkill mei_me snd_timer mei snd lpc_ich mfd_core shpchp soundcore dm_crypt i915 crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_kms_helper drm i2c_core video [ 204.362453] CPU: 0 PID: 37 Comm: kworker/0:1 Not tainted
3.19.0-rc4-00241-gfc7f0dd38172 #14 [ 204.362455] Hardware name: Sony Corporation SVP11213CXB/VAIO, BIOS R0270V7 05/17/2013 [ 204.362464] Workqueue: events cfg80211_rfkill_sync_work [cfg80211] [ 204.362467] c0375870 815eb39a [ 204.362471] 8106c357 8800d3b12890 8800d9e08260 0002 [ 204.362475] 8800d3b12000 8800d9e08000 c0350161 8800d365dc00 [ 204.362479] Call Trace: [ 204.362490] [] ? dump_stack+0x40/0x50 [ 204.362496] [] ? warn_slowpath_common+0x77/0xb0 [ 204.362506] [] ? cfg80211_netdev_notifier_call+0x491/0x500 [cfg80211] [ 204.362513] [] ? __dev_remove_pack+0x39/0xa0 [ 204.362538] [] ? __unregister_prot_hook+0xcc/0xd0 [ 204.362542] [] ? packet_notifier+0x15c/0x1b0 [ 204.362549] [] ? notifier_call_chain+0x45/0x70 [ 204.362552] [] ? dev_close_many+0xb9/0x110 [ 204.362556] [] ? dev_close.part.87+0x2a/0x40 [ 204.362559] [] ? dev_close+0x19/0x20 [ 204.362569] [] ? cfg80211_shutdown_all_interfaces+0x3d/0xb0 [cfg80211] [ 204.362577] [] ? cfg80211_rfkill_sync_work+0x29/0x30 [cfg80211] [ 204.362580] [] ? process_one_work+0x135/0x370 [ 204.362585] [] ? pwq_activate_delayed_work+0x27/0x40 [ 204.362589] [] ? worker_thread+0x63/0x480 [ 204.362592] [] ? rescuer_thread+0x2f0/0x2f0 [ 204.362596] [] ? kthread+0xce/0xf0 [ 204.362600] [] ? kthread_create_on_node+0x180/0x180 [ 204.362605] [] ? ret_from_fork+0x7c/0xb0 [ 204.362609] [] ? 
kthread_create_on_node+0x180/0x180 [ 204.362612] ---[ end trace d0ac2826f7d2747f ]--- [ 204.362614] [ cut here ] [ 204.362628] WARNING: CPU: 0 PID: 37 at net/mac80211/driver-ops.h:12 ieee80211_request_sched_scan_stop+0xdd/0xf0 [mac80211]() [ 204.362630] wlp1s0: Failed check-sdata-in-driver check, flags: 0x4 [ 204.362631] Modules linked in: ccm rfcomm fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw bnep arc4 vfat fat x86_pkg_temp_thermal pn544_mei mei_phy pn544 coretemp hci kvm_intel nfc iTCO_wdt iTCO_vendor_support kvm iwlmvm uvcvideo snd_hda_codec_realtek microcode snd_hda_codec_generic snd_hda_codec_hdmi mac80211 videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common snd_hda_intel videodev snd_hda_controller joydev btusb media hid_multitouch i2c_i801 snd_hda_codec serio_raw iwlwifi bluetooth snd_hwdep snd_seq cfg80211 snd_seq_device [ 204.362677] snd_pcm sony_laptop rfkill mei_me snd_timer mei snd lpc_ich mfd_core shpchp soundcore dm_crypt i915 crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_kms_helper drm i2c_core video [ 204.362695] CPU: 0 PID: 37 Comm: kworker/0:1 Tainted: GW 3.19.0-r
Re: [PATCH] Revert "ipw2200: select CFG80211_WEXT"
On Sat, Jan 3, 2015 at 10:02 AM, Marcel Holtmann wrote: > > why would you revert this? It is obviously the correct change to actually > select CFG80211_WEXT. I don't know about obvious, but yeah, I think the select in this case is actually the better idea anyway. We could make the CFG80211_WEXT help message be very negative so that people aren't encouraged to select it even if they can, but then if they need the ipw driver it gets selected because of that. Because the ipw driver is probably the more important of the two if you just happen to have old hardware but are upgrading your software (and anybody who recompiles their own kernel is obviously doing the latter). Linus
Re: [PATCH] Revert "cfg80211: make WEXT compatibility unselectable"
On Thu, Jan 1, 2015 at 11:44 AM, Lennart Sorensen wrote: > > ifconfig seems to just be broken for many cases of perfectly nice features > in the kernel. So I'm not saying "ifconfig is wonderful". It's not. But I *am* saying that "changing user interfaces and then expecting people to change is f*cking stupid". The fact is, ifconfig is simple for the simple cases, but more importantly, a lot of people learnt how to use it. Saying "you should all change, because we made up a new syntax" is not good policy. The people who did "ip" could fairly easily have done a wrapper around the same code that also left the old "ifconfig" syntax. Then, distros could have trivially just dropped the old "ifconfig" package, and entirely replaced it with the new "ip" package. As it is, we have two different models, and they'll basically stay around forever. For something like ifconfig, very few people care. But *all* the same arguments are true wrt "iw" and "iwconfig". The people who are trying to deprecate the WEXT interfaces should put the blame firmly where it belongs - on the people who thought that "we'll just ignore all old history". Because people who think that "we'll just redesign everything" are actually f*cking morons. Really. There's a real reason the kernel has the "no regression" policy. And that reason is that I'm not a moron. History matters. Legacy uses matter. Linus
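To make the "wrapper that keeps the old syntax" idea concrete, here is a toy sketch of what the front end of such a wrapper might look like: it takes classic `ifconfig` arguments and returns the equivalent `ip` command lines. This is not how any real package does it; the function name `translate_ifconfig` is made up, the interface name `eth0` and the address in the usage lines are hypothetical, and only three simple forms are handled. Note also that `ifconfig IFACE ADDR` replaces the primary address while `ip addr add` adds one, so a real wrapper would need more care than this.

```python
import ipaddress


def translate_ifconfig(args):
    """Map a few simple `ifconfig` argument forms to `ip` argv lists.

    Supported (everything else raises ValueError):
      IFACE                      -> ip addr show dev IFACE
      IFACE up | down            -> ip link set dev IFACE up|down
      IFACE ADDR netmask MASK    -> ip addr add ADDR/PREFIX dev IFACE
    """
    if not args:
        raise ValueError("expected at least an interface name")
    iface, rest = args[0], args[1:]
    if not rest:
        return [["ip", "addr", "show", "dev", iface]]
    if rest in (["up"], ["down"]):
        return [["ip", "link", "set", "dev", iface, rest[0]]]
    if len(rest) == 3 and rest[1] == "netmask":
        # Convert a dotted netmask such as 255.255.255.0 to a prefix length.
        prefix = ipaddress.IPv4Network("0.0.0.0/" + rest[2]).prefixlen
        return [["ip", "addr", "add", f"{rest[0]}/{prefix}", "dev", iface]]
    raise ValueError("unsupported ifconfig form: " + " ".join(args))


for old in (["eth0"], ["eth0", "up"],
            ["eth0", "192.168.1.2", "netmask", "255.255.255.0"]):
    for new in translate_ifconfig(old):
        print("ifconfig", " ".join(old), "=>", " ".join(new))
```

The point of the exercise is the one made above: the mapping for the common cases is small, so keeping the old command-line syntax alive on top of the new code would not have been a big job.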
Re: [PATCH] Revert "cfg80211: make WEXT compatibility unselectable"
On Wed, Dec 31, 2014 at 1:44 PM, Theodore Ts'o wrote: > > Yeah, the confusing part is that "ip" tends to use "verb object" > scheme, which is consistent with the Cisco IOS command set it was > trying to emulate. Side note: does anybody think that was really a good idea to begin with? I mean, Cisco iOS is just _so_ universally loved, right? And yeah, I refuse to use "ip link" or other insane commands. Let's face it, "ifconfig" and "route" are perfectly fine commands, and a whole lot less confusing than "ip" with random crap after it. I'm really not seeing why that "ip" command was seen as an improvement. (Ok, "ip route" isn't any more complex than "route", but "ip link" sure as hell isn't simpler than "ifconfig" for most things I can think of) Linus
Re: [PATCH] Revert "cfg80211: make WEXT compatibility unselectable"
On Wed, Dec 31, 2014 at 9:31 AM, Theodore Ts'o wrote: > > Most people are still using "route" and "ifconfig" instead of "ip". > Deal with it. Indeed. This whole "let's throw out the old and broken" stuff is a disease. It would have been much better (and it's still an option, as Ted points out) for the new commands to provide compatibility with what users - and scripts - have been doing for ages with the old ones. As it is, this inability for the new tools to just do what the old tools did clearly just means that not just the old tools, but all the old infrastructure, will need to be around for years to come. Thinking you can just start from a clean slate is naive, bordering on stupid. "New and improved" is only really improved if it also takes backwards compatibility into account, rather than saying "now everybody must do things the new and improved - and different - way". We've succeeded in getting rid of some old interfaces in the kernel, but it has usually been for some *really* esoteric stuff that nobody does by hand. And even then it has generally been an uphill battle, and in most cases we've ended up having the rule that new capabilities absolutely *have* to be a superset of the old, and we continue to support the old model using the new code. It's entirely possible that we might be able to cut down on the WEXT support a tiny bit by slowly removing some parts of it that nobody uses and depends on, but the whole "let's just make it a non-option" was clearly just a drug-fueled bad fantasy. Linus