Re: [GIT] Networking

2018-10-24 Thread Linus Torvalds
On Wed, Oct 24, 2018 at 2:24 PM Andy Gross  wrote:
>
> Yes this will conflict with Niklas's patch which is part of the 4.20
> pull requests. I would prefer that we revert Linus's and take Niklas's
> unless there is a compelling argument to have it fixed before -rc1.

I have no objection to just reverting my patch when I get the real fix.

I just don't want my tree to have warnings that I see, and that may
hide new warnings coming in when I do my next pull request..

   Linus


Re: [PATCH v5 00/12] spectre variant1 mitigations for tip/x86/pti

2018-01-27 Thread Linus Torvalds
On Fri, Jan 26, 2018 at 11:55 PM, Dan Williams  wrote:
>
> Here's another spin of the spectre-v1 mitigations for 4.16.

I see nothing really objectionable here.

And unlike Spectre-v2 and Meltdown, I expect Spectre-v1 to be with us
for a long time. It's not a "CPU did a bad job with checking the
cached information it had" (whether it be from the TLB, BTB or RSB),
it's pretty fundamental to just regular conditional branch prediction.

So ack from me, and I don't expect this to be behind any config options.

I still haven't really seen any numbers for this, but I _assume_ it's
basically not measurable.

 Linus


Re: [PATCH v2 00/19] prevent bounds-check bypass via speculative execution

2018-01-13 Thread Linus Torvalds
On Fri, Jan 12, 2018 at 4:15 PM, Tony Luck  wrote:
>
> Here there isn't any reason for speculation. The core has the
> value of 'x' in a register and the upper bound encoded into the
> "cmp" instruction.  Both are right there, no waiting, no speculation.

So this is an argument I haven't seen before (although it was brought
up in private long ago), but that is very relevant: the actual scope
and depth of speculation.

Your argument basically depends on just what gets speculated, and on
the _actual_ order of execution.

So your argument depends on "the uarch will actually run the code in
order if there are no events that block the pipeline".

Or at least it depends on a certain latency of the killing of any OoO
execution being low enough that the cache access doesn't even begin.

I realize that that is very much a particular microarchitectural
detail, but it's actually a *big* deal. Do we have a set of rules for
what is not a worry, simply because the speculated accesses get killed
early enough?

Apparently "test a register value against a constant" is good enough,
assuming that register is also needed for the address of the access.

Linus


Re: [PATCH v2 00/19] prevent bounds-check bypass via speculative execution

2018-01-11 Thread Linus Torvalds
On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams  wrote:
>
> This series incorporates Mark Rutland's latest ARM changes and adds
> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> based approach is provided as an opt-in fallback, but the default
> mitigation, '__array_ptr', uses a 'mask' approach that removes
> conditional branch instructions, and otherwise aims to redirect
> speculation to use a NULL pointer rather than a user controlled value.

Do you have any performance numbers and perhaps example code
generation? Is this noticeable? Are there any microbenchmarks showing
the difference between lfence use and the masking model?

Having both seems good for testing, but wouldn't we want to pick one in the end?

Also, I do think that there is one particular array load that would
seem to be pretty obvious: the system call function pointer array.

Yes, yes, the actual call is now behind a retpoline, but that protects
against a speculative BTB access, it's not obvious that it protects
against the mispredict of the __NR_syscall_max comparison in
arch/x86/entry/entry_64.S.

The act of fetching code is a kind of read too. And retpoline protects
against BTB stuffing etc, but what if the _actual_ system call
function address is wrong (due to mis-prediction of the system call
index check)?

Should the array access in entry_SYSCALL_64_fastpath be made to use
the masking approach?

Linus
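
Note: the "mask" approach asked about above can be sketched in plain
userspace C as follows. This is only an illustration of the idea, not
the code from the patch series: every name ending in _sketch is made up
here, and the sketch assumes size <= LONG_MAX and a compiler (GCC or
Clang) where right-shifting a negative long is an arithmetic shift.
Whether the compiler really emits branch-free code has to be checked in
the generated assembly.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define LONG_BITS_SKETCH (sizeof(long) * CHAR_BIT)

/*
 * All-ones when idx < size, zero otherwise, computed with arithmetic
 * rather than a conditional branch. Assumes size <= LONG_MAX and an
 * arithmetic right shift of negative values.
 */
static unsigned long index_mask_sketch(unsigned long idx, unsigned long size)
{
	return (unsigned long)(~(long)(idx | (size - 1UL - idx)) >>
			       (LONG_BITS_SKETCH - 1));
}

/* Force an out-of-bounds idx to 0 without branching on idx. */
static unsigned long clamp_index_sketch(unsigned long idx, unsigned long size)
{
	return idx & index_mask_sketch(idx, size);
}

/* Hypothetical table lookup: the bounds check stays, the mask backs it up. */
static int table_lookup_sketch(const int *table, unsigned long size,
			       unsigned long idx)
{
	assert(table != NULL);
	if (idx >= size)
		return -1;
	/* Even if the branch above is mispredicted, the index used is 0. */
	return table[clamp_index_sketch(idx, size)];
}
```

The point of the design is that the index that reaches the load depends
on the mask through data flow, so a mispredicted bounds check cannot
steer the load to an attacker-chosen offset.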


Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007

2017-11-10 Thread Linus Torvalds
On Wed, Nov 8, 2017 at 9:19 PM, Fengguang Wu  wrote:
>
> Yes it's accessing the list. Here is the faddr2line output.

Ok, so it's a corrupted timer list. Which is not a big surprise.

It's

next->pprev = pprev;

in __hlist_del(), and the trapping instruction decodes as

mov    %rdx,0x8(%rax)

with %rax having the value dead0200,

Which is just LIST_POISON2.

So we've deleted that entry twice - LIST_POISON2 is what hlist_del()
sets pprev to after already deleting it once.

Although in this case it might not be hlist_del(), because
detach_timer() also sets entry->next to LIST_POISON2.

Which is pretty bogus, we are supposed to use LIST_POISON1 for the
"next" pointer. Oh well. Nobody cares, except for the list entry
debugging code, which isn't run on the hlist cases.

Adding Thomas Gleixner to the cc. It should not be possible to delete
the same timer twice.

  Linus
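
Note: a minimal userspace sketch of the poison-on-delete pattern
described above, to show why a second delete of the same entry ends up
dereferencing a poison value. The struct and function names and the two
poison constants are placeholders invented for this illustration; they
are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct hnode_sketch { struct hnode_sketch *next, **pprev; };
struct hhead_sketch { struct hnode_sketch *first; };

/* Placeholder poison values, chosen only for this example. */
#define POISON1_SKETCH ((struct hnode_sketch *)(uintptr_t)0x100)
#define POISON2_SKETCH ((struct hnode_sketch **)(uintptr_t)0x200)

static void hadd_head_sketch(struct hnode_sketch *n, struct hhead_sketch *h)
{
	struct hnode_sketch *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink n. If n->next were a poison value, the last store would fault. */
static void hdel_raw_sketch(struct hnode_sketch *n)
{
	struct hnode_sketch *next = n->next;
	struct hnode_sketch **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;
}

/* Delete n and poison it; refuse (return -1) if it was already deleted. */
static int hdel_checked_sketch(struct hnode_sketch *n)
{
	assert(n != NULL);
	if (n->pprev == POISON2_SKETCH)
		return -1;
	hdel_raw_sketch(n);
	n->next = POISON1_SKETCH;
	n->pprev = POISON2_SKETCH;
	return 0;
}
```

Without the check in hdel_checked_sketch(), a second delete would run
the raw unlink on the poisoned pointers, which is the kind of access
the oops above shows.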


Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007

2017-10-30 Thread Linus Torvalds
On Sun, Oct 29, 2017 at 4:48 PM, Fengguang Wu  wrote:
>
> Here are 3 dmesgs related to wireless and 1 from ethernet.

Fengguang, these would be lovelier still _if_ you have DEBUG_INFO
enabled on the kernel, and your script were to find things like
"symbol+0xhex/0xhex", and run "./scripts/faddr2line" on them.

So

> [  235.425464] BUG: unable to handle kernel paging request at 00010007
> [  235.425470] IP: run_timer_softirq+0x13a/0x470

would also then have

   run_timer_softirq at timer.c:XYZ

which would make it easier to see exactly _what_ it is that faults. As
it is, I think there's a fair amount of inlining that makes it hard to
see the cause, but that faddr2line would make very obvious.

Finding that "symbol+xyz/abc" pattern should be fairly easy to
automate, and should fit the 0day model fairly well. No?

In this case, the trapping instruction ends up decoding to

   0: 4c 8d 6c c5 90        lea    -0x70(%rbp,%rax,8),%r13
   5: 49 8b 45 00           mov    0x0(%r13),%rax
   9: 48 85 c0              test   %rax,%rax
   c: 74 de                 je     0xffec
   e: 4d 8b 7d 00           mov    0x0(%r13),%r15
  12: 4d 89 7c 24 08        mov    %r15,0x8(%r12)
  17: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
  1c: 49 8b 07              mov    (%r15),%rax
  1f: 49 8b 57 08           mov    0x8(%r15),%rdx
  23: 48 85 c0              test   %rax,%rax
  26: 48 89 02              mov    %rax,(%rdx)
  29: 74 04                 je     0x2f
  2b:* 48 89 50 08          mov    %rdx,0x8(%rax) <-- trapping instruction
  2f: 41 f6 47 2a 20        testb  $0x20,0x2a(%r15)
  34: 49 c7 47 08 00 00 00  movq   $0x0,0x8(%r15)

and %rax has the value 0x, so yes, it will trap at 0x10007.

It's not trivial to see just *what* access it is.

I *think* that "testb $32" is checking for TIMER_IRQSAFE in
expire_timers(), and that the oops is due to the list operations in
detach_timer() (inlined).

Which doesn't really help: it looks like the timer lists are corrupt.
With some luck, some register state could have the timer function
pointer in it, and we'd get a hint of *which* timer this is, but that
doesn't look to be the case here either.

I'm not seeing anything to really help debug this here.

   Linus
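
Note: finding the "symbol+0xhex/0xhex" pattern mentioned above could
look roughly like the following C sketch. It only extracts the symbol
name, offset and size from one log line; feeding the result to
./scripts/faddr2line is left out. The function name and its interface
are assumptions made for this example and are not part of any existing
0day tooling.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static int is_sym_char_sketch(int c)
{
	return isalnum(c) || c == '_' || c == '.';
}

/*
 * Look for "symbol+0xOFF/0xSIZE" in line. On success, copy the symbol
 * name into sym (at most symsz - 1 characters), store the two numbers
 * and return 1. Return 0 if no such pattern is found.
 */
static int find_sym_ref_sketch(const char *line, char *sym, size_t symsz,
			       unsigned long *off, unsigned long *size)
{
	const char *p = line;

	assert(line && sym && off && size);
	while ((p = strstr(p, "+0x")) != NULL) {
		const char *start = p;
		size_t len;
		char *end;

		/* Walk back over the identifier in front of the '+'. */
		while (start > line &&
		       is_sym_char_sketch((unsigned char)start[-1]))
			start--;
		len = (size_t)(p - start);

		if (len > 0 && len < symsz && isxdigit((unsigned char)p[3])) {
			unsigned long o = strtoul(p + 3, &end, 16);

			if (end[0] == '/' && end[1] == '0' && end[2] == 'x' &&
			    isxdigit((unsigned char)end[3])) {
				*size = strtoul(end + 3, &end, 16);
				*off = o;
				memcpy(sym, start, len);
				sym[len] = '\0';
				return 1;
			}
		}
		p += 3;
	}
	return 0;
}
```

For the "IP: run_timer_softirq+0x13a/0x470" line quoted above this
yields the symbol run_timer_softirq with offset 0x13a and size 0x470.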


Re: iwlwifi firmware load broken in current -git

2017-09-15 Thread Linus Torvalds
On Fri, Sep 15, 2017 at 12:43 PM, Luca Coelho  wrote:
> On Fri, 2017-09-15 at 12:38 -0700, Linus Torvalds wrote:
>>
>> From some of the context it looks like commit 40f11adc7cd9 ("PCI:
>> Avoid race while enabling upstream bridges"), is that correct?
>
> Yes, that's the one.  And Bjorn already sent a revert:
>
> https://lkml.org/lkml/2017/9/15/46

Well, he may have sent a revert to lkml, but not to me. I'm assuming
it's in his tree and I'll get a pull request. Hopefully soon, so that
it makes rc.

Jens, you were actually cc'd on that revert according to the email
headers, so check your spam-box.

 Linus


Re: iwlwifi firmware load broken in current -git

2017-09-15 Thread Linus Torvalds
On Fri, Sep 15, 2017 at 12:32 PM, Jens Axboe  wrote:
>>
>> In any case, your patch introduces a regression on systems. Please get
>> it reverted now, and then you can come up with a new approach to fix the
>> double enable of the upstream bridge.
>
> Who's sending in the revert? I can certainly do it if no one else does,
> but it needs to be done.
>
> I'm not seeing any patches coming out of Srinath to fix up the
> situation, so we should revert the broken patch until a better solution
> exists.

Hmm. I don't have the history here (apparently it never made lkml, for
example), so I don't even know which commit you're talking about.

From some of the context it looks like commit 40f11adc7cd9 ("PCI:
Avoid race while enabling upstream bridges"), is that correct?

   Linus


Re: [PATCH] iwlwifi: mvm: only send LEDS_CMD when the FW supports it

2017-09-07 Thread Linus Torvalds
On Thu, Sep 7, 2017 at 5:39 AM, Kalle Valo  wrote:
>
> Linus, do you want to apply this directly or should we take it via the
> normal route (wireless-drivers -> net)? If you prefer the latter then
> I'm planning to submit this to Dave in a day or two and expecting it to
> get to your tree in about a week, depending of course what is Dave's
> schedule.

Since we have a workaround for the problem, let's just go through the
regular channels. As long as I get the fix through David before the
merge window closes, I'm happy.

  Linus


Re: [GIT] Networking

2017-09-06 Thread Linus Torvalds
On Wed, Sep 6, 2017 at 10:40 PM, Luca Coelho  wrote:
>
> This patch is not very important (unless you really like blinking lights
> -- maybe I'll change my mind when the holidays approach :P). so it is
> fine if you just want to revert it for now.
>
> In any case, I'll send a patch fixing this problem soon.

No need to revert if we can get this fixed quickly enough.

I'll leave the fw-31 on my laptop, so that I can continue to use it for now.

Thanks,

   Linus


Re: [GIT] Networking

2017-09-06 Thread Linus Torvalds
On Wed, Sep 6, 2017 at 9:11 PM, Coelho, Luciano
 wrote:
>
> This seems to be a problem with backwards-compatibility with FW version
> 27.  We are now in version 31[1] and upgrading will probably fix that.

I can confirm that fw version 31 works.

> But obviously the driver should not fail miserably like this with
> version 27, because it claims to support it still.

Not just "claims to support it", but if it's what is shipped with a
fairly recent distro like an up-to-date version of F26, I would really
hope that the driver can still work with it.

> I'm looking into this now and will provide a fix asap.

Thanks,

  Linus


Re: [GIT] Networking

2017-09-06 Thread Linus Torvalds
On Wed, Sep 6, 2017 at 4:27 PM, Linus Torvalds
 wrote:
>
> The firmware is iwlwifi-8000C-28.ucode from
> iwl7260-firmware-25.30.13.0-75.fc26.noarch, and the kernel reports
>
>   iwlwifi :3a:00.0: loaded firmware version 27.455470.0 op_mode iwlmvm

And when I said "iwlwifi-8000C-28.ucode" I obviously meant
"iwlwifi-8000C-27.ucode".

At least it was _hopefully_ obvious from that "27" in the actual
version number it reports.

Linus


Re: [GIT] Networking

2017-09-06 Thread Linus Torvalds
This pull request completely breaks Intel wireless for me.

This is my trusty old XPS 13 (9350), using Intel Wireless 8260 (rev 3a).

That remains a very standard Intel machine with absolutely zero odd
things going on.

The firmware is iwlwifi-8000C-28.ucode from
iwl7260-firmware-25.30.13.0-75.fc26.noarch, and the kernel reports

  iwlwifi :3a:00.0: loaded firmware version 27.455470.0 op_mode iwlmvm

the thing starts acting badly with this:

  iwlwifi :3a:00.0: FW Error notification: type 0x cmd_id 0x04
  iwlwifi :3a:00.0: FW Error notification: seq 0x service 0x0004
  iwlwifi :3a:00.0: FW Error notification: timestamp 0x5D84
  iwlwifi :3a:00.0: Microcode SW error detected.  Restarting 0x200.
  iwlwifi :3a:00.0: Start IWL Error Log Dump:
  iwlwifi :3a:00.0: Status: 0x0100, count: 6
  iwlwifi :3a:00.0: Loaded firmware version: 27.455470.0
  iwlwifi :3a:00.0: 0x0038 | BAD_COMMAND
  iwlwifi :3a:00.0: 0x00A002F0 | trm_hw_status0
  ...
  iwlwifi :3a:00.0: 0x | isr status reg
  ieee80211 phy0: Hardware restart was requested
  iwlwifi :3a:00.0: FW error in SYNC CMD MAC_CONTEXT_CMD
  CPU: 2 PID: 993 Comm: NetworkManager Not tainted 4.13.0-06466-g80cee03bf1d6 #4
  Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
  Call Trace:
   dump_stack+0x4d/0x70
   iwl_trans_pcie_send_hcmd+0x4e7/0x530 [iwlwifi]
   ? wait_woken+0x80/0x80
   iwl_trans_send_cmd+0x5c/0xc0 [iwlwifi]
   iwl_mvm_send_cmd+0x32/0x90 [iwlmvm]
   iwl_mvm_send_cmd_pdu+0x58/0x80 [iwlmvm]
   iwl_mvm_mac_ctxt_send_cmd+0x2a/0x60 [iwlmvm]
   ? iwl_mvm_mac_ctxt_send_cmd+0x2a/0x60 [iwlmvm]
   iwl_mvm_mac_ctxt_cmd_sta+0x140/0x1e0 [iwlmvm]
   iwl_mvm_mac_ctx_send+0x2d/0x60 [iwlmvm]
   iwl_mvm_mac_ctxt_add+0x43/0xc0 [iwlmvm]
   iwl_mvm_mac_add_interface+0x139/0x2b0 [iwlmvm]
   ? iwl_led_brightness_set+0x1f/0x30 [iwlmvm]
   drv_add_interface+0x4a/0x120 [mac80211]
   ieee80211_do_open+0x33d/0x820 [mac80211]
   ieee80211_open+0x52/0x60 [mac80211]
   __dev_open+0xae/0x120
   __dev_change_flags+0x17b/0x1c0
   dev_change_flags+0x29/0x60
   do_setlink+0x2f7/0xe60
   ? __nla_put+0x20/0x30
   ? _raw_read_unlock_bh+0x20/0x30
   ? inet6_fill_ifla6_attrs+0x4be/0x4e0
   ? __kmalloc_node_track_caller+0x35/0x2b0
   ? nla_parse+0x35/0x100
   rtnl_newlink+0x5d2/0x8f0
   ? __netlink_sendskb+0x3b/0x60
   ? security_capset+0x40/0x80
   ? ns_capable_common+0x68/0x80
   ? ns_capable+0x13/0x20
   rtnetlink_rcv_msg+0x1f9/0x280
   ? rtnl_calcit.isra.26+0x110/0x110
   netlink_rcv_skb+0x8e/0x130
   rtnetlink_rcv+0x15/0x20
   netlink_unicast+0x18b/0x220
   netlink_sendmsg+0x2ad/0x3a0
   sock_sendmsg+0x38/0x50
   ___sys_sendmsg+0x269/0x2c0
   ? addrconf_sysctl_forward+0x114/0x280
   ? dev_forward_change+0x140/0x140
   ? sysctl_head_finish.part.22+0x32/0x40
   ? lockref_put_or_lock+0x5e/0x80
   ? dput.part.22+0x13e/0x1c0
   ? mntput+0x24/0x40
   __sys_sendmsg+0x54/0x90
   ? __sys_sendmsg+0x54/0x90
   SyS_sendmsg+0x12/0x20
   entry_SYSCALL_64_fastpath+0x13/0x94
  RIP: 0033:0x7ff1f9933134
  RSP: 002b:7ffe7419b460 EFLAGS: 0293 ORIG_RAX: 002e
  RAX: ffda RBX: 55604b6d82b9 RCX: 7ff1f9933134
  RDX:  RSI: 7ffe7419b4b0 RDI: 0007
  RBP: 7ffe7419b940 R08:  R09: 55604d16b400
  R10: 7ff1f7cf8b38 R11: 0293 R12: 0001
  R13: 0001 R14: 7ffe7419b670 R15: 55604b9515a0
  iwlwifi :3a:00.0: Failed to send MAC context (action:1): -5

and it doesn't get any better from there. The next error seems to be

  Timeout waiting for hardware access (CSR_GP_CNTRL 0x0808)
  [ cut here ]
  WARNING: CPU: 3 PID: 1075 at
drivers/net/wireless/intel/iwlwifi/pcie/trans.c:1874
iwl_trans_pcie_grab_nic_access+0xdf/0xf0 [iwlwifi]

and it will continue with those microcode failure errors and various
other warnings about how nothing is working.

And no, nothing works.  A lot of log output, no actual network access..

  Linus


Re: [PATCH V2] brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()

2017-07-07 Thread Linus Torvalds
On Fri, Jul 7, 2017 at 1:09 PM, Arend van Spriel
 wrote:
> Now I signed off on the patch although formally I suppose Linus should
> sign it off.

You can certainly consider it

   Signed-off-by: Linus Torvalds 

but I really don't need the authorship (or resulting sign-off
requirement) because multiple people ended up sending in very similar
patches.

All the real work was in actually finding the issue.

  Linus


Re: [PATCH] brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()

2017-07-07 Thread Linus Torvalds
On Fri, Jul 7, 2017 at 6:17 AM, Johannes Berg  wrote:
>
> Linus, since you were involved already, will you apply this directly?

I don't think it's _that_ urgent, since it's specific to one
particular driver anyway. I'd suggest just going through the normal
channels, and be cc'd to netdev.

> I guess it should also have a Cc: stable tag, and perhaps a Fixes?

The fixes tag would be

Fixes: 18e2f61db3b70 ("brcmfmac: P2P action frame tx.")

which is 3.9 in case anybody cares. I assume that didn't get
backported any further.

Linus


Re: [PATCH] brcmfmac: buffer overflow in brcmf_cfg80211_mgmt_tx()

2017-07-06 Thread Linus Torvalds
On Thu, Jul 6, 2017 at 10:11 AM, Arend van Spriel
 wrote:
>
> Looks fine to me so ...

I really think that if we can't trust 'len', then we have to check
against the lower bound of DOT11_MGMT_HDR_LEN too, because otherwise
we'll just have a big 16-bit number instead.

And we should do that brcmf_err() that I had in my version, which also
lets people know they are being attacked.

Linus
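
Note: the "big 16-bit number" problem above can be shown with a small
userspace sketch. The header length of 24 bytes and the upper limit of
1800 bytes are assumptions made for this example, and the function
names are invented; this is not the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MGMT_HDR_LEN_SKETCH 24u		/* assumed management header size */
#define MAX_BODY_LEN_SKETCH 1800u	/* hypothetical buffer limit */

/* No lower-bound check: a short frame wraps to a huge 16-bit length. */
static uint16_t body_len_unchecked_sketch(size_t len)
{
	return (uint16_t)(len - MGMT_HDR_LEN_SKETCH);
}

/* Check both bounds before subtracting. Returns 0 on success, -1 if not. */
static int body_len_checked_sketch(size_t len, uint16_t *body_len)
{
	assert(body_len != NULL);
	if (len < MGMT_HDR_LEN_SKETCH)
		return -1;	/* too short: subtraction would wrap */
	if (len - MGMT_HDR_LEN_SKETCH > MAX_BODY_LEN_SKETCH)
		return -1;	/* too long for the destination buffer */
	*body_len = (uint16_t)(len - MGMT_HDR_LEN_SKETCH);
	return 0;
}
```

With len = 10 the unchecked variant returns 65522, which would pass for
a valid (and very large) length; the checked variant rejects it, and
the -1 path is where an error message would be logged.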


Re: [PATCH] cfg80211: make RATE_INFO_BW_20 the default

2017-05-04 Thread Linus Torvalds
On Thu, May 4, 2017 at 8:22 AM, David Miller  wrote:
> From: Johannes Berg 
>>
>> I figured I'd give Linus a chance to try or even apply it, but I
>> have no objection to you applying it either. I don't have anything else
>>   yet right now, and sending a pull request for just a single patch
>> would be quite pointless.
>
> Ok, let's give Linus a chance to test the patch.

I'm having trouble recreating the warning. I have no idea why. It only
happened during ten minutes yesterday, and nothing in my wireless
setup has changed.

I wonder if *normally* my setup ends up connecting with a 40MHz band
or something, and I just happened to see the default uninitialized
case once.

I see that Jens reported that the patch works, although I'm wondering
how repeatable it was for him.  The patch obviously looks simple and
seems like an obviously GoodThing(tm) regardless.

   Linus


new warning at net/wireless/util.c:1236

2017-05-03 Thread Linus Torvalds
So my Dell XPS 13 seems to have grown a new warning as of the
networking merge yesterday.

Things still work, but when it starts warning, it generates a *lot* of
noise (I got 36 of these within about ten minutes).

I have no idea what triggered it, because when I rebooted (not because
of this issue, but just to reboot into a newer kernel) I don't see it
again.

This is all pretty regular wireless - it's intel 8260 wireless in a
fairly normal laptop.

Things still seem to *work* ok, so the only problem here is the overly
verbose and useless WARN_ON. It doesn't even print out *which* rate it
is warning about, it just does that stupid unconditional WARN_ON()
without ever shutting up about it..

The WARN_ON() seems to be old, but my logs don't seem to have any
mention of this until today, so there's something that has changed
that now triggers it.

Ideas?

  Linus

---

WARNING: CPU: 3 PID: 1138 at net/wireless/util.c:1236
cfg80211_calculate_bitrate+0x139/0x170 [cfg80211]
Modules linked in: rfcomm fuse ccm ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp
llc ebtable_nat ip6table_security ip6table_mangle ip6table_nat nf_con
 snd_hda_codec iwlmvm irqbypass snd_hwdep snd_hda_core intel_cstate
mac80211 snd_seq intel_rapl_perf snd_seq_device snd_pcm iwlwifi
rtsx_pci_ms snd_timer cfg80211 memstick snd soundcore i2c_i801 shpchp
 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops drm i2c_hid video
CPU: 3 PID: 1138 Comm: NetworkManager Tainted: GW
4.11.0-06543-g2f34c1231bfc #60
Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.13 12/28/2016
task: 9c1d1bfcbb80 task.stack: bb95c337c000
RIP: 0010:cfg80211_calculate_bitrate+0x139/0x170 [cfg80211]
RSP: 0018:bb95c337f5b8 EFLAGS: 00010293
RAX:  RBX: 9c1cb080cc00 RCX: 
RDX: 0005 RSI: 0002 RDI: bb95c337f76e
RBP: bb95c337f5b8 R08: 0004 R09: 9c1cc36fe0c4
R10: f7c00472 R11: c0682000 R12: 9c1cc36fe0c0
R13: bb95c337f76e R14:  R15: 9c1cc36fe030
FS:  7f15f980() GS:9c1d3ed8() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7ffd1ae62748 CR3: 000469f4f000 CR4: 003406e0
Call Trace:
 nl80211_put_sta_rate+0x56/0x210 [cfg80211]
 nl80211_send_station.isra.63+0x639/0xd60 [cfg80211]
 nl80211_get_station+0x1e4/0x250 [cfg80211]
 genl_family_rcv_msg+0x1fa/0x3e0
 genl_rcv_msg+0x4c/0x90
 netlink_rcv_skb+0xde/0x110
 genl_rcv+0x28/0x40
 netlink_unicast+0x189/0x220
 netlink_sendmsg+0x2ba/0x3b0
 sock_sendmsg+0x38/0x50
 ___sys_sendmsg+0x2b6/0x2d0
 __sys_sendmsg+0x54/0x90
 SyS_sendmsg+0x12/0x20
 entry_SYSCALL_64_fastpath+0x13/0x94
RIP: 0033:0x7efffebf82fd
RSP: 002b:7fff97d4af30 EFLAGS: 0293 ORIG_RAX: 002e
RAX: ffda RBX:  RCX: 7efffebf82fd
RDX:  RSI: 7fff97d4afc0 RDI: 0010
RBP: 7fff97d4b070 R08:  R09: 7efffcc7b168
R10: 55eb69e6d110 R11: 0293 R12: 7fff97d4b0e0
R13: 0001 R14:  R15: 55eb68a32760
Code: 89 d0 f7 e1 d1 ea 8d 14 92 01 d2 81 c2 50 c3 00 00 b9 c5 5a 7c
0a c1 ea 05 89 d0 f7 e1 5d 89 d0 c1 e8 07 c3 31 c0 80 f9 02 74 b7 <0f>
ff 31 c0 eb b1 0f ff 31 c0 5d c3 0f ff 31 c0 5d c3 8d 04 40
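
Note: the complaint above is about a warning that fires on every call
and does not say which value triggered it. A userspace sketch of the
alternative (warn once, and print the offending value) might look like
this. The macro, the enum and the function are invented for this
illustration and only imitate the idea behind a warn-once helper; they
are not the cfg80211 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static unsigned int warnings_emitted_sketch;

/* Print the message the first time cond is true at this call site. */
#define WARN_ONCE_SKETCH(cond, ...)				\
	do {							\
		static bool warned_;				\
		if ((cond) && !warned_) {			\
			warned_ = true;				\
			warnings_emitted_sketch++;		\
			fprintf(stderr, __VA_ARGS__);		\
		}						\
	} while (0)

/* Hypothetical bandwidth codes, for the example only. */
enum bw_sketch { BW_20_SKETCH, BW_40_SKETCH, BW_80_SKETCH, BW_160_SKETCH };

/* Channel width in MHz, or 0 (with a single warning) for unknown codes. */
static unsigned int width_mhz_sketch(int bw)
{
	bool bad = bw < BW_20_SKETCH || bw > BW_160_SKETCH;

	WARN_ONCE_SKETCH(bad, "unexpected bandwidth value %d\n", bw);
	return bad ? 0 : 20u << bw;
}
```

Called 36 times with a bad value, this prints one line that names the
value instead of 36 identical backtraces.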


Re: ath10k regression on XPS13

2017-02-21 Thread Linus Torvalds
On Tue, Feb 21, 2017 at 10:18 AM, David Miller  wrote:
>
> Kalle I really wanted to send my net-next pull request to Linus later
> today.  But I guess I have to wait for this ath10k first.

Feel free to send it to me - it sounds like the regression is
 (a) easy to work around
and
 (b) has a fix coming up.

And it won't even be something that I personally notice, since I have
the prev-gen XPS13 that has intel wireless.

  Linus


Re: [RFC (v7)] add basic register-field manipulation macros

2016-08-18 Thread Linus Torvalds
On Thu, Aug 18, 2016 at 10:11 AM, Jakub Kicinski
 wrote:
> Hi!
>
> This is what I came up with.  Changes:

I can live with this, certainly. I'm not really sure how many drivers
(or perhaps core code, for that matter) will actually start using it,
but it at least _looks_ like a usable interface that seems to be quite
resistant to people doing stupid things with it that would result in
surprising results (either performance or semantics).

So I'm ok with something like this coming through (for example) the
wireless tree if the drivers there are the first ones to start using
this.

Let's see if anybody else objects.

   Linus


Re: [PATCHv6 1/2] add basic register-field manipulation macros

2016-08-17 Thread Linus Torvalds
On Wed, Aug 17, 2016 at 10:11 AM, Jakub Kicinski
 wrote:
> On Wed, 17 Aug 2016 09:33:26 -0700, Linus Torvalds wrote:
>>
>> I'm not a huge fan, since the interface fundamentally seems to be
>> oddly designed (why pass in the mask rather than "start bit +
>> length"?).
>
> Would that not require start and length to have separate defines?

Yeah.

> I assume doing:
>
> #define REG_BLA_FIELD_FOO  3, 4
> val = FIELD_GET(REG_BLA_FIELD_FOO, reg);
>
> is not acceptable.  Attempts to define a single value brought us to the
> shifted mask.

Agreed. Maybe the mask with the complexity to then undo it (at compile
time) is the better approach.

Linus
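
Note: the "shifted mask, undone at compile time" approach agreed on
above can be sketched as below. The macros are illustrations with
made-up _SKETCH names, not the macros from the patch; they rely on the
GCC/Clang builtin __builtin_ctzll(), and the example field (start bit
3, length 4, so mask 0x78) is taken from the "3, 4" define quoted
above.

```c
#include <assert.h>
#include <stdint.h>

/* The shift is recovered from the mask: index of its lowest set bit. */
#define FIELD_SHIFT_SKETCH(mask)	((unsigned int)__builtin_ctzll(mask))

/* Extract the field selected by mask from reg. */
#define FIELD_GET_SKETCH(mask, reg) \
	(((reg) & (mask)) >> FIELD_SHIFT_SKETCH(mask))

/* Shift val into position and drop any bits that do not fit the field. */
#define FIELD_PREP_SKETCH(mask, val) \
	(((uint64_t)(val) << FIELD_SHIFT_SKETCH(mask)) & (mask))

/* One define per field: bits 6:3, i.e. start bit 3, length 4. */
#define REG_BLA_FIELD_FOO_SKETCH	0x00000078u

/* Read-modify-write of the field: clear it, then OR in the new value. */
static uint32_t set_foo_sketch(uint32_t reg, uint32_t val)
{
	reg &= ~REG_BLA_FIELD_FOO_SKETCH;
	reg |= (uint32_t)FIELD_PREP_SKETCH(REG_BLA_FIELD_FOO_SKETCH, val);
	return reg;
}
```

With a constant mask the compiler can fold the shift count, so a single
define per field is enough and no separate start/length pair is needed.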


Re: [PATCHv6 1/2] add basic register-field manipulation macros

2016-08-17 Thread Linus Torvalds
On Wed, Aug 17, 2016 at 3:31 AM, Kalle Valo  wrote:
>
> Are people ok with this? I think they are useful and I can take these
> through my tree, but I would prefer to get an ack from other maintainers
> first. Dave? Andrew?

I'm not a huge fan, since the interface fundamentally seems to be
oddly designed (why pass in the mask rather than "start bit +
length"?).

I also don't like how this very much would match the GENMASK() macros
we have, but then it clashes with them in other ways

 - it's in a different header file

 - it has completely different naming (GENMASK_ULL vs FIELD_GET64).

I actually think the naming could/should be fixed by just
automatically doing the right thing based on sizes.  For example,
GENMASK could just have something like

  #define GENMASK(end,start) __builtin_choose_expr((end)>31,
__GENMASK_64(end,start), __GENMASK_32(end,start))

and doing similar things with the FIELD_GET/SET ops.

So I'm not entirely happy about this all.

But if people love the interface, and think the above kind of cleanups
might be possible, I'd just want to make sure that there is also a

   BUILD_BUG_ON(!__builtin_constant_p(_mask));

because if the mask isn't a build-time constant, the code ends up
being *complete* shit. Also preferably something like

   BUILD_BUG_ON((_mask) > (typeof(_val))~0ull);

to make sure you can't try to mask bits that don't exist in the value.

Or something like that to make mis-uses *really* obvious.

The FIELD_PUT macro also seems misnamed. It doesn't "put" anything
(unlike the GET macro). It just prepares the field for inserting
later. As exemplified by how you actually have to put things:

First, "GET" - yeah, that looks like a "get" operation:

 * Get:
 *  a = FIELD_GET(REG_FIELD_A, reg);

But then "PUT" isn't actually a PUT operation at all, but the comments
kind of gloss over it by talking about "Modify" instead:

 * Modify:
 *  reg &= ~REG_FIELD_C;
 *  reg |= FIELD_PUT(REG_FIELD_C, c);

so I'm not entirely sure about the naming.

I can live with the FIELD_PUT naming, because I see how it comes
about, even if I think it's a bit odd. I might have called it
"FIELD_PREP" instead, but I'm not really sure that's all that much
better.

Am I being a bit anal? Yeah. But when we add whole new abstractions
that we haven't used historically, I'd really like those to be obvious
and easy to use (or rather, really _hard_ to get wrong by mistake).

Hmm?

  Linus
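
Note: the size-based GENMASK selection suggested above can be tried out
in userspace with the sketch below. All macro names are invented for
this example; it assumes GCC or Clang compiling C (the builtin
__builtin_choose_expr() is not available in C++), and both arguments
must be compile-time constants.

```c
#include <assert.h>
#include <stdint.h>

/* Bits h..l set, as a 32-bit value (0 <= l <= h <= 31). */
#define GENMASK32_SKETCH(h, l) \
	((uint32_t)((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h)))))

/* Bits h..l set, as a 64-bit value (0 <= l <= h <= 63). */
#define GENMASK64_SKETCH(h, l) \
	((uint64_t)((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (63 - (h)))))

/*
 * Pick the 64-bit variant only when the top bit does not fit in 32
 * bits. The "& 31" keeps the unchosen 32-bit expression free of
 * out-of-range shift counts; it changes nothing when h <= 31.
 */
#define GENMASK_AUTO_SKETCH(h, l)					\
	__builtin_choose_expr((h) > 31,					\
			      GENMASK64_SKETCH(h, l),			\
			      GENMASK32_SKETCH((h) & 31, (l) & 31))
```

Unlike the ?: operator, __builtin_choose_expr() gives the result the
type of the chosen branch, so GENMASK_AUTO_SKETCH(7, 4) is a 32-bit
value and GENMASK_AUTO_SKETCH(39, 32) is a 64-bit one.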


Re: [PATCH] remove lots of IS_ERR_VALUE abuses

2016-05-27 Thread Linus Torvalds
On Fri, May 27, 2016 at 2:23 PM, Arnd Bergmann  wrote:
>
> This patch changes all users of IS_ERR_VALUE() that I could find
> on 32-bit ARM randconfig builds and x86 allmodconfig. For the
> moment, this doesn't change the definition of IS_ERR_VALUE()
> because there are probably still architecture specific users
> elsewhere.

Patch applied with the fixups from Al Viro edited in.

I also ended up removing a few other users (due to the vm_brk()
interface), and then made IS_ERR_VALUE() do the "void *" cast so that
integer use of a non-pointer size should now complain.

It works for me and has no new warnings in my allmodconfig build, and
with your ARM work that is presumably clean too. But other
architectures may see new warnings.

People who got affected by this should check their subsystem code for
the changes.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove lots of IS_ERR_VALUE abuses

2016-05-27 Thread Linus Torvalds
On Fri, May 27, 2016 at 2:46 PM, Andrew Morton
 wrote:
>
> So you do plan to add some sort of typechecking into IS_ERR_VALUE()?

The easiest way to do it is to just turn the (x) into (unsigned
long)(void *)(x), which then complains about casting an integer to a
pointer if the integer has the wrong size.

But if we get rid of the bogus cases, there's just a few left, and we
should probably just rename the whole thing (the initial double
underscore). It really isn't something normal people should use.

  Linus
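
Note: a userspace sketch of the cast described above. The macro name
and the helper functions are made up for this illustration, the limit
of 4095 is the conventional maximum errno value, and the sketch assumes
an LP64 target where unsigned long and pointers have the same size.

```c
#include <assert.h>

#define MAX_ERRNO_SKETCH 4095

/*
 * Going through (void *) makes the compiler complain when x is an
 * integer of a different size than a pointer (for example a plain
 * int), while unsigned long and pointer arguments pass silently.
 */
#define IS_ERR_VALUE_SKETCH(x) \
	((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO_SKETCH)

static int is_err_ulong_sketch(unsigned long v)
{
	return IS_ERR_VALUE_SKETCH(v);
}

static int is_err_ptr_sketch(void *p)
{
	return IS_ERR_VALUE_SKETCH(p);
}
```

Only the top 4095 values of the address space count as errors, so an
ordinary pointer or a small integer is never mistaken for one.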


Re: [GIT] Networking

2016-05-18 Thread Linus Torvalds
On Wed, May 18, 2016 at 11:58 AM, Kalle Valo  wrote:
>
> It would be best if you could send a patch either directly to Dave or
> Linus to resolve this quickly.

I'm committing my patch myself right now, since this bug makes my
laptop useless, and I will take credit for finding and testing it on
my own even if it was apparently also discussed independently on the
networking list ;)

Linus


Re: [GIT] Networking

2016-05-18 Thread Linus Torvalds
On Wed, May 18, 2016 at 11:45 AM, Linus Torvalds
 wrote:
>
> From what I can tell, there's a merge bug in commit 909b27f70643,
> where David seems to have lost some of the changes to
> iwl_mvm_set_tx_cmd().
>
> I do not know if that's the reason for the problem I see. But I will test.

Yes. The attached patch that fixes the incorrect merge seems to fix
things for me.

That should mean that the assumption that this problem existed in v4.6
too was wrong, because the incorrect merge came in later. I think
Luciano mis-understood "v4.6+" to mean plain v4.6.

Reinoud Koornstra, does this patch fix things for you too?

   Linus
 drivers/net/wireless/intel/iwlwifi/mvm/tx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
index 880210917a6f..c53aa0f220e0 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
@@ -211,6 +211,7 @@ void iwl_mvm_set_tx_cmd(struct iwl_mvm *mvm, struct sk_buff *skb,
 			struct iwl_tx_cmd *tx_cmd,
 			struct ieee80211_tx_info *info, u8 sta_id)
 {
+	struct ieee80211_tx_info *skb_info = IEEE80211_SKB_CB(skb);
 	struct ieee80211_hdr *hdr = (void *)skb->data;
 	__le16 fc = hdr->frame_control;
 	u32 tx_flags = le32_to_cpu(tx_cmd->tx_flags);
@@ -294,7 +295,7 @@ void iwl_mvm_set_tx_cmd(struct iwl_mvm *mvm, struct sk_buff *skb,
 	tx_cmd->tx_flags = cpu_to_le32(tx_flags);
 	/* Total # bytes to be transmitted */
 	tx_cmd->len = cpu_to_le16((u16)skb->len +
-		(uintptr_t)info->driver_data[0]);
+		(uintptr_t)skb_info->driver_data[0]);
 	tx_cmd->life_time = cpu_to_le32(TX_CMD_LIFE_TIME_INFINITE);
 	tx_cmd->sta_id = sta_id;
tx_cmd->sta_id = sta_id;
 


Re: [GIT] Networking

2016-05-18 Thread Linus Torvalds
On Wed, May 18, 2016 at 7:23 AM, Coelho, Luciano
 wrote:
>
> I can confirm that 4.6 contains the same bug.  And reverting the patch
> I mentioned does solve the problem...
>
> The same patch works fine in our internal tree.  I'll have to figure
> out together with Emmanuel what the problem actually is.

Hmm.

From what I can tell, there's a merge bug in commit 909b27f70643,
where David seems to have lost some of the changes to
iwl_mvm_set_tx_cmd().

The reason seems to be a conflict with d8fe484470dd, where David
missed the fact that "info->driver_data[0]" had become
"skb_info->driver_data[0]", and then he removed the skb_info because
it was unused.

I do not know if that's the reason for the problem I see. But I will test.

David, do you happen to recall that merge conflict? I think you must
have removed that "skb_info" variable declaration and initialization
manually (due to the "unused variable" warning, which in turn was due
to the incorrect merge of the actual conflict), because I think git
would have merged that line into the result.

  Linus


Re: [GIT] Networking

2016-05-17 Thread Linus Torvalds
On Tue, May 17, 2016 at 12:11 PM, David Miller  wrote:
>
> Highlights:

Lowlights:

 1) the iwlwifi driver seems to be broken

My laptop that uses the intel 7680 iwlwifi module no longer connects
to the network. It fails with a "Microcode SW error detected." and
spews out register state over and over again.

The last thing it says before falling over is:

  wlp1s0: authenticate with xx:xx:xx:xx:xx:xx
  wlp1s0: send auth to xx:xx:xx:xx:xx:xx (try 1/3)
  wlp1s0: send auth to xx:xx:xx:xx:xx:xx (try 2/3)

and then it goes all titsup.

I thought that it might be because I had downloaded one of the daily
firmware versions (it calls itself iwlwifi-7260-17.ucode, but isn't a
real release afaik - but it has worked fine for me before), but the
problem persists with the ver-16 ucode too, so that wasn't it.

I haven't bisected it, but there is absolutely nothing odd in my hardware.

I do have a 802.11ac network, which apparently not everybody does,
judging by previous bug-reports of mine..

Intel iwlwifi people: please check this out.

   Linus


iwlwifi incomplete initialization in Linux 4.5?

2016-03-19 Thread Linus Torvalds
So I upgraded the firmware on my Intel NUC (NUC6i3SYK), and that made
the wireless no longer work with a 4.5 kernel. I could get the
occasional packet through, but not many, and it would hang for ten
seconds at a time, and then output errors like

  iwlwifi :01:00.0: Queue 2 stuck for 1 ms.
  iwlwifi :01:00.0: Current SW read_ptr 60 write_ptr 93
  ..

which was odd, because that kernel had worked fine before.

I booted between two different kernels, going back to an older 4.5-rc3
one that had been running a lot longer on that machine, because
initially I thought that this was some recent kernel failure (I didn't
initially connect it with the firmware upgrade, because this is my
kid's machine and I hadn't tested networking after the firmware
update). But that older known-good kernel failed the same way.

Going all the way back to the 4.4 kernel that Fedora uses made
wireless work, and then rebooting back into a 4.5 kernel also worked.

Now, it's *possible* that it was just something odd and transient and
it just happened to clear up as I rebooted into the Fedora kernel, but
it feels more likely that there's some incomplete initialization in
recent 4.5 kernels, which isn't normally noticeable, but the full
system reset done as part of the firmware upgrade might have shown it.

I'm attaching all the iwlwifi debug output that goes along with the
stuck queue, in the hopes that it makes sense to somebody. This is
from the 4.5-rc3 boot into an older kernel, but final 4.5 showed the
same behavior.

Googling iwlwifi stuck queues shows a lot of reports over the years,
but it might be a common symptom of "something is screwed up".

I'm not sure I can reproduce it any more now that it works again (and
I'm not really willing to force a firmware downgrade), but if there is
something particular to test, I can do that.

Ideas?

Linus


celeste-wifi-problem
Description: Binary data


Re: iwlwifi incomplete initialization in Linux 4.5?

2016-03-19 Thread Linus Torvalds
On Wed, Mar 16, 2016 at 2:23 PM, Linus Torvalds
 wrote:
>
>> Do you use 20Mhz or 40MHz?
>
> HT20 on 2.4GHz, HT40 on 5GHz.
>
> At least that's the wireless AP setup.
>
>> Basically, I'd like to see the output of iw dev
>
> I'll have to walk over and check. I don't have my machines set up so
> you can get into them over the network..

Hmm. "iw dev" seems to say that device is using the 2.4Ghz side, at
least in the working configuration.

  phy#0
          Unnamed/non-netdev interface
                  wdev 0x2
                  addr a4:34:d9:0e:20:d7
                  type P2P-device
          Interface wlp1s0
                  ifindex 3
                  wdev 0x1
                  addr a4:34:d9:0e:20:d6
                  type managed
                  channel 1 (2412 MHz), width: 20 MHz, center1: 2412 MHz

I have no idea why it wouldn't connect to the 5GHz network, but it
might just be far enough away (a couple of walls, not so much
distance) that it is borderline. Both networks have the same essid and
password, maybe I should add a separate 5GHz network to make it easier
to say "connect to that one" for testing, in case the trouble happens
with the 5GHz side only.

 Linus


Re: iwlwifi incomplete initialization in Linux 4.5?

2016-03-18 Thread Linus Torvalds
On Wed, Mar 16, 2016 at 2:13 PM, Grumbach, Emmanuel
 wrote:
>
> This ... typically means that the firmware got stuck while sending
> packets. Can you tell me on what band your router operates? 2.4GHz or
> 5.2GHz?

Both.

> Do you use 20Mhz or 40MHz?

HT20 on 2.4GHz, HT40 on 5GHz.

At least that's the wireless AP setup.

> Basically, I'd like to see the output of iw dev

I'll have to walk over and check. I don't have my machines set up so
you can get into them over the network..

> Hmm, this is strange since 4.4 and 4.5 will both load -16.ucode which
> you seemed to be running when you have the Queue hang message.

Correct. Both cases used the 16 ucode, since that's what F22 comes
with. I upgrade the kernel, but intentionally don't touch anything
else in the system.

   Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Linus Torvalds
On Fri, Jan 29, 2016 at 11:42 AM, Larry Finger
 wrote:
>
> Thanks for testing.
>
> Upon reflection, it really should check the other WIRELESS_MODE_AC_x bits.
> Johannes' patch was indeed correct.

I just retested with this incremental (and whitespace-damaged) patch:

  @@ -139,7 +139,9 @@ static void _rtl_rc_rate_set_series(struct rtl_priv *rtlpriv,
                        (wireless_mode == WIRELESS_MODE_N_24G)))
                           rate->flags |= IEEE80211_TX_RC_MCS;
                   if (sta && sta->vht_cap.vht_supported &&
  -                    (wireless_mode == WIRELESS_MODE_AC_5G))
  +                    ((wireless_mode == WIRELESS_MODE_AC_5G) ||
  +                     (wireless_mode == WIRELESS_MODE_AC_24G) ||
  +                     (wireless_mode == WIRELESS_MODE_AC_ONLY)))
                           rate->flags |= IEEE80211_TX_RC_VHT_MCS;
           }
    }

which brings it in line with Johannes' patch, and it does indeed still work.
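
For anybody following along, the condition being patched can be modelled in plain userspace C. This is only a sketch under stated assumptions: the enum, the flag bit values and the function name are made up for illustration and are not the rtlwifi definitions. It just shows that the VHT flag ends up set for any of the three AC modes, and only when the peer advertises VHT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's wireless modes. */
enum model_wireless_mode {
	MODEL_MODE_G,
	MODEL_MODE_N_24G,
	MODEL_MODE_N_5G,
	MODEL_MODE_AC_5G,
	MODEL_MODE_AC_24G,
	MODEL_MODE_AC_ONLY,
};

/* Illustrative flag bits, not taken from the kernel headers. */
#define MODEL_RC_MCS     (1u << 3)
#define MODEL_RC_VHT_MCS (1u << 8)

/* Mirror the two "if" blocks: HT modes get MCS, AC modes get VHT_MCS. */
static unsigned int model_mcs_flags(bool ht_supported, bool vht_supported,
				    enum model_wireless_mode mode)
{
	unsigned int flags = 0;

	if (ht_supported &&
	    (mode == MODEL_MODE_N_5G || mode == MODEL_MODE_N_24G))
		flags |= MODEL_RC_MCS;

	if (vht_supported &&
	    (mode == MODEL_MODE_AC_5G ||
	     mode == MODEL_MODE_AC_24G ||
	     mode == MODEL_MODE_AC_ONLY))
		flags |= MODEL_RC_VHT_MCS;

	return flags;
}
```

With only the AC_5G check, a station in one of the other two AC modes would get neither flag in this model, which is the case the incremental patch closes.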

I think marking it for stable is also the right thing to do - the
driver clearly doesn't work well in a wide-channel AC environment
otherwise, and I assume it's going to be more and more common..

Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Linus Torvalds
On Fri, Jan 29, 2016 at 9:54 AM, Larry Finger  wrote:
>
> The test patch that Johannes sent earlier was close. The section needed to
> add VHT rates is:

Hmm. This looks pretty much exactly like what I already tried (I had
fixed Johannes' patch to use "vht_cap" already, since it didn't
compile otherwise).

So the only difference is that it only checks WIRELESS_MODE_AC_5G.

But it worked for me this time. I have no idea why.

Maybe Johannes' patch actually always worked for me, but I just had a
transient problem that made me think it didn't. I think I only booted
it once, and went "oh, ok, no network, that didn't work".

  Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Linus Torvalds
On Thu, Jan 28, 2016 at 5:54 PM, Larry Finger  wrote:
>
> I have been running an RTL8821AE since kernel 3.18 without hitting this
> problem using a TRENDnet AC1750 dual-band AP. The UniFi may be doing
> something that the driver is not expecting.

I've had issues with unifi ap's before, but to be honest, I've had
issues with lots of hotel and airport wifi too. I don't think the
Unifi APs are outside of the normal spectrum..

> Attached is a minimal patch that comments out the "vht_cap->vht_supported =
> true;" statement for both RTL8821AE and RTL8812AE in
> _rtl_init_hw_vht_capab(). Does that allow your system to work?

That works too, yes.

> The patch
> also logs some information regarding the channelplan and the country code.
> Please let me know the values for those.

  rtlwifi:  channelplan 127
  rtlwifi:  country code 13

> I apparently missed a previous complaint about this issue. If you still have
> the reference, please send it to me.

So googling for similar issues, I found

  https://bugzilla.redhat.com/show_bug.cgi?id=1168467
  https://bugzilla.redhat.com/show_bug.cgi?id=1293136

where that second one in particular looks very like my issue
("Association succeeds, and ARP/DHCP work, but no IP frames can be
transmitted").

(In both cases you have to go into the dmesg attachment to see that
it's rtlwifi.)

And there's an ubuntuforum thread

  http://ubuntuforums.org/showthread.php?t=2226009&page=2

where, if you follow the thing, it's an rtl chip on a PCI card, and it
has very similar "connected but no internet" behavior, along with the
"net/mac80211/rate.c:526" warning (different line numbers, different
kernel version, but it smells similar).

Or this one:

  http://forums.debian.net/viewtopic.php?f=5&t=111781

which is also rtl-wifi, and also has the "associated, connected, got
an IP, but no data, not even a ping" behavior. It also has the
warning, but it looks different in other ways (2.4GHz only and
actually says it's not doing HT/VHT).

So I don't know. The warning in net/mac80211/rate.c does seem to be
associated with the realtek driver.

 Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Linus Torvalds
On Thu, Jan 28, 2016 at 2:12 PM, Johannes Berg
 wrote:
>
> Your best workaround may just be to ignore VHT for now - clearly it's
> broken so using "just" HT (which is likely not that much of a penalty
> anyway since you're apparently not using 80 MHz) will be much better.
>
> Go into
>
> _rtl_init_hw_vht_capab()
>
> and just remove or stub out the entire contents of that (or you could
> just remove the "vht_supported=true" if you feel like it.)
>
> That should get it to HT only, which is likely tested and working
> better.

Bingo. That indeed gets me working wireless. It's not super-fast, but
I don't think it ever has been..

If somebody has a suggested patch to actually *fix* VHT on this
chipset, that would obviously be better. And maybe it works on some
other chipsets, but not on mine. I'll happily test patches now that
the merge window is over and I have some time again (and I can also
make my AP do 80MHz channels if that matters, although as Johannes
noted it's not enabled by default).

For the realtek driver people, here is what lspci says:

  02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter
          Subsystem: AzureWave Device 2161
          Kernel driver in use: rtl8821ae

(Numeric PCI ID: 10ec:8821, subsystem 1a3b:2161)

Thanks,

  Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Linus Torvalds
On Thu, Jan 28, 2016 at 1:44 PM, Linus Torvalds
 wrote:
>
> I will try Johannes' suggestion on that machine to see if it makes a
> difference

Well, it "makes a difference" in the sense that the warning goes away.
But it doesn't make things work. In fact, it might be making things
worse.

Because with that patch, the wireless still authenticates and
associates, but then it doesn't even get an IP address, so now even
dhcp doesn't work. Of course, I was surprised that it worked last
time, and I'm not 100% sure it did work consistently. I'll re-test
without the patch, just to make sure, but it doesn't really seem to
improve on anything.

Linus


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Linus Torvalds
Adding the RTL people to the cc, and leaving the whole thing quoted at
the bottom..

I will try Johannes' suggestion on that machine to see if it makes a
difference, but somebody who knows the rtlwifi rate control code
should take a double- or triple-look at this.

Please? Some googling shows that this is not a new issue. Or at least
I seem to find reports that look very much like this from over a year
ago.

 Linus

On Thu, Jan 28, 2016 at 12:40 PM, Johannes Berg
 wrote:
> On Thu, 2016-01-28 at 11:01 -0800, Linus Torvalds wrote:
>>
>> I used to have the basic original UniFi UAP. I've replaced them with
>> the newer "AC Lite" version:
>>
>> https://www.ubnt.com/unifi/unifi-ap-ac-lite/
>>
>> so it's a fairly big jump from a 2.4GHz-only network to a dual-band
>> one.
>>
>> The old 2.4GHz-only AP's showed the problem with minstrel-ht
>> incorrectly starting off at the highest rate (on a totally different
>> machine). So the Unifi AP's have shown problems in the kernel
>> wireless before, but so far it's always been the fault of the kernel
>> wireless, not the AP.
>
> Yeah; I wasn't trying to blame it on this change, I was just trying to
> understand the change in the environment. Seems likely that it's simply
> the switch to 5 GHz, which is strange, I'd have thought that even that
> rtlwifi driver would've been tested with that :)
>
>> > Could you print out the entire table there when the warning
>> > happens?
>>
>> This is the best I can come up with: printing out the index, and the
>> rate and bitrate tables:
>>
>>   rates[i].idx (9) >= sband->n_bitrates (8)
>>   Rates:
>>   0: idx 9 count 1 flags a0
>>   1: idx 8 count 1 flags a0
>>   2: idx 7 count 2 flags a0
>>   3: idx 6 count 3 flags a0
>
> Yeah, perfect. See, this is already evidently not making any sense:
>
> flags a0 is
> IEEE80211_TX_RC_40_MHZ_WIDTH | IEEE80211_TX_RC_SHORT_GI
>
> both of those options *require* IEEE80211_TX_RC_MCS or
> IEEE80211_TX_RC_VHT_MCS as well, so the flags really should be a8 or
> 1a0.
>
>>   Bitrates:
>>   0: flags 0002 bitrate 60 (hw: 0004 )
>>   1: flags  bitrate 90 (hw: 0005 )
>>   2: flags 0002 bitrate 120 (hw: 0006 )
>>   3: flags  bitrate 180 (hw: 0007 )
>>   4: flags 0002 bitrate 240 (hw: 0008 )
>>   5: flags  bitrate 360 (hw: 0009 )
>>   6: flags  bitrate 480 (hw: 000a )
>>   7: flags  bitrate 540 (hw: 000b )
>>
>> So it's the very first rate that has index 9, but the bitrate table
>> only goes from 0-7.
>>
>> So I suspect that once the first index has been marked invalid, it
>> now will never even look at the later indices, so it has no transmit
>> rates at all.  Or something.
>
> Indeed.
>
>> That bitrate table does seem to match:
>>
>>static struct ieee80211_rate rtl_ratetable_5g[] = {
>>
>> in drivers/net/wireless/realtek/rtlwifi/base.c
>>
>
> Yeah, it would, but it's irrelevant since the rate table isn't actually
> used with MCS rates.
>
> I'm not familiar with this code at all, but looking at it suggests that
> perhaps the switch to 5 GHz wasn't at fault, but instead the switch to
> VHT (802.11ac) - that's more plausible too, not testing with VHT seems
> like something that could have happened for this driver.
>
> And as I figured, the code in _rtl_rc_rate_set_series() is obviously
> not handling VHT correctly: it has
>
>         if (sgi_20 || sgi_40 || sgi_80)
>                 rate->flags |= IEEE80211_TX_RC_SHORT_GI;
>         if (sta && sta->ht_cap.ht_supported &&
>             ((wireless_mode == WIRELESS_MODE_N_5G) ||
>              (wireless_mode == WIRELESS_MODE_N_24G)))
>                 rate->flags |= IEEE80211_TX_RC_MCS;
>
> but can never set IEEE80211_TX_RC_VHT_MCS. Seems like there should be
> something like
>
>         if (sta && sta->ht_cap.vht_supported &&
>             (wireless_mode == WIRELESS_MODE_AC_5G ||
>              wireless_mode == WIRELESS_MODE_AC_24G ||
>              wireless_mode == WIRELESS_MODE_AC_ONLY))
>                 rate->flags |= IEEE80211_TX_RC_VHT_MCS;
>
> just after/before the above block.
>
> But I'm not familiar with this code at all, so that may not really be
> the right fix or even work.
>
> johannes
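
The flag arithmetic in the quoted message (a0 versus a8 or 1a0) can be checked with a few lines of userspace C. This is a sketch, not kernel code: the macro and function names are hypothetical, the MCS and VHT_MCS bit positions are inferred from the hex values quoted above, and the split of a0 into 0x20 (40 MHz) and 0x80 (short GI) is from memory of include/net/mac80211.h, so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit positions, see the note above. */
#define MODEL_TX_RC_MCS          (1u << 3)  /* 0x008 */
#define MODEL_TX_RC_40_MHZ_WIDTH (1u << 5)  /* 0x020 */
#define MODEL_TX_RC_SHORT_GI     (1u << 7)  /* 0x080 */
#define MODEL_TX_RC_VHT_MCS      (1u << 8)  /* 0x100 */

/*
 * 40 MHz width and short GI only make sense on an HT or VHT MCS rate,
 * so flags carrying either of them must also carry MCS or VHT_MCS.
 */
static bool model_rate_flags_consistent(unsigned int flags)
{
	bool needs_mcs =
		(flags & (MODEL_TX_RC_40_MHZ_WIDTH | MODEL_TX_RC_SHORT_GI)) != 0;
	bool has_mcs =
		(flags & (MODEL_TX_RC_MCS | MODEL_TX_RC_VHT_MCS)) != 0;

	return !needs_mcs || has_mcs;
}
```

Under those assumptions, 0xa0 fails the check while 0xa8 and 0x1a0 both pass, matching the values given in the message.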


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-28 Thread Linus Torvalds
On Thu, Jan 28, 2016 at 4:13 AM, Johannes Berg
 wrote:
> On Wed, 2016-01-27 at 21:34 -0800, Linus Torvalds wrote:
>
>> .. except now I upgraded the nearest access point, and now wireless
>> on that machine no longer works.
>
> Can you describe the upgrade a bit more, just for background?

I used to have the basic original UniFi UAP. I've replaced them with
the newer "AC Lite" version:

https://www.ubnt.com/unifi/unifi-ap-ac-lite/

so it's a fairly big jump from a 2.4GHz-only network to a dual-band one.

The old 2.4GHz-only AP's showed the problem with minstrel-ht
incorrectly starting off at the highest rate (on a totally different
machine). So the Unifi AP's have shown problems in the kernel wireless
before, but so far it's always been the fault of the kernel wireless,
not the AP.

> Could you print out the entire table there when the warning happens?

This is the best I can come up with: printing out the index, and the
rate and bitrate tables:

  rates[i].idx (9) >= sband->n_bitrates (8)
  Rates:
  0: idx 9 count 1 flags a0
  1: idx 8 count 1 flags a0
  2: idx 7 count 2 flags a0
  3: idx 6 count 3 flags a0
  Bitrates:
  0: flags 0002 bitrate 60 (hw: 0004 )
  1: flags  bitrate 90 (hw: 0005 )
  2: flags 0002 bitrate 120 (hw: 0006 )
  3: flags  bitrate 180 (hw: 0007 )
  4: flags 0002 bitrate 240 (hw: 0008 )
  5: flags  bitrate 360 (hw: 0009 )
  6: flags  bitrate 480 (hw: 000a )
  7: flags  bitrate 540 (hw: 000b )

So it's the very first rate that has index 9, but the bitrate table
only goes from 0-7.

So I suspect that once the first index has been marked invalid, it now
will never even look at the later indices, so it has no transmit rates
at all.  Or something.

That bitrate table does seem to match:

   static struct ieee80211_rate rtl_ratetable_5g[] = {

in drivers/net/wireless/realtek/rtlwifi/base.c

Does this give you any ideas?

  Linus


WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-27 Thread Linus Torvalds
Hmm. So my daughter has a little Gigabyte Brix that has rtl8821ae
wireless in it. Yeah, nasty, I know, but it has actually worked
reasonably well.

.. except now I upgraded the nearest access point, and now wireless on
that machine no longer works.

Or rather, it actually *does* work in the sense that it authenticates,
it associates, and it actually gets a DHCP lease etc. So the darn
thing has an IP address and everything, but then nothing else seems to
go through after that. Very odd. My guess is that the auth/assoc/dhcp
thing happens at low rates, then it starts trying to up the rates, and
things go to hell.

But clearly several packets have gotten through.  And then absolutely
nothing. Everything else is happy with the new AP, so this is not a
problem with the wireless network itself.

I'm appending the warning that gets printed, which may or may not be relevant.

This is with a clean and up-to-date Fedora 23 install, so that line 513 is the

   512          /* RC is busted */
   513          if (WARN_ON_ONCE(rates[i].idx >= sband->n_bitrates)) {
   514                  rates[i].idx = -1;
   515                  continue;
   516          }

thing, which still exists in the same form in current kernels (except
in current -git it's line 625).

I do note that that rate_fixup_ratelist() function is a bit odd wrt
those rate indexes: it has code to make sure that there are no valid
rates following an invalid one:

        /*
         * make sure there's no valid rate following
         * an invalid one, just in case drivers don't
         * take the API seriously to stop at -1.
         */
        if (inval) {
                rates[i].idx = -1;
                continue;
        }
        if (rates[i].idx < 0) {
                inval = true;
                continue;
        }

but then that "RC is busted" case that generates a warning will add
one of those invalid rates in the middle anyway. So I get the feeling
that if that warning ever triggers, it will basically be screwing up
that whole rate table. I dunno.
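
To make that concrete, here is a small userspace model of just the two checks quoted above, run in the quoted order inside one loop. It is a sketch under assumptions: the struct and function names are hypothetical, the MCS/VHT handling and everything else in the real function is left out, and the sample values in the note below are the ones reported elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a rate entry; only the index is modelled. */
struct model_rate {
	int idx;
};

/*
 * Model of the two quoted checks.  Returns how many entries were
 * invalidated by the "RC is busted" bounds check.
 */
static int model_fixup(struct model_rate *rates, size_t max_rates,
		       int n_bitrates)
{
	bool inval = false;
	int busted = 0;
	size_t i;

	for (i = 0; i < max_rates; i++) {
		/* Everything after an explicit -1 terminator is forced to -1. */
		if (inval) {
			rates[i].idx = -1;
			continue;
		}
		if (rates[i].idx < 0) {
			inval = true;
			continue;
		}
		/* "RC is busted": the entry is dropped, but inval stays false. */
		if (rates[i].idx >= n_bitrates) {
			rates[i].idx = -1;
			busted++;
			continue;
		}
	}
	return busted;
}
```

In this model, indices 9, 8, 7, 6 with an 8-entry bitrate table come out as -1, -1, 7, 6: valid entries following invalid ones, which is exactly what the first check is trying to prevent.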

Is there anything sane I can do to help debug this case?

 Linus

--- snip snip, relevant (?) wireless warning ---

IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
  r8169 :03:00.0 enp3s0: link down
  IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
  IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
  IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
  IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
  tun: Universal TUN/TAP device driver, 1.6
  tun: (C) 1999-2004 Max Krasnyansky 
  device virbr0-nic entered promiscuous mode
  virbr0: port 1(virbr0-nic) entered listening state
  virbr0: port 1(virbr0-nic) entered listening state
  virbr0: port 1(virbr0-nic) entered disabled state
  wlp2s0: authenticate with 46:d9:e7:92:bf:29
  wlp2s0: send auth to 46:d9:e7:92:bf:29 (try 1/3)
  wlp2s0: authenticated
  wlp2s0: associate with 46:d9:e7:92:bf:29 (try 1/3)
  wlp2s0: associate with 46:d9:e7:92:bf:29 (try 2/3)
  wlp2s0: RX AssocResp from 46:d9:e7:92:bf:29 (capab=0x411 status=0 aid=1)
  wlp2s0: associated
  IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
  [ cut here ]
  WARNING: CPU: 2 PID: 0 at net/mac80211/rate.c:513
ieee80211_get_tx_rates+0x243/0x7d0 [mac80211]()
  Modules linked in: ccm cmac xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_filter ebtable_nat ebtable_broute bridge ebtables
ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables
iptable_raw iptable_security iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle bnep
arc4 rtl8821ae vfat fat btcoexist rtl_pci rtlwifi mac80211
x86_pkg_temp_thermal coretemp snd_hda_codec_realtek snd_hda_codec_hdmi
snd_hda_codec_generic kvm_intel snd_soc_rt5640 kvm snd_soc_rl6231
snd_hda_intel snd_soc_core iTCO_wdt snd_hda_codec snd_compress btusb
snd_pcm_dmaengine snd_hda_core
   iTCO_vendor_support cfg80211 ac97_bus btrtl snd_hwdep
crct10dif_pclmul btbcm snd_seq crc32_pclmul btintel crc32c_intel
bluetooth snd_seq_device joydev snd_pcm mei_me mei shpchp dw_dmac
tpm_tis lpc_ich i2c_i801 snd_timer rfkill snd tpm soundcore
snd_soc_sst_acpi dw_dmac_core i2c_designware_platform
i2c_designware_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc
hid_logitech_hidpp hid_logitech_dj i915 i2c_algo_bit drm_kms_helper
8021q garp drm stp llc mrp r8169 sdhci_acpi mii sdhci mmc_core video
i2c_hid
  CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.2.8-300.fc23.x86_64 #1
  Hardware name: GIGABYTE M4HM87P-00/M4HM87P-00, BIOS F2 12/11/2013
    aad0aff724c0ea01 88021ea83648 817738ca
    000

Re: [PATCH] mac80211: Send EAPOL frames at lowest rate

2015-02-26 Thread Linus Torvalds
Johannes,

On Thu, Feb 26, 2015 at 5:50 AM, Jouni Malinen  wrote:
>
> Reported-by: Linus Torvalds 

Also "Tested-by:", and I'd suggest marking it for stable too (although
I understand that David generally doesn't use stable tags, and just
sends them separately to the stable tree).

This fixes both Atheros and brcmsmac for me (with Ubiquiti UniFi APs).

My main laptop is iwlwifi, and the only reason that worked is
apparently that the iwlwifi driver already basically does something
similar on its own.

All the other devices I have apparently don't use the 802.11 code even
if they are Linux-based (ie mostly android, and presumably they use
some vendor driver rather than the minstrel rate-handling code).

 Linus


Re: [ath9k-devel] AR9462 problems connecting again..

2015-02-25 Thread Linus Torvalds
On Wed, Feb 25, 2015 at 10:14 AM, Linus Torvalds
 wrote:
>
> I'm talking about the two from Jouni - the "don't encrypt EAPOL
> frames" one, and the one-liner that makes all EAPOL frames go at the
> lowest data rate.

So I just found out and confirmed that this is not Atheros-specific in
any way - it looks like it's simply the UniFi AP that does not like
high-data-rate authentication frames at all.

Because it looks like the brcmsmac driver has *exactly* the same
behavior with this AP (in an Apple Macbook air). I assume brcmsmac
uses the net/mac80211/tx.c logic too.

And Jouni's one-liner fixes that one too, although as usual, maybe
there is some testing noise, and I screwed something up. This time I
only did the one-liner, so that's the critical one.

It's interesting to note how nothing else has been unhappy with that
network (admittedly it's been mainly android devices and a HP printer
that I've tested), so it looks like everybody else does low-rate
authentication packets anyway.

So this actually looks like a Ubiquiti UniFi AP bug to me, but it also
looks like presumably everybody else does low-rate initial packets,
and our kernel 802.11 layer should just follow suit. The whole
robustness principle and "be conservative in what you send, and
liberal in what you accept" etc.
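
The "be conservative in what you send" idea amounts to something like the following userspace sketch. It is not Jouni's patch and not the mac80211 code: the function and macro names are hypothetical, and it assumes a rate table sorted slowest-first so that index 0 is the lowest rate. The only hard fact in it is the EAPOL EtherType, 0x888E.

```c
#include <assert.h>
#include <stdint.h>

/* EtherType for EAPOL (IEEE 802.1X port access entity). */
#define MODEL_ETH_P_PAE 0x888E

/*
 * Pick an index into a rate table sorted slowest-first.
 * Authentication (EAPOL) frames always use the lowest rate; everything
 * else uses whatever rate control suggested.
 */
static int model_pick_rate_idx(uint16_t ethertype, int rc_suggested_idx)
{
	if (ethertype == MODEL_ETH_P_PAE)
		return 0;

	return rc_suggested_idx;
}
```

The cost is a handful of slow frames per connection attempt, in exchange for the handshake not depending on how well rate control guessed the first rate.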

Linus


Re: [ath9k-devel] AR9462 problems connecting again..

2015-02-25 Thread Linus Torvalds
On Wed, Feb 25, 2015 at 6:47 AM, Jouni Malinen  wrote:
>
> There may be something else wrong (say, some kind of interference), but
> there is no way we can assume normal users to be able to fix such
> issues. If we make EAPOL frames go through more robustly, the connection
> can be established in more cases and this can result in relatively
> functional network connection and rate control can handle the less
> critical data frames through whatever means to get optimal throughput
> from the network. As such, I do think we do need to "paper over" this
> for EAPOL frames.

While I realize that people may disagree about the exact details of
how to fix this in the long run, may I suggest that in the meantime we
at least get the two workaround patches applied?

I'm talking about the two from Jouni - the "don't encrypt EAPOL
frames" one, and the one-liner that makes all EAPOL frames go at the
lowest data rate.

Even if "lowest data rate" is ridiculously low, and even if that might
disturb other things going on on the same channel at the same time,
those authentication packets shouldn't be so common as to be a
problem.  No?

Jouni has a few packet dumps for me, and he's stumped as to what
exactly is going on, but those two patches (well, the one-liner "low
data rate EAPOL" in particular, it seems) do seem to make my
connections go through reliably.

And it seems that other drivers already are working around the EAPOL
issue in similar ways, judging by the comments about iwlwifi.

Last time I had connection issues with this laptop, nothing ended up
happening in the end, and I had people pipe up saying they had had
similar problems. I'd hate for the same "nothing" to happen this time
just because people aren't 100% sure what the final right thing is to
do. So I'd really like people to apply the simple workarounds for now
because clearly something is badly wrong, and *if* there is some
better resolution later, that's fine.

I'll happily test patches. It seems to be pretty repeatable for me,
even if that "pretty repeatable" seems to be very much about the
laptop being in one very particular place (it's right next to another
AP, there's random other electronics around, since it's on my messy
desk etc). So I wouldn't be at all surprised by horrible interference.
And the AP is supposed to be ceiling- or wall-mounted, but because I'm
just testing things out it's just sitting on a table in the next room,
so for all I know it's in the *exact* worst position for the antennas
etc etc.

So I'm sure I can improve reception of my laptop, but that's not the
point. The point is that bad wireless networks aren't so unusual, and
right now things clearly don't work as well as they could.

Does anybody hate Jouni's two patches *so* much that they can
articulate *why* it would be wrong to apply them as interim patches?
And if so, do you have better patches for me to try? Because if not..

 Linus


Re: AR9462 problems connecting again..

2015-02-23 Thread Linus Torvalds
On Mon, Feb 23, 2015 at 2:43 PM, Jouni Malinen  wrote:
>
> This did not get exactly supportive response when this was proposed last
> time (Sep 2013). Anyway, for a quick test, this can be done with the
> following one-liner:

fwiw, that one-liner seems to work fine for me.

Which I guess is not a huge surprise.

Side note: I've done the "turn off wifi and turn it back on" several
times to test that patch, and it has worked every time. BUT I also see
this odd behavior where the logs show that it tries to authenticate
twice: the first time it does that "send auth to 20:9f .." thing three
times (looks like 100ms apart), and nothing happens so it does
"authentication with 20:9f .. timed out". Then it waits three seconds
and tries again, and now it succeeds on the first try.

The only downside of that seems to be that it takes an extra 3s to
connect to the network - but it does now seem to *reliably* connect -
so it's not a big problem, but I wonder why it should be that
repeatable. Is there some difference between the first and the second
time it tries to authenticate?

Anyway, even if people don't like that particular patch, it does seem
like *something* like that should be done.

 Linus


Re: AR9462 problems connecting again..

2015-02-23 Thread Linus Torvalds
On Mon, Feb 23, 2015 at 1:30 PM, Jouni Malinen  wrote:
>
> How far is the station from the AP? Would it be possible to see whether
> the behavior changes if you were within, say, five meters or so?

Well, it was pretty much within five meters already, but there was a
thin wall in between (and the old AP was right next to the laptop,
which might add some noise even if they are on different channels).
Going closer does seem to help, but again, it's not like this is 100%
reproducible to begin with.

So the theory that the driver starts at too high a transmit rate, and
then does not handle failures well, might be true. Of course, "not
handle failures well" is something of an understatement.

> It would be useful if you can capture the 802.11 frame exchange from a
> failed connection case with an external wireless sniffer.

I will try with my (much more reliable) iwlwifi laptop. At least the
merge window is over, so I should have some time. Knock wood.

  Linus


Re: AR9462 problems connecting again..

2015-02-23 Thread Linus Torvalds
On Mon, Feb 23, 2015 at 12:06 PM, Linus Torvalds
 wrote:
>
> This machine has a fairly minimal kernel config. Does that "type
> monitor" interface perhaps need some debug infrastructure that I
> haven't added?

Nope. Same behavior with a F21 kernel (which means that the touchpad
doesn't work, but that's a separate and known issue with this
platform).

   Linus


Re: AR9462 problems connecting again..

2015-02-23 Thread Linus Torvalds
On Mon, Feb 23, 2015 at 9:17 AM, Jouni Malinen  wrote:
>
> mac80211: Do not encrypt EAPOL frames before peer has used the key

Hmm. This patch does not seem to make a difference. I thought it did
at first, but then removed the wpa_supplicant debugging, and got the
same failures.

On Sun, Feb 22, 2015 at 10:01 PM, Sujith Manoharan  wrote:
>
> Or 'iw dev wlp1s0 set bitrates ht-mcs-2.4 0' to choose the lowest
> HT rate.

This *might* have worked. But it's a bit hard to really make sure,
since it sometimes does succeed even without debugging when I do
nothing at all, but it did work twice in a row after doing that.

> Sporadic association problems could be a problem with the chosen rates.
> This would show the rates for the EAPOL frames:
>
> iw dev wlp1s0 interface add mon0 type monitor
> ifconfig mon0 up

Hmm. I don't actually see a "mon0" interface after the "iw dev
interface add" thing. Yes, "iw" sees it when I do "iw dev", but
"ifconfig" does not.

This machine has a fairly minimal kernel config. Does that "type
monitor" interface perhaps need some debug infrastructure that I
haven't added?
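
One way to check, independently of ifconfig, whether the kernel exposes
a network device with a given name is to ask for the interface list
directly. This is only a minimal sketch: the helper name has_interface
is made up for illustration, and "mon0" is simply the name used in the
quoted commands. Python's socket.if_nameindex() returns the
(index, name) pairs the kernel reports, whether or not the interface
is up.

```python
import socket


def has_interface(name, names=None):
    """Return True if an interface called `name` is in the interface list.

    `names` can be passed in for testing; when it is None the list is
    taken from socket.if_nameindex(), which reports (index, name) pairs.
    """
    if names is None:
        names = [ifname for _index, ifname in socket.if_nameindex()]
    return name in names
```

Calling has_interface("mon0") right after the "iw dev ... interface add"
step would show whether the device exists at all, which separates "the
interface was never created" from "the tool is not listing it".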

> tshark -i mon0 -Y eapol -T fields -e radiotap.datarate -e wlan -e eapol -e wlan.sa -e wlan.da

.. and then this fails, presumably for similar reasons.

Linus


Re: AR9462 problems connecting again..

2015-02-22 Thread Linus Torvalds
On Sun, Feb 22, 2015 at 5:55 PM, Adrian Chadd  wrote:
>
> Do you have a 5GHz SSID setup on this access point? Is this kind of
> messed up disassociation-to-steer-you-to-another-band thing?

Nope. That's the older single-band UniFi UAP - 2.4GHz only.

   Linus


Re: AR9462 problems connecting again..

2015-02-22 Thread Linus Torvalds
On Sun, Feb 22, 2015 at 4:54 PM, Adrian Chadd  wrote:
>
> I /think/ it's okay? The removed stuff is the pre-shared key pieces.

Ok. Attached is what seems to be the relevant part of the
wpa_supplicant.log file.

The datestamp has been changed so that it can be matched up with the
dmesg, and I added empty lines for pauses when I was trying to figure
out what the heck it was doing, but other than that it's the raw log.

> Do you have another laptop with an atheros NIC in it that you could
> use in monitor mode to capture all the frames?

Nope, everything else I have seems to be intel wireless. I think one
of the kids machines is a Mac Mini with an ath5k thing, but I'm hoping
the wpa_supplicant.log is sufficient to give somebody an idea.

Linus

14:07:10.971480: EAPOL: disable timer tick
14:07:10.971578: EAPOL: Supplicant port status: Unauthorized


14:07:12.886125: nl80211: Event message available
14:07:12.886287: nl80211: Regulatory beacon hint
14:07:12.886318: wlp1s0: Event CHANNEL_LIST_CHANGED (31) received
14:07:12.886608: nl80211: Regulatory information - country=US
14:07:12.886646: nl80211: 2402-2472 @ 40 MHz
14:07:12.886670: nl80211: 5170-5250 @ 80 MHz
14:07:12.886691: nl80211: 5250-5330 @ 80 MHz
14:07:12.886711: nl80211: 5490-5600 @ 80 MHz
14:07:12.886733: nl80211: 5650-5710 @ 40 MHz
14:07:12.886755: nl80211: 5735-5835 @ 80 MHz
14:07:12.886777: nl80211: 57240-63720 @ 2160 MHz
14:07:12.886822: nl80211: Added 802.11b mode based on 802.11g information
14:07:12.886848: P2P: Add operating class 81
14:07:12.886870: P2P: Channels - hexdump(len=11): 01 02 03 04 05 06 07 08 09 0a 0b
14:07:12.886902: P2P: Add operating class 124
14:07:12.886921: P2P: Channels - hexdump(len=1): a1
14:07:12.886948: wlp1s0: P2P: Update channel list

14:07:13.011791: nl80211: Event message available
14:07:13.011987: nl80211: New scan results available
14:07:13.012055: wlp1s0: Event SCAN_RESULTS (3) received
14:07:13.012281: nl80211: Received scan results (3 BSSes)
14:07:13.012464: wlp1s0: BSS: Start scan result update 1
14:07:13.012501: wlp1s0: BSS: Add new id 0 BSSID 60:a4:4c:8d:99:24 SSID '1gnoraNT'
14:07:13.012530: dbus: Register BSS object '/fi/w1/wpa_supplicant1/Interfaces/7/BSSs/0'
14:07:13.012745: wlp1s0: BSS: Add new id 1 BSSID 60:a4:4c:8d:99:20 SSID '1gnoraNT'
14:07:13.012767: dbus: Register BSS object '/fi/w1/wpa_supplicant1/Interfaces/7/BSSs/1'
14:07:13.012905: wlp1s0: BSS: Add new id 2 BSSID 20:9f:db:e7:80:80 SSID 'UniFi-home'
14:07:13.012925: dbus: Register BSS object '/fi/w1/wpa_supplicant1/Interfaces/7/BSSs/2'
14:07:13.013080: BSS: last_scan_res_used=3/32 last_scan_full=0
14:07:13.013115: wlp1s0: New scan results available
14:07:13.013191: wlp1s0: No suitable network found
14:07:13.013209: wlp1s0: Short-circuit new scan request since there are no enabled networks
14:07:13.013221: wlp1s0: State: DISCONNECTED -> INACTIVE
14:07:13.013272: wlp1s0: Checking for other virtual interfaces sharing same radio (phy0) in event_scan_results
14:07:13.014248: RTM_NEWLINK: operstate=0 ifi_flags=0x1003 ([UP])
14:07:13.014265: RTM_NEWLINK, IFLA_IFNAME: Interface 'wlp1s0' added
14:07:13.014412: nl80211: if_removed already cleared - ignore event
14:07:13.017250: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/7

14:07:13.060305: dbus: Register network object '/fi/w1/wpa_supplicant1/Interfaces/7/Networks/0'

14:07:13.072845: wlp1s0: Setting scan request: 0 sec 0 usec
14:07:13.073000: wlp1s0: State: INACTIVE -> SCANNING
14:07:13.073054: Scan SSID - hexdump_ascii(len=10):
 55 6e 69 46 69 2d 68 6f 6d 65 UniFi-home  
14:07:13.073078: wlp1s0: Starting AP scan for wildcard SSID
14:07:13.073088: WPS: Building WPS IE for Probe Request
14:07:13.073096: WPS:  * Version (hardcoded 0x10)
14:07:13.073104: WPS:  * Request Type
14:07:13.073111: WPS:  * Config Methods (108)
14:07:13.073129: WPS:  * UUID-E
14:07:13.073137: WPS:  * Primary Device Type
14:07:13.073145: WPS:  * RF Bands (3)
14:07:13.073153: WPS:  * Association State
14:07:13.073160: WPS:  * Configuration Error (0)
14:07:13.073167: WPS:  * Device Password ID (0)
14:07:13.073175: WPS:  * Device Name
14:07:13.073184: P2P: * P2P IE header
14:07:13.073192: P2P: * Capability dev=25 group=00
14:07:13.073200: P2P: * Listen Channel: Regulatory Class 81 Channel 1
14:07:13.077659: Scan requested (ret=0) - scan timeout 30 seconds
14:07:13.077697: nl80211: Event message available
14:07:13.077716: nl80211: Scan trigger
14:07:13.078248: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/7


14:07:16.056351: RTM_NEWLINK: operstate=0 ifi_flags=0x1003 ([UP])
14:07:16.056498: RTM_NEWLINK, IFLA_IFNAME: Interface 'wlp1s0' added
14:07:16.056652: nl80211: if_removed already cleared - ignore event
14:07:16.056728: nl80211: Event message available
14:07:16.056810: nl80211: New scan results available
14:07:16.056879: wlp1s0: Ev

Re: AR9462 problems connecting again..

2015-02-22 Thread Linus Torvalds
On Sun, Feb 22, 2015 at 1:50 PM, Linus Torvalds
 wrote:
>
> Ugh. When I add "-dd" to the command line, it has now worked three
> times in a row, when before it worked once out of ten tries.
>
> So my guess is that it's something timing-dependent.

So it stays working with -dd, but I do get *occasional* failures that
then seem to clear up on retry. So it ends up working in the end, but
I think I have a few example failures in the logs.

So for example, from my dmesg, I get this:

[14:07:15] wlp1s0: authenticate with 20:9f:db:e7:80:80
[14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3)
[14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 2/3)
[14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 3/3)
[14:07:15] wlp1s0: authentication with 20:9f:db:e7:80:80 timed out
[14:07:18] wlp1s0: authenticate with 20:9f:db:e7:80:80
[14:07:18] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3)
[14:07:18] wlp1s0: authenticated
[14:07:18] wlp1s0: associate with 20:9f:db:e7:80:80 (try 1/3)
[14:07:18] wlp1s0: RX AssocResp from 20:9f:db:e7:80:80 (capab=0x431 status=0 aid=16)
[14:07:18] wlp1s0: associated
[14:07:22] wlp1s0: deauthenticated from 20:9f:db:e7:80:80 (Reason: 2=PREV_AUTH_NOT_VALID)

with another failure at 14:07:22, but then it ends up working a bit
later at 14:07:33.
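
To compare runs like this one without reading every line by hand, the
mac80211 messages can be tallied with a few regular expressions. The
following is a rough sketch, not an existing tool: the pattern names and
the summarize() helper are assumptions for illustration, and the sample
is the excerpt above with the wrapped lines rejoined.

```python
import re
from collections import Counter

# Excerpt from the dmesg output above (wrapped lines rejoined).
SAMPLE = """\
[14:07:15] wlp1s0: authenticate with 20:9f:db:e7:80:80
[14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3)
[14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 2/3)
[14:07:15] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 3/3)
[14:07:15] wlp1s0: authentication with 20:9f:db:e7:80:80 timed out
[14:07:18] wlp1s0: authenticate with 20:9f:db:e7:80:80
[14:07:18] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3)
[14:07:18] wlp1s0: authenticated
[14:07:18] wlp1s0: associate with 20:9f:db:e7:80:80 (try 1/3)
[14:07:18] wlp1s0: RX AssocResp from 20:9f:db:e7:80:80 (capab=0x431 status=0 aid=16)
[14:07:18] wlp1s0: associated
[14:07:22] wlp1s0: deauthenticated from 20:9f:db:e7:80:80 (Reason: 2=PREV_AUTH_NOT_VALID)
"""

# Hypothetical event names mapped to patterns for the messages seen above.
EVENT_PATTERNS = {
    "auth_attempt": re.compile(r"send auth to \S+ \(try (\d+)/(\d+)\)"),
    "auth_timeout": re.compile(r"authentication with \S+ timed out"),
    "authenticated": re.compile(r": authenticated$"),
    "associated": re.compile(r": associated$"),
    "deauthenticated": re.compile(
        r"deauthenticated from \S+ \(Reason: (\d+)=(\w+)\)"
    ),
}


def classify(line):
    """Return (event_name, regex_groups) for a log line, or None."""
    for name, pattern in EVENT_PATTERNS.items():
        match = pattern.search(line)
        if match:
            return name, match.groups()
    return None


def summarize(text):
    """Count events per type and collect the deauthentication reason names."""
    counts = Counter()
    reasons = []
    for line in text.splitlines():
        hit = classify(line.strip())
        if hit is None:
            continue
        name, groups = hit
        counts[name] += 1
        if name == "deauthenticated":
            reasons.append(groups[1])
    return counts, reasons
```

Running summarize() over a whole dmesg capture gives a per-run count of
timeouts and deauthentications, which makes "works with -dd, fails
without" easier to state in numbers.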

I've got the wpa_supplicant log for this timeframe, but I'd rather not
send it out in public. I see "[REMOVED]" for what looks like the key
data, but there's a lot of other hex data. Is any of it sensitive?

  Linus


Re: AR9462 problems connecting again..

2015-02-22 Thread Linus Torvalds
On Sun, Feb 22, 2015 at 11:39 AM, Adrian Chadd  wrote:
>
> Hm, can you enable wpa debugging to log everything whilst it's
> associating / reassociating?

Ugh. When I add "-dd" to the command line, it has now worked three
times in a row, when before it worked once out of ten tries.

So my guess is that it's something timing-dependent.

Or it's something where once it starts working, it stays working until
I reboot. I'll try that.

 Linus


Re: [ath9k-devel] AR9462 problems connecting again..

2015-02-22 Thread Linus Torvalds
On Sun, Feb 22, 2015 at 10:58 AM, Dave Taht  wrote:
>
> Hint: Several unifi (and most ubnt) products are well supported by
> openwrt directly,

I want Linux to "just work". None of this "oh, you can change
something else and it probably works".  I want to fix the problem in
*linux*.

There's clearly something wrong with the AR9462 driver and/or how it
uses the wireless infrastructure, and it should be fixed. Not worked
around with "use some other AP software".

Especially since this has happened before.

Besides, the reason I'm trying to use UniFi is because I want to have
seamless roaming ("zero-handoff"). And I do *not* want to play the
endless openwrt configuration games in the hopes I can get something
like that working. I've tried openwrt, and I don't like tinkering with
my AP's. I just want things to work out of the box.

  Linus


Re: AR9462 problems connecting again..

2015-02-22 Thread Linus Torvalds
On Sun, Feb 22, 2015 at 10:24 AM, Adrian Chadd  wrote:
>
> Just a wild shot - try disabling fast authentication and see if that
> makes a difference?
>
> wpa_supplicant.conf:
>
> fast_reauth=0
>
> I recall having issues with fast_reauth once, but I never stuck around
> that location long enough to debug it.

Nope. Did that, killed wpa_supplicant (which restarts it), tried
connecting, still failed.

  Linus


Re: AR9462 problems connecting again..

2015-02-22 Thread Linus Torvalds
On Sat, Feb 21, 2015 at 10:50 PM, Sujith Manoharan  wrote:
>
> Can you please post the output of 'iw dev wlp1s0 scan' ?

Attached.

It's the "UniFi-home" SSID that doesn't work. The 1gnoraNT one is the
old working one that I'm obviously associated with, and that has
multiple AP's.

(The UniFi-home also has two AP's, but they should all show up as a
single network)

 Linus


out
Description: Binary data


AR9462 problems connecting again..

2015-02-21 Thread Linus Torvalds
So I've had problems connecting to some networks before on my
Chromebook Pixel, but now I'm testing a new Ubiquiti network at home,
and can see this issue at home too.

I know the wireless works, because other devices work fine on that
network. Also, I know the AR9462 works, because I still have my old
network up and it connects to that.

And it *occasionally* connects to the new one. But it's rare, and it
clearly has problems.

It looks something like this:

[   73.757869] wlp1s0: authenticate with 20:9f:db:e7:80:80
[   73.771471] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3)
[   73.773706] wlp1s0: authenticated
[   73.775122] wlp1s0: associate with 20:9f:db:e7:80:80 (try 1/3)
[   73.787434] wlp1s0: RX AssocResp from 20:9f:db:e7:80:80 (capab=0x431 status=0 aid=9)
[   73.787573] wlp1s0: associated
[   77.784931] wlp1s0: deauthenticated from 20:9f:db:e7:80:80 (Reason: 2=PREV_AUTH_NOT_VALID)

and the password I used definitely is right, and sometimes works.
Despite that PREV_AUTH_NOT_VALID thing.
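
The kernel timestamps in that excerpt make the gap between "associated"
and "deauthenticated" easy to measure. The sketch below is illustrative
only; the function name and the regular expression are assumptions, and
it expects each log record on a single line (wrapped lines rejoined).

```python
import re

# Matches "[   73.787573] wlp1s0: associated" style dmesg lines.
TS_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(\S+): (.*)$")

SAMPLE = """\
[   73.757869] wlp1s0: authenticate with 20:9f:db:e7:80:80
[   73.771471] wlp1s0: send auth to 20:9f:db:e7:80:80 (try 1/3)
[   73.773706] wlp1s0: authenticated
[   73.775122] wlp1s0: associate with 20:9f:db:e7:80:80 (try 1/3)
[   73.787434] wlp1s0: RX AssocResp from 20:9f:db:e7:80:80 (capab=0x431 status=0 aid=9)
[   73.787573] wlp1s0: associated
[   77.784931] wlp1s0: deauthenticated from 20:9f:db:e7:80:80 (Reason: 2=PREV_AUTH_NOT_VALID)
"""


def seconds_from_assoc_to_deauth(text):
    """Return seconds between the last 'associated' line and the first
    'deauthenticated' line that follows it, or None if no such pair exists."""
    assoc_at = None
    for line in text.splitlines():
        match = TS_LINE.match(line.rstrip())
        if not match:
            continue
        timestamp = float(match.group(1))
        message = match.group(3)
        if message == "associated":
            assoc_at = timestamp
        elif message.startswith("deauthenticated from") and assoc_at is not None:
            return timestamp - assoc_at
    return None
```

For the lines above this comes out at just under four seconds
(77.784931 - 73.787573). Collecting the same number over several
attempts would show whether the AP drops the station after a fixed
interval or at varying times.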

Any suggestions for what I should do to give you guys any sane and
useful debug output?

  Linus


Re: Wireless scanning while turning off the radio problem..

2015-01-18 Thread Linus Torvalds
On Mon, Jan 19, 2015 at 5:48 AM, Arend van Spriel  wrote:
>
> So as you indicated you were in location where none of your configured
> networks were available. Flipping the rfkill switch in that situation is the
> way to trigger the issue.

So you certainly seem to be able to explain the behavior I saw under
the circumstances they happened.

I suspect the best thing to do is to just apply your patch. I may not
be able to really test it much for the next few days anyway. Emmanuel?

 Linus


Re: Wireless scanning while turning off the radio problem..

2015-01-18 Thread Linus Torvalds
On Sun, Jan 18, 2015 at 11:24 PM, Emmanuel Grumbach  wrote:
>
> we have different scan flows based on the firmware version you have,
> so it would help if you could tell me what firmware you have.

Sure. It's the latest one I could find:

   iwlwifi :01:00.0: loaded firmware version 23.11.10.0 op_mode iwlmvm

with the actual firmware file being 'iwlwifi-7260-10.ucode' from the
current linux-firmware tree.

In a different email, Arend van Spriel wrote:
>
> The function iwl_trans_pcie_stop_device() put device in low-power and
> resets the cpu on the device.  So iwl_op_mode_hw_rf_kill ends up in
> iwl_mvm_set_hw_rfkill_state which schedules cfg80211_rfkill_sync_work
> and returns true if firmware is running.  The patch below might work.

Any suggestions for how to best try to trigger this for testing?
Looking at my logs, it turns out that I actually got this three times,
but they were all on the same boot, and I think the first case might
just have triggered the later ones.

The trigger was turning off wifi from the wifi settings app due to
being in an airplane when they were closing the doors. I don't *think*
there was actually any wifi around at the time, which may or may not
have made the scanning take longer and made it easier to trigger.

But I've done it before (although this machine has been upgraded to
F21 reasonably recently, and I did update the ucode file before the
trip). And I did it afterwards to test. And it happened that one time
(and then apparently kept happening during suspend/resume/shutdown,
but as mentioned, I blame that on some sticky problem from the first
time, and those events in turn happened because I couldn't get
wireless to work afterwards).

IOW, I'm not at all sure I can recreate it, so your "analyzing the
source code for how this could happen" may be the only good way..

   Linus


Wireless scanning while turning off the radio problem..

2015-01-17 Thread Linus Torvalds
So there seems to be some issue with unlucky timing when turning off
wireless while the driver is busy scanning. I can't reproduce this, so
it's a one-off, but it's not just ugly warnings, the kernel wouldn't
scan any wireless on that device afterwards and I had to reboot to get
networking back, so there is some long-term damage.

This is with Intel wireless (iwlwifi, it's an iwl N7260 thing, rev
0x144 if anybody cares), but the warning callbacks don't seem to be
iwl-specific.

This was a recent top-of-git kernel (3.19.0-rc4-00241-gfc7f0dd38172 to
be exact).

Anybody have any ideas? Anything in particular I should try out to
help possibly get more information?

  Linus

---
[  204.361145] iwlwifi :01:00.0: RF_KILL bit toggled to disable radio.
[  204.362358] ------------[ cut here ]------------
[  204.362383] WARNING: CPU: 0 PID: 37 at net/wireless/core.c:1011
cfg80211_netdev_notifier_call+0x491/0x500 [cfg80211]()
[  204.362385] Modules linked in: ccm rfcomm fuse ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat
ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle
ip6table_security ip6table_raw ip6table_filter ip6_tables
iptable_mangle iptable_security iptable_raw bnep arc4 vfat fat
x86_pkg_temp_thermal pn544_mei mei_phy pn544 coretemp hci kvm_intel
nfc iTCO_wdt iTCO_vendor_support kvm iwlmvm uvcvideo
snd_hda_codec_realtek microcode snd_hda_codec_generic
snd_hda_codec_hdmi mac80211 videobuf2_vmalloc videobuf2_memops
videobuf2_core v4l2_common snd_hda_intel videodev snd_hda_controller
joydev btusb media hid_multitouch i2c_i801 snd_hda_codec serio_raw
iwlwifi bluetooth snd_hwdep snd_seq cfg80211 snd_seq_device
[  204.362432]  snd_pcm sony_laptop rfkill mei_me snd_timer mei snd
lpc_ich mfd_core shpchp soundcore dm_crypt i915 crct10dif_pclmul
crc32_pclmul crc32c_intel i2c_algo_bit ghash_clmulni_intel
drm_kms_helper drm i2c_core video
[  204.362453] CPU: 0 PID: 37 Comm: kworker/0:1 Not tainted
3.19.0-rc4-00241-gfc7f0dd38172 #14
[  204.362455] Hardware name: Sony Corporation SVP11213CXB/VAIO, BIOS
R0270V7 05/17/2013
[  204.362464] Workqueue: events cfg80211_rfkill_sync_work [cfg80211]
[  204.362467]   c0375870 815eb39a

[  204.362471]  8106c357 8800d3b12890 8800d9e08260
0002
[  204.362475]  8800d3b12000 8800d9e08000 c0350161
8800d365dc00
[  204.362479] Call Trace:
[  204.362490]  [] ? dump_stack+0x40/0x50
[  204.362496]  [] ? warn_slowpath_common+0x77/0xb0
[  204.362506]  [] ?
cfg80211_netdev_notifier_call+0x491/0x500 [cfg80211]
[  204.362513]  [] ? __dev_remove_pack+0x39/0xa0
[  204.362538]  [] ? __unregister_prot_hook+0xcc/0xd0
[  204.362542]  [] ? packet_notifier+0x15c/0x1b0
[  204.362549]  [] ? notifier_call_chain+0x45/0x70
[  204.362552]  [] ? dev_close_many+0xb9/0x110
[  204.362556]  [] ? dev_close.part.87+0x2a/0x40
[  204.362559]  [] ? dev_close+0x19/0x20
[  204.362569]  [] ?
cfg80211_shutdown_all_interfaces+0x3d/0xb0 [cfg80211]
[  204.362577]  [] ?
cfg80211_rfkill_sync_work+0x29/0x30 [cfg80211]
[  204.362580]  [] ? process_one_work+0x135/0x370
[  204.362585]  [] ? pwq_activate_delayed_work+0x27/0x40
[  204.362589]  [] ? worker_thread+0x63/0x480
[  204.362592]  [] ? rescuer_thread+0x2f0/0x2f0
[  204.362596]  [] ? kthread+0xce/0xf0
[  204.362600]  [] ? kthread_create_on_node+0x180/0x180
[  204.362605]  [] ? ret_from_fork+0x7c/0xb0
[  204.362609]  [] ? kthread_create_on_node+0x180/0x180
[  204.362612] ---[ end trace d0ac2826f7d2747f ]---

[  204.362614] ------------[ cut here ]------------
[  204.362628] WARNING: CPU: 0 PID: 37 at net/mac80211/driver-ops.h:12
ieee80211_request_sched_scan_stop+0xdd/0xf0 [mac80211]()
[  204.362630] wlp1s0:  Failed check-sdata-in-driver check, flags: 0x4
[  204.362631] Modules linked in: ccm rfcomm fuse ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat
ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle
ip6table_security ip6table_raw ip6table_filter ip6_tables
iptable_mangle iptable_security iptable_raw bnep arc4 vfat fat
x86_pkg_temp_thermal pn544_mei mei_phy pn544 coretemp hci kvm_intel
nfc iTCO_wdt iTCO_vendor_support kvm iwlmvm uvcvideo
snd_hda_codec_realtek microcode snd_hda_codec_generic
snd_hda_codec_hdmi mac80211 videobuf2_vmalloc videobuf2_memops
videobuf2_core v4l2_common snd_hda_intel videodev snd_hda_controller
joydev btusb media hid_multitouch i2c_i801 snd_hda_codec serio_raw
iwlwifi bluetooth snd_hwdep snd_seq cfg80211 snd_seq_device
[  204.362677]  snd_pcm sony_laptop rfkill mei_me snd_timer mei snd
lpc_ich mfd_core shpchp soundcore dm_crypt i915 crct10dif_pclmul
crc32_pclmul crc32c_intel i2c_algo_bit ghash_clmulni_intel
drm_kms_helper drm i2c_core video
[  204.362695] CPU: 0 PID: 37 Comm: kworker/0:1 Tainted: GW
  3.19.0-r

Re: [PATCH] Revert "ipw2200: select CFG80211_WEXT"

2015-01-03 Thread Linus Torvalds
On Sat, Jan 3, 2015 at 10:02 AM, Marcel Holtmann  wrote:
>
> why would you revert this? It is obviously the correct change to actually 
> select CFG80211_WEXT.

I don't know about obvious, but yeah, I think the select in this case
is actually the better idea anyway.

We could make the CFG80211_WEXT help message be very negative so that
people aren't encouraged to select it even if they can, but then if
they need the ipw driver it gets selected because of that. Because the
ipw driver is probably the more important of the two if you just
happen to have old hardware but are upgrading your software (and
anybody who recompiles their own kernel is obviously doing the
latter).

 Linus


Re: [PATCH] Revert "cfg80211: make WEXT compatibility unselectable"

2015-01-01 Thread Linus Torvalds
On Thu, Jan 1, 2015 at 11:44 AM, Lennart Sorensen
 wrote:
>
> ifconfig seems to just be broken for many cases of perfectly nice features
> in the kernel.

So I'm not saying "ifconfig is wonderful". It's not.

But I *am* saying that "changing user interfaces and then expecting
people to change is f*cking stupid".

The fact is, ifconfig is simple for the simple cases, but more
importantly, a lot of people learnt how to use it. Saying "you should
all change, because we made up a new syntax" is not good policy.

The people who did "ip" could fairly easily have done a wrapper
around the same code that also left the old "ifconfig" syntax. Then,
distros could have trivially just dropped the old "ifconfig" package,
and entirely replaced it with the new "ip" package.
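
As a rough illustration of what such a compatibility layer involves,
here is a minimal sketch that maps a tiny subset of ifconfig-style
arguments onto ip(8) argument lists. It is an assumption-laden toy, not
how iproute2 works: the function name ifconfig_to_ip is invented, only
"<dev> up|down" and "<dev> <ipv4> netmask <mask>" are handled, and the
semantics are approximate (ifconfig sets the address, "ip addr add" adds
one).

```python
import ipaddress


def ifconfig_to_ip(argv):
    """Translate ifconfig-style arguments into a list of ip(8) commands.

    Supported forms (illustrative subset only):
      <dev> up | down
      <dev> <ipv4-address> netmask <mask> [up | down]
    """
    if len(argv) < 2:
        raise ValueError("this sketch only handles '<dev> <action...>'")
    dev, words = argv[0], list(argv[1:])
    address = netmask = state = None
    while words:
        word = words.pop(0)
        if word in ("up", "down"):
            state = word
        elif word == "netmask":
            if not words:
                raise ValueError("netmask needs a value")
            netmask = words.pop(0)
        elif address is None:
            address = str(ipaddress.IPv4Address(word))  # validates the address
        else:
            raise ValueError(f"unsupported argument: {word}")

    commands = []
    if address is not None:
        if netmask is None:
            raise ValueError("this sketch requires an explicit netmask")
        # Convert a dotted netmask such as 255.255.255.0 into a prefix length.
        prefix = ipaddress.IPv4Network(f"0.0.0.0/{netmask}").prefixlen
        commands.append(["ip", "addr", "add", f"{address}/{prefix}", "dev", dev])
        if state is None:
            state = "up"  # ifconfig implies "up" when an address is assigned
    elif netmask is not None:
        raise ValueError("netmask given without an address")
    if state is not None:
        commands.append(["ip", "link", "set", "dev", dev, state])
    return commands
```

For example, ifconfig_to_ip(["eth0", "up"]) yields a single
"ip link set dev eth0 up" command. The point is only that the old
syntax can be kept as a thin front end; a real wrapper would have to
cover far more of ifconfig's behavior.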

As it is, we have two different models, and they'll basically stay
around forever.

For something like ifconfig, very few people care. But *all* the same
arguments are true wrt "iw" and "iwconfig".

The people who are trying to deprecate the WEXT interfaces should put
the blame firmly where it belongs - on the people who thought that
"we'll just ignore all old history".

Because people who think that "we'll just redesign everything" are
actually f*cking morons. Really.

There's a real reason the kernel has the "no regression" policy. And
that reason is that I'm not a moron.

History matters. Legacy uses matter.

  Linus


Re: [PATCH] Revert "cfg80211: make WEXT compatibility unselectable"

2014-12-31 Thread Linus Torvalds
On Wed, Dec 31, 2014 at 1:44 PM, Theodore Ts'o  wrote:
>
> Yeah, the confusing part is that "ip" tends to use "verb object"
> scheme, which is consistent with the Cisco IOS command set it was
> trying to emulate.

Side note: does anybody think that was really a good idea to begin
with? I mean, Cisco IOS is just _so_ universally loved, right?

And yeah, I refuse to use "ip link" or other insane commands. Let's
face it, "ifconfig" and "route" are perfectly fine commands, and a
whole lot less confusing than "ip" with random crap after it.  I'm
really not seeing why that "ip" command was seen as an improvement.

(Ok, "ip route" isn't any more complex than "route", but "ip link"
sure as hell isn't simpler than "ifconfig" for most things I can think
of)

 Linus


Re: [PATCH] Revert "cfg80211: make WEXT compatibility unselectable"

2014-12-31 Thread Linus Torvalds
On Wed, Dec 31, 2014 at 9:31 AM, Theodore Ts'o  wrote:
>
> Most people are still using "route" and "ifconfig" instead of "ip".
> Deal with it.

Indeed. This whole "let's throw out the old and broken" stuff is a disease.

It would have been much better (and it's still an option, as Ted
points out) for the new commands to provide compatibility with what
users - and scripts - have been doing for ages with the old ones.

As it is, this inability for the new tools to just do what the old
tools did clearly just means that not just the old tools, but all the
old infrastructure, will need to be around for years to come.

Thinking you can just start from a clean slate is naive, bordering on
stupid. "New and improved" is only really improved if it also takes
backwards compatibility into account, rather than saying "now
everybody must do things the new and improved - and different - way"

We've succeeded in getting rid of some old interfaces in the kernel,
but it has usually been for some *really* esoteric stuff that nobody
does by hand. And even then it has generally been an uphill battle,
and in most cases we've ended up having the rule that new capabilities
absolutely *have* to be a superset of the old, and we continue to
support the old model using the new code.

It's entirely possible that we might be able to cut down on the WEXT
support a tiny bit by slowly removing some parts of it that nobody
uses and depends on, but the whole "let's just make it a non-option"
was clearly just a drug-fueled bad fantasy.

 Linus