Re: [PATCH v2 00/19] prevent bounds-check bypass via speculative execution

2018-01-12 Thread Russell King - ARM Linux
Do you think that the appropriate patches could be copied to the
appropriate people please?

On Thu, Jan 11, 2018 at 04:46:24PM -0800, Dan Williams wrote:
> Changes since v1 [1]:
> * fixup the ifence definition to use alternative_2 per recent AMD
>   changes in tip/x86/pti (Tom)
> 
> * drop 'nospec_ptr' (Linus, Mark)
> 
> * rename 'nospec_array_ptr' to 'array_ptr' (Alexei)
> 
> * rename 'nospec_barrier' to 'ifence' (Peter, Ingo)
> 
> * clean up occasions of 'variable assignment in if()' (Sergei, Stephen)
> 
> * make 'array_ptr' use a mask instead of an architectural ifence by
>   default (Linus, Alexei)
> 
> * provide a command line and compile-time opt-in to the ifence
>   mechanism, if an architecture provides 'ifence_array_ptr'.
> 
> * provide an optimized mask generation helper, 'array_ptr_mask', for
>   x86 (Linus)
> 
> * move 'get_user' hardening from '__range_not_ok' to '__uaccess_begin'
>   (Linus)
> 
> * drop "Thermal/int340x: prevent bounds-check..." since userspace does
>   not have arbitrary control over the 'trip' index (Srinivas)
> 
> * update the changelog of "net: mpls: prevent bounds-check..." and keep
>   it in the series to continue the debate about Spectre hygiene patches.
>   (Eric).
> 
> * record a reviewed-by from Laurent on "[media] uvcvideo: prevent
>   bounds-check..."
> 
> * update the cover letter
> 
> [1]: https://lwn.net/Articles/743376/
> 
> ---
> 
> Quoting Mark's original RFC:
> 
> "Recently, Google Project Zero discovered several classes of attack
> against speculative execution. One of these, known as variant-1, allows
> explicit bounds checks to be bypassed under speculation, providing an
> arbitrary read gadget. Further details can be found on the GPZ blog [2]
> and the Documentation patch in this series."
> 
> This series incorporates Mark Rutland's latest ARM changes and adds
> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> based approach is provided as an opt-in fallback, but the default
> mitigation, '__array_ptr', uses a 'mask' approach that removes
> conditional branch instructions, and otherwise aims to redirect
> speculation to use a NULL pointer rather than a user controlled value.
> 
> The mask is generated by the following from Alexei, and Linus:
> 
> mask = ~(long)(_i | (_s - 1 - _i)) >> (BITS_PER_LONG - 1);
> 
> ...and Linus provided an optimized mask generation helper for x86:
> 
> asm ("cmpq %1,%2; sbbq %0,%0;"
>   :"=r" (mask)
>   :"r"(sz),"r" (idx)
>   :"cc");
> 
> The 'array_ptr' mechanism can be switched between 'mask' and 'ifence'
> via the spectre_v1={mask,ifence} command line option, and the
> compile-time default is set by selecting either CONFIG_SPECTRE1_MASK or
> CONFIG_SPECTRE1_IFENCE.
> 
> The 'array_ptr' infrastructure is the primary focus of this patch set. The
> individual patches that perform 'array_ptr' conversions are a point in
> time (i.e. earlier kernel, early analysis tooling, x86 only etc...)
> start at finding some of these gadgets.
> 
> Another consideration for reviewing these patches is the 'hygiene'
> argument. When a patch refers to hygiene it is concerned with stopping
> speculation on an unconstrained or insufficiently constrained pointer
> value under userspace control. That by itself is not sufficient for
> attack (per current understanding) [3], but it is a necessary
> pre-condition.  So 'hygiene' refers to cleaning up those suspect
> pointers regardless of whether they are usable as a gadget.
> 
> These patches are also available via the 'nospec-v2' git branch
> here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v2
> 
> Note that the BPF fix for Spectre variant1 is merged in the bpf.git
> tree [4], and is not included in this branch.
> 
> [2]: https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
> [3]: https://spectreattack.com/spectre.pdf
> [4]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=b2157399cc98
> 
> ---
> 
> Dan Williams (16):
>   x86: implement ifence()
>   x86: implement ifence_array_ptr() and array_ptr_mask()
>   asm-generic/barrier: mask speculative execution flows
>   x86: introduce __uaccess_begin_nospec and ASM_IFENCE
>   x86: use __uaccess_begin_nospec and ASM_IFENCE in get_user paths
>   ipv6: prevent bounds-check bypass via speculative execution
>   ipv4: prevent bounds-check bypass via speculative execution
>   vfs, fdtable: prevent bounds-check bypass via speculative execution
>   userns: prevent bounds-check bypass via speculative execution
>   udf: prevent bounds-check bypass via speculative execution
>   [media] uvcvideo: prevent bounds-check bypass via speculative execution
>   carl9170: prevent bounds-check bypass via speculative execution
>   p54: prevent bounds-check bypass via speculative execution
>   qla2xxx: prevent bounds-check bypass via speculative execution
>   
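
The mask expression quoted in the cover letter can be tried out in
userspace.  What follows is only a minimal sketch under stated
assumptions, not the code from the series: the names
array_ptr_mask_sketch() and array_ptr_sketch() are made up for
illustration, the size is assumed to fit in a signed long, and the
compiler is assumed to shift negative values arithmetically (GCC and
Clang do; ISO C leaves it implementation-defined).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Hypothetical helper: all ones when idx < size, zero otherwise,
 * computed without a conditional branch.  Mirrors the expression
 * quoted in the cover letter.
 */
static inline unsigned long array_ptr_mask_sketch(unsigned long idx,
						  unsigned long size)
{
	/* Assumption: the top bit of size is clear. */
	assert(size <= (unsigned long)LONG_MAX);
	return (unsigned long)(~(long)(idx | (size - 1 - idx)) >>
			       (BITS_PER_LONG - 1));
}

/*
 * Hypothetical helper: &arr[idx] when idx is in bounds, NULL otherwise.
 * Both the index and the resulting address are ANDed with the mask, so
 * an out-of-range idx collapses to a NULL pointer.
 */
static inline const int *array_ptr_sketch(const int *arr, unsigned long idx,
					  unsigned long size)
{
	unsigned long mask = array_ptr_mask_sketch(idx, size);
	uintptr_t addr = (uintptr_t)(arr + (idx & mask));

	return (const int *)(addr & mask);
}
```

A caller would test the returned pointer for NULL before dereferencing
it, which is how the in-bounds and out-of-bounds cases are told apart
without trusting the original index.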

Re: AP mode with Broadcom 4330

2017-11-14 Thread Russell King - ARM Linux
Arend,

Did this bug ever get fixed, or are 4330's still ending up advertising
BRCM_TEST_SSID when they're put into AP mode with mainline kernels?

It would be good to know whether I can drop my patch for this from my
kernel tree and still have working AP mode.

Thanks.

On Mon, Jul 31, 2017 at 10:28:50PM +0200, Arend van Spriel wrote:
> On 31-07-17 14:59, Russell King - ARM Linux wrote:
> > On Fri, Jul 28, 2017 at 09:50:21PM +0200, Arend van Spriel wrote:
> >> I was going to agree with you, but having second thoughts. There are
> >> actually two use-cases that need to be handled properly. The regular AP
> >> case and the MBSS case. In case of MBSS the initial AP interface will
> >> have mbss set to false and subsequent AP interfaces will have mbss set
> >> to true, but in firmware this has to be configured inverted. That is
> >> what the code above is doing. However, this indeed breaks the regular AP
> >> case for firmwares that abuse that setting for testing purposes (no idea
> >> why that is in a released firmware).
> > 
> > Maybe detect the BRCM_TEST_SSID string in the firmware file (as it's
> > broken up amongst other data, it's not trivial) and disable mbss for
> > such firmware?  Alternatively, maybe blacklist mbss for some firmware
> > versions?
> 
> Well. It seems the 43362 chip also had this and we disabled mbss for that
> chipset. So we may do that for 4330 as well.
> 
> > Do the firmware versions that include this "abuse" actually have
> > functional mbss?
> 
> Digging through our internal bug database I found a remark that
> BRCM_TEST_SSID showing up means mbss is not functional.
> 
> > There's also the obvious question: which firmware is recommended for
> > the 4330?
> 
> We tend to rely on what is released to AOSP as our team does not have
> the bandwidth to go through the release process. I checked and it is
> still the same so matches what is in linux-firmware.
> 
> Regards,
> Arend

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: [EXTERNAL] wl1837: poor performance?

2017-10-14 Thread Russell King - ARM Linux
On Fri, Oct 13, 2017 at 07:43:37AM +, Reizer, Eyal wrote:
> Sorry for top posting.
> Signal level on wlan0 seems low (-79 dBm); are you sure there are antennas
> connected?

Thanks for the reply.

In development environments, it's common to have the AP and station
nearby, which will give good signal.  Out of the development
environment, the AP and station may be separated by several 10s of ft
and objects in the way.  That's the case here, so about -80dBm is to
be expected.

As I mentioned, there are two stations each with different chipsets, both
with the same signal level being reported, both at a similar distance 
from the AP, but one performs way better than the other.  I've found
the reason for this (see below.)

I should also mention that this is installed remotely, and I'm debugging
it over the 'net, involving this very Wi-Fi connection.

The radio environment is not excessively populated - both systems are
within a tower with >2ft thick stone and lime mortar walls which radio
has difficulty penetrating, which is itself in a park with very few
other buildings around.  I'm using the 2.4G channel 8, I've seen some
other APs on channel 1 when outside but almost never inside the tower.

nmcli (used to?) occasionally reports that the wl1837 can see another
AP ("CARWIFI") on channel 1, but that's rare - I suspect it requires
someone to park their car in direct line of literal sight through a
tower window from the machine.  I suspect NM noticed that before I
configured the BSSID of the associated AP as I haven't recently
noticed NM reporting any other networks.  Could also be that non-wifi
enabled cars are parked in those spaces!

> In addition what type of module is used here?

It's a WL1837 integrated onto SolidRun's iMX6 microsom.  I'm not sure
what other information in terms of "module" you're asking for.

> Did you use the configure_device.sh scripts as documented in the
> "wlconf" manual?

No, because quite honestly the wilink tooling is something of a mess
in terms of trying to find the right tooling.  This is what I ended
up doing:

1. cloning git://github.com/TI-OpenLink/18xx-ti-utils
2. taking the wl18xx driver default configuration (from debugfs).
3. taking the ti wilink driver conf.h files.
4. massaging the conf.h files into a form that wlconf can read
5. generating a new struct.bin
6. changing:
   wl18xx.phy.number_of_assembled_ant2_4 = 0x01
   wl18xx.phy.number_of_assembled_ant5 = 0x00

since this variant of the microsom I have only populates one antenna.
(The layout allows for the design to be extended to a second antenna.)

I've since found the _right_ repository for the tooling
(git://git.ti.com/wilink8-wlan/18xx-ti-utils.git), and compared the
resulting files, and confirmed that my new struct.bin is identical to
the one in the right repository.

There are some differences between the two files (- = mine,
+ = configure_device.sh generated):

-core.sg.params = 0x, 0x, 0x, 0x, 0x, 
0x, 0x, 0x, 0x, 0x, 0x, 
0x, 0x, 0x, 0x, 0x, 0x, 
0x, 0x, 0x, 0x, 0x, 0x, 
0x, 0x, 0x, 0x00aa, 0x0032, 0x, 
0x, 0x, 0x00c8, 0x, 0x, 0x, 
0x0001, 0x, 0x003c, 0x, 0x04b0, 0x, 
0x0001, 0x0003, 0x0006, 0x, 0x, 0x0002, 
0x, 0x, 0x0003, 0x, 0x0002, 0x001e, 
0x, 0x, 0x, 0x, 0x, 0x, 
0x, 0x, 0x, 0x, 0x, 0x, 
0x, 0x
+core.sg.params = 0x, 0x, 0x, 0x, 0x, 
0x, 0x, 0x, 0x, 0x, 0x, 
0x, 0x, 0x, 0x, 0x, 0x, 
0x, 0x, 0x, 0x, 0x, 0x, 
0x000f, 0x001b, 0x0011, 0x00aa, 0x0032, 0x0064, 
0x0320, 0x00c8, 0x00c8, 0x, 0x, 0x, 
0x0001, 0x, 0x003c, 0x1388, 0x04b0, 0x03e8, 
0x0001, 0x0003, 0x0006, 0x000a, 0x000a, 0x0002, 
0x0005, 0x001e, 0x0003, 0x000a, 0x0002, 0x, 
0x0019, 0x0019, 0x, 0x, 0x, 0x, 
0x, 0x, 0x, 0x, 0x, 0x, 
0x, 0x
-core.conn.dynamic_ps_timeout = 0x05dc
+core.conn.dynamic_ps_timeout = 0x0096
-core.sched_scan.num_short_intervals = 0x0e
+core.sched_scan.num_short_intervals = 0x0d
-core.fwlog.mem_blocks = 0x00
+core.fwlog.mem_blocks = 0x02
-wl18xx.ap_sleep.max_stations_thresh = 0x00
-wl18xx.ap_sleep.idle_conn_thresh = 0x00
+wl18xx.ap_sleep.max_stations_thresh = 0x04
+wl18xx.ap_sleep.idle_conn_thresh = 0x08

The core.sg.param 

wl1837: poor performance?

2017-10-12 Thread Russell King - ARM Linux
On Thu, Oct 12, 2017 at 10:59:25AM +0100, Russell King - ARM Linux wrote:
> It looks like ti wilink is unmaintained, so I've added some people who
> have touched the driver recently.
> 
> Running wl1837 on a Hummingboard2 (iMX6 Dual core) I've seen one instance
> of the warning below.  Luckily, the recovery worked and connectivity was
> maintained.

I also have a question about seemingly poor performance of this driver,
so starting a new thread.

When running a ping on two clients (and only two clients) connected to
a single AP:

AP: Broadcom brcmfmac bcm4330 Linux machine.

With a ping running on each client machine, the AP reports:
# iw wlan0 station dump
Station a4:xx:xx:xx:xx:xx (on wlan0) <-- rtl8192eu
inactive time:  0 ms
rx packets: 6846
tx packets: 2108
tx failed:  0
tx bitrate: 13.0 MBit/s
rx bitrate: 19.5 MBit/s
authorized: yes
authenticated:  yes
WMM/WME:        yes
TDLS peer:  no
Station 00:0f:00:xx:xx:xx (on wlan0) <-- wl1837
inactive time:  0 ms
rx packets: 589233
tx packets: 360969
tx failed:  737
tx bitrate: 13.0 MBit/s
rx bitrate: 19.5 MBit/s
authorized: yes
authenticated:  yes
WMM/WME:        yes
TDLS peer:  no

Client 1: wl1837 using mainline driver:
# iw wlan0 link
Connected to xx:xx:xx:xx:xx:xx (on wlan0)
SSID: 
freq: 2447
RX: 223158 bytes (2307 packets)
TX: 1502828 bytes (1955 packets)
signal: -78 dBm
tx bitrate: 19.5 MBit/s MCS 2

bss flags:  short-preamble short-slot-time
dtim period:    2
beacon int: 100

Ping statistics:
64 bytes from 192.168.250.1: icmp_seq=85 ttl=64 time=10.6 ms
64 bytes from 192.168.250.1: icmp_seq=86 ttl=64 time=10.5 ms
64 bytes from 192.168.250.1: icmp_seq=87 ttl=64 time=10.9 ms
64 bytes from 192.168.250.1: icmp_seq=88 ttl=64 time=10.4 ms
64 bytes from 192.168.250.1: icmp_seq=89 ttl=64 time=12.2 ms
64 bytes from 192.168.250.1: icmp_seq=90 ttl=64 time=12.6 ms

90 packets transmitted, 88 received, 2% packet loss, time 89223ms
rtt min/avg/max/mdev = 3.359/58.323/2275.188/273.244 ms, pipe 3


Client 2: rtl8192eu (using vendor driver): 
# iw wlan0 link
Connected to xx:xx:xx:xx:xx:xx (on wlan0)
SSID: 
freq: 2447
signal: -78 dBm
tx bitrate: 65.0 MBit/s

Ping statistics:

64 bytes from 192.168.250.1: icmp_seq=85 ttl=64 time=6.55 ms
64 bytes from 192.168.250.1: icmp_seq=86 ttl=64 time=3.34 ms
64 bytes from 192.168.250.1: icmp_seq=87 ttl=64 time=3.96 ms
64 bytes from 192.168.250.1: icmp_seq=88 ttl=64 time=3.47 ms
64 bytes from 192.168.250.1: icmp_seq=89 ttl=64 time=3.48 ms
64 bytes from 192.168.250.1: icmp_seq=90 ttl=64 time=3.35 ms
90 packets transmitted, 90 received, 0% packet loss, time 89124ms
rtt min/avg/max/mdev = 3.191/4.902/26.607/3.978 ms

The difference is quite marked: the rtl8192eu seems to perform much
better than the wl1837 even though the wl1837 is located closer to
the AP than the rtl8192eu.

Most of the ping times via the wl1837 look to be around 10ms, vs
3ms for the rtl8192eu, as can be seen above.

Individually, the ping times are very similar to the above.

All wifi interfaces have power save disabled, either via the driver
module options where possible, or via iw wlan0 set power_save 0

So, why does the wl1837 appear not to perform as well as the
rtl8192eu?

Are out of tree vendor drivers in fact better than kernel-merged
drivers? :)

The machines are synchronised via NTP across the wifi link as best
they can manage with the irregularity in the network (the rtl8192eu
syncs /way/ better than the wl1837 - rtl8192eu is always sync'd to
within 250us, the wl1837 keeps reporting milliseconds offset in the
NTP loop stats):

From the AP:
23:54:18.026441 IP 192.168.250.4 > 192.168.250.1: ICMP echo request, id 1112, seq 110, length 64
23:54:18.026604 IP 192.168.250.1 > 192.168.250.4: ICMP echo reply, id 1112, seq 110, length 64
23:54:19.036396 IP 192.168.250.4 > 192.168.250.1: ICMP echo request, id 1112, seq 111, length 64
23:54:19.036560 IP 192.168.250.1 > 192.168.250.4: ICMP echo reply, id 1112, seq 111, length 64
23:54:20.036874 IP 192.168.250.4 > 192.168.250.1: ICMP echo request, id 1112, seq 112, length 64
23:54:20.037025 IP 192.168.250.1 > 192.168.250.4: ICMP echo reply, id 1112, seq 112, length 64

From the wl1837 client:
23:54:18.028504 IP 192.168.250.4 > 192.168.250.1: ICMP echo request, id 1112, seq 110, length 64
23:54:18.032074 IP 192.168.250.1 > 192.168.250.4: ICMP echo reply, id 1112, seq 110, length 64
23:54:19.030464 IP 192.168.250.4 > 192.168.250.1: ICMP echo request, id 1112, seq 111, length 64
23:54:19.043282 IP 192.168.250.1 > 192.168.250.4: ICMP echo reply, id 1112, seq 111, length 64
23:54:20.031082 IP 192.168.250.4 > 192.1
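
The two captures can be compared by subtracting each request timestamp
from its reply timestamp.  Doing that by hand for the lines above gives
roughly 0.16ms turnaround at the AP, against 3.57ms (seq 110) and
12.8ms (seq 111) as seen by the wl1837 client.  A minimal sketch of that
arithmetic follows; the helper names ts_to_usec() and rtt_usec() are
made up for illustration, and it assumes tcpdump's default
HH:MM:SS.uuuuuu stamps with exactly six fractional digits and no
midnight wrap between request and reply.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse "HH:MM:SS.uuuuuu" into microseconds since
 * midnight.  Returns -1 if the string does not match that layout.
 */
static inline long long ts_to_usec(const char *ts)
{
	int h, m, s;
	long us;

	assert(ts != NULL);
	if (sscanf(ts, "%d:%d:%d.%6ld", &h, &m, &s, &us) != 4)
		return -1;
	return (((long long)h * 60 + m) * 60 + s) * 1000000LL + us;
}

/*
 * Hypothetical helper: reply time minus request time in microseconds,
 * or -1 if either timestamp is malformed.
 */
static inline long long rtt_usec(const char *request_ts, const char *reply_ts)
{
	long long req = ts_to_usec(request_ts);
	long long rep = ts_to_usec(reply_ts);

	if (req < 0 || rep < 0)
		return -1;
	return rep - req;
}
```

For example, rtt_usec("23:54:18.028504", "23:54:18.032074") covers the
seq 110 exchange as seen by the client.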

wl1837: ERROR SW watchdog interrupt received! starting recovery

2017-10-12 Thread Russell King - ARM Linux
It looks like ti wilink is unmaintained, so I've added some people who
have touched the driver recently.

Running wl1837 on a Hummingboard2 (iMX6 Dual core) I've seen one instance
of the warning below.  Luckily, the recovery worked and connectivity was
maintained.

...
wlcore: Association completed.

After 19532s from boot, I saw:

wlcore: ERROR SW watchdog interrupt received! starting recovery.
[ cut here ]
WARNING: CPU: 0 PID: 244 at drivers/net/wireless/ti/wlcore/main.c:796 wl12xx_queue_recovery_work+0x68/0x70 [wlcore]
Modules linked in: nfsd wl18xx wlcore mac80211 cfg80211 caam_jr imx_media_ic(C) 
imx_media_vdic(C) snd_soc_imx_sgtl5000 snd_soc_fsl_asoc_card imx_media_csi(C) 
imx_media_capture(C) snd_soc_imx_audmux wlcore_sdio snd_soc_sgtl5000 mux_mmio 
video_mux mux_core ci_hdrc_imx ci_hdrc caam udc_core usbmisc_imx imx_sdma 
imx2_wdt coda v4l2_mem2mem videobuf2_v4l2 rc_cec imx_vdoa videobuf2_dma_contig 
videobuf2_core videobuf2_vmalloc videobuf2_memops imx_thermal snd_soc_fsl_ssi 
imx_pcm_dma imx_media(C) dw_hdmi_ahb_audio dw_hdmi_cec imx_media_common(C) 
v4l2_fwnode etnaviv
CPU: 0 PID: 244 Comm: irq/243-wl18xx Tainted: G C  4.14.0-rc1+ #2209
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[] (dump_backtrace) from [] (show_stack+0x18/0x1c)
 r6:6013 r5: r4: r3:
[] (show_stack) from [] (dump_stack+0xa4/0xdc)
[] (dump_stack) from [] (__warn+0xdc/0x108)
 r6:bf376a48 r5: r4: r3:c0a41530
[] (__warn) from [] (warn_slowpath_null+0x28/0x30)
 r10:ee309950 r8:ee30973c r7: r6:ee309788 r5:ee309704 r4:ee3096e0
[] (warn_slowpath_null) from [] (wl12xx_queue_recovery_work+0x68/0x70 [wlcore])
[] (wl12xx_queue_recovery_work [wlcore]) from [] (wlcore_irq+0x15c/0x174 [wlcore])
 r4:ee3096e0 r3:0001
[] (wlcore_irq [wlcore]) from [] (irq_thread_fn+0x24/0x3c)
 r10:c00a46ec r8:ee349b00 r7:ef2ffc00 r6:ef2ffc00 r5: r4:ee349b00
[] (irq_thread_fn) from [] (irq_thread+0x128/0x1ec)
 r6:0001 r5: r4:ee349b24 r3:0004
[] (irq_thread) from [] (kthread+0x150/0x198)
 r10:c00a4784 r9:ef111d10 r8:ee349b00 r7:ee2b9680 r6:ee349c00 r5:
 r4:ee2b9600
[] (kthread) from [] (ret_from_fork+0x14/0x3c)
 r10: r9: r8: r7: r6: r5:c005cc08
 r4:ee349c00 r3:ed9a8000
---[ end trace b35f1ada6f716c27 ]---
wlcore: Hardware recovery in progress. FW ver: Rev 8.9.0.0.75
wlcore: pc: 0x116424, hint_sts: 0x count: 1
wlcore: down
ieee80211 phy0: Hardware restart was requested
wlcore: PHY firmware version: Rev 8.2.0.0.240
wlcore: firmware booted (Rev 8.9.0.0.75)
wlcore: Association completed.

The interrupt, according to /proc/interrupts, shows:

   CPU0   CPU1
243:  32387  0  gpio-mxc   4 Level wl18xx

although that's from about a day or so after boot.



Re: AP mode with Broadcom 4330

2017-08-15 Thread Russell King - ARM Linux
On Tue, Aug 15, 2017 at 10:58:42AM +0100, Russell King - ARM Linux wrote:
> Sorry for the confusion - the problem with iwlwifi turned out to be a
> lack of /dev/random entropy, causing hostapd to forcefully deauth the
> client - that was hidden when running hostapd from systemd and is only
> visible if you run hostapd manually.
> 
> (I have other problems there - manually starting hostapd works every
> time, but when started using systemctl start hostapd, systemctl status
> hostapd always reports that it's started but exited and it definitely
> isn't running... I'm just hitting one problem after another here with
> wireless, I'm quite sure this tech hates me!)
> 
> However, I'm still having problems with the Realtek not getting further
> than _allegedly_ sending the auth frames - I'm not convinced that the
> driver is actually sending anything yet.  I can't see anything suggesting
> it is from iwlwifi in monitor mode, and enabling all the debug for the
> rtl8xxxu driver doesn't give the slightest hint that this driver is
> doing anything remotely useful to transmit these frames.  For instance:
> 
> [41742.979480] usb 1-1: rtl8xxxu_read32(0440)  = 0x000f, len 4
> [41742.985826] usb 1-1: rtl8xxxu_write32(0440) = 0x000f
> [41742.994470] usb 1-1: rtl8xxxu_write8(0480) = 0x04
> [41742.999430] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 1/3)
> [41743.206163] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 2/3)
> [41743.414148] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 3/3)
> [41743.622138] wlan0: authentication with 6c:ad:f8:1d:4c:d9 timed out
> [41743.629100] usb 1-1: rtl8xxxu_write8(0618) = 0x00
> [41743.636490] usb 1-1: rtl8xxxu_write8(0619) = 0x00
> [41743.641573] usb 1-1: rtl8xxxu_write8(061a) = 0x00
> 
> How can it send auth packets without writing to any registers?  (This is
> with all debug options set in /sys/module/rtl8xxxu/parameters/debug!)
> 
> So, I don't think the 4330 has a problem, I think it's all down to
> the Realtek driver being buggy.

And things get even weirder - if I reboot, rtl8xxxu fails to associate:

[   38.377649] wlan0: authenticate with 6c:ad:f8:1d:4c:d9
[   38.410252] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 1/3)
[   38.616678] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 2/3)
[   38.824660] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 3/3)
[   39.032645] wlan0: authentication with 6c:ad:f8:1d:4c:d9 timed out

if I then rmmod the rtl8xxxu module and reinsert the exact same module:

[   49.914724] wlan0: authenticate with 6c:ad:f8:1d:4c:d9
[   49.935765] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 1/3)
[   49.943182] wlan0: authenticated
[   49.956116] wlan0: associate with 6c:ad:f8:1d:4c:d9 (try 1/3)
[   49.971504] wlan0: RX AssocResp from 6c:ad:f8:1d:4c:d9 (capab=0x411 status=0 aid=2)
[   49.985466] usb 1-1: rtl8xxxu_bss_info_changed: HT supported
[   49.997918] wlan0: associated

This looks like a rtl8xxxu driver / core mac80211 bug, so I'll take it
to the rtl8xxx folk now.



Re: AP mode with Broadcom 4330

2017-08-15 Thread Russell King - ARM Linux
On Tue, Aug 15, 2017 at 11:42:12AM +0200, Arend van Spriel wrote:
> On 15-08-17 10:22, Russell King - ARM Linux wrote:
> >On Mon, Aug 14, 2017 at 11:25:03PM -0700, ros...@gmail.com wrote:
> >>If using rtlwifi, you could try using rtl8xxxu and see if you get
> >>similar results. ¯\_(ツ)_/¯
> >
> >I'm using rtl8xxxu.  I don't think rtlwifi supports the 8192eu - it
> >certainly does not list the device id:
> >Bus 001 Device 002: ID 0bda:818b Realtek Semiconductor Corp.
> >
> >As an extra data point, trying to associate to the 4330 with an Intel
> >client gives:
> >
> >[8821752.691490] wlan0: authenticate with 6c:ad:f8:1d:4c:d9
> >[8821752.693448] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 1/3)
> >[8821752.696230] wlan0: authenticated
> >[8821752.697493] wlan0: associate with 6c:ad:f8:1d:4c:d9 (try 1/3)
> >[8821752.700816] wlan0: RX AssocResp from 6c:ad:f8:1d:4c:d9 (capab=0x411 status=0 aid=1)
> >[8821752.704407] wlan0: associated
> >[8821755.814844] wlan0: deauthenticated from 6c:ad:f8:1d:4c:d9 (Reason: 2=PREV_AUTH_NOT_VALID)
> >
> >which gets slightly further but ultimately still fails.
> 
> Hi Russell,
> 
> On vacation this week, but it is raining over here so have some moments to
> kill. Here are a couple of things to try:
> 
> 1) try without encryption.
> 2) does hostapd log show anything interesting.
> 3) can you use the Intel client to make a sniff.
> 
> I can check what could trigger the firmware after 3 sec. to deauth. Just not
> sure if I will get to that this week. Could be failing EAPOL handshake.

Sorry for the confusion - the problem with iwlwifi turned out to be a
lack of /dev/random entropy, causing hostapd to forcefully deauth the
client - that was hidden when running hostapd from systemd and is only
visible if you run hostapd manually.
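
For anyone wanting to rule out the same entropy starvation, the kernel's
estimate can be read back before starting hostapd.  This is only a
minimal sketch: it assumes the Linux procfs file
/proc/sys/kernel/random/entropy_avail is present, the helper name
read_entropy_avail() is made up for illustration, and what counts as
"too low" is left to the reader.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from the given file,
 * normally /proc/sys/kernel/random/entropy_avail.  Returns the value
 * read, or -1 if the file cannot be opened or parsed.
 */
static inline int read_entropy_avail(const char *path)
{
	FILE *f;
	int bits = -1;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &bits) != 1)
		bits = -1;
	fclose(f);
	return bits;
}
```

Calling read_entropy_avail("/proc/sys/kernel/random/entropy_avail")
shortly before and after a failed association would show whether the
pool was drained at the time.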

(I have other problems there - manually starting hostapd works every
time, but when started using systemctl start hostapd, systemctl status
hostapd always reports that it's started but exited and it definitely
isn't running... I'm just hitting one problem after another here with
wireless, I'm quite sure this tech hates me!)

However, I'm still having problems with the Realtek not getting further
than _allegedly_ sending the auth frames - I'm not convinced that the
driver is actually sending anything yet.  I can't see anything suggesting
it is from iwlwifi in monitor mode, and enabling all the debug for the
rtl8xxxu driver doesn't give the slightest hint that this driver is
doing anything remotely useful to transmit these frames.  For instance:

[41742.979480] usb 1-1: rtl8xxxu_read32(0440)  = 0x000f, len 4
[41742.985826] usb 1-1: rtl8xxxu_write32(0440) = 0x000f
[41742.994470] usb 1-1: rtl8xxxu_write8(0480) = 0x04
[41742.999430] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 1/3)
[41743.206163] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 2/3)
[41743.414148] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 3/3)
[41743.622138] wlan0: authentication with 6c:ad:f8:1d:4c:d9 timed out
[41743.629100] usb 1-1: rtl8xxxu_write8(0618) = 0x00
[41743.636490] usb 1-1: rtl8xxxu_write8(0619) = 0x00
[41743.641573] usb 1-1: rtl8xxxu_write8(061a) = 0x00

How can it send auth packets without writing to any registers?  (This is
with all debug options set in /sys/module/rtl8xxxu/parameters/debug!)

So, I don't think the 4330 has a problem, I think it's all down to
the Realtek driver being buggy.



Re: AP mode with Broadcom 4330

2017-08-15 Thread Russell King - ARM Linux
On Mon, Aug 14, 2017 at 11:25:03PM -0700, ros...@gmail.com wrote:
> If using rtlwifi, you could try using rtl8xxxu and see if you get
> similar results. ¯\_(ツ)_/¯

I'm using rtl8xxxu.  I don't think rtlwifi supports the 8192eu - it
certainly does not list the device id:
Bus 001 Device 002: ID 0bda:818b Realtek Semiconductor Corp.

As an extra data point, trying to associate to the 4330 with an Intel
client gives:

[8821752.691490] wlan0: authenticate with 6c:ad:f8:1d:4c:d9
[8821752.693448] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 1/3)
[8821752.696230] wlan0: authenticated
[8821752.697493] wlan0: associate with 6c:ad:f8:1d:4c:d9 (try 1/3)
[8821752.700816] wlan0: RX AssocResp from 6c:ad:f8:1d:4c:d9 (capab=0x411 status=0 aid=1)
[8821752.704407] wlan0: associated
[8821755.814844] wlan0: deauthenticated from 6c:ad:f8:1d:4c:d9 (Reason: 2=PREV_AUTH_NOT_VALID)

which gets slightly further but ultimately still fails.

> On Tue, 2017-08-15 at 00:30 +0100, Russell King - ARM Linux wrote:
> > On Fri, Jul 28, 2017 at 09:50:21PM +0200, Arend van Spriel wrote:
> > > On 28-07-17 19:49, Russell King - ARM Linux wrote:
> > > > Replacing that "(!mbss)" with "mbss" results in AP mode working
> > > > on the
> > > > 4330.  However, I suspect:
> > > > 
> > > > 	if (brcmf_feat_is_enabled(ifp, BRCMF_FEAT_MBSS))
> > > > 		brcmf_fil_iovar_int_set(ifp, "mbss", mbss);
> > > > 
> > > > actually makes much more sense.
> > > > 
> > > > Given that this is direct firmware interaction, I can't say which
> > > > is
> > > > correct - all I can say is that mainline kernels are currently
> > > > broken.
> > > 
> > > Indeed. I have to come up with a proper fix for both scenarios.
> > > Thanks
> > > for the report.
> > 
> > I'm now on 4.13-rc4, and running with my patch.  I've configured AP
> > mode:
> > 
> > Interface wlan0
> > ifindex 3
> > wdev 0x1
> > addr 6c:ad:f8:1d:4c:d9
> > ssid Time
> > type AP
> > wiphy 0
> > channel 5 (2432 MHz), width: 20 MHz, center1: 2432 MHz
> > 
> > using NetworkManager with a WPA2 key.  However, a Realtek RTL8192EU
> > client is trying to connect to it, but is unable:
> > 
> > [ 1750.497436] wlan0: authenticate with 6c:ad:f8:1d:4c:d9
> > [ 1750.530929] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 1/3)
> > [ 1750.738795] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 2/3)
> > [ 1750.946780] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 3/3)
> > [ 1751.154764] wlan0: authentication with 6c:ad:f8:1d:4c:d9 timed out
> > 
> > The antennas are definitely within range (finger to thumb distance)
> > and the Realtek can definitely see the BCM4330 in AP mode.  I don't
> > see the interrupts for the SDIO interface increment while the client
> > tries to connect.
> > 
> > The WPA2 password is definitely the same on both ends.
> > 
> > Any ideas how to debug this?
> > 
> > Thanks.
> > 



Re: AP mode with Broadcom 4330

2017-08-14 Thread Russell King - ARM Linux
On Fri, Jul 28, 2017 at 09:50:21PM +0200, Arend van Spriel wrote:
> On 28-07-17 19:49, Russell King - ARM Linux wrote:
> > Replacing that "(!mbss)" with "mbss" results in AP mode working on the
> > 4330.  However, I suspect:
> > 
> > 	if (brcmf_feat_is_enabled(ifp, BRCMF_FEAT_MBSS))
> > 		brcmf_fil_iovar_int_set(ifp, "mbss", mbss);
> > 
> > actually makes much more sense.
> > 
> > Given that this is direct firmware interaction, I can't say which is
> > correct - all I can say is that mainline kernels are currently broken.
> 
> Indeed. I have to come up with a proper fix for both scenarios. Thanks
> for the report.

I'm now on 4.13-rc4, and running with my patch.  I've configured AP mode:

Interface wlan0
ifindex 3
wdev 0x1
addr 6c:ad:f8:1d:4c:d9
ssid Time
type AP
wiphy 0
channel 5 (2432 MHz), width: 20 MHz, center1: 2432 MHz

using NetworkManager with a WPA2 key.  However, a Realtek RTL8192EU
client is trying to connect to it, but is unable:

[ 1750.497436] wlan0: authenticate with 6c:ad:f8:1d:4c:d9
[ 1750.530929] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 1/3)
[ 1750.738795] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 2/3)
[ 1750.946780] wlan0: send auth to 6c:ad:f8:1d:4c:d9 (try 3/3)
[ 1751.154764] wlan0: authentication with 6c:ad:f8:1d:4c:d9 timed out

The antennas are definitely within range (finger to thumb distance)
and the Realtek can definitely see the BCM4330 in AP mode.  I don't
see the interrupts for the SDIO interface increment while the client
tries to connect.

The WPA2 password is definitely the same on both ends.

Any ideas how to debug this?

Thanks.



Re: AP mode with Broadcom 4330

2017-07-31 Thread Russell King - ARM Linux
On Fri, Jul 28, 2017 at 09:50:21PM +0200, Arend van Spriel wrote:
> I was going to agree with you, but having second thoughts. There are
> actually two use-cases that need to be handled properly. The regular AP
> case and the MBSS case. In case of MBSS the initial AP interface will
> have mbss set to false and subsequent AP interfaces will have mbss set
> to true, but in firmware this has to be configured inverted. That is
> what the code above is doing. However, this indeed breaks the regular AP
> case for firmwares that abuse that setting for testing purposes (no idea
> why that is in a released firmware).

Maybe detect the BRCM_TEST_SSID string in the firmware file (as it's
broken up amongst other data, it's not trivial) and disable mbss for
such firmware?  Alternatively, maybe blacklist mbss for some firmware
versions?

Do the firmware versions that include this "abuse" actually have
functional mbss?

There's also the obvious question: which firmware is recommended for
the 4330?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: AP mode with Broadcom 4330

2017-07-28 Thread Russell King - ARM Linux
On Fri, Jul 28, 2017 at 03:15:03PM +0100, Russell King - ARM Linux wrote:
> Hi,
> 
> I've been struggling yesterday and today trying to configure AP mode
> with the Broadcom 4330 on a SolidRun Hummingboard2, using the 2013
> firmware:
> 
> Firmware version = wl0: Jan 23 2013 17:47:32 version 5.90.195.114 FWID 01-f9e7e464
> 
> People tell me that this works with SR's 3.14 kernel, but I'd prefer
> to use mainline (4.13-rc2).  Whenever I try to configure AP mode via
> Network Manager or hostapd (on Debian Jessie), the SSID I ask for and
> the MAC address do not appear on other wifi clients.  wlan0's
> MAC is 6c:ad:f8:1d:4c:d9.

I've just found the cause of this.  What it comes down to is this commit:

commit a44aa4001a86d46f936ca449e5d6c268446bfae2
Author: Hante Meuleman <meule...@broadcom.com>
Date:   Wed Dec 3 21:05:33 2014 +0100

    brcmfmac: add multiple BSS support.

    This patch adds support for multiple BSS interfaces (AP). In
    total three AP configurations can be created. In order to use
    multiple BSS firmware needs to support it.

    Reviewed-by: Arend Van Spriel <ar...@broadcom.com>
    Reviewed-by: Pieter-Paul Giesberts <piete...@broadcom.com>
    Signed-off-by: Hante Meuleman <meule...@broadcom.com>
    Signed-off-by: Arend van Spriel <ar...@broadcom.com>
    Signed-off-by: John W. Linville <linvi...@tuxdriver.com>

which adds this hunk to brcmf_cfg80211_start_ap():

 	if (dev_role == NL80211_IFTYPE_AP) {
+		if ((brcmf_feat_is_enabled(ifp, BRCMF_FEAT_MBSS)) && (!mbss))
+			brcmf_fil_iovar_int_set(ifp, "mbss", 1);
+

What this is saying is: "if the device supports MBSS, and MBSS was not
requested (from ifp->vif->mbss), then *ENABLE* MBSS."  That's clearly
nonsense.

Replacing that "(!mbss)" with "mbss" results in AP mode working on the
4330.  However, I suspect:

	if (brcmf_feat_is_enabled(ifp, BRCMF_FEAT_MBSS))
		brcmf_fil_iovar_int_set(ifp, "mbss", mbss);

actually makes much more sense.

Given that this is direct firmware interaction, I can't say which is
correct - all I can say is that mainline kernels are currently broken.
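
To make the three candidate behaviours easier to compare, here is a small
stand-alone sketch (not driver code).  It assumes that fw_supports_mbss
stands in for brcmf_feat_is_enabled(ifp, BRCMF_FEAT_MBSS) and that mbss
stands in for ifp->vif->mbss; the function names and the MBSS_UNTOUCHED
marker are made up for illustration.  Each function returns the value that
would be written to the firmware "mbss" iovar, or MBSS_UNTOUCHED if the
iovar is not written at all.

```c
#include <stdbool.h>

/* Hypothetical marker: the "mbss" iovar is not written at all. */
#define MBSS_UNTOUCHED (-1)

/* The hunk as merged: writes 1 only when MBSS was NOT requested. */
int mbss_write_as_merged(bool fw_supports_mbss, bool mbss)
{
	if (fw_supports_mbss && !mbss)
		return 1;
	return MBSS_UNTOUCHED;
}

/* "(!mbss)" replaced with "mbss": writes 1 only when MBSS was requested. */
int mbss_write_negation_dropped(bool fw_supports_mbss, bool mbss)
{
	if (fw_supports_mbss && mbss)
		return 1;
	return MBSS_UNTOUCHED;
}

/* Always pass the requested state through to the firmware. */
int mbss_write_requested_state(bool fw_supports_mbss, bool mbss)
{
	if (fw_supports_mbss)
		return mbss ? 1 : 0;
	return MBSS_UNTOUCHED;
}
```

For the plain single-AP case (mbss false on MBSS-capable firmware), the
merged code writes 1, the second variant leaves the iovar alone, and the
third explicitly writes 0.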

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


AP mode with Broadcom 4330

2017-07-28 Thread Russell King - ARM Linux
Hi,

I've been struggling yesterday and today trying to configure AP mode
with the Broadcom 4330 on a SolidRun Hummingboard2, using the 2013
firmware:

Firmware version = wl0: Jan 23 2013 17:47:32 version 5.90.195.114 FWID 01-f9e7e464

People tell me that this works with SR's 3.14 kernel, but I'd prefer
to use mainline (4.13-rc2).  Whenever I try to configure AP mode via
Network Manager or hostapd (on Debian Jessie), the SSID I ask for and
the MAC address do not appear on other wifi clients.  wlan0's
MAC is 6c:ad:f8:1d:4c:d9.

However, I have recently noticed that this pops up on clients when
AP mode is enabled:

BSS 00:10:18:f1:f2:f3(on wlan0)
TSF: 80810271 usec (0d, 00:01:20)
freq: 2412
beacon interval: 10 TUs
capability: ESS (0x0001)
signal: -15.00 dBm
last seen: 3203 ms ago
SSID: BRCM_TEST_SSID
Supported rates: 1.0* 2.0* 5.5* 11.0*
DS Parameter set: channel 1
IBSS ATIM window: 0 TUs
BSS 52:0d:10:41:e9:99(on wlan0)
TSF: 21849896478 usec (0d, 06:04:09)
freq: 2462
beacon interval: 100 TUs
capability: ESS Privacy ShortPreamble ShortSlotTime (0x0431)
signal: -80.00 dBm
last seen: 3020 ms ago
Information elements from Probe Response frame:
SSID: Virgin Media
Supported rates: 1.0* 2.0* 5.5* 11.0* 6.0 9.0 12.0 18.0
DS Parameter set: channel 11
Country: GB Environment: Indoor/Outdoor
Channels [1 - 13] @ 20 dBm
ERP: 
Extended supported rates: 24.0 36.0 48.0 54.0
HT capabilities:
Capabilities: 0x1ad
RX LDPC
HT20
SM Power Save disabled
RX HT20 SGI
TX STBC
RX STBC 1-stream
Max AMSDU length: 3839 bytes
No DSSS/CCK HT40
Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
Minimum RX AMPDU time spacing: 8 usec (0x06)
HT TX/RX MCS rate indexes supported: 0-15
HT operation:
 * primary channel: 11
 * secondary channel offset: no secondary
 * STA channel width: 20 MHz
 * RIFS: 1
 * HT protection: no
 * non-GF present: 1
 * OBSS non-GF present: 0
 * dual beacon: 0
 * dual CTS protection: 0
 * STBC beacon: 0
 * L-SIG TXOP Prot: 0
 * PCO active: 0
 * PCO phase: 0
Overlapping BSS scan params:
 * passive dwell: 20 TUs
 * active dwell: 10 TUs
 * channel width trigger scan interval: 300 s
 * scan passive total per channel: 200 TUs
 * scan active total per channel: 20 TUs
 * BSS width channel transition delay factor: 5
 * OBSS Scan Activity Threshold: 0.25 %
Extended capabilities: HT Information Exchange Supported, TFS, 
WNM-Sleep Mode, TIM Broadcast, BSS Transition, 6
WMM: * Parameter version 1
 * u-APSD
 * BE: CW 15-1023, AIFSN 3
 * BK: CW 15-1023, AIFSN 7
 * VI: CW 7-15, AIFSN 2, TXOP 3008 usec
 * VO: CW 3-7, AIFSN 2, TXOP 1504 usec
Vendor specific: OUI 00:03:7f, data: 01 01 00 00 ff 7f
RSN: * Version: 1
 * Group cipher: CCMP
 * Pairwise ciphers: CCMP
 * Authentication suites: IEEE 802.1X
 * Capabilities: 1-PTKSA-RC 1-GTKSA-RC (0x)

This is when using this hostapd configuration file:

interface=wlan0
driver=nl80211
ssid=Time
channel=1
hw_mode=g
wpa=2
wpa_passphrase=FooBarBazBat
wpa_pairwise=CCMP TKIP

Enabling tracing via 
/sys/kernel/debug/tracing/events/cfg80211/rdev_start_ap/enable
gives:

 hostapd-2213  [000]  15637.517729: rdev_start_ap: phy0, 
netdev:wlan0(3), AP settings - ssid: Time, band: 0, control freq: 2412, width: 
0, cf1: 2412, cf2: 0, beacon interval: 100, dtim period: 2, hidden ssid: 0, wpa 
versions: 2, privacy: true, auth type: 8, inactivity timeout: 0

So the right SSID is being requested.  Enabling debug (4096+6) in the
brcmfmac driver gives:

brcmfmac: brcmf_sdio_bus_txctl Enter
brcmfmac: brcmf_sdio_dpc Enter
brcmfmac: brcmf_sdio_isr Enter
brcmfmac: brcmf_sdio_dpc Enter
brcmfmac: brcmf_sdio_dpc Dongle reports CHIPACTIVE
brcmfmac: brcmf_sdio_tx_ctrlframe Enter
brcmfmac: brcmf_sdio_bus_rxctl Enter
brcmfmac: brcmf_sdio_isr Enter
brcmfmac: brcmf_sdio_dpc Enter
brcmfmac: brcmf_sdio_readframes Enter
brcmfmac: brcmf_sdio_read_control Enter
brcmfmac: brcmf_fil_iovar_data_get ifidx=0, name=chanspec, len=4
brcmutil: data
: 01 2b 00 00  .+..
brcmfmac: brcmf_cfg80211_get_tx_power Enter

[PATCH 4.10-rc3 00/13] net: dsa: remove unnecessary phy.h include

2017-01-31 Thread Russell King - ARM Linux
Including phy.h and phy_fixed.h into net/dsa.h causes phy*.h to be an
unnecessary dependency for quite a large amount of the kernel.  There's
very little which actually requires definitions from phy.h in net/dsa.h
- the include itself only wants the declaration of a couple of
structures and IFNAMSIZ.

Add linux/if.h for IFNAMSIZ, declarations for the structures, phy.h to
mv88e6xxx.h as it needs it for phy_interface_t, and remove both phy.h
and phy_fixed.h from net/dsa.h.

This patch reduces the number of files rebuilt from around 800 to around
40 - even with ccache, the time difference is noticeable.

In order to make this change, several drivers need to be updated to
include necessary headers that they were picking up through this
include.  This has resulted in a much larger patch series.

I'm assuming the 0-day builder has had 24 hours with this series, and
hasn't reported any further issues with it - the last issue was two
weeks ago (before I became ill) which I fixed over the last weekend.

I'm hoping this doesn't conflict with what's already in net-next...

 arch/mips/cavium-octeon/octeon-platform.c | 4 
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 1 +
 drivers/net/ethernet/broadcom/bgmac.c | 2 ++
 drivers/net/ethernet/cadence/macb.h   | 2 ++
 drivers/net/ethernet/cavium/liquidio/lio_main.c   | 1 +
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c| 1 +
 drivers/net/ethernet/cavium/liquidio/octeon_console.c | 1 +
 drivers/net/ethernet/freescale/fman/fman_memac.c  | 1 +
 drivers/net/ethernet/marvell/mvneta.c | 1 +
 drivers/net/ethernet/qualcomm/emac/emac-sgmii.c   | 1 +
 drivers/net/usb/lan78xx.c | 1 +
 drivers/net/wireless/ath/ath5k/ahb.c  | 2 +-
 drivers/target/iscsi/iscsi_target_login.c | 1 +
 include/net/dsa.h | 6 --
 net/core/netprio_cgroup.c | 1 +
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c| 1 +
 16 files changed, 20 insertions(+), 7 deletions(-)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2016-11-23 Thread Russell King - ARM Linux
On Wed, Nov 23, 2016 at 08:59:17PM +0000, Jason Cooper wrote:
> As requested on irc:

Thanks.

>  7f0: ea02b   800 
>  7f4: e7970102ldr r0, [r7, r2, lsl #2]
>  7f8: ebfebl  0 
>  7fc: e0844000add r4, r4, r0
>  800: e300a000movwsl, #0
>  804: e28b2001add r2, fp, #1
>  808: e340a000movtsl, #0
>  80c: e3a01004mov r1, #4
>  810: e1aamov r0, sl
>  814: ebfebl  0 <_find_next_bit_le>
>  818: e5953000ldr r3, [r5]
>  81c: e153cmp r0, r3
>  820: e1a0b000mov fp, r0
>  824: e2802008add r2, r0, #8
>  828: baf1blt 7f4 

Okay, so i was 0, so running UP probably isn't going to help.  r7 is
also spec_priv->rfs_chan_spec_scan.

So, I think the question is... how is this NULL - and has it always
been NULL...

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2016-11-23 Thread Russell King - ARM Linux
On Wed, Nov 23, 2016 at 07:15:39PM +0000, Jason Cooper wrote:
> --- oops from v4.8.6 #2 --
> [42059.303625] Unable to handle kernel NULL pointer dereference at virtual 
> address 0020
> [42059.311799] pgd = c0004000
> [42059.314522] [0020] *pgd=
> [42059.318162] Internal error: Oops: 17 [#1] SMP ARM
> [42059.322889] Modules linked in: ath9k ath9k_common ath9k_hw ath
> [42059.328809] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6 #37
> [42059.334755] Hardware name: Marvell Armada 370/XP (Device Tree)
> [42059.340613] task: c0b091c0 task.stack: c0b0
> [42059.345176] PC is at ath_cmn_process_fft+0xa0/0x578 [ath9k_common]
> [42059.351388] LR is at ath_cmn_process_fft+0xc4/0x578 [ath9k_common]
> [42059.357598] pc : []lr : []psr: 8153
> [42059.357598] sp : c0b01cd0  ip :   fp : 
> [42059.369127] r10: c0b034d4  r9 : 0069  r8 : 006c
> [42059.374374] r7 :   r6 : dcfbd340  r5 : c0b03da0  r4 : 
> [42059.380930] r3 : 0001  r2 : 0008  r1 : 0004  r0 : 

Well, the good news is that it's reproducible.

It looks like it could be this:

static int
ath_cmn_is_fft_buf_full(struct ath_spec_scan_priv *spec_priv)
{
	for_each_online_cpu(i)
		ret += relay_buf_full(rc->buf[i]);

where i = 8 (r2) and rc->buf is r7.  That's just a guess though, as
there's precious little to go on with the Code: line - modern GCCs
don't give us much with the Code: line anymore to figure out what's
going on without the exact object files.

e5933000ldr r3, [r3]
e1d330b4ldrhr3, [r3, #4]
e58d3030str r3, [sp, #48]   ; 0x30
ea02b   1c 
e7970102ldr r0, [r7, r2, lsl #2]

What makes me wonder though is that if i=8, that means you must have a
system with 9 online CPUs, which is probably unlikely - or maybe that's
the problem, for_each_online_cpu() is going wrong...
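
As a cross-check on the register reading above: the faulting instruction
"ldr r0, [r7, r2, lsl #2]" loads from r7 + (r2 << 2).  A minimal sketch of
that arithmetic follows, assuming r7 was zero (the oops reports a NULL
pointer dereference, and this copy of the register dump has lost its zero
digits) and r2 was 8 as shown.  The function name is made up for
illustration.

```c
#include <stdint.h>

/* Effective address of ARM "ldr rd, [rn, rm, lsl #shift]": rn + (rm << shift). */
uint32_t arm_scaled_index_address(uint32_t rn, uint32_t rm, unsigned int shift)
{
	return rn + (rm << shift);
}
```

With rn = 0, rm = 8 and shift = 2 this gives 0x20, which agrees with the
faulting virtual address in the oops.  The arithmetic on its own cannot say
whether the 8 is a loop index or a structure offset that the compiler
folded into the index register.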

If it's not that line of code, I don't see what else it would be based
on the output of my compiler - there's only one case in my disassembly
that corresponds with the single code line that we have to go on, and
it's this:

 a44:   e5983020ldr r3, [r8, #32]
 a48:   e793010aldr r0, [r3, sl, lsl #2] <===
 a4c:   ebfebl  0 
 a50:   e0844000add r4, r4, r0
 a54:   e59f9434ldr r9, [pc, #1076]
 a58:   e28a2001add r2, sl, #1
 a5c:   e3a01004mov r1, #4
 a60:   e1a9mov r0, r9
 a64:   ebfebl  0 <_find_next_bit_le>
 a68:   e5953000ldr r3, [r5]
 a6c:   e153cmp r0, r3
 a70:   e1a0a000mov sl, r0
 a74:   baf2blt a44 

I'm debating now about whether we need to dump more of the code in the
oops - both before and after the faulting instruction...

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH] hostap: avoid uninitialized variable use in hfa384x_get_rid

2016-01-27 Thread Russell King - ARM Linux
On Wed, Jan 27, 2016 at 02:45:26PM +0100, Arnd Bergmann wrote:
> To ensure we get consistent error handling here, this changes the code
> to only set rlen if we actually read data correctly, which also takes
> care of the warning.

It may be a good idea to do the job better.  Looking at the code:

	struct hfa384x_rid_hdr rec;

	spin_lock_bh(&local->baplock);

	res = hfa384x_setup_bap(dev, BAP0, rid, 0);
	if (!res)
		res = hfa384x_from_bap(dev, BAP0, &rec, sizeof(rec));

The only thing which initialises any of "rec" is that function call.
The following lines are:

	if (le16_to_cpu(rec.len) == 0) {
		/* RID not available */
		res = -ENODATA;
	}

	rlen = (le16_to_cpu(rec.len) - 1) * 2;

So why give the compiler a hard time as you're doing, and why make the
code harder to read?  What's wrong with:

	spin_lock_bh(&local->baplock);

	res = hfa384x_setup_bap(dev, BAP0, rid, 0);
	if (res)
		goto unlock;

	res = hfa384x_from_bap(dev, BAP0, &rec, sizeof(rec));
	if (res)
		goto unlock;

	if (le16_to_cpu(rec.len) == 0) {
		/* RID not available */
		res = -ENODATA;
		goto unlock;
	}

	rlen = (le16_to_cpu(rec.len) - 1) * 2;
	if (exact_len && rlen != len) {
		printk(KERN_DEBUG "%s: hfa384x_get_rid - RID len mismatch: rid=0x%04x, len=%d (expected %d)\n",
		       dev->name, rid, rlen, len);
		res = -ENODATA;
		goto unlock;
	}

	res = hfa384x_from_bap(dev, BAP0, buf, len);
unlock:
	spin_unlock_bh(&local->baplock);

?

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 000/182] Rid struct gpio_chip from container_of() usage

2015-12-09 Thread Russell King - ARM Linux
On Wed, Dec 09, 2015 at 02:08:35PM +0100, Linus Walleij wrote:
> Because we want to have a proper userspace ABI for GPIO chips,
> which involves using a character device that the user opens
> and closes. While the character device is open, the underlying
> kernel objects must not go away.

Okay, so you stop the gpio_chip struct from going away.  What
about the code which is called via gpio_chip - say, if userspace
keeps the chardev open, and someone rmmod's the driver providing
the GPIO driver?

I'm not sure that splitting up objects in this way really solves
anything at all.  Yes, it divorces the driver's private data from
the subsystem data, but is that really an advantage?

Network drivers have a similar issue, and the way this problem is
solved there is that alloc_netdev() is always used to allocate the
subsystem data structure and any driver private data structure as
one allocation, and the lifetime of both objects remains under the
control of the subsystem.

The allocated memory is only freed when the last user goes away,
and net has protection to prevent an unregistered driver from
being called (via locks on every path into the layer.)

Things get a little more complex with gpio, because there's the
issue that some methods are spinlocked while others can take
semaphores, but it should be possible to come up with a solution
to that - maybe an atomic_t which is incremented whenever we're
in some operation provided it's >= 0 (otherwise it fails), and
decremented when the operation completes.  We can then control
further GPIO accesses in the unregistration path, and also
prevent new accesses occurring by setting the atomic_t to -1.
This shouldn't require any additional locking in any path.  It
does mean that the unregistration path needs careful thought to
ensure that when we set it to -1, we wait for it to be dropped
by the appropriate amount.
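
A rough user-space sketch of such a counter, using C11 atomics in place of
the kernel's atomic_t.  All names here (op_gate and friends) are
hypothetical, and this variant simplifies the idea: it only closes the gate
once the count has dropped to zero, instead of setting -1 first and then
waiting for in-flight operations to drain.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate: >= 0 counts operations in flight, -1 means closed. */
struct op_gate {
	atomic_int users;
};

void op_gate_init(struct op_gate *g)
{
	atomic_init(&g->users, 0);
}

/* Enter an operation: increment only while the counter is >= 0. */
bool op_gate_enter(struct op_gate *g)
{
	int old = atomic_load(&g->users);

	while (old >= 0) {
		/* On failure 'old' is refreshed with the current value; retry. */
		if (atomic_compare_exchange_weak(&g->users, &old, old + 1))
			return true;
	}
	return false;	/* gate closed: the provider is going away */
}

/* Leave an operation that was started by a successful op_gate_enter(). */
void op_gate_exit(struct op_gate *g)
{
	atomic_fetch_sub(&g->users, 1);
}

/*
 * Unregistration: flip 0 -> -1.  Fails while operations are in flight;
 * a real unregister path would wait and retry rather than give up.
 */
bool op_gate_try_close(struct op_gate *g)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&g->users, &expected, -1);
}
```

Neither the enter nor the exit path takes a lock, so in principle the same
pattern could be used from both the spinlocked and the sleeping methods.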

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: using DMA-API on ARM

2014-12-09 Thread Russell King - ARM Linux
On Tue, Dec 09, 2014 at 11:19:40AM +0100, Arend van Spriel wrote:
> The issue did not trigger overnight so it seems setting bit 22 Shared
> Attribute _Override_ Enable solves the issue over here. Now the question is
> how to move forward with this. As I understood from Catalin this patch was
> not included as it was not considered responsibility of the linux kernel.

It is preferable for firmware to configure the L2 cache appropriately,
which includes things like the prefetch offsets as well as feature bits
like bit 22.

I think what I'll do is queue up a patch which adds a warning if bit 22
is not set, suggesting that firmware is updated to set this bit.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: using DMA-API on ARM

2014-12-08 Thread Russell King - ARM Linux
On Mon, Dec 08, 2014 at 04:50:43PM +0000, Catalin Marinas wrote:
> On Mon, Dec 08, 2014 at 04:38:57PM +0000, Arnd Bergmann wrote:
> > On Monday 08 December 2014 17:22:44 Arend van Spriel wrote:
> > > > > The log: first the ring allocation info is printed. Starting at
> > > > > 16.124847, ring 2, 3 and 4 are rings used for device to host. In this
> > > > > log the failure is on a read of ring 3. Ring 3 is 1024 entries of each
> > > > > 16 bytes. The next thing printed is the kernel page tables. Then some
> > > > > OpenWRT info and the logging of part of the connection setup. Then at
> > > > > 1780.130752 the logging of the failure starts. The sequence number is
> > > > > modulo 253 with ring size of 1024 matches an old entry (read 40,
> > > > > expected 52). Then the different pointers are printed followed by
> > > > > the kernel page table. The code does then a cache invalidate on the
> > > > > dma_handle and the next read the sequence number is correct.
> > > >
> > > > How do you invalidate the cache? A dma_handle is of type dma_addr_t
> > > > and we don't define an operation for that, nor does it make sense
> > > > on an allocation from dma_alloc_coherent(). What happens if you
> > > > take out the invalidate?
> > >
> > > dma_sync_single_for_cpu(, DMA_FROM_DEVICE) which ends up invalidating
> > > the cache (or that is our suspicion).
> >
> > I'm not sure about that:
> >
> > static void arm_dma_sync_single_for_cpu(struct device *dev,
> > 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
> > {
> > 	unsigned int offset = handle & (PAGE_SIZE - 1);
> > 	struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset));
> > 	__dma_page_dev_to_cpu(page, offset, size, dir);
> > }
> >
> > Assuming a noncoherent linear (no IOMMU, no swiotlb, no dmabounce) mapping,
> > dma_to_pfn will return the correct pfn here, but pfn_to_page will return a
> > page pointer into the kernel linear mapping,
>
> Or a highmem page, both should be handled by dma_cache_maint_page().

A valid point, but one which is irrelevant to this thread, because we're
talking about a platform with only 128MB, and a PAGE_OFFSET of 2GB (hence
no highmem):

Memory: 125936K/131072K available (2682K kernel code, 103K rwdata,
 744K rodata, 164K init, 188K bss, 5136K reserved)

Can we stay on-point to getting this problem solved, rather than drifting
off topic please?
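
On the actual problem: the "read 40, expected 52" figures in the quoted log
description are what one would expect if the CPU were still seeing the
previous lap's contents of the same ring slot.  A rough sketch of that
arithmetic, assuming the sequence number advances by one per ring entry and
wraps modulo 253 (the helper name and its parameters are illustrative, not
taken from the driver):

```c
/*
 * Sequence number found in a ring slot if the slot still holds the entry
 * written 'laps' trips around the ring ago.
 */
unsigned int stale_seq(unsigned int expected, unsigned int ring_size,
		       unsigned int seq_modulus, unsigned int laps)
{
	/* How far the sequence counter has advanced since that entry. */
	unsigned int behind = (laps * ring_size) % seq_modulus;

	return (expected + seq_modulus - behind) % seq_modulus;
}
```

With a 1024-entry ring, 1024 mod 253 is 12, so an entry one lap old carries
a sequence number 12 behind the expected one: 52 - 12 = 40.  That fits a
stale cache line better than it fits random corruption.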

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: using DMA-API on ARM

2014-12-05 Thread Russell King - ARM Linux
On Fri, Dec 05, 2014 at 10:22:22AM +0100, Arend van Spriel wrote:
> For our brcm80211 development we are working on getting brcmfmac driver
> up and running on a Broadcom ARM-based platform. The wireless device is
> a PCIe device, which is hooked up to the system behind a PCIe host
> bridge, and we transfer information between host and device using a
> descriptor ring buffer allocated using dma_alloc_coherent(). We mostly
> tested on x86 and seen no issue. However, on this ARM platform
> (single-core A9) we detect occasionally that the descriptor content is
> invalid. When this occurs we do a dma_sync_single_for_cpu() and this is
> retried a number of times if the problem persists. Actually, found out
> that someone made a mistake by using virt_to_dma(va) to get the
> dma_handle parameter. So probably we only provided a delay in the retry
> loop. After fixing that a single call to dma_sync_single_for_cpu() is
> sufficient. The DMA-API-HOWTO clearly states that:
>
>
> the hardware should guarantee that the device and the CPU can access the
> data in parallel and will see updates made by each other without any
> explicit software flushing.
>
>
> So it seems incorrect that we would need to do a dma_sync for this
> memory. That we do need it seems like this memory can end up in
> cache(?), or whatever happens, in some rare condition. Is there anyway
> to investigate this situation either through DMA-API or some low-level
> ARM specific functions.

It's been a long while since I looked at the code, and the code for
dma_alloc_coherent() has completely changed since then with the
addition of CMA.  I'm afraid that anything I would say about it would
not be accurate without research into the possible paths through that
code - it's no longer just a simple allocator.

What you say is correct however: the memory should not have any cache
lines associated with it, if it does, there's a bug somewhere.

Also, the memory will be weakly ordered, which means that writes to such
memory can be reordered.  If ordering matters, barriers should be used.
rmb() and wmb() can be used for this.
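
As a loose user-space analogue of that ordering requirement (not kernel
code, and not a substitute for the kernel's rmb()/wmb()), the sketch below
uses C11 fences where a driver would place the barriers.  The struct and
function names are made up for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor: payload is only valid once 'ready' is set. */
struct desc {
	uint32_t payload;
	atomic_uint ready;
};

/* Producer: write the payload, then publish it. */
void desc_publish(struct desc *d, uint32_t value)
{
	d->payload = value;
	/* Plays the role of wmb(): payload must be visible before 'ready'. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->ready, 1, memory_order_relaxed);
}

/* Consumer: returns 1 and fills *out if the descriptor was published. */
int desc_consume(struct desc *d, uint32_t *out)
{
	if (!atomic_load_explicit(&d->ready, memory_order_relaxed))
		return 0;
	/* Plays the role of rmb(): do not read payload ahead of 'ready'. */
	atomic_thread_fence(memory_order_acquire);
	*out = d->payload;
	return 1;
}
```

Without the two fences, nothing stops the flag from being observed before
the payload it is meant to guard.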

(Added Marek for comment on dma_alloc_coherent(), Will for comment on
barrier stuff.)

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: using DMA-API on ARM

2014-12-05 Thread Russell King - ARM Linux
On Fri, Dec 05, 2014 at 10:52:02AM +0100, Arnd Bergmann wrote:
> I'm still puzzled why you'd need a single dma_sync_single_for_cpu()
> after dma_alloc_coherent though, you should not need any. Is it possible
> that the driver accidentally uses __raw_readl() instead of readl()
> in some places and you are just lacking an appropriate barrier?

Digging into the driver, it looks like individual DMA buffers are
allocated (via brcmf_pcie_init_dmabuffer_for_device) and registered
into a commonring layer.

Whenever the buffer is written to, space is first allocated via a call
to brcmf_commonring_reserve_for_write() or
brcmf_commonring_reserve_for_write_multiple(), data written to the
buffer, followed by a call to brcmf_commonring_write_complete().

brcmf_commonring_write_complete() calls two methods at that point:
cr_write_wptr() and cr_ring_bell(), which will be
brcmf_pcie_ring_mb_write_wptr() and brcmf_pcie_ring_mb_ring_bell().

The first calls brcmf_pcie_write_tcm16(), which uses iowrite16(),
which contains the appropriate barrier.  The bell ringing functions
also use ioread*/iowrite*().

So, on the write side, it looks fine from the barrier perspective.

On the read side, brcmf_commonring_get_read_ptr() is used before
a read access to the ring - which calls the cr_update_wptr() method,
which in turn uses an ioread16() call.  After the CPU has read data
from the ring, brcmf_commonring_read_complete() is used, which uses
iowrite16().

So, I don't see a barrier problem on the read side.

However, I did trip over this:

static void *
brcmf_pcie_init_dmabuffer_for_device(struct brcmf_pciedev_info *devinfo,
				     u32 size, u32 tcm_dma_phys_addr,
				     dma_addr_t *dma_handle)
{
	void *ring;
	long long address;

	ring = dma_alloc_coherent(&devinfo->pdev->dev, size, dma_handle,
				  GFP_KERNEL);
	if (!ring)
		return NULL;

	address = (long long)(long)*dma_handle;

Casting to (long) will truncate the DMA handle to 32-bits on a 32-bit
architecture, even if it supports 64-bit DMA addresses.  There's a couple
of other places where this same truncation occurs:

	address = (long long)(long)devinfo->shared.scratch_dmahandle;

and

	address = (long long)(long)devinfo->shared.ringupd_dmahandle;

In any case, wouldn't using a u64 type for address be better - isn't
long long 128-bit on 64-bit architectures?
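
To illustrate the truncation, here is a small stand-alone sketch.  It
assumes int32_t as a stand-in for a 32-bit long, so the effect shows up on
any host; the function names are made up for illustration.

```c
#include <stdint.h>

/*
 * What "(long long)(long)handle" does where long is 32 bits wide.
 * Converting an out-of-range value to int32_t is implementation-defined;
 * the usual compilers keep the low 32 bits.
 */
int64_t cast_via_32bit_long(uint64_t dma_handle)
{
	return (int64_t)(int32_t)dma_handle;
}

/* Widening without the intermediate cast keeps all 64 bits. */
uint64_t cast_direct(uint64_t dma_handle)
{
	return dma_handle;
}
```

A handle of 0x123456780 comes out of the double cast as 0x23456780, and a
handle with bit 31 set is additionally sign-extended into the upper half,
so the value handed on is wrong in both cases.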

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: using DMA-API on ARM

2014-12-05 Thread Russell King - ARM Linux
I've been doing more digging into the current DMA code, and I'm dismayed
to see that there are new bugs in it...

commit 513510ddba9650fc7da456eefeb0ead7632324f6
Author: Laura Abbott <lau...@codeaurora.org>
Date:   Thu Oct 9 15:26:40 2014 -0700

    common: dma-mapping: introduce common remapping functions

This uses map_vm_area() to achieve the remapping of pages allocated inside
dma_alloc_coherent().  dma_alloc_coherent() is documented in a rather
round-about way in Documentation/DMA-API.txt:

| Part Ia - Using large DMA-coherent buffers
| ------------------------------------------
| 
| void *
| dma_alloc_coherent(struct device *dev, size_t size,
|  dma_addr_t *dma_handle, gfp_t flag)
| 
| void
| dma_free_coherent(struct device *dev, size_t size, void *cpu_addr,
|dma_addr_t dma_handle)
| 
| Free a region of consistent memory you previously allocated.  dev,
| size and dma_handle must all be the same as those passed into
| dma_alloc_coherent().  cpu_addr must be the virtual address returned by
| the dma_alloc_coherent().
| 
| Note that unlike their sibling allocation calls, these routines
| may only be called with IRQs enabled.

Note that very last paragraph.  What this says is that it is explicitly
permitted to call dma_alloc_coherent() with IRQs disabled.

Now, the question is: is it safe to call map_vm_area() with IRQs disabled?
Well, map_vm_area() calls pud_alloc(), pmd_alloc(), and pte_alloc_kernel().
These functions all call into the kernel memory allocator *without*
GFP_ATOMIC - in other words, these allocations are permitted to sleep.
Except, IRQs are off, so it's a bug to call these functions from
dma_alloc_coherent().

Now, if we look at the previous code, it used ioremap_page_range().  This
has the same problem: it needs to allocate page tables, and it can only
do it via functions which may sleep.

If we go back even further, we find that the use of ioremap_page_range()
in dma_alloc_coherent() was introduced by:

commit e9da6e9905e639b0f842a244bc770b48ad0523e9
Author: Marek Szyprowski <m.szyprow...@samsung.com>
Date:   Mon Jul 30 09:11:33 2012 +0200

    ARM: dma-mapping: remove custom consistent dma region

which is the commit which removed my pre-allocated page tables for the
DMA re-mapping region - code which I had put there specifically to
avoid this issue.

Obviously, this isn't a big problem, because people haven't reported
that they've hit any of the might_sleep() checks in the memory
allocators, which I think is our only saving grace - but it's still
wrong with respect to the specified calling conditions of the DMA API.

If the problem which you (Broadcom) are suffering from is down to the
issue I suspect (that being having mappings with different cache
attributes) then I'm not sure that there's anything we can realistically
do about that.  There's a number of issues which make it hard to see a
way forward.

One example is that if we allocate memory, we need to be able to change
(or remove) the cacheable mappings associated with that memory.  We'd
need to touch the L1 page table, either to change the attributes of the
section mapping, or to convert the section mapping to a L2 page table
pointer.  We need to change the attributes in a break-flush-make sequence
to avoid TLB conflicts.

However, those mappings may be shared between other CPUs in a SMP system.
So, we would need to flush the TLBs on other CPUs before we could proceed
to create replacement mappings.  That means something like stop_machine()
or sending an IPI to the other CPUs and waiting for its completion.  That
is totally impractical due to dma_alloc_coherent() being allowed to be
called with IRQs off.

I'll continue to think about it, but I don't see many possibilities to
satisfy dma_alloc_coherent()'s documented requirements other than by
pre-allocating a chunk of memory at boot time to be served out as
DMA-able memory for these horrid cases.

I don't see much point in keeping the map_vm_area() approach on ARM
even if we did fall back - if we're re-establishing mappings for
the surrounding pages in lowmem, we might as well insert appropriately
attributed mappings for the DMA memory as well.

On the face of it, it would be better to allocate one section at a
time, but my unfortunate experience is that 3.x kernels are a /lot/
more trigger happy with the OOM killer, and being able to allocate
1MB of memory at a go after the system has been running for a while
is near-on impossible.  So I don't think that's realistic.

Even if we did break up section mappings in this way, it would also
mean that over time, we'd end up with much of lowmem mapped using 4K
page table entries, which would place significant pressure on the MMU
TLBs.

So, we might just be far better off pre-allocating enough DMA
coherent memory at boot time and be done with it.  Those who want
it dynamic can use CMA instead.


Re: using DMA-API on ARM

2014-12-05 Thread Russell King - ARM Linux
On Fri, Dec 05, 2014 at 05:38:39PM +0000, Catalin Marinas wrote:
> On Fri, Dec 05, 2014 at 11:11:14AM +0000, Russell King - ARM Linux wrote:
> > In any case, wouldn't using a u64 type for address be better - isn't
> > long long 128-bit on 64-bit architectures?
>
> No, it's still 64-bit. There is no 128-bit integer in the C standard.

Actually, that's a fallacy.

The C99 standard (like previous versions) does not define exactly the
number of bits in each type.  It defines ranks of type, and says that
lower ranks are a subrange of integers with higher ranks (for the same
signed-ness.)  See section 6.2.5.

So, it merely states that:

range(char) <= range(short) <= range(int) <= range(long) <= range(long long)

So, an implementation could have:

char: 8  short: 16 int: 16 long: 32 long long: 64
char: 8  short: 16 int: 32 long: 32 long long: 64
char: 8  short: 16 int: 32 long: 64 long long: 64
char: 8  short: 16 int: 64 long: 64 long long: 64

or even:

char: 8  short: 16 int: 32 long: 64 long long: 128

and that would still be compliant with C99, since it continues to meet
the criteria about the required data types specified in the standard.
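
The range ordering can be checked at compile time, and the widths a given
implementation actually picked can be read back at run time.  A minimal
sketch, assuming a C11 compiler for _Static_assert (the helper names are
made up for illustration):

```c
#include <limits.h>

/* Range ordering for the signed types, checked at compile time. */
_Static_assert(SCHAR_MAX <= SHRT_MAX, "range(char) <= range(short)");
_Static_assert(SHRT_MAX <= INT_MAX, "range(short) <= range(int)");
_Static_assert(INT_MAX <= LONG_MAX, "range(int) <= range(long)");
_Static_assert(LONG_MAX <= LLONG_MAX, "range(long) <= range(long long)");

/* Storage size in bits of long on this implementation. */
unsigned int long_bits(void)
{
	return (unsigned int)(sizeof(long) * CHAR_BIT);
}

/* Storage size in bits of long long on this implementation. */
unsigned int long_long_bits(void)
{
	return (unsigned int)(sizeof(long long) * CHAR_BIT);
}
```

The standard does set minimum ranges (at least 32 bits for long and 64 for
long long), but no upper limit, which is why a fixed-width type such as u64
says what is meant more directly than long long does.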

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.