Re: rtl8723bu: low signal, fails to associate

2018-08-29 Thread James Cameron
On Wed, Aug 29, 2018 at 04:19:01PM -0400, Jes Sorensen wrote:
> On 08/23/2018 09:36 PM, James Cameron wrote:
> > G'day Carlo, Mylene, and James,
> > 
> > Thanks for your earlier reports about RTL8723bu.  Have you any more
> > recent experiences you might share?
> > 
> > I'm evaluating a sample laptop which worked fine with Windows 10, but
> > not very well with Ubuntu 18.04, and kernel v4.15 or kernel v4.18.4.
> > 
> > The laptop is by Hena, model NT16-PRO-C-E, with a wireless device on
> > internal USB (0x0bda:0xb720) which loads rtl8xxxu, identifying as
> > RTL8723BU.
> > 
> > http://dev.laptop.org/~quozl/z/1ft0qv.txt (dmesg)
> > 
> > Symptoms are low RSSI on scan, very short range, and often a failure
> > to associate over a distance of two metres in a radio quiet location.
> > 
> > Symptoms began after first power off, which suggests that device
> > registers programmed by the previous operating system Windows 10 had
> > not been reset by reboot into the Ubuntu 18.04 installer.  The device
> > worked fine in Ubuntu 18.04 before the first power off.
> > 
> > Jes, let me know if there is anything I can do to help.
> 
> It's been a while since I had time to look at the 8723bu support, and
> rtl8xxxu doesn't have BT coexist support. I notice that your laptop does
> load the bluetooth module for 8723bu which I believe fiddles with the
> antenna configuration and is likely to take control of the antennas. I
> suspect this is why you see low signal quality on the WiFi side.
> 
> If you blacklist the BT module, does it work better?

Thanks.  No, it doesn't work any better, or worse.

Method: add "blacklist btusb" to /etc/modprobe.d/blacklist.conf,
regenerate initramfs, and boot.

Also, the symptoms differ between cold and warm boot:

*  on cold boot after 15 seconds of power off, device connects but has
   short range, with received power at monitor of -76dBm,

   http://dev.laptop.org/~quozl/z/1fv7sW.txt (dmesg, cold boot, no btusb)

*  on warm boot with less than 5 seconds of power off, device does not
   connect, dmesg "authentication with xx:yy:zz:aa:bb:cc timed out",
   and probe request, authentication, and association packets are not
   seen by monitor,

   http://dev.laptop.org/~quozl/z/1fv8IO.txt (dmesg, warm boot, no btusb)

In both cases, scan results are normal, with similar signal levels.  The
scan was active for cold boot, and passive for warm boot.

Monitor device is an ath9k about 30cm away; a radio quiet environment
on a farm.
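
To compare many cold and warm boots without reading each log by hand,
the timeout message could be searched for automatically.  A minimal
sketch, assuming the dmesg text has been saved to a string; the
function names and the sample lines are made up for illustration and
are not taken from the linked dmesg files.

```python
import re

# Matches the kernel message quoted above; the MAC address is captured.
TIMEOUT = re.compile(r"authentication with ([0-9A-Fa-f:]{17}) timed out")

def auth_timeouts(dmesg_text):
    """Return the AP addresses for which authentication timed out."""
    return TIMEOUT.findall(dmesg_text)

def classify(dmesg_text):
    """Label a boot log by whether any authentication timeout appears."""
    return "auth timed out" if auth_timeouts(dmesg_text) else "no timeout seen"

# Made-up sample lines, only to exercise the functions.
sample = (
    "[   21.1] wlan0: authenticate with 00:11:22:33:44:55\n"
    "[   21.4] wlan0: authentication with 00:11:22:33:44:55 timed out\n"
)
print(classify(sample))
```

This only detects the warm boot symptom; the short range seen on cold
boot still needs a monitor to measure received power.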

Speculation; device registers are not being reset.  Would not have
been a problem for RTL8723BU on removable USB.

-- 
James Cameron
http://quozl.netrek.org/


rtl8723bu: low signal, fails to associate

2018-08-23 Thread James Cameron
G'day Carlo, Mylene, and James,

Thanks for your earlier reports about RTL8723bu.  Have you any more
recent experiences you might share?

I'm evaluating a sample laptop which worked fine with Windows 10, but
not very well with Ubuntu 18.04, and kernel v4.15 or kernel v4.18.4.

The laptop is by Hena, model NT16-PRO-C-E, with a wireless device on
internal USB (0x0bda:0xb720) which loads rtl8xxxu, identifying as
RTL8723BU.

http://dev.laptop.org/~quozl/z/1ft0qv.txt (dmesg)

Symptoms are low RSSI on scan, very short range, and often a failure
to associate over a distance of two metres in a radio quiet location.

Symptoms began after first power off, which suggests that device
registers programmed by the previous operating system Windows 10 had
not been reset by reboot into the Ubuntu 18.04 installer.  The device
worked fine in Ubuntu 18.04 before the first power off.

Jes, let me know if there is anything I can do to help.

-- 
James Cameron
http://quozl.netrek.org/


Re: iwlwifi intermittent beacon capture in monitor mode?

2018-03-22 Thread James Cameron
G'day Tyler,

I've seen that kind of behaviour when there are multiple APs with the
same beacon timing, and one or more APs are not backing off.  In my
case the beacons were colliding.  The times without beacons followed a
regular pattern; based on the variance in CPU oscillator clocks of the
APs.  Cooling or heating an AP changed the pattern.  Behaviour also
varied across cards; RF sensitivity of a batch of cards follows a
statistical normal distribution, with a bit of warping caused by
manufacturing test rejects.

Have you access to a spectrum analyser?  You might check what
transmissions are happening at the same time, on or near 2.457 GHz.

Can you exclude all other APs, e.g. by placing the devices inside a
disconnected microwave oven?

Can you monitor the current of the card with a digital storage
oscilloscope?

Can you watch the beacons with an RF probe and an oscilloscope?

The simplest probe is a diode (axial, bandoleer) with leads cut for a
multiple of the 2.457 GHz wavelength, held in oscilloscope probes
within an inch or so of the card antenna.
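
For cutting the leads, the lengths can be worked out from the
free-space wavelength.  A rough sketch, taking the frequency as
2.457 GHz (802.11 channel 10) and ignoring any shortening from the
diode body or nearby conductors; the function name is only for
illustration.

```python
# Free-space wavelength for 802.11 channel 10 (2.457 GHz).
C = 299_792_458.0  # speed of light in m/s

def wavelength_mm(freq_hz):
    """Return the free-space wavelength in millimetres."""
    return C / freq_hz * 1000.0

full = wavelength_mm(2.457e9)
print(f"full wave    {full:6.1f} mm")
print(f"half wave    {full / 2:6.1f} mm")
print(f"quarter wave {full / 4:6.1f} mm")
```

A full wave comes out near 122 mm, so a half wave is about 61 mm.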

With both these last two tests, you may see dips corresponding to
beacon transmissions.  If they stop, you know you have a firmware or
software problem.

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH] rtlwifi: rtl8723be: Fix loss of signal

2018-02-22 Thread James Cameron
On Thu, Feb 22, 2018 at 02:28:59PM -0600, Larry Finger wrote:
> In commit c713fb071edc ("rtlwifi: rtl8821ae: Fix connection lost problem
> correctly") a problem in rtl8821ae that caused loss of signal was fixed.
> That same problem has now been reported for rtl8723be. Accordingly,
> the ASPM L1 latency has been increased from 0 to 7 to fix the instability.
> 
> Signed-off-by: Larry Finger <larry.fin...@lwfinger.net>
> Cc: Stable <sta...@vger.kernel.org>

Tested-by: James Cameron <qu...@laptop.org>

Tested with both patches applied to v4.15 on OLPC NL3 with rtl8723be.

Nice catch, well done!  May explain some of our problems with
rtl8723be that made me withdraw it from production of laptops.

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH v2] ath9k: mark RSSI as invalid if frame received during channel setup

2018-02-15 Thread James Cameron
On Thu, Feb 15, 2018 at 08:52:53AM +, Jean Pierre TOSONI wrote:
> > -Original Message-
> > From: qu...@laptop.org [mailto:qu...@laptop.org]
> > Sent: Thursday, 15 February 2018 08:21
> > To: Kalle Valo
> > Cc: Jean Pierre TOSONI; linux-wireless@vger.kernel.org; ath9k-
> > de...@qca.qualcomm.com
> > Subject: Re: [PATCH v2] ath9k: mark RSSI as invalid if frame received
> > during channel setup
> > 
> > On Thu, Feb 15, 2018 at 07:51:28AM +0200, Kalle Valo wrote:
> > > James Cameron <qu...@laptop.org> writes:
> > >
> > >> On Wed, Feb 14, 2018 at 04:26:42PM +, Jean Pierre TOSONI
> > wrote:
> > >>> ath9k returns a wrong RSSI value for frames received
> > >>> in a 30ms time window after a channel change. The
> > >>> correct value is typically 10dB below the returned value.
> > >>
> > >> How was your correct value determined?
> > >>
> 
> 1) test setup:
> Connecting the AP through coax and attenuators, then making 500 passive scans
> off-channel, then drawing a histogram of the beacon signals found by the
> chip. The off-channel period is 108 ms. The probability of being in the 30 ms
> window is 28%. The histogram shows 2 spikes, one large with the expected
> value, one small at around +10dB above.
> 
> 2) value determination
> Adjust the delay (CONFIG_HZ=250) by trial and error. 25ms was not enough to 
> completely absorb the +10dB spike in the histogram, while 30ms was enough.
> 
> Do you think of a better approach?

No, I think your approach is fine.  I was curious.  Thanks for explaining.
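
For the record, the 28% figure follows from the two durations, if a
beacon is assumed equally likely to arrive at any moment of the
off-channel period.  A small sketch of that arithmetic; the function
name is only for illustration.

```python
def window_probability(window_ms, period_ms):
    """Chance that a uniformly timed beacon lands inside the window."""
    return window_ms / period_ms

p = window_probability(30, 108)
print(f"probability of landing in the window: {p:.1%}")

# Expected number of the 500 scans affected, under the same assumption.
expected = round(500 * p)
print(f"expected affected scans out of 500: {expected}")
```

That is roughly 139 of the 500 scans, which fits one large spike and
one small spike in the histogram.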

> Maybe the guys at Qualcomm know the correct value?

Yes, that seems likely.

> > >>> This was found with a Atheros AR9300 Rev:3 chip (WLE350NX /
> > >>> JWX6083 cards), during offchannel scans.
> > >>>
> > >>> Mark the signal value as invalid in this case.
> > >>
> > >> Why not adjust by 10dB?
> 
> I considered that also. But, 
> 1) during how much time should I do this adjustment? Around 30 ms after 
> channel switch?

Yes.  If RSSI is so critical for your application, you'll do what you
can to get a real RSSI rather than drop it.
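
The two choices, adjust or mark invalid, can be put side by side.
This is only a sketch of the idea, not ath9k code; the names are
hypothetical, and the 30 ms window and 10 dB offset are the figures
from the patch description.

```python
WINDOW_MS = 30   # settling window after a channel change
OFFSET_DB = 10   # typical over-reading inside the window

def corrected_rssi(rssi_dbm, ms_since_switch, adjust=True):
    """Return a usable RSSI in dBm, or None if it is marked invalid."""
    if ms_since_switch >= WINDOW_MS:
        return rssi_dbm              # outside the window, trust the reading
    if adjust:
        return rssi_dbm - OFFSET_DB  # keep the sample, subtract the offset
    return None                      # drop the sample, as the patch does

print(corrected_rssi(-50, 10))                # adjusted
print(corrected_rssi(-50, 10, adjust=False))  # marked invalid
print(corrected_rssi(-50, 40))                # unchanged
```

Adjusting keeps a sample for every frame, at the cost of the
uncertainty in the offset.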

> 2) The histogram shows a scattering of the measures in a +/- 3dB range around 
> the mean value.

Perhaps a sampling error by the device.

> So I could not decide for sure if it needed -9dB, -10dB or -11dB?
> 
> > >>
> > >> Speculating: in a typical card, RSSI is calculated by firmware
> > from
> > >> readings of ADCs attached to the receiver.  Firmware may average
> > >> several readings.  Firmware may apply other offsets or
> > calibrations,
> > >> based on frequency and temperature.  This sounds like a firmware
> > >> problem.
> > >
> > > ath9k does not have firmware, only ath9k_htc has it.
> > 
> > Heh.  s/firmware/silicon implementation/g
> 
> Oh well, if it's silicon problem, then it's a hardware problem, and
> I am right to correct it that way, since there is no other way :-)

Yes, if it can be reproduced by every ath9k.

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH v2] ath9k: mark RSSI as invalid if frame received during channel setup

2018-02-14 Thread James Cameron
On Thu, Feb 15, 2018 at 07:51:28AM +0200, Kalle Valo wrote:
> James Cameron <qu...@laptop.org> writes:
> 
> > On Wed, Feb 14, 2018 at 04:26:42PM +, Jean Pierre TOSONI wrote:
> >> ath9k returns a wrong RSSI value for frames received in a 30ms time
> >> window after a channel change. The correct value is typically 10dB
> >> below the returned value.
> >
> > How was your correct value determined?
> >
> >> This was found with a Atheros AR9300 Rev:3 chip (WLE350NX / JWX6083
> >> cards), during offchannel scans.
> >> 
> >> Mark the signal value as invalid in this case.
> >
> > Why not adjust by 10dB?
> >
> > Speculating: in a typical card, RSSI is calculated by firmware from
> > readings of ADCs attached to the receiver.  Firmware may average
> > several readings.  Firmware may apply other offsets or calibrations,
> > based on frequency and temperature.  This sounds like a firmware
> > problem.
> 
> ath9k does not have firmware, only ath9k_htc has it.

Heh.  s/firmware/silicon implementation/g

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH v2] ath9k: mark RSSI as invalid if frame received during channel setup

2018-02-14 Thread James Cameron
On Wed, Feb 14, 2018 at 04:26:42PM +, Jean Pierre TOSONI wrote:
> ath9k returns a wrong RSSI value for frames received in a 30ms time
> window after a channel change. The correct value is typically 10dB
> below the returned value.

How was your correct value determined?

> This was found with a Atheros AR9300 Rev:3 chip (WLE350NX / JWX6083
> cards), during offchannel scans.
> 
> Mark the signal value as invalid in this case.

Why not adjust by 10dB?

Speculating: in a typical card, RSSI is calculated by firmware from
readings of ADCs attached to the receiver.  Firmware may average
several readings.  Firmware may apply other offsets or calibrations,
based on frequency and temperature.  This sounds like a firmware
problem.
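
To illustrate the averaging part of that speculation: if several
readings are averaged, one reading taken before the receiver has
settled can pull the result up.  A sketch that assumes averaging is
done on linear power rather than on dBm values; how any real device
does it is not known to me.

```python
import math

def average_dbm(readings_dbm):
    """Average power readings in milliwatts, then convert back to dBm."""
    milliwatts = [10 ** (r / 10) for r in readings_dbm]
    mean_mw = sum(milliwatts) / len(milliwatts)
    return 10 * math.log10(mean_mw)

# Made-up readings: one sample reads 10 dB high.
print(f"{average_dbm([-60, -60]):.1f} dBm")
print(f"{average_dbm([-50, -60]):.1f} dBm")
```

A single high reading dominates a linear average, so the result lands
much nearer the high value than the low one.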

> Signed-off-by: Jean Pierre TOSONI <jp.tos...@acksys.fr>
> [...]

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH] rtlwifi: rtl8821ae: Fix connection lost problem correctly

2018-02-05 Thread James Cameron
On Mon, Feb 05, 2018 at 12:38:11PM -0600, Larry Finger wrote:
> There has been a coding error in rtl8821ae since it was first introduced,
> namely that an 8-bit register was read using a 16-bit read in
> _rtl8821ae_dbi_read(). This error was fixed with commit 40b368af4b75
> ("rtlwifi: Fix alignment issues"); however, this change led to
> instability in the connection. To restore stability, this change
> was reverted in commit b8b8b16352cd ("rtlwifi: rtl8821ae: Fix connection
> lost problem").
> 
> Unfortunately, the unaligned access causes machine checks in ARM
> architecture, and we were finally forced to find the actual cause of the
> problem on x86 platforms. Following a suggestion from Pkshih
> <pks...@realtek.com>, it was found that increasing the ASPM L1
> latency from 0 to 7 fixed the instability. This parameter was varied to
> see if a smaller value would work; however, it appears that 7 is the
> safest value. A new symbol is defined for this quantity, thus it can be
> easily changed if necessary.
> 
> Fixes: b8b8b16352cd ("rtlwifi: rtl8821ae: Fix connection lost problem")
> Cc: Stable <sta...@vger.kernel.org> # 4.14+
> Fix-suggested-by: Pkshih <pks...@realtek.com>
> Signed-off-by: Larry Finger <larry.fin...@lwfinger.net>

Tested-by: James Cameron <qu...@laptop.org>  # x86_64 OLPC NL3

Thanks Larry & Pkshih, this does work as well as it did before.

-- 
James Cameron
http://quozl.netrek.org/


Re: rtl8821ae keep alive not set, connection lost

2018-01-31 Thread James Cameron
On Wed, Jan 31, 2018 at 11:06:12AM -0600, Larry Finger wrote:
> On 09/12/2017 05:09 PM, James Cameron wrote:
> >Summary: 40b368af4b75 ("rtlwifi: Fix alignment issues") breaks
> >rtl8821ae keep alive, causing "Connection to AP lost" and deauth,
> >but why?
> >
> >Wireless connection is lost after a few seconds or minutes, on
> >every OLPC NL3 laptop with rtl8821ae, with any stable kernel after
> >4.10.1, and any kernel with 40b368af4b75.
> >
> >dmesg contains
> >
> >   wlp2s0: Connection to AP 2c:b0:5d:a6:86:eb lost
> >
> >iw event shows
> >
> >   wlp2s0: del station 2c:b0:5d:a6:86:eb
> >   wlp2s0 (phy #0): deauth 74:c6:3b:09:b5:0d -> 2c:b0:5d:a6:86:eb reason 4: 
> > Disassociated due to inactivity
> >   wlp2s0 (phy #0): disconnected (local request)
> >
> >Workaround is to bounce the link, then reconnect;
> >
> >   ip link set wlp2s0 down
> >   ip link set wlp2s0 up
> >   iw dev wlp2s0 connect qz
> >
> >A nearby monitor host captures a deauthentication packet sent by
> >the device.
> >
> >Bisection showed cause is 40b368af4b75 ("rtlwifi: Fix alignment
> >issues") which changes the width of DBI register read.
> >
> >On the face of it, 40b368af4b75 looks correct, especially compared
> >against same function in rtl8723be.
> >
> >I've no idea why reverting fixes the problem.  I'm hoping someone
> >here might speculate and suggest ways to test.
> >
> >As keep alive is set through this path, my guess is that keep alive
> >is not being set in the device.  Or perhaps reading 16-bits
> >perturbs another register.  Is there a way to test?
> >
> >http://dev.laptop.org/~quozl/z/1drtGD.txt dmesg of 4.13
> >
> >http://dev.laptop.org/~quozl/z/1drt7c.txt dmesg with 4.13 and
> >revert of 40b368af4b75
> 
> James,
> 
> I'm afraid we are needing to revisit this problem again. Changing
> that 8-bit read to a 16-bit version causes an unaligned memory
> reference in AARCH64, thus we will need to re-revert. To prevent
> problems on systems such as yours, PK plans to turn off ASPM
> capability and backdoor in certain platforms that will be listed in
> a quirks table. Please report the output of 'dmidecode -t system'
> for you affected system(s).

Thanks for letting me know.

We made three production runs, and I'm waiting to get a hold of the
dmidecode for two of them.  This may take some weeks; we have to find
stock and ship it, or we have to ask our contract manufacturer (CM) if
they have kept data or units.

I've dmidecode for one production run.

http://dev.laptop.org/~quozl/z/1eh7JF.txt (my unit nl3-e)

I've dmidecode for prototypes, but they have clearly been programmed
badly.  We did not ask our CM for Windows compatibility, so they may
have had no step to verify the data.  We also went through several
iterations to get serial numbers assigned, so the data I have does not
have good provenance.

http://dev.laptop.org/~quozl/z/1eh7EE.txt (my unit nl3-c)
http://dev.laptop.org/~quozl/z/1eh7EV.txt (my unit nl3-d)
http://dev.laptop.org/~quozl/z/1eh7He.txt (my unit nl3-a)
http://dev.laptop.org/~quozl/z/1eh8DR.txt (my unit nl3-b)

> We hope you will be able to test any proposed patches.

Yes, can do.

I've just tested v4.15.

However, I'm concerned about your plan to use quirks;

1.  turning off ASPM may decrease run time on battery, which if it is
significant, across several thousand laptops will yield generator fuel
or solar budget failure; can the power impact be quantified?

2.  why not keep ASPM enabled, and use 8-bit when quirked, or on
x86_64, or when not AARCH64?

3.  why not find the underlying problem; PK is in the same company as
the device firmware engineers, so it should be possible for them to
find out why 16-bit access causes the device firmware to hang?  We
drew a blank trying to reach firmware engineers through our CM and
module maker; perhaps we were not large or noisy enough.

4.  it's not just me; there are others who have reported similar
problems, so won't re-reverting affect them?  They haven't engaged in
the process as thoroughly, and may not be in the quirks table.  You
also reproduced the problem with different hardware.

> Thanks,
> 
> Larry

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH v3 1/3] mwifiex: refactor device dump code to make it generic for usb interface

2017-12-04 Thread James Cameron
 - int drv_info_size);
> +void mwifiex_drv_info_dump(struct mwifiex_adapter *adapter);
> +void mwifiex_prepare_fw_dump_info(struct mwifiex_adapter *adapter);
> +void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter);
>  void *mwifiex_alloc_dma_align_buf(int rx_len, gfp_t flags);
>  void mwifiex_queue_main_work(struct mwifiex_adapter *adapter);
>  int mwifiex_get_wakeup_reason(struct mwifiex_private *priv, u16 action,
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c 
> b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index cd31494..f666cb2 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -2769,12 +2769,17 @@ static void mwifiex_pcie_fw_dump(struct 
> mwifiex_adapter *adapter)
>  
>  static void mwifiex_pcie_device_dump_work(struct mwifiex_adapter *adapter)
>  {
> - int drv_info_size;
> - void *drv_info;
> + adapter->devdump_data = vzalloc(MWIFIEX_FW_DUMP_SIZE);
> + if (!adapter->devdump_data) {
> + mwifiex_dbg(adapter, ERROR,
> + "vzalloc devdump data failure!\n");
> + return;
> + }
>  
> - drv_info_size = mwifiex_drv_info_dump(adapter, &drv_info);
> + mwifiex_drv_info_dump(adapter);
>   mwifiex_pcie_fw_dump(adapter);
> - mwifiex_upload_device_dump(adapter, drv_info, drv_info_size);
> + mwifiex_prepare_fw_dump_info(adapter);
> + mwifiex_upload_device_dump(adapter);
>  }
>  
>  static void mwifiex_pcie_card_reset_work(struct mwifiex_adapter *adapter)
> diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c 
> b/drivers/net/wireless/marvell/mwifiex/sdio.c
> index fd5183c..a828801 100644
> --- a/drivers/net/wireless/marvell/mwifiex/sdio.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
> @@ -2505,15 +2505,21 @@ static void mwifiex_sdio_generic_fw_dump(struct 
> mwifiex_adapter *adapter)
>  static void mwifiex_sdio_device_dump_work(struct mwifiex_adapter *adapter)
>  {
>   struct sdio_mmc_card *card = adapter->card;
> - int drv_info_size;
> - void *drv_info;
>  
> - drv_info_size = mwifiex_drv_info_dump(adapter, &drv_info);
> + adapter->devdump_data = vzalloc(MWIFIEX_FW_DUMP_SIZE);
> + if (!adapter->devdump_data) {
> + mwifiex_dbg(adapter, ERROR,
> + "vzalloc devdump data failure!\n");
> + return;
> + }
> +
> + mwifiex_drv_info_dump(adapter);
>   if (card->fw_dump_enh)
>   mwifiex_sdio_generic_fw_dump(adapter);
>   else
>   mwifiex_sdio_fw_dump(adapter);
> - mwifiex_upload_device_dump(adapter, drv_info, drv_info_size);
> + mwifiex_prepare_fw_dump_info(adapter);
> + mwifiex_upload_device_dump(adapter);
>  }
>  
>  static void mwifiex_sdio_work(struct work_struct *work)
> -- 
> 1.9.1
> 

-- 
James Cameron
http://quozl.netrek.org/


Re: rtl8723be on Fedora27

2017-11-21 Thread James Cameron
On Tue, Nov 21, 2017 at 09:52:12PM +0100, Rákosi Gergely wrote:
> On 2017-11-21 21:37, James Cameron wrote:
> > On Tue, Nov 21, 2017 at 03:08:16PM +0100, Rákosi Gergely wrote:
> >> On 2017-11-18 16:52, Larry Finger wrote:
> >>> On 11/17/2017 06:22 PM, Rákosi Gergely wrote:
> >>>> Hello Larry,
> >>>>
> >>>> First of all, thanks your help.
> >>>> Lets see...here is the kernel version: 4.13.12-300
> >>>> The machine is an Asus ROG 553VE
> >>>>
> >>>> The firmware which loading in the dmesg is : rtlwifi/rtl8723befw_36.bin
> >>>> The output of md5sum is : 1850c1308fbcd95e9f6a7f58ede1e35f
> >>> [...]
> >>> sudo iw dev wlan0 scan | egrep "SSID|signal"
> >>>
> >>> Post that output. In addition, copy the dmesg output to some pastebin
> >>> and post the link as well.
> >>>
> >>> Larry
> >>>
> >> Hello Larry,
> >>
> >> I hope this email post format is good, and fit to the rules.
> >> Here is the output:
> >>
> >> [root@skynet-x2 ~]# iw dev wlp2s0 scan | egrep "SSID|signal"
> >>     signal: -46.00 dBm
> >>     SSID: SKYNET-X2
> >> [...]
> >> [root@skynet-x2 ~]#
> > Scan results seem normal.  Was this scan before disconnect?
>
> Yes, this command output taken while the connection was OK

Thanks.

> >> And the dmesg output:
> >>
> >> https://pastebin.com/iqQSu2hD
> > The kernel is now v4.13.13.
>
> Yes, thats the upgraded kernel
>
> > This is interesting, an H2C command was dropped, but no idea which.
> >
> > [9.848052] rtl8723be: error H2C cmd because of Fw download fail!!!
> >
> > Disconnection happened at boot+440 seconds, associate+429 seconds;
> >
> > [  439.871033] rtlwifi: AP off, try to reconnect now
> > [  439.871093] wlp2s0: Connection to AP 4c:5e:0c:c7:fa:e3 lost
> >
> > I cannot tell what causes disconnect.  I wonder if the same
> > timing of the problem happens always, or if the timing varies.
>
> The timing always changing, never is the same. And I dont realize
> the cause at now...

Thanks.  I had a similar problem with a different wireless device.

> > Init MAC failed was another 30 seconds later;
> >
> > [  468.600670] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
> > [  469.618926] rtl8723be: Init MAC failed
> >
> > Looking at _rtl8723be_init_mac, there are two false returns;
> > hardware power on fail, and llt write fail.
> >
> > Sorry, I don't have rtl8723be hardware.
> >
> > Rákosi, did any older kernel keep connection?
> 
> The oldest kernel is 4.13.9-300.fc27.x86_64 in Fedora 27
> installation, but did the same.  If you can give advise how, and
> what I do, then I'll try it.

You might try running a "Live CD" of Fedora 26 or Fedora 25, without
installing, to test whether the unexpected disconnection happens with
two older kernels.  This is not a permanent solution, just an easy test.

I don't have specific advice for Fedora 27; you might ask the Fedora
community about that, or use RHBZ.  A quick search finds this old page
from 2013:

https://fedoraproject.org/wiki/User:Ignatenkobrain/Kernel/Bisection

-- 
James Cameron
http://quozl.netrek.org/


Re: rtl8723be on Fedora27

2017-11-21 Thread James Cameron
On Tue, Nov 21, 2017 at 03:08:16PM +0100, Rákosi Gergely wrote:
> On 2017-11-18 16:52, Larry Finger wrote:
> > On 11/17/2017 06:22 PM, Rákosi Gergely wrote:
> >> Hello Larry,
> >>
> >> First of all, thanks your help.
> >> Lets see...here is the kernel version: 4.13.12-300
> >> The machine is an Asus ROG 553VE
> >>
> >> The firmware which loading in the dmesg is : rtlwifi/rtl8723befw_36.bin
> >> The output of md5sum is : 1850c1308fbcd95e9f6a7f58ede1e35f
> >
> > [...]
> > sudo iw dev wlan0 scan | egrep "SSID|signal"
> >
> > Post that output. In addition, copy the dmesg output to some pastebin
> > and post the link as well.
> >
> > Larry
> >
> Hello Larry,
> 
> I hope this email post format is good, and fit to the rules.
> Here is the output:
> 
> [root@skynet-x2 ~]# iw dev wlp2s0 scan | egrep "SSID|signal"
>     signal: -46.00 dBm
>     SSID: SKYNET-X2
> [...]
> [root@skynet-x2 ~]#

Scan results seem normal.  Was this scan before disconnect?

> And the dmesg output:
> 
> https://pastebin.com/iqQSu2hD

The kernel is now v4.13.13.

This is interesting, an H2C command was dropped, but no idea which.

[9.848052] rtl8723be: error H2C cmd because of Fw download fail!!!

Disconnection happened at boot+440 seconds, associate+429 seconds;

[  439.871033] rtlwifi: AP off, try to reconnect now
[  439.871093] wlp2s0: Connection to AP 4c:5e:0c:c7:fa:e3 lost

I cannot tell what causes disconnect.  I wonder if the same
timing of the problem happens always, or if the timing varies.

Init MAC failed was another 30 seconds later;

[  468.600670] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[  469.618926] rtl8723be: Init MAC failed

Looking at _rtl8723be_init_mac, there are two false returns; hardware
power on fail, and llt write fail.
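
The intervals above can be checked from the dmesg timestamps.  A
minimal sketch that parses the bracketed seconds-since-boot field from
three of the lines quoted in this message; the helper name is only for
illustration.

```python
import re

# A dmesg line starts with "[ seconds.microseconds]" then the message.
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

def parse(line):
    """Return (seconds_since_boot, message), or None if no timestamp."""
    m = LINE.match(line)
    if m is None:
        return None
    return float(m.group(1)), m.group(2)

log = [
    "[9.848052] rtl8723be: error H2C cmd because of Fw download fail!!!",
    "[  439.871093] wlp2s0: Connection to AP 4c:5e:0c:c7:fa:e3 lost",
    "[  469.618926] rtl8723be: Init MAC failed",
]
events = [parse(entry) for entry in log]
lost = next(t for t, msg in events if "lost" in msg)
failed = next(t for t, msg in events if "Init MAC failed" in msg)
print(f"Init MAC failed {failed - lost:.1f} s after connection lost")
```

Run over several boots, the same calculation would show whether the
time to disconnect is constant or varies.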

Sorry, I don't have rtl8723be hardware.

Rákosi, did any older kernel keep connection?

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH v2] mwifiex: do not support change AP interface to station mode

2017-11-21 Thread James Cameron
On Tue, Nov 21, 2017 at 08:03:35PM +0800, Xinming Hu wrote:
> Firmware do not support change interface from micro-ap mode to
> station mode, forbidden this operation in driver accordingly.

"forbidden" should be "forbid", for correct tense.

"in driver" is redundant and can be removed.

"accordingly" is also redundant.

Perhaps "Firmware do not support change interface from micro-ap mode
to station mode, forbid this operation."

> Signed-off-by: Cathy Luo <c...@marvell.com>
> Signed-off-by: Xinming Hu <h...@marvell.com>
> ---
> v2: remove unnecessary sta/uap combo check(James Cameron)
> 
>  drivers/net/wireless/marvell/mwifiex/cfg80211.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c 
> b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> index 6e0d9a9..4d45df8 100644
> --- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> +++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> @@ -1180,7 +1180,6 @@ static int mwifiex_deinit_priv_params(struct 
> mwifiex_private *priv)
>   case NL80211_IFTYPE_AP:
>   switch (type) {
>   case NL80211_IFTYPE_ADHOC:

Is changing the interface type from micro-ap to adhoc supported?

> - case NL80211_IFTYPE_STATION:
>   return mwifiex_change_vif_to_sta_adhoc(dev, curr_iftype,
>      type, params);
>   break;
> -- 
> 1.9.1
> 

-- 
James Cameron
http://quozl.netrek.org/


Re: [EXT] Re: [PATCH] mwifiex: do not support change AP interface to station mode

2017-11-21 Thread James Cameron
On Tue, Nov 21, 2017 at 12:03:19PM +, Xinming Hu wrote:
> Hi James,
> 
> > -Original Message-
> > From: qu...@laptop.org [mailto:qu...@laptop.org]
> > Sent: 21 November 2017 16:04
> > To: Xinming Hu <h...@marvell.com>
> > Cc: Linux Wireless <linux-wireless@vger.kernel.org>; Kalle Valo
> > <kv...@codeaurora.org>; Brian Norris <briannor...@chromium.org>; Dmitry
> > Torokhov <d...@google.com>; raja...@google.com; Zhiyuan Yang
> > <yan...@marvell.com>; Tim Song <song...@marvell.com>; Cathy Luo
> > <c...@marvell.com>; James Cao <j...@marvell.com>; Ganapathi Bhat
> > <gb...@marvell.com>; Ellie Reeves <ellierev...@gmail.com>
> > Subject: [EXT] Re: [PATCH] mwifiex: do not support change AP interface to
> > station mode
> > 
> > External Email
> > 
> > --
> > On Tue, Nov 21, 2017 at 03:24:03PM +0800, Xinming Hu wrote:
> > > Firmware do not support change interface from micro-ap mode to station
> > > mode, forbidden this operation in driver accordingly.
> > 
> > All firmware or specific versions?
> > 
> 
> This property result from the initial design consideration in
> firmware.

Thanks.  I maintain a product that uses your MV8787 device with
firmware sd8787_uapsta.bin and review mwifiex patches for local
backport.

> 
> > >
> > > Signed-off-by: Cathy Luo <c...@marvell.com>
> > > Signed-off-by: Xinming Hu <h...@marvell.com>
> > > ---
> > >  drivers/net/wireless/marvell/mwifiex/cfg80211.c | 6 ++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> > > b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> > > index 6e0d9a9..a87758f 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> > > @@ -1181,6 +1181,12 @@ static int mwifiex_deinit_priv_params(struct
> > mwifiex_private *priv)
> > >   switch (type) {
> > >   case NL80211_IFTYPE_ADHOC:
> > >   case NL80211_IFTYPE_STATION:
> > > + if (mwifiex_get_priv_by_id(priv->adapter, priv->bss_num,
> > > +MWIFIEX_BSS_TYPE_STA)){
> > 
> > Is this test necessary?
> 
> Hhn, yes, Will remove this check, which comes from a fix for combo sta/uap 
> case.
> Thanks for the suggestion.
> 
> > 
> > dev->ieee80211_ptr->iftype is always NL80211_IFTYPE_AP at this point.
> > 
> > > + mwifiex_dbg(priv->adapter, INFO,
> > > + "Skip change virtual interface\n");
> > 
> > Is this message easy to understand?  Other messages in the same function
> > seem easier; e.g. "%s: changing to %d not supported\n"
> 
> OK.
> 
> > 
> > > + return 0;
> > 
> > Should this be -EOPNOTSUPP rather than 0?
> 
> Yes.
> 
> > 
> > > + }
> > >   return mwifiex_change_vif_to_sta_adhoc(dev, curr_iftype,
> > >  type, params);
> > >   break;
> > > --
> > > 1.9.1
> > >
> > 
> > --
> > James Cameron
> > http://quozl.netrek.org/

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH] mwifiex: do not support change AP interface to station mode

2017-11-21 Thread James Cameron
On Tue, Nov 21, 2017 at 03:24:03PM +0800, Xinming Hu wrote:
> Firmware do not support change interface from micro-ap mode to
> station mode, forbidden this operation in driver accordingly.

All firmware or specific versions?

> 
> Signed-off-by: Cathy Luo <c...@marvell.com>
> Signed-off-by: Xinming Hu <h...@marvell.com>
> ---
>  drivers/net/wireless/marvell/mwifiex/cfg80211.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c 
> b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> index 6e0d9a9..a87758f 100644
> --- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> +++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> @@ -1181,6 +1181,12 @@ static int mwifiex_deinit_priv_params(struct 
> mwifiex_private *priv)
>   switch (type) {
>   case NL80211_IFTYPE_ADHOC:
>   case NL80211_IFTYPE_STATION:
> + if (mwifiex_get_priv_by_id(priv->adapter, priv->bss_num,
> +MWIFIEX_BSS_TYPE_STA)){

Is this test necessary?

dev->ieee80211_ptr->iftype is always NL80211_IFTYPE_AP at this point.

> + mwifiex_dbg(priv->adapter, INFO,
> + "Skip change virtual interface\n");

Is this message easy to understand?  Other messages in the same
function seem easier; e.g. "%s: changing to %d not supported\n"

> + return 0;

Should this be -EOPNOTSUPP rather than 0?

> + }
>   return mwifiex_change_vif_to_sta_adhoc(dev, curr_iftype,
>          type, params);
>   break;
> -- 
> 1.9.1
> 
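
On the return value question above: a caller that sees 0 cannot tell a
refused change from a completed one.  A sketch of the difference, with
a hypothetical function standing in for the driver path; it is not
mwifiex code.

```python
import errno

def change_iftype(current, requested, strict=True):
    """Return 0 on success, or a negative errno when refused."""
    if current == "AP" and requested == "STATION":
        # strict: report the refusal; otherwise silently skip the change
        return -errno.EOPNOTSUPP if strict else 0
    return 0

for strict in (False, True):
    ret = change_iftype("AP", "STATION", strict=strict)
    outcome = "caller sees success" if ret == 0 else "caller sees an error"
    print(f"strict={strict}: {outcome}")
```

With the negative errno the failure reaches whoever asked for the
change, instead of leaving the interface in AP mode without notice.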

-- 
James Cameron
http://quozl.netrek.org/


Re: rtl8821ae dbi read question

2017-11-05 Thread James Cameron
On Sun, Nov 05, 2017 at 04:15:36PM -0500, Nik Nyby wrote:
> I also want to note that adding rtl8821ae.aspm=0 to my grub kernel
> boot command doesn't fix my problem. (I'm building this driver into
> the kernel, not as a module). My connection dropping problem is
> fixed only if I comment out the aspm init code in the driver, per
> this patch:
>   https://patchwork.kernel.org/patch/9951511/

Interesting; that patch was for testing, and did more than just aspm=0.

Check for a BIOS update from your vendor.  Device registers can be
configured and persist despite warm reboot, which implies there is no
register reset in driver start or device reset, which further implies
that the BIOS can easily affect the outcome.

Check also for different behaviour after cold reboot; that is a power
down for 30 seconds then power up.

Do you yet have a known working kernel to bisect against?

-- 
James Cameron
http://quozl.netrek.org/


Re: RTL usb adapter question

2017-10-27 Thread James Cameron
On Fri, Oct 27, 2017 at 10:23:54PM -0500, David Ashley wrote:
> On 10/26/17, James Cameron <qu...@laptop.org> wrote:
> > Interesting, thanks.  It should be a QFN 46 pin chip; you may have
> > counted 15 instead of 14 pins on the long edge.  Send me a photograph
> > of the inside, off-list?
> 
> I uploaded a couple of pictures here:
> http://www.linuxmotors.com/RTL8188CUS/
> 
> You're right, I miscounted, it has 46 pins.

Thanks.  The BZ5JA might be a 5V to 3.3V switching voltage regulator,
with the inductor above it.

Datasheet shows there is internal non-volatile memory, powered from
pin 27, which has a trace to an external filter capacitor.

The large zero ohm resistor bottom right is interesting; size chosen
for accessibility; probably for fault isolation or qualification.

In summary, the board design is consistent with the datasheet, and
confirms non-volatile memory that will contain configuration data and
probably firmware.

I agree with Larry; try the firmware file.

-- 
James Cameron
http://quozl.netrek.org/


Re: rtlwifi oops

2017-10-27 Thread James Cameron
On Sat, Oct 28, 2017 at 12:02:30AM +0300, nirinA raseliarison wrote:
> On 10/27/2017 07:57 AM, James Cameron wrote:
> >On Fri, Oct 27, 2017 at 04:08:48AM +0300, nirinA raseliarison wrote:
> >>hi all,
> >>i applied the patch against 4.13.8. i still got some trouble, dmesg
> >>is below.
> >
> >As this new event does not have "disabled by hub (EMI?)", it is a
> >different problem to your 19th October post, so I don't think the
> >patch is relevant.
> >
> >>after i plugged the device, it seems to be detected and all modules
> >>loaded, but when i tried to connect to an access point, by using
> >>wicd, it halted after a while. at this point, all usb ports are
> >>broken, there was no more log in dmesg,
> >
> >If the other USB ports are not responding, then your problem is
> >probably wider than the wireless device, and the wireless device
> >is acting as the "canary in the mine"; failing first because it is
> >the most active.
> >
> >Can you test to exclude possibility of damaged USB host controller or
> >hub?
> 
> yes, dmesg below with an usb audio adapter and a usb mouse plugged at
> boot time. then the rtl8192cu plugged, and i'm using it to retrieve
> and send this mail.

Thanks.

Your dmesg shows the mouse is discovered, then disconnects, then
reconnects.  I can't tell if your mouse normally does this.  Can you
also test for the wireless problem without the USB mouse, or with a
different mouse?

Your dmesg also shows "cannot get freq" for USB audio device
endpoints, but I'm not sure what this means.

> my first guess was also about a damaged device or usb port
> as those random crashes are recent.
> note that the device i'm using here is not the same as the one
> that triggered the previous errors.
> 
> >>lsusb still showed the device even after being unplugged. it got
> >>even worse as reboot failed.
> >
> >Yes, once a USB host controller has failed, an orderly reboot can be
> >difficult.  lsusb not updating confirms the host controller is not
> >responding.
> >
> >>i cannot really trace the error as right now all thing works fine.
> >
> >Your dmesg looks like you removed and reinserted the wireless device
> >several times.  Did you do that, or did the system do it without any
> >physical action?
> 
> no, the device was always connected. i've only removed it long after
> i noticed something went wrong and just before i tried reboot.

Okay, thanks.  I'm worried that unexpected disconnect suggests a USB
host or hub problem.
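
To put a number on the disconnects, a saved dmesg can be summarised
per port.  A minimal sketch, assuming the usual kernel message
"usb <port>: USB disconnect, device number <n>"; the function name
usb_disconnects is made up for illustration.

```shell
# Count "USB disconnect" events per USB port path in a kernel log.
# Reads the log on stdin; prints "<count> <port>" lines, busiest first.
usb_disconnects() {
    # Kernel lines look like:
    #   [   20.000000] usb 2-1.3: USB disconnect, device number 4
    sed -n 's/.*usb \([0-9][-0-9.]*\): USB disconnect.*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

Usage would be "dmesg | usb_disconnects".  If several ports on the
same hub show disconnects with nothing physically unplugged, that
points toward the host or hub rather than one device.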

> >A full dmesg from boot may be interesting, at least to better
> >understand the USB host controller.
> >
> 
> here it is.
> thanks,
> [...]

-- 
James Cameron
http://quozl.netrek.org/


Re: rtlwifi oops

2017-10-26 Thread James Cameron
On Fri, Oct 27, 2017 at 04:08:48AM +0300, nirinA raseliarison wrote:
> hi all,
> i applied the patch against 4.13.8. i still got some trouble, dmesg
> is below.

As this new event does not have "disabled by hub (EMI?)", it is a
different problem to your 19th October post, so I don't think the
patch is relevant.

> after i plugged the device, it seems to be detected and all modules
> loaded, but when i tried to connect to an access point, by using
> wicd, it halted after a while. at this point, all usb ports are
> broken, there was no more log in dmesg,

If the other USB ports are not responding, then your problem is
probably wider than the wireless device, and the wireless device
is acting as the "canary in the mine"; failing first because it is
the most active.

Can you test to exclude possibility of damaged USB host controller or
hub?

> lsusb still showed the device even after being unplugged. it got
> even worse as reboot failed.

Yes, once a USB host controller has failed, an orderly reboot can be
difficult.  lsusb not updating confirms the host controller is not
responding.

> i cannot really trace the error as right now all thing works fine.

Your dmesg looks like you removed and reinserted the wireless device
several times.  Did you do that, or did the system do it without any
physical action?

A full dmesg from boot may be interesting, at least to better
understand the USB host controller.

-- 
James Cameron
http://quozl.netrek.org/


Re: RTL usb adapter question

2017-10-26 Thread James Cameron
Interesting, thanks.  It should be a QFN 46 pin chip; you may have
counted 15 instead of 14 pins on the long edge.  Send me a photograph
of the inside, off-list?

There's a brief datasheet that I found, but no sign of firmware or
registers documentation, as usual;

http://www.cnping.com/wp-content/uploads/2015/09/RTL8188CUS_DataSheet_1.01.pdf

I've no direct experience with the rtl8188cus chip.  I can't prove it,
but my experience with other vendors suggests a small non-volatile
storage built into the chip for device configuration and firmware.
Device configuration often includes USB vendor:product.

I've read that Edimax uses rtl8188cus in a device programmed with
vendor:product 7392:7811, and the kernel handles this in

rtl8xxxu/rtl8xxxu_core.c
rtlwifi/rtl8192cu/sw.c
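
One way to check which installed modules claim that ID is to look at
their alias strings.  A sketch, assuming modinfo from kmod is
available and that aliases follow the usual "usb:vXXXXpYYYY..." form
with upper-case hex; the function names are made up for illustration.

```shell
# Turn a lsusb-style "vendor:product" ID into the prefix used in module
# alias strings, which are written with upper-case hex digits.
usb_alias_prefix() {
    id=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    printf 'usb:v%sp%s\n' "${id%%:*}" "${id##*:}"
}

# Succeeds if the named module declares an alias for the given ID.
# Needs modinfo and the module installed for the running kernel.
module_claims() {
    modinfo "$1" 2>/dev/null | grep -qF "$(usb_alias_prefix "$2")"
}
```

For example, "module_claims rtl8192cu 7392:7811 && echo claimed".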

rtl8188cus has several configurable pins, so device configuration or
firmware would have been programmed to match the circuit layout.

As your kernel isn't providing firmware, yet the device works to an
extent, there is probably firmware already on the device.  I don't
know of a way to ask the device for a firmware version, or a firmware
dump.
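
What the kernel log can show is which firmware files the driver asked
for, and whether they exist on disk.  A rough sketch, assuming the
driver names files under rtlwifi/ in its messages as in your log, and
a grep that supports -o; both function names are made up for
illustration.

```shell
# List the distinct rtlwifi firmware files named in a kernel log on stdin.
fw_requested() {
    grep -o 'rtlwifi/[A-Za-z0-9_.-]*\.bin' | LC_ALL=C sort -u
}

# Read firmware names on stdin; print those absent under the given
# directory (default /lib/firmware).
fw_missing() {
    dir=${1:-/lib/firmware}
    while read -r name; do
        [ -e "$dir/$name" ] || printf '%s\n' "$name"
    done
}
```

Usage would be "dmesg | fw_requested | fw_missing".  A file reported
as missing while the device still associates is consistent with, but
does not prove, firmware already being on the device.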

You might sacrifice a sample to see if loading rtl8192cu firmware
changes behaviour at all.

You might also work with your device vendor to improve clarity.  ;-)

On Thu, Oct 26, 2017 at 08:28:00PM -0500, David Ashley wrote:
> I opened up the dongle, it has these things inside (aside from 2 coils
> and various resistors and capacitors)
> 1)
> 48 pin chip (9 pins, 15 pins, 9 pins, 15 pins)
> REALTEK
> RTL8188CUS
> F6J23P2
> GF27 TAIWAN
> 
> 6 pin chip (3 pins,3 pins)
> BZ5JA
> 
> 40.0 mhz crystal oscillator
> 
> I was thinking maybe some serial eeprom would be included, but there
> wasn't one.
> 
> -Dave
> 
> 
> On 10/26/17, James Cameron <qu...@laptop.org> wrote:
> > Based on your evidence, I'd say the device is different to others and
> > has firmware included.
> >
> > On Thu, Oct 26, 2017 at 04:45:54PM -0500, David Ashley wrote:
> >> OK I'm completely baffled.
> >>
> >> I have explicitly removed all rtlwifi/ firmware files from the root
> >> filesystem and yet the usb dongle still works, even after a power
> >> cycle. How can it possibly be getting its firmware file?
> >>
> >> Here are the relevant kernel messages. There is no file
> >> rtl8192cufw.bin anywhere for the kernel to find...
> >> root@30046:~# ls -l /lib/firmware/rtlwifi/
> >> total 0
> >>
> >> I have ensured there is no *OTHER* route to the internet such that the
> >> driver (or udev) can magically get the firmware file from the
> >> internet...
> >>
> >> Here's other info that may be useful...
> >>
> >> root@30046:~# zcat /proc/config.gz | grep FIRM
> >> CONFIG_PREVENT_FIRMWARE_BUILD=y
> >> CONFIG_FIRMWARE_IN_KERNEL=y
> >> CONFIG_EXTRA_FIRMWARE="am335x-pm-firmware.elf
> >> am335x-bone-scale-data.bin am335x-evm-scale-data.bin
> >> am43x-evm-scale-data.bin"
> >> CONFIG_EXTRA_FIRMWARE_DIR="firmware"
> >> # CONFIG_LIBERTAS_THINFIRM is not set
> >> # CONFIG_FIRMWARE_MEMMAP is not set
> >> # CONFIG_TEST_FIRMWARE is not set
> >> root@30046:~# cat /proc/version
> >> Linux version 4.1.19-bone20 (dash@DaveDesktop) (gcc version 5.4.0
> >> 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) ) #2 Tue Oct 3
> >> 17:25:35 CDT 2017
> >> root@30046:~# lsusb
> >> Bus 001 Device 002: ID 7392:7811 Edimax Technology Co., Ltd EW-7811Un
> >> 802.11n Wireless Adapter [Realtek RTL8188CUS]
> >>
> >> ... ifconfig
> >> wlan0 Link encap:Ethernet  HWaddr 74:da:38:61:f1:2c
> >>   inet addr:192.168.10.31  Bcast:192.168.10.255
> >> Mask:255.255.255.0
> >>   inet6 addr: fe80::76da:38ff:fe61:f12c/64 Scope:Link
> >>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >>   RX packets:509 errors:0 dropped:0 overruns:0 frame:0
> >>   TX packets:146 errors:0 dropped:0 overruns:0 carrier:0
> >>   collisions:0 txqueuelen:1000
> >>   RX bytes:60812 (59.3 KiB)  TX bytes:16365 (15.9 KiB)
> >>
> >>
> >>
> >>
> >> [9.663796] rtl8192cu: Chip version 0x10
> >> [9.745394] cfg80211: Calling CRDA to update world regulatory domain
> >> [9.844311] random: nonblocking pool is initialized
> >> [9.877851] rtl8192cu: MAC address: 74:da:38:61:f1:2c
> >> [9.877883] rtl8192cu: Board Type 0
> >> [9.877989] rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
> >> [9.878098] rtl8192cu: Loading fi

Re: RTL usb adapter question

2017-10-26 Thread James Cameron
filesystem.
> >>
> >> Basically I'm trying to understand the theory. We have a product that
> >> is making use of the device
> >>
> >> Bus 001 Device 007: ID 7392:7811 Edimax Technology Co., Ltd
> >> EW-7811Un 802.11n Wireless Adapter [Realtek RTL8188CUS]
> >>
> >> It has not been especially reliable. I've never provided firmware
> >> files for the device in the root filesystem. I've started to pay
> >> attention to the kernel error messages. Now the kernel drivers seem to
> >> be loading the rtlwifi/rtl8192cufw_TMSC.bin file and I'm trying to
> >> understand if this is actually working, if it makes any difference in
> >> reliability...
> >>
> >> It's like I can't figure out how the usb dongle even worked without
> >> its firmware file...
> >>
> >> My working theory is that the usb dongle comes from the factory with a
> >> hardcoded firmware file (rtlwifi/rtl8192cufw.bin) but it is buggy or
> >> inferior. And the performance and reliability can be improved if the
> >> driver successfully manages to load the rtl8192cufw_TMSC.bin file. I
> >> don't know if the firmware load persists across a power cycle (my
> >> assumption is it doesn't).
> >
> > There is NO firmware coded by the factory in the device. It only has enough
> > intelligence to load the real firmware. The exact file that it loads is
> > determined by the model. If you provide the appropriate section of the
> > output of dmesg where the above firmware messages occur, and a file listing
> > of /lib/firmware/rtlwifi/, I can tell you what firmware is being loaded.
> >
> > No, firmware will not persist across a power failure.
> >
> > The driver has never been particularly reliable, and the USB group at
> > Realtek seems not to care. You might try their other driver, but you will
> > be on your own, as I will not support that particular piece of .
> >
> > Please reply to all on any followups.
> >
> > Larry
> >
> >

-- 
James Cameron
http://quozl.netrek.org/


Re: iwlwifi crash with hostapd

2017-10-25 Thread James Cameron
On Wed, Oct 25, 2017 at 09:08:17AM +0200, Mario Theodoridis wrote:
> On 24/10/17 23:01, James Cameron wrote:
> >Summary: WARN_ON(iwl_mvm_is_dqa_supported(mvm)) in
> >iwl_mvm_rx_tx_cmd_single with v4.13, but code is since changed.
> >
> >On Tue, Oct 24, 2017 at 09:56:31PM +0200, Mario Theodoridis wrote:
> >>Sorry for skipping the list on the last one.
> >
> >Sorry, that was my fault.  It was a private message you replied to.
> >
> >>On 19.10.2017 22:59, James Cameron wrote:
> >>[...]
> >
> >You didn't say virtualbox was essential for reproducing the problem,
> >so I'm continuing to exclude it from thought.  If it is essential for
> >reproducing, then you might contact them.
> >
> >Please do make sure you can exclude virtualbox as a cause.
> 
> Let me clarify the virtualbox thing. The machine in question is a VM host.
> It hosts several machines, one of which is my mail server, and another
> (openbsd) which acts as a gateway to the internet for all machines.
> If i run this machine without virtualbox, then my entire network topology is
> off-line. While one could argue, that this is bad design, the alternative
> would be to use openbsd as a virtual host, but i haven't seen many tutorials
> on that. I also would like to run just one machine 24/7 to keep a tap on the
> electricity consumption.
> 
> This machine also bridges several interfaces and acts as a hotspot for my
> wlan.
> 
> So i don't know whether virtualbox is responsible, but not running
> virtualbox is simply not an option.

Thanks.

I don't have a machine with the same wireless device, so I can't hope
to reproduce the problem or test fixes.  I do have a slightly later
wireless device which uses the same driver, but I'm not confident it
would reproduce the problem, because (a) I've not seen the same stack
traces, (b) the WARN_ON relates to device response coded in firmware,
and my wireless device may use different firmware, and (c) it isn't
clear to me what you did to enable the problem.

You do have a machine, and you might do tests without virtualbox,
but as you say, this is not an option for you.

> >>This one pretty quickly loads my syslog with new error stacks. I
> >>haven't tested actual behavior yet, but the logs don't look so hot.
> >
> >Do connections frequently keep dying as before?
> >
> >>I ran another wireless-info (attached) and appended some of the
> >>syslog stuff to it.
> >
> >Thanks, you identified a line of code and cause; a WARN_ON in
> >iwl_mvm_rx_tx_cmd_single;
> >
> > case TX_STATUS_FAIL_DEST_PS:
> > /* In DQA, the FW should have stopped the queue and not
> >  * return this status
> >  */
> > WARN_ON(iwl_mvm_is_dqa_supported(mvm));
> > info->flags |= IEEE80211_TX_STAT_TX_FILTERED;
> > break;
> >
> >But it is only a warning.  If connections aren't dying, it may not be
> >important to you.
> >
> >Please check you are using the most recent linux-firmware?
> 
> I'm running
> ii  linux-firmware   1.169 all
> from artful.
> No difference to the xenial version.

Good, thanks.

> >>>Several methods, though by far the most common seems to be personal
> >>>experience with offsets.
> >>>
> >>>When you don't have that personal experience, the methods are;
> >>>
> >>>1.  using GDB against the .o file,
> >>>
> >>>2.  using binutils objdump to disassemble .o file or vmlinuz,
> >>>
> >>>3.  using GCC to generate assembly listings,
> >>>
> >>>See https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks right down
> >>>the end of page for the GDB method.
> >>
> >>I haven't gotten around to that part yet, as i was busy with the
> >>above, but it seems later versions have issues, too.
> >
> >However, you're still testing old source code.
> >
> >Several changes made since are worth testing, please either
> >cherry-pick the patches or test a 4.14 rc kernel, and without
> >involving dkms or virtualbox.
> 
> Then i'd have to patch those files so they build for 4.14 first.
> I've seen patches, but still need to figure out how to get them
> applied in the build process.

It may be more efficient to wait for your dkms packagers to catch up
so that the v4.14-rc6 or v4.14 kernel will work with your
package configuration.

> -- 
> Mit freundlichen Grüßen/Best Regards
> 
> Mario Theodoridis

-- 
James Cameron
http://quozl.netrek.org/


Re: iwlwifi crash with hostapd

2017-10-24 Thread James Cameron
Summary: WARN_ON(iwl_mvm_is_dqa_supported(mvm)) in
iwl_mvm_rx_tx_cmd_single with v4.13, but code is since changed.

On Tue, Oct 24, 2017 at 09:56:31PM +0200, Mario Theodoridis wrote:
> Sorry for skipping the list on the last one.

Sorry, that was my fault.  It was a private message you replied to.

> On 19.10.2017 22:59, James Cameron wrote:
> >On Thu, Oct 19, 2017 at 08:56:46AM +0200, Mario Theodoridis wrote:
> >>On 18/10/17 23:33, James Cameron wrote:
> >>
> >> For your interest, kernel v4.4.93 in stable series just released has
> >> changes in relevant files.
> >>
> >> https://lwn.net/Articles/736770/
> >>
> >>Thanks James,
> >>
> >>after looking into bisection last night, i found that just before
> >>i wanted to test out the 4.4.0-82 kernel, i found 3 stack traces
> >>in my syslog. :(
> >>
> >>I guess, i'm dealing with race conditions now. But it seems the 79
> >>kernel still crashes wifi a lot less than later ones.
> >>
> >>How do i get line numbers into these traces?
> 
> As the 4.4.0-79 kernel was sometimes crapping out, too, i decided to
> try to test the latest kernel instead of bisecting after all.  This
> took a while because virtualbox was being a bitch. virtualbox-5.0
> doesn't bode well with virtualbox-dkms-51, so i ended up rebuilding
> virtualbox-5.1 to prevent dependency hell.  The vb-dkms package
> doesn't do 4.14, so i ended up going with the 4.13 kernel that comes
> with artful.

You didn't say virtualbox was essential for reproducing the problem,
so I'm continuing to exclude it from thought.  If it is essential for
reproducing, then you might contact them.

Please do make sure you can exclude virtualbox as a cause.

> This one pretty quickly loads my syslog with new error stacks. I
> haven't tested actual behavior yet, but the logs don't look so hot.

Do connections frequently keep dying as before?

> I ran another wireless-info (attached) and appended some of the
> syslog stuff to it.

Thanks, you identified a line of code and cause; a WARN_ON in
iwl_mvm_rx_tx_cmd_single;

case TX_STATUS_FAIL_DEST_PS:
/* In DQA, the FW should have stopped the queue and not
 * return this status
 */
WARN_ON(iwl_mvm_is_dqa_supported(mvm));
info->flags |= IEEE80211_TX_STAT_TX_FILTERED;
break;

But it is only a warning.  If connections aren't dying, it may not be
important to you.

Please check you are using the most recent linux-firmware?

> >Several methods, though by far the most common seems to be personal
> >experience with offsets.
> >
> >When you don't have that personal experience, the methods are;
> >
> >1.  using GDB against the .o file,
> >
> >2.  using binutils objdump to disassemble .o file or vmlinuz,
> >
> >3.  using GCC to generate assembly listings,
> >
> >See https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks right down
> >the end of page for the GDB method.
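
For the GDB method, each trace frame of the form
"function+0xoffset/0xsize" becomes a "list *(function+0xoffset)"
command.  A minimal sketch that does the conversion; the function
name trace_to_gdb is made up, and it assumes frames start the line
after any timestamp or quoting.

```shell
# Turn call-trace frames read on stdin into GDB "list" commands.
# Frames marked "?" are skipped, as the kernel flags those as unreliable.
trace_to_gdb() {
    sed -n -e '/?/d' \
        -e 's|^[^A-Za-z_]*\([A-Za-z_][A-Za-z0-9_.]*\)+\(0x[0-9a-f]*\)/0x[0-9a-f]*.*|list *(\1+\2)|p'
}
```

The printed commands can be pasted into a gdb session opened on the
matching .o or .ko file, which must be built with debug information
from the same source as the running kernel.
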
> 
> I haven't gotten around to that part yet, as i was busy with the
> above, but it seems later versions have issues, too.

However, you're still testing old source code.

Several changes made since are worth testing, please either
cherry-pick the patches or test a 4.14 rc kernel, and without
involving dkms or virtualbox.

Or, if new firmware fixes the problem, go with that instead.

> -- 
> Mit freundlichen Grüßen/Best regards
> 
> Mario Theodoridis

> 
> ## wireless info START ##
> [...]

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH] bcma: Use bcma_debug and not pr_cont in MIPS driver

2017-10-18 Thread James Cameron
On Wed, Oct 18, 2017 at 10:12:18PM -0700, Joe Perches wrote:
> Commit 66cc04424960 ("bcma: use bcma_debug and pr_cont in MIPS driver")
> converted a printk(KERN_DEBUG to bcma_debug.
> 
> bcma_debug is guarded by a #define DEBUG via pr_debug.
> 
> This means that the bcma_debug will generally not be emitted
> but any pr_cont following the bcma_debug will be emitted.
> 
> Correct this by removing the uses of pr_cont by using a temporary.
> 
> Signed-off-by: Joe Perches <j...@perches.com>
> ---
>  drivers/bcma/driver_mips.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/bcma/driver_mips.c b/drivers/bcma/driver_mips.c
> index 5904ef1aa624..a929956150eb 100644
> --- a/drivers/bcma/driver_mips.c
> +++ b/drivers/bcma/driver_mips.c
> @@ -184,11 +184,14 @@ static void bcma_core_mips_print_irq(struct bcma_device 
> *dev, unsigned int irq)
>  {
>   int i;
>   static const char *irq_name[] = {"2(S)", "3", "4", "5", "6", "D", "I"};
> +char interrupts[20];
> +char *ints = interrupts;

Tabs were changed to spaces.

>  
> - bcma_debug(dev->bus, "core 0x%04x, irq :", dev->id.id);
> - for (i = 0; i <= 6; i++)
> - pr_cont(" %s%s", irq_name[i], i == irq ? "*" : " ");
> - pr_cont("\n");
> +for (i = 0; i < ARRAY_SIZE(irq_name); i++)
> +ints += sprintf(ints, " %s%c",
> +     irq_name[i], i == irq ? '*' : ' ');

But not on this line.

> +
> +bcma_debug(dev->bus, "core 0x%04x, irq:%s\n", dev->id.id, 
> interrupts);
>  }
>  
>  static void bcma_core_mips_dump_irq(struct bcma_bus *bus)
> -- 
> 2.10.0.rc2.1.g053435c
> 
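
A quick check for this before sending is to look for added lines
whose indentation begins with a space.  A minimal sketch; the
function name is made up, and scripts/checkpatch.pl in the kernel
tree is the usual and more thorough tool.

```shell
# Print added lines in a unified diff (on stdin) whose indentation starts
# with a space.  Diff headers ("+++ b/file") do not match the pattern,
# and neither do lines indented with a tab followed by alignment spaces.
space_indented_adds() {
    grep -n '^+ '
}
```

Usage would be "git show | space_indented_adds"; no output means no
added line starts its indentation with a space.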

-- 
James Cameron
http://quozl.netrek.org/


Re: rtlwifi oops

2017-10-18 Thread James Cameron
00
> [  239.701327] RIP: rtl_deinit_core+0x2e/0x90 [rtlwifi] RSP:
> c99a3b40
> [  239.702028] CR2: 
> [  239.705370] ---[ end trace 6ec9029c0d9c0e13 ]---
> [  239.706311] udevd[528]: worker [1174] failed while handling
> '/devices/pci:00/:00:1d.0/usb2/2-1/2-1.3/2-1.3:1.0'
> 

-- 
James Cameron
http://quozl.netrek.org/


Re: iwlwifi crash with hostapd

2017-10-17 Thread James Cameron
On Tue, Oct 17, 2017 at 09:35:39PM +0200, Mario Theodoridis wrote:
> On 16.10.2017 05:37, James Cameron wrote:
> >On Sun, Oct 15, 2017 at 06:21:36PM +0200, Mario Theodoridis wrote:
> >>Thanks for the pointers, James.
> >>
> >>On 12.10.2017 23:24, James Cameron wrote:
> >>>There's a good chance this problem has been fixed already.  You
> >>>are using a v4.4 kernel with many patches applied by Ubuntu.  Here, we
> >>>are more concerned with the latest kernels, and v4.4 is quite old.
> >>>
> >>>Please test some of the later kernels, see
> >>>https://wiki.ubuntu.com/Kernel/MainlineBuilds
> >>>
> >>>In particular, test v4.13 or v4.14-rc4.
> >>
> >>I'm having a hard time with that, because the virtualbox-dkms build fails
> >>with the 4.13 kernel, and virtualbox unfortunately is essential.
> >
> >Is virtualbox essential for reproducing the problem, or essential for
> >your general use?
> 
> It is essential for general use, like Internet connectivity.

Okay, good, that means we can ignore virtualbox, and leave that to
you.

Please test v4.13 or v4.14-rc5, ignoring virtualbox for the time being.

> >If the former, then that's interesting.
> >
> >If the latter, then you might instead test the v4.13 or v4.14-rc4
> >kernels for only the problem, and then revert to an older kernel after
> >testing.
> >
> >Either way, to use virtualbox-dkms with a later kernel you may be able
> >to upgrade just the virtualbox packages from a later Ubuntu release.
> >
> >See https://packages.ubuntu.com/virtualbox-dkms and
> >https://packages.ubuntu.com/virtualbox for the later versions available.
> >
> >Purpose of the test can be to help isolate the cause, not only to
> >solve your problem.
> 
> Thanks for the info.
> 
> >
> >[...]
> >You might also try with later firmware package.
> >See https://packages.ubuntu.com/linux-firmware
> >
> >You might also test with booting installation media in live-mode,
> >ignoring the internal disk.
> 
> Ok, that was completely off the radar.

Updating linux-firmware may run different firmware on the wireless
card, and the change in behaviour may fix the problem.  A gamble.
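
To know whether the gamble changed anything, note the firmware
version the driver reports before and after the update.  A sketch,
assuming the iwlwifi message "loaded firmware version <version>
op_mode iwlmvm" in the kernel log; the function name is made up for
illustration.

```shell
# Print the firmware version(s) iwlwifi reports in a kernel log on stdin.
iwl_fw_version() {
    sed -n 's/.*loaded firmware version \([^ ]*\).*/\1/p'
}
```

Usage would be "dmesg | iwl_fw_version".  If the version is the same
after updating linux-firmware, the update made no difference to what
runs on the card.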

A test with later installation media is useful, because you can verify
problems with different kernels and wireless firmware without change
to configuration.  You might try Ubuntu 17.10 Artful ISO.

> I ended up going the other way. I still had a 4.4.0-79-generic kernel and
> booted that. It does not have this problem.
> After checking out
> git://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial
> i tried to find the culprit but was not able to trace the back trace to a
> potential null pointer or some such. I got stuck at
> iwl_mvm_send_cmd_pdu_status not finding a reference to iwl_mvm_disable_txq
> from there.
> 
> I did get the following diff though
> 
> git diff Ubuntu-4.4.0-79.100 Ubuntu-4.4.0-93.116 --
> drivers/net/wireless/iwlwifi/ drivers/net/wireless/mac80211_hwsim.c >
> wifi.patch
> 
> I don't know whether this came from upstream or was ubuntu sourced.

Upstream.

You found your problem was introduced in an Ubuntu kernel, in the
update from -79 to -93.  This contained Ubuntu backports of two
stable kernel patches, which are also upstream patches;

8fbcfeb8a9cc ("mac80211_hwsim: Replace bogus hrtimer clockid")
from v4.4.69

50ea05efaf3b ("mac80211: pass block ack session timeout to to driver")
from v4.4.77

git log Ubuntu-4.4.0-79.100..Ubuntu-4.4.0-93.116 -- \
drivers/net/wireless/iwlwifi/ drivers/net/wireless/mac80211_hwsim.c

git remote add stable \
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
git fetch stable
git log v4.4.68..v4.4.92 -- \
drivers/net/wireless/iwlwifi/ drivers/net/wireless/mac80211_hwsim.c

> This fixed the issue for now, but now i'm stuck on that kernel :(

Yes.

Here in upstream, we would run the latest kernel v4.13 and work to
fix that.  Trouble you had with virtualbox packages would be
eventually solvable, but aren't really a problem with the kernel
itself.

So your next step may be to report an Ubuntu bug, and say that -79
worked fine, and -93 did not.

> While i'm perfectly comfortable with user land C, i have no kernel
> experience (clue stick links definitely welcome).

You might verify the above patches caused the problem by doing a
bisection between -79 and -93.

https://wiki.ubuntu.com/Kernel/KernelBisection

Or by reverting only those patches.

Then report to Ubuntu which patch caused the problem.
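
The bisection can be sketched as below.  The helper names are made
up, the commands are printed rather than run so they can be reviewed
first, and limiting the bisection to paths is an assumption that
only holds if the cause is within those paths; leave the paths out
when unsure.

```shell
# Print (do not run) a bisection between a good and a bad tag,
# optionally limited to the paths given after the two tags.
bisect_plan() {
    good=$1; bad=$2; shift 2
    printf 'git bisect start %s %s -- %s\n' "$bad" "$good" "$*"
    printf '%s\n' '# build, boot and test each step, then one of:' \
        'git bisect good' 'git bisect bad' \
        '# when the first bad commit is reported:' 'git bisect reset'
}

# Rough number of build-and-boot cycles for n candidate commits,
# that is ceil(log2(n)).
bisect_steps() {
    n=$1; steps=0; span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2)); steps=$((steps + 1))
    done
    echo "$steps"
}
```

For example, "bisect_plan Ubuntu-4.4.0-79.100 Ubuntu-4.4.0-93.116
drivers/net/wireless/iwlwifi/", and "bisect_steps 1000" suggests
about ten kernels to build and boot.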

> [...]

Hope that helps.

-- 
James Cameron
http://quozl.netrek.org/


Re: iwlwifi crash with hostapd

2017-10-15 Thread James Cameron
On Sun, Oct 15, 2017 at 06:21:36PM +0200, Mario Theodoridis wrote:
> Thanks for the pointers, James.
> 
> On 12.10.2017 23:24, James Cameron wrote:
> >There's a good chance this problem has been fixed already.  You
> >are using a v4.4 kernel with many patches applied by Ubuntu.  Here, we
> >are more concerned with the latest kernels, and v4.4 is quite old.
> >
> >Please test some of the later kernels, see
> >https://wiki.ubuntu.com/Kernel/MainlineBuilds
> >
> >In particular, test v4.13 or v4.14-rc4.
> 
> I'm having a hard time with that, because the virtualbox-dkms build fails
> with the 4.13 kernel, and virtualbox unfortunately is essential.

Is virtualbox essential for reproducing the problem, or essential for
your general use?

If the former, then that's interesting.

If the latter, then you might instead test the v4.13 or v4.14-rc4
kernels for only the problem, and then revert to an older kernel after
testing.

Either way, to use virtualbox-dkms with a later kernel you may be able
to upgrade just the virtualbox packages from a later Ubuntu release.

See https://packages.ubuntu.com/virtualbox-dkms and
https://packages.ubuntu.com/virtualbox for the later versions available.

Purpose of the test can be to help isolate the cause, not only to
solve your problem.

> >If the problem still happens, capture the same information and send it
> >again as a reply.
> >
> >If the problem doesn't happen, then you can either continue to use the
> >new kernel, or find when the problem was fixed; a long but rewarding
> >process.
> >
> >Should the problem have been fixed for v4.10, you might also switch to
> >using the Ubuntu package linux-generic-hwe-16.04.
> >https://wiki.ubuntu.com/Kernel/RollingLTSEnablementStack#hwe-16.04
> 
> The 4.10 kernel readily produced this one
> 
> [ cut here ]
> WARNING: CPU: 4 PID: 1617 at 
> /build/linux-hwe-IJy1zi/linux-hwe-4.10.0/drivers/net/wireless/intel/iwlwifi/mvm/tx.c:510
> iwl_mvm_tx_skb_non_sta+0x39a/0x440 [iwlmvm]
> Modules linked in: bnep ccm pci_stub vboxpci(OE) vboxnetadp(OE)
> vboxnetflt(OE) vboxdrv(OE) nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter
> ip_tables x_tables snd_hda_codec_hdmi arc4 iwlmvm mac80211
> snd_hda_codec_realtek snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal
> intel_powerclamp iwlwifi coretemp snd_hda_intel snd_hda_codec kvm_intel
> snd_hda_core snd_hwdep kvm input_leds irqbypass crct10dif_pclmul snd_pcm
> bridge crc32_pclmul joydev stp llc ghash_clmulni_intel snd_seq_midi pcbc
> snd_seq_midi_event snd_rawmidi aesni_intel snd_seq aes_x86_64 crypto_simd
> snd_seq_device glue_helper cfg80211 cryptd snd_timer intel_cstate snd
> intel_rapl_perf soundcore shpchp mei_me hci_uart mei btbcm btqca btintel
> bluetooth intel_lpss_acpi
>  acpi_als mac_hid intel_lpss kfifo_buf tpm_infineon industrialio acpi_pad
> parport_pc ppdev lp parport autofs4 i915 e1000e i2c_algo_bit drm_kms_helper
> syscopyarea sysfillrect sysimgblt fb_sys_fops e100 hid_generic ptp i2c_hid
> ahci mii drm pps_core pinctrl_sunrisepoint libahci usbhid e1000 hid wmi
> video pinctrl_intel fjes
> CPU: 4 PID: 1617 Comm: hostapd Tainted: G   OE 4.10.0-37-generic
> #41~16.04.1-Ubuntu
> Hardware name: Gigabyte Technology Co., Ltd. Z170M-D3H/Z170M-D3H-CF, BIOS
> F20 11/17/2016
> Call Trace:
>  dump_stack+0x63/0x90
>  __warn+0xcb/0xf0
>  warn_slowpath_null+0x1d/0x20
>  iwl_mvm_tx_skb_non_sta+0x39a/0x440 [iwlmvm]
>  iwl_mvm_mac_tx+0x11e/0x1d0 [iwlmvm]
>  ieee80211_tx_frags+0x14b/0x220 [mac80211]
>  __ieee80211_tx+0x81/0x180 [mac80211]
>  ieee80211_tx+0x10f/0x150 [mac80211]
>  ieee80211_xmit+0x9b/0xf0 [mac80211]
>  __ieee80211_tx_skb_tid_band+0x5c/0x70 [mac80211]
>  ieee80211_mgmt_tx+0x42c/0x4a0 [mac80211]
>  cfg80211_mlme_mgmt_tx+0xdc/0x310 [cfg80211]
>  nl80211_tx_mgmt+0x212/0x360 [cfg80211]
>  genl_family_rcv_msg+0x1db/0x3b0
>  ? skb_queue_tail+0x43/0x50
>  genl_rcv_msg+0x59/0xa0
>  ? genl_notify+0x80/0x80
>  netlink_rcv_skb+0xa4/0xc0
>  genl_rcv+0x28/0x40
>  netlink_unicast+0x18c/0x240
>  netlink_sendmsg+0x2fb/0x3a0
>  ? aa_sock_msg_perm+0x61/0x150
>  sock_sendmsg+0x38/0x50
>  ___sys_sendmsg+0x2c2/0x2d0
>  ? sock_sendmsg+0x38/0x50
>  ? SYSC_sendto+0x101/0x190
>  ? __check_object_size+0x108/0x1e3
>  ? _copy_to_user+0x55/0x60
>  __sys_sendmsg+0x54/0x90
>  SyS_sendmsg+0x12/0x20
>  entry_SYSCALL_64_fastpath+0x1e/0xad
> RIP: 0033:0x7fcc38cfe450
> RSP: 002b:7fffdefc9b18 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 563e91285590 RCX: 7fcc38cfe450
> RDX:  RSI: 7fffdefc9ba0 RDI: 0005
> RBP:  R08: 000

Re: iwlwifi crash with hostapd

2017-10-12 Thread James Cameron
On Thu, Oct 12, 2017 at 10:26:33PM +0200, Mario Theodoridis wrote:
> Hello everyone,
> 
> i'm running Kubuntu 16.04 as a Virtualbox VM host, and a wireless AP
> with an Intel Wireless 7260.
> 
> My WLAN connections frequently keep dying, so that i need to
> disconnect and reconnect in order to use them again.
> My syslog is full of these:
> 
> Oct 12 21:48:55 zippy kernel: [3546600.957321] [ cut here ]
> Oct 12 21:48:55 zippy kernel: [3546600.957352] WARNING: CPU: 2 PID: 1571 at
> /build/linux-YyUNAI/linux-4.4.0/drivers/net/wireless/iwlwifi/mvm/utils.c:740
> iwl_mvm_disable_txq+0x2a6/0x2c0 [iwlmvm]()
> [...]
> I'm not sure if this is the right forum to post this.
> If it isn't, a pointer to the right place would be appreciated.

This is a right place.  Another right place is Ubuntu bug reporting.

> Please include me in the reply as i'm not on the list.
> Let me know, what additional details i need to provide, as i'm
> interested in getting this to work.

There's a good chance this problem has been fixed already.  You
are using a v4.4 kernel with many patches applied by Ubuntu.  Here, we
are more concerned with the latest kernels, and v4.4 is quite old.

Please test some of the later kernels, see
https://wiki.ubuntu.com/Kernel/MainlineBuilds

In particular, test v4.13 or v4.14-rc4.

If the problem still happens, capture the same information and send it
again as a reply.

If the problem doesn't happen, then you can either continue to use the
new kernel, or find when the problem was fixed; a long but rewarding
process.

Should the problem have been fixed for v4.10, you might also switch to
using the Ubuntu package linux-generic-hwe-16.04.
https://wiki.ubuntu.com/Kernel/RollingLTSEnablementStack#hwe-16.04

Hope that helps.

> Thanks.
> 
> Regards
> 
> Mario

[...]

-- 
James Cameron
http://quozl.netrek.org/


Re: Contributing to Linux-wireless drivers.

2017-10-10 Thread James Cameron
On Tue, Oct 10, 2017 at 05:14:02PM +0530, Himanshu Jha wrote:
> Hello everyone,
> 
> Apologies for that forwarded email which I hurriedly sent without
> editing here!
> 
> I am an undergraduate student in ECE (3rd year) and wish to contribute
> to linux-wireless drivers. I am familiar with the kernel development
> process and have many
> patches accepted in the past 2 months with variety of tools used such as
> coccinelle, Kasan, smatch, sparse and checkpatch.
> 
> My past contributions can be found here:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?qt=grep=Himanshu+Jha
> 
> Also, James Cameron suggested that I *not self promote*, and other useful
> stuff. But I'm not self promoting and the purpose is to avoid the
> initial steps that you generally recommend to a newbie like reading the
> coding guidelines, submitting patches, learning Git etc.

Last time I'll try that privately.  Now I'm publicly outed for it.
I keep making this mistake.

For completeness, what I had said was;

> > Self promotion is not often acceptable.  For background on
> > culture, see http://www.catb.org/esr/faqs/hacker-howto.html

and Himanshu said they wanted to avoid being told the initial steps
again, to which I replied;

> > Good point.  However even as a grey beard, I can still get told
> > these things; it reflects more on them than me.
> >
> > An alternate method would be to say what you have done without
> > using any words that measure or evaluate what you have done.

However, I am curious to know if there will be a GSoC engagement by
Linux Foundation in the linux-wireless scope.  It would be fun to
watch and learn.

-- 
James Cameron
http://quozl.netrek.org/


Re: rtl8821ae keep alive not set, connection lost

2017-09-21 Thread James Cameron
On Thu, Sep 21, 2017 at 09:40:14AM -0500, Larry Finger wrote:
> On 09/21/2017 03:07 AM, James Cameron wrote:
> >My test kernel "-qb" was write_readback = false in sw.c, with 8-bit
> >read of REG_DBI_RDATA, and has been stable for four hours.  I'll
> >focus on some more testing of this one.  It is a surprise.
> >
> >http://dev.laptop.org/~quozl/z/1dutXk.txt  (dmesg)
> >
> >Observe how REG_DBI_FLAG+0 is briefly seen as 1, which doesn't
> >happen with write_readback = true.
> 
> Again, thanks for your efforts.
> 
> At this point, my system has been up over 17 hours without a single
> drop. As a result, I will leave the reversion of commit 40b368af4b75
> in place. It seems safer than turning off write_readback. After we
> get more testing, that could still be an option.

Thanks for the reversion commit, I'll point others to it.

My apologies for sloppy work, the test kernel features got swapped!
"-qb" above was with write_readback off, and 16-bit read of
REG_DBI_RDATA, not 8-bit.  Verified with objdump.  It has run for 24
hours without a drop.

So in conclusion;

- the 16-bit read is good with or without write_readback.

- the 8-bit read is bad with or without write_readback, and tends to
  lose connection much quicker without write_readback.

Been a pleasure working with you.  Back to lurk mode.

-- 
James Cameron
http://quozl.netrek.org/


Re: rtl8821ae keep alive not set, connection lost

2017-09-21 Thread James Cameron
On Thu, Sep 21, 2017 at 09:22:28AM +1000, James Cameron wrote:
> On Wed, Sep 20, 2017 at 04:48:23PM -0500, Larry Finger wrote:
> > On 09/20/2017 04:36 AM, James Cameron wrote:
> > >When the problem occurs, register 0x350 bit 25 is set, for which a
> > >comment in _rtl8821ae_check_pcie_dma_hang says means there is an RX
> > >hang.
> > >
> > >So perhaps driver should call _rtl8821ae_check_pcie_dma_hang
> > >and _rtl8821ae_reset_pcie_interface_dma.
> > >
> > >Any ideas where to do this?
> > 
> > Thanks for the extended debugging.
> > 
> > I was able to repeat your findings. With the 8-bit read of
> > REG_DBI_RDATA, I got poor connection stability. Reverting that part
> > made it stable again. For that reason, I pushed the partial
> > reversion of commit 40b368af4b75 ("rtlwifi: Fix alignment issues").
> 
> That's great you were able to reproduce, thanks!
> [...]
> I'm still pondering a few more theories;
> 
> - change write_readback, it is true now, and the while()/udelay in
>   _rtl8821ae_dbi_read seems a waste, it never executes,

My test kernel "-qb" was write_readback = false in sw.c, with 8-bit
read of REG_DBI_RDATA, and has been stable for four hours.  I'll focus
on some more testing of this one.  It is a surprise.

http://dev.laptop.org/~quozl/z/1dutXk.txt (dmesg)

Observe how REG_DBI_FLAG+0 is briefly seen as 1, which doesn't happen
with write_readback = true.

> - clearing REG_DBI_CTRL write enable bits at the end of
>   _rtl8821ae_dbi_write,

My test kernel "-qc" had reset of REG_DBI_ADDR as last step in both
_rtl8821ae_dbi_read and _rtl8821ae_dbi_write, and was very unstable,
not able to connect.

http://dev.laptop.org/~quozl/y/1dutbX.txt (git diff v4.13)
http://dev.laptop.org/~quozl/z/1dutuM.txt (dmesg)

My test kernel "-qd" had reset of REG_DBI_ADDR as last step in only
_rtl8821ae_dbi_write, and had poor connection stability.

http://dev.laptop.org/~quozl/y/1dutr3.txt (git diff v4.13)
http://dev.laptop.org/~quozl/z/1duuDc.txt (dmesg connection lost)

Based on the above two kernels, clearing REG_DBI_ADDR after a read is
a bad idea, and suggests there is some underlying asynchronicity about
the DBI access.  Almost as if some other condition should signal
completion rather than zero in REG_DBI_FLAG+0.
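
The completion handshake being questioned here can be modelled in user
space.  This is only a sketch, not the driver code: struct fake_dbi,
fake_flag_read() and dbi_wait_idle() are hypothetical names, the start
value 0x2 and the poll limit are assumptions, and the device is
simulated by a countdown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device: the flag byte reads non-zero
 * until 'busy_polls' reads have been made, then reads as zero. */
struct fake_dbi {
	uint8_t flag;        /* models REG_DBI_FLAG+0 */
	unsigned busy_polls; /* reads remaining before the flag clears */
};

static void fake_flag_write(struct fake_dbi *d, uint8_t v, unsigned busy_polls)
{
	d->flag = v;
	d->busy_polls = busy_polls;
}

static uint8_t fake_flag_read(struct fake_dbi *d)
{
	if (d->busy_polls == 0)
		d->flag = 0;
	else
		d->busy_polls--;
	return d->flag;
}

/* Start a transfer, then poll the flag at most 'limit' times.
 * Returns true if the flag was seen as zero, false on timeout.
 * '*polls' reports how many reads were made. */
static bool dbi_wait_idle(struct fake_dbi *d, unsigned busy_polls,
			  unsigned limit, unsigned *polls)
{
	unsigned n = 0;

	fake_flag_write(d, 0x2, busy_polls); /* assumed "start read" value */
	while (n < limit) {
		n++;
		if (fake_flag_read(d) == 0) {
			*polls = n;
			return true;
		}
		/* a real driver would delay here before the next read */
	}
	*polls = n;
	return false;
}
```

In this model a device that is never busy completes on the first read,
and a timeout return gives the caller a way to notice a transfer that
never signals completion.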

> - switching to 32-bit access as used by rtl8192de.

My test kernel "-qe" changed REG_DBI_RDATA read to 32-bit, then used a
union hack to pull out the desired byte, and had poor connection
stability.

http://dev.laptop.org/~quozl/y/1duvIC.txt (git diff v4.13)
http://dev.laptop.org/~quozl/z/1duwI1.txt (dmesg connection lost)

-- 
James Cameron
http://quozl.netrek.org/


Re: rtl8821ae keep alive not set, connection lost

2017-09-20 Thread James Cameron
On Wed, Sep 20, 2017 at 04:48:23PM -0500, Larry Finger wrote:
> On 09/20/2017 04:36 AM, James Cameron wrote:
> >When the problem occurs, register 0x350 bit 25 is set, for which a
> >comment in _rtl8821ae_check_pcie_dma_hang says means there is an RX
> >hang.
> >
> >So perhaps driver should call _rtl8821ae_check_pcie_dma_hang
> >and _rtl8821ae_reset_pcie_interface_dma.
> >
> >Any ideas where to do this?
> 
> Thanks for the extended debugging.
> 
> I was able to repeat your findings. With the 8-bit read of
> REG_DBI_RDATA, I got poor connection stability. Reverting that part
> made it stable again. For that reason, I pushed the partial
> reversion of commit 40b368af4b75 ("rtlwifi: Fix alignment issues").

That's great you were able to reproduce, thanks!

> Where did you detect that bit 25 of register 0x350 was set?

In _rtl8821ae_check_pcie_dma_hang on link up.

REG_DBI_FLAG (0x350 bits 16-31) is observed as;

- 0x on entry to function after warm boot,

- 0x0400 on exit from function; debug bit 26 is set by the function,

- 0x0400 on entry to function on link up when the problem has not
  happened,

- 0x0600 on entry to function on link up when the problem has
  happened.

But I don't know if 0x0600 is useful to detect earlier, or if it is
only a symptom of link down while device active.  Either way, if it
truly does signal an RX hang or firmware RX queue full, it's useful.
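
To make the bit arithmetic explicit: REG_DBI_FLAG is the upper half of
the 32-bit register at 0x350, so 0x0400 there is bit 26 of the full
register and 0x0200 is bit 25.  A small sketch of that decoding follows;
the macro and function names are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers within the 32-bit register at 0x350, derived from the
 * values listed above.  Names are hypothetical. */
#define DBI_BIT_DEBUG_ENABLE 26 /* 0x0400 in the upper 16 bits */
#define DBI_BIT_RX_HANG      25 /* 0x0200 in the upper 16 bits */

/* Test one bit of the 32-bit register, given only the upper
 * 16 bits (the value observed as REG_DBI_FLAG). */
static bool dbi_flag_bit(uint16_t flag, unsigned reg_bit)
{
	assert(reg_bit >= 16 && reg_bit <= 31);
	return (flag >> (reg_bit - 16)) & 1u;
}

/* True when the value carries the bit seen only when the problem
 * has happened. */
static bool dbi_rx_hang(uint16_t flag)
{
	return dbi_flag_bit(flag, DBI_BIT_RX_HANG);
}
```

With the observed values, dbi_rx_hang(0x0400) is false and
dbi_rx_hang(0x0600) is true.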

My "-q9" and "-qa" test kernels dump REG_DBI_CTRL and REG_DBI_FLAG.

"-q9" is with 8-bit read of REG_DBI_RDATA.

"-qa" is with 16-bit read of REG_DBI_RDATA.

My "-qa" test kernel;
http://dev.laptop.org/~quozl/y/1dunwN.txt (git diff v4.13)
http://dev.laptop.org/~quozl/z/1dubX7.txt (dmesg)

REG_DBI_CTRL+3 used by _rtl8821ae_check_pcie_dma_hang is effectively
REG_DBI_FLAG+1 (0x353).

REG_DBI_CTRL is REG_DBI_ADDR; a duplicate register definition.

I'm still pondering a few more theories;

- change write_readback, it is true now, and the while()/udelay in
  _rtl8821ae_dbi_read seems a waste, it never executes,

- clearing REG_DBI_CTRL write enable bits at the end of
  _rtl8821ae_dbi_write,

- switching to 32-bit access as used by rtl8192de.

And a giggle from reviewing the code,
_rtl8821ae_wowlan_initialize_adapter says "Patch Pcie Rx DMA hang
after S3/S4 several times.  The root cause has not be found."
... I've learned that root causes that aren't found tend to cause
further problems later.  ;-)

Given this, my gut feeling is a firmware or silicon problem; RX DMA ceases,
the driver does not detect it, the connection is lost.

-- 
James Cameron
http://quozl.netrek.org/


Re: rtl8821ae keep alive not set, connection lost

2017-09-20 Thread James Cameron
On Tue, Sep 19, 2017 at 07:42:04PM +1000, James Cameron wrote:
> On Thu, Sep 14, 2017 at 07:27:39PM +1000, James Cameron wrote:
> > On Wed, Sep 13, 2017 at 07:39:35PM -0500, Larry Finger wrote:
> > > On 09/13/2017 04:46 PM, James Cameron wrote:
> > > >
> > > >I'll give it some more testing and let you know, but it seems as
> > > >capable of keeping a connection as 4.13 plus my earlier revert.
> > > >
> > 
> > Testing went well; removing the call to enable ASPM was as good as
> > changing the DBI read back to 16-bit width.
> > 
> > > The change I sent earlier should be as good as reverting the change
> > > to write_byte in your reversion.
> > 
> > Yes, that would be the hope.
> > 
> > But with the 16-bit DBI read, the register REG_DBI_CTRL+0 is being
> > read as well, in the first read in _rtl8821ae_enable_aspm_back_door,
> > so perhaps reading that register has an unexpected side-effect.
> > 
> 
> I've ruled that out after testing for several days different kernels
> based on v4.13;
> 
> - add an rtl_read_byte of REG_DBI_CTRL+0 in rtl8821ae_hw_init just
>   after the call to enable_aspm; does not solve problem,
> 
> - add an rtl_read_byte of REG_DBI_CTRL+0 at the start of
>   _rtl8821ae_check_pcie_dma_hang; does not solve problem,

When the problem occurs, register 0x350 bit 25 is set, for which a
comment in _rtl8821ae_check_pcie_dma_hang says means there is an RX
hang.

So perhaps driver should call _rtl8821ae_check_pcie_dma_hang
and _rtl8821ae_reset_pcie_interface_dma.

Any ideas where to do this?

> [...]

-- 
James Cameron
http://quozl.netrek.org/


Re: rtl8821ae keep alive not set, connection lost

2017-09-19 Thread James Cameron
On Thu, Sep 14, 2017 at 07:27:39PM +1000, James Cameron wrote:
> On Wed, Sep 13, 2017 at 07:39:35PM -0500, Larry Finger wrote:
> > On 09/13/2017 04:46 PM, James Cameron wrote:
> > >
> > >I'll give it some more testing and let you know, but it seems as
> > >capable of keeping a connection as 4.13 plus my earlier revert.
> > >
> 
> Testing went well; removing the call to enable ASPM was as good as
> changing the DBI read back to 16-bit width.
> 
> > The change I sent earlier should be as good as reverting the change
> > to write_byte in your reversion.
> 
> Yes, that would be the hope.
> 
> But with the 16-bit DBI read, the register REG_DBI_CTRL+0 is being
> read as well, in the first read in _rtl8821ae_enable_aspm_back_door,
> so perhaps reading that register has an unexpected side-effect.
> 

I've ruled that out after testing for several days different kernels
based on v4.13;

- add an rtl_read_byte of REG_DBI_CTRL+0 in rtl8821ae_hw_init just
  after the call to enable_aspm; does not solve problem,

- add an rtl_read_byte of REG_DBI_CTRL+0 at the start of
  _rtl8821ae_check_pcie_dma_hang; does not solve problem,

Only way to solve the problem at the moment is either;

- reverting 40b368af4b75 ("rtlwifi: Fix alignment issues"), which
  means using rtl_read_word in _rtl8821ae_dbi_read,

or

- removing the two lines that enable ASPM, as you asked me to try.

> Is there any documentation for that register?  I see other code writes
> to REG_DBI_CTRL+3, in _rtl8821ae_check_pcie_dma_hang

I'll repeat and expand on this.  Is there any documentation for this
register, or the other REG_DBI_* registers?

I see that DBI windowed access in rtl8192de is different and yet very
similar.

In rtl8821ae, rtl8723be, and rtl8192de the method seems straightforward;
there are bits for address, bits for write enable by byte, and flag
bits for starting the transfer and completing.

> Evidence of read from REG_DBI_CTRL was captured with an instrumented
> kernel; git diff http://dev.laptop.org/~quozl/y/1dsQ6B.txt yielding
> these dmesg lines;
> 
> [6.010255] rtl_pci: _rtl_pci_update_default_setting const_amdpci_aspm=03
> [6.010338] rtl_pci: rtl_pci_enable_aspm
> [6.034295] ieee80211 phy0: Selected rate control algorithm 'rtl_rc'
> [6.034806] rtlwifi: rtlwifi: wireless switch is on
> [6.196958] rtl8821ae :02:00.0 wlp2s0: renamed from wlan0
> [7.979186] rtl_pci: rtl_pci_disable_aspm
> [7.979306] rtl8821ae: _rtl8821ae_check_pcie_dma_hang
> [8.295360] rtl8821ae: _rtl8821ae_enable_aspm_back_door
> [8.295437] rtl8821ae: _rtl8821ae_dbi_read  070f ->  (@034f)
> [8.295449] rtl8821ae: _rtl8821ae_dbi_write 070f <- ff (@870c)
> [8.295462] rtl8821ae: _rtl8821ae_dbi_read  0719 -> 0200 (@034d)
> [8.295474] rtl8821ae: _rtl8821ae_dbi_write 0719 <- 18 (@2718)
> [8.295477] rtl_pci: rtl_pci_enable_aspm
> [8.469734] rtl_pci: rtl_pci_disable_aspm
> [8.469857] rtl8821ae: _rtl8821ae_check_pcie_dma_hang
> [8.686955] rtl8821ae: _rtl8821ae_enable_aspm_back_door
> [8.687013] rtl8821ae: _rtl8821ae_dbi_read  070f ->  (@034f)
> [8.687025] rtl8821ae: _rtl8821ae_dbi_write 070f <- ff (@870c)
> [8.687038] rtl8821ae: _rtl8821ae_dbi_read  0719 -> 0218 (@034d)
> [8.687050] rtl8821ae: _rtl8821ae_dbi_write 0719 <- 18 (@2718)
> [8.687053] rtl_pci: rtl_pci_enable_aspm
> 
> Observe how the windowed read of DBI register 0x70f causes a read of
> 16-bits at 0x34f, which includes first 8-bits of 0x350 REG_DBI_CTRL.
> 
> By the way, the cold boot value of DBI register 0x719 is 0x00, and
> the warm boot value is 0x18, so I'm confident there isn't a
> comprehensive register reset.  It means that BIOS has relevance; and
> this BIOS is outside my control.  BIOS variation may explain
> difficulty reproducing.

Is there a register for device reset that I can try?  It would help
to exclude BIOS.

> 
> > There has been a report (in Russian unfortunately) at
> > https://www.linux.org.ru/forum/desktop/12620193 of delays in ARP
> > handling.
> 
> Thanks.  I've considered and excluded ARP handling delay.  Though ARP
> renewal is typical reason for device sleep to end.
> 
> With the call to enable ASPM disabled, instead of changing the DBI
> read to 16-bit width, what happens is that the device stops accepting
> data from the access point, packets are buffered there, and are
> transmitted as soon as the device makes the next transmission.
> 
> http://dev.laptop.org/~quozl/z/1dsQBf.txt has the ping and IP tcpdump
> to confirm this.
> 
> I've a monitor mode tcpdump I can send by private mail if required.
> In that the burst of packets shows ICMP echo requests were buffered by
> the access point.
> 

Re: [TDLS PATCH V2 1/5] mac80211: Enable TDLS peer buffer STA feature

2017-09-18 Thread James Cameron
On Tue, Sep 19, 2017 at 10:51:04AM +0800, yint...@qti.qualcomm.com wrote:
> From: Yingying Tang <yint...@qti.qualcomm.com>
> 
> Enable TDLS peer buffer STA feature.
> Set extended capability bit to enable buffer STA when driver
> support it.
> 
> Signed-off-by: Yingying Tang <yint...@qti.qualcomm.com>
> ---
>  include/net/cfg80211.h |3 +++
>  net/mac80211/tdls.c|5 -
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
> index f12fa52..edefc25 100644
> --- a/include/net/cfg80211.h
> +++ b/include/net/cfg80211.h
> @@ -3249,6 +3249,8 @@ struct cfg80211_ops {
>   *   beaconing mode (AP, IBSS, Mesh, ...).
>   * @WIPHY_FLAG_HAS_STATIC_WEP: The device supports static WEP key installation
>   *   before connection.
> + * @WIPHY_FLAG_SUPPORT_TDLS_BUFFER_ST: Device support buffer STA when TDLS is

"_ST:" should be "_STA:"

> + *   established.
>   */
>  enum wiphy_flags {
>   /* use hole at 0 */
> @@ -3275,6 +3277,7 @@ enum wiphy_flags {
>   WIPHY_FLAG_SUPPORTS_5_10_MHZ= BIT(22),
>   WIPHY_FLAG_HAS_CHANNEL_SWITCH   = BIT(23),
>   WIPHY_FLAG_HAS_STATIC_WEP   = BIT(24),
> + WIPHY_FLAG_SUPPORT_TDLS_BUFFER_STA  = BIT(25),
>  };
>  
>  /**
> [...]

-- 
James Cameron
http://quozl.netrek.org/


Re: rtl8821ae keep alive not set, connection lost

2017-09-14 Thread James Cameron
On Wed, Sep 13, 2017 at 07:39:35PM -0500, Larry Finger wrote:
> On 09/13/2017 04:46 PM, James Cameron wrote:
> >
> >I'll give it some more testing and let you know, but it seems as
> >capable of keeping a connection as 4.13 plus my earlier revert.
> >

Testing went well; removing the call to enable ASPM was as good as
changing the DBI read back to 16-bit width.

> The change I sent earlier should be as good as reverting the change
> to write_byte in your reversion.

Yes, that would be the hope.

But with the 16-bit DBI read, the register REG_DBI_CTRL+0 is being
read as well, in the first read in _rtl8821ae_enable_aspm_back_door,
so perhaps reading that register has an unexpected side-effect.

Is there any documentation for that register?  I see other code writes
to REG_DBI_CTRL+3, in _rtl8821ae_check_pcie_dma_hang

Evidence of read from REG_DBI_CTRL was captured with an instrumented
kernel; git diff http://dev.laptop.org/~quozl/y/1dsQ6B.txt yielding
these dmesg lines;

[6.010255] rtl_pci: _rtl_pci_update_default_setting const_amdpci_aspm=03
[6.010338] rtl_pci: rtl_pci_enable_aspm
[6.034295] ieee80211 phy0: Selected rate control algorithm 'rtl_rc'
[6.034806] rtlwifi: rtlwifi: wireless switch is on
[6.196958] rtl8821ae :02:00.0 wlp2s0: renamed from wlan0
[7.979186] rtl_pci: rtl_pci_disable_aspm
[7.979306] rtl8821ae: _rtl8821ae_check_pcie_dma_hang
[8.295360] rtl8821ae: _rtl8821ae_enable_aspm_back_door
[8.295437] rtl8821ae: _rtl8821ae_dbi_read  070f ->  (@034f)
[8.295449] rtl8821ae: _rtl8821ae_dbi_write 070f <- ff (@870c)
[8.295462] rtl8821ae: _rtl8821ae_dbi_read  0719 -> 0200 (@034d)
[8.295474] rtl8821ae: _rtl8821ae_dbi_write 0719 <- 18 (@2718)
[8.295477] rtl_pci: rtl_pci_enable_aspm
[8.469734] rtl_pci: rtl_pci_disable_aspm
[8.469857] rtl8821ae: _rtl8821ae_check_pcie_dma_hang
[8.686955] rtl8821ae: _rtl8821ae_enable_aspm_back_door
[8.687013] rtl8821ae: _rtl8821ae_dbi_read  070f ->  (@034f)
[8.687025] rtl8821ae: _rtl8821ae_dbi_write 070f <- ff (@870c)
[8.687038] rtl8821ae: _rtl8821ae_dbi_read  0719 -> 0218 (@034d)
[8.687050] rtl8821ae: _rtl8821ae_dbi_write 0719 <- 18 (@2718)
[8.687053] rtl_pci: rtl_pci_enable_aspm

Observe how the windowed read of DBI register 0x70f causes a read of
16-bits at 0x34f, which includes first 8-bits of 0x350 REG_DBI_CTRL.
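
The "@" values in the log can be reproduced with a little address
arithmetic.  The formulas below are inferred from those dmesg lines, not
taken from documentation, and the 0x034c base for the read data window
is an assumption that fits the log; treat this as a sketch.

```c
#include <stdint.h>

#define REG_DBI_RDATA 0x034c /* assumed base of the read data window */

/* Offset in register space where the byte for DBI address 'addr'
 * appears after a windowed read. */
static uint16_t dbi_read_offset(uint16_t addr)
{
	return REG_DBI_RDATA + (addr & 3);
}

/* Value written to the address register for a one byte write:
 * the dword-aligned address, plus a write enable bit for the
 * selected byte lane in bits 12 to 15. */
static uint16_t dbi_write_addr(uint16_t addr)
{
	return (uint16_t)((addr & 0xfffc) | (1u << ((addr & 3) + 12)));
}

/* True if a read of 'width' bytes at 'offset' touches 'reg'. */
static int read_touches(uint16_t offset, unsigned width, uint16_t reg)
{
	return reg >= offset && reg < offset + width;
}
```

For DBI address 0x070f the read offset is 0x034f, so a 2-byte read
there touches 0x0350 while a 1-byte read does not.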

By the way, the cold boot value of DBI register 0x719 is 0x00, and
the warm boot value is 0x18, so I'm confident there isn't a
comprehensive register reset.  It means that BIOS has relevance; and
this BIOS is outside my control.  BIOS variation may explain
difficulty reproducing.

> There has been a report (in Russian unfortunately) at
> https://www.linux.org.ru/forum/desktop/12620193 of delays in ARP
> handling.

Thanks.  I've considered and excluded ARP handling delay.  Though ARP
renewal is typical reason for device sleep to end.

With the call to enable ASPM disabled, instead of changing the DBI
read to 16-bit width, what happens is that the device stops accepting
data from the access point, packets are buffered there, and are
transmitted as soon as the device makes the next transmission.

http://dev.laptop.org/~quozl/z/1dsQBf.txt has the ping and IP tcpdump
to confirm this.

I've a monitor mode tcpdump I can send by private mail if required.
In that the burst of packets shows ICMP echo requests were buffered by
the access point.

> According to Google translate is as follows:
> 
> 
> Periodically, Wi-Fi networker rtl8821ae ceases to respond to ARP,
> which causes the Internet to end. Wireshark looks quite interesting:
> ARP replays can be sent by one large packet a few seconds after
> receiving the requests, ie. they seem to be buffered somewhere.

Yes, buffering at access point.

> I need to explore that ENOBUFS return code.

I've seen ENOBUFS up at the application level with ping too, when the
original problem happens with v4.10 plus stable.

> Your case where the device is unresponsive to pings from another NIC
> until the device transmits may also be an ARP problem.
> 
> For completeness, are you using the 2.4 or 5 GHz band? What is the
> make/model of your AP? If possible for you to determine, what firmware
> is it running?

2.4 GHz and 5 GHz reproduces the problem.

Open or WPA reproduces the problem.

Netgear WNDR3800 OpenWrt 12.09-beta, r33312.

Several other access points reproduce the problem, including a
customer's TP-Link TL-WR1042ND with unknown firmware version.

Every access point tried so far reproduces the problem.

Hope that helps, thanks for your ideas.

-- 
James Cameron
http://quozl.netrek.org/


Re: rtl8821ae keep alive not set, connection lost

2017-09-13 Thread James Cameron
On Wed, Sep 13, 2017 at 10:01:37AM -0500, Larry Finger wrote:
> Thank you very much for making the effort to bisect this problem. I
> know that several people have reported the problem, which we cannot
> duplicate; however, most of them just say it drops the connection
> and do nothing more.  In fact, we are lucky to have them even report
> which kernel version they are running!

Yes, in the reported bugs that style is common; almost animistic, very
mystical, and based on heuristics rather than analysis.  ;-)

> As we do not see the problem, we will be relying on you to help
> diagnose the issue. Merely changing the read from 8 to 16 bits
> should not cause any change.

Agreed.

> As _rtl8821ae_dbi_read() is only called from
> _rtl8821ae_enable_aspm_back_door(), we want to test turning off
> ASPM. The following patch will accomplish this. Unfortunately, the
> patch is white-space damaged, thus you will need to apply it
> manually. Please try it to see if it helps your connection
> loss. Note that ASPM settings are preserved through a module
> unload/reload sequence. Thus you will need to reboot after
> rebuilding the driver.

Went back to 4.13, added your test patch, and built kernel.

http://dev.laptop.org/~quozl/z/1dsFOW.txt is dmesg.

New symptom occurs; after 23 seconds since last transmission, the
device becomes unresponsive to ping from another host, but begins to
respond if the device transmits.  Flurry of responses then it settles
down to regular ping.

64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=39 ttl=64 time=1.71 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=40 ttl=64 time=1.93 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=41 ttl=64 time=1.71 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=42 ttl=64 time=1.66 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=43 ttl=64 time=1.70 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=44 ttl=64 time=1.69 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=45 ttl=64 time=37.7 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=46 ttl=64 time=383 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=47 ttl=64 time=11464 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=48 ttl=64 time=10465 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=49 ttl=64 time=9465 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=50 ttl=64 time=8466 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=51 ttl=64 time=7466 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=52 ttl=64 time=6466 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=53 ttl=64 time=5466 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=54 ttl=64 time=4467 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=55 ttl=64 time=3467 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=56 ttl=64 time=2468 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=57 ttl=64 time=1468 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=58 ttl=64 time=469 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=59 ttl=64 time=1.79 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=60 ttl=64 time=1.75 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=61 ttl=64 time=1.72 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=62 ttl=64 time=1.68 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=63 ttl=64 time=1.68 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=64 ttl=64 time=1.95 ms
64 bytes from nl3-e.lan (10.0.0.94): icmp_seq=65 ttl=64 time=1.68 ms
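
The round trip times for icmp_seq 47 to 58 fall by about one second per
packet, which is what buffering followed by a single release would look
like.  The sketch below works that out, assuming ping sent one request
per second (the default interval; the command line used is not shown),
so request N left at N * 1000 ms.  Function and array names are
hypothetical.

```c
#include <stddef.h>

/* Arrival time in ms of reply 'seq', assuming requests are sent
 * once per second, request 'seq' leaving at seq * 1000 ms. */
static long arrival_ms(long seq, long rtt_ms)
{
	return seq * 1000 + rtt_ms;
}

/* Difference between the latest and earliest arrival in a run. */
static long arrival_spread_ms(const long *seq, const long *rtt_ms, size_t n)
{
	long lo = arrival_ms(seq[0], rtt_ms[0]);
	long hi = lo;

	for (size_t i = 1; i < n; i++) {
		long t = arrival_ms(seq[i], rtt_ms[i]);

		if (t < lo)
			lo = t;
		if (t > hi)
			hi = t;
	}
	return hi - lo;
}

/* icmp_seq 47 to 58 and their times from the output above. */
static const long delayed_seq[] = { 47, 48, 49, 50, 51, 52,
				    53, 54, 55, 56, 57, 58 };
static const long delayed_rtt[] = { 11464, 10465, 9465, 8466, 7466, 6466,
				    5466, 4467, 3467, 2468, 1468, 469 };
```

Under that assumption the twelve delayed replies all arrive within 5 ms
of each other, at about 58.47 s on this scale.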

I'll give it some more testing and let you know, but it seems as
capable of keeping a connection as 4.13 plus my earlier revert.

-- 
James Cameron
http://quozl.netrek.org/


rtl8821ae keep alive not set, connection lost

2017-09-12 Thread James Cameron
Summary: 40b368af4b75 ("rtlwifi: Fix alignment issues") breaks
rtl8821ae keep alive, causing "Connection to AP lost" and deauth, but
why?

Wireless connection is lost after a few seconds or minutes, on every
OLPC NL3 laptop with rtl8821ae, with any stable kernel after 4.10.1,
and any kernel with 40b368af4b75.

dmesg contains

  wlp2s0: Connection to AP 2c:b0:5d:a6:86:eb lost

iw event shows

  wlp2s0: del station 2c:b0:5d:a6:86:eb
  wlp2s0 (phy #0): deauth 74:c6:3b:09:b5:0d -> 2c:b0:5d:a6:86:eb reason 4: Disassociated due to inactivity
  wlp2s0 (phy #0): disconnected (local request)

Workaround is to bounce the link, then reconnect;

  ip link set wlp2s0 down
  ip link set wlp2s0 up
  iw dev wlp2s0 connect qz

A nearby monitor host captures a deauthentication packet sent by the
device.

Bisection showed cause is 40b368af4b75 ("rtlwifi: Fix alignment
issues") which changes the width of DBI register read.

On the face of it, 40b368af4b75 looks correct, especially compared
against same function in rtl8723be.

I've no idea why reverting fixes the problem.  I'm hoping someone here
might speculate and suggest ways to test.

As keep alive is set through this path, my guess is that keep alive is
not being set in the device.  Or perhaps reading 16-bits perturbs
another register.  Is there a way to test?

http://dev.laptop.org/~quozl/z/1drtGD.txt dmesg of 4.13

http://dev.laptop.org/~quozl/z/1drt7c.txt dmesg with 4.13 and revert
of 40b368af4b75

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH] libertas: Fix lbs_prb_rsp_limit_set()

2017-06-23 Thread James Cameron
On Fri, Jun 23, 2017 at 06:17:38PM +0300, Dan Carpenter wrote:
> The kstrtoul() test was reversed so this always returned -ENOTSUPP.
> 
> Fixes: 27d7f47756f4 ("net: wireless: replace strict_strtoul() with kstrtoul()")
> Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com>

Reviewed-by: James Cameron <qu...@laptop.org>

-- 
James Cameron
http://quozl.netrek.org/


Re: [ldv-project] [net] libertas: potential race condition

2016-06-14 Thread James Cameron
On Tue, Jun 14, 2016 at 05:16:11PM +0400, Pavel Andrianov wrote:
> 08.06.2016 02:51, James Cameron wrote:
> >On Tue, Jun 07, 2016 at 09:39:55AM -0500, Dan Williams wrote:
> >>On Tue, 2016-06-07 at 13:30 +0400, Pavel Andrianov wrote:
> >>>Hi!
> >>>
> >>>There is a potential race condition in
> >>>drivers/net/wireless/libertas/libertas.ko.
> >>>In the function lbs_hard_start_xmit(..), line 159, a socket buffer
> >>>is
> >>>written to priv->current_skb with a spin_lock protection.
> >>>In the function lbs_mac_event_disconnected(..), lines 50-51, the
> >>>field
> >>>current_skb is cleaned. There is no protection used. The
> >>>corresponding
> >>>handlers are activated at the same time in lbs_start_card(..) and
> >>>then
> >>>may be executed simultaneously. Note, there are two structures
> >>>lbs_netdev_ops and mesh_netdev_ops, which have the target handler
> >>>lbs_hard_start_xmit.
> >>>Is it a real race or I have missed something?
> >>Yeah, it looks like it should be grabbing priv->driver_lock before
> >>clearing priv->currenttxskb in lbs_mac_event_disconnected().  Care to
> >>submit a patch after testing?  Do you have any of that hardware?
> >I've hardware, with serial console.
> >
> >Can test any patch, on USB (8388) or SDIO (8686).
> >
> Hi!
> 
> I've prepared the patch for this issue. Could you test it?
> 
> Thank you.

Tested on OLPC XO-1 (usb8388) and XO-1.5 (sd8686) with v4.7-rc3.

Confirmed that lbs_mac_event_disconnected is being called on the
station when hostapd on access point is given SIGHUP.

Longer duration test was;

- SSH to station and run "top -d 0.2",

- send SIGHUP every six seconds, for 300 cycles.

You may add my;

Tested-by: James Cameron <qu...@laptop.org>

-- 
James Cameron
http://quozl.netrek.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ldv-project] [net] libertas: potential race condition

2016-06-07 Thread James Cameron
On Tue, Jun 07, 2016 at 09:39:55AM -0500, Dan Williams wrote:
> On Tue, 2016-06-07 at 13:30 +0400, Pavel Andrianov wrote:
> > Hi!
> > 
> > There is a potential race condition in 
> > drivers/net/wireless/libertas/libertas.ko.
> > In the function lbs_hard_start_xmit(..), line 159, a socket buffer
> > is 
> > written to priv->current_skb with a spin_lock protection.
> > In the function lbs_mac_event_disconnected(..), lines 50-51, the
> > field 
> > current_skb is cleaned. There is no protection used. The
> > corresponding 
> > handlers are activated at the same time in lbs_start_card(..) and
> > then 
> > may be executed simultaneously. Note, there are two structures 
> > lbs_netdev_ops and mesh_netdev_ops, which have the target handler 
> > lbs_hard_start_xmit.
> > Is it a real race or I have missed something?
> 
> Yeah, it looks like it should be grabbing priv->driver_lock before
> clearing priv->currenttxskb in lbs_mac_event_disconnected().  Care to
> submit a patch after testing?  Do you have any of that hardware?

I've hardware, with serial console.

Can test any patch, on USB (8388) or SDIO (8686).

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH] mwifiex: add __GFP_REPEAT to skb allocation call

2016-03-28 Thread James Cameron
On Tue, Mar 29, 2016 at 12:47:20PM +0800, Wei-Ning Huang wrote:
> "single skb allocation failure" happens when system is under heavy
> memory pressure.  Add __GFP_REPEAT to skb allocation call so kernel
> attempts to reclaim pages and retry the allocation.

Oh, that's interesting, we're back to this symptom again.

Nice to see this fix.

Heavy memory pressure on 3.5 caused dev_alloc_skb failure in this
driver.  Tracked at OLPC as #12694.

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH] iw: Memory leak in error condition

2015-09-30 Thread James Cameron
On Thu, Oct 01, 2015 at 01:01:18AM +0200, Ola Olsson wrote:
> From 5239e8e9aa79a131b716398efbf7a1203decbd9b Mon Sep 17 00:00:00 2001
> From: Ola Olsson <ola.ols...@sonymobile.com>
> Date: Thu, 1 Oct 2015 00:43:06 +0200
> Subject: [PATCH] iw: Memory leak in error condition Signed-off-by: Ola
>  Olsson <ola.ols...@sonymobile.com>
> 
> ---
>  scan.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/scan.c b/scan.c
> index e959c1b..f248981 100644
> --- a/scan.c
> +++ b/scan.c
> @@ -446,6 +446,8 @@ static int handle_scan(struct nl80211_state *state,
> if (ies || meshid) {
> tmpies = (unsigned char *) malloc(ies_len + meshid_len);
> if (!tmpies)
> +   free(ies);
> +   free(meshid);
> goto nla_put_failure;

Braces?  { }
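
Without braces only free(ies) is governed by the if; free(meshid) and
the goto would then run on every pass, not just on allocation failure.
A braced version of the same idea, as a self-contained sketch: join_ies()
and failing_alloc() are hypothetical names, and the injectable allocator
exists only so the error path can be exercised.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper modelled on the hunk above.  On allocation
 * failure it frees 'ies' and 'meshid' and returns NULL; on success
 * the caller keeps ownership of both inputs. */
static unsigned char *join_ies(unsigned char *ies, size_t ies_len,
			       unsigned char *meshid, size_t meshid_len,
			       void *(*alloc)(size_t))
{
	unsigned char *tmpies = alloc(ies_len + meshid_len);

	if (!tmpies) {
		/* With braces, both frees and the early exit run
		 * only when the allocation failed. */
		free(ies);
		free(meshid);
		return NULL;
	}
	if (ies)
		memcpy(tmpies, ies, ies_len);
	if (meshid)
		memcpy(tmpies + ies_len, meshid, meshid_len);
	return tmpies;
}

/* Allocator that always fails, for testing the error path. */
static void *failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}
```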


> if (ies) {
> memcpy(tmpies, ies, ies_len);
> -- 
> 1.7.9.5

-- 
James Cameron
http://quozl.linux.org.au/


Re: question about potential integer truncation in mwifiex_set_wapi_ie and mwifiex_set_wps_ie

2015-09-29 Thread James Cameron
On Tue, Sep 29, 2015 at 05:21:28PM +0200, PaX Team wrote:
> hi all,
> 
> in drivers/net/wireless/mwifiex/sta_ioctl.c the following functions
> 
>   mwifiex_set_wpa_ie_helper
>   mwifiex_set_wapi_ie
>   mwifiex_set_wps_ie
> 
> can truncate the incoming ie_len argument from u16 to u8 when it gets
> stored in mwifiex_private.wpa_ie_len, mwifiex_private.wapi_ie_len and
> mwifiex_private.wps_ie_len, respectively. based on some light code
> reading it seems a length value of 256 is valid (IEEE_MAX_IE_SIZE and
> MWIFIEX_MAX_VSIE_LEN seem to limit it) and thus would get truncated
> to 0 when stored in those u8 fields. the question is whether this is
> intentional or a bug somewhere.

i agree, while there is a test to ensure ie_len is not greater than
256, there is a possibility that it will be exactly 256, which means
256 bytes will be given to memcpy but
mwifiex_private.{wpa,wapi,wps}_ie_len will be zero.
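
a small sketch of the truncation, outside the driver.  the helper names
are hypothetical and only mimic storing the length in a u8 or a u16 field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: stores a 16-bit length in an 8-bit field. */
uint8_t store_len_u8(uint16_t ie_len)
{
	assert(ie_len <= 256);	/* mirrors the "not greater than 256" test */
	return (uint8_t)ie_len;	/* 256 does not fit, and wraps to 0 */
}

/* Hypothetical helper: the same store with a 16-bit field. */
uint16_t store_len_u16(uint16_t ie_len)
{
	assert(ie_len <= 256);
	return ie_len;		/* 256 is preserved */
}
```

with these, a length of 256 comes back as 0 from the u8 version and as 256
from the u16 version; every smaller length is unchanged in both.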

i suggest changing the lengths to u16.  not tested.

diff --git a/drivers/net/wireless/mwifiex/main.h 
b/drivers/net/wireless/mwifiex/main.h
index fe12560..b66e9a7 100644
--- a/drivers/net/wireless/mwifiex/main.h
+++ b/drivers/net/wireless/mwifiex/main.h
@@ -512,14 +512,14 @@ struct mwifiex_private {
struct mwifiex_wep_key wep_key[NUM_WEP_KEYS];
u16 wep_key_curr_index;
u8 wpa_ie[256];
-   u8 wpa_ie_len;
+   u16 wpa_ie_len;
u8 wpa_is_gtk_set;
struct host_cmd_ds_802_11_key_material aes_key;
struct host_cmd_ds_802_11_key_material_v2 aes_key_v2;
u8 wapi_ie[256];
-   u8 wapi_ie_len;
+   u16 wapi_ie_len;
u8 *wps_ie;
-   u8 wps_ie_len;
+   u16 wps_ie_len;
u8 wmm_required;
u8 wmm_enabled;
u8 wmm_qosinfo;

-- 
James Cameron
http://quozl.linux.org.au/


Re: How to set a scan on a given frequency

2015-08-31 Thread James Cameron
On Mon, Aug 31, 2015 at 03:53:18PM -0500, Shengrong Yin wrote:
> Hello,
> 
> I was using iw to scan a given frequency.
> For example,
> iw wlan0 scan freq 2412 | grep freq:
> However, the result was scanned ssids with different frequencies
> across 2.4 GHz band, which is
> freq: 2462
> freq: 2462
> freq: 2437
> freq: 2412
> ...
> Why this happened? Shouldn't it return only the ssid with 2412?

No.  A radio receiver in a wireless device can receive beacons on
adjacent frequencies to the frequency it is tuned for.  The signal
strength will be lower, but not low enough to prevent receive.

If you want to restrict results to the frequency you are interested
in, then filter the data after you have received it from the kernel.

But the data returned to you isn't the frequency of the received radio
burst, but is the frequency value in the beacon packet.  Usually this
is the same, but faulty devices, deceptive devices, or high speed
movement could make it different.

You should specify a frequency in your scan request if you can,
because it shortens the time taken by the scan.  If you do not specify
a frequency, then the scan must be repeated for every channel.  There
is a time cost for switching, and a time spent listening on each
channel.
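
As a rough sketch of filtering after the fact, assuming the scan results
have already been copied into an array: struct bss_entry and
filter_by_freq() are hypothetical names for illustration, and a real
program would fill the array from the BSS attributes returned over nl80211.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record holding what a scan result reports for one BSS. */
struct bss_entry {
	unsigned int freq_mhz;	/* frequency value reported for the BSS */
	const char *ssid;
};

/*
 * Compacts the array in place, keeping only the entries whose reported
 * frequency equals want_mhz.  Returns the number of entries kept.
 */
size_t filter_by_freq(struct bss_entry *entries, size_t n,
		      unsigned int want_mhz)
{
	size_t i, kept = 0;

	assert(entries != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (entries[i].freq_mhz == want_mhz)
			entries[kept++] = entries[i];
	return kept;
}
```

The comparison uses the reported frequency, so the caveat above about
faulty or deceptive devices still applies.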

-- 
James Cameron
http://quozl.linux.org.au/


Re: [PATCH] rtlwifi: rtl8723be: disable FW control power save

2015-08-10 Thread James Cameron
On Mon, Aug 10, 2015 at 06:47:05PM +0800, Kai Heng Feng wrote:
> Do you use Ubuntu Trusty (14.04)?

Yes, that's what I'm using.

> The rtl8723be firmware is not up-to-date in Trusty's linux-firmware.
> You can grab the newer one from the upstream linux-firmware.

Thanks, but both seem to work fine (apart from the issue in this
thread), and I don't have a list of what has changed.

-- 
James Cameron
http://quozl.linux.org.au/


Re: [PATCH] rtlwifi: rtl8723be: disable FW control power save

2015-08-09 Thread James Cameron
On Mon, Aug 10, 2015 at 11:38:09AM +0800, AceLan Kao wrote:
> I tried using ips=0 today and found it's not working.
> I ping 8.8.8.8 and got below message within one hour.
> ping: sendmsg: No buffer space available

I use both ips=0 and fwlps=0 with Ubuntu kernel and with 4.1, and the
connection remains stable.

fwlps=0 alone was not enough.

ips=0 alone was not enough.

Hope that helps.

-- 
James Cameron
http://quozl.linux.org.au/


Re: [PATCH v4] Add new mac80211 driver mwlwifi.

2015-06-30 Thread James Cameron
On Tue, Jun 30, 2015 at 09:18:44AM -0500, Dan Williams wrote:
 On Tue, 2015-06-30 at 10:21 +0200, Johannes Berg wrote:
  On Tue, 2015-06-30 at 01:49 +, David Lin wrote:
   +++ b/drivers/net/wireless/mwlwifi/Kconfig
   @@ -0,0 +1,17 @@
   +config MWLWIFI
   + tristate "Marvell Wireless WiFi driver (mwlwifi)"
   + depends on PCI && MAC80211 && MWIFIEX_PCIE=n
  
  I still think you need to get rid of this so we can build-test this
  driver properly.
 
 The OLPC 8388 is another device that has two drivers, libertas and
 libertas_tf.

Also 8686.

 I don't think there's any protection between then, you get
 whatever gets loaded first by the kernel.  In that case, I think the
 answer was either (a) only put the driver you want onto the system,
 or

Yes, for end-user.

 (b) manually manage from userspace.

Yes, for developer testing.

 Given that this Marvell hardware is likely intended for more
 customized use-cases (AP, embedded, etc?)  perhaps this would be an
 acceptable option for now...
 
 I tend to agree with Johannes here; the builder of the kernel can
 certainly adjust CONFIG_MWLWIFI and CONFIG_MWIFIEX to fit their
 scenario, including leaving both enabled.
 
 Dan
 
   + select FW_LOADER
   + select OF
  
  This looks OK, though I get a very strange dependency loop warning from
  Kconfig here.
  
  Looks like the driver now builds almost cleanly with sparse/smatch on
  64-bit.
  
  Two warnings remain, both are bugs:
  
   writew(0x00, (void __iomem *)priv->pcmd_buf[1]);
  
  cannot be right. This memory isn't __iomem, it's dma_alloc_coherent, so
  a simple write should be done.
  
  in mwl_rx_ring_init:
  
   rx_hndl->psk_buff =
   dev_alloc_skb(desc->rx_buf_size);
   
   if (skb_linearize(rx_hndl->psk_buff)) {
  
  *crash*. You also later check rx_hndl-psk_buff, but well after it
  already crashed.
  
  Also, this code sequence is utterly bogus. Please try to understand why
  and then remove it.
  
  You should also use paged RX since you're allocating *very large* buffers.
  We found that even alloc_pages(1) will fail eventually, you're
  doing an order-2 allocation here for every RX skb. At least used paged
  RX to get it down to order-1.
  
  johannes

-- 
James Cameron
http://quozl.linux.org.au/


Re: [Bug ?] 10ec:b723 Realtek RTL8723BE wireless card drops connection

2015-06-15 Thread James Cameron
On Sun, Jun 14, 2015 at 06:13:56PM -0500, Larry Finger wrote:
 On 06/14/2015 06:04 PM, James Cameron wrote:
 On Sun, Jun 14, 2015 at 10:10:30AM -0500, Larry Finger wrote:
 To address your problem, power saving does not work correctly on
 this device. That is why there are numerous posts on the web telling
 people to use ips=0. It seems that Ubuntu people never look at
 anything but the Ubuntu literature; however, I'm sure that I posted
 this suggestion there as well. The Realtek group is currently
 rewriting the entire dynamic management code for all their drivers.
 When complete, this should improve performance and should help the
 power-save condition. No, I do not know when the new code will be
 ready, or how much improvement it will make.
 
 Thanks for summary.
 
 OLPC is also seeing the issue.  Power saving mode impacts battery run
 time; one of our design goals.  ips=0 seems to solve with 3.19, but
 not fully with 4.1-rc7; still some periods of packet loss.
 
 I offer to test any rtl8723be changes.
 
 Please do a bisection between 4.1-rc7 and 3.19.

Thanks.  But I was too hasty in reporting a good result.

Now no difference across those kernel versions; still some
periods of packet loss with ips=0.

Workaround is to use both ips=0 and fwlps=0.  We'll ship with that
unless we hear of a fix.

-- 
James Cameron
http://quozl.linux.org.au/


Re: [Bug ?] 10ec:b723 Realtek RTL8723BE wireless card drops connection

2015-06-14 Thread James Cameron
On Sun, Jun 14, 2015 at 10:10:30AM -0500, Larry Finger wrote:
> To address your problem, power saving does not work correctly on
> this device. That is why there are numerous posts on the web telling
> people to use ips=0. It seems that Ubuntu people never look at
> anything but the Ubuntu literature; however, I'm sure that I posted
> this suggestion there as well. The Realtek group is currently
> rewriting the entire dynamic management code for all their drivers.
> When complete, this should improve performance and should help the
> power-save condition. No, I do not know when the new code will be
> ready, or how much improvement it will make.

Thanks for summary.

OLPC is also seeing the issue.  Power saving mode impacts battery run
time; one of our design goals.  ips=0 seems to solve with 3.19, but
not fully with 4.1-rc7; still some periods of packet loss.

I offer to test any rtl8723be changes.

(I'm also looking into IBSS, because Sugar desktop relies on
ad-hoc.  No beacons on creating an IBSS through NetworkManager, but
beacons are fine with iw dev wlan0 ibss join x 2437.  But I'm not
yet ready to report problem; still some debugging to do.)

-- 
James Cameron
http://quozl.linux.org.au/


Re: [RFC/PATCH] mwifiex: Driver - Firmware Glitches

2015-04-16 Thread James Cameron
On Thu, Apr 16, 2015 at 11:10:02AM +0200, Florian Achleitner wrote:
> Is the necessity of frequent hardware resets a commonly known issue
> with this chip/firmware? Anybody else experiencing these?

Yes, but how frequent?

> Currently, we see three different scenarios. One of them is
> currently not answered by reset. Refer to the upcoming patch.
>
> (1)  mwifiex_cmd_timeout_func: Timeout cmd .. Ok, after reset.

See this a lot during heavy testing.

> (2)  Firmware wakeup failed.. Ok, after reset.

Never see this.

> (3) DNLD_CMD: host to card failed. No reset triggered. See patch.

Very rarely see this.

However, our experience may not be comparable; we are using 8787 with
a 3.5 kernel, because we haven't the resources to use a later kernel
or get backports working.

Also, we use WOL (wake on lan) heavily; frequent automatic suspends,
with a GPIO wakeup in addition to the SDHCI.

-- 
James Cameron
http://quozl.linux.org.au/


Re: [PATCH] mwifiex: increase number of probes for specific SSID scans

2015-04-15 Thread James Cameron
On Wed, Apr 15, 2015 at 02:01:44AM -0700, Amitkumar Karwar wrote:
 Hi James,
 
  
  On Tue, Apr 14, 2015 at 07:49:16AM -0700, Amitkumar Karwar wrote:
   It's been observed that device sometimes fails to find AP configured
   in hidden SSID in busy environment. We will increase number of probes
   for specific SSID scans for getting better results.
  
  I don't like this.  It worries me.  What is the underlying cause?  If it
  is something other than collision, why?
  
 
 Idea was to have better chance of finding an AP configured with
 hidden SSID when environment is busy by sending multiple probe
 requests.

Yes, I understand the intention, but I don't understand why busy
environment should cause missed probe response from hidden SSID AP.

Speculating ...

Have you tested this?  Are you sure the probe request is being sent
when the channel is clear?  Are collisions detected?  Is recovery from
collision correct?

Are you sure it isn't caused by scan results being too large in busy
environment?  Is scan for specific SSID given priority in scan
results, by firmware?

I ask because I'm curious; perhaps there is something else happening
to cause scan failure.

I have reports of scan failure with mwifiex, with 8686 and 8787, but
I've not been able to prove the cause of the problem, because of high
complexity of testing.  Customer usually unwilling to go into depth.

  In scenario of tens to a hundred laptops scanning for specific SSID for
  ad-hoc in the Sugar desktop environment, this patch may decrease free
  air time considerably.
 
 You are right. Free air time will be decreased. We have discarded
 this approach considering its consequences.
 
  
  Should the number of probes be a choice of user space?
  
 
 Do you see any potential use case for multiple probe requests? 

No use case that doesn't risk interference.  I've used it in
diagnosis, and in Open Firmware driver.

 I think, we should stick to current implementation of sending 1
 probe request.

That's fine.

 Regards,
 Amitkumar

-- 
James Cameron
http://quozl.linux.org.au/


Re: [PATCH] mwifiex: increase number of probes for specific SSID scans

2015-04-14 Thread James Cameron
On Tue, Apr 14, 2015 at 07:49:16AM -0700, Amitkumar Karwar wrote:
> It's been observed that device sometimes fails to find AP
> configured in hidden SSID in busy environment. We will increase
> number of probes for specific SSID scans for getting better results.

I don't like this.  It worries me.  What is the underlying cause?  If
it is something other than collision, why?

In scenario of tens to a hundred laptops scanning for specific SSID
for ad-hoc in the Sugar desktop environment, this patch may decrease
free air time considerably.

Should the number of probes be a choice of user space?

-- 
James Cameron
http://quozl.linux.org.au/


Re: [PATCH 2/2] mwifiex: recover from skb allocation failures during RX

2015-03-23 Thread James Cameron

On 24/03/2015, at 1:20 AM, Avinash Patil wrote:

> From: Zhaoyang Liu <li...@marvell.com>
>
> This patch adds recovery mechanism for SDIO RX during SKB allocation
> failures.
> For allocation failures during multiport aggregation, we skip and drop RX
> packets.
> For single port read case, we will use preallocated card->mpa_rx.buf to
> complete cmd53 read.

Thanks.

Dropping RX data packets is considered safe, as the peer will retry; but does 
your patch drop events or command responses?  

Last year, I tried something similar, and I found that the driver would be 
confused if command responses were dropped.

--
James Cameron



Re: ARP dropped during WPA handshake

2015-03-13 Thread James Cameron
On Fri, Mar 13, 2015 at 11:29:34AM -0500, Dan Williams wrote:
 On Fri, 2015-03-13 at 16:53 +0100, voncken wrote:
Below, a tcpdump capture from sta.
17:43:12.964096 EAPOL key (3) v2, len 95
17:43:12.998439 EAPOL key (3) v1, len 117
17:43:13.062409 ARP, Request who-has 10.32.61.100 tell 10.32.0.1,
length 28
17:43:13.079989 EAPOL key (3) v2, len 151
17:43:13.082764 EAPOL key (3) v1, len 95
17:43:14.062381 ARP, Request who-has 10.32.61.100 tell 10.32.0.1,
length 28
17:43:14.127101 ARP, Reply 10.32.61.100 is-at b8:88:e3:45:1d:c6 (oui
Unknown), length 46
17:43:14.127123 IP 10.69.1.201.41690 > 10.32.61.100.5001: UDP, length
1470
17:43:14.127136 IP 10.69.1.201.41690 > 10.32.61.100.5001: UDP, length
1470
   
You can see the ARP request during the WPA Handshake.
   
   During the initial WPA handshake the connection is not fully set up, and 
   so
   no general traffic can (nor should) pass between the STA and AP.
   That includes ARP and any L2/L3+ protocols, except for EAP and wifi
   management packets.
   
   The interface itself must be IFF_UP before it can pass traffic, including 
   the
   WPA handshake traffic.  IFF_UP only means that the interface can be
   configured at the L2 level and the hardware is active, it does *not* mean 
   the
   interface can pass traffic.
   
   Whatever is causing the ARPs shouldn't be doing that yet, and should be 
   fixed
   to use the interface's operstate or IFF_LOWER_UP instead of IFF_UP.  
   Only
   when the supplicant changes the interface's operstate to IF_OPER_UP is the
   interface *actually* ready to pass traffic.  IFF_UP is not sufficient.
   
  
  Thanks for your reply. 
  
  It seems wpa_supplicant set the operstate to IF_OPER_DORMANT when he 
  received the ASSOCIATED Event from the driver (through netlink). And set 
  the operstate to IF_OPER_UP in case of wpa handshake success.
  
  Is it normal the local ip stack send arp when netdev it is on 
  IF_OPER_DORMANT state?
 
 I'm not sure the kernel stack cares much as long as the device is up.
 It is requesting the ARP because some application is attempting to
 communicate with that IP address.  That application should probably be
 waiting until the interface is actually ready to communicate, which
 means IF_OPER_UP.
 
 But if this is the first WPA handshake with the AP during the initial
 connection, the wifi device shouldn't even have an IP address yet, so
 nothing should be doing ARP on the interface yet.

I thought that ARP was a means to get an IP address before an
interface had an IP address, so the interface spends some time without
an IP address yet generating ARP.
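
For reference, Dan's point about waiting for operstate rather than IFF_UP
could be checked from user space roughly as below.  This is only a sketch:
the function names are hypothetical, and it assumes the operstate attribute
under /sys/class/net is readable on the system in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True only for "up", with or without the trailing newline sysfs adds. */
bool operstate_is_up(const char *state)
{
	assert(state != NULL);
	return strcmp(state, "up") == 0 || strcmp(state, "up\n") == 0;
}

/*
 * Reads /sys/class/net/<ifname>/operstate and reports whether it is "up".
 * Returns false if the file cannot be opened or read.
 */
bool iface_ready(const char *ifname)
{
	char path[128], state[32];
	FILE *f;
	bool ok;

	assert(ifname != NULL);

	snprintf(path, sizeof(path), "/sys/class/net/%s/operstate", ifname);
	f = fopen(path, "r");
	if (!f)
		return false;
	ok = fgets(state, sizeof(state), f) != NULL && operstate_is_up(state);
	fclose(f);
	return ok;
}
```

A "dormant" state, as set during the handshake, is treated as not ready.
Some drivers report "unknown"; whether to accept that is a policy choice
this sketch does not make.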

 Perhaps whatever is assigning the IP address to the interface is
 doing it too early, before the interface is IF_OPER_UP?
 
 Dan
   
  
   
   
Any suggestion will be appreciate.
   
Cedric.

  Thanks for your help.
 
  Cedric Voncken
 
 

-- 
James Cameron
http://quozl.linux.org.au/


Re: [PATCHv2 9/9] mwifiex: delay skb allocation for RX until cmd53 over

2015-03-13 Thread James Cameron
On Fri, Mar 13, 2015 at 05:37:59PM +0530, Avinash Patil wrote:
 From: Zhaoyang Liu li...@marvell.com
 
 This patch moves SKB allocation for RX packets from current
 place i.e. after reading MP regs to place where we already
 have read data from SDIO bus ie after cmd53.
 
 mp_rx_aggr_setup has been modified accordingly to set
 skb_arr to NULL.
 
 Signed-off-by: Zhaoyang Liu li...@marvell.com
 Signed-off-by: Shengzhen Li s...@marvell.com
 Reviewed-by: Amitkumar Karwar akar...@marvell.com
 Reviewed-by: Cathy Luo c...@marvell.com
 Reviewed-by: Avinash Patil pat...@marvell.com
 ---
  drivers/net/wireless/mwifiex/sdio.c | 59 
 ++---
  drivers/net/wireless/mwifiex/sdio.h |  8 ++---
  2 files changed, 33 insertions(+), 34 deletions(-)
 
 diff --git a/drivers/net/wireless/mwifiex/sdio.c 
 b/drivers/net/wireless/mwifiex/sdio.c
 index fdeeb67..330e9d0 100644
 --- a/drivers/net/wireless/mwifiex/sdio.c
 +++ b/drivers/net/wireless/mwifiex/sdio.c

[snip]

 @@ -1538,24 +1550,11 @@ static int mwifiex_process_int_status(struct 
 mwifiex_adapter *adapter)
   rx_len);
   return -1;
   }
 - rx_len = (u16) (rx_blocks * MWIFIEX_SDIO_BLOCK_SIZE);
  
 - skb = mwifiex_alloc_dma_align_buf(rx_len,
 -   GFP_KERNEL |
 -   GFP_DMA);
 -
 - if (!skb) {
 - dev_err(adapter->dev, "%s: failed to alloc skb",
 - __func__);
 - return -1;
 - }

I like it.

Because I continue to have problems with dev_alloc_skb failing, and
the return -1; that you are removing doesn't seem to leave the card
and driver in a useful state.

Your patch is hopefully an improvement.

Have you done any testing of response after skb allocation failure
before and after your patch?

-- 
James Cameron
http://quozl.linux.org.au/


Re: [PATCH v2 1/3] mwifiex: add support for SD8801

2015-01-22 Thread James Cameron
On Fri, Jan 23, 2015 at 05:09:17PM +0530, Avinash Patil wrote:
> From: Yogesh Ashok Powar <yoge...@marvell.com>
>
> SD8801 is Marvell's 1x1 802.11bgn offering.
> This patch adds Device IDs for SD8801 and also defines card
> structure which has definition for register offsets, buffer sizes etc.
>
> Signed-off-by: Yogesh Ashok Powar <yoge...@marvell.com>
> Signed-off-by: Avinash Patil <pat...@marvell.com>
> Signed-off-by: Nishant Sarmukadam <nisha...@marvell.com>
> Signed-off-by: Cathy Luo <c...@marvell.com>
> Signed-off-by: Frank Huang <fra...@marvell.com>

Reviewed-by: James Cameron qu...@laptop.org

(Not tested, still on 8787 with 3.5).

-- 
James Cameron
http://quozl.linux.org.au/


[RFC PATCH] mwifiex: handle command response in aggregation

2015-01-22 Thread James Cameron
Firmware does occasionally pass a command response to the host on the
data port.  Ensure it is processed.

http://dev.laptop.org/ticket/12749
---
Seen on device firmwares:

14.66.9.p96
14.66.35.p52

Others not tested.

 drivers/net/wireless/mwifiex/sdio.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/mwifiex/sdio.c 
b/drivers/net/wireless/mwifiex/sdio.c
index 933dae1..8fe6147 100644
--- a/drivers/net/wireless/mwifiex/sdio.c
+++ b/drivers/net/wireless/mwifiex/sdio.c
@@ -1240,8 +1240,7 @@ static int mwifiex_sdio_card_to_host_mp_aggr(struct 
mwifiex_adapter *adapter,
/* copy pkt to deaggr buf */
skb_deaggr = card->mpa_rx.skb_arr[pind];
 
-   if ((pkt_type == MWIFIEX_TYPE_DATA) && (pkt_len <=
-card->mpa_rx.len_arr[pind])) {
+   if (pkt_len <= card->mpa_rx.len_arr[pind]) {
 
memcpy(skb_deaggr->data, curr_ptr, pkt_len);
 
@@ -1251,7 +1250,7 @@ static int mwifiex_sdio_card_to_host_mp_aggr(struct 
mwifiex_adapter *adapter,
mwifiex_decode_rx_packet(adapter, skb_deaggr,
 pkt_type);
} else {
-   dev_err(adapter->dev, "wrong aggr pkt:"
+   dev_err(adapter->dev, "bad aggr pkt:"
 " type=%d len=%d max_len=%d\n",
pkt_type, pkt_len,
card->mpa_rx.len_arr[pind]);
-- 
1.9.1


-- 
James Cameron
http://quozl.linux.org.au/


[PATCH] mwifiex: simplify ad hoc join capability info

2014-11-10 Thread James Cameron
While preparing an ad-hoc start command, the capability info bitmap is
needlessly set from the command, and then the ESS bit cleared.

Change to set the bitmap directly without reference to the command.

Signed-off-by: James Cameron qu...@laptop.org
---
 drivers/net/wireless/mwifiex/join.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/mwifiex/join.c 
b/drivers/net/wireless/mwifiex/join.c
index 8d6c259..411a6c2 100644
--- a/drivers/net/wireless/mwifiex/join.c
+++ b/drivers/net/wireless/mwifiex/join.c
@@ -880,9 +880,7 @@ mwifiex_cmd_802_11_ad_hoc_start(struct mwifiex_private 
*priv,
 
/* Set Capability info */
bss_desc->cap_info_bitmap |= WLAN_CAPABILITY_IBSS;
-   tmp_cap = le16_to_cpu(adhoc_start->cap_info_bitmap);
-   tmp_cap &= ~WLAN_CAPABILITY_ESS;
-   tmp_cap |= WLAN_CAPABILITY_IBSS;
+   tmp_cap = WLAN_CAPABILITY_IBSS;
 
/* Set up privacy in bss_desc */
if (priv->sec_info.encryption_mode) {
-- 
1.9.1


-- 
James Cameron
http://quozl.linux.org.au/


Re: [Patch v2 3/5] mwifiex: fix out of memory issue observed for USB chipsets

2014-11-04 Thread James Cameron
On Wed, Nov 05, 2014 at 05:04:29PM +0530, Amitkumar Karwar wrote:
> On some platforms, system goes out of memory during heavy
> Rx traffic with our USB chipsets.
>
> In case of SDIO/PCIe, after receiving 50 packets in Rx queue
> we stop processing interrupts till packets pending fall below
> low threshold i.e 20. We don't have similar logic for USB,
> so if host platform is slow, we would hit a case where firmware
> keeps on pushing packets at high speed than driver/kernel can
> process.
>
> We will stop submitting URBs for Rx data when pending packet
> count reaches high threshold and restart them when enough
> packets are consumed to solve the problem.

Other drivers and user activity can deplete memory.  How does this
patch solve the problem when dev_alloc_skb fails?  I'm worried the
underlying issue remains; handling out of memory.

-- 
James Cameron
http://quozl.linux.org.au/


Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

2014-10-14 Thread James Cameron
On Tue, Oct 14, 2014 at 10:25:01AM +0200, Belisko Marek wrote:
 Hi Amitkumar,
 
 On Tue, Oct 14, 2014 at 9:08 AM, Amitkumar Karwar akar...@marvell.com wrote:
  Hi Marek,
 
   I tried both (slightly modified as we're in 3.9 kernel) but
   issue is still reproducible. My patch against 3.9 sources:
 
  Thanks a lot for the tests.
 
   One thing which is not yet still clear to me why kernel console
   is completely unresponsive when receiving packets in high
   rates. When use iperf (on client) with -b40m it is OK but when
   increase to -b100m then console is completely unresponsive until
   iperf finish.
 
  Does the system recover when -b100M iperf is finished? Can we
  run iperf with -b40M later?  Do you see dev_alloc_skb failed
  messages in dmesg when console is unresponsive?
 When we get dev_alloc_skb failed then interface is dead (cannot
 ping ...) so no recovery is possible only system reboot.

This symptom was familiar to me, but on sdio.c, which is very
different code.

I've had a brief look at usb.c and offer the following comments:

- a list of six data endpoint urb is allocated in mwifiex_usb_rx_init,
  because MWIFIEX_RX_DATA_URB is 6,

- when data endpoint urb is submitted, a new skb is allocated, in
  mwifiex_usb_submit_rx_urb, and this is the only source of
  dev_alloc_skb failed message,

- in normal situation, when data endpoint urb is complete, skb is
  either freed or handed up to mwifiex_usb_recv, and the urb is
  resubmitted, which causes a new skb to be allocated.

- if dev_alloc_skb failed message appears, one data endpoint urb has
  been lost and is not re-used,

- if six dev_alloc_skb failed messages appear, the interface should
  be dead for data receive only.

Amitkumar mentioned this on 9th October; corresponding URB won't get
submitted.  I think this should be fixed; dev_alloc_skb should be
harmless failure, please retry.
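
The behaviour listed above can be modelled outside the driver as follows.
This is a simplified sketch, not the code in usb.c: the struct and function
names are hypothetical, and alloc_ok stands in for whether dev_alloc_skb
succeeded.  Only the count of six comes from MWIFIEX_RX_DATA_URB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RX_DATA_URBS 6	/* value of MWIFIEX_RX_DATA_URB */

/* Hypothetical model of one data endpoint urb. */
struct rx_urb {
	bool submitted;
};

/* Submits one urb; on allocation failure the urb is simply lost. */
bool submit_rx_urb(struct rx_urb *urb, bool alloc_ok)
{
	assert(urb != NULL);
	urb->submitted = alloc_ok;
	return alloc_ok;
}

/* Counts the urbs still in flight. */
size_t count_submitted(const struct rx_urb *urbs, size_t n)
{
	size_t i, count = 0;

	assert(urbs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (urbs[i].submitted)
			count++;
	return count;
}

/* One possible retry pass: resubmit every lost urb.  Returns the number recovered. */
size_t resubmit_lost(struct rx_urb *urbs, size_t n, bool alloc_ok)
{
	size_t i, recovered = 0;

	assert(urbs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!urbs[i].submitted && submit_rx_urb(&urbs[i], alloc_ok))
			recovered++;
	return recovered;
}
```

In this model, six failed submissions leave nothing in flight, and a later
retry pass with memory available brings all six back.  Where such a retry
would be triggered from in the driver is left open.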

I don't see why interface is dead with only one dev_alloc_skb
failed message.

 I don't see  dev_alloc_skb failed when console is unresponsive.
 
   Any other ideas
   what to change to check? Thanks.
 
  Could you please share dmesg log with dynamic debug enabled (using
  attached script) captured when the problem occurs?
 I tried to capture logs but when enable DYNAMIC_DEBUG I cannot
 reproduce issue (running test  30 minutes without allocation
 failure).

Yes, I've seen similar; turn on debugging, and timing critical bug
goes away.

Serial console?  If so, try turning it off, and logging to dmesg
buffer only.

-- 
James Cameron
http://quozl.linux.org.au/


Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

2014-09-17 Thread James Cameron
On Wed, Sep 17, 2014 at 03:52:52AM -0700, Amitkumar Karwar wrote:
 Hi BR,
 
  Dear Amitkumar Karwar,
  
  some additional info.
  
  On Thu, Sep 11, 2014 at 5:09 PM, Amitkumar Karwar akar...@marvell.com
  wrote:
   Hi BR,
  
  
   I'm using 3.9 mainline mwifiex driver for wireless usb card. Doing
   some throughput testing (with iperf) in 5GHz I got following
  failures:
   [ 221.521799] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
   failed
  
   This is skb allocation failure returned by kernel. 4k buffer is
  always allocated for Rx packets. This issue doesn't seem to be specific
  to 5Ghz.
  Yes you're right. I can reproduce issue also with 2.4GHz (doing iperf
  testing as mentioned in other email) by pinging device with card.
  
  
   I checked which which size fails to allocate and it's 4096 bytes. I
   was looking to changes in never kernel releases but I cannot find
   anything obvious. When connected to 2.4GHz I cannot reproduce issue
   though. I'm using FW version mwifiex 1.0 (14.68.29.p26).
  
  
   Could you please provide the platform details?
   How often the problem occurs during throughput testing? Are there any
  specific steps?
  One more observation is that when problem occurred complete system is
  unresponsive (console is almost completely dead).
 
 Thanks for the more information.
 Skb alloc failure should be gracefully handled. We will look into
 this issue.

If you get time, I'd also appreciate a look into the issue on sdio.c
during data receive.

When dev_alloc_skb fails the interrupt handler does not rewind
the driver state in preparation for a retry.  This is not graceful.

http://dev.laptop.org/ticket/12694 has details, and an adequate
solution we are using in 3.5 to rewind the driver state:

http://dev.laptop.org/git/olpc-kernel/commit/?h=arm-3.5&id=59fcaf10cce5bbdc370ec1c262b12aeb66ed1dca

We're using 8787.

 
  I can workaround issue by decreasing iperf bandwidth to ~40m. I think
  in this situation we're running out of memory by exhaustive skb
  allocations.
 
 Actually 6 4K size buffers are being allocated for Rx and Tx data during 
 traffic.
 Probably your platform runs out of memory after these allocations.
 
 Could you please try changing this 
 number(MWIFIEX_TX_DATA_URB/MWIFIEX_RX_DATA_URB macros) to 3?
 
 Regards,
 Amitkumar Karwar

-- 
James Cameron
http://quozl.linux.org.au/