RE: [BISECTED] brcmfmac issue since 4.3.0-rc3

2016-01-04 Thread Hante Meuleman
Hi Carlo

The expection was likely fixed with a patch with the title:

brcmfmac: Fix double free on exception at module load.

The change you found which is causing a crash caused a double free. 

It won't fix the firmware crash, but it will fix the system crash on failure.

Regards,
Hante

-Original Message-
From: carlo.cai...@gmail.com [mailto:carlo.cai...@gmail.com] On Behalf Of Carlo 
Caione
Sent: Friday, December 25, 2015 11:16 AM
To: Arend van Spriel
Cc: Carlo Caione; Hante Meuleman; Arend Van Spriel; kv...@codeaurora.org; 
linux-wireless@vger.kernel.org
Subject: Re: [BISECTED] brcmfmac issue since 4.3.0-rc3

On Fri, Dec 25, 2015 at 10:54 AM, Arend van Spriel  wrote:
> Hi Carlo,
>
> This smells like a firmware crash. Upon the brcmf_bus_start failure we
> bail out on the probe. I suspect in the error path there is an issue
> introduced (or exposed) by the mentioned patch. Did not look further
> into it as the family is giving me the angry look during the holidays ;-)

Hi Arend,
Thank you for the quick reply, this can definitely wait ;-)
Happy holidays and happy new year!

-- 
Carlo Caione



RE: [PATCH FIX?] brcmfmac: fix possible overflows in flowrings code by bumping u8 to u16

2016-02-01 Thread Hante Meuleman
Hi,

Didn’t know that that patch got reverted. Our internal repository still has 
that patch and we have had no problems whatsoever for the last 7 (june 26 this 
got submitted internally) months on any pcie target (also used it on r8000 
openwrt). Rafal's patch is overdone. If you don’t up the hashsize space then 
there is really no use to switch to u16. You can simply limit the nr of 
flowrings to 255 in brcmf_proto_msgbuf_attach or apply the patch I originally 
submitted.

Regards,
Hante

-Original Message-
From: Rafał Miłecki [mailto:zaj...@gmail.com] 
Sent: Sunday, January 31, 2016 12:44 PM
To: Arend van Spriel
Cc: Kalle Valo; linux-wireless@vger.kernel.org; Brett Rudley; Arend Van Spriel; 
Franky (Zhenhui) Lin; Hante Meuleman; brcm80211-dev-list
Subject: Re: [PATCH FIX?] brcmfmac: fix possible overflows in flowrings code by 
bumping u8 to u16

On 31 January 2016 at 10:56, Arend van Spriel  wrote:
> On 31-01-16 01:07, Rafał Miłecki wrote:
>> Some devices may use more than 255 flowings, below is log from BCM4366:
>> [  194.606245] brcmfmac: brcmf_pcie_init_ringbuffers Nr of flowrings is 264
>>
>> At various places we were using u8 which could lead to storing wrong
>> number or infinite loops when indexing incorrectly. Initially this
>> issue was spotted as infinite loop in brcmf_flowring_detach.
>
> There has already been a patch submitted for this [1]. However, because
> you reported issues with it on your device (not sure which one). Did you
> test this patch on that particular device.

I wasn't aware Hante's patch contained changes from this patch. Anyway
the main difference is that my patch doesn't touch
BRCMF_FLOWRING_HASHSIZE.

So my patch:
1) Fixes possible overflows in flowrings

Hante's patch:
1) Fixes possible overflows in flowrings
2) Bumps BRCMF_FLOWRING_HASHSIZE

It was bumping BRCMF_FLOWRING_HASHSIZE that caused problems on my
BCM43602 device back then. Please note BCM43602 wasn't affected by
flowings overflows because it wasn't using more than 255 of them:
brcmfmac: brcmf_pcie_init_ringbuffers Nr of flowrings is 132

The story is different with my BCM4366. I didn't try it with bumping
BRCMF_FLOWRING_HASHSIZE but it suffers from overflows in flowrings as
it seems to be independent issue. It's crucial that BCM4366 uses more
than 255 flowrings:
brcmfmac: brcmf_pcie_init_ringbuffers Nr of flowrings is 264


> I want Hante to review your patch, but indeed this would be 4.5 material
> and probably stable.

I just realized BCM4366 support went into 4.4 not 4.5, so Cc-ing
stable for 4.4+ is probably a good idea.

N�r��yb�X��ǧv�^�)޺{.n�+{��*ޕ�,�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj"��!�i

RE: [PATCHv2] brcm80211:Fix locking region in the function brcmf_sdio_sendfromq

2015-10-05 Thread Hante Meuleman
Hello Nicholas, 

I can understand that if you read the comment that there is supposed to be a 
lock taken, but why would you assume that it is the txq_lock? Two things:

1) The comment that a look should have been taken is wrong. Once upon a time a 
packet could be transmitted/handled from two paths. To protect for concurrency 
the callee of brcmf_sdio_txpkt was assumed to have take some sort of lock to 
protect against this. 
2) Nowadays all frames are put on the txq and then pulled from that queue in 
the DPC. This queue is protected with a lock and therefor a lock is taken 
around the dequeueing of txq in dpc. Also the reason for the chosen var name 
txq_lock. This lock protects for concurrent access to the txq and the lock 
should be taken as short as possible, and therefor this patch could possible 
result in throughput degradation and does not solve anything.

Please do not submit this patch, if anything then remove the comment from the 
function brcmf_sdio_txpkt.

Regards,
Hante

-Original Message-
From: Nicholas Krause [mailto:xerofo...@gmail.com] 
Sent: Monday, October 05, 2015 1:11 AM
To: Brett Rudley
Cc: Arend Van Spriel; Franky (Zhenhui) Lin; Hante Meuleman; 
kv...@codeaurora.org; Pieter-Paul Giesberts; linux-wireless@vger.kernel.org; 
brcm80211-dev-list; net...@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: [PATCHv2] brcm80211:Fix locking region in the function 
brcmf_sdio_sendfromq

This fixes the locking region in the function brcmf_sdio_sendfromq
to properly lock around the call to the function brcmrf_sdio_txpkt
in order to avoid concurrent access issues when calling this
function as stated in the function's comments about the caller
being required to lock around the call to this particular function.

Signed-off-by: Nicholas Krause 
---
v2
Fix issue with deadlock occurrence due to not unlocking before
break in for loop if the variable i is equal to zero
 drivers/net/wireless/brcm80211/brcmfmac/sdio.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/brcm80211/brcmfmac/sdio.c 
b/drivers/net/wireless/brcm80211/brcmfmac/sdio.c
index f990e3d..260f063 100644
--- a/drivers/net/wireless/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/sdio.c
@@ -2388,11 +2388,13 @@ static uint brcmf_sdio_sendfromq(struct brcmf_sdio 
*bus, uint maxframes)
break;
__skb_queue_tail(&pktq, pkt);
}
-   spin_unlock_bh(&bus->txq_lock);
-   if (i == 0)
+   if (i == 0) {
+   spin_unlock_bh(&bus->txq_lock);
break;
+   }
 
ret = brcmf_sdio_txpkt(bus, &pktq, SDPCM_DATA_CHANNEL);
+   spin_unlock_bh(&bus->txq_lock);
 
cnt += i;
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: brcmf_txfinalize misses 802.1x packet leading to infinite WARNINGs

2016-09-15 Thread Hante Meuleman
Hi Rafal,

Thank you for the extensive debugging. We are looking into this. Arend wrote
yesterday to ask for detailed timing on wen eapol is inserted. We want this
so we can increase the timeout. This is not a "nice" way to solve the
problem, and it should be solved in firmware, but in the meanwhile we do
want to increase timer, because we think that ampdu issues can rise at any
given moment and even with changes/updates in firmware it might be necessary
to increase timeout.
Second problem is harder, it is good to see that the frame gets returned to
driver at some point. Our biggest worry is that a frame remains indefinitely
in the firmware, but that appears not to be the case. Now why could this
fail. There is one possible reason I found, and that is when a flowring is
deleted while it holds the eapol, see flowring.c. It does not call the
brcmf_txfinalize, but frees the packet directly. I think this is wrong but
need to investigate this in more detail. In the meanwhile, if you keep doing
tests I would like to ask you to add a WARN_ON() call to the function
__brcmu_pkt_buf_free_skb where you print ***BUG*** so we know where the
packet got freed from.

Regards,
Hante

-Original Message-
From: Rafał Miłecki [mailto:zaj...@gmail.com]
Sent: Thursday, September 15, 2016 10:12 AM
To: Hante Meuleman; Arend van Spriel; brcm80211-dev-l...@broadcom.com
Cc: linux-wireless@vger.kernel.org; Rafał Miłecki
Subject: brcmf_txfinalize misses 802.1x packet leading to infinite WARNINGs

Hi,

Yesterday I explained on OpenWrt forum [0] that there are 2 problems
leading to WARNINGs triggered by brcmf_netdev_wait_pend8021x.

The first one is firmware problem with A-MPDU implementation. I already
reported this in "AMPDU stalls with brcmfmac4366b-pcie.bin triggering
WARNINGs" e-mail thread [1].

Another one (I'm reporting right now) is related to brcmfmac and its
counting of 802.1x packets. The idea is simple:
1) In ndo_start_xmit callback there is check for ETH_P_PAE and code
   increasing 802.1x counter by 1.
2) In brcmf_txfinalize there is check for ETH_P_PAE and code decreasing
   802.1x counter by 1.
This is needed as some operations have to be handled without any 802.1x
packet pending.

Thanks to my debugging code (you can find it at the end) I just noticed
that:
1) Sometimes (very rarely) brcmf_txfinalize doesn't detect 802.1x packet
2) brcmu_pkt_buf_free_skb gets called and skb gets freed
3) Counter remains not-decreased and brcmf_netdev_wait_pend8021x will
   always time out.

I'm not sure why this could be happening. The check for ETH_P_PAE looks
exactly the same in both places. My only idea so far is firmware
corrupting skb data. This fools brcmfmac code and my debugging code was
lucky enough to keep tracing skbs (by their addresses) to notice that.

I'm going to extend my debugging patch by making a copy of eth header
and comparing it with corrupted one. As this bug occurs very rarely it
make take days or weeks to get any update.

I'd love to hear any comment meanwhile.

[0] https://forum.openwrt.org/viewtopic.php?pid=338235#p338235
[1] https://marc.info/?t=14738321621&r=1&w=2

[ 1438.965889] brcmfmac: CONSOLE: 028168.028 wl0.3: wlc_send_bar: seq 0xee
tid 0
[ 1438.993255] brcmfmac: CONSOLE: 028168.055 wl0.3: wlc_send_bar: seq 0xef
tid 0
[ 1439.000743] brcmfmac: CONSOLE: 028168.060 wl0.3: wlc_send_bar: seq 0xef
tid 0
[ 1439.008060] brcmfmac: CONSOLE: 028168.067 wl0.3: wlc_send_bar: seq 0xef
tid 0
[ 1439.015568] brcmfmac: CONSOLE: 028168.073 wl0.3: wlc_send_bar: seq 0xef
tid 0
[ 1439.022786] brcmfmac: CONSOLE: 028168.077 wl0.3: wlc_send_bar: seq 0xef
tid 0
[ 1439.030150] brcmfmac: CONSOLE: 028168.078 wl0.3: wlc_send_bar: seq 0xef
tid 0
[ 1439.037342] brcmfmac: CONSOLE: 028168.081 wl0.3: wlc_send_bar: seq 0xef
tid 0
[ 1439.044554] brcmfmac: CONSOLE: 028168.088 wl0.3: wlc_send_bar: seq 0xef
tid 0
[ 1439.051756] brcmfmac: CONSOLE: 028168.090 wl0.3: wlc_send_bar: seq 0xef
tid 0
[ 1439.362540] brcmfmac: CONSOLE: 028168.425 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.369827] brcmfmac: CONSOLE: 028168.431 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.378367] brcmfmac: CONSOLE: 028168.432 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.385590] brcmfmac: CONSOLE: 028168.434 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.392839] brcmfmac: CONSOLE: 028168.435 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.400041] brcmfmac: CONSOLE: 028168.438 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.407226] brcmfmac: CONSOLE: 028168.440 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.414848] brcmfmac: CONSOLE: 028168.447 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.422069] brcmfmac: CONSOLE: 028168.451 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.429269] brcmfmac: CONSOLE: 028168.454 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.436454] brcmfmac: CONSOLE: 028168.455 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.443653] brcmfmac: CONSOLE: 028168.457 wl0.3: wlc_send_bar: seq 0xf0
tid 0
[ 1439.450848] brcmfmac: CO

RE: [PATCH] brcmfmac: Remove waitqueue_active check

2016-03-09 Thread Hante Meuleman
Hi Hui,

Excellent find. Looks like a perfect solution to me.

Regards,
Hante

-Original Message-
From: Hui Wang [mailto:hui.w...@canonical.com]
Sent: Wednesday, March 09, 2016 8:25 AM
To: brud...@broadcom.com; ar...@broadcom.com; fran...@broadcom.com;
meule...@broadcom.com; piete...@broadcom.com
Cc: linux-wireless@vger.kernel.org; brcm80211-dev-l...@broadcom.com;
hui.w...@canonical.com
Subject: [PATCH] brcmfmac: Remove waitqueue_active check

We met a problem of pm_suspend  when repeated closing/opening the lid
on a Lenovo laptop (1/20 reproduce rate), below is the log:

[ 199.735876] PM: Entering mem sleep
[ 199.750516] e1000e: EEE TX LPI TIMER: 0011
[ 199.856638] Trying to free nonexistent resource
<d000-d0ff>
[ 201.753566] brcmfmac: brcmf_pcie_suspend: Timeout on response for
entering D3 substate
[ 201.753581] pci_legacy_suspend(): brcmf_pcie_suspend+0x0/0x1f0
[brcmfmac] returns -5
[ 201.753585] dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -5
[ 201.753589] PM: Device :04:00.0 failed to suspend async: error -5

Through debugging, we found when problem happens, it is not the device
fails to enter D3, but the signal D3_ACK comes too early to pass the
waitqueue_active() check.

Just like this:
brcmf_pcie_send_mb_data(devinfo, BRCMF_H2D_HOST_D3_INFORM);
// signal is triggered here
wait_event_timeout(devinfo->mbdata_resp_wait, devinfo->mbdata_completed,
   BRCMF_PCIE_MBDATA_TIMEOUT);

So far I think it is safe to remove waitqueue_active check since there
is only one place to trigger this signal (sending
BRCMF_H2D_HOST_D3_INFORM). And it is not a problem calling wake_up
event earlier than calling wait_event.

Cc: Brett Rudley 
Cc: Hante Meuleman 
Cc: Franky (Zhenhui) Lin 
Cc: Pieter-Paul Giesberts 
Cc: Arend van Spriel 
Signed-off-by: Hui Wang 
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
index 0480b70..5f12ff3 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
@@ -675,10 +675,8 @@ static void brcmf_pcie_handle_mb_data(struct
brcmf_pciedev_info *devinfo)
brcmf_dbg(PCIE, "D2H_MB_DATA: DEEP SLEEP EXIT\n");
if (dtoh_mb_data & BRCMF_D2H_DEV_D3_ACK) {
brcmf_dbg(PCIE, "D2H_MB_DATA: D3 ACK\n");
-   if (waitqueue_active(&devinfo->mbdata_resp_wait)) {
-   devinfo->mbdata_completed = true;
-   wake_up(&devinfo->mbdata_resp_wait);
-   }
+   devinfo->mbdata_completed = true;
+   wake_up(&devinfo->mbdata_resp_wait);
}
 }

-- 
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: using DMA-API on ARM

2014-12-05 Thread Hante Meuleman
Thank you for investigating. Your analysis matches what was intended to be 
done. Regarding the cast, you're right, I'll improve it. My idea was that long 
long is always 64bit, but when casting directly to long long I get a compiler 
(when using C=2) warning: "warning: right shift by bigger than source value". 
Probably I concluded wrongly that I should cast it to long first. On the 
targets used the high address was always 0, so I got away with it thusfar, but 
indeed it should and shall be fixed.

Regards,
Hante

-Original Message-
From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk] 
Sent: vrijdag 5 december 2014 12:11
To: Arnd Bergmann
Cc: linux-arm-ker...@lists.infradead.org; Arend Van Spriel; linux-wireless; 
brcm80211-dev-list; David Miller; linux-ker...@vger.kernel.org
Subject: Re: using DMA-API on ARM

On Fri, Dec 05, 2014 at 10:52:02AM +0100, Arnd Bergmann wrote:
> I'm still puzzled why you'd need a single dma_sync_single_for_cpu()
> after dma_alloc_coherent though, you should not need any. Is it possible
> that the driver accidentally uses __raw_readl() instead of readl()
> in some places and you are just lacking an appropriate barrier?

Digging into the driver, it looks like individual DMA buffers are
allocated (via brcmf_pcie_init_dmabuffer_for_device) and registered
into a "commonring" layer.

Whenever the buffer is written to, space is first allocated via a call
to brcmf_commonring_reserve_for_write() or
brcmf_commonring_reserve_for_write_multiple(), data written to the
buffer, followed by a call to brcmf_commonring_write_complete().

brcmf_commonring_write_complete() calls two methods at that point:
cr_write_wptr() and cr_ring_bell(), which will be
brcmf_pcie_ring_mb_write_wptr() and brcmf_pcie_ring_mb_ring_bell().

The first calls brcmf_pcie_write_tcm16(), which uses iowrite16(),
which contains the appropriate barrier.  The bell ringing functions
also use ioread*/iowrite*().

So, on the write side, it looks fine from the barrier perspective.

On the read side, brcmf_commonring_get_read_ptr() is used before
a read access to the ring - which calls the cr_update_wptr() method,
which in turn uses an ioread16() call.  After the CPU has read data
from the ring, brcmf_commonring_read_complete() is used, which uses
iowrite16().

So, I don't see a barrier problem on the read side.

However, I did trip over this:

static void *
brcmf_pcie_init_dmabuffer_for_device(struct brcmf_pciedev_info *devinfo,
 u32 size, u32 tcm_dma_phys_addr,
 dma_addr_t *dma_handle)
{
void *ring;
long long address;

ring = dma_alloc_coherent(&devinfo->pdev->dev, size, dma_handle,
  GFP_KERNEL);
if (!ring)
return NULL;

address = (long long)(long)*dma_handle;

Casting to (long) will truncate the DMA handle to 32-bits on a 32-bit
architecture, even if it supports 64-bit DMA addresses.  There's a couple
of other places where this same truncation occurs:

address = (long long)(long)devinfo->shared.scratch_dmahandle;

and

address = (long long)(long)devinfo->shared.ringupd_dmahandle;

In any case, wouldn't using a u64 type for "address" be better - isn't
"long long" 128-bit on 64-bit architectures?

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: using DMA-API on ARM

2014-12-05 Thread Hante Meuleman
The problem is with data coming from device, so DMA from device to host. The 
DMA takes place from device local memory to host memory, where the host memory 
is allocated with dma_alloc_coherent, which we thought should not be cached. 
The host is an ARM (as is the device). The data being DMA'ed ends up in a ring 
buffer. This ring is only being read by host when it is a d2h ring (device to 
host). Each entry in the ring is 32 bytes, and contains a sequence number. The 
sequence number is a modulo 253 and the ring has 256 entries. At some point we 
read a sequence number which was "old". Then we loop to see if the sequence 
number changes. The loop is 1024 times and uses an rmb() call. This does not 
help. After looping 1024 times it is still reading the same value for sequence 
number. Now it can happen that 256 entries further we are still reading this 
old sequence (so iso reading a seqnum which is off by 3, it is off by 6). This 
was an indication that it was cached. So instead of using rmb() we used 
dma_sync_single_for_cpu. When using that call the problem was fixed. Whenever 
an old sequence number was read a single call to dma_sync_single_for_cpu would 
flush the cache and the next read would be correct. 

However: this indicates that dma_alloc_coherent on an ARM target may result in 
a memory buffer which can be cached which conflicts with the API of this 
function. This problem has sofar not been observed on x86 hosts.

Regards,
Hante

-Original Message-
From: Will Deacon [mailto:will.dea...@arm.com] 
Sent: vrijdag 5 december 2014 13:24
To: Russell King - ARM Linux
Cc: Arend Van Spriel; Marek Szyprowski; linux-arm-ker...@lists.infradead.org; 
David Miller; linux-ker...@vger.kernel.org; brcm80211-dev-list; linux-wireless
Subject: Re: using DMA-API on ARM

On Fri, Dec 05, 2014 at 09:45:08AM +, Russell King - ARM Linux wrote:
> On Fri, Dec 05, 2014 at 10:22:22AM +0100, Arend van Spriel wrote:
> > For our brcm80211 development we are working on getting brcmfmac driver
> > up and running on a Broadcom ARM-based platform. The wireless device is
> > a PCIe device, which is hooked up to the system behind a PCIe host
> > bridge, and we transfer information between host and device using a
> > descriptor ring buffer allocated using dma_alloc_coherent(). We mostly
> > tested on x86 and seen no issue. However, on this ARM platform
> > (single-core A9) we detect occasionally that the descriptor content is
> > invalid. When this occurs we do a dma_sync_single_for_cpu() and this is
> > retried a number of times if the problem persists. Actually, found out
> > that someone made a mistake by using virt_to_dma(va) to get the
> > dma_handle parameter. So probably we only provided a delay in the retry
> > loop. After fixing that a single call to dma_sync_single_for_cpu() is
> > sufficient. The DMA-API-HOWTO clearly states that:
> > 
> > """
> > the hardware should guarantee that the device and the CPU can access the
> > data in parallel and will see updates made by each other without any
> > explicit software flushing.
> > """
> > 
> > So it seems incorrect that we would need to do a dma_sync for this
> > memory. That we do need it seems like this memory can end up in
> > cache(?), or whatever happens, in some rare condition. Is there anyway
> > to investigate this situation either through DMA-API or some low-level
> > ARM specific functions.
> 
> It's been a long while since I looked at the code, and the code for
> dma_alloc_coherent() has completely changed since then with the
> addition of CMA.  I'm afraid that anything I would say about it would
> not be accurate without research into the possible paths through that
> code - it's no longer just a simple allocator.
> 
> What you say is correct however: the memory should not have any cache
> lines associated with it, if it does, there's a bug somewhere.
> 
> Also, the memory will be weakly ordered, which means that writes to such
> memory can be reordered.  If ordering matters, barriers should be used.
> rmb() and wmb() can be used for this.
> 
> (Added Marek for comment on dma_alloc_coherent(), Will for comment on
> barrier stuff.)

I'm not quite clear on the issue being seen here: is this on write from
the CPU to the descriptor ring, or the other way around (or both?).

Either way, you need barriers on the CPU side to ensure ordering of
accesses to the buffer. rmb/wmb will work, but are heavier than what you
need (relaxed versions have been proposed on LKML recently).

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: using DMA-API on ARM

2014-12-05 Thread Hante Meuleman
Ok, I'll add the necessary debug to get all the information out, 
but it will take some time to get it done, so I won't have anything 
before Monday.

-Original Message-
From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk] 
Sent: vrijdag 5 december 2014 14:24
To: Hante Meuleman
Cc: Will Deacon; Arend Van Spriel; Marek Szyprowski; 
linux-arm-ker...@lists.infradead.org; David Miller; 
linux-ker...@vger.kernel.org; brcm80211-dev-list; linux-wireless
Subject: Re: using DMA-API on ARM

Please wrap your message - replying to a message which looks like this in
my editor is far from easy, and gives me much more work to /manually/
reformat it before I can reply to it:

On Fri, Dec 05, 2014 at 12:56:45PM +0000, Hante Meuleman wrote:
> The problem is with data coming from device, so DMA from device to host. The $
>
> However: this indicates that dma_alloc_coherent on an ARM target may result i$
>
> Regards,
> Hante

Thanks.

On Fri, Dec 05, 2014 at 12:56:45PM +, Hante Meuleman wrote:
> However: this indicates that dma_alloc_coherent on an ARM target may
> result in a memory buffer which can be cached which conflicts with
> the API of this function.

If the memory has an alias which is cacheable, it is possible for cache
lines to get allocated via that alias, even if the alias has no explicit
accesses to it.

This is something which I've been going on for quite literally /years/ -
mismatched cache attributes can cause unpredictable behaviour.  I've had
a lot of push back from people who are of the opinion that "if it works
for me, then there isn't a problem" and I eventually gave up fighting
the battle, especially as the ARM architecture people weakened my
reasoning behind it by publishing a relaxation of the "no differing
attributes" issue.  This was particularly true of those who wanted to
use ioremap() on system memory - and cases such as
dma_init_coherent_memory().

So, I never fixed this problem in the original DMA allocator code; I
basically gave up with it.  It's a latent bug which did need to be fixed,
and is still present today in the non-CMA case.

The symptoms which you are reporting sound very much like this kind of
problem - the virtual address for the memory returned by
dma_alloc_coherent() will not be cacheable memory - it will have been
remapped using map_vm_area().  However, there could very well be a fully
cacheable lowmem mapping of that memory, which if a read (speculative or
otherwise) will bring a cache line in, and because the caches are VIPT
or PIPT, that cache line can be hit via the non-cacheable mapping too.

What I /really/ need is more evidence of this to tell those disbelievers
where to stick their flawed arguments. :)

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: using DMA-API on ARM

2014-12-08 Thread Hante Meuleman
Still using outlook, but will limit the line length, I hope that works for the
moment. Attached is a log with the requested information, it is a little
bit non-standard though. The dump code from the mm was copied in
the driver and called from there, mapping the prints back to our local
printf, but it should produce the same. I did this because I didn't realize
the table is static.

Some background on the test setup: I'm using a Broadcom reference 
design AP platform with an BRCM 4708 host SOC. For the AP router 
platform the opensource packet OpenWRT was used. Some small
modifications were made to get it to work on our HW. Only one core
is enabled for the moment (no time to figure out how to enable the
other one). Openwrt was configured to use kernel 3.18-rc2 and 
the brcmfmac of the compat-wireless code was updated with our 
latest code (minor patches, which have been submitted already). 
The device used is 43602 pcie device. Some modifications to the build
system were made to enable PCIE. The test is to connect with a 
client to the AP and run iperf (TCP). The test can run for many hours
without a problem, but sometimes fails very quickly.

The log: first the ring allocation info is printed. Starting at
16.124847, ring 2, 3 and 4 are rings used for device to host. In this
log the failure is on a read of ring 3. Ring 3 is 1024 entries of each
16 bytes. The next thing printed is the kernel page tables. Then some 
OpenWRT info and the logging of part of the connection setup. Then at 
1780.130752 the logging of the failure starts. The sequence number is 
modulo 253 with ring size of 1024 matches an "old" entry (read 40, 
expected 52). Then the different pointers are printed followed by 
the kernel page table. The code does then a cache invalidate on the 
dma_handle and the next read the sequence number is correct.

Regards,
Hante



Please wrap your message - replying to a message which looks like this in
my editor is far from easy, and gives me much more work to /manually/
reformat it before I can reply to it:

On Fri, Dec 05, 2014 at 12:56:45PM +, Hante Meuleman wrote:
> The problem is with data coming from device, so DMA from device to host. The $
>
> However: this indicates that dma_alloc_coherent on an ARM target may result i$
>
> Regards,
> Hante

Thanks.

On Fri, Dec 05, 2014 at 12:56:45PM +, Hante Meuleman wrote:
> However: this indicates that dma_alloc_coherent on an ARM target may
> result in a memory buffer which can be cached which conflicts with
> the API of this function.

If the memory has an alias which is cacheable, it is possible for cache
lines to get allocated via that alias, even if the alias has no explicit
accesses to it.

This is something which I've been going on for quite literally /years/ -
mismatched cache attributes can cause unpredictable behaviour.  I've had
a lot of push back from people who are of the opinion that "if it works
for me, then there isn't a problem" and I eventually gave up fighting
the battle, especially as the ARM architecture people weakened my
reasoning behind it by publishing a relaxation of the "no differing
attributes" issue.  This was particularly true of those who wanted to
use ioremap() on system memory - and cases such as
dma_init_coherent_memory().

So, I never fixed this problem in the original DMA allocator code; I
basically gave up with it.  It's a latent bug which did need to be fixed,
and is still present today in the non-CMA case.

The symptoms which you are reporting sound very much like this kind of
problem - the virtual address for the memory returned by
dma_alloc_coherent() will not be cacheable memory - it will have been
remapped using map_vm_area().  However, there could very well be a fully
cacheable lowmem mapping of that memory, which if a read (speculative or
otherwise) will bring a cache line in, and because the caches are VIPT
or PIPT, that cache line can be hit via the non-cacheable mapping too.

What I /really/ need is more evidence of this to tell those disbelievers
where to stick their flawed arguments. :)

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
root@OpenWrt:/# dmesg
[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 3.18.0-rc2 (developer@meuleman-test) (gcc version 
4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43121) ) #66 SMP Mon Dec 8 12:38:39 CET 
2014
[0.00] CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7),c5387d
[0.00] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing 
instruction cache
[0.00] Machine model: Buffalo WZR-1750DHP (BCM4708)
[0.00] Memory policy: Data cache writealloc
[0.00] On node 0 totalpages: 32768
[0.00] free_area_init_node: node 0, pgdat c038ee00, node_mem_map 
c7efb000
[0.00]   Normal zone: 256 pages used for memmap
[0.00]   Normal zone: 0 pages reserved
[ 

RE: [PATCH] brcmfmac: detect & reject faked packet generated by a firmware

2018-01-31 Thread Hante Meuleman
It is an 802.2 frame, more specifically a LLC XID frames. So why it exists?
And more over, why would we crash as an result? Decoding info can be found
here:

https://www.cisco.com/c/en/us/support/docs/ibm-technologies/logical-link-control-llc/12247-45.html#con3

The frame was likely sent by the stack from remote site PC, should be
possible to capture with tcpdump.

I've seen these frames before, but don’t know what they are for. The frame
appears to be correctly encoded. The ethertype, is not a type, but a len
field. The only protocol with such a short len allowed is llc, see also

https://www.savvius.com/networking-glossary/ethernet/frame_formats/

So it is 802.2 (also known as LLC)

Regads,
Hante



-Original Message-
From: Arend van Spriel [mailto:arend.vanspr...@broadcom.com]
Sent: Wednesday, January 31, 2018 3:20 PM
To: Rafał Miłecki
Cc: Rafał Miłecki; Kalle Valo; Franky Lin; Hante Meuleman; Chi-Hsien Lin;
Wright Feng; Pieter-Paul Giesberts; linux-wireless@vger.kernel.org;
brcm80211-dev-list@broadcom.com; brcm80211-dev-l...@cypress.com
Subject: Re: [PATCH] brcmfmac: detect & reject faked packet generated by a
firmware

On 1/31/2018 2:14 PM, Rafał Miłecki wrote:
> On 2018-01-30 12:47, Arend van Spriel wrote:
>> On 1/30/2018 10:09 AM, Rafał Miłecki wrote:
>>> From: Rafał Miłecki 
>>>
>>> When using 4366b1 and 4366c0 chipsets with more recent firmwares
>>> 1) 10.10 (TOB) (r663589)
>>> 2) 10.10.122.20 (r683106)
>>> respectively, it is impossible to use brcmfmac with interface in AP
>>> mode. With the AP interface bridged and multicast used, no STA will
>>> be able to associate; the STA will be immediately disassociated when
>>> attempting to associate.
>>>
>>> Debugging revealed this to be caused by a "faked" packet (generated
>>> by firmware), that is passed to the networking subsystem and then
>>> back to the firmware. Fortunately this packet is easily identified
>>> and can be detected and ignored as a workaround for misbehaving
>>> firmware.
>>
>> I am actually wondering what this packet is. Have you checked in
>> brcmf_msgbuf_process_rx_complete(). I am curious what buflen is there
>> and what eth_type_trans() will do to the packet, ie. what protocol.
>> As everything should be 802.3 we could/should add a length check of
>> 14 bytes.
>
> Did you find anything?

I was going to say no, but below I see I misinterpreted your commit message
and thought we were getting 6 bytes from firmware, but it is 6 bytes +
ETH_HLEN.

> I got some debugging info, hopefully this is what you expected

and more ... :-)

> [  144.356648] brcmfmac: [brcmf_msgbuf_process_rx_complete] msg.msgtype:
> 0x12
> [  144.363559] brcmfmac: [brcmf_msgbuf_process_rx_complete] msg.ifidx:
> 0x00
> [  144.370374] brcmfmac: [brcmf_msgbuf_process_rx_complete] msg.flags:
> 0x80
> [  144.377179] brcmfmac: [brcmf_msgbuf_process_rx_complete] msg.rsvd0:
> 0x00
> [  144.383986] brcmfmac: [brcmf_msgbuf_process_rx_complete]
> msg.request_id: 0x0041
> [  144.391661] brcmfmac: [brcmf_msgbuf_process_rx_complete]
> compl_hdr.status:   0x
> [  144.399156] brcmfmac: [brcmf_msgbuf_process_rx_complete]
> compl_hdr.flow_ring_id: 0x
> [  144.407179] brcmfmac: [brcmf_msgbuf_process_rx_complete]
> metadata_len:   0x
> [  144.414334] brcmfmac: [brcmf_msgbuf_process_rx_complete] data_len:
> 0x0014
> [  144.421227] brcmfmac: [brcmf_msgbuf_process_rx_complete] data_offset:
> 0x
> [  144.428288] brcmfmac: [brcmf_msgbuf_process_rx_complete] flags:
> 0x0001
> [  144.434918] brcmfmac: [brcmf_msgbuf_process_rx_complete] rx_status_0:
> 0x
> [  144.442334] brcmfmac: [brcmf_msgbuf_process_rx_complete] rx_status_1:
> 0x
> [  144.449750] brcmfmac: [brcmf_msgbuf_process_rx_complete] rsvd0:
> 0x0001
> [  144.456724] brcmfmac: [brcmf_msgbuf_process_rx_complete] skb->data:
> ff ff ff ff  ff ff ec 10  7b 5f ?? ??  00 06 00 01  af 81 01
> 00 [  144.467883] brcmfmac: [brcmf_msgbuf_process_rx_complete]
> skb->protocol:  0x0400

Not sure what protocol that is. Can not find it in if_ether.h. Will look in
our firmware repo for it.

Thanks,
Arend

> (just masked 2 bytes of my MAC)
>
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.c
> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.c
> index 1bd4b96..08cdcaf 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.c
> @@ -1172,7 +1172,43 @@ brcmf_msgbuf_process_rx_complete(struct
> brcmf_msgbuf *msgbuf, void *buf)
> 

RE: brcmfmac bus level addressing issues.

2017-07-18 Thread Hante Meuleman
Hi Ian,

I've really no idea what you mean. Brcmf_pcie_select_core is redundant?
Care to try to boot a device without this function? Called all over the
place? Hell no, it is default pointing to PCIE2 and functions which need
to map the window to another core will do so, temporarily, but move it
back to PCIE2, at least that is the idea, may be you found a bug? We are
for sure not going to hide the selecting of the window in the read/write
routines, that would result in a giant amount of overhead. Currently PCIE
devices reach 1.5Gpbs, we need to go faster than that in the near future.
We don't want just change that to make it bit nicer. Why do you need
to see the same in the SDIO and PCIE drivers? SDIO and PCIE differ in many
aspects. Sure some things can be improved in or the other, but they sure
don't need to look alike.

It may be ugly, but thusfar it has not caused bugs (and there won't be
large changes in the near future where it will cause bugs). The concept in
pcie bus part is simple. The main core to select is PCIE2 (once you have
booted and established initial communication with firmware) and every
routine which needs to access another core will change the window
temporarily and set it back once done. Please don't mess with this, it
works, it is clear and it is fast.

Regards,
Hante

-Original Message-
From: Ian Molton [mailto:i...@mnementh.co.uk]
Sent: Tuesday, July 18, 2017 11:45 AM
To: linux-wireless@vger.kernel.org
Cc: Arend Van Spriel; Franky Lin; Hante Meuleman
Subject: RFC: brcmfmac bus level addressing issues.

Hi folks,

Its come to my attention that there is a substantial disparity between the
PCIe and SDIO variants of the driver when it comes to handlign writes via
the backplane.

The SDIO bus code checks, upon every (32 bit) access, wether the backplane
window is in the right place, and only updates it if it has actually
changed.

The PCIe code sets the window *regardless* of wether its changed, on
*every single* write.

The SDIO code has no explicit selection of the window address based on the
core selected.

The PCIe code uses brcmf_pcie_select_core(), which, ultimately, appears to
be totally redundant, due to the above mentioned 32 bit access code
setting the window register regardless of its current value.



Can we standardise how this is supposed to work? Its ugly, and its going
to cause bugs, ultimately. I suspect its probably the cause of the code I
mentioned in my recent patch titled "brcmfmac: HACK - stabilise the value
of ->sbwad in use for some xfer routines."

We really *dont* want to call brcmf_pcie_select_core() all over the place.
Its inefficient, traversing a list as it does, when all it does is return
a pointer that never actually changes, to the core structures that contain
addressing info.

I'd propose we do what I've done in my SDIO patch set - we call
brcmf_chip_get_core() *once* after the chip has been probed, and store the
pointer returned.

The window register setting can be hidden in the read32/write32 buscore
ops, and will never be incorrect from that point, and the code can simply
use a flat address space model. A single if() has got to be less costly
than writing the register on overy single read32/write32...

Anyhow, whatever we decide to do, can we do the same thing in both bus
drivers?

-Ian


RE: brcmfmac bus level addressing issues.

2017-07-19 Thread Hante Meuleman
Dear Ian,

Yes, I was intentionally provocative. Sure, you made no personal attacks,
but this is how you entered my space; I've been working (partrly) for the
last 5 years on brcmfmac and to be honest, I don't work much on it
anymore. But here it is; my screen filled with mails from you with a
number of patches on brcmfmac and some of them were on code I wrote, and
then you use words like:

- horrid
- crap
- awfully
- hideous
- spaghetti mess

No matter what, how impersonal code is supposed to be, I took it personal,
and I got agitated. Now I never directly reply on patches, so I let it be,
but then you start with "brcmfac bus level addressing issues" and you come
with claims I really didn't understand.

Stuff like " The PCIe code sets the window *regardless* of wether its
changed, on *every single* write." Is totally incorrect. Sure if you limit
yourself to the function brcmf_pcie_buscore_{read,write}32(). But you
talked about performance, and msgbuf prototocol is where performance
counts and that don't use those functions. You wrote: " The PCIe code uses
brcmf_pcie_select_core(), which, ultimately, appears to be totally
redundant" and that is simply not true. So I decided to answer that mail
and provoke you a bit. I'm sorry for that, I shouldn't have done that. But
really you may not be directly insulting persons, but when you write: "
Can we standardise how this is supposed to work? Its ugly, and its going
to cause bugs, ultimately" then I just read another negative strong word
(in this case ugly) and it is partly about code I wrote.

You obviously spent some time on creating all these patches, but why
provoke/agitate people? Why use such strong words? You may not consider
them personally, but I just explained why I do. Can you at least
understand that? Just some word of advice and then I hope we can leave it
to that. You can achieve much more without those negative strong words,
the words I listed above are what pop up with me every mail you wrote, not
the code.

Regards,
Hante


-Original Message-
From: Ian Molton [mailto:i...@mnementh.co.uk]
Sent: Wednesday, July 19, 2017 12:45 AM
To: Arend van Spriel; linux-wireless@vger.kernel.org
Cc: Franky Lin; brcm80211-dev-l...@broadcom.com; Hante Meuleman
Subject: Re: brcmfmac bus level addressing issues.

On 18/07/17 21:44, Arend van Spriel wrote:
>
>
>>> Stop. Listen. Fix it.
>
> Hi Ian,
>
> Now stop yourself and listen. This is no collaboration in any sense.

Arend - This is not directed at you - you've been polite, and I've no
issue with you.

I had felt that the initial conflict over this was over, but then Hante
wrote his last post. I may have been a bit blunt about the code initially,
but I made *no personal attacks* up until Hante's post (I've just read
*all* my posts on this thread and checked to be sure). I hop you can
understand that his comments earlier were well out of line. I've actually
chosen to take the day off today, before I replied to him with something
I'd really regret posting. I was furious.

> but you seem to easily disregard us.

Actually, no. You yourself have been helpful and responsive.

> Also you said it yourself all your patches are cleanup. What needs
> fixing is the gscan feature detection and that one is on me.

Fair enough - I guess now that its in the kernel it needs fixing before
4.13 - but I would argue that no new features should go into the driver
for the short term once that is done. Lets give it a good spring clean.

I am, actually, able to put some time into this, and *want* to help - MY
current employer would benefit greatly from this driver becoming stable.

(as it stands, back to back wpasuplpicant runs make it keel over, as does
module unload and reload).

> So I am chasing that although I am actually on vacation.

Dude - enjoy your vacation. Seriously.

> As long as my wife does not notice we are ok :-p

Shhh :)

>>> You could at the very least give some feedback on the 29 patches I
>>> sent cleaning it up.
>
> That really has to wait until I return as well as running some tests
> on older 4329 and some newer chipsets.

Fair. When do you return? No pressure - it just means I know I can hold
off and do something else on my project until then.

> Actually, Hante is not working full-time on this driver. Neither am I.

Fair enough. He did wax lyrical about his "seniority" on the driver
though. Pulling rank does not sit well with me.

> Now please stop the insults when you hit some push back or in generalf
> for that matter.

I'd recommend re-reading these threads. Whilst I'd grant my initial
comments in the first cut of the patchset were a little harsh, they were
NOT personally directed.

Hante crossed that line, but I've said my piece on the matter now. I have
nothing further to say to Hante on the matt