Re: [ath9k-devel] ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2017-01-08 Thread miaoqing

>> Okay, so i was 0, so running UP probably isn't going to help.  r7 is
>> also spec_priv->rfs_chan_spec_scan.
>> 
>> So, I think the question is... how is this NULL - and has it always
>> been NULL...
> 
> The problem appears to be that ath_cmn_process_fft() isn't called that
> often.  When it is, it crashes in ath_cmn_is_fft_buf_full() because
> spec_priv->rfs_chan_spec_scan is NULL when ATH9K_DEBUGFS=n. :-(
> 
> I'm running with ATH9K_DEBUGFS=y now.  If it goes a couple of days
> without crashing, I'll gin up a patch.
> 

A similar patch was applied to ath-next branch: 
https://patchwork.kernel.org/patch/9431163/.

--
Miaoqing
___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2017-01-08 Thread Russell King - ARM Linux
On Wed, Nov 23, 2016 at 07:15:39PM +, Jason Cooper wrote:
> --- oops from v4.8.6 #2 --
> [42059.303625] Unable to handle kernel NULL pointer dereference at virtual address 0020
> [42059.311799] pgd = c0004000
> [42059.314522] [0020] *pgd=
> [42059.318162] Internal error: Oops: 17 [#1] SMP ARM
> [42059.322889] Modules linked in: ath9k ath9k_common ath9k_hw ath
> [42059.328809] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6 #37
> [42059.334755] Hardware name: Marvell Armada 370/XP (Device Tree)
> [42059.340613] task: c0b091c0 task.stack: c0b0
> [42059.345176] PC is at ath_cmn_process_fft+0xa0/0x578 [ath9k_common]
> [42059.351388] LR is at ath_cmn_process_fft+0xc4/0x578 [ath9k_common]
> [42059.357598] pc : []lr : []psr: 8153
> [42059.357598] sp : c0b01cd0  ip :   fp : 
> [42059.369127] r10: c0b034d4  r9 : 0069  r8 : 006c
> [42059.374374] r7 :   r6 : dcfbd340  r5 : c0b03da0  r4 : 
> [42059.380930] r3 : 0001  r2 : 0008  r1 : 0004  r0 : 

Well, the good news is that it's reproducible.

It looks like it could be this:

static int
ath_cmn_is_fft_buf_full(struct ath_spec_scan_priv *spec_priv)
{
	for_each_online_cpu(i)
		ret += relay_buf_full(rc->buf[i]);

where i = 8 (r2) and rc->buf is r7.  That's just a guess though, as
there's precious little to go on with the Code: line - modern GCCs
don't give us much with the Code: line anymore to figure out what's
going on without the exact object files.

e5933000  ldr   r3, [r3]
e1d330b4  ldrh  r3, [r3, #4]
e58d3030  str   r3, [sp, #48]   ; 0x30
ea02      b     1c
e7970102  ldr   r0, [r7, r2, lsl #2]

What makes me wonder though is that if i=8, that means you must have a
system with 9 online CPUs, which is probably unlikely - or maybe that's
the problem, for_each_online_cpu() is going wrong...

If it's not that line of code, I don't see what else it would be based
on the output of my compiler - there's only one case in my disassembly
that corresponds with the single code line that we have to go on, and
it's this:

 a44:   e5983020  ldr   r3, [r8, #32]
 a48:   e793010a  ldr   r0, [r3, sl, lsl #2] <===
 a4c:   ebfe      bl    0
 a50:   e0844000  add   r4, r4, r0
 a54:   e59f9434  ldr   r9, [pc, #1076]
 a58:   e28a2001  add   r2, sl, #1
 a5c:   e3a01004  mov   r1, #4
 a60:   e1a9      mov   r0, r9
 a64:   ebfe      bl    0 <_find_next_bit_le>
 a68:   e5953000  ldr   r3, [r5]
 a6c:   e153      cmp   r0, r3
 a70:   e1a0a000  mov   sl, r0
 a74:   baf2      blt   a44

I'm debating now about whether we need to dump more of the code in the
oops - both before and after the faulting instruction...

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [ath9k-devel] ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2017-01-08 Thread Russell King - ARM Linux
On Wed, Nov 23, 2016 at 08:59:17PM +, Jason Cooper wrote:
> As requested on irc:

Thanks.

>  7f0:   ea02      b     800
>  7f4:   e7970102  ldr   r0, [r7, r2, lsl #2]
>  7f8:   ebfe      bl    0
>  7fc:   e0844000  add   r4, r4, r0
>  800:   e300a000  movw  sl, #0
>  804:   e28b2001  add   r2, fp, #1
>  808:   e340a000  movt  sl, #0
>  80c:   e3a01004  mov   r1, #4
>  810:   e1aa      mov   r0, sl
>  814:   ebfe      bl    0 <_find_next_bit_le>
>  818:   e5953000  ldr   r3, [r5]
>  81c:   e153      cmp   r0, r3
>  820:   e1a0b000  mov   fp, r0
>  824:   e2802008  add   r2, r0, #8
>  828:   baf1      blt   7f4

Okay, so i was 0, so running UP probably isn't going to help.  r7 is
also spec_priv->rfs_chan_spec_scan.

So, I think the question is... how is this NULL - and has it always
been NULL...



Re: [ath9k-devel] ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2017-01-08 Thread Jason Cooper
On Thu, Nov 24, 2016 at 02:06:57PM +0800, miaoq...@codeaurora.org wrote:
> 
> >>Okay, so i was 0, so running UP probably isn't going to help.  r7 is
> >>also spec_priv->rfs_chan_spec_scan.
> >>
> >>So, I think the question is... how is this NULL - and has it always
> >>been NULL...
> >
> >The problem appears to be that ath_cmn_process_fft() isn't called that
> >often.  When it is, it crashes in ath_cmn_is_fft_buf_full() because
> >spec_priv->rfs_chan_spec_scan is NULL when ATH9K_DEBUGFS=n. :-(
> >
> >I'm running with ATH9K_DEBUGFS=y now.  If it goes a couple of days
> >without crashing, I'll gin up a patch.
> >
> 
> A similar patch was applied to ath-next branch:
> https://patchwork.kernel.org/patch/9431163/.

Hmm.  Ok, I'm giving it a spin on my board with SMP=y, ATH9K_DEBUGFS=n
(so the only change from known crashing is the patch) and we'll see how
it goes.

Honestly, though, I think the real problem is when kernels are built
without ATH9K_DEBUGFS.  Did the reporter of the crash say if that was
enabled on his system or not?

I'm concerned that there may be other code lurking that secretly depends
on ATH9K_DEBUGFS being enabled.

thx,

Jason.


Re: [ath9k-devel] ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2017-01-08 Thread Jason Cooper
All,

On Wed, Nov 23, 2016 at 09:40:53PM +, Jason Cooper wrote:
> I'm running with ATH9K_DEBUGFS=y now.  If it goes a couple of days
> without crashing, I'll gin up a patch.

Well, it survived overnight, which it's never done before. :-) I'm
testing the relay_open() NULL patch now.

thx,

Jason.


Re: [ath9k-devel] ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2017-01-08 Thread Jason Cooper
On Wed, Nov 23, 2016 at 09:17:45PM +, Russell King - ARM Linux wrote:
> On Wed, Nov 23, 2016 at 08:59:17PM +, Jason Cooper wrote:
> > As requested on irc:
> 
> Thanks.
> 
> >  7f0:   ea02      b     800
> >  7f4:   e7970102  ldr   r0, [r7, r2, lsl #2]
> >  7f8:   ebfe      bl    0
> >  7fc:   e0844000  add   r4, r4, r0
> >  800:   e300a000  movw  sl, #0
> >  804:   e28b2001  add   r2, fp, #1
> >  808:   e340a000  movt  sl, #0
> >  80c:   e3a01004  mov   r1, #4
> >  810:   e1aa      mov   r0, sl
> >  814:   ebfe      bl    0 <_find_next_bit_le>
> >  818:   e5953000  ldr   r3, [r5]
> >  81c:   e153      cmp   r0, r3
> >  820:   e1a0b000  mov   fp, r0
> >  824:   e2802008  add   r2, r0, #8
> >  828:   baf1      blt   7f4
> 
> Okay, so i was 0, so running UP probably isn't going to help.  r7 is
> also spec_priv->rfs_chan_spec_scan.
> 
> So, I think the question is... how is this NULL - and has it always
> been NULL...

The problem appears to be that ath_cmn_process_fft() isn't called that
often.  When it is, it crashes in ath_cmn_is_fft_buf_full() because
spec_priv->rfs_chan_spec_scan is NULL when ATH9K_DEBUGFS=n. :-(

I'm running with ATH9K_DEBUGFS=y now.  If it goes a couple of days
without crashing, I'll gin up a patch.

thx,

Jason.


Re: [ath9k-devel] ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2017-01-08 Thread Jason Cooper
On Wed, Nov 23, 2016 at 07:51:20PM +, Russell King - ARM Linux wrote:
> On Wed, Nov 23, 2016 at 07:15:39PM +, Jason Cooper wrote:
> > --- oops from v4.8.6 #2 --
> > [42059.303625] Unable to handle kernel NULL pointer dereference at virtual address 0020
> > [42059.311799] pgd = c0004000
> > [42059.314522] [0020] *pgd=
> > [42059.318162] Internal error: Oops: 17 [#1] SMP ARM
> > [42059.322889] Modules linked in: ath9k ath9k_common ath9k_hw ath
> > [42059.328809] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6 #37
> > [42059.334755] Hardware name: Marvell Armada 370/XP (Device Tree)
> > [42059.340613] task: c0b091c0 task.stack: c0b0
> > [42059.345176] PC is at ath_cmn_process_fft+0xa0/0x578 [ath9k_common]
> > [42059.351388] LR is at ath_cmn_process_fft+0xc4/0x578 [ath9k_common]
> > [42059.357598] pc : []lr : []psr: 8153
> > [42059.357598] sp : c0b01cd0  ip :   fp : 
> > [42059.369127] r10: c0b034d4  r9 : 0069  r8 : 006c
> > [42059.374374] r7 :   r6 : dcfbd340  r5 : c0b03da0  r4 : 
> > [42059.380930] r3 : 0001  r2 : 0008  r1 : 0004  r0 : 
> 
> Well, the good news is that it's reproducible.
> 
> It looks like it could be this:
> 
> static int
> ath_cmn_is_fft_buf_full(struct ath_spec_scan_priv *spec_priv)
> {
> 	for_each_online_cpu(i)
> 		ret += relay_buf_full(rc->buf[i]);
> 
> where i = 8 (r2) and rc->buf is r7.  That's just a guess though, as
> there's precious little to go on with the Code: line - modern GCCs
> don't give us much with the Code: line anymore to figure out what's
> going on without the exact object files.
> 
> e5933000  ldr   r3, [r3]
> e1d330b4  ldrh  r3, [r3, #4]
> e58d3030  str   r3, [sp, #48]   ; 0x30
> ea02      b     1c
> e7970102  ldr   r0, [r7, r2, lsl #2]
> 

As requested on irc:


-->8
drivers/net/wireless/ath/ath9k/common-spectral.o:     file format elf32-littlearm


Disassembly of section .text:

...

0754 :
 754:   e92d4ff0  push  {r4, r5, r6, r7, r8, r9, sl, fp, lr}
 758:   e24dd0d4  sub   sp, sp, #212    ; 0xd4
 75c:   e1a04002  mov   r4, r2
 760:   e1a06001  mov   r6, r1
 764:   e58d0024  str   r0, [sp, #36]   ; 0x24
 768:   e3a01000  mov   r1, #0
 76c:   e58d2018  str   r2, [sp, #24]
 770:   e28d0049  add   r0, sp, #73     ; 0x49
 774:   e3a02087  mov   r2, #135        ; 0x87
 778:   ebfe      bl    0
 77c:   e5d44007  ldrb  r4, [r4, #7]
 780:   e20430fd  and   r3, r4, #253    ; 0xfd
 784:   e3530024  cmp   r3, #36         ; 0x24
 788:   13540005  cmpne r4, #5
 78c:   13a04001  movne r4, #1
 790:   03a04000  moveq r4, #0
 794:   13a0      movne r0, #0
 798:   0a01      beq   7a4
 79c:   e28dd0d4  add   sp, sp, #212    ; 0xd4
 7a0:   e8bd8ff0  pop   {r4, r5, r6, r7, r8, r9, sl, fp, pc}
 7a4:   e59d3018  ldr   r3, [sp, #24]
 7a8:   e1d380b4  ldrh  r8, [r3, #4]
 7ac:   e2489003  sub   r9, r8, #3
 7b0:   e0863009  add   r3, r6, r9
 7b4:   e5d30002  ldrb  r0, [r3, #2]
 7b8:   e210      and   r0, r0, #16
 7bc:   e21000ff  ands  r0, r0, #255    ; 0xff
 7c0:   0af5      beq   79c
 7c4:   e59d3024  ldr   r3, [sp, #36]   ; 0x24
 7c8:   e3005000  movw  r5, #0
 7cc:   e3405000  movt  r5, #0
 7d0:   e3e0b000  mvn   fp, #0
 7d4:   e5932000  ldr   r2, [r3]
 7d8:   e5937004  ldr   r7, [r3, #4]
 7dc:   e5923438  ldr   r3, [r2, #1080] ; 0x438
 7e0:   e58d2010  str   r2, [sp, #16]
 7e4:   e5933000  ldr   r3, [r3]
 7e8:   e1d330b4  ldrh  r3, [r3, #4]
 7ec:   e58d3030  str   r3, [sp, #48]   ; 0x30
 7f0:   ea02      b     800
 7f4:   e7970102  ldr   r0, [r7, r2, lsl #2]
 7f8:   ebfe      bl    0
 7fc:   e0844000  add   r4, r4, r0
 800:   e300a000  movw  sl, #0
 804:   e28b2001  add   r2, fp, #1
 808:   e340a000  movt  sl, #0
 80c:   e3a01004  mov   r1, #4
 810:   e1aa      mov   r0, sl
 814:   ebfe      bl    0 <_find_next_bit_le>
 818:   e5953000  ldr   r3, [r5]
 81c:   e153      cmp   r0, r3
 820:   e1a0b000  mov   fp, r0
 824:   e2802008  add   r2, 

Re: [ath9k-devel] ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2017-01-08 Thread Jason Cooper
Hi Russell,

On Wed, Nov 23, 2016 at 07:51:20PM +, Russell King - ARM Linux wrote:
> On Wed, Nov 23, 2016 at 07:15:39PM +, Jason Cooper wrote:
> > --- oops from v4.8.6 #2 --
> > [42059.303625] Unable to handle kernel NULL pointer dereference at virtual address 0020
> > [42059.311799] pgd = c0004000
> > [42059.314522] [0020] *pgd=
> > [42059.318162] Internal error: Oops: 17 [#1] SMP ARM
> > [42059.322889] Modules linked in: ath9k ath9k_common ath9k_hw ath
> > [42059.328809] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6 #37
> > [42059.334755] Hardware name: Marvell Armada 370/XP (Device Tree)
> > [42059.340613] task: c0b091c0 task.stack: c0b0
> > [42059.345176] PC is at ath_cmn_process_fft+0xa0/0x578 [ath9k_common]
> > [42059.351388] LR is at ath_cmn_process_fft+0xc4/0x578 [ath9k_common]
> > [42059.357598] pc : []lr : []psr: 8153
> > [42059.357598] sp : c0b01cd0  ip :   fp : 
> > [42059.369127] r10: c0b034d4  r9 : 0069  r8 : 006c
> > [42059.374374] r7 :   r6 : dcfbd340  r5 : c0b03da0  r4 : 
> > [42059.380930] r3 : 0001  r2 : 0008  r1 : 0004  r0 : 
> 
> Well, the good news is that it's reproducible.
> 
> It looks like it could be this:
> 
> static int
> ath_cmn_is_fft_buf_full(struct ath_spec_scan_priv *spec_priv)
> {
> 	for_each_online_cpu(i)
> 		ret += relay_buf_full(rc->buf[i]);

ahhh, my config has NR_CPUS=4, but this SoC is uniprocessor.  I'm going to
give it a go with SMP=no.  This config is a lightly modified
mvebu_v7_defconfig; however, NR_CPUS isn't set in mvebu_v7_defconfig,
only in multi_v7_defconfig.

I suspect ath9k uses different logic for setting up the relay buffer(s)
than for the code you referenced.

If SMP=no fails to fail ( :-P ) then we'll know where to start digging.

thx,

Jason.


Re: [ath9k-devel] ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2017-01-08 Thread Jason Cooper
Hi Kalle,

On Wed, Nov 23, 2016 at 09:26:42PM +0200, Kalle Valo wrote:
> Jason Cooper  writes:
> > I have a Ubiquiti SR-71 mini-pcie ath9k card in a Globalscale Mirabox
> > board (Marvell Armada 370 SoC).  Every day or so I get a consistent
> > crash that brings down the whole board.  I've attached three oops I
> > captured on the serial port.
> >
> > I looked at the commits from v4.8.6 to v4.9-rc6, and nothing jumped out
> > at me as "this would fix it".  And since it takes a day or so to trigger
> > the oops, bisecting would be a bit brutal.  Does anyone have any insight
> > into this?
> 
> Is this a regression, meaning that it didn't crash on older kernels but
> crashes on newer ones? Or has it always crashed?

iirc, it's always done this.  It's one of my spare wifi backhauls that
spends most of its time in a cardboard box waiting for a task,
collecting dust.  Kinda like the toys in Toy Story.

I pulled it out a month or so ago and the behavior started.  It had
4.2.8 on it at the time.  I upgraded to latest stable a few weeks ago
(v4.8.6) and I'm getting the same issue.

When I originally set it up, it didn't run long enough for me to recall
if the issue occurred.  Best I recall, that was with v4.2.8.

thx,

Jason.


Re: [ath9k-devel] ath9k ARMv7 OOPS in v4.8.6, v4.2.8

2016-11-23 Thread Kalle Valo
Jason Cooper  writes:

> All,
>
> I have a Ubiquiti SR-71 mini-pcie ath9k card in a Globalscale Mirabox
> board (Marvell Armada 370 SoC).  Every day or so I get a consistent
> crash that brings down the whole board.  I've attached three oops I
> captured on the serial port.
>
> I looked at the commits from v4.8.6 to v4.9-rc6, and nothing jumped out
> at me as "this would fix it".  And since it takes a day or so to trigger
> the oops, bisecting would be a bit brutal.  Does anyone have any insight
> into this?

Is this a regression, meaning that it didn't crash on older kernels but
crashes on newer ones? Or has it always crashed?

-- 
Kalle Valo