Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Ben Greear

On 06/07/2018 02:52 PM, Cong Wang wrote:

On Thu, Jun 7, 2018 at 2:41 PM, Ben Greear  wrote:

On 06/07/2018 02:29 PM, Cong Wang wrote:


On Thu, Jun 7, 2018 at 9:06 AM,   wrote:


--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,

flow = list_first_entry(head, struct fq_flow, flowchain);

+   if (WARN_ON_ONCE(!flow))
+   return NULL;
+



How could even possibly list_first_entry() returns NULL?
You need list_first_entry_or_null().



I don't know for certain flow as null, but something was NULL in this method
near that line and it looked like a likely culprit.

I guess possibly tin or fq was passed in as NULL?


A NULL pointer is not always 0. You can trigger a NULL-ptr-def with 0x3c
too, but you are checking against 0 in your patch, that is the problem and
that is why list_first_entry_or_null() exists.



Ahh, I see what you mean, and that is my mistake.  In my case, it did seem to
be a mostly-null deref, not a 0x0 deref.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Cong Wang
On Thu, Jun 7, 2018 at 2:41 PM, Ben Greear  wrote:
> On 06/07/2018 02:29 PM, Cong Wang wrote:
>>
>> On Thu, Jun 7, 2018 at 9:06 AM,   wrote:
>>>
>>> --- a/include/net/fq_impl.h
>>> +++ b/include/net/fq_impl.h
>>> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>>
>>> flow = list_first_entry(head, struct fq_flow, flowchain);
>>>
>>> +   if (WARN_ON_ONCE(!flow))
>>> +   return NULL;
>>> +
>>
>>
>> How could even possibly list_first_entry() returns NULL?
>> You need list_first_entry_or_null().
>>
>
> I don't know for certain flow as null, but something was NULL in this method
> near that line and it looked like a likely culprit.
>
> I guess possibly tin or fq was passed in as NULL?

A NULL pointer is not always 0. You can trigger a NULL-ptr-def with 0x3c
too, but you are checking against 0 in your patch, that is the problem and
that is why list_first_entry_or_null() exists.


Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Ben Greear

On 06/07/2018 02:29 PM, Cong Wang wrote:

On Thu, Jun 7, 2018 at 9:06 AM,   wrote:

--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,

flow = list_first_entry(head, struct fq_flow, flowchain);

+   if (WARN_ON_ONCE(!flow))
+   return NULL;
+


How could even possibly list_first_entry() returns NULL?
You need list_first_entry_or_null().



I don't know for certain flow as null, but something was NULL in this method
near that line and it looked like a likely culprit.

I guess possibly tin or fq was passed in as NULL?

Anyway, if the patch seems worthless just ignore it.  I'll leave it in my tree
since it should be harmless and will let you know if I ever hit it.

If someone else hits a similar crash, hopefully they can report it.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Cong Wang
On Thu, Jun 7, 2018 at 9:06 AM,   wrote:
> --- a/include/net/fq_impl.h
> +++ b/include/net/fq_impl.h
> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>
> flow = list_first_entry(head, struct fq_flow, flowchain);
>
> +   if (WARN_ON_ONCE(!flow))
> +   return NULL;
> +

How could even possibly list_first_entry() returns NULL?
You need list_first_entry_or_null().


Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Ben Greear

On 06/07/2018 09:17 AM, Eric Dumazet wrote:



On 06/07/2018 09:06 AM, gree...@candelatech.com wrote:

From: Ben Greear 

While testing an ath10k firmware that often crashed under load,
I was seeing kernel crashes as well.  One of them appeared to
be a dereference of a NULL flow object in fq_tin_dequeue.

I have since fixed the firmware flaw, but I think it would be
worth adding the WARN_ON in case the problem appears again.

 common_interrupt+0xf/0xf
 



Please find the exact commit that brought this bug,
and add a corresponding Fixes: tag


It will be a total pain to bisect this problem since my test
case that causes this is running my modified firmware (and a buggy one at that),
modified ath10k driver (to work with this firmware and support my test case 
easily),
and the failure case appears to cause multiple different-but-probably-related
crashes and often hangs or reboots the test system.

Probably this is all caused by some nasty race or buggy logic related to
dealing with a crashed ath10k firmware tearing down txq logic from the
bottom up.  There have been many such bugs in the past, I and others fixed a 
few,
and very likely more remain.

For what it is worth, I didn't see this crash in 4.13, and I spent some time
testing buggy firmware there occasionally.

If someone else has interest in debugging the ath10k driver, I will be happy to 
generate
a mostly-stock firmware image with ability to crash in the TX path and give it 
to them.
It will crash the stock upstream code reliably in my experience.

Thanks,
Ben





Signed-off-by: Ben Greear 
---
 include/net/fq_impl.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
index be7c0fa..e40354d 100644
--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,

flow = list_first_entry(head, struct fq_flow, flowchain);

+   if (WARN_ON_ONCE(!flow))
+   return NULL;
+
if (flow->deficit <= 0) {
flow->deficit += fq->quantum;
list_move_tail(&flow->flowchain,






--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Eric Dumazet



On 06/07/2018 09:06 AM, gree...@candelatech.com wrote:
> From: Ben Greear 
> 
> While testing an ath10k firmware that often crashed under load,
> I was seeing kernel crashes as well.  One of them appeared to
> be a dereference of a NULL flow object in fq_tin_dequeue.
> 
> I have since fixed the firmware flaw, but I think it would be
> worth adding the WARN_ON in case the problem appears again.
> 
>  common_interrupt+0xf/0xf
>  
> 

Please find the exact commit that brought this bug,
and add a corresponding Fixes: tag

> Signed-off-by: Ben Greear 
> ---
>  include/net/fq_impl.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
> index be7c0fa..e40354d 100644
> --- a/include/net/fq_impl.h
> +++ b/include/net/fq_impl.h
> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>  
>   flow = list_first_entry(head, struct fq_flow, flowchain);
>  
> + if (WARN_ON_ONCE(!flow))
> + return NULL;
> +
>   if (flow->deficit <= 0) {
>   flow->deficit += fq->quantum;
>   list_move_tail(&flow->flowchain,
>