Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On 06/07/2018 02:52 PM, Cong Wang wrote: On Thu, Jun 7, 2018 at 2:41 PM, Ben Greear wrote: On 06/07/2018 02:29 PM, Cong Wang wrote: On Thu, Jun 7, 2018 at 9:06 AM, wrote: --- a/include/net/fq_impl.h +++ b/include/net/fq_impl.h @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, flow = list_first_entry(head, struct fq_flow, flowchain); + if (WARN_ON_ONCE(!flow)) + return NULL; + How could even possibly list_first_entry() returns NULL? You need list_first_entry_or_null(). I don't know for certain flow as null, but something was NULL in this method near that line and it looked like a likely culprit. I guess possibly tin or fq was passed in as NULL? A NULL pointer is not always 0. You can trigger a NULL-ptr-def with 0x3c too, but you are checking against 0 in your patch, that is the problem and that is why list_first_entry_or_null() exists. Ahh, I see what you mean, and that is my mistake. In my case, it did seem to be a mostly-null deref, not a 0x0 deref. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On Thu, Jun 7, 2018 at 2:41 PM, Ben Greear wrote: > On 06/07/2018 02:29 PM, Cong Wang wrote: >> >> On Thu, Jun 7, 2018 at 9:06 AM, wrote: >>> >>> --- a/include/net/fq_impl.h >>> +++ b/include/net/fq_impl.h >>> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, >>> >>> flow = list_first_entry(head, struct fq_flow, flowchain); >>> >>> + if (WARN_ON_ONCE(!flow)) >>> + return NULL; >>> + >> >> >> How could even possibly list_first_entry() returns NULL? >> You need list_first_entry_or_null(). >> > > I don't know for certain flow as null, but something was NULL in this method > near that line and it looked like a likely culprit. > > I guess possibly tin or fq was passed in as NULL? A NULL pointer is not always 0. You can trigger a NULL-ptr-def with 0x3c too, but you are checking against 0 in your patch, that is the problem and that is why list_first_entry_or_null() exists.
Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On 06/07/2018 02:29 PM, Cong Wang wrote: On Thu, Jun 7, 2018 at 9:06 AM, wrote: --- a/include/net/fq_impl.h +++ b/include/net/fq_impl.h @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, flow = list_first_entry(head, struct fq_flow, flowchain); + if (WARN_ON_ONCE(!flow)) + return NULL; + How could even possibly list_first_entry() returns NULL? You need list_first_entry_or_null(). I don't know for certain flow as null, but something was NULL in this method near that line and it looked like a likely culprit. I guess possibly tin or fq was passed in as NULL? Anyway, if the patch seems worthless just ignore it. I'll leave it in my tree since it should be harmless and will let you know if I ever hit it. If someone else hits a similar crash, hopefully they can report it. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On Thu, Jun 7, 2018 at 9:06 AM, wrote: > --- a/include/net/fq_impl.h > +++ b/include/net/fq_impl.h > @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, > > flow = list_first_entry(head, struct fq_flow, flowchain); > > + if (WARN_ON_ONCE(!flow)) > + return NULL; > + How could even possibly list_first_entry() returns NULL? You need list_first_entry_or_null().
Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On 06/07/2018 09:17 AM, Eric Dumazet wrote: On 06/07/2018 09:06 AM, gree...@candelatech.com wrote: From: Ben Greear While testing an ath10k firmware that often crashed under load, I was seeing kernel crashes as well. One of them appeared to be a dereference of a NULL flow object in fq_tin_dequeue. I have since fixed the firmware flaw, but I think it would be worth adding the WARN_ON in case the problem appears again. common_interrupt+0xf/0xf Please find the exact commit that brought this bug, and add a corresponding Fixes: tag It will be a total pain to bisect this problem since my test case that causes this is running my modified firmware (and a buggy one at that), modified ath10k driver (to work with this firmware and support my test case easily), and the failure case appears to cause multiple different-but-probably-related crashes and often hangs or reboots the test system. Probably this is all caused by some nasty race or buggy logic related to dealing with a crashed ath10k firmware tearing down txq logic from the bottom up. There have been many such bugs in the past, I and others fixed a few, and very likely more remain. For what it is worth, I didn't see this crash in 4.13, and I spent some time testing buggy firmware there occasionally. If someone else has interest in debugging the ath10k driver, I will be happy to generate a mostly-stock firmware image with ability to crash in the TX path and give it to them. It will crash the stock upstream code reliably in my experience. Thanks, Ben Signed-off-by: Ben Greear --- include/net/fq_impl.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h index be7c0fa..e40354d 100644 --- a/include/net/fq_impl.h +++ b/include/net/fq_impl.h @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, flow = list_first_entry(head, struct fq_flow, flowchain); + if (WARN_ON_ONCE(!flow)) + return NULL; + if (flow->deficit <= 0) { flow->deficit += fq->quantum; list_move_tail(&flow->flowchain, -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On 06/07/2018 09:06 AM, gree...@candelatech.com wrote: > From: Ben Greear > > While testing an ath10k firmware that often crashed under load, > I was seeing kernel crashes as well. One of them appeared to > be a dereference of a NULL flow object in fq_tin_dequeue. > > I have since fixed the firmware flaw, but I think it would be > worth adding the WARN_ON in case the problem appears again. > > common_interrupt+0xf/0xf > > Please find the exact commit that brought this bug, and add a corresponding Fixes: tag > Signed-off-by: Ben Greear > --- > include/net/fq_impl.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h > index be7c0fa..e40354d 100644 > --- a/include/net/fq_impl.h > +++ b/include/net/fq_impl.h > @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, > > flow = list_first_entry(head, struct fq_flow, flowchain); > > + if (WARN_ON_ONCE(!flow)) > + return NULL; > + > if (flow->deficit <= 0) { > flow->deficit += fq->quantum; > list_move_tail(&flow->flowchain, >