Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On 06/07/2018 02:52 PM, Cong Wang wrote: On Thu, Jun 7, 2018 at 2:41 PM, Ben Greear wrote: On 06/07/2018 02:29 PM, Cong Wang wrote: On Thu, Jun 7, 2018 at 9:06 AM, wrote: --- a/include/net/fq_impl.h +++ b/include/net/fq_impl.h @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, flow = list_first_entry(head, struct fq_flow, flowchain); + if (WARN_ON_ONCE(!flow)) + return NULL; + How could even possibly list_first_entry() returns NULL? You need list_first_entry_or_null(). I don't know for certain flow as null, but something was NULL in this method near that line and it looked like a likely culprit. I guess possibly tin or fq was passed in as NULL? A NULL pointer is not always 0. You can trigger a NULL-ptr-def with 0x3c too, but you are checking against 0 in your patch, that is the problem and that is why list_first_entry_or_null() exists. Ahh, I see what you mean, and that is my mistake. In my case, it did seem to be a mostly-null deref, not a 0x0 deref. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On Thu, Jun 7, 2018 at 2:41 PM, Ben Greear wrote: > On 06/07/2018 02:29 PM, Cong Wang wrote: >> >> On Thu, Jun 7, 2018 at 9:06 AM, wrote: >>> >>> --- a/include/net/fq_impl.h >>> +++ b/include/net/fq_impl.h >>> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, >>> >>> flow = list_first_entry(head, struct fq_flow, flowchain); >>> >>> + if (WARN_ON_ONCE(!flow)) >>> + return NULL; >>> + >> >> >> How could even possibly list_first_entry() returns NULL? >> You need list_first_entry_or_null(). >> > > I don't know for certain flow as null, but something was NULL in this method > near that line and it looked like a likely culprit. > > I guess possibly tin or fq was passed in as NULL? A NULL pointer is not always 0. You can trigger a NULL-ptr-def with 0x3c too, but you are checking against 0 in your patch, that is the problem and that is why list_first_entry_or_null() exists.
Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On 06/07/2018 02:29 PM, Cong Wang wrote: On Thu, Jun 7, 2018 at 9:06 AM, wrote: --- a/include/net/fq_impl.h +++ b/include/net/fq_impl.h @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, flow = list_first_entry(head, struct fq_flow, flowchain); + if (WARN_ON_ONCE(!flow)) + return NULL; + How could even possibly list_first_entry() returns NULL? You need list_first_entry_or_null(). I don't know for certain flow as null, but something was NULL in this method near that line and it looked like a likely culprit. I guess possibly tin or fq was passed in as NULL? Anyway, if the patch seems worthless just ignore it. I'll leave it in my tree since it should be harmless and will let you know if I ever hit it. If someone else hits a similar crash, hopefully they can report it. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On Thu, Jun 7, 2018 at 9:06 AM, wrote: > --- a/include/net/fq_impl.h > +++ b/include/net/fq_impl.h > @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, > > flow = list_first_entry(head, struct fq_flow, flowchain); > > + if (WARN_ON_ONCE(!flow)) > + return NULL; > + How could even possibly list_first_entry() returns NULL? You need list_first_entry_or_null().
Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On 06/07/2018 09:17 AM, Eric Dumazet wrote: On 06/07/2018 09:06 AM, gree...@candelatech.com wrote: From: Ben Greear While testing an ath10k firmware that often crashed under load, I was seeing kernel crashes as well. One of them appeared to be a dereference of a NULL flow object in fq_tin_dequeue. I have since fixed the firmware flaw, but I think it would be worth adding the WARN_ON in case the problem appears again. common_interrupt+0xf/0xf Please find the exact commit that brought this bug, and add a corresponding Fixes: tag It will be a total pain to bisect this problem since my test case that causes this is running my modified firmware (and a buggy one at that), modified ath10k driver (to work with this firmware and support my test case easily), and the failure case appears to cause multiple different-but-probably-related crashes and often hangs or reboots the test system. Probably this is all caused by some nasty race or buggy logic related to dealing with a crashed ath10k firmware tearing down txq logic from the bottom up. There have been many such bugs in the past, I and others fixed a few, and very likely more remain. For what it is worth, I didn't see this crash in 4.13, and I spent some time testing buggy firmware there occasionally. If someone else has interest in debugging the ath10k driver, I will be happy to generate a mostly-stock firmware image with ability to crash in the TX path and give it to them. It will crash the stock upstream code reliably in my experience. Thanks, Ben Signed-off-by: Ben Greear --- include/net/fq_impl.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h index be7c0fa..e40354d 100644 --- a/include/net/fq_impl.h +++ b/include/net/fq_impl.h @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, flow = list_first_entry(head, struct fq_flow, flowchain); + if (WARN_ON_ONCE(!flow)) + return NULL; + if (flow->deficit <= 0) { flow->deficit += fq->quantum; list_move_tail(&flow->flowchain, -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] net-fq: Add WARN_ON check for null flow.
On 06/07/2018 09:06 AM, gree...@candelatech.com wrote: > From: Ben Greear > > While testing an ath10k firmware that often crashed under load, > I was seeing kernel crashes as well. One of them appeared to > be a dereference of a NULL flow object in fq_tin_dequeue. > > I have since fixed the firmware flaw, but I think it would be > worth adding the WARN_ON in case the problem appears again. > > common_interrupt+0xf/0xf > > Please find the exact commit that brought this bug, and add a corresponding Fixes: tag > Signed-off-by: Ben Greear > --- > include/net/fq_impl.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h > index be7c0fa..e40354d 100644 > --- a/include/net/fq_impl.h > +++ b/include/net/fq_impl.h > @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, > > flow = list_first_entry(head, struct fq_flow, flowchain); > > + if (WARN_ON_ONCE(!flow)) > + return NULL; > + > if (flow->deficit <= 0) { > flow->deficit += fq->quantum; > list_move_tail(&flow->flowchain, >
[PATCH] net-fq: Add WARN_ON check for null flow.
From: Ben Greear While testing an ath10k firmware that often crashed under load, I was seeing kernel crashes as well. One of them appeared to be a dereference of a NULL flow object in fq_tin_dequeue. I have since fixed the firmware flaw, but I think it would be worth adding the WARN_ON in case the problem appears again. BUG: unable to handle kernel NULL pointer dereference at 003c IP: ieee80211_tx_dequeue+0xfb/0xb10 [mac80211] PGD 8001417fe067 P4D 8001417fe067 PUD 13db41067 PMD 0 Oops: [#1] PREEMPT SMP PTI Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c vrf 8021q garp mrp stp llc fuse macvlan wanlink(O) pktgen lm78 ] CPU: 2 PID: 21733 Comm: ip Tainted: GW O 4.16.8+ #35 Hardware name: _ _/, BIOS 5.11 08/26/2016 RIP: 0010:ieee80211_tx_dequeue+0xfb/0xb10 [mac80211] RSP: 0018:880172d03c30 EFLAGS: 00010286 RAX: 88013b2c RBX: 88013b2c00b8 RCX: 0898 RDX: 0001 RSI: 88013b2c00d8 RDI: 88016ac40820 RBP: 88016ac42ba0 R08: 0020 R09: R10: 0010 R11: 001256c89fd8 R12: 88013b2c R13: 88013b2c00d8 R14: R15: 88013b2c00d8 FS: 7f04e3606700() GS:880172d0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 003c CR3: 00013b35a005 CR4: 003606e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: ? update_load_avg+0x607/0x6f0 ath10k_mac_tx_push_txq+0x6e/0x220 [ath10k_core] ath10k_mac_tx_push_pending+0x151/0x1e0 [ath10k_core] ath10k_htt_txrx_compl_task+0x113e/0x1940 [ath10k_core] ? ath10k_ce_completed_send_next_nolock+0x6f/0x90 [ath10k_pci] ? ath10k_ce_completed_send_next+0x31/0x40 [ath10k_pci] ? ath10k_pci_htc_tx_cb+0x30/0xc0 [ath10k_pci] ? ath10k_bus_pci_write32+0x3c/0xa0 [ath10k_pci] ath10k_pci_napi_poll+0x44/0xf0 [ath10k_pci] net_rx_action+0x250/0x3b0 __do_softirq+0xc2/0x2c2 irq_exit+0x93/0xa0 do_IRQ+0x45/0xc0 common_interrupt+0xf/0xf Signed-off-by: Ben Greear --- include/net/fq_impl.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h index be7c0fa..e40354d 100644 --- a/include/net/fq_impl.h +++ b/include/net/fq_impl.h @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, flow = list_first_entry(head, struct fq_flow, flowchain); + if (WARN_ON_ONCE(!flow)) + return NULL; + if (flow->deficit <= 0) { flow->deficit += fq->quantum; list_move_tail(&flow->flowchain, -- 2.4.11