Re: [PATCH net-next] net: gro: properly remove skb from list
From: Prashant Bhole
Date: Thu, 12 Jul 2018 16:24:59 +0900

> Following crash occurs in validate_xmit_skb_list() when the same skb is
> iterated multiple times in the loop and consume_skb() is called.
>
> The root cause is calling list_del_init(&skb->list) and not clearing
> skb->next in d4546c2509b1. list_del_init(&skb->list) sets skb->next
> to point to skb itself. skb->next needs to be cleared because other
> parts of the network stack use another kind of SKB list.
> validate_xmit_skb_list() uses such a list.
>
> A similar type of bugfix was reported by Jesper Dangaard Brouer.
> https://patchwork.ozlabs.org/patch/942541/
>
> This patch clears skb->next and changes list_del_init() to list_del()
> so that list->prev will maintain the list poison.
...
> Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.")
> Signed-off-by: Prashant Bhole
> Reported-by: Tyler Hicks

Applied, thank you.

Hopefully we can convert more layers to list_head SKB usage, and thus no
longer need hacks like this.

Thanks.
Re: [PATCH net-next] net: gro: properly remove skb from list
On 2018-07-12 16:24:59, Prashant Bhole wrote:
> Following crash occurs in validate_xmit_skb_list() when the same skb is
> iterated multiple times in the loop and consume_skb() is called.
>
> The root cause is calling list_del_init(&skb->list) and not clearing
> skb->next in d4546c2509b1. list_del_init(&skb->list) sets skb->next
> to point to skb itself. skb->next needs to be cleared because other
> parts of the network stack use another kind of SKB list.
> validate_xmit_skb_list() uses such a list.
>
> A similar type of bugfix was reported by Jesper Dangaard Brouer.
> https://patchwork.ozlabs.org/patch/942541/
>
> This patch clears skb->next and changes list_del_init() to list_del()
> so that list->prev will maintain the list poison.
>
> [ 148.185511] ==================================================================
> [ 148.187865] BUG: KASAN: use-after-free in validate_xmit_skb_list+0x4b/0xa0
> [ 148.190158] Read of size 8 at addr 8801e52eefc0 by task swapper/1/0
> [ 148.192940]
> [ 148.193642] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #25
> [ 148.195423] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
> [ 148.199129] Call Trace:
> [ 148.200565]
> [ 148.201911]  dump_stack+0xc6/0x14c
> [ 148.203572]  ? dump_stack_print_info.cold.1+0x2f/0x2f
> [ 148.205083]  ? kmsg_dump_rewind_nolock+0x59/0x59
> [ 148.206307]  ? validate_xmit_skb+0x2c6/0x560
> [ 148.207432]  ? debug_show_held_locks+0x30/0x30
> [ 148.208571]  ? validate_xmit_skb_list+0x4b/0xa0
> [ 148.211144]  print_address_description+0x6c/0x23c
> [ 148.212601]  ? validate_xmit_skb_list+0x4b/0xa0
> [ 148.213782]  kasan_report.cold.6+0x241/0x2fd
> [ 148.214958]  validate_xmit_skb_list+0x4b/0xa0
> [ 148.216494]  sch_direct_xmit+0x1b0/0x680
> [ 148.217601]  ? dev_watchdog+0x4e0/0x4e0
> [ 148.218675]  ? do_raw_spin_trylock+0x10/0x120
> [ 148.219818]  ? do_raw_spin_lock+0xe0/0xe0
> [ 148.221032]  __dev_queue_xmit+0x1167/0x1810
> [ 148.222155]  ? sched_clock+0x5/0x10
> [...]
> [ 148.474257] Allocated by task 0:
> [ 148.475363]  kasan_kmalloc+0xbf/0xe0
> [ 148.476503]  kmem_cache_alloc+0xb4/0x1b0
> [ 148.477654]  __build_skb+0x91/0x250
> [ 148.478677]  build_skb+0x67/0x180
> [ 148.479657]  e1000_clean_rx_irq+0x542/0x8a0
> [ 148.480757]  e1000_clean+0x652/0xd10
> [ 148.481772]  net_rx_action+0x4ea/0xc20
> [ 148.482808]  __do_softirq+0x1f9/0x574
> [ 148.483831]
> [ 148.484575] Freed by task 0:
> [ 148.485504]  __kasan_slab_free+0x12e/0x180
> [ 148.486589]  kmem_cache_free+0xb4/0x240
> [ 148.487634]  kfree_skbmem+0xed/0x150
> [ 148.488648]  consume_skb+0x146/0x250
> [ 148.489665]  validate_xmit_skb+0x2b7/0x560
> [ 148.490754]  validate_xmit_skb_list+0x70/0xa0
> [ 148.491897]  sch_direct_xmit+0x1b0/0x680
> [ 148.493949]  __dev_queue_xmit+0x1167/0x1810
> [ 148.495103]  br_dev_queue_push_xmit+0xce/0x250
> [ 148.496196]  br_forward_finish+0x276/0x280
> [ 148.497234]  __br_forward+0x44f/0x520
> [ 148.498260]  br_forward+0x19f/0x1b0
> [ 148.499264]  br_handle_frame_finish+0x65e/0x980
> [ 148.500398]  NF_HOOK.constprop.10+0x290/0x2a0
> [ 148.501522]  br_handle_frame+0x417/0x640
> [ 148.502582]  __netif_receive_skb_core+0xaac/0x18f0
> [ 148.503753]  __netif_receive_skb_one_core+0x98/0x120
> [ 148.504958]  netif_receive_skb_internal+0xe3/0x330
> [ 148.506154]  napi_gro_complete+0x190/0x2a0
> [ 148.507243]  dev_gro_receive+0x9f7/0x1100
> [ 148.508316]  napi_gro_receive+0xcb/0x260
> [ 148.509387]  e1000_clean_rx_irq+0x2fc/0x8a0
> [ 148.510501]  e1000_clean+0x652/0xd10
> [ 148.511523]  net_rx_action+0x4ea/0xc20
> [ 148.512566]  __do_softirq+0x1f9/0x574
> [ 148.513598]
> [ 148.514346] The buggy address belongs to the object at 8801e52eefc0
> [ 148.514346]  which belongs to the cache skbuff_head_cache of size 232
> [ 148.517047] The buggy address is located 0 bytes inside of
> [ 148.517047]  232-byte region [8801e52eefc0, 8801e52ef0a8)
> [ 148.519549] The buggy address belongs to the page:
> [ 148.520726] page:ea000794bb00 count:1 mapcount:0 mapping:880106f4dfc0 index:0x8801e52ee840 compound_mapcount: 0
> [ 148.524325] flags: 0x17c0008100(slab|head)
> [ 148.525481] raw: 0017c0008100 880106b938d0 880106b938d0 880106f4dfc0
> [ 148.527503] raw: 8801e52ee840 00190011 0001
>
> [ 148.529547] page dumped because: kasan: bad access detected
>
> Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.")
> Signed-off-by: Prashant Bhole
> Reported-by: Tyler Hicks

Thanks for the fix! I have verified that it fixes the crash that I was
seeing.

Tested-by: Tyler Hicks

Your analysis of the problem makes sense. I feel like a helper function
to properly remove an skb from a list_head and NULL the next pointer, in
the GRO code, before passing it on to something else that uses the
hand-rolled linked list implementation could be useful.
[PATCH net-next] net: gro: properly remove skb from list
Following crash occurs in validate_xmit_skb_list() when the same skb is
iterated multiple times in the loop and consume_skb() is called.

The root cause is calling list_del_init(&skb->list) and not clearing
skb->next in d4546c2509b1. list_del_init(&skb->list) sets skb->next
to point to skb itself. skb->next needs to be cleared because other
parts of the network stack use another kind of SKB list.
validate_xmit_skb_list() uses such a list.

A similar type of bugfix was reported by Jesper Dangaard Brouer.
https://patchwork.ozlabs.org/patch/942541/

This patch clears skb->next and changes list_del_init() to list_del()
so that list->prev will maintain the list poison.

[ 148.185511] ==================================================================
[ 148.187865] BUG: KASAN: use-after-free in validate_xmit_skb_list+0x4b/0xa0
[ 148.190158] Read of size 8 at addr 8801e52eefc0 by task swapper/1/0
[ 148.192940]
[ 148.193642] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #25
[ 148.195423] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[ 148.199129] Call Trace:
[ 148.200565]
[ 148.201911]  dump_stack+0xc6/0x14c
[ 148.203572]  ? dump_stack_print_info.cold.1+0x2f/0x2f
[ 148.205083]  ? kmsg_dump_rewind_nolock+0x59/0x59
[ 148.206307]  ? validate_xmit_skb+0x2c6/0x560
[ 148.207432]  ? debug_show_held_locks+0x30/0x30
[ 148.208571]  ? validate_xmit_skb_list+0x4b/0xa0
[ 148.211144]  print_address_description+0x6c/0x23c
[ 148.212601]  ? validate_xmit_skb_list+0x4b/0xa0
[ 148.213782]  kasan_report.cold.6+0x241/0x2fd
[ 148.214958]  validate_xmit_skb_list+0x4b/0xa0
[ 148.216494]  sch_direct_xmit+0x1b0/0x680
[ 148.217601]  ? dev_watchdog+0x4e0/0x4e0
[ 148.218675]  ? do_raw_spin_trylock+0x10/0x120
[ 148.219818]  ? do_raw_spin_lock+0xe0/0xe0
[ 148.221032]  __dev_queue_xmit+0x1167/0x1810
[ 148.222155]  ? sched_clock+0x5/0x10
[...]
[ 148.474257] Allocated by task 0:
[ 148.475363]  kasan_kmalloc+0xbf/0xe0
[ 148.476503]  kmem_cache_alloc+0xb4/0x1b0
[ 148.477654]  __build_skb+0x91/0x250
[ 148.478677]  build_skb+0x67/0x180
[ 148.479657]  e1000_clean_rx_irq+0x542/0x8a0
[ 148.480757]  e1000_clean+0x652/0xd10
[ 148.481772]  net_rx_action+0x4ea/0xc20
[ 148.482808]  __do_softirq+0x1f9/0x574
[ 148.483831]
[ 148.484575] Freed by task 0:
[ 148.485504]  __kasan_slab_free+0x12e/0x180
[ 148.486589]  kmem_cache_free+0xb4/0x240
[ 148.487634]  kfree_skbmem+0xed/0x150
[ 148.488648]  consume_skb+0x146/0x250
[ 148.489665]  validate_xmit_skb+0x2b7/0x560
[ 148.490754]  validate_xmit_skb_list+0x70/0xa0
[ 148.491897]  sch_direct_xmit+0x1b0/0x680
[ 148.493949]  __dev_queue_xmit+0x1167/0x1810
[ 148.495103]  br_dev_queue_push_xmit+0xce/0x250
[ 148.496196]  br_forward_finish+0x276/0x280
[ 148.497234]  __br_forward+0x44f/0x520
[ 148.498260]  br_forward+0x19f/0x1b0
[ 148.499264]  br_handle_frame_finish+0x65e/0x980
[ 148.500398]  NF_HOOK.constprop.10+0x290/0x2a0
[ 148.501522]  br_handle_frame+0x417/0x640
[ 148.502582]  __netif_receive_skb_core+0xaac/0x18f0
[ 148.503753]  __netif_receive_skb_one_core+0x98/0x120
[ 148.504958]  netif_receive_skb_internal+0xe3/0x330
[ 148.506154]  napi_gro_complete+0x190/0x2a0
[ 148.507243]  dev_gro_receive+0x9f7/0x1100
[ 148.508316]  napi_gro_receive+0xcb/0x260
[ 148.509387]  e1000_clean_rx_irq+0x2fc/0x8a0
[ 148.510501]  e1000_clean+0x652/0xd10
[ 148.511523]  net_rx_action+0x4ea/0xc20
[ 148.512566]  __do_softirq+0x1f9/0x574
[ 148.513598]
[ 148.514346] The buggy address belongs to the object at 8801e52eefc0
[ 148.514346]  which belongs to the cache skbuff_head_cache of size 232
[ 148.517047] The buggy address is located 0 bytes inside of
[ 148.517047]  232-byte region [8801e52eefc0, 8801e52ef0a8)
[ 148.519549] The buggy address belongs to the page:
[ 148.520726] page:ea000794bb00 count:1 mapcount:0 mapping:880106f4dfc0 index:0x8801e52ee840 compound_mapcount: 0
[ 148.524325] flags: 0x17c0008100(slab|head)
[ 148.525481] raw: 0017c0008100 880106b938d0 880106b938d0 880106f4dfc0
[ 148.527503] raw: 8801e52ee840 00190011 0001
[ 148.529547] page dumped because: kasan: bad access detected

Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.")
Signed-off-by: Prashant Bhole
Reported-by: Tyler Hicks
---
 net/core/dev.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index d13cddcac41f..08c41941f912 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5169,7 +5169,8 @@ static void __napi_gro_flush_chain(struct napi_struct *napi, u32 index,
 	list_for_each_entry_safe_reverse(skb, p, head, list) {
 		if (flush_old && NAPI_GRO_CB(skb)->age == jiffies)
 			return;
-		list_del_init(&skb->list);
+		list_del(&skb->list);
+		skb->next = NULL;
 		na