On Tue, Aug 13, 2013 at 07:41:36PM +0000, Love, Robert W wrote: > On 13-08-02 07:45 PM, Neil Horman wrote: > > From: Neil Horman <[email protected]> > > > > Recently had this Oops reported to me on the 3.10 kernel: > > > > [ 807.554955] BUG: unable to handle kernel NULL pointer dereference at > > 0000000000000008 > > [ 807.562799] IP: [<ffffffff814e6fc7>] skb_dequeue+0x47/0x70 > > [ 807.568296] PGD 20c889067 PUD 20c8b8067 PMD 0 > > [ 807.572769] Oops: 0002 [#1] SMP > > [ 807.655597] Hardware name: Dell Inc. PowerEdge R415/0DDT2D, BIOS 1.8.6 > > 12/06/2011 > > [ 807.663079] Workqueue: events fcoe_ctlr_recv_work [libfcoe] > > [ 807.668656] task: ffff88020b42a160 ti: ffff88020ae6c000 task.ti: > > ffff88020ae6c000 > > [ 807.676126] RIP: 0010:[<ffffffff814e6fc7>] [<ffffffff814e6fc7>] > > skb_dequeue+0x47/0x70 > > [ 807.684046] RSP: 0000:ffff88020ae6dd70 EFLAGS: 00010097 > > [ 807.689349] RAX: 0000000000000246 RBX: ffff8801d04d6700 RCX: > > 0000000000000000 > > [ 807.696474] RDX: 0000000000000000 RSI: 0000000000000246 RDI: > > ffff88020df26434 > > [ 807.703598] RBP: ffff88020ae6dd88 R08: 00000000000173e0 R09: > > ffff880216e173e0 > > [ 807.710723] R10: ffffffff814e5897 R11: ffffea0007413580 R12: > > ffff88020df26420 > > [ 807.717847] R13: ffff88020df26434 R14: 0000000000000004 R15: > > ffff8801d04c42ce > > [ 807.724972] FS: 00007fdaab6048c0(0000) GS:ffff880216e00000(0000) > > knlGS:0000000000000000 > > [ 807.733049] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > [ 807.738785] CR2: 0000000000000008 CR3: 000000020cbc9000 CR4: > > 00000000000006f0 > > [ 807.745910] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > > 0000000000000000 > > [ 807.753033] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > > 0000000000000400 > > [ 807.760156] Stack: > > [ 807.762162] ffff8801d04d6700 0000000000000001 ffff88020df26400 > > ffff88020ae6de20 > > [ 807.769586] ffffffffa0444409 ffff88020b046a00 ffff88020ae6dde8 > > ffffffff810105be > > [ 807.777008] ffff88020b42a868 0000000000000000 ffff88020df264a8 > > ffff88020df26348 > > [ 807.784431] Call Trace: > > [ 807.786885] [<ffffffffa0444409>] fcoe_ctlr_recv_work+0x59/0x9a0 > > [libfcoe] > > [ 807.793755] [<ffffffff810105be>] ? __switch_to+0x13e/0x4a0 > > [ 807.799324] [<ffffffff8107d0e6>] process_one_work+0x176/0x420 > > [ 807.805151] [<ffffffff8107dd0b>] worker_thread+0x11b/0x3a0 > > [ 807.810717] [<ffffffff8107dbf0>] ? rescuer_thread+0x350/0x350 > > [ 807.816545] [<ffffffff810842b0>] kthread+0xc0/0xd0 > > [ 807.821416] [<ffffffff810841f0>] ? insert_kthread_work+0x40/0x40 > > [ 807.827503] [<ffffffff8160ce2c>] ret_from_fork+0x7c/0xb0 > > [ 807.832897] [<ffffffff810841f0>] ? insert_kthread_work+0x40/0x40 > > [ 807.858500] RIP [<ffffffff814e6fc7>] skb_dequeue+0x47/0x70 > > [ 807.864076] RSP <ffff88020ae6dd70> > > [ 807.867558] CR2: 0000000000000008 > > > > Looks like the root cause is the fact that the packet recieve function > > fcoe_ctlr_recv enqueues the skb to a sk_buff_head_list prior to ensuring > > that > > the skb is unshared. This can happen when multiple packet listeners > > recieve an > > skb, as the deliver_skb function just increments skb->users for each > > handler. > > As a result, having multiple users of a single skb results in multiple > > manipulators of its methods, implying list corruption, and the oops recorded > > above. > > > > The fix is pretty easy, just make sure that we clone the skb if its got > > multiple > > users with the skb_share_check function, like other protocols do. > > > > Signed-off-by: Neil Horman <[email protected]> > > CC: Robert Love <[email protected]> > > --- > > drivers/scsi/fcoe/fcoe_ctlr.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/scsi/fcoe/fcoe_ctlr.c b/drivers/scsi/fcoe/fcoe_ctlr.c > > index 203415e..faf4b1c 100644 > > --- a/drivers/scsi/fcoe/fcoe_ctlr.c > > +++ b/drivers/scsi/fcoe/fcoe_ctlr.c > > @@ -1453,6 +1453,8 @@ err: > > */ > > void fcoe_ctlr_recv(struct fcoe_ctlr *fip, struct sk_buff *skb) > > { > > + if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL) > > + return; > > skb_queue_tail(&fip->fip_recv_list, skb); > > schedule_work(&fip->recv_work); > > } > Checking for changes in the working directory ... done > ERROR: do not use assignment in if condition > #70: FILE: drivers/scsi/fcoe/fcoe_ctlr.c:1456: > + if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL) > > total: 1 errors, 0 warnings, 8 lines checked > > patches-fixes/fcoe-ensure-that-skb-placed-on has style problems, please > review. > > If any of these errors are false positives, please report > them to the maintainer, see CHECKPATCH in MAINTAINERS. > > The patch looks fine otherwise. I'd make the change myself, but I don't > have the time right now. > I'm not sure why checkpatch is griping about that. Its pretty common practice to do so. Look at ip_rcv and ipv6_rcv for examples Neil
> //Rob > _______________________________________________ > fcoe-devel mailing list > [email protected] > http://lists.open-fcoe.org/mailman/listinfo/fcoe-devel > _______________________________________________ fcoe-devel mailing list [email protected] http://lists.open-fcoe.org/mailman/listinfo/fcoe-devel
