On Tue, Aug 13, 2013 at 07:41:36PM +0000, Love, Robert W wrote:
> On 13-08-02 07:45 PM, Neil Horman wrote:
> > From: Neil Horman <[email protected]>
> >
> > Recently had this Oops reported to me on the 3.10 kernel:
> >
> > [  807.554955] BUG: unable to handle kernel NULL pointer dereference at 
> > 0000000000000008
> > [  807.562799] IP: [<ffffffff814e6fc7>] skb_dequeue+0x47/0x70
> > [  807.568296] PGD 20c889067 PUD 20c8b8067 PMD 0
> > [  807.572769] Oops: 0002 [#1] SMP
> > [  807.655597] Hardware name: Dell Inc. PowerEdge R415/0DDT2D, BIOS 1.8.6 
> > 12/06/2011
> > [  807.663079] Workqueue: events fcoe_ctlr_recv_work [libfcoe]
> > [  807.668656] task: ffff88020b42a160 ti: ffff88020ae6c000 task.ti: 
> > ffff88020ae6c000
> > [  807.676126] RIP: 0010:[<ffffffff814e6fc7>]  [<ffffffff814e6fc7>] 
> > skb_dequeue+0x47/0x70
> > [  807.684046] RSP: 0000:ffff88020ae6dd70  EFLAGS: 00010097
> > [  807.689349] RAX: 0000000000000246 RBX: ffff8801d04d6700 RCX: 
> > 0000000000000000
> > [  807.696474] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 
> > ffff88020df26434
> > [  807.703598] RBP: ffff88020ae6dd88 R08: 00000000000173e0 R09: 
> > ffff880216e173e0
> > [  807.710723] R10: ffffffff814e5897 R11: ffffea0007413580 R12: 
> > ffff88020df26420
> > [  807.717847] R13: ffff88020df26434 R14: 0000000000000004 R15: 
> > ffff8801d04c42ce
> > [  807.724972] FS:  00007fdaab6048c0(0000) GS:ffff880216e00000(0000) 
> > knlGS:0000000000000000
> > [  807.733049] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [  807.738785] CR2: 0000000000000008 CR3: 000000020cbc9000 CR4: 
> > 00000000000006f0
> > [  807.745910] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> > 0000000000000000
> > [  807.753033] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
> > 0000000000000400
> > [  807.760156] Stack:
> > [  807.762162]  ffff8801d04d6700 0000000000000001 ffff88020df26400 
> > ffff88020ae6de20
> > [  807.769586]  ffffffffa0444409 ffff88020b046a00 ffff88020ae6dde8 
> > ffffffff810105be
> > [  807.777008]  ffff88020b42a868 0000000000000000 ffff88020df264a8 
> > ffff88020df26348
> > [  807.784431] Call Trace:
> > [  807.786885]  [<ffffffffa0444409>] fcoe_ctlr_recv_work+0x59/0x9a0 
> > [libfcoe]
> > [  807.793755]  [<ffffffff810105be>] ? __switch_to+0x13e/0x4a0
> > [  807.799324]  [<ffffffff8107d0e6>] process_one_work+0x176/0x420
> > [  807.805151]  [<ffffffff8107dd0b>] worker_thread+0x11b/0x3a0
> > [  807.810717]  [<ffffffff8107dbf0>] ? rescuer_thread+0x350/0x350
> > [  807.816545]  [<ffffffff810842b0>] kthread+0xc0/0xd0
> > [  807.821416]  [<ffffffff810841f0>] ? insert_kthread_work+0x40/0x40
> > [  807.827503]  [<ffffffff8160ce2c>] ret_from_fork+0x7c/0xb0
> > [  807.832897]  [<ffffffff810841f0>] ? insert_kthread_work+0x40/0x40
> > [  807.858500] RIP  [<ffffffff814e6fc7>] skb_dequeue+0x47/0x70
> > [  807.864076]  RSP <ffff88020ae6dd70>
> > [  807.867558] CR2: 0000000000000008
> >
> > Looks like the root cause is the fact that the packet recieve function
> > fcoe_ctlr_recv enqueues the skb to a sk_buff_head_list prior to ensuring 
> > that
> > the skb is unshared.  This can happen when multiple packet listeners 
> > recieve an
> > skb, as the deliver_skb function just increments skb->users for each 
> > handler.
> > As a result, having multiple users of a single skb results in multiple
> > manipulators of its methods, implying list corruption, and the oops recorded
> > above.
> >
> > The fix is pretty easy, just make sure that we clone the skb if its got 
> > multiple
> > users with the skb_share_check function, like other protocols do.
> >
> > Signed-off-by: Neil Horman <[email protected]>
> > CC: Robert Love <[email protected]>
> > ---
> >   drivers/scsi/fcoe/fcoe_ctlr.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/scsi/fcoe/fcoe_ctlr.c b/drivers/scsi/fcoe/fcoe_ctlr.c
> > index 203415e..faf4b1c 100644
> > --- a/drivers/scsi/fcoe/fcoe_ctlr.c
> > +++ b/drivers/scsi/fcoe/fcoe_ctlr.c
> > @@ -1453,6 +1453,8 @@ err:
> >    */
> >   void fcoe_ctlr_recv(struct fcoe_ctlr *fip, struct sk_buff *skb)
> >   {
> > +   if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
> > +           return;
> >     skb_queue_tail(&fip->fip_recv_list, skb);
> >     schedule_work(&fip->recv_work);
> >   }
> Checking for changes in the working directory ... done
> ERROR: do not use assignment in if condition
> #70: FILE: drivers/scsi/fcoe/fcoe_ctlr.c:1456:
> +    if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
> 
> total: 1 errors, 0 warnings, 8 lines checked
> 
> patches-fixes/fcoe-ensure-that-skb-placed-on has style problems, please 
> review.
> 
> If any of these errors are false positives, please report
> them to the maintainer, see CHECKPATCH in MAINTAINERS.
> 
> The patch looks fine otherwise. I'd make the change myself, but I don't 
> have the time right now.
> 
I'm not sure why checkpatch is griping about that.  Its pretty common practice
to do so.  Look at ip_rcv and ipv6_rcv for examples
Neil

> //Rob
> _______________________________________________
> fcoe-devel mailing list
> [email protected]
> http://lists.open-fcoe.org/mailman/listinfo/fcoe-devel
> 
_______________________________________________
fcoe-devel mailing list
[email protected]
http://lists.open-fcoe.org/mailman/listinfo/fcoe-devel

Reply via email to