[Kernel-packages] [Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

Guilherme G. Piccoli Thu, 27 Sep 2018 15:01:32 -0700

Here is a preliminary analysis of the problem, based on a collected crash dump.

From dmesg, we have:


[28663.018356] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[28663.026266] IP: [<ffffffffc00ddb21>] ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe]

Using addr2line to find the corresponding source line in the ixgbe code, we get:

# nm ixgbe.ko | grep "ixgbe_xmit_frame_ring"
000000000000aaa0 T ixgbe_xmit_frame_ring 

# printf "%0x\n" $((0xaaa0+0x81)) 
ab21 

# addr2line -fip -e ixgbe.ko -j .text ab21 
ixgbe_xmit_frame_ring at [...]/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7403
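The same arithmetic can be cross-checked against the runtime addresses in the oops and in the disassembly further down. A minimal sketch in C, using only values quoted in this message; the helper names are made up for illustration and are not part of any tool:

```c
#include <stdint.h>

/* Offset of the faulting instruction inside the module's .text section:
 * the symbol value reported by nm plus the offset from the oops (+0x81). */
static uint64_t text_offset(uint64_t sym_value, uint64_t insn_offset)
{
	return sym_value + insn_offset;
}

/* Runtime start address of the function: the faulting RIP minus the
 * same instruction offset. */
static uint64_t func_start(uint64_t rip, uint64_t insn_offset)
{
	return rip - insn_offset;
}
```

text_offset(0xaaa0, 0x81) yields 0xab21, the value passed to addr2line. func_start(0xffffffffc00ddb21, 0x81) yields 0xffffffffc00ddaa0, and adding 24 to that gives 0xffffffffc00ddab8, which is the address crash prints for the <+24> instruction.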


In the source, that line belongs to the inlined function ixgbe_maybe_stop_tx(),
called from ixgbe_xmit_frame_ring():

static inline int ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
{
	if (likely(ixgbe_desc_unused(tx_ring) >= size))
	[...]
}


Now checking the inlined function ixgbe_desc_unused():

static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring)
{
	u16 ntc = ring->next_to_clean;
	u16 ntu = ring->next_to_use;
	[...]
}


Using crash, we can confirm which field sits at offset 0x58 in struct
ixgbe_ring (the faulting address was 0000000000000058):

crash> struct -ox ixgbe_ring|grep -A1 58 
[0x58] u16 next_to_use; 
[0x5a] u16 next_to_clean; 

This matches the ixgbe_desc_unused() code: the struct ixgbe_ring pointer
was NULL, and the function faulted while reading next_to_use.
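To illustrate why a NULL ring pointer produces a fault at address 0x58, here is a small sketch in C. The struct is a hypothetical stand-in, not the real struct ixgbe_ring: everything before offset 0x58 is replaced by opaque padding, and only the two offsets reported by crash are reproduced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ixgbe_ring. The real fields before
 * offset 0x58 are replaced by padding; this is not the driver's layout. */
struct mock_ring {
	uint8_t  other_fields[0x58];	/* placeholder for the real fields */
	uint16_t next_to_use;		/* offset 0x58, as crash reports */
	uint16_t next_to_clean;		/* offset 0x5a, as crash reports */
};

/* Address that a load of the field at field_offset would touch for a
 * given ring base. Plain integer math, so nothing is dereferenced. */
static uintptr_t field_address(uintptr_t ring_base, size_t field_offset)
{
	return ring_base + field_offset;
}
```

With a base of 0, field_address(0, offsetof(struct mock_ring, next_to_use)) is 0x58, the value reported in CR2.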

Although the C code reads "ring->next_to_clean" first, so that access
would be expected to trigger the crash, the compiler reordered the loads,
as shown by the crash disassembly:

crash> disassemble ixgbe_xmit_frame_ring 
[...] 
0xffffffffc00ddab8 <+24>: mov %rdx,%rbx 
[...] 
0xffffffffc00ddb21 <+129>: movzwl 0x58(%rbx),%eax 
0xffffffffc00ddb25 <+133>: movzwl 0x5a(%rbx),%esi 
[...] 
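The two loads can also be matched against the raw bytes in the "Code:" line of the oops quoted in the bug description, where the faulting byte is marked: <0f> b7 43 58 0f b7 73 5a. The following C sketch decodes just this one instruction form (opcode 0F B7 with an 8-bit displacement); the struct and function names are invented for illustration, and this is nowhere near a general x86 decoder.

```c
#include <stdint.h>

/* Decoded form of "movzwl disp8(base), dst": opcode 0F B7 /r, mod = 01. */
struct movzwl_disp8 {
	uint8_t dst;	/* ModRM.reg: destination (0 = eax, 6 = esi) */
	uint8_t base;	/* ModRM.rm: base register (3 = rbx) */
	int8_t  disp;	/* sign-extended 8-bit displacement */
};

/* Returns 1 and fills *out if the 4 bytes at p are 0F B7 followed by a
 * ModRM byte with mod = 01 (disp8) and no SIB byte (rm != 4).
 * Returns 0 otherwise. Prefixes and other forms are ignored on purpose. */
static int decode_movzwl_disp8(const uint8_t *p, struct movzwl_disp8 *out)
{
	uint8_t modrm;

	if (p[0] != 0x0f || p[1] != 0xb7)
		return 0;
	modrm = p[2];
	if ((modrm >> 6) != 1 || (modrm & 7) == 4)
		return 0;
	out->dst  = (modrm >> 3) & 7;
	out->base = modrm & 7;
	out->disp = (int8_t)p[3];
	return 1;
}
```

Decoding 0f b7 43 58 gives destination 0 (eax), base 3 (rbx) and displacement 0x58; decoding 0f b7 73 5a gives destination 6 (esi), base 3 (rbx) and displacement 0x5a. So the first load executed is the one at offset 0x58 (next_to_use), in agreement with the disassembly.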


Finally, from the backtrace information in crash, we can double-check
that the ixgbe_ring pointer is NULL:

crash> bt -f |grep ixgbe_xmit_frame_ring -A7 
[exception RIP: ixgbe_xmit_frame_ring+129] 
RIP: ffffffffc00ddb21 RSP: ffff88103f283d20 RFLAGS: 00010246 
RAX: 00000000000000c2 RBX: 0000000000000000 RCX: 0000000000000001 
RDX: 0000000000000000 RSI: ffff8800538c0840 RDI: ffff881034167ec0 
[...] 

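Since the x86-64 ABI calling convention passes the first three
integer/pointer parameters in registers RDI, RSI and RDX (in that order),
the 3rd parameter (the ixgbe_ring pointer) arrives in RDX. The function
copies it to RBX at <+24>, and the faulting load at <+129> uses RBX as its
base. Both RBX and RDX are zero in the dump, so the pointer was NULL.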

I'll continue the investigation now to understand why this value was null 
at this point.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1794877

Title:
  Crash in ixgbe, during tx packet xmit (while potentially changing
  queues number)

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  It was reported that the ixgbe driver may crash with the following stack
  trace while changing the interrupt/queue configuration (probably using
  ethtool --set-channels):

  [28661.949147] init: irqbalance main process (19397) killed by TERM signal 
  [28662.381154] ixgbe 0000:04:00.0: removed PHC on eth4 
  [28662.502142] ixgbe 0000:04:00.0: Multiqueue Enabled: Rx Queue count = 18, Tx Queue count = 18
  [28662.588634] ixgbe 0000:04:00.0: registered PHC device on eth4 
  [28662.689789] br-iscsi-left: port 1(eth4.4011) entered disabled state 
  [28662.689951] br-sio-bel: port 1(eth4.4015) entered disabled state 
  [28662.690039] br-sio-fel: port 1(eth4.4017) entered disabled state 
  [28662.694227] ixgbe 0000:04:00.0 eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX
  [28662.694506] br-iscsi-left: port 1(eth4.4011) entered forwarding state 
  [28662.694519] br-iscsi-left: port 1(eth4.4011) entered forwarding state 
  [28662.694596] br-sio-bel: port 1(eth4.4015) entered forwarding state 
  [28662.694604] br-sio-bel: port 1(eth4.4015) entered forwarding state 
  [28662.694651] br-sio-fel: port 1(eth4.4017) entered forwarding state 
  [28662.694658] br-sio-fel: port 1(eth4.4017) entered forwarding state 
  [28662.709921] ixgbe 0000:04:00.1: removed PHC on eth5 
  [28662.834289] ixgbe 0000:04:00.1: Multiqueue Enabled: Rx Queue count = 18, Tx Queue count = 18
  [28662.915121] ixgbe 0000:04:00.1: registered PHC device on eth5 
  [28663.018209] ixgbe 0000:04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
  [28663.018356] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
  [28663.026266] IP: [<ffffffffc00ddb21>] ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe]
  [28663.033491] PGD 8000000046bcc067 PUD 46bcd067 PMD 0 
  [28663.038562] Oops: 0000 [#1] SMP 
  [28663.328921] Call Trace: 
  [28663.334598] <IRQ> 
  [28663.336551] [<ffffffffc00dea32>] ixgbe_xmit_frame+0x42/0x90 [ixgbe] 
  [28663.349627] [<ffffffff8171532d>] dev_hard_start_xmit+0x23d/0x400 
  [28663.358854] [<ffffffff81739d44>] sch_direct_xmit+0xe4/0x1f0 
  [28663.367602] [<ffffffff81739eeb>] __qdisc_run+0x9b/0x1c0 
  [28663.376110] [<ffffffff8171220e>] net_tx_action+0x15e/0x240 
  [28663.384673] [<ffffffff81084fb6>] __do_softirq+0xe6/0x2a0 
  [28663.392944] [<ffffffff81085395>] irq_exit+0x95/0xa0 
  [28663.400720] [<ffffffff8181d0f6>] do_IRQ+0x56/0xe0 
  [28663.408338] [<ffffffff8181a77f>] common_interrupt+0xbf/0xbf 
  [28663.416733] <EOI> 
  [28663.418680] [<ffffffff810998dc>] ? worker_thread+0x18c/0x480 
  [28663.430363] [<ffffffff81099750>] ? rescuer_thread+0x310/0x310 
  [28663.438870] [<ffffffff8109f138>] kthread+0xd8/0xf0 
  [28663.446368] [<ffffffff8109f060>] ? kthread_park+0x60/0x60 
  [28663.454385] [<ffffffff81819ff5>] ret_from_fork+0x55/0x80 
  [28663.462286] [<ffffffff8109f060>] ? kthread_park+0x60/0x60 
  [28663.470488] Code: 2a 41 83 e8 01 31 c0 45 0f b7 c0 49 83 c0 01 49 c1 e0 04 8b 74 07 3c 48 83 c0 10 8d 96 ff 3f 00 00 c1 ea 0e 01 d1 4c 39 c0 75 e8 <0f> b7 43 58 0f b7 73 5a 83 c1 03 31 d2 66 39 f0 66 0f 43 53 54
  [28663.498992] RIP [<ffffffffc00ddb21>] ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe]
  [28663.512112] RSP <ffff88103f283d20> 
  [28663.518217] CR2: 0000000000000058

