I was reading the assembly and comparing it with the source code to
check how accurate the registers in the dump are, and to find the points
at which it could have failed.

In the ixgbe transmit function, ixgbe_xmit_frame(), we have the
following:

tx_ring = ring ? ring : adapter->tx_ring[skb->queue_mapping];

Checking the assembly, ring is effectively ignored at this point because
it comes in as NULL, so we end up with a NULL tx_ring because
adapter->tx_ring[skb->queue_mapping] is NULL.

The struct sk_buff passed in %rdi is odd; it seems to contain no valid
data. Even so, queue_mapping is 0x0, and in the ixgbe_adapter as captured
at crash time, adapter->tx_ring[0x0] is valid and shouldn't cause the
NULL pointer dereference.

I think a race may be happening: at the moment tx_ring is assigned in
ixgbe_xmit_frame(), the slot is NULL, but it's filled right after with a
valid pointer by another function running concurrently.

I've noticed that a queue allocation function assigns this pointer.
Also, one interesting thing in dmesg is what seems to be a series of
successive interface/queue re-initializations:

[ 6.628974] ixgbe 0000:04:00.1: Multiqueue Enabled: Rx Queue count = 20, Tx Queue count = 20
[...]
[ 1493.198280] ixgbe 0000:04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[...]
[ 4113.173315] ixgbe 0000:04:00.1: Multiqueue Enabled: Rx Queue count = 19, Tx Queue count = 19
[ 4113.365528] ixgbe 0000:04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[...]
[28662.834289] ixgbe 0000:04:00.1: Multiqueue Enabled: Rx Queue count = 18, Tx Queue count = 18
[28663.018209] ixgbe 0000:04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[28663.018356] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058

So, the number of queues drops by 1 each time these messages appear in
dmesg. This seems to be triggered by "ethtool --set-channels" changing
the number of tx/rx queues for the interface.


Also, an oddity from the dump: 

crash> ixgbe_adapter -x ffff8800538c0840 
struct ixgbe_adapter { 
active_vlans = {0x1, 0x0, [...] 0x0, 0x5500000000000, 0x0}, 
[...] 

So, besides VLAN 0, there are more bits set in this bitmap. I don't know
why; it doesn't seem expected. I'll study the code further.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1794877

Title:
  Crash in ixgbe, during tx packet xmit (while potentially changing
  queues number)

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  It was reported that the ixgbe driver may crash with the following
  stack trace while changing the interrupt/queue configuration (probably
  using ethtool --set-channels):

  [28661.949147] init: irqbalance main process (19397) killed by TERM signal 
  [28662.381154] ixgbe 0000:04:00.0: removed PHC on eth4 
  [28662.502142] ixgbe 0000:04:00.0: Multiqueue Enabled: Rx Queue count = 18, Tx Queue count = 18
  [28662.588634] ixgbe 0000:04:00.0: registered PHC device on eth4 
  [28662.689789] br-iscsi-left: port 1(eth4.4011) entered disabled state 
  [28662.689951] br-sio-bel: port 1(eth4.4015) entered disabled state 
  [28662.690039] br-sio-fel: port 1(eth4.4017) entered disabled state 
  [28662.694227] ixgbe 0000:04:00.0 eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX
  [28662.694506] br-iscsi-left: port 1(eth4.4011) entered forwarding state 
  [28662.694519] br-iscsi-left: port 1(eth4.4011) entered forwarding state 
  [28662.694596] br-sio-bel: port 1(eth4.4015) entered forwarding state 
  [28662.694604] br-sio-bel: port 1(eth4.4015) entered forwarding state 
  [28662.694651] br-sio-fel: port 1(eth4.4017) entered forwarding state 
  [28662.694658] br-sio-fel: port 1(eth4.4017) entered forwarding state 
  [28662.709921] ixgbe 0000:04:00.1: removed PHC on eth5 
  [28662.834289] ixgbe 0000:04:00.1: Multiqueue Enabled: Rx Queue count = 18, Tx Queue count = 18
  [28662.915121] ixgbe 0000:04:00.1: registered PHC device on eth5 
  [28663.018209] ixgbe 0000:04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
  [28663.018356] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
  [28663.026266] IP: [<ffffffffc00ddb21>] ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe]
  [28663.033491] PGD 8000000046bcc067 PUD 46bcd067 PMD 0 
  [28663.038562] Oops: 0000 [#1] SMP 
  [28663.328921] Call Trace: 
  [28663.334598] <IRQ> 
  [28663.336551] [<ffffffffc00dea32>] ixgbe_xmit_frame+0x42/0x90 [ixgbe] 
  [28663.349627] [<ffffffff8171532d>] dev_hard_start_xmit+0x23d/0x400 
  [28663.358854] [<ffffffff81739d44>] sch_direct_xmit+0xe4/0x1f0 
  [28663.367602] [<ffffffff81739eeb>] __qdisc_run+0x9b/0x1c0 
  [28663.376110] [<ffffffff8171220e>] net_tx_action+0x15e/0x240 
  [28663.384673] [<ffffffff81084fb6>] __do_softirq+0xe6/0x2a0 
  [28663.392944] [<ffffffff81085395>] irq_exit+0x95/0xa0 
  [28663.400720] [<ffffffff8181d0f6>] do_IRQ+0x56/0xe0 
  [28663.408338] [<ffffffff8181a77f>] common_interrupt+0xbf/0xbf 
  [28663.416733] <EOI> 
  [28663.418680] [<ffffffff810998dc>] ? worker_thread+0x18c/0x480 
  [28663.430363] [<ffffffff81099750>] ? rescuer_thread+0x310/0x310 
  [28663.438870] [<ffffffff8109f138>] kthread+0xd8/0xf0 
  [28663.446368] [<ffffffff8109f060>] ? kthread_park+0x60/0x60 
  [28663.454385] [<ffffffff81819ff5>] ret_from_fork+0x55/0x80 
  [28663.462286] [<ffffffff8109f060>] ? kthread_park+0x60/0x60 
  [28663.470488] Code: 2a 41 83 e8 01 31 c0 45 0f b7 c0 49 83 c0 01 49 c1 e0 04 8b 74 07 3c 48 83 c0 10 8d 96 ff 3f 00 00 c1 ea 0e 01 d1 4c 39 c0 75 e8 <0f> b7 43 58 0f b7 73 5a 83 c1 03 31 d2 66 39 f0 66 0f 43 53 54
  [28663.498992] RIP [<ffffffffc00ddb21>] ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe]
  [28663.512112] RSP <ffff88103f283d20> 
  [28663.518217] CR2: 0000000000000058
