Hi,

I hope posting to the mailing list is OK.

We are observing kernel panics on Linux running in SMP virtual machines with an emulated e1000 network adapter (82545EM controller). The panics happen quite frequently (about once per 30 minutes) on a host with 16 CPUs and 10 VMs (each with 4 virtual CPUs). The panic occurs in e1000_clean_tx_irq():

    e1000_clean_tx_irq()
    {
        ...
        if (cleaned) {
            struct sk_buff *skb = buffer_info->skb;   << skb is NULL here sometimes
            segs = skb_shinfo(skb)->gso_segs ?: 1;
            ...
        }
    }

The panic is caused by the problem described by Terry Loftin for e1000 in
http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg02575.html
(the patch is named "e1000e: stop cleaning when we reach tx_ring->next_to_use"). However, that patch seems to be incomplete (and was committed only to e1000e): it only reduces how often the situation occurs. It seems the problem can still happen on the first entry into the while loop, or if a descriptor with a TSO context is hit.

The problem is caused by a race. I found it by printing a message at the start of the while loop whenever eop != tx_ring->buffer_info[i].next_to_watch:

    e1000_clean_tx_irq()
    {
        ...
        i = tx_ring->next_to_clean;
        eop = tx_ring->buffer_info[i].next_to_watch;
        eop_desc = E1000_TX_DESC(*tx_ring, eop);
        /* here is the race! */
        while (eop_desc->status & E1000_TXD_STAT_DD) {
            ...
            cleaned = (eop == i);
        }
    }

If the CPU is interrupted between the read of next_to_watch and the check of the DD status, and meanwhile another CPU queues a send and the hardware completes it, then we enter the loop with an invalid eop, and buffer_info->skb is NULL.

As far as I can see, the most recent version (e1000-8.0.30) still has the problem.

I have also looked at the e1000e sources. e1000e does not panic, but it has the same race, which there affects the statistics:

    e1000_clean_tx_irq()
    {
        ...
        if (cleaned) {
            // note: cleaned should really be false here, because eop is wrong!
            total_tx_packets += buffer_info->segs;
            total_tx_bytes += buffer_info->bytecount;
        }
    }

ixgbe-3.2.10 does not have the race at all, because its eop_desc is either valid or NULL.

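To make the window easier to see, here is a small user-space model of it. This is only a sketch, not driver code: the ring layout, struct sk_buff_model, queue_and_complete(), clean_once() and simulate() are all hypothetical names invented for the illustration. It also assumes that the device writes DD back on every descriptor of the new packet, so that the stale eop slot reports DD. One CPU reads next_to_watch, the "other CPU" and the device then run inside the window, and the cleaner continues with either the stale eop or a freshly re-read one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8
#define STAT_DD   0x1u  /* stand-in for E1000_TXD_STAT_DD */

struct sk_buff_model { int len; };  /* hypothetical stand-in for struct sk_buff */

struct slot {
    unsigned next_to_watch;         /* index of the packet's last descriptor */
    struct sk_buff_model *skb;      /* set only on the packet's last slot */
    unsigned status;                /* status written back by the device */
};

struct ring {
    struct slot slots[RING_SIZE];
    unsigned next_to_clean;
};

enum outcome { NOTHING_TO_CLEAN, NULL_SKB, VALID_SKB };

static struct sk_buff_model pkt = { 3000 };

/* Another CPU queues a 3-descriptor packet at `first`; the device completes it. */
static void queue_and_complete(struct ring *r, unsigned first)
{
    unsigned last = (first + 2) % RING_SIZE;

    r->slots[first].next_to_watch = last;
    r->slots[last].skb = &pkt;
    /* Model assumption: DD is written back on every descriptor of the packet. */
    for (unsigned i = first; ; i = (i + 1) % RING_SIZE) {
        r->slots[i].status |= STAT_DD;
        if (i == last)
            break;
    }
}

/* One pass of the cleaner, with the racing send forced into the window. */
static enum outcome clean_once(struct ring *r, bool reread_eop)
{
    unsigned i = r->next_to_clean;
    unsigned eop = r->slots[i].next_to_watch;   /* may be stale */

    queue_and_complete(r, i);                   /* the race window */

    if (!(r->slots[eop].status & STAT_DD))
        return NOTHING_TO_CLEAN;

    if (reread_eop)
        eop = r->slots[i].next_to_watch;        /* re-read after DD was seen */

    assert(eop < RING_SIZE);
    for (;;) {
        bool cleaned = (i == eop);

        if (cleaned)                            /* the driver dereferences skb here */
            return r->slots[i].skb ? VALID_SKB : NULL_SKB;
        i = (i + 1) % RING_SIZE;
    }
}

/* Slot 2 still holds next_to_watch == 2 from an earlier one-descriptor packet. */
enum outcome simulate(bool reread_eop)
{
    struct ring r = { 0 };

    r.next_to_clean = 2;
    r.slots[2].next_to_watch = 2;
    return clean_once(&r, reread_eop);
}
```

In this model simulate(false) returns NULL_SKB, which corresponds to the NULL skb seen in the panic, while simulate(true) returns VALID_SKB. The model forces the interleaving deterministically and ignores memory ordering, so it says nothing about which barriers real driver code would need.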
IMHO the possible solutions are either to re-read eop after DD is found, or to check skb for NULL before accessing it. A patch against e1000-8.0.30 (e1000) that re-reads eop follows in the next message (I consider it more readable than checking eop for NULL).

I hope it is acceptable.

Dmitry