On 03/18/2016 03:28 AM, Jesse Brandeburg wrote:
> On Thu, 17 Mar 2016 14:56:14 -0400
> Sowmini Varadhan <sowmini.varad...@oracle.com> wrote:
>
>> On (03/17/16 10:20), zhuyj wrote:
>>> 1. modprobe NET_PKTGEN
>>>
>>> 2. download the tar file and uncompress to any directory.
>>> This tar file is from kernel. It is in samples/pktgen/
>>>
>>> 3. cd pktgen
>>>
>>> 4. pktgen_sample02_multiqueue.sh -i ethx -s size -t cpu_number
>> Indeed, I see the same thing as you, and it was very easy to
>> reproduce. It was very interesting that the problem can happen with
>> as few as 3 threads, at which point I see the TX hang at exactly
>> -s 12305
> Okay, sorry I hadn't jumped into this thread yet.
>
> I can unequivocally tell you that what Sowmini saw with the MDD with
> stack based RDS-STRESS testing is *NOT* the same as what you're seeing
> while using pktgen with invalid huge skb->data buffers.
>
> We can ask on netdev if the driver should defend against this kind of
> input to hard_start_xmit (transmit routine), but the driver doesn't
> check the maximum length of the skb to see if it is invalid, because
> the stack can never build (only pktgen can) these invalid SKBs.
>
> The issue is that pktgen builds skb->data with a contiguous buffer of
> whatever size transmit requested, (regardless of MTU) and then sends it
> straight to the transmit routine, no segmentation flags, no MSS set.
>
> This causes the driver to build a transmit descriptor with an invalid
> length, which the hardware then "ASSERTS" on by issuing an MDD
> interrupt and freezing the bad acting queue.
>
>> I see:
>> i40e 0000:82:00.0: TX driver issue detected, PF reset issued
>> i40e 0000:82:00.0 eth2: VSI_seid 390, Hung TX queue 0, tx_pending: 492,
>> NTC:0x140, HWB: 0x140, NTU: 0x12c, TAIL: 0x12c
>>
>> I think the common factor in both our test cases is that we have some
>> kernel thread that can efficiently send packets without any context
>> switches.
> You've found a red herring (mistakenly connected two separate events)
> so I think you can stop going down this path (pktgen).
>
>> Has anyone here seen this before? I'll see if I can find some cycles
>> to figure this out, if not, maybe it's worth bringing up on netdev,
>> to see if others have seen this, and to draw some patterns.
> we don't need to bring it up on netdev. We have a way to troubleshoot
> MDDs that I can send to you, if you want to do the work. Otherwise we
> need to have some time to reproduce here.
>
>>> If size is set to a big number, the similar defect will occur.
>>> Adjust this size to an appropriate number, my defect will not occur.
>>>
>>> In the test, I found some types igb nic, such as i210, will work
>>> well no matter the size is a big number.
>>> some nic, such as 82580, it will not work well if the size is too big.
> This is mostly a combination of driver implementation and how the
> hardware handles a descriptor that is too large. The driver *could*
> check to make sure the skb->data is never too large, but in that same
> vein, we *could* fix pktgen to never send a frame greater than MTU down
> to the driver.

Do you mean that this is not a bug in the NIC, and that it does not need to be fixed?
But if a test tool such as pktgen runs tests like this, how should it be handled? Do we just suggest not running such tests?

Best Regards!
Zhu Yanjun
>
>>> As such, I think my problem results from the hardware and the big
>>> size triggers this problem.
>>>
>>> I hope this can help us all.
> Unfortunately Zhu's problem with pktgen is not a reproducer of
> Sowmini's problem.
>
> In the case of pktgen, it is a "don't do that, because it hurts" kind of
> bug. In the case of rds-stress, we need to reproduce it here and figure
> out what hardware constraint the driver is violating during set up of
> the transmit.
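
For illustration only, here is a minimal Python model of the length check discussed in the quoted text: a frame with no segmentation size set should not be longer than the MTU plus the Ethernet header. This is not kernel or driver code. The function names, the MTU of 1500, and the assumption that the pktgen size counts the same bytes as the frame length are all hypothetical. Only the size 12305 comes from this thread.

```python
# Hypothetical model of the "frame larger than MTU" check; not kernel code.
ETH_HLEN = 14   # Ethernet header: dst MAC (6) + src MAC (6) + EtherType (2)
VLAN_HLEN = 4   # optional 802.1Q tag


def max_frame_len(mtu: int, vlan: bool = False) -> int:
    """Largest frame (without FCS) expected for a non-segmented send."""
    return mtu + ETH_HLEN + (VLAN_HLEN if vlan else 0)


def is_oversized(pkt_len: int, mtu: int, gso_size: int = 0,
                 vlan: bool = False) -> bool:
    """True if the frame exceeds the MTU limit and carries no segment size.

    gso_size models the MSS; 0 means no segmentation was requested, which
    is the case described for pktgen in the quoted text.
    """
    return gso_size == 0 and pkt_len > max_frame_len(mtu, vlan)


def clamp_pkt_size(requested: int, mtu: int, vlan: bool = False) -> int:
    """Size a generator could use so it never exceeds the MTU limit."""
    return min(requested, max_frame_len(mtu, vlan))


if __name__ == "__main__":
    mtu = 1500  # assumed MTU; the thread does not state it
    print(is_oversized(12305, mtu))    # size reported in the thread
    print(clamp_pkt_size(12305, mtu))
```

Either side could apply such a check: the driver could reject a frame for which `is_oversized` is true, or the test tool could use something like `clamp_pkt_size` before handing the frame to the transmit routine.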