Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"

2019-05-02 Thread Juliana Rodrigueiro
Hi All.

While updating to kernel 4.19, we realised that a problem reported in 2015 for 
kernel 3.7 is still around. Please see this link for more details:
https://marc.info/?l=linux-netdev&m=142124954120315

Basically, when using the e1000e driver, the following messages appear in 
dmesg or the system log every few minutes.

[12465.174759] e1000e :00:19.0 eth0: Detected Hardware Unit Hang:
  TDH  
  TDT  
  next_to_use  
  next_to_clean
buffer_info[next_to_clean]:
  time_stamp   <2e5e92>
  next_to_watch
  jiffies  <2e67e8>
  next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status<3000>
PCI Status <10>
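
To put the two timestamps in the dump above in perspective, the gap between
time_stamp and jiffies can be worked out by hand. The following is only an
illustrative user-space sketch in C, not driver code: the helper names
jiffies_delta and jiffies_to_ms are made up for this example, and the HZ
values mentioned afterwards are assumptions, since the report does not state
the kernel's HZ setting.

```c
#include <assert.h>

/*
 * Elapsed ticks between two samples of a free-running tick counter.
 * Unsigned subtraction stays correct even if the counter wrapped
 * around between the two samples.
 */
static unsigned long jiffies_delta(unsigned long now, unsigned long then)
{
    return now - then;
}

/*
 * Convert a tick count to milliseconds for a given tick rate.
 * hz is an assumption supplied by the caller; it is not in the log.
 */
static unsigned long long jiffies_to_ms(unsigned long delta, unsigned long hz)
{
    assert(hz > 0);
    return (unsigned long long)delta * 1000ULL / hz;
}
```

For the values logged above, jiffies_delta(0x2e67e8, 0x2e5e92) gives
0x956 = 2390 ticks, which would be roughly 2.4 seconds if HZ were 1000, or
roughly 9.6 seconds if HZ were 250.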

Back in 2015, we applied a workaround that reduces the maximum page fragment 
size from 32768 to 4096 bytes:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 85ab7d7..9f0ef97 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2108,7 +2108,7 @@ static inline void __skb_queue_purge(struct sk_buff_head *list)
 		kfree_skb(skb);
 }
 
-#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(32768)
+#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(4096)
 #define NETDEV_FRAG_PAGE_MAX_SIZE  (PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER)
 #define NETDEV_PAGECNT_MAX_BIAS   NETDEV_FRAG_PAGE_MAX_SIZE
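
As a rough illustration of what this one-line change does to the allocation
size, here is a small user-space sketch in C. It is not the kernel's
get_order() implementation: the sketch_ names are invented for this example,
and the 4 KiB page size is an assumption about the affected machines.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12                        /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/*
 * Smallest order such that (page size << order) is at least `size`.
 * This mirrors what get_order() is used for in the patch, computed
 * with a simple loop instead of the kernel's bit tricks.
 */
static int sketch_get_order(unsigned long size)
{
    int order = 0;

    assert(size > 0);
    while ((SKETCH_PAGE_SIZE << order) < size)
        order++;
    return order;
}

/* Equivalent of PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER for a given size. */
static unsigned long sketch_frag_max_size(unsigned long size)
{
    return SKETCH_PAGE_SIZE << sketch_get_order(size);
}
```

Under that page size assumption, sketch_get_order(32768) is 3 (eight
contiguous pages per fragment allocation) and sketch_get_order(4096) is 0
(a single page), so the workaround limits fragment allocations to single
pages.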
 

Testing kernel 4.19 with the same hardware showed the same problems, so we 
tried to adapt the old workaround to the current code:

diff -u -r -p linux-4.19.i686/net/core/sock.c linux-4.19.i686.e1000e/net/core/sock.c
--- linux-4.19.i686/net/core/sock.c 2019-03-22 13:55:24.198266383 +0100
+++ linux-4.19.i686.e1000e/net/core/sock.c  2019-03-22 13:56:43.165765856 +0100
@@ -2183,7 +2183,8 @@ static void sk_leave_memory_pressure(str
 }
 
 /* On 32bit arches, an skb frag is limited to 2^15 */
-#define SKB_FRAG_PAGE_ORDER	get_order(32768)
+/* Limit to 4096 instead of 32768 */
+#define SKB_FRAG_PAGE_ORDER	get_order(4096)
 
 /**
  * skb_page_frag_refill - check that a page_frag contains enough room


Unfortunately, this patch no longer helps with the "Unit Hang" messages; 
the problem now occurs with any page size.


Any insight into how to deal with this problem would be very much appreciated.

Thank you!


Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"

2015-07-29 Thread Thomas Jarosch
Hi Jeff and Yanir,

On Saturday, 30. May 2015 01:18:44 Brown, Aaron F wrote:
> > any news on this from the Intel labs?
> 
> Nothing significant.  Another one of our testers (who works more closely
> with the current e1000e driver owner than I) has managed to replicate it
> on several systems and I know the developer spent some time poking around
> the setup, but I don't think he's found the root cause yet and has been
> busy chasing a number of other issues.

so, any news from the Intel labs? I've seen some "hang fixes"
on 03.06.2015, but I'm not sure if they are related to this issue.

This problem is pretty annoying: We have a performance penalty for all 
network cards right now as the buffer size of the core network stack
had to be decreased to 4096 bytes again on our side.
(https://www.marc.info/?l=linux-netdev&m=142131668206333)
Better than no e1000e network connectivity though.

The initial report on this issue was on 14.01.2015:
https://www.marc.info/?l=linux-netdev&m=142124954120315

Best regards,
Thomas

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: RE: [bisected regression] e1000e: "Detected Hardware Unit Hang"

2015-05-29 Thread Brown, Aaron F
> From: Thomas Jarosch [mailto:thomas.jaro...@intra2net.com]
> Sent: Wednesday, May 27, 2015 9:01 AM
> To: Brown, Aaron F
> Cc: Kirsher, Jeffrey T; 'Linux Netdev List'; Eric Dumazet; e1000-devel
> Subject: Re: RE: [bisected regression] e1000e: "Detected Hardware Unit
> Hang"
> 
> Hi Aaron,
> 
> On Monday, 23. March 2015 22:37:08 Brown, Aaron F wrote:
> > > >
> > > > And with an internal reproduction of the issue I have created an
> > > > internal bug report, described my set of reproductions, referenced
> > > > the similar external ones and assigned it to our current e1000e
> > > > developer.
> > >
> > >
> > > just wanted to quickly check if there has been any progress
> > > since the internal bug report has been filed?
> >
> >
> > No, no updates beyond a bit of investigation.
> 
> any news on this from the Intel labs?

Nothing significant.  Another one of our testers (who works more closely with 
the current e1000e driver owner than I) has managed to replicate it on several 
systems and I know the developer spent some time poking around the setup, but I 
don't think he's found the root cause yet and has been busy chasing a number of 
other issues.

> 
> Another two months passed ;) It would be nice to get rid
> of the workaround that limits the max fragment size to 4096.
> 
> Thanks,
> Thomas



Re: RE: [bisected regression] e1000e: "Detected Hardware Unit Hang"

2015-05-27 Thread Thomas Jarosch
Hi Aaron,

On Monday, 23. March 2015 22:37:08 Brown, Aaron F wrote:
> > >
> > > And with an internal reproduction of the issue I have created an
> > > internal bug report, described my set of reproductions, referenced the
> > > similar external ones and assigned it to our current e1000e developer.
> > 
> > 
> > just wanted to quickly check if there has been any progress
> > since the internal bug report has been filed?
> 
> 
> No, no updates beyond a bit of investigation.

any news on this from the Intel labs?

Another two months passed ;) It would be nice to get rid
of the workaround that limits the max fragment size to 4096.

Thanks,
Thomas
