RE: [Problem] broadcom tg3 network driver disconnects under high load

2015-08-10 Thread Satish Baddipadige
Hi Toan,

We could not reproduce the issue.

We followed the steps below:

1. Booted an HP EliteDesk 705 with Ubuntu 15.04.
2. Created a 1 GB file from /dev/urandom.
3. From another machine, repeatedly copied the 1 GB file back and forth
   with scp.

With this setup and these tests, we were unable to reproduce the issue.

As we discussed offline, we also tried your custom OS on the HP EliteDesk
705 and couldn't reproduce the issue.

As you suggested, we also tried the 1 GB file you provided, but still
couldn't reproduce the issue.

Since we can't reproduce this issue, we are unable to proceed further.

Thank you for your continued efforts and help.

Thanks,
Satish

-----Original Message-----
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On 
Behalf Of Prashant Sreedharan
Sent: Thursday, April 30, 2015 12:25 AM
To: Toan Pham
Cc: Michael Chan; Sanjeev Bansal; netdev@vger.kernel.org
Subject: Re: [Problem] broadcom tg3 network driver disconnects under high load

On Wed, 2015-04-29 at 13:34 -0400, Toan Pham wrote:
> Prashant,
> 
> Unfortunately, I ran the same test 3 times with the new patch and all 
> of them failed.
> Attached file is the dmesg log, after the Watchdog had timed out, and 
> tried to restart the NIC.
> Feel free to let me know if you would like to try anything else.  
> Thanks
Toan, thanks for the result. This looks to be a different problem. Sanjeev is
setting up a repro environment similar to yours to capture a PCIe trace.
Will keep you posted.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body 
of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html

Re: [Problem] broadcom tg3 network driver disconnects under high load

2015-04-29 Thread Prashant Sreedharan
On Wed, 2015-04-29 at 13:34 -0400, Toan Pham wrote:
> Prashant,
> 
> Unfortunately, I ran the same test 3 times with the new patch and all
> of them failed.
> Attached file is the dmesg log, after the Watchdog had timed out, and
> tried to restart the NIC.
> Feel free to let me know if you would like to try anything else.  Thanks
Toan, thanks for the result. This looks to be a different problem. Sanjeev
is setting up a repro environment similar to yours to capture a PCIe trace.
Will keep you posted.




Re: [Problem] broadcom tg3 network driver disconnects under high load

2015-04-28 Thread Prashant Sreedharan
On Tue, 2015-04-28 at 16:06 -0400, Toan Pham wrote:
> > We were able to reproduce this issue internally only with iommu enabled.
> 
> My last test to collect lspci-info took about 5 hours over a gigabit
> network for the bug to show up.  My setup was running 3 tx scp
> sessions, each transferring a 1GB file outbound, and 1 rx scp session
> copying another 1GB file inbound.  In a production environment with
> the BCM5762 NIC running as a server, I observed that the failure rate
> is about 1.65/week.  Please perform a similar test with iommu
> disabled, and leave it running for days if need be.

Sure, will try.
> 
> 
> >  Meanwhile can you try the attached patch and see if you are able to 
> > reproduce the problem ?
> 
> No problem.  I will apply the patch to kernel 4.0 and report back the
> result.  Let me know if you need me to turn on any debug options like
> pcie trace, dev debug, etc.  Thanks

If you can collect a PCIe trace, that would be great. Thanks




Re: [Problem] broadcom tg3 network driver disconnects under high load

2015-04-28 Thread Toan Pham
> We were able to reproduce this issue internally only with iommu enabled.

My last test to collect lspci-info took about 5 hours over a gigabit
network for the bug to show up.  My setup was running 3 tx scp
sessions, each transferring a 1GB file outbound, and 1 rx scp session
copying another 1GB file inbound.  In a production environment with
the BCM5762 NIC running as a server, I observed that the failure rate
is about 1.65/week.  Please perform a similar test with iommu
disabled, and leave it running for days if need be.


>  Meanwhile can you try the attached patch and see if you are able to 
> reproduce the problem ?

No problem.  I will apply the patch to kernel 4.0 and report back the
result.  Let me know if you need me to turn on any debug options like
pcie trace, dev debug, etc.  Thanks


Re: [Problem] broadcom tg3 network driver disconnects under high load

2015-04-28 Thread Prashant Sreedharan
On Tue, 2015-04-28 at 11:11 -0700, Michael Chan wrote:
> On Mon, 2015-04-27 at 22:10 +, Toan Pham wrote: 
> > Michael,
> > 
> > 
> > Please see the attached files.
> > 
> > BTW, I have also tested this bug on at least 8 different HP 705 PCs
> > with the 5762 NIC, so it is probably not a manufacturer defect.  In
> > addition, I can never replicate the same issue on the older chipset,
> > BCM5761, which can be found on the HP model 6005.  I hope this
> > information is helpful.  Thanks
> 
> Thanks for the data.  The memory enable bit is cleared and there are
> some correctable error bits set.  My colleague Sanjeev will look into
> this.
> 
> Do you have PCIE Advanced Error Reporting (CONFIG_PCIEAER) enabled in
> your kernel?
> 

The 5762 NIC has a bug that causes the chip to detect a false 4G boundary
crossing and stall. With the data you have provided, it is not clear
whether we are hitting this problem. Bit 5 of register 0x4c04 is set when
this condition occurs, but because the memory enable bit is clear, the
register dump collected before the chip was reset contains only garbage.
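To illustrate why such a dump cannot settle the question: a register read from a device whose memory decode is disabled usually comes back as all ones, so bit 5 looks set whether or not the stall happened. The sketch below is a hypothetical helper (the function name and the sample values are assumptions made for this example, not taken from the driver or from the attached dump) that treats an all-ones readback as unknown before testing bit 5.

```shell
#!/bin/sh
# Hypothetical helper: interpret a 32-bit readback of register 0x4c04.
# Bit 5 is the flag described above; 0xffffffff is reported as "unknown"
# because an all-ones value is what a failed MMIO read usually returns.
stall_bit_state() {
    val=$(( $1 & 0xffffffff ))
    if [ "$val" -eq $(( 0xffffffff )) ]; then
        echo unknown
    elif [ $(( (val >> 5) & 1 )) -eq 1 ]; then
        echo set
    else
        echo clear
    fi
}

stall_bit_state 0x00000020   # valid readback with bit 5 set
stall_bit_state 0x00000000   # valid readback with bit 5 clear
stall_bit_state 0xffffffff   # garbage readback, cannot tell
```

With the sample values above, the three calls print set, clear, and unknown in that order.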

We were able to reproduce this issue internally only with iommu enabled.
In your dmesg logs I do not see iommu enabled. So unless we have a pcie
trace we cannot confirm if this HW bug is indeed the problem you are
seeing.

Meanwhile, can you try the attached patch and see if you are still able to
reproduce the problem? This patch restricts all DMA addresses given to
the chip to 31 bits.

Toan, thanks for bringing this to our notice. Please also cc the
maintainers so that mails are not missed.
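As a rough illustration of what a 31-bit mask buys: with every DMA address at or below 0x7fffffff, no buffer handed to the chip can reach the 4G (0x100000000) line, which presumably keeps the chip away from the condition it misdetects. The sketch below uses plain shell arithmetic. `dma_bit_mask` reproduces the value of the kernel's DMA_BIT_MASK(n) for n < 64, while `crosses_4g` is a hypothetical check written for this example and is not the driver's own test.

```shell
#!/bin/sh
# Value of the kernel's DMA_BIT_MASK(n) for n < 64: the low n bits set.
dma_bit_mask() {
    echo $(( (1 << $1) - 1 ))
}

# Hypothetical check: prints 1 if the buffer [addr, addr+len) spans a
# 4G boundary (its first and last byte differ above bit 31), else 0.
crosses_4g() {
    addr=$(( $1 ))
    len=$(( $2 ))
    echo $(( ((addr + len - 1) >> 32) != (addr >> 32) ))
}

printf '31-bit mask: 0x%x\n' "$(dma_bit_mask 31)"
crosses_4g 0xfffff000 8192   # starts just below 4G, ends above it
crosses_4g 0x7fffe000 8192   # highest 8 KiB buffer a 31-bit mask allows
```

The first buffer straddles the boundary and the second cannot, since its last byte sits at 0x7fffffff.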
From 488fd699985f73d361d04d4788de48833c6442ca Mon Sep 17 00:00:00 2001
From: Prashant Sreedharan 
Date: Tue, 28 Apr 2015 11:32:56 -0700
Subject: [PATCH] tg3: Restrict DMA address to 31 bits for 5762 device

---
 drivers/net/ethernet/broadcom/tg3.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 069952f..e980c96 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -17707,6 +17707,8 @@ static int tg3_init_one(struct pci_dev *pdev,
 	 */
 	if (tg3_flag(tp, IS_5788))
 		persist_dma_mask = dma_mask = DMA_BIT_MASK(32);
+	else if (tg3_asic_rev(tp) == ASIC_REV_5762)
+		persist_dma_mask = dma_mask = DMA_BIT_MASK(31);
 	else if (tg3_flag(tp, 40BIT_DMA_BUG)) {
 		persist_dma_mask = dma_mask = DMA_BIT_MASK(40);
 #ifdef CONFIG_HIGHMEM
@@ -17736,6 +17738,17 @@ static int tg3_init_one(struct pci_dev *pdev,
 				"No usable DMA configuration, aborting\n");
 			goto err_out_apeunmap;
 		}
+	} else {
+		err = pci_set_dma_mask(pdev, dma_mask);
+		if (!err) {
+			err = pci_set_consistent_dma_mask(pdev,
+			  persist_dma_mask);
+		}
+		if (err) {
+			dev_err(&pdev->dev,
+				"No usable DMA configuration, aborting\n");
+			goto err_out_apeunmap;
+		}
 	}
 
 	tg3_init_bufmgr_config(tp);
-- 
1.7.1



Re: [Problem] broadcom tg3 network driver disconnects under high load

2015-04-28 Thread Toan Pham
> Do you have PCIE Advanced Error Reporting (CONFIG_PCIEAER) enabled in
> your kernel?

Yes, it is enabled.


Re: [Problem] broadcom tg3 network driver disconnects under high load

2015-04-28 Thread Michael Chan
On Mon, 2015-04-27 at 22:10 +, Toan Pham wrote: 
> Michael,
> 
> 
> Please see the attached files.
> 
> BTW, I have also tested this bug on at least 8 different HP 705 PCs
> with the 5762 NIC, so it is probably not a manufacturer defect.  In
> addition, I can never replicate the same issue on the older chipset,
> BCM5761, which can be found on the HP model 6005.  I hope this
> information is helpful.  Thanks

Thanks for the data.  The memory enable bit is cleared and there are
some correctable error bits set.  My colleague Sanjeev will look into
this.

Do you have PCIE Advanced Error Reporting (CONFIG_PCIEAER) enabled in
your kernel?



Re: [Problem] broadcom tg3 network driver disconnects under high load

2015-04-27 Thread Toan Pham
Michael,


Please see the attached files.

BTW, I have also tested this bug on at least 8 different HP 705 PCs
with the 5762 NIC, so it is probably not a manufacturer defect.  In
addition, I can never replicate the same issue on the older chipset,
BCM5761, which can be found on the HP model 6005.  I hope this
information is helpful.  Thanks

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5762 Gigabit Ethernet PCIe (rev 10)
	Subsystem: Hewlett-Packard Company Device 2215
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
	Capabilities: [160 v1] Virtual Channel
		Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:    Fixed- WRR32- WRR64- WRR128-
		Ctrl:   ArbSelect=Fixed
		Status: InProgress-
		VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status: NegoPending- InProgress-
	Capabilities: [1b0 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [230 v1] Transaction Processing Hints
		Interrupt vector mode supported
		Steering table in MSI-X table
	Kernel driver in use: tg3
00: e4 14 87 16 06 05 10 00 10 00 00 02 10 00 00 00
10: 0c 00 02 e0 00 00 00 00 0c 00 01 e0 00 00 00 00
20: 0c 00 00 e0 00 00 00 00 00 00 00 00 3c 10 15 22
30: 00 00 00 00 48 00 00 00 00 00 00 00 05 01 00 00
40: 00 00 00 00 00 00 00 fa 01 50 03 c8 08 20 00 16
50: 03 58 fc 80 00 00 00 78 05 a0 86 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 98 02 00 f1 d1 02 f8 01
70: 10 10 07 00 00 ff 00 ff 00 00 00 00 00 00 00 00
80: e4 14 87 16 40 00 00 40 00 00 00 00 a5 09 00 00
90: 00 00 00 00 d2 01 00 00 00 00 00 00 4d 04 00 00
a0: 11 ac 05 80 04 00 00 00 22 01 00 00 10 00 02 00
b0: 82 8d 00 10 00 54 10 00 12 5c 47 00 43 00 12 10
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 1f 08 08 00 00 00 00 00 00 00 00 00 01 00 01 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 21 76 05 00 00 00 00 ff ff ff ff
100: 01 00 c1 13 00 00 10 00 00 00 00 00 30 20 06 00
110: 40 20 00 00 00 20 00 00 b4 00 00 00 01 10 00 40
120: 0f 00 00 00 48 3c 02 e0 00 00 00 00 00 00 00 00
130: 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 15
140: 8a 82 47 06 51 64 00 00 00 00 00 00 00 00 00 00
150: 04 00 01 16 00 00 00 00 16 81 07 00 01 00 00 00
160: 02 00 01 1b 00 00 00 00 00 00 00 00 00 00 00 00
170: 00 00 00 00 01 00 00 80 00 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1b0: 18 00 01 23 00 00 00 00 00 00 00 00 00 00 00 00
1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
230: 17 00 01 00 03 04 05 00 00 00 00 00 00 00 00 00
240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
410: 00 00 00 00 00 00 00 00 00 00 00 00 00 0

Re: [Problem] broadcom tg3 network driver disconnects under high load

2015-04-24 Thread Michael Chan
On Fri, 2015-04-24 at 12:33 -0400, Toan Pham wrote: 
> Summary:  Broadcom 5762 NIC locks up under heavy load.

Can you provide lspci -vvvxxx -s 3:0.0

after it gets into this state?

Thanks.




[Problem] broadcom tg3 network driver disconnects under high load

2015-04-24 Thread Toan Pham
Summary:  Broadcom 5762 NIC locks up under heavy load.


Description:

The tg3 Broadcom network driver bound to the 5762 chipset locks up
under heavy network load. When this happens, a reboot is usually
necessary to recover networking. Sometimes, bringing the interface
down and up again (via ifconfig) recovers networking. I've also
tested with the latest tg3 driver, 3.137h (Dec 2014 version), and
networking is still problematic. I have also disabled TSO, GSO, etc.
with ethtool, but the bug still surfaces. This bug may be related to
the integrated firmware because at the time of the crash, the memory
dump of the bcm5762 chip is completely filled with 0xFFs.

Here is the procedure to replicate the issue; it is hard to
replicate under moderate network load.

1. Boot up a machine with a Broadcom 5762 NIC (e.g. HP EliteDesk 705)
using an Ubuntu/Kubuntu Live CD 14.04-15.04, or a build with the latest
mainline kernel.
2. From another machine: start 5 sessions, repetitively copy (scp with
public key authentication) a 70 MB file back and forth to the tg3
machine in each session. (not sure if this is necessary)
3. Create a 1GB file on the tg3 machine, with something like dd
if=/dev/urandom of=/my_test_file bs=1024 count=$((1024*1000))
4. From another machine: repetitively secure copy that 1GB file from
the tg3 machine. This can be done with something like:

while true; do
   scp -i /my/scp/private.key u...@ip.of.tg3:/my_test_file /tmp
done

Networking will lock up in about 10-30 minutes, or in some rare cases
after up to 4 hours of run time.  Running multiple instances of the 1GB
file transfer significantly shortens the time to failure.
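Step 4 with several parallel transfers could be scripted roughly as follows. This is only a sketch: the TG3_* variable names, the session count of 3, and the placeholder address user@tg3-host are assumptions to be replaced with real values, and nothing is copied unless TG3_RUN=1 is set.

```shell
#!/bin/sh
# All values below are placeholders (assumptions); override them through
# the environment before running.
TG3_KEY=${TG3_KEY:-/my/scp/private.key}
TG3_REMOTE=${TG3_REMOTE:-user@tg3-host}
TG3_FILE=${TG3_FILE:-/my_test_file}
TG3_SESSIONS=${TG3_SESSIONS:-3}

# Print the scp command line for session number $1. Each session writes
# to its own file under /tmp so parallel copies do not clobber each other.
scp_cmd() {
    echo "scp -i $TG3_KEY $TG3_REMOTE:$TG3_FILE /tmp/my_test_file.$1"
}

# Repeat one session's copy until scp fails, then report when it stopped,
# which gives a rough timestamp for the lockup. The command line is
# word-split by the shell, so the paths must not contain spaces.
run_session() {
    while $(scp_cmd "$1"); do :; done
    echo "session $1: scp failed at $(date)"
}

if [ "${TG3_RUN:-0}" = 1 ]; then
    i=1
    while [ "$i" -le "$TG3_SESSIONS" ]; do
        run_session "$i" &
        i=$((i + 1))
    done
    wait
else
    scp_cmd 1   # dry run: show what would be executed
fi
```

Stopping each loop at the first failed copy, rather than looping forever, makes it easier to line up the failure time with the watchdog message in dmesg.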


Keywords: networking, tg3

kernel version: Linux version 4.0.0-gbf70def.  I have also tested with
the following kernel versions:  3.17, 3.16, 2.6.39.

Kernel log message (Oops):  (see full ref:
https://launchpadlibrarian.net/204185480/dmesg)

WARNING: CPU: 0 PID: 1830 at net/sched/sch_generic.c:303
dev_watchdog+0xfc/0x185()
NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
Modules linked in:
CPU: 0 PID: 1830 Comm: cat Not tainted 4.0.0-gbf70def #4
Hardware name: Hewlett-Packard HP EliteDesk 705 G1 MT/2215, BIOS L06
v02.15 10/22/2014
   f581df18 c06e5045 c0a7ec29 f581df30 c01319e9 c0668e77
 f4c3  0005da10 f581df48 c0131a73 0009 f581df40 c0a7ec29
 f581df5c f581df78 c0668e77 c0a7ec62 012f c0a7ec29 f4c3 c0a60eba
Call Trace:
 [] dump_stack+0x41/0x52
 [] warn_slowpath_common+0x83/0x9a
 [] ? dev_watchdog+0xfc/0x185
 [] warn_slowpath_fmt+0x2b/0x2f
 [] dev_watchdog+0xfc/0x185
 [] ? pfifo_fast_dequeue+0xaf/0xaf
 [] call_timer_fn+0x47/0xcd
 [] run_timer_softirq+0x165/0x1c4
 [] ? pfifo_fast_dequeue+0xaf/0xaf
 [] __do_softirq+0xbe/0x1ef
 [] ? _local_bh_enable+0x40/0x40
 [] do_softirq_own_stack+0x22/0x28
   [] irq_exit+0x39/0x47
 [] smp_apic_timer_interrupt+0x38/0x42
 [] apic_timer_interrupt+0x2d/0x34
 [] ? _raw_spin_unlock_irqrestore+0xd/0xf
 [] extract_buf+0x83/0xc7
 [] extract_entropy_user+0xc2/0x11a
 [] urandom_read+0x68/0xbf
 [] ? extract_entropy_user+0x11a/0x11a
 [] __vfs_read+0x1b/0x47
 [] vfs_read+0x6b/0xd3
 [] SyS_read+0x44/0x84
 [] syscall_call+0x7/0x7


System info and detailed description:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664


I can help test proposed patches fairly quickly.  So please let me
know if you need anything.  Thank you.