Re: NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out

2018-03-26 Thread Borislav Petkov
On Tue, Mar 20, 2018 at 11:41:06AM +0530, Satish Baddipadige wrote:
> Can you please test the attached patch?

Well, the network connection just died with it. It didn't fire the
netdev watchdog but I still had to down and up eth0 in order to continue
using it. The ssh connection into the box survived so I didn't have to log in
again but it still died intermittently.
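
For anyone reproducing this, "down and up eth0" can be scripted. The sketch below is only an illustration: it assumes the iproute2 `ip` tool is installed, and the helper names `bounce_commands` and `bounce` are made up for this example. The thread does not say which tool was actually used.

```python
import subprocess


def bounce_commands(ifname):
    """Build the two iproute2 commands that take a link down and back up."""
    return [
        ["ip", "link", "set", "dev", ifname, "down"],
        ["ip", "link", "set", "dev", ifname, "up"],
    ]


def bounce(ifname):
    """Run the commands in order; raises CalledProcessError if one fails."""
    for cmd in bounce_commands(ifname):
        subprocess.run(cmd, check=True)
```

Calling `bounce("eth0")` needs root privileges and briefly drops the link, so established connections may stall while it runs.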

I'll keep playing with it to see if I'll catch some sort of splat...

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out

2018-03-20 Thread Borislav Petkov
On Tue, Mar 20, 2018 at 11:41:06AM +0530, Satish Baddipadige wrote:
> Can you please test the attached patch?

Sure, will do when I get back next week.

Thx.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply. Srsly.


Re: NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out

2018-03-20 Thread Satish Baddipadige
On Wed, Feb 28, 2018 at 7:40 PM, Siva Reddy Kallam wrote:
> On Sat, Feb 24, 2018 at 3:48 PM, Borislav Petkov wrote:
>> Hi,
>>
>> this didn't happen before but after 4.16-rc1 my tg3 nic stops for
>> whatever reason and the connection to the machine is dead. It didn't show
>> anything in dmesg until today.
>>
>> The IO pagefaults look like it is trying to access something it
>> shouldn't and maybe that's why it times out.
>>
>> It triggers pretty quickly so I'd call it a reliable reproducer and thus
>> I can test patches... :-)
>>
>> Thx.
> Thanks for reporting this. Somehow, this mail moved to my spam folder.
> Hence, delay in response.
> Looks like this is similar to below issue and it was reported some time back.
> https://www.spinics.net/lists/netdev/msg482757.html
> We are actively working on this. We will soon provide you an update on this.

Hi Borislav,

Can you please test the attached patch?

Thanks,
Satish


tg3_5762_clock_override.patch
Description: Binary data


Re: NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out

2018-02-28 Thread Siva Reddy Kallam
On Sat, Feb 24, 2018 at 3:48 PM, Borislav Petkov wrote:
> Hi,
>
> this didn't happen before but after 4.16-rc1 my tg3 nic stops for
> whatever reason and the connection to the machine is dead. It didn't show
> anything in dmesg until today.
>
> The IO pagefaults look like it is trying to access something it
> shouldn't and maybe that's why it times out.
>
> It triggers pretty quickly so I'd call it a reliable reproducer and thus
> I can test patches... :-)
>
> Thx.
Thanks for reporting this. Somehow, this mail moved to my spam folder.
Hence, delay in response.
Looks like this is similar to below issue and it was reported some time back.
https://www.spinics.net/lists/netdev/msg482757.html
We are actively working on this. We will soon provide you an update on this.


NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out

2018-02-24 Thread Borislav Petkov
Hi,

this didn't happen before but after 4.16-rc1 my tg3 nic stops for
whatever reason and the connection to the machine is dead. It didn't show
anything in dmesg until today.

The IO pagefaults look like it is trying to access something it
shouldn't and maybe that's why it times out.

It triggers pretty quickly so I'd call it a reliable reproducer and thus
I can test patches... :-)

Thx.

...
[   15.916840] random: crng init done
[   44.792699] tg3 :01:00.0 eth0: Link is up at 100 Mbps, full duplex
[   44.793024] tg3 :01:00.0 eth0: Flow control is on for TX and on for RX
[   44.793315] tg3 :01:00.0 eth0: EEE is disabled
[   44.793395] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   58.216474] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x0001f0c0 flags=0x]
[   58.216943] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x0001f100 flags=0x]
[   58.217395] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x0001f140 flags=0x]
[   58.217844] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0001 address=0x0001f180 flags=0x]
[   64.992145] [ cut here ]
[   64.992406] NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
[   64.992742] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:464 dev_watchdog+0x1fe/0x210
[   64.992744] Modules linked in: arc4 iwlmvm mac80211 amdgpu kvm_amd kvm iwlwifi irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel snd_hda_codec_conexant snd_hda_codec_hdmi snd_hda_codec_generic aesni_intel sha256_generic aes_x86_64 crypto_simd snd_hda_intel cryptd glue_helper tg3 snd_hda_codec pcspkr snd_hwdep cfg80211 joydev psmouse ptp snd_hda_core hp_wmi pps_core snd_pcm ehci_pci chash tpm_infineon rfkill libphy i2c_piix4 snd_timer fam15h_power xhci_pci ehci_hcd snd sg gpu_sched k10temp soundcore xhci_hcd tpm_tis tpm_tis_core video tpm battery button ac acpi_cpufreq evdev input_leds serio_raw sd_mod thermal pinctrl_amd
[   64.993216] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-rc1+ #2
[   64.993222] Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.08 01/28/2016
[   64.996048] RIP: 0010:dev_watchdog+0x1fe/0x210
[   64.996050] RSP: 0018:88043dc83e88 EFLAGS: 00010282
[   64.996052] RAX:  RBX:  RCX: 0103
[   64.996054] RDX: 8103 RSI: 0086 RDI: 
[   64.996055] RBP: 88042b86e39c R08: 81c0a400 R09: 0001
[   64.996057] R10: 035a R11:  R12: 88042b86e3b0
[   64.996058] R13: 88042b86e000 R14: 0005 R15: 88042a0ced80
[   64.996061] FS:  () GS:88043dc8() knlGS:
[   64.996063] CS:  0010 DS:  ES:  CR0: 80050033
[   64.996065] CR2: 7f98ed87eb00 CR3: 000428ea CR4: 001406e0
[   64.996068] Call Trace:
[   64.996074]  
[   64.996082]  ? qdisc_reset+0xe0/0xe0
[   64.996085]  ? qdisc_reset+0xe0/0xe0
[   64.996092]  call_timer_fn+0x2b/0x150
[   64.996097]  run_timer_softirq+0x415/0x460
[   64.996101]  ? tick_sched_timer+0x42/0x90
[   64.996106]  ? _raw_spin_lock_irq+0x1a/0x40
[   64.996110]  ? __hrtimer_run_queues+0x113/0x2d0
[   64.996114]  __do_softirq+0xeb/0x2d5
[   64.996121]  irq_exit+0xaa/0xb0
[   64.996125]  smp_apic_timer_interrupt+0x73/0x150
[   64.996128]  apic_timer_interrupt+0x7d/0x90
[   64.996131]  
[   64.996136] RIP: 0010:cpuidle_enter_state+0xa3/0x2f0
[   64.996138] RSP: 0018:c900019c3ea8 EFLAGS: 0246 ORIG_RAX: ff12
[   64.996141] RAX: 88043dc8 RBX: 000f21d4b954 RCX: 001f
[   64.996142] RDX: 000f21d4b954 RSI: 81da4ca1 RDI: 81db2a9e
[   64.996144] RBP: 88042a39a200 R08: 0005a0b5 R09: 000585fa
[   64.996145] R10: 0018 R11: 00049370 R12: 0002
[   64.996146] R13: 82095db8 R14:  R15: 000f0b23994e
[   64.996157]  ? cpuidle_enter_state+0x93/0x2f0
[   65.003171]  do_idle+0x19a/0x1f0
[   65.003176]  cpu_startup_entry+0x6f/0x80
[   65.003181]  start_secondary+0x1a5/0x200
[   65.003185]  secondary_startup_64+0xa5/0xb0
[   65.003189] Code: 00 49 63 4c 24 f0 eb 93 4c 89 ef c6 05 5b 10 af 00 01 e8 b6 67 fd ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 20 f6 df 81 e8 e2 8d a7 ff <0f> ff eb be 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
[   65.003234] ---[ end trace b191673f18a75f41 ]---
[   65.003243] tg3 :01:00.0 eth0: transmit timed out, resetting
[   67.679695] tg3 :01:00.0 eth0: 0x: 0x168714e4, 0x10100406, 0x0210, 0x
[   67.680053] tg3 :01:00.0 eth0: 0x0010: 0xd082000c, 0x, 0xd081000c, 0x
[   67.680406] tg3 :01:00.0 eth0: 0x0020: 0xd08c, 0x, 0x, 0x807e103c
[   67.680419] tg3 :01:00.0 eth0: 0x0030: 0x00
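
When comparing the IO page faults across boots or across patched kernels, it can help to pull the faulting addresses out of the dmesg text mechanically. The following is a minimal sketch, not part of the original report: the names `fault_addresses` and `strides` are hypothetical, and the regular expression only assumes the `IO_PAGE_FAULT domain=... address=...` layout seen in the log above. The sample lines are copied from that log as archived, including the line wrap.

```python
import re

# Field layout taken from the AMD-Vi lines in the log above; \s+ also
# matches a newline, so entries wrapped by a mail client still parse.
FAULT_RE = re.compile(r"IO_PAGE_FAULT\s+domain=\S+\s+address=(0x[0-9a-fA-F]+)")


def fault_addresses(log_text):
    """Return the faulting addresses as integers, in order of appearance."""
    return [int(m.group(1), 16) for m in FAULT_RE.finditer(log_text)]


def strides(addresses):
    """Return the difference between each pair of consecutive addresses."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


sample = """\
[   58.216474] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0001 address=0x0001f0c0 flags=0x]
[   58.216943] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0001 address=0x0001f100 flags=0x]
[   58.217395] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0001 address=0x0001f140 flags=0x]
[   58.217844] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0001 address=0x0001f180 flags=0x]
"""

addrs = fault_addresses(sample)
print([hex(a) for a in addrs])       # the four faulting addresses
print([hex(s) for s in strides(addrs)])  # spacing between them
```

On the four entries above, the consecutive addresses are 0x40 bytes apart. What that spacing means for the driver is not established in this thread.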
