[PATCH 5.10 131/663] net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning

2021-03-01 Thread Greg Kroah-Hartman

From: Shyam Sundar S K 

[ Upstream commit 186edbb510bd60e748f93975989ccba25ee99c50 ]

The current driver calls netif_carrier_off() late in the link tear down
which can result in a netdev watchdog timeout.

Calling netif_carrier_off() immediately after netif_tx_stop_all_queues()
avoids the warning.

 [ cut here ]
 NETDEV WATCHDOG: enp3s0f2 (amd-xgbe): transmit queue 0 timed out
 WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x20d/0x220
 Modules linked in: amd_xgbe(E)  amd-xgbe :03:00.2 enp3s0f2: Link is Down
 CPU: 3 PID: 0 Comm: swapper/3 Tainted: GE
 Hardware name: AMD Bilby-RV2/Bilby-RV2, BIOS RBB1202A 10/18/2019
 RIP: 0010:dev_watchdog+0x20d/0x220
 Code: 00 49 63 4e e0 eb 92 4c 89 e7 c6 05 c6 e2 c1 00 01 e8 e7 ce fc ff 89 d9 48
 RSP: 0018:90cfc28c3e88 EFLAGS: 00010286
 RAX:  RBX:  RCX: 0006
 RDX: 0007 RSI: 0086 RDI: 90cfc28d63c0
 RBP: 90cfb977845c R08: 0050 R09: 00196018
 R10: 90cfc28c3ef8 R11:  R12: 90cfb9778000
 R13: 0003 R14: 90cfb9778480 R15: 0010
 FS:  () GS:90cfc28c() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7f240ff2d9d0 CR3: 0001e3e0a000 CR4: 003406e0
 Call Trace:
  
  ? pfifo_fast_reset+0x100/0x100
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? enqueue_hrtimer+0x39/0x90

Fixes: e722ec82374b ("amd-xgbe: Update the BelFuse quirk to support SGMII")
Co-developed-by: Sudheesh Mavila 
Signed-off-by: Sudheesh Mavila 
Signed-off-by: Shyam Sundar S K 
Acked-by: Tom Lendacky 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c  | 1 +
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 2709a2db56577..395eb0b526802 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1368,6 +1368,7 @@ static void xgbe_stop(struct xgbe_prv_data *pdata)
 		return;
 
 	netif_tx_stop_all_queues(netdev);
+	netif_carrier_off(pdata->netdev);
 
 	xgbe_stop_timers(pdata);
 	flush_workqueue(pdata->dev_workqueue);
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
index 93ef5a30cb8d9..19ee4db0156d6 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
@@ -1396,7 +1396,6 @@ static void xgbe_phy_stop(struct xgbe_prv_data *pdata)
 	pdata->phy_if.phy_impl.stop(pdata);
 
 	pdata->phy.link = 0;
-	netif_carrier_off(pdata->netdev);
 
 	xgbe_phy_adjust_link(pdata);
 }
-- 
2.27.0
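
The effect of the reordering in this patch can be illustrated with a small userspace model. This is only a sketch under a stated assumption: that the watchdog skips its per-queue timeout check while the carrier is reported down. `struct model_dev` and the `model_*` functions are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few pieces of state the watchdog looks at. */
struct model_dev {
	bool carrier_ok;           /* models the effect of netif_carrier_off() */
	bool queue_stopped;        /* models the effect of netif_tx_stop_all_queues() */
	unsigned long trans_start; /* tick of the last transmit */
	unsigned long timeo;       /* watchdog timeout in ticks */
};

/* Assumed rule: warn only if carrier is up, the queue is stopped,
 * and more than timeo ticks have passed since the last transmit. */
static bool model_watchdog_fires(const struct model_dev *dev, unsigned long now)
{
	assert(dev != NULL);
	if (!dev->carrier_ok)
		return false;
	if (!dev->queue_stopped)
		return false;
	return now - dev->trans_start > dev->timeo;
}

/* Old order: queues are stopped, carrier is dropped only later in teardown. */
static void model_stop_old_order(struct model_dev *dev)
{
	dev->queue_stopped = true;
	/* carrier_ok stays true until the later PHY stop step */
}

/* New order: carrier is dropped right after the queues are stopped. */
static void model_stop_new_order(struct model_dev *dev)
{
	dev->queue_stopped = true;
	dev->carrier_ok = false;
}
```

In this model, a device whose last transmit was at tick 100 with a 500-tick timeout trips the check at tick 700 only when stopped with the old ordering; with the new ordering the carrier test returns early.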

[PATCH 5.11 135/775] net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning

2021-03-01 Thread Greg Kroah-Hartman

From: Shyam Sundar S K 

[ Upstream commit 186edbb510bd60e748f93975989ccba25ee99c50 ]

The current driver calls netif_carrier_off() late in the link tear down
which can result in a netdev watchdog timeout.

Calling netif_carrier_off() immediately after netif_tx_stop_all_queues()
avoids the warning.

 [ cut here ]
 NETDEV WATCHDOG: enp3s0f2 (amd-xgbe): transmit queue 0 timed out
 WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x20d/0x220
 Modules linked in: amd_xgbe(E)  amd-xgbe :03:00.2 enp3s0f2: Link is Down
 CPU: 3 PID: 0 Comm: swapper/3 Tainted: GE
 Hardware name: AMD Bilby-RV2/Bilby-RV2, BIOS RBB1202A 10/18/2019
 RIP: 0010:dev_watchdog+0x20d/0x220
 Code: 00 49 63 4e e0 eb 92 4c 89 e7 c6 05 c6 e2 c1 00 01 e8 e7 ce fc ff 89 d9 48
 RSP: 0018:90cfc28c3e88 EFLAGS: 00010286
 RAX:  RBX:  RCX: 0006
 RDX: 0007 RSI: 0086 RDI: 90cfc28d63c0
 RBP: 90cfb977845c R08: 0050 R09: 00196018
 R10: 90cfc28c3ef8 R11:  R12: 90cfb9778000
 R13: 0003 R14: 90cfb9778480 R15: 0010
 FS:  () GS:90cfc28c() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7f240ff2d9d0 CR3: 0001e3e0a000 CR4: 003406e0
 Call Trace:
  
  ? pfifo_fast_reset+0x100/0x100
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? enqueue_hrtimer+0x39/0x90

Fixes: e722ec82374b ("amd-xgbe: Update the BelFuse quirk to support SGMII")
Co-developed-by: Sudheesh Mavila 
Signed-off-by: Sudheesh Mavila 
Signed-off-by: Shyam Sundar S K 
Acked-by: Tom Lendacky 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c  | 1 +
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 2709a2db56577..395eb0b526802 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1368,6 +1368,7 @@ static void xgbe_stop(struct xgbe_prv_data *pdata)
 		return;
 
 	netif_tx_stop_all_queues(netdev);
+	netif_carrier_off(pdata->netdev);
 
 	xgbe_stop_timers(pdata);
 	flush_workqueue(pdata->dev_workqueue);
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
index 93ef5a30cb8d9..19ee4db0156d6 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
@@ -1396,7 +1396,6 @@ static void xgbe_phy_stop(struct xgbe_prv_data *pdata)
 	pdata->phy_if.phy_impl.stop(pdata);
 
 	pdata->phy.link = 0;
-	netif_carrier_off(pdata->netdev);
 
 	xgbe_phy_adjust_link(pdata);
 }
-- 
2.27.0

[PATCH 4.19 069/247] net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning

2021-03-01 Thread Greg Kroah-Hartman

From: Shyam Sundar S K 

[ Upstream commit 186edbb510bd60e748f93975989ccba25ee99c50 ]

The current driver calls netif_carrier_off() late in the link tear down
which can result in a netdev watchdog timeout.

Calling netif_carrier_off() immediately after netif_tx_stop_all_queues()
avoids the warning.

 [ cut here ]
 NETDEV WATCHDOG: enp3s0f2 (amd-xgbe): transmit queue 0 timed out
 WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x20d/0x220
 Modules linked in: amd_xgbe(E)  amd-xgbe :03:00.2 enp3s0f2: Link is Down
 CPU: 3 PID: 0 Comm: swapper/3 Tainted: GE
 Hardware name: AMD Bilby-RV2/Bilby-RV2, BIOS RBB1202A 10/18/2019
 RIP: 0010:dev_watchdog+0x20d/0x220
 Code: 00 49 63 4e e0 eb 92 4c 89 e7 c6 05 c6 e2 c1 00 01 e8 e7 ce fc ff 89 d9 48
 RSP: 0018:90cfc28c3e88 EFLAGS: 00010286
 RAX:  RBX:  RCX: 0006
 RDX: 0007 RSI: 0086 RDI: 90cfc28d63c0
 RBP: 90cfb977845c R08: 0050 R09: 00196018
 R10: 90cfc28c3ef8 R11:  R12: 90cfb9778000
 R13: 0003 R14: 90cfb9778480 R15: 0010
 FS:  () GS:90cfc28c() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7f240ff2d9d0 CR3: 0001e3e0a000 CR4: 003406e0
 Call Trace:
  
  ? pfifo_fast_reset+0x100/0x100
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? enqueue_hrtimer+0x39/0x90

Fixes: e722ec82374b ("amd-xgbe: Update the BelFuse quirk to support SGMII")
Co-developed-by: Sudheesh Mavila 
Signed-off-by: Sudheesh Mavila 
Signed-off-by: Shyam Sundar S K 
Acked-by: Tom Lendacky 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c  | 1 +
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 5519eff584417..80cf6af822f72 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1444,6 +1444,7 @@ static void xgbe_stop(struct xgbe_prv_data *pdata)
 		return;
 
 	netif_tx_stop_all_queues(netdev);
+	netif_carrier_off(pdata->netdev);
 
 	xgbe_stop_timers(pdata);
 	flush_workqueue(pdata->dev_workqueue);
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
index 8a3a60bb26888..4d5506d928973 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
@@ -1396,7 +1396,6 @@ static void xgbe_phy_stop(struct xgbe_prv_data *pdata)
 	pdata->phy_if.phy_impl.stop(pdata);
 
 	pdata->phy.link = 0;
-	netif_carrier_off(pdata->netdev);
 
 	xgbe_phy_adjust_link(pdata);
 }
-- 
2.27.0

[PATCH 5.4 066/340] net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning

2021-03-01 Thread Greg Kroah-Hartman

From: Shyam Sundar S K 

[ Upstream commit 186edbb510bd60e748f93975989ccba25ee99c50 ]

The current driver calls netif_carrier_off() late in the link tear down
which can result in a netdev watchdog timeout.

Calling netif_carrier_off() immediately after netif_tx_stop_all_queues()
avoids the warning.

 [ cut here ]
 NETDEV WATCHDOG: enp3s0f2 (amd-xgbe): transmit queue 0 timed out
 WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x20d/0x220
 Modules linked in: amd_xgbe(E)  amd-xgbe :03:00.2 enp3s0f2: Link is Down
 CPU: 3 PID: 0 Comm: swapper/3 Tainted: GE
 Hardware name: AMD Bilby-RV2/Bilby-RV2, BIOS RBB1202A 10/18/2019
 RIP: 0010:dev_watchdog+0x20d/0x220
 Code: 00 49 63 4e e0 eb 92 4c 89 e7 c6 05 c6 e2 c1 00 01 e8 e7 ce fc ff 89 d9 48
 RSP: 0018:90cfc28c3e88 EFLAGS: 00010286
 RAX:  RBX:  RCX: 0006
 RDX: 0007 RSI: 0086 RDI: 90cfc28d63c0
 RBP: 90cfb977845c R08: 0050 R09: 00196018
 R10: 90cfc28c3ef8 R11:  R12: 90cfb9778000
 R13: 0003 R14: 90cfb9778480 R15: 0010
 FS:  () GS:90cfc28c() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7f240ff2d9d0 CR3: 0001e3e0a000 CR4: 003406e0
 Call Trace:
  
  ? pfifo_fast_reset+0x100/0x100
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? enqueue_hrtimer+0x39/0x90

Fixes: e722ec82374b ("amd-xgbe: Update the BelFuse quirk to support SGMII")
Co-developed-by: Sudheesh Mavila 
Signed-off-by: Sudheesh Mavila 
Signed-off-by: Shyam Sundar S K 
Acked-by: Tom Lendacky 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c  | 1 +
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 3bd20f7651207..da8c2c4aca7ef 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1443,6 +1443,7 @@ static void xgbe_stop(struct xgbe_prv_data *pdata)
 		return;
 
 	netif_tx_stop_all_queues(netdev);
+	netif_carrier_off(pdata->netdev);
 
 	xgbe_stop_timers(pdata);
 	flush_workqueue(pdata->dev_workqueue);
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
index 8a3a60bb26888..4d5506d928973 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
@@ -1396,7 +1396,6 @@ static void xgbe_phy_stop(struct xgbe_prv_data *pdata)
 	pdata->phy_if.phy_impl.stop(pdata);
 
 	pdata->phy.link = 0;
-	netif_carrier_off(pdata->netdev);
 
 	xgbe_phy_adjust_link(pdata);
 }
-- 
2.27.0

Re: NETDEV WATCHDOG: WARNING: at net/sched/sch_generic.c:442 dev_watchdog

2020-08-19 Thread Steven Rostedt

On Wed, 19 Aug 2020 10:29:09 -0700
Jesse Brandeburg  wrote:


> What I don't understand in the stack trace is this:
> > > [  107.654661] Call Trace:
> > > [  107.657735]  
> > > [  107.663155]  ? ftrace_graph_caller+0xc0/0xc0
> > > [  107.667929]  call_timer_fn+0x3b/0x1b0
> > > [  107.672238]  ? netif_carrier_off+0x70/0x70
> > > [  107.61]  ? netif_carrier_off+0x70/0x70
> > > [  107.682656]  ? ftrace_graph_caller+0xc0/0xc0
> > > [  107.687379]  run_timer_softirq+0x3e8/0xa10
> > > [  107.694653]  ? call_timer_fn+0x1b0/0x1b0
> > > [  107.699382]  ? trace_event_raw_event_softirq+0xdd/0x150
> > > [  107.706768]  ? ring_buffer_unlock_commit+0xf5/0x210
> > > [  107.712213]  ? call_timer_fn+0x1b0/0x1b0
> > > [  107.716625]  ? __do_softirq+0x155/0x467  
> 
> 
> If the carrier was turned off by something, that could cause the stack
> to time out, since it appears the driver didn't call this itself after
> finishing all transmits like it normally would have.
> 
> Is the trace above correct? Usually the "?" indicates an unsure
> backtrace due to missing symbols, right?

The "?" means that there wasn't a stack frame to confirm that this was
the true call stack. What happens is that the scan of the stack looks
for any address in the stack that belongs to a function. If it finds
one, it prints it and adds a "?" to that address. Basically, the
functions marked with "?" are just addresses found on the stack that
were not part of a stack frame link.

-- Steve
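
The scan described in this reply can be sketched as a toy userspace model. It is an illustration only, not the kernel unwinder: the text range, `struct stack_slot`, `looks_like_text()` and `dump_stack_model()` are all hypothetical, and the `confirmed` flag simply stands in for "a stack frame link vouches for this return address".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical address range standing in for kernel function text. */
#define TEXT_START 0x1000UL
#define TEXT_END   0x2000UL

/* One word on the toy stack. */
struct stack_slot {
	unsigned long value;
	bool confirmed; /* true if a frame link confirms this return address */
};

/* Does this word look like an address inside some function? */
static bool looks_like_text(unsigned long word)
{
	return word >= TEXT_START && word < TEXT_END;
}

/*
 * Print every stack word that looks like a function address.
 * Confirmed entries are printed plainly; the rest get a "?" prefix.
 * Returns the number of "?" entries printed.
 */
static size_t dump_stack_model(const struct stack_slot *stack, size_t n, FILE *out)
{
	size_t guessed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!looks_like_text(stack[i].value))
			continue; /* ordinary data, never printed */
		if (stack[i].confirmed) {
			fprintf(out, " %#lx\n", stack[i].value);
		} else {
			fprintf(out, " ? %#lx\n", stack[i].value);
			guessed++;
		}
	}
	return guessed;
}
```

The "?" entries in this model are leftovers that happen to sit on the stack, such as stale return addresses from earlier calls, which is why they may or may not belong to the real call chain.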

Re: NETDEV WATCHDOG: WARNING: at net/sched/sch_generic.c:442 dev_watchdog

2020-08-19 Thread Jesse Brandeburg

Steven Rostedt wrote:

> On Wed, 19 Aug 2020 17:01:06 +0530
> Naresh Kamboju  wrote:
> 
> > A kernel warning was noticed on x86_64 while running the LTP tracing
> > ftrace-stress-test case. We started noticing it on the stable-rc
> > linux-5.8.y branch.
> > 
> > This device was booted with KASAN, DYNAMIC tracing configs, and more.
> > The reported issue is not easily reproducible.
> > metadata:
> >   git branch: linux-5.8.y
> >   git repo: 
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> >   git commit: ad8c735b1497520df959f675718f39dca8cb8019
> >   git describe: v5.8.2
> >   make_kernelversion: 5.8.2
> >   kernel-config:
> > https://builds.tuxbuild.com/bOz0eAwkcraRiWALTW9D3Q/kernel.config
> > 
> > 
> > [   88.139387] Scheduler tracepoints stat_sleep, stat_iowait,
> > stat_blocked and stat_runtime require the kernel parameter
> > schedstats=enable or kernel.sched_schedstats=1
> > [   88.139387] Scheduler tracepoints stat_sleep, stat_iowait,
> > stat_blocked and stat_runtime require the kernel parameter
> > schedstats=enable or kernel.sched_schedstats=1
> > [  107.507991] [ cut here ]
> > [  107.513103] NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out
> > [  107.519973] WARNING: CPU: 1 PID: 331 at net/sched/sch_generic.c:442
> > dev_watchdog+0x4c7/0x4d0
> > [  107.528907] Modules linked in: x86_pkg_temp_thermal
> > [  107.534541] CPU: 1 PID: 331 Comm: systemd-journal Not tainted 5.8.2 #1
> > [  107.541480] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> > 2.2 05/23/2018
> > [  107.549314] RIP: 0010:dev_watchdog+0x4c7/0x4d0
> > [  107.554226] Code: ff ff 48 8b 5d c8 c6 05 6d f7 94 01 01 48 89 df
> > e8 9e b4 f8 ff 44 89 e9 48 89 de 48 c7 c7 20 49 51 9c 48 89 c2 e8 91
> > 7e e9 fe <0f> 0b e9 03 ff ff ff 66 90 e8 9b 23 db fe 55 48 89 e5 41 57
> 
> I've triggered this myself in my testing, and I assumed that adding the
> overhead of tracing and here KASAN too, made some watchdog a bit
> unhappy. By commenting out the warning, I've seen no ill effects.
> 
> Perhaps this is something we need to dig a bit deeper into.

I looked into it a little: igb uses a timeout of 5 seconds, and the stack
prints the warning if the transmit hasn't completed in that time.

What I don't understand in the stack trace is this:
> > [  107.654661] Call Trace:
> > [  107.657735]  
> > [  107.663155]  ? ftrace_graph_caller+0xc0/0xc0
> > [  107.667929]  call_timer_fn+0x3b/0x1b0
> > [  107.672238]  ? netif_carrier_off+0x70/0x70
> > [  107.61]  ? netif_carrier_off+0x70/0x70
> > [  107.682656]  ? ftrace_graph_caller+0xc0/0xc0
> > [  107.687379]  run_timer_softirq+0x3e8/0xa10
> > [  107.694653]  ? call_timer_fn+0x1b0/0x1b0
> > [  107.699382]  ? trace_event_raw_event_softirq+0xdd/0x150
> > [  107.706768]  ? ring_buffer_unlock_commit+0xf5/0x210
> > [  107.712213]  ? call_timer_fn+0x1b0/0x1b0
> > [  107.716625]  ? __do_softirq+0x155/0x467


If the carrier was turned off by something, that could cause the stack
to time out, since it appears the driver didn't call this itself after
finishing all transmits like it normally would have.

Is the trace above correct? Usually the "?" indicates an unsure
backtrace due to missing symbols, right?
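
The 5-second timeout mentioned in this message can be sketched as a wraparound-safe tick comparison. This is a userspace approximation with assumptions: `MODEL_HZ` is an arbitrary tick rate, and `model_time_after()` and `model_tx_timed_out()` are hypothetical helpers written in the style of a jiffies comparison rather than the kernel's own code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed tick rate for the model; a real kernel's HZ is a build option. */
#define MODEL_HZ 250UL

/* Timeout of 5 seconds expressed in ticks, as described for igb above. */
#define MODEL_TIMEO (5UL * MODEL_HZ)

/* True if tick a is later than tick b, even across counter wraparound. */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True once more than timeo ticks have passed since trans_start. */
static bool model_tx_timed_out(unsigned long now, unsigned long trans_start,
			       unsigned long timeo)
{
	return model_time_after(now, trans_start + timeo);
}
```

The signed subtraction is the point of the sketch: comparing `now > trans_start + timeo` directly would give the wrong answer when the tick counter wraps between the last transmit and the check.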

Re: NETDEV WATCHDOG: WARNING: at net/sched/sch_generic.c:442 dev_watchdog

2020-08-19 Thread Steven Rostedt

On Wed, 19 Aug 2020 17:01:06 +0530
Naresh Kamboju  wrote:

> A kernel warning was noticed on x86_64 while running the LTP tracing
> ftrace-stress-test case. We started noticing it on the stable-rc
> linux-5.8.y branch.
> 
> This device was booted with KASAN, DYNAMIC tracing configs, and more.
> The reported issue is not easily reproducible.
> 
> metadata:
>   git branch: linux-5.8.y
>   git repo: 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>   git commit: ad8c735b1497520df959f675718f39dca8cb8019
>   git describe: v5.8.2
>   make_kernelversion: 5.8.2
>   kernel-config:
> https://builds.tuxbuild.com/bOz0eAwkcraRiWALTW9D3Q/kernel.config
> 
> 
> [   88.139387] Scheduler tracepoints stat_sleep, stat_iowait,
> stat_blocked and stat_runtime require the kernel parameter
> schedstats=enable or kernel.sched_schedstats=1
> [   88.139387] Scheduler tracepoints stat_sleep, stat_iowait,
> stat_blocked and stat_runtime require the kernel parameter
> schedstats=enable or kernel.sched_schedstats=1
> [  107.507991] [ cut here ]
> [  107.513103] NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out
> [  107.519973] WARNING: CPU: 1 PID: 331 at net/sched/sch_generic.c:442
> dev_watchdog+0x4c7/0x4d0
> [  107.528907] Modules linked in: x86_pkg_temp_thermal
> [  107.534541] CPU: 1 PID: 331 Comm: systemd-journal Not tainted 5.8.2 #1
> [  107.541480] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.2 05/23/2018
> [  107.549314] RIP: 0010:dev_watchdog+0x4c7/0x4d0
> [  107.554226] Code: ff ff 48 8b 5d c8 c6 05 6d f7 94 01 01 48 89 df
> e8 9e b4 f8 ff 44 89 e9 48 89 de 48 c7 c7 20 49 51 9c 48 89 c2 e8 91
> 7e e9 fe <0f> 0b e9 03 ff ff ff 66 90 e8 9b 23 db fe 55 48 89 e5 41 57

I've triggered this myself in my testing, and I assumed that adding the
overhead of tracing and here KASAN too, made some watchdog a bit
unhappy. By commenting out the warning, I've seen no ill effects.

Perhaps this is something we need to dig a bit deeper into.

-- Steve


> 41 56
> [  107.573476] RSP: 0018:888230889d88 EFLAGS: 00010286
> [  107.579264] RAX:  RBX: 88822bbb RCX: 
> dc00
> [  107.586928] RDX: 111046114c99 RSI: 9a7e4dbe RDI: 
> 9b7a6da7
> [  107.594473] RBP: 888230889de0 R08: 9a7e4dd3 R09: 
> ed1044de2529
> [  107.602101] R10: 888226f12943 R11: ed1044de2528 R12: 
> 88822bbb0440
> [  107.609648] R13: 0002 R14: 88822bbb0388 R15: 
> 88822bbb0380
> [  107.617197] FS:  7f8b471bb480() GS:88823088()
> knlGS:
> [  107.625698] CS:  0010 DS:  ES:  CR0: 80050033
> [  107.631944] CR2: 0008 CR3: 000226a64001 CR4: 
> 003606e0
> [  107.639496] DR0:  DR1:  DR2: 
> 
> [  107.647092] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [  107.654661] Call Trace:
> [  107.657735]  
> [  107.663155]  ? ftrace_graph_caller+0xc0/0xc0
> [  107.667929]  call_timer_fn+0x3b/0x1b0
> [  107.672238]  ? netif_carrier_off+0x70/0x70
> [  107.61]  ? netif_carrier_off+0x70/0x70
> [  107.682656]  ? ftrace_graph_caller+0xc0/0xc0
> [  107.687379]  run_timer_softirq+0x3e8/0xa10
> [  107.694653]  ? call_timer_fn+0x1b0/0x1b0
> [  107.699382]  ? trace_event_raw_event_softirq+0xdd/0x150
> [  107.706768]  ? ring_buffer_unlock_commit+0xf5/0x210
> [  107.712213]  ? call_timer_fn+0x1b0/0x1b0
> [  107.716625]  ? __do_softirq+0x155/0x467
> Aug 22 04:21:44 intel-corei7-64 [  107.721972]  ? run_timer_softirq+0x5/0xa10
> user.warn kernel[  107.727997]  ? asm_call_on_stack+0x12/0x20
> : [  107.507991] [ c[  107.735546]  ? 
> ftrace_graph_caller+0xc0/0xc0
> ut here ]---[  107.740453]  __do_softirq+0x160/0x467
> -
> [  107.745737]  ? hrtimer_interrupt+0x5/0x340
> [  107.753961]  asm_call_on_stack+0x12/0x20
> [  107.758672]  
> [  107.761555]  do_softirq_own_stack+0x3f/0x50
> [  107.766521]  ? ftrace_graph_caller+0xc0/0xc0
> [  107.771246]  irq_exit_rcu+0xff/0x110
> [  107.776116]  ? ftrace_graph_caller+0xc0/0xc0
> [  107.780808]  sysvec_apic_timer_interrupt+0x38/0x90
> [  107.786971]  asm_sysvec_apic_timer_interrupt+0x12/0x20
> [  107.792598] RIP: 0010:profile_graph_return+0x111/0x1d0
> [  107.798204] Code: 75 e1 48 8b 45 d0 f6 c4 02 75 16 50 9d e8 f7 ff
> 02 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 c3 fb 02 00 ff
> 75 d0 9d <48> 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 8d 7b 20 e8
> 77 78
> [  107.817416] RSP: 0018:8882269b73a0 EFLAGS: 0286
> [  107.823201] RAX: 8882269b73d8 RBX: 8882269b7428 RCX: 
> dc00
> [  107.830785] RDX: dc000

NETDEV WATCHDOG: WARNING: at net/sched/sch_generic.c:442 dev_watchdog

2020-08-19 Thread Naresh Kamboju

A kernel warning was noticed on x86_64 while running the LTP tracing
ftrace-stress-test case. We started noticing it on the stable-rc
linux-5.8.y branch.

This device was booted with KASAN, DYNAMIC tracing configs, and more.
The reported issue is not easily reproducible.

metadata:
  git branch: linux-5.8.y
  git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
  git commit: ad8c735b1497520df959f675718f39dca8cb8019
  git describe: v5.8.2
  make_kernelversion: 5.8.2
  kernel-config:
https://builds.tuxbuild.com/bOz0eAwkcraRiWALTW9D3Q/kernel.config


[   88.139387] Scheduler tracepoints stat_sleep, stat_iowait,
stat_blocked and stat_runtime require the kernel parameter
schedstats=enable or kernel.sched_schedstats=1
[   88.139387] Scheduler tracepoints stat_sleep, stat_iowait,
stat_blocked and stat_runtime require the kernel parameter
schedstats=enable or kernel.sched_schedstats=1
[  107.507991] [ cut here ]
[  107.513103] NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out
[  107.519973] WARNING: CPU: 1 PID: 331 at net/sched/sch_generic.c:442
dev_watchdog+0x4c7/0x4d0
[  107.528907] Modules linked in: x86_pkg_temp_thermal
[  107.534541] CPU: 1 PID: 331 Comm: systemd-journal Not tainted 5.8.2 #1
[  107.541480] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.2 05/23/2018
[  107.549314] RIP: 0010:dev_watchdog+0x4c7/0x4d0
[  107.554226] Code: ff ff 48 8b 5d c8 c6 05 6d f7 94 01 01 48 89 df
e8 9e b4 f8 ff 44 89 e9 48 89 de 48 c7 c7 20 49 51 9c 48 89 c2 e8 91
7e e9 fe <0f> 0b e9 03 ff ff ff 66 90 e8 9b 23 db fe 55 48 89 e5 41 57
41 56
[  107.573476] RSP: 0018:888230889d88 EFLAGS: 00010286
[  107.579264] RAX:  RBX: 88822bbb RCX: dc00
[  107.586928] RDX: 111046114c99 RSI: 9a7e4dbe RDI: 9b7a6da7
[  107.594473] RBP: 888230889de0 R08: 9a7e4dd3 R09: ed1044de2529
[  107.602101] R10: 888226f12943 R11: ed1044de2528 R12: 88822bbb0440
[  107.609648] R13: 0002 R14: 88822bbb0388 R15: 88822bbb0380
[  107.617197] FS:  7f8b471bb480() GS:88823088()
knlGS:
[  107.625698] CS:  0010 DS:  ES:  CR0: 80050033
[  107.631944] CR2: 0008 CR3: 000226a64001 CR4: 003606e0
[  107.639496] DR0:  DR1:  DR2: 
[  107.647092] DR3:  DR6: fffe0ff0 DR7: 0400
[  107.654661] Call Trace:
[  107.657735]  
[  107.663155]  ? ftrace_graph_caller+0xc0/0xc0
[  107.667929]  call_timer_fn+0x3b/0x1b0
[  107.672238]  ? netif_carrier_off+0x70/0x70
[  107.61]  ? netif_carrier_off+0x70/0x70
[  107.682656]  ? ftrace_graph_caller+0xc0/0xc0
[  107.687379]  run_timer_softirq+0x3e8/0xa10
[  107.694653]  ? call_timer_fn+0x1b0/0x1b0
[  107.699382]  ? trace_event_raw_event_softirq+0xdd/0x150
[  107.706768]  ? ring_buffer_unlock_commit+0xf5/0x210
[  107.712213]  ? call_timer_fn+0x1b0/0x1b0
[  107.716625]  ? __do_softirq+0x155/0x467
Aug 22 04:21:44 intel-corei7-64 [  107.721972]  ? run_timer_softirq+0x5/0xa10
user.warn kernel[  107.727997]  ? asm_call_on_stack+0x12/0x20
: [  107.507991] [ c[  107.735546]  ? ftrace_graph_caller+0xc0/0xc0
ut here ]---[  107.740453]  __do_softirq+0x160/0x467
-
[  107.745737]  ? hrtimer_interrupt+0x5/0x340
[  107.753961]  asm_call_on_stack+0x12/0x20
[  107.758672]  
[  107.761555]  do_softirq_own_stack+0x3f/0x50
[  107.766521]  ? ftrace_graph_caller+0xc0/0xc0
[  107.771246]  irq_exit_rcu+0xff/0x110
[  107.776116]  ? ftrace_graph_caller+0xc0/0xc0
[  107.780808]  sysvec_apic_timer_interrupt+0x38/0x90
[  107.786971]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  107.792598] RIP: 0010:profile_graph_return+0x111/0x1d0
[  107.798204] Code: 75 e1 48 8b 45 d0 f6 c4 02 75 16 50 9d e8 f7 ff
02 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 c3 fb 02 00 ff
75 d0 9d <48> 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 8d 7b 20 e8
77 78
[  107.817416] RSP: 0018:8882269b73a0 EFLAGS: 0286
[  107.823201] RAX: 8882269b73d8 RBX: 8882269b7428 RCX: dc00
[  107.830785] RDX: dc00 RSI: 9a7e4dbe RDI: 9a7a955d
[  107.838411] RBP: 8882269b73d8 R08: 9a7e4dd3 R09: ed1044de2529
[  107.846072] R10: 888226f12943 R11: ed1044de2528 R12: 8882308a67c0
[  107.853621] R13: 888226f12930 R14: 8882308a67c8 R15: 88822c7e4000
[  107.863449]  ? ftrace_return_to_handler+0x1a3/0x230
Aug 22 04:21:44 [  107.869545]  ? ftrace_return_to_handler+0x18e/0x230
intel-corei7-64 [  107.875178]  ? profile_graph_return+0x10d/0x1d0
user.info kernel: [  107.513103][  107.882521]  ? unwind_dump+0x100/0x100
 NETDEV WATCHDOG: eth0 (igb): tr[  107.889054]  ?
unwind_next_frame.part.0+0xe0/0x360
ansmit queue 2 t[  107.895638]  ftrace_return_to_handler+0x18e/0x230
imed out
[  107.902594]  ? function_graph_enter+0x2d0/0x2d0
[  107.907616]  ? unwind_next_fra

Re: stable-rc 4.19: NETDEV WATCHDOG: eth0 (asix): transmit queue 0 timed out - net/sched/sch_generic.c:466 dev_watchdog

2020-05-13 Thread Naresh Kamboju

While running selftests bpf test_sysctl on the stable-rc 5.6 branch kernel
on an arm64 HiKey device, the following warning was noticed.

[ 1097.207013] NETDEV WATCHDOG: eth0 (asix): transmit queue 0 timed out
[ 1097.387913] WARNING: CPU: 0 PID: 206 at
/usr/src/kernel/net/sched/sch_generic.c:443 dev_watchdog+0x438/0x470
[ 1097.479820] Modules linked in: cls_bpf sch_fq sch_ingress test_bpf
algif_hash af_alg wl18xx wlcore mac80211 libarc4 cfg80211 hci_uart
snd_soc_audio_graph_card snd_soc_simple_card_utils btqca crct10dif_ce
btbcm adv7511 wlcore_sdio bluetooth cec ecdh_generic ecc lima rfkill
kirin_drm gpu_sched drm_kms_helper dw_drm_dsi drm fuse [last unloaded:
trace_printk]
[ 1097.684705] CPU: 0 PID: 206 Comm: jbd2/mmcblk0p9- Not tainted 5.6.13-rc1 #1
[ 1097.776526] Hardware name: HiKey Development Board (DT)
[ 1097.865766] pstate: 6005 (nZCv daif -PAN -UAO)
[ 1097.954668] pc : dev_watchdog+0x438/0x470
[ 1098.042508] lr : dev_watchdog+0x438/0x470

ref:
https://qa-reports.linaro.org/lkft/linux-stable-rc-5.6-oe/build/v5.6.12-119-gf1d28d1c7608/testrun/1430360/log


On Tue, 5 May 2020 at 17:01, Naresh Kamboju  wrote:
>
> While running selftests bpf test_sysctl on the stable-rc 4.19 branch kernel
> on an arm64 HiKey device, the following warning was noticed.
>
> [  118.957395] test_bpf: #296 BPF_MAXINSNS: exec all MSH
> [  148.966435] [ cut here ]
> [  148.988349] NETDEV WATCHDOG: eth0 (asix): transmit queue 0 timed out
> [  149.000832] WARNING: CPU: 0 PID: 0 at
> /usr/src/kernel/net/sched/sch_generic.c:466 dev_watchdog+0x2b4/0x2c0
> [  149.016470] Modules linked in: test_bpf(+) wl18xx wlcore mac80211
> cfg80211 crc32_ce hci_uart crct10dif_ce btbcm snd_soc_audio_graph_card
> bluetooth snd_soc_simple_card_utils adv7511 cec wlcore_sdio kirin_drm
> dw_drm_dsi rfkill drm_kms_helper drm drm_panel_orientation_quirks fuse
> [last unloaded: test_bpf]
> [  149.056507] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.121-rc1 #1
> [  149.069594] Hardware name: HiKey Development Board (DT)
> [  149.081514] pstate: 8005 (Nzcv daif -PAN -UAO)
> [  149.093062] pc : dev_watchdog+0x2b4/0x2c0
> [  149.103862] lr : dev_watchdog+0x2b4/0x2c0
> [  149.114575] sp : 08003d10
> [  149.124613] x29: 08003d10 x28: 0002
> [  149.136698] x27: 0001 x26: 
> [  149.148810] x25: 0180 x24: 800074c654b8
> [  149.160891] x23: 800074c65460 x22: 8000748dd680
> [  149.172993] x21: 0974a000 x20: 800074c65000
> [  149.185065] x19:  x18: 
> [  149.197172] x17:  x16: 
> [  149.209243] x15: 0001 x14: 09062cd8
> [  149.221234] x13: 45a6fc2a x12: 0975b630
> [  149.233166] x11:  x10: 0974fa48
> [  149.245023] x9 : 097e3000 x8 : 0974fa48
> [  149.256818] x7 : 08173694 x6 : 800077ee62d0
> [  149.268639] x5 : 800077ee62d0 x4 : 
> [  149.280412] x3 : 800077eef6c8 x2 : 0103
> [  149.292120] x1 : d13523b333b73d00 x0 : 
> [  149.303783] Call trace:
> [  149.312481]  dev_watchdog+0x2b4/0x2c0
> [  149.322463]  call_timer_fn+0xbc/0x3f0
> [  149.332463]  expire_timers+0x104/0x220
> [  149.342493]  run_timer_softirq+0xec/0x1a8
> [  149.352784]  __do_softirq+0x114/0x554
> [  149.362668]  irq_exit+0x144/0x150
> [  149.372235]  __handle_domain_irq+0x6c/0xc0
> [  149.382633]  gic_handle_irq+0x60/0xb0
> [  149.392606]  el1_irq+0xb4/0x130
> [  149.402031]  cpuidle_enter_state+0xbc/0x3f0
> [  149.412572]  cpuidle_enter+0x34/0x48
> [  149.422539]  call_cpuidle+0x44/0x78
> [  149.432410]  do_idle+0x228/0x2a8
> [  149.441959]  cpu_startup_entry+0x2c/0x30
> [  149.452185]  rest_init+0x25c/0x270
> [  149.461821]  start_kernel+0x468/0x494
> [  149.471659] irq event stamp: 5706193
> [  149.481376] hardirqs last  enabled at (5706192):
> [] console_unlock+0x424/0x638
> [  149.496628] hardirqs last disabled at (5706193):
> [] do_debug_exception+0xf8/0x1d0
> [  149.512207] softirqs last  enabled at (5706160):
> [] _local_bh_enable+0x28/0x48
> [  149.527590] softirqs last disabled at (5706161):
> [] irq_exit+0x144/0x150
> [  149.542410] ---[ end trace 4c7bd8e08a6a3d65 ]---
> [  177.828500] jited:1 1366234 PASS
>
> ref:
> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.120-38-g2e3613309d93/testrun/1415357/log
>
> metadata:
>   git branch: linux-4.19.y
>   git repo: 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>   make_kernelversion: 4.19.121-rc1
>   kernel-config:
> http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/hikey/lkft/linux-stable-rc-4.19/530/config
>

--
Linaro LKFT
https://lkft.linaro.org

stable-rc 4.19: NETDEV WATCHDOG: eth0 (asix): transmit queue 0 timed out - net/sched/sch_generic.c:466 dev_watchdog

2020-05-05 Thread Naresh Kamboju

While running selftests bpf test_sysctl on the stable-rc 4.19 branch kernel
on an arm64 HiKey device, the following warning was noticed.

[  118.957395] test_bpf: #296 BPF_MAXINSNS: exec all MSH
[  148.966435] [ cut here ]
[  148.988349] NETDEV WATCHDOG: eth0 (asix): transmit queue 0 timed out
[  149.000832] WARNING: CPU: 0 PID: 0 at
/usr/src/kernel/net/sched/sch_generic.c:466 dev_watchdog+0x2b4/0x2c0
[  149.016470] Modules linked in: test_bpf(+) wl18xx wlcore mac80211
cfg80211 crc32_ce hci_uart crct10dif_ce btbcm snd_soc_audio_graph_card
bluetooth snd_soc_simple_card_utils adv7511 cec wlcore_sdio kirin_drm
dw_drm_dsi rfkill drm_kms_helper drm drm_panel_orientation_quirks fuse
[last unloaded: test_bpf]
[  149.056507] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.121-rc1 #1
[  149.069594] Hardware name: HiKey Development Board (DT)
[  149.081514] pstate: 8005 (Nzcv daif -PAN -UAO)
[  149.093062] pc : dev_watchdog+0x2b4/0x2c0
[  149.103862] lr : dev_watchdog+0x2b4/0x2c0
[  149.114575] sp : 08003d10
[  149.124613] x29: 08003d10 x28: 0002
[  149.136698] x27: 0001 x26: 
[  149.148810] x25: 0180 x24: 800074c654b8
[  149.160891] x23: 800074c65460 x22: 8000748dd680
[  149.172993] x21: 0974a000 x20: 800074c65000
[  149.185065] x19:  x18: 
[  149.197172] x17:  x16: 
[  149.209243] x15: 0001 x14: 09062cd8
[  149.221234] x13: 45a6fc2a x12: 0975b630
[  149.233166] x11:  x10: 0974fa48
[  149.245023] x9 : 097e3000 x8 : 0974fa48
[  149.256818] x7 : 08173694 x6 : 800077ee62d0
[  149.268639] x5 : 800077ee62d0 x4 : 
[  149.280412] x3 : 800077eef6c8 x2 : 0103
[  149.292120] x1 : d13523b333b73d00 x0 : 
[  149.303783] Call trace:
[  149.312481]  dev_watchdog+0x2b4/0x2c0
[  149.322463]  call_timer_fn+0xbc/0x3f0
[  149.332463]  expire_timers+0x104/0x220
[  149.342493]  run_timer_softirq+0xec/0x1a8
[  149.352784]  __do_softirq+0x114/0x554
[  149.362668]  irq_exit+0x144/0x150
[  149.372235]  __handle_domain_irq+0x6c/0xc0
[  149.382633]  gic_handle_irq+0x60/0xb0
[  149.392606]  el1_irq+0xb4/0x130
[  149.402031]  cpuidle_enter_state+0xbc/0x3f0
[  149.412572]  cpuidle_enter+0x34/0x48
[  149.422539]  call_cpuidle+0x44/0x78
[  149.432410]  do_idle+0x228/0x2a8
[  149.441959]  cpu_startup_entry+0x2c/0x30
[  149.452185]  rest_init+0x25c/0x270
[  149.461821]  start_kernel+0x468/0x494
[  149.471659] irq event stamp: 5706193
[  149.481376] hardirqs last  enabled at (5706192):
[] console_unlock+0x424/0x638
[  149.496628] hardirqs last disabled at (5706193):
[] do_debug_exception+0xf8/0x1d0
[  149.512207] softirqs last  enabled at (5706160):
[] _local_bh_enable+0x28/0x48
[  149.527590] softirqs last disabled at (5706161):
[] irq_exit+0x144/0x150
[  149.542410] ---[ end trace 4c7bd8e08a6a3d65 ]---
[  177.828500] jited:1 1366234 PASS

ref:
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.120-38-g2e3613309d93/testrun/1415357/log

metadata:
  git branch: linux-4.19.y
  git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
  make_kernelversion: 4.19.121-rc1
  kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/hikey/lkft/linux-stable-rc-4.19/530/config

-- 
Linaro LKFT
https://lkft.linaro.org

Re: NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out

2018-03-26 Thread Borislav Petkov

On Tue, Mar 20, 2018 at 11:41:06AM +0530, Satish Baddipadige wrote:
> Can you please test the attached patch?

Well, the network connection just died with the patch applied. It didn't
fire the netdev watchdog, but I still had to down and up eth0 in order to
continue using it. The ssh connection into the box survived, so I didn't
have to log in again, but the connection still died intermittently.

I'll keep playing with it to see if I'll catch some sort of splat...

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
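
The "down and up eth0" workaround mentioned in this reply can be sketched as
a small script. This is only an illustration, not something posted in the
thread: the function names are hypothetical, it assumes the iproute2 `ip`
tool is installed, and actually running the commands needs root.

```python
import subprocess


def bounce_commands(iface):
    """Return the two iproute2 commands that take an interface down, then up."""
    return [
        ["ip", "link", "set", "dev", iface, "down"],
        ["ip", "link", "set", "dev", iface, "up"],
    ]


def bounce(iface, run=subprocess.run):
    """Run the down/up pair; `run` is injectable so the sketch can be dry-run."""
    for cmd in bounce_commands(iface):
        run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    bounce("eth0", run=lambda cmd, check: print(" ".join(cmd)))
```

Passing a custom `run` keeps the sketch safe to try on a machine where
bouncing the link would cut off the session.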

Re: NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out

2018-03-20 Thread Borislav Petkov

On Tue, Mar 20, 2018 at 11:41:06AM +0530, Satish Baddipadige wrote:
> Can you please test the attached patch?

Sure, will do when I get back next week.

Thx.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply. Srsly.

Re: NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out

2018-03-20 Thread Satish Baddipadige

On Wed, Feb 28, 2018 at 7:40 PM, Siva Reddy Kallam wrote:
> On Sat, Feb 24, 2018 at 3:48 PM, Borislav Petkov  wrote:
>> Hi,
>>
>> this didn't happen before but after 4.16-rc1 my tg3 nic stops for
>> whatever reason and the connection to the machine is dead. It didn't show
>> anything in dmesg until today.
>>
>> The IO pagefaults look like it is trying to access something it
>> shouldn't and maybe that's why it times out.
>>
>> It triggers pretty quickly so I'd call it a reliable reproducer and thus
>> I can test patches... :-)
>>
>> Thx.
> Thanks for reporting this. Somehow, this mail moved to my spam folder.
> Hence, delay in response.
> Looks like this is similar to below issue and it was reported some time back.
> https://www.spinics.net/lists/netdev/msg482757.html
> We are actively working on this. We will soon provide you an update on this.

Hi Borislav,

Can you please test the attached patch?

Thanks,
Satish


tg3_5762_clock_override.patch
Description: Binary data

Re: NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out

2018-02-28 Thread Siva Reddy Kallam

On Sat, Feb 24, 2018 at 3:48 PM, Borislav Petkov  wrote:
> Hi,
>
> this didn't happen before but after 4.16-rc1 my tg3 nic stops for
> whatever reason and the connection to the machine is dead. It didn't show
> anything in dmesg until today.
>
> The IO pagefaults look like it is trying to access something it
> shouldn't and maybe that's why it times out.
>
> It triggers pretty quickly so I'd call it a reliable reproducer and thus
> I can test patches... :-)
>
> Thx.
Thanks for reporting this. Somehow, this mail moved to my spam folder.
Hence, delay in response.
Looks like this is similar to below issue and it was reported some time back.
https://www.spinics.net/lists/netdev/msg482757.html
We are actively working on this. We will soon provide you an update on this.

NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out

2018-02-24 Thread Borislav Petkov

Hi,

this didn't happen before but after 4.16-rc1 my tg3 nic stops for
whatever reason and the connection to the machine is dead. It didn't show
anything in dmesg until today.

The IO pagefaults look like it is trying to access something it
shouldn't and maybe that's why it times out.

It triggers pretty quickly so I'd call it a reliable reproducer and thus
I can test patches... :-)

Thx.

...
[   15.916840] random: crng init done
[   44.792699] tg3 :01:00.0 eth0: Link is up at 100 Mbps, full duplex
[   44.793024] tg3 :01:00.0 eth0: Flow control is on for TX and on for RX
[   44.793315] tg3 :01:00.0 eth0: EEE is disabled
[   44.793395] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   58.216474] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0001 address=0x0001f0c0 flags=0x]
[   58.216943] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0001 address=0x0001f100 flags=0x]
[   58.217395] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0001 address=0x0001f140 flags=0x]
[   58.217844] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0001 address=0x0001f180 flags=0x]
[   64.992145] ------------[ cut here ]------------
[   64.992406] NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
[   64.992742] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:464 
dev_watchdog+0x1fe/0x210
[   64.992744] Modules linked in: arc4 iwlmvm mac80211 amdgpu kvm_amd kvm 
iwlwifi irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel 
snd_hda_codec_conexant snd_hda_codec_hdmi snd_hda_codec_generic aesni_intel 
sha256_generic aes_x86_64 crypto_simd snd_hda_intel cryptd glue_helper tg3 
snd_hda_codec pcspkr snd_hwdep cfg80211 joydev psmouse ptp snd_hda_core hp_wmi 
pps_core snd_pcm ehci_pci chash tpm_infineon rfkill libphy i2c_piix4 snd_timer 
fam15h_power xhci_pci ehci_hcd snd sg gpu_sched k10temp soundcore xhci_hcd 
tpm_tis tpm_tis_core video tpm battery button ac acpi_cpufreq evdev input_leds 
serio_raw sd_mod thermal pinctrl_amd
[   64.993216] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-rc1+ #2
[   64.993222] Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.08 
01/28/2016
[   64.996048] RIP: 0010:dev_watchdog+0x1fe/0x210
[   64.996050] RSP: 0018:88043dc83e88 EFLAGS: 00010282
[   64.996052] RAX:  RBX:  RCX: 0103
[   64.996054] RDX: 8103 RSI: 0086 RDI: 
[   64.996055] RBP: 88042b86e39c R08: 81c0a400 R09: 0001
[   64.996057] R10: 035a R11:  R12: 88042b86e3b0
[   64.996058] R13: 88042b86e000 R14: 0005 R15: 88042a0ced80
[   64.996061] FS:  () GS:88043dc8() 
knlGS:
[   64.996063] CS:  0010 DS:  ES:  CR0: 80050033
[   64.996065] CR2: 7f98ed87eb00 CR3: 000428ea CR4: 001406e0
[   64.996068] Call Trace:
[   64.996074]  <IRQ>
[   64.996082]  ? qdisc_reset+0xe0/0xe0
[   64.996085]  ? qdisc_reset+0xe0/0xe0
[   64.996092]  call_timer_fn+0x2b/0x150
[   64.996097]  run_timer_softirq+0x415/0x460
[   64.996101]  ? tick_sched_timer+0x42/0x90
[   64.996106]  ? _raw_spin_lock_irq+0x1a/0x40
[   64.996110]  ? __hrtimer_run_queues+0x113/0x2d0
[   64.996114]  __do_softirq+0xeb/0x2d5
[   64.996121]  irq_exit+0xaa/0xb0
[   64.996125]  smp_apic_timer_interrupt+0x73/0x150
[   64.996128]  apic_timer_interrupt+0x7d/0x90
[   64.996131]  </IRQ>
[   64.996136] RIP: 0010:cpuidle_enter_state+0xa3/0x2f0
[   64.996138] RSP: 0018:c900019c3ea8 EFLAGS: 0246 ORIG_RAX: 
ff12
[   64.996141] RAX: 88043dc8 RBX: 000f21d4b954 RCX: 001f
[   64.996142] RDX: 000f21d4b954 RSI: 81da4ca1 RDI: 81db2a9e
[   64.996144] RBP: 88042a39a200 R08: 0005a0b5 R09: 000585fa
[   64.996145] R10: 0018 R11: 00049370 R12: 0002
[   64.996146] R13: 82095db8 R14:  R15: 000f0b23994e
[   64.996157]  ? cpuidle_enter_state+0x93/0x2f0
[   65.003171]  do_idle+0x19a/0x1f0
[   65.003176]  cpu_startup_entry+0x6f/0x80
[   65.003181]  start_secondary+0x1a5/0x200
[   65.003185]  secondary_startup_64+0xa5/0xb0
[   65.003189] Code: 00 49 63 4c 24 f0 eb 93 4c 89 ef c6 05 5b 10 af 00 01 e8 
b6 67 fd ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 20 f6 df 81 e8 e2 8d a7 ff <0f> ff 
eb be 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 
[   65.003234] ---[ end trace b191673f18a75f41 ]---
[   65.003243] tg3 :01:00.0 eth0: transmit timed out, resetting
[   67.679695] tg3 :01:00.0 eth0: 0x: 0x168714e4, 0x10100406, 
0x0210, 0x
[   67.680053] tg3 :01:00.0 eth0: 0x0010: 0xd082000c, 0x, 
0xd081000c, 0x
[   67.680406] tg3 :01:00.0 eth0: 0x0020: 0xd08c, 0x, 
0x, 0x807e103c
[   67.680419] tg3 :01:00.0 eth0: 0x0030: 0x00
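
To line up the IO_PAGE_FAULT events with the watchdog warning in a log like
the one above, a short parser helps. The following is a minimal sketch and
not part of the original report: the function and field names are
hypothetical, and it assumes one log record per line with a leading
`[seconds.micros]` timestamp (wrapped lines must be rejoined first).

```python
import re

# Matches the warning text printed by dev_watchdog().
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)
# Matches the dmesg timestamp prefix, e.g. "[   64.992406]".
TIMESTAMP_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]")


def summarize(lines):
    """Return one dict per watchdog warning, with the gap to the last IO page fault."""
    last_fault = None
    events = []
    for line in lines:
        m_ts = TIMESTAMP_RE.match(line)
        ts = float(m_ts.group("ts")) if m_ts else None
        if "IO_PAGE_FAULT" in line and ts is not None:
            last_fault = ts
        m = WATCHDOG_RE.search(line)
        if m:
            gap = None
            if ts is not None and last_fault is not None:
                gap = round(ts - last_fault, 6)
            events.append({
                "iface": m.group("iface"),
                "driver": m.group("driver"),
                "queue": int(m.group("queue")),
                "time": ts,
                "since_last_fault": gap,
            })
    return events


sample = [
    "[   58.217844] tg3 :01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT",
    "[   64.992406] NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out",
]
events = summarize(sample)
print(events)
```

With the two timestamps taken from the log above, the warning follows the
last logged page fault by roughly 6.77 seconds. That is consistent with the
suspicion that the faults precede the stall, but the sketch itself proves
nothing about the cause.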

NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

2014-12-11 Thread Marco Berizzi

Hi Folks,

I'm running Slackware Linux 14 (32-bit) as a firewall/IPsec
gateway with Linux 3.18.0.
I got this error after just 12 hours of uptime:

WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0xee/0x174()
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: xfrm4_tunnel af_key authenc xfrm4_mode_tunnel deflate 
zlib_deflate zlib_inflate ctr twofish_generic twofish_i586 twofish_common 
serpent_sse2_i586 serpent_generic glue_helper blowfish_generic blowfish_common 
cbc ecb sha512_generic hmac tunnel4 ipcomp xfrm_ipcomp esp4 xts lrw gf128mul 
ablk_helper cryptd aes_i586 des_generic md5 sha1_generic sha256_generic 
nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre 
nf_nat_ftp nf_conntrack_ftp xt_helper xt_mark xt_statistic xt_nat xt_multiport 
xt_limit xt_tcpudp xt_policy xt_conntrack iptable_mangle iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter 
ip_tables x_tables uhci_hcd ehci_pci tg3 ehci_hcd 8250 ptp usbcore pps_core 
serial_core libphy rtc_cmos r8169 usb_common mii processor thermal_sys hwmon 
loop [last unloaded: af_key]
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0 #1
Hardware name: Hewlett-Packard HP xw4400 Workstation/0A68h, BIOS 786D7 v02.07 
10/28/2010
  c125fe2a f580bf6c c10282cb c1202130 f5904000  02b1e170
  c1028346 0009 f580bf6c c1302f9c f580bf84 c1202130 c1302fd5
 012f c1302f9c f5904000 f812e2e0  8100 c1202042 f580bfbc
Call Trace:
 [<c125fe2a>] ? dump_stack+0x3e/0x4e
 [<c10282cb>] ? warn_slowpath_common+0x61/0x74
 [<c1202130>] ? dev_watchdog+0xee/0x174
 [<c1028346>] ? warn_slowpath_fmt+0x29/0x2d
 [<c1202130>] ? dev_watchdog+0xee/0x174
 [<c1202042>] ? pfifo_fast_dequeue+0xa2/0xa2
 [<c1051446>] ? call_timer_fn.isra.30+0xf/0x5a
 [<c10516b1>] ? run_timer_softirq+0x126/0x16f
 [<c102a10c>] ? __do_softirq+0x8f/0x16c
 [<c102a07d>] ? __hrtimer_tasklet_trampoline+0x13/0x13
 [<c1002e71>] ? do_softirq_own_stack+0x1a/0x1f
 <IRQ>  [<c102a313>] ? irq_exit+0x31/0x70
 [<c101cbec>] ? smp_apic_timer_interrupt+0x30/0x39
 [<c1263609>] ? apic_timer_interrupt+0x2d/0x34
 [<c1007502>] ? default_idle+0x2/0x3
 [<c1007a0f>] ? arch_cpu_idle+0x6/0x7
 [<c1046686>] ? cpu_startup_entry+0xeb/0x211
 [<c135a95c>] ? start_kernel+0x2d2/0x2d5
---[ end trace 71b9cfb317b62846 ]---
r8169 :10:00.0 eth0: link up

Is this error related to ASPM?
Any response is welcome.
TIA
Here is the dmesg output:

Linux version 3.18.0 (root@Pleiadi) (gcc version 4.7.1 (GCC) ) #1 SMP Tue Dec 9 
16:46:26 CET 2014
e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x-0x0009fbff] usable
BIOS-e820: [mem 0x0009fc00-0x0009] reserved
BIOS-e820: [mem 0x000e8000-0x000f] reserved
BIOS-e820: [mem 0x0010-0xbffc62ff] usable
BIOS-e820: [mem 0xbffc6300-0xbfff] reserved
BIOS-e820: [mem 0xf000-0xf3ff] reserved
BIOS-e820: [mem 0xfec0-0xfed3] reserved
BIOS-e820: [mem 0xfed45000-0x] reserved
Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
SMBIOS 2.4 present.
DMI: Hewlett-Packard HP xw4400 Workstation/0A68h, BIOS 786D7 v02.07 10/28/2010
e820: update [mem 0x-0x0fff] usable ==> reserved
e820: remove [mem 0x000a-0x000f] usable
e820: last_pfn = 0xbffc6 max_arch_pfn = 0x10
MTRR default type: uncachable
MTRR fixed ranges enabled:
  0-9 write-back
  A-B uncachable
  C-E3FFF write-protect
  E4000-E write-back
  F-F write-protect
MTRR variable ranges enabled:
  0 base 0 mask F8000 write-back
  1 base 08000 mask FC000 write-back
  2 disabled
  3 disabled
  4 disabled
  5 disabled
  6 disabled
  7 disabled
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
found SMP MP-table at [mem 0x000f9bf0-0x000f9bff] mapped at [c00f9bf0]
initial memory mapped: [mem 0x-0x017f]
Base memory trampoline at [c009b000] 9b000 size 16384
init_memory_mapping: [mem 0x-0x000f]
 [mem 0x-0x000f] page 4k
init_memory_mapping: [mem 0x3700-0x373f]
 [mem 0x3700-0x373f] page 2M
init_memory_mapping: [mem 0x3000-0x36ff]
 [mem 0x3000-0x36ff] page 2M
init_memory_mapping: [mem 0x0010-0x2fff]
 [mem 0x0010-0x003f] page 4k
 [mem 0x0040-0x2fff] page 2M
init_memory_mapping: [mem 0x3740-0x377fdfff]
 [mem 0x3740-0x377fdfff] page 4k
BRK [0x0141c000, 0x0141cfff] PGTABLE
ACPI: Early table checksum verification disabled
ACPI: RSDP 0x000E7810 14 (v00 COMPAQ)
ACPI: RSDT 0xBFFC6340 44 (v01 HPQOEM SLIC-WKS 20101028  )
ACPI: FACP 0xBFFC63EC 74 (v01 COMPAQ GLENWOOD 0001  )
ACPI: DSDT 0xBFFC6763 00037B (v01 COMPAQ DSDT_PRJ 0001 MSFT 010E)
ACPI: FACS 0xBFFC6300 40
ACPI: SSDT 0xBFFC6ADE 008C6C (v01 COMPAQ DSDT_HW  0001 MSFT 010E)
ACPI: APIC 0xBFFC6460 84 (v01 COMPAQ GLENWOOD 0001  )
ACPI: ASF! 0xBFFC64E4 63 (v32 COMPAQ GLENWOOD 0001  )
ACPI: MCFG 0xBFFC6547 3C (v01 COMPAQ GLENW
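
On the ASPM question raised in this report: one quick data point is which
PCIe ASPM policy the running kernel has selected. The sketch below is an
illustration only and was not posted in the thread. It assumes the kernel
exposes `/sys/module/pcie_aspm/parameters/policy`, where the active policy
is the bracketed entry; the function names are hypothetical.

```python
import re
from pathlib import Path

# Sysfs file listing the ASPM policies, with the active one in brackets,
# e.g. "[default] performance powersave powersupersave".
ASPM_POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    m = re.search(r"\[(\w+)\]", text)
    return m.group(1) if m else None


def read_aspm_policy(path=ASPM_POLICY_PATH):
    """Read the policy file; return None when it is missing or unreadable."""
    try:
        return active_aspm_policy(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("active ASPM policy:", read_aspm_policy())
```

Knowing the policy only narrows the question. It does not show whether ASPM
is involved in the r8169 timeout.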

Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-02-12 Thread Zoltan Kiss


Hi,

I still haven't managed to crack this problem. I've made sure the below 
mentioned skb's look the same as the other ones: linear buffer with 
header, and the rest is aggregated into frags. Utilizing the skb 
destructor I've also checked that these packets are all freed before the 
TX hang happens. So the only difference from current upstream is that 
the pages are grant mapped into Dom0 instead of grant copy to a local page.
I've also found some of my older notes about this issue, where I managed 
to reproduce this on igb, and in that particular case the TX hang could 
be resolved with ifconfig down/up. Do the "Detected Tx Unit Hang" 
messages give any hint to the igb developers?


Nov 26 04:18:34 localhost kernel: [ 7814.197868] ------------[ cut here 
]------------
Nov 26 04:18:34 localhost kernel: [ 7814.197889] WARNING: at 
net/sched/sch_generic.c:255 dev_watchdog+0x165/0x220()
Nov 26 04:18:34 localhost kernel: [ 7814.197892] NETDEV WATCHDOG: eth0 
(igb): transmit queue 7 timed out
Nov 26 04:18:34 localhost kernel: [ 7814.197894] Modules linked in: tun 
nfsv3 nfs_acl nfs fscache dm_multipath scsi_dh lockd sunrpc openvswitch 
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ip
v4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables 
nls_utf8 isofs dm_mirror video backlight sbs sbshc hed acpi_ipmi 
ipmi_msghandler nvram sg psmouse serio_raw igb
i2c_algo_bit ptp pps_core hpilo tpm_tis tpm tpm_bios lpc_ich mfd_core 
ehci_pci crc32_pclmul aesni_intel ablk_helper cryptd lrw aes_i586 xts 
gf128mul dm_region_hash dm_log dm_mod shpchp
hpsa sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd fbcon font tileblit 
bitblit softcursor [last unloaded: microcode]
Nov 26 04:18:34 localhost kernel: [ 7814.197957] CPU: 5 PID: 0 Comm: 
swapper/5 Not tainted 3.10.11-0.xs1.8.50.127.377543 #1
Nov 26 04:18:34 localhost kernel: [ 7814.197959] Hardware name: HP 
ProLiant BL420c Gen8, BIOS I30 12/14/2012
Nov 26 04:18:34 localhost kernel: [ 7814.197962]  e5cd9e10 c13e4c55 
e5cd9ddc c1278546 e5cd9e00 c1047fd3 c1643220 e5cd9e2c
Nov 26 04:18:34 localhost kernel: [ 7814.197969]  00ff c13e4c55 
e1fa8700 0007 04e2 e5cd9e18 c1048093 0009
Nov 26 04:18:34 localhost kernel: [ 7814.197975]  e5cd9e10 c1643220 
e5cd9e2c e5cd9e50 c13e4c55 c163fe6b 00ff c1643220

Nov 26 04:18:34 localhost kernel: [ 7814.197982] Call Trace:
Nov 26 04:18:34 localhost kernel: [ 7814.197988]  [] ? 
dev_watchdog+0x165/0x220
Nov 26 04:18:34 localhost kernel: [ 7814.197994]  [] 
dump_stack+0x16/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198000]  [] 
warn_slowpath_common+0x63/0x80
Nov 26 04:18:34 localhost kernel: [ 7814.198003]  [] ? 
dev_watchdog+0x165/0x220
Nov 26 04:18:34 localhost kernel: [ 7814.198007]  [] 
warn_slowpath_fmt+0x33/0x40
Nov 26 04:18:34 localhost kernel: [ 7814.198011]  [] 
dev_watchdog+0x165/0x220
Nov 26 04:18:34 localhost kernel: [ 7814.198017]  [] ? 
dev_activate+0x110/0x110
Nov 26 04:18:34 localhost kernel: [ 7814.198020]  [] 
call_timer_fn+0x58/0xe0
Nov 26 04:18:34 localhost kernel: [ 7814.198024]  [] 
run_timer_softirq+0x1a8/0x1f0
Nov 26 04:18:34 localhost kernel: [ 7814.198028]  [] ? 
info_for_irq+0xd/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198031]  [] ? 
evtchn_from_irq+0x3c/0x50
Nov 26 04:18:34 localhost kernel: [ 7814.198034]  [] ? 
dev_activate+0x110/0x110
Nov 26 04:18:34 localhost kernel: [ 7814.198038]  [] 
__do_softirq+0xd9/0x1e0
Nov 26 04:18:34 localhost kernel: [ 7814.198041]  [] ? 
__xen_evtchn_do_upcall+0x245/0x280
Nov 26 04:18:34 localhost kernel: [ 7814.198045]  [] 
irq_exit+0x41/0x80
Nov 26 04:18:34 localhost kernel: [ 7814.198048]  [] 
xen_evtchn_do_upcall+0x25/0x30
Nov 26 04:18:34 localhost kernel: [ 7814.198053]  [] 
xen_do_upcall+0x7/0xc
Nov 26 04:18:34 localhost kernel: [ 7814.198058]  [] ? 
rcu_process_gp_end+0x58/0x70
Nov 26 04:18:34 localhost kernel: [ 7814.198061]  [] ? 
xen_hypercall_sched_op+0x7/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198066]  [] ? 
xen_safe_halt+0x12/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198070]  [] 
default_idle+0x56/0xb0
Nov 26 04:18:34 localhost kernel: [ 7814.198074]  [] 
arch_cpu_idle+0x17/0x30
Nov 26 04:18:34 localhost kernel: [ 7814.198078]  [] 
cpu_startup_entry+0x15e/0x1d0
Nov 26 04:18:34 localhost kernel: [ 7814.198085]  [] 
cpu_bringup_and_idle+0x12/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198088] ---[ end trace 
d8c0d3f5c187aa6b ]---


And the recovery:

Nov 26 21:47:54 localhost kernel: [70773.950715] ------------[ cut here 
]------------
Nov 26 21:47:54 localhost kernel: [70773.950747] WARNING: at 
net/core/dev.c:4201 net_rx_action+0xfd/0x1c0()
Nov 26 21:47:54 localhost kernel: [70773.950751] Modules linked in: tun 
nfsv3 nfs_acl nfs fscache dm_multipath scsi_dh lockd sunrpc openvswitch 
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ip
v4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables 
nls_utf8 isofs dm_mirror video backlight sbs sbshc hed acpi_ipmi 
ipmi_msghandler nvram sg psmouse serio_raw igb
i2c_algo_bit pt

Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-02-12 Thread Zoltan Kiss


Hi,

I still haven't managed to crack this problem. I've made sure the below 
mentioned skb's look the same as the other ones: linear buffer with 
header, and the rest is aggregated into frags. Utilizing the skb 
destructor I've also checked that these packets are all freed before the 
TX hang happens. So the only difference from current upstream is that 
the pages are grant mapped into Dom0 instead of grant copy to a local page.
I've also found some of my older notes about this issue, where I managed 
to reproduce this on igb, and in that particular case the TX hang could 
be resolved with ifconfig down/up. Do the "Detected Tx Unit Hang" 
messages give any hint to the igb developers?


Nov 26 04:18:34 localhost kernel: [ 7814.197868] ------------[ cut here 
]------------
Nov 26 04:18:34 localhost kernel: [ 7814.197889] WARNING: at 
net/sched/sch_generic.c:255 dev_watchdog+0x165/0x220()
Nov 26 04:18:34 localhost kernel: [ 7814.197892] NETDEV WATCHDOG: eth0 
(igb): transmit queue 7 timed out
Nov 26 04:18:34 localhost kernel: [ 7814.197894] Modules linked in: tun 
nfsv3 nfs_acl nfs fscache dm_multipath scsi_dh lockd sunrpc openvswitch 
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ip
v4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables 
nls_utf8 isofs dm_mirror video backlight sbs sbshc hed acpi_ipmi 
ipmi_msghandler nvram sg psmouse serio_raw igb
i2c_algo_bit ptp pps_core hpilo tpm_tis tpm tpm_bios lpc_ich mfd_core 
ehci_pci crc32_pclmul aesni_intel ablk_helper cryptd lrw aes_i586 xts 
gf128mul dm_region_hash dm_log dm_mod shpchp
hpsa sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd fbcon font tileblit 
bitblit softcursor [last unloaded: microcode]
Nov 26 04:18:34 localhost kernel: [ 7814.197957] CPU: 5 PID: 0 Comm: 
swapper/5 Not tainted 3.10.11-0.xs1.8.50.127.377543 #1
Nov 26 04:18:34 localhost kernel: [ 7814.197959] Hardware name: HP 
ProLiant BL420c Gen8, BIOS I30 12/14/2012
Nov 26 04:18:34 localhost kernel: [ 7814.197962]  e5cd9e10 c13e4c55 
e5cd9ddc c1278546 e5cd9e00 c1047fd3 c1643220 e5cd9e2c
Nov 26 04:18:34 localhost kernel: [ 7814.197969]  00ff c13e4c55 
e1fa8700 0007 04e2 e5cd9e18 c1048093 0009
Nov 26 04:18:34 localhost kernel: [ 7814.197975]  e5cd9e10 c1643220 
e5cd9e2c e5cd9e50 c13e4c55 c163fe6b 00ff c1643220

Nov 26 04:18:34 localhost kernel: [ 7814.197982] Call Trace:
Nov 26 04:18:34 localhost kernel: [ 7814.197988]  [c13e4c55] ? 
dev_watchdog+0x165/0x220
Nov 26 04:18:34 localhost kernel: [ 7814.197994]  [c1278546] 
dump_stack+0x16/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198000]  [c1047fd3] 
warn_slowpath_common+0x63/0x80
Nov 26 04:18:34 localhost kernel: [ 7814.198003]  [c13e4c55] ? 
dev_watchdog+0x165/0x220
Nov 26 04:18:34 localhost kernel: [ 7814.198007]  [c1048093] 
warn_slowpath_fmt+0x33/0x40
Nov 26 04:18:34 localhost kernel: [ 7814.198011]  [c13e4c55] 
dev_watchdog+0x165/0x220
Nov 26 04:18:34 localhost kernel: [ 7814.198017]  [c13e4af0] ? 
dev_activate+0x110/0x110
Nov 26 04:18:34 localhost kernel: [ 7814.198020]  [c1055c18] 
call_timer_fn+0x58/0xe0
Nov 26 04:18:34 localhost kernel: [ 7814.198024]  [c1056ce8] 
run_timer_softirq+0x1a8/0x1f0
Nov 26 04:18:34 localhost kernel: [ 7814.198028]  [c12fb61d] ? 
info_for_irq+0xd/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198031]  [c12fbb6c] ? 
evtchn_from_irq+0x3c/0x50
Nov 26 04:18:34 localhost kernel: [ 7814.198034]  [c13e4af0] ? 
dev_activate+0x110/0x110
Nov 26 04:18:34 localhost kernel: [ 7814.198038]  [c104fcb9] 
__do_softirq+0xd9/0x1e0
Nov 26 04:18:34 localhost kernel: [ 7814.198041]  [c12fc045] ? 
__xen_evtchn_do_upcall+0x245/0x280
Nov 26 04:18:34 localhost kernel: [ 7814.198045]  [c104fe41] 
irq_exit+0x41/0x80
Nov 26 04:18:34 localhost kernel: [ 7814.198048]  [c12fc0e5] 
xen_evtchn_do_upcall+0x25/0x30
Nov 26 04:18:34 localhost kernel: [ 7814.198053]  [c147b287] 
xen_do_upcall+0x7/0xc
Nov 26 04:18:34 localhost kernel: [ 7814.198058]  [c10c00d8] ? 
rcu_process_gp_end+0x58/0x70
Nov 26 04:18:34 localhost kernel: [ 7814.198061]  [c10013a7] ? 
xen_hypercall_sched_op+0x7/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198066]  [c1007ef2] ? 
xen_safe_halt+0x12/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198070]  [c1015be6] 
default_idle+0x56/0xb0
Nov 26 04:18:34 localhost kernel: [ 7814.198074]  [c10158e7] 
arch_cpu_idle+0x17/0x30
Nov 26 04:18:34 localhost kernel: [ 7814.198078]  [c108e2ae] 
cpu_startup_entry+0x15e/0x1d0
Nov 26 04:18:34 localhost kernel: [ 7814.198085]  [c1464282] 
cpu_bringup_and_idle+0x12/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198088] ---[ end trace 
d8c0d3f5c187aa6b ]---
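
For anyone triaging logs like the one above, here is a minimal sketch (not part of the original thread) that pulls the interface, driver and queue number out of "NETDEV WATCHDOG" lines. The function name find_watchdog_timeouts and the sample text are made up for illustration; the message format is the one shown in the traces in this thread, and the whitespace collapsing is there only because the mail client wrapped the long lines.

```python
import re

# Message format as it appears in the traces quoted in this thread.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<ifname>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(log_text):
    """Return (interface, driver, queue) for each watchdog timeout found."""
    # Long lines were wrapped by the mailer, so collapse every run of
    # whitespace (including newlines) into one space before matching.
    flat = " ".join(log_text.split())
    return [
        (m.group("ifname"), m.group("driver"), int(m.group("queue")))
        for m in WATCHDOG_RE.finditer(flat)
    ]


# Hypothetical sample assembled from two lines quoted in the thread.
sample = """\
Nov 26 04:18:34 localhost kernel: [ 7814.197892] NETDEV WATCHDOG: eth0
(igb): transmit queue 7 timed out
[ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out
"""

for ifname, driver, queue in find_watchdog_timeouts(sample):
    print(f"{ifname} ({driver}): queue {queue}")
```

Knowing the queue number is what allows it to be compared against the per-queue MSI-X vectors, as in the bnx2 discussion further down.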


And the recovery:

Nov 26 21:47:54 localhost kernel: [70773.950715] [ cut here 
]
Nov 26 21:47:54 localhost kernel: [70773.950747] WARNING: at 
net/core/dev.c:4201 net_rx_action+0xfd/0x1c0()
Nov 26 21:47:54 localhost kernel: [70773.950751] Modules linked in: tun 
nfsv3 nfs_acl nfs fscache dm_multipath scsi_dh lockd sunrpc openvswitch 
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ip
v4 xt_tcpudp

Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-02-06 Thread Zoltan Kiss


On 05/02/14 20:43, Andrew Cooper wrote:
> On 05/02/2014 20:23, Zoltan Kiss wrote:
>> On 04/02/14 19:47, Michael Chan wrote:
>>> On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote:
>>>> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255
>>>> dev_watchdog+0x156/0x1f0()
>>>> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out
>>>
>>> The dump shows an internal IRQ pending on MSIX vector 2 which matches
>>> the queue number that is timing out.  I don't know what happened to
>>> the MSIX and why the driver is not seeing it.  Do you see an IRQ error
>>> message from the kernel a few seconds before the tx timeout message?
>>
>> I haven't seen any IRQ related error message. Note, this is on Xen
>> 4.3.1. Now I have new results with a reworked version of the patch,
>> unfortunately it still has this issue. Here is a bnx2 dump, lspci
>> output and some Xen debug output (MSI and interrupt bindings, I have
>> more if needed).
>
> You need debug-keys 'Q' as well to map between the PCI devices and Xen IRQs
>
> ~Andrew



I managed to get that after a reboot:

(XEN) [2014-02-06 09:44:34] :02:00.0 - dom 0   - MSIs < 64 65 66 67 
68 69 >


So the relevant MSI information:

(XEN) [2014-02-05 20:15:20]  MSI-X   64 vec=d7  fixed  edge   assert 
physcpu dest=0022 mask=1/0/0
(XEN) [2014-02-05 20:15:20]  MSI-X   65 vec=ba  fixed  edge   assert 
physcpu dest= mask=1/0/0
(XEN) [2014-02-05 20:15:20]  MSI-X   66 vec=92  fixed  edge   assert 
physcpu dest=0022 mask=1/0/0
(XEN) [2014-02-05 20:15:20]  MSI-X   67 vec=3a  fixed  edge   assert 
physcpu dest=0021 mask=1/0/0
(XEN) [2014-02-05 20:15:20]  MSI-X   68 vec=b8  fixed  edge   assert 
physcpu dest=0022 mask=1/0/0
(XEN) [2014-02-05 20:15:20]  MSI-X   69 vec=2a  fixed  edge   assert 
physcpu dest=0020 mask=1/1/1

...
(XEN) [2014-02-05 20:15:22]IRQ:  64 affinity:0004 vec:d7 
type=PCI-MSI/-X  status=0030 in-flight=0 domain-list=0:304(---),
(XEN) [2014-02-05 20:15:22]IRQ:  65 affinity:0100 vec:ba 
type=PCI-MSI/-X  status=0010 in-flight=0 domain-list=0:303(---),
(XEN) [2014-02-05 20:15:22]IRQ:  66 affinity:0004 vec:92 
type=PCI-MSI/-X  status=0010 in-flight=0 domain-list=0:302(---),
(XEN) [2014-02-05 20:15:22]IRQ:  67 affinity:0002 vec:3a 
type=PCI-MSI/-X  status=0010 in-flight=0 domain-list=0:301(---),
(XEN) [2014-02-05 20:15:22]IRQ:  68 affinity:0004 vec:b8 
type=PCI-MSI/-X  status=0030 in-flight=0 domain-list=0:300(---),
(XEN) [2014-02-05 20:15:22]IRQ:  69 affinity:0001 vec:2a 
type=PCI-MSI/-X  status=0002 mapped, unbound
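
To compare the two Xen dumps above side by side, a small sketch like the following could join them per IRQ. This is not from the original thread: the summarize function and the dictionary layout are illustrative only, the regular expressions are written against the exact line shapes quoted above, and the raw mask and status fields are reported as-is without assuming what each bit means.

```python
import re

# One entry from the MSI dump, e.g.
#   MSI-X   66 vec=92  fixed  edge   assert physcpu dest=0022 mask=1/0/0
MSI_RE = re.compile(
    r"MSI-X\s+(?P<irq>\d+) vec=(?P<vec>[0-9a-f]+) .*?mask=(?P<mask>\d/\d/\d)"
)
# One entry from the IRQ dump, either bound to a domain or "mapped, unbound".
IRQ_RE = re.compile(
    r"IRQ:\s*(?P<irq>\d+) affinity:\S+ vec:(?P<vec>[0-9a-f]+) "
    r"type=\S+ status=(?P<status>\S+) "
    r"(?P<binding>mapped, unbound|in-flight=\d+ domain-list=\S+)"
)


def summarize(dump_text):
    """Merge the MSI and IRQ dumps into one dict keyed by IRQ number."""
    # Undo the mailer's line wrapping before matching.
    flat = " ".join(dump_text.split())
    table = {}
    for m in MSI_RE.finditer(flat):
        table[int(m["irq"])] = {"vec": m["vec"], "mask": m["mask"]}
    for m in IRQ_RE.finditer(flat):
        entry = table.setdefault(int(m["irq"]), {"vec": m["vec"]})
        entry["status"] = m["status"]
        entry["bound"] = m["binding"] != "mapped, unbound"
    return table


# Two of the entries quoted above, including the line wrapping.
sample = """\
(XEN) [2014-02-05 20:15:20]  MSI-X   66 vec=92  fixed  edge   assert
physcpu dest=0022 mask=1/0/0
(XEN) [2014-02-05 20:15:20]  MSI-X   69 vec=2a  fixed  edge   assert
physcpu dest=0020 mask=1/1/1
(XEN) [2014-02-05 20:15:22]IRQ:  66 affinity:0004 vec:92
type=PCI-MSI/-X  status=0010 in-flight=0 domain-list=0:302(---),
(XEN) [2014-02-05 20:15:22]IRQ:  69 affinity:0001 vec:2a
type=PCI-MSI/-X  status=0002 mapped, unbound
"""

for irq, entry in sorted(summarize(sample).items()):
    state = "bound" if entry["bound"] else "unbound"
    print(irq, entry["vec"], entry["mask"], entry["status"], state)
```

Run over the full dump, this makes entries that differ from the rest stand out, such as IRQ 69 above with its different mask triple and no domain binding.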



Zoli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-02-05 Thread Andrew Cooper

On 05/02/2014 20:23, Zoltan Kiss wrote:
> On 04/02/14 19:47, Michael Chan wrote:
>> On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote:
>>> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255
>>> dev_watchdog+0x156/0x1f0()
>>> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out
>>
>> The dump shows an internal IRQ pending on MSIX vector 2 which matches
>> the queue number that is timing out.  I don't know what happened to
>> the MSIX and why the driver is not seeing it.  Do you see an IRQ error
>> message from the kernel a few seconds before the tx timeout message?
>
> I haven't seen any IRQ related error message. Note, this is on Xen
> 4.3.1. Now I have new results with a reworked version of the patch,
> unfortunately it still has this issue. Here is a bnx2 dump, lspci
> output and some Xen debug output (MSI and interrupt bindings, I have
> more if needed).

You need debug-keys 'Q' as well to map between the PCI devices and Xen IRQs

~Andrew

Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-02-05 Thread Zoltan Kiss


On 05/02/14 20:23, Zoltan Kiss wrote:
> On 04/02/14 19:47, Michael Chan wrote:
>> On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote:
>>> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255
>>> dev_watchdog+0x156/0x1f0()
>>> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out
>>
>> The dump shows an internal IRQ pending on MSIX vector 2 which matches
>> the queue number that is timing out.  I don't know what happened to
>> the MSIX and why the driver is not seeing it.  Do you see an IRQ error
>> message from the kernel a few seconds before the tx timeout message?
>
> I haven't seen any IRQ related error message. Note, this is on Xen
> 4.3.1. Now I have new results with a reworked version of the patch,
> unfortunately it still has this issue. Here is a bnx2 dump, lspci output
> and some Xen debug output (MSI and interrupt bindings, I have more if
> needed).


And here is the watchdog message and the first dump, if it matters:

[10118.282007] [ cut here ]
[10118.282018] WARNING: at net/sched/sch_generic.c:255 
dev_watchdog+0x156/0x1f0()

[10118.282021] NETDEV WATCHDOG: eth0 (bnx2): transmit queue 0 timed out
[10118.282024] Modules linked in: tun nfsv3 nfs_acl rpcsec_gss_krb5 
auth_rpcgss oid_registry nfsv4 nfs fscache lockd sunrpc ipv6 openvswitch(O)
frag_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables 
x_tables sr_mod cdrom nls_utf8 isofs dm_multipath scsi_dh dm_mod 
usb_storage
lk_helper cryptd lrw aes_i586 xts gf128mul coretemp microcode 
hid_generic lpc_ich mfd_core ehci_pci ehci_hcd i7core_edac edac_core 
bnx2 sg hed u

scsi_transport_sas raid_class scsi_mod
[10118.282083] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G   O 
3.10.11-0.xs1.8.50.175.377583 #1
[10118.282086] Hardware name: Dell Inc. PowerEdge M710HD/05GGXD, BIOS 
2.0.0 01/31/2011
[10118.282089]  00ff ee0a5dd0 c1488cd3 ee0a5df8 c1046664 c1658a88 
ee0a5e24 00ff
[10118.282097]  c13fc1c6 c13fc1c6 ec778000  00256a1c ee0a5e10 
c1046723 0009
[10118.282104]  ee0a5e08 c1658a88 ee0a5e24 ee0a5e48 c13fc1c6 c16556e1 
00ff c1658a88

[10118.282112] Call Trace:
[10118.282118]  [c1488cd3] dump_stack+0x16/0x1b
[10118.282125]  [c1046664] warn_slowpath_common+0x64/0x80
[10118.282129]  [c13fc1c6] ? dev_watchdog+0x156/0x1f0
[10118.282133]  [c13fc1c6] ? dev_watchdog+0x156/0x1f0
[10118.282137]  [c1046723] warn_slowpath_fmt+0x33/0x40
[10118.282141]  [c13fc1c6] dev_watchdog+0x156/0x1f0
[10118.282149]  [c10549ce] call_timer_fn+0x3e/0xf0
[10118.282155]  [c10013a7] ? xen_hypercall_sched_op+0x7/0x20
[10118.282159]  [c13fc070] ? __netdev_watchdog_up+0x60/0x60
[10118.282164]  [c1055c1b] run_timer_softirq+0x1ab/0x210
[10118.282169]  [c10be4fd] ? irq_get_irq_data+0xd/0x10
[10118.282176]  [c130fb2d] ? info_for_irq+0xd/0x20
[10118.282180]  [c13fc070] ? __netdev_watchdog_up+0x60/0x60
[10118.282184]  [c104e3f4] __do_softirq+0xc4/0x200
[10118.282189]  [c1312316] ? evtchn_fifo_handle_events+0xf6/0x120
[10118.282193]  [c104e5bd] irq_exit+0x3d/0x90
[10118.282198]  [c130fe55] xen_evtchn_do_upcall+0x25/0x40
[10118.282203]  [c14935c7] xen_do_upcall+0x7/0xc
[10118.282207]  [c10013a7] ? xen_hypercall_sched_op+0x7/0x20
[10118.282213]  [c1007f12] ? xen_safe_halt+0x12/0x20
[10118.282218]  [c1015eff] default_idle+0x3f/0xb0
[10118.28]  [c1015a17] arch_cpu_idle+0x17/0x30
[10118.282229]  [c108f591] cpu_startup_entry+0x141/0x1f0
[10118.282234]  [c147d11b] cpu_bringup_and_idle+0x12/0x14
[10118.282237] ---[ end trace 25ed24391f6c7acd ]---
[10118.282242] bnx2 :02:00.0 eth0: <--- start FTQ dump --->
[10118.282267] bnx2 :02:00.0 eth0: RV2P_PFTQ_CTL 0001
[10118.282277] bnx2 :02:00.0 eth0: RV2P_TFTQ_CTL 0002
[10118.282288] bnx2 :02:00.0 eth0: RV2P_MFTQ_CTL 4000
[10118.282298] bnx2 :02:00.0 eth0: TBDR_FTQ_CTL 4002
[10118.282309] bnx2 :02:00.0 eth0: TDMA_FTQ_CTL 00010002
[10118.282319] bnx2 :02:00.0 eth0: TXP_FTQ_CTL 01810002
[10118.282330] bnx2 :02:00.0 eth0: TXP_FTQ_CTL 01810002
[10118.282340] bnx2 :02:00.0 eth0: TPAT_FTQ_CTL 00010002
[10118.282372] bnx2 :02:00.0 eth0: RXP_CFTQ_CTL 8000
[10118.282383] bnx2 :02:00.0 eth0: RXP_FTQ_CTL 0010
[10118.282392] bnx2 :02:00.0 eth0: COM_COMXQ_FTQ_CTL 0001
[10118.282403] bnx2 :02:00.0 eth0: COM_COMTQ_FTQ_CTL 0002
[10118.282414] bnx2 :02:00.0 eth0: COM_COMQ_FTQ_CTL 0001
[10118.282425] bnx2 :02:00.0 eth0: CP_CPQ_FTQ_CTL 4000
[10118.282434] bnx2 :02:00.0 eth0: CPU states:
[10118.282449] bnx2 :02:00.0 eth0: 045000 mode b84c state 80001000 
evt_mask 500 pc 8000844 pc 80012bc instr a0e00012
[10118.282471] bnx2 :02:00.0 eth0: 085000 mode b84c state 80001000 
evt_mask 500 pc 8000a50 pc 8000ac4 instr 38420001
[10118.282493] bnx2 :02:00.0 eth0: 0c5000 mode b84c state 80001000 
evt_mask 500 pc 8004c14 pc 8004c18 instr 32070001
[10118.282515] bnx2 :02:00.0 eth0: 105000 mode b8cc state 8000 
evt_mask 500 pc 8000a9c pc 8000b28 instr 8c53
[10118.282537] bnx2 :02:00.0 eth0: 145000 mode b880 state 8000 
evt_mask 500 pc 800d1a8 pc 800af74 instr 441010a
[10118.282560] bnx2 :02:00.0 eth0: 185000 mode b8cc state 8000 
evt_mask 500 pc 8000918 pc 8000928 instr 8f

Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-02-05 Thread Zoltan Kiss


On 04/02/14 19:47, Michael Chan wrote:
> On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote:
>> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255
>> dev_watchdog+0x156/0x1f0()
>> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out
>
> The dump shows an internal IRQ pending on MSIX vector 2 which matches
> the queue number that is timing out.  I don't know what happened to
> the MSIX and why the driver is not seeing it.  Do you see an IRQ error
> message from the kernel a few seconds before the tx timeout message?


I haven't seen any IRQ related error message. Note, this is on Xen 
4.3.1. Now I have new results with a reworked version of the patch, 
unfortunately it still has this issue. Here is a bnx2 dump, lspci output 
and some Xen debug output (MSI and interrupt bindings, I have more if 
needed).


[82099.288743] bnx2 :02:00.0 eth0: <--- start FTQ dump --->
[82099.288767] bnx2 :02:00.0 eth0: RV2P_PFTQ_CTL 00010002
[82099.288779] bnx2 :02:00.0 eth0: RV2P_TFTQ_CTL 0002
[82099.288790] bnx2 :02:00.0 eth0: RV2P_MFTQ_CTL 4000
[82099.288801] bnx2 :02:00.0 eth0: TBDR_FTQ_CTL 00404002
[82099.288812] bnx2 :02:00.0 eth0: TDMA_FTQ_CTL 00010002
[82099.288823] bnx2 :02:00.0 eth0: TXP_FTQ_CTL 00810002
[82099.288834] bnx2 :02:00.0 eth0: TXP_FTQ_CTL 01010002
[82099.288845] bnx2 :02:00.0 eth0: TPAT_FTQ_CTL 00010002
[82099.288878] bnx2 :02:00.0 eth0: RXP_CFTQ_CTL 8000
[82099.29] bnx2 :02:00.0 eth0: RXP_FTQ_CTL 0012
[82099.288899] bnx2 :02:00.0 eth0: COM_COMXQ_FTQ_CTL 0001
[82099.288911] bnx2 :02:00.0 eth0: COM_COMTQ_FTQ_CTL 0002
[82099.288923] bnx2 :02:00.0 eth0: COM_COMQ_FTQ_CTL 0001
[82099.288934] bnx2 :02:00.0 eth0: CP_CPQ_FTQ_CTL 4000
[82099.288944] bnx2 :02:00.0 eth0: CPU states:
[82099.288960] bnx2 :02:00.0 eth0: 045000 mode b84c state 80005000 
evt_mask 500 pc 8001284 pc 8000cb8 instr 35690100
[82099.288984] bnx2 :02:00.0 eth0: 085000 mode b84c state 80001000 
evt_mask 500 pc 8000a58 pc 8000a4c instr 38420001
[82099.289007] bnx2 :02:00.0 eth0: 0c5000 mode b84c state 80001000 
evt_mask 500 pc 8004c14 pc 8004c14 instr 32050003
[82099.289030] bnx2 :02:00.0 eth0: 105000 mode b8cc state 8000 
evt_mask 500 pc 8000a94 pc 8000a94 instr 8c420020
[82099.289063] bnx2 :02:00.0 eth0: 145000 mode b880 state 8000 
evt_mask 500 pc 800d244 pc 8008aac instr 8c46
[82099.289087] bnx2 :02:00.0 eth0: 185000 mode b8cc state 8000 
evt_mask 500 pc 8000c6c pc 8000c6c instr 3c056000

[82099.289103] bnx2 :02:00.0 eth0: <--- end FTQ dump --->
[82099.289112] bnx2 :02:00.0 eth0: <--- start TBDC dump --->
[82099.289124] bnx2 :02:00.0 eth0: TBDC free cnt: 31
[82099.289133] bnx2 :02:00.0 eth0: LINE CID  BIDX   CMD  VALIDS
[82099.289148] bnx2 :02:00.0 eth0: 00000800  a3b8   00[1]
[82099.289163] bnx2 :02:00.0 eth0: 01001100  1b58   00[0]
[82099.289178] bnx2 :02:00.0 eth0: 02000800  a390   00[0]
[82099.289193] bnx2 :02:00.0 eth0: 03000800  a370   00[0]
[82099.289217] bnx2 :02:00.0 eth0: 04000800  a378   00[0]
[82099.289232] bnx2 :02:00.0 eth0: 05000800  a388   00[0]
[82099.289247] bnx2 :02:00.0 eth0: 06000800  a398   00[0]
[82099.289262] bnx2 :02:00.0 eth0: 07000800  a3a8   00[0]
[82099.289277] bnx2 :02:00.0 eth0: 08000800  a3b0   00[0]
[82099.289291] bnx2 :02:00.0 eth0: 09000800  a3b8   00[0]
[82099.289306] bnx2 :02:00.0 eth0: 0a000800  8c10   00[0]
[82099.289321] bnx2 :02:00.0 eth0: 0b000800  eaf0   00[0]
[82099.289336] bnx2 :02:00.0 eth0: 0c000800  eaf8   00[0]
[82099.289351] bnx2 :02:00.0 eth0: 0d001100  5e60   00[0]
[82099.289365] bnx2 :02:00.0 eth0: 0e001100  5e68   00[0]
[82099.289380] bnx2 :02:00.0 eth0: 0f001100  5e70   00[0]
[82099.289395] bnx2 :02:00.0 eth0: 10001100  5e88   00[0]
[82099.289410] bnx2 :02:00.0 eth0: 11001100  5e90   00[0]
[82099.289425] bnx2 :02:00.0 eth0: 12001100  5ee8   00[0]
[82099.289440] bnx2 :02:00.0 eth0: 13001100  5ef8   00[0]
[82099.289454] bnx2 :02:00.0 eth0: 14001100  5e00   00[0]
[82099.289470] bnx2 :02:00.0 eth0: 15001100  5a20   00[0]
[82099.289485] bnx2 :02:00.0 eth0: 16001100  59a8   00[0]
[82099.289499] bnx2 :02:00.0 eth0: 17001100  59b0   00[0]
[82099.289514] bnx2 :02:00.0 eth0: 18001100  59b8   00[0]
[82099.289529] bnx2 :02:00.0 eth0: 19001100  5a28   00[0]
[82099.289544] bnx2 :02:00.0 eth0: 1a001100  5a30   00[0]
[82099.289559] bnx2 :02:00.0 eth0: 1b000800  8c58   00[0]
[82099.289573] bnx2 :02:00.0 eth0: 1c000800  8c60   00[0]
[82099.289588] bnx2 :02:00.0 eth0: 1d055e80  dca8   fb[0]
[82099.289603] bnx2 :02:00.0 eth0: 1e1cf780  f7b8   af[0]
[82099

Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-02-04 Thread Zoltan Kiss


On 31/01/14 18:56, Wei Liu wrote:
> On Thu, Jan 30, 2014 at 07:08:11PM +, Zoltan Kiss wrote:
>> Hi,
>>
>> I've experienced some of the queue timeout problems mentioned in the
>> subject with igb and bnx2 cards. I haven't seen them on other cards
>> so far. I'm using XenServer with a 3.10 Dom0 kernel (however, igb was
>> already updated to the latest version), and there are Windows guests
>> sending data through these cards. I noticed these problems in XenRT
>> test runs, and I know that they usually mean some lost interrupt
>> problem or other hardware error, but in my case they started to
>> appear more often, and they are likely connected to my netback grant
>> mapping patches. These patches cause skbs with huge (~64KB)
>> linear buffers to appear more often.
>> The reason for that is an old problem in the ring protocol:
>> originally the maximum number of slots was linked to MAX_SKB_FRAGS,
>> as every slot ended up as a frag of the skb. When this value was
>> changed, netback had to cope with the situation by coalescing the
>> packets into fewer frags.
>> My patch series takes a different approach: the leftover slots
>> (pages) are assigned to a new skb's frags, and that skb is
>> stashed on the frag_list of the first one. Then, before sending it
>> off to the stack, it calls skb = skb_copy_expand(skb, 0, 0,
>> GFP_ATOMIC | __GFP_NOWARN), which basically creates a new skb and
>> copies all the data into it. As far as I understand, it puts
>> everything into the linear buffer, which can amount to 64KB at most.
>> The original skb is then freed, and the new one is sent to the
>> stack.
>
> Just my two cents: if this is the case, you can try to call
> skb_copy_expand on every SKB netback receives to manually create SKBs
> with a ~64KB linear buffer to see how it goes...


I've tried it, and it did break everything in a similar way, so that's a
strong clue that the problem lies here. I've rewritten that part of my
patches to make fewer modifications, based on Malcolm's idea: netback
pulls the first frag into the linear buffer, then moves a frag from the
frag_list skb into the first one. That seems to help, but so far I have
only one relevant test result; I'm waiting for more.


Zoli
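
The two layouts discussed in this message can be compared with a small user-space model. This is only a sketch and not netback code: `struct toy_skb`, `TOY_MAX_FRAGS` and the `toy_*` helpers are hypothetical names, the value 17 is an assumption standing in for MAX_SKB_FRAGS, and the model tracks byte counts only, with no real buffers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FRAGS 17  /* assumption: stands in for MAX_SKB_FRAGS */

/* Hypothetical, size-only model of an skb: no data is stored. */
struct toy_skb {
    size_t linear_len;               /* bytes in the linear buffer */
    size_t nr_frags;                 /* used entries in frag_len[] */
    size_t frag_len[TOY_MAX_FRAGS];  /* bytes per page fragment */
    struct toy_skb *frag_list;       /* chained skb with leftover slots */
};

/* Total payload bytes across the skb and everything chained to it. */
size_t toy_total_len(const struct toy_skb *skb)
{
    size_t total = 0;

    for (; skb; skb = skb->frag_list) {
        total += skb->linear_len;
        for (size_t i = 0; i < skb->nr_frags; i++)
            total += skb->frag_len[i];
    }
    return total;
}

/* Strategy 1: copy everything into a single linear buffer. */
void toy_copy_linear(const struct toy_skb *src, struct toy_skb *dst)
{
    assert(src && dst);
    dst->linear_len = toy_total_len(src);
    dst->nr_frags = 0;
    dst->frag_list = NULL;
}

/*
 * Strategy 2: pull the first frag into the linear area, then move one
 * frag from the chained skb into the slot that was freed.
 * Returns 0 on success, -1 if the layout does not allow it.
 */
int toy_pull_and_move(struct toy_skb *skb)
{
    struct toy_skb *extra = skb->frag_list;
    size_t i;

    if (!extra || skb->nr_frags == 0 || extra->nr_frags == 0)
        return -1;

    /* Pull frag 0 into the linear area and close the gap. */
    skb->linear_len += skb->frag_len[0];
    for (i = 1; i < skb->nr_frags; i++)
        skb->frag_len[i - 1] = skb->frag_len[i];

    /* Reuse the freed last slot for the chained skb's first frag. */
    skb->frag_len[skb->nr_frags - 1] = extra->frag_len[0];
    for (i = 1; i < extra->nr_frags; i++)
        extra->frag_len[i - 1] = extra->frag_len[i];
    extra->nr_frags--;

    /* Unlink the chained skb once it carries no more data. */
    if (extra->nr_frags == 0 && extra->linear_len == 0)
        skb->frag_list = NULL;
    return 0;
}
```

In this model, a packet with a 128-byte linear area and 18 slots of 3584 bytes each ends up with a 64640-byte linear buffer under the first strategy, but only a 3712-byte one under the second, while the total length stays the same.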

Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-02-04 Thread Michael Chan

On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote: 
> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255 
> dev_watchdog+0x156/0x1f0()
> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out 

The dump shows an internal IRQ pending on MSIX vector 2, which matches
the queue number that is timing out.  I don't know what happened to
the MSIX and why the driver is not seeing it.  Do you see an IRQ error
message from the kernel a few seconds before the tx timeout message?
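
When matching a timed-out queue against other log lines like this, the queue number can be extracted from the watchdog message mechanically. Below is a minimal user-space sketch, assuming the message format quoted in this thread; `struct wd_event` and `parse_watchdog_line` are hypothetical names, not part of any kernel or driver interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one watchdog report. */
struct wd_event {
    char ifname[32];    /* e.g. "eth1" */
    char driver[32];    /* e.g. "bnx2" */
    unsigned int queue; /* transmit queue index */
};

/*
 * Parse a log line containing
 *   "NETDEV WATCHDOG: <if> (<driver>): transmit queue <n> timed out"
 * Any prefix (timestamp, hostname) before the marker is skipped.
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_watchdog_line(const char *line, struct wd_event *ev)
{
    const char *p;

    assert(line && ev);
    p = strstr(line, "NETDEV WATCHDOG: ");
    if (!p)
        return -1;
    /* Field widths keep sscanf from overrunning the 32-byte buffers. */
    if (sscanf(p,
               "NETDEV WATCHDOG: %31s (%31[^)]): transmit queue %u",
               ev->ifname, ev->driver, &ev->queue) != 3)
        return -1;
    return 0;
}
```

Feeding it the bnx2 line above would fill in `eth1`, `bnx2` and queue 2, which can then be compared with the MSIX vector number in the driver dump.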

Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-01-31 Thread Wei Liu

On Thu, Jan 30, 2014 at 07:08:11PM +, Zoltan Kiss wrote:
> Hi,
> 
> I've experienced the queue timeout problems mentioned in the
> subject with igb and bnx2 cards. I haven't seen them on other cards
> so far. I'm using XenServer with a 3.10 Dom0 kernel (although igb
> was already updated to the latest version), and there are Windows
> guests sending data through these cards. I noticed these problems
> in XenRT test runs, and I know that they usually mean some lost
> interrupt problem or other hardware error, but in my case they
> started to appear more often, and they are likely connected to my
> netback grant mapping patches. These patches cause skbs with huge
> (~64KB) linear buffers to appear more often.
> The reason for that is an old problem in the ring protocol:
> originally the maximum number of slots was linked to MAX_SKB_FRAGS,
> as every slot ended up as a frag of the skb. When this value was
> changed, netback had to cope with the situation by coalescing the
> packets into fewer frags.
> My patch series takes a different approach: the leftover slots
> (pages) are assigned to a new skb's frags, and that skb is stashed
> on the frag_list of the first one. Then, before sending it off to
> the stack, it calls skb = skb_copy_expand(skb, 0, 0,
> GFP_ATOMIC | __GFP_NOWARN), which basically creates a new skb and
> copies all the data into it. As far as I understand, it puts
> everything into the linear buffer, which can amount to 64KB at most.
> The original skb is then freed, and the new one is sent to the
> stack.

Just my two cents, if it is this case, you can try to call
skb_copy_expand on every SKB netback receives to manually create SKBs
with ~64KB linear buffer to see how it goes...

Wei.

> I suspect that this is the problem, as it only happens when guests
> send too many slots. Has anyone familiar with these drivers seen
> such an issue before (where these kinds of skbs get stuck in the
> queue)?
> 
> Regards,
> 
> Zoltan Kiss

Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-01-31 Thread Zoltan Kiss


On 30/01/14 21:34, Michael Chan wrote:
> On Thu, 2014-01-30 at 19:08 +, Zoltan Kiss wrote:
>> I've experienced some queue timeout problems mentioned in the subject
>> with igb and bnx2 cards.
>
> Please provide the full tx timeout dmesg.  bnx2 dumps some diagnostic
> information during tx timeout that may be useful.  Thanks.

Hi,

Here is some:

[ 5417.275463] [ cut here ]
[ 5417.275472] WARNING: at net/sched/sch_generic.c:255 
dev_watchdog+0x156/0x1f0()

[ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out
[ 5417.275476] Modules linked in: tun nfsv3 nfs_acl rpcsec_gss_krb5 
auth_rpcgss oid_registry nfsv4 nfs fscache lockd sunrpc ipv6 
openvswitch(O) ipt_REJECT nf_conntrack_ipv
rack xt_tcpudp iptable_filter ip_tables x_tables nls_utf8 isofs 
dm_multipath scsi_dh dm_mod dcdbas coretemp microcode psmouse serio_raw 
lpc_ich mfd_core hid_generic ehci_p
sg hed bnx2 usbhid hid sr_mod cdrom sd_mod pata_acpi ata_generic 
ata_piix libata uhci_hcd mptsas mptscsih mptbase scsi_transport_sas scsi_mod
[ 5417.275517] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G O 
3.10.11-0.xs1.8.50.170.377582 #1
[ 5417.275518] Hardware name: Dell Inc. PowerEdge R710/00W9X3, BIOS 
1.2.6 07/17/2009
[ 5417.275520]  00ff f008be08 c1488c53 f008be30 c1046664 c1658a88 
f008be5c 00ff
[ 5417.275525]  c13fc146 c13fc146 ee96a000 0002 00137d44 f008be48 
c1046723 0009
[ 5417.275530]  f008be40 c1658a88 f008be5c f008be80 c13fc146 c16556e1 
00ff c1658a88

[ 5417.275535] Call Trace:
[ 5417.275539]  [<c1488c53>] dump_stack+0x16/0x1b
[ 5417.275544]  [<c1046664>] warn_slowpath_common+0x64/0x80
[ 5417.275546]  [<c13fc146>] ? dev_watchdog+0x156/0x1f0
[ 5417.275549]  [<c13fc146>] ? dev_watchdog+0x156/0x1f0
[ 5417.275551]  [<c1046723>] warn_slowpath_fmt+0x33/0x40
[ 5417.275554]  [<c13fc146>] dev_watchdog+0x156/0x1f0
[ 5417.275559]  [<c10549ce>] call_timer_fn+0x3e/0xf0
[ 5417.275563]  [<c107293e>] ? finish_task_switch+0x4e/0xb0
[ 5417.275565]  [<c13fbff0>] ? __netdev_watchdog_up+0x60/0x60
[ 5417.275568]  [<c1055c1b>] run_timer_softirq+0x1ab/0x210
[ 5417.275571]  [<c13fbff0>] ? __netdev_watchdog_up+0x60/0x60
[ 5417.275574]  [<c104e3f4>] __do_softirq+0xc4/0x200
[ 5417.275577]  [<c1493547>] ? xen_do_upcall+0x7/0xc
[ 5417.275579]  [<c104e550>] run_ksoftirqd+0x20/0x50
[ 5417.275582]  [<c106f182>] smpboot_thread_fn+0x142/0x150
[ 5417.275586]  [<c1067a2b>] kthread+0x9b/0xa0
[ 5417.275589]  [<c106f040>] ? smpboot_create_threads+0x60/0x60
[ 5417.275591]  [<c107>] ? cpu_rt_runtime_read+0x40/0x80
[ 5417.275594]  [<c1492f77>] ret_from_kernel_thread+0x1b/0x28
[ 5417.275596]  [<c1067990>] ? kthread_freezable_should_stop+0x60/0x60
[ 5417.275599] ---[ end trace 691f572d388226ca ]---
[ 5417.275602] bnx2 :01:00.1 eth1: <--- start FTQ dump --->
[ 5417.275622] bnx2 :01:00.1 eth1: RV2P_PFTQ_CTL 0001
[ 5417.275629] bnx2 :01:00.1 eth1: RV2P_TFTQ_CTL 0002
[ 5417.275636] bnx2 :01:00.1 eth1: RV2P_MFTQ_CTL 4000
[ 5417.275643] bnx2 :01:00.1 eth1: TBDR_FTQ_CTL 4002
[ 5417.275650] bnx2 :01:00.1 eth1: TDMA_FTQ_CTL 00010002
[ 5417.275657] bnx2 :01:00.1 eth1: TXP_FTQ_CTL 0001
[ 5417.275663] bnx2 :01:00.1 eth1: TXP_FTQ_CTL 0001
[ 5417.275670] bnx2 :01:00.1 eth1: TPAT_FTQ_CTL 0001
[ 5417.275677] bnx2 :01:00.1 eth1: RXP_CFTQ_CTL 8000
[ 5417.275684] bnx2 :01:00.1 eth1: RXP_FTQ_CTL 0010
[ 5417.275690] bnx2 :01:00.1 eth1: COM_COMXQ_FTQ_CTL 0001
[ 5417.275698] bnx2 :01:00.1 eth1: COM_COMTQ_FTQ_CTL 0002
[ 5417.275705] bnx2 :01:00.1 eth1: COM_COMQ_FTQ_CTL 0001
[ 5417.275712] bnx2 :01:00.1 eth1: CP_CPQ_FTQ_CTL 4000
[ 5417.275718] bnx2 :01:00.1 eth1: CPU states:
[ 5417.275730] bnx2 :01:00.1 eth1: 045000 mode b84c state 80001000 
evt_mask 500 pc 8001284 pc 8001284 instr 1440fffc
[ 5417.275746] bnx2 :01:00.1 eth1: 085000 mode b84c state 80005000 
evt_mask 500 pc 8000a54 pc 8000a5c instr 10400016
[ 5417.275785] bnx2 :01:00.1 eth1: 0c5000 mode b84c state 80001000 
evt_mask 500 pc 8004c20 pc 8004c20 instr 32050003
[ 5417.275801] bnx2 :01:00.1 eth1: 105000 mode b8cc state 8000 
evt_mask 500 pc 8000a8c pc 8000a94 instr 8c420020
[ 5417.275817] bnx2 :01:00.1 eth1: 145000 mode b880 state 8000 
evt_mask 500 pc 8000ab0 pc 800d1e8 instr 27bd0020
[ 5417.275834] bnx2 :01:00.1 eth1: 185000 mode b8cc state 8000 
evt_mask 500 pc 8000cb0 pc 8000930 instr 8ce800e8

[ 5417.275845] bnx2 :01:00.1 eth1: <--- end FTQ dump --->
[ 5417.275851] bnx2 :01:00.1 eth1: <--- start TBDC dump --->
[ 5417.275858] bnx2 :01:00.1 eth1: TBDC free cnt: 32
[ 5417.275864] bnx2 :01:00.1 eth1: LINE CID  BIDX   CMD VALIDS
[ 5417.275875] bnx2 :01:00.1 eth1: 00001080  17c8   00 [0]
[ 5417.275886] bnx2 :01:00.1 eth1: 01001080  17e0   00 [0]
[ 5417.275897] bnx2 :01:00.1 eth1: 02001080  17e8   00 [0]
[ 5417.275907] bnx2 :01:00.1 eth1: 03001080  17f8   00 [0]
[ 5417.275918] bnx2 :01:00.1 eth1: 04001080  1800   00 [0]
[ 5417.275929] bnx2 :01:00.1 eth1: 05001080  17d0   00 [0]
[ 5417.275940] bnx2 :01:00.1 eth1: 06001080  17d8   00 [0]
[ 5417.275951]

Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-01-30 Thread Michael Chan

On Thu, 2014-01-30 at 19:08 +, Zoltan Kiss wrote:
> I've experienced some queue timeout problems mentioned in the subject 
> with igb and bnx2 cards. 

Please provide the full tx timeout dmesg.  bnx2 dumps some diagnostic
information during tx timeout that may be useful.  Thanks.

igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

2014-01-30 Thread Zoltan Kiss


Hi,

I've experienced the queue timeout problems mentioned in the subject
with igb and bnx2 cards. I haven't seen them on other cards so far. I'm
using XenServer with a 3.10 Dom0 kernel (although igb was already
updated to the latest version), and there are Windows guests sending
data through these cards. I noticed these problems in XenRT test runs,
and I know that they usually mean some lost interrupt problem or other
hardware error, but in my case they started to appear more often, and
they are likely connected to my netback grant mapping patches. These
patches cause skbs with huge (~64KB) linear buffers to appear more
often.
The reason for that is an old problem in the ring protocol: originally
the maximum number of slots was linked to MAX_SKB_FRAGS, as every slot
ended up as a frag of the skb. When this value was changed, netback had
to cope with the situation by coalescing the packets into fewer frags.
My patch series takes a different approach: the leftover slots (pages)
are assigned to a new skb's frags, and that skb is stashed on the
frag_list of the first one. Then, before sending it off to the stack, it
calls skb = skb_copy_expand(skb, 0, 0, GFP_ATOMIC | __GFP_NOWARN), which
basically creates a new skb and copies all the data into it. As far as I
understand, it puts everything into the linear buffer, which can amount
to 64KB at most. The original skb is then freed, and the new one is
sent to the stack.
I suspect that this is the problem, as it only happens when guests send
too many slots. Has anyone familiar with these drivers seen such an
issue before (where these kinds of skbs get stuck in the queue)?


Regards,

Zoltan Kiss
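
For context on the warning in the subject: the watchdog fires when a transmit queue has been stopped by the driver and nothing has been transmitted on it for longer than the device's timeout, which is why it usually points at a lost interrupt or stuck hardware. The following is a simplified user-space model of that condition, not the kernel source; `struct toy_txq`, `tick_after` and `toy_watchdog_check` are hypothetical names, and the real check in net/sched/sch_generic.c does more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Wraparound-safe "a is later than b" for a free-running tick counter,
 * the same idea as the kernel's time_after() on jiffies.
 */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Hypothetical per-queue state: only what the timeout check needs. */
struct toy_txq {
    bool stopped;               /* driver stopped the queue */
    unsigned long trans_start;  /* tick of the last transmit */
};

/*
 * Return the index of the first queue that is stopped and has not
 * transmitted for more than 'timeo' ticks, or -1 if none has.
 * With the carrier off nothing is reported, since no traffic is
 * expected to complete.
 */
int toy_watchdog_check(const struct toy_txq *q, size_t nq,
                       unsigned long now, unsigned long timeo,
                       bool carrier_ok)
{
    assert(q != NULL || nq == 0);
    if (!carrier_ok)
        return -1;
    for (size_t i = 0; i < nq; i++) {
        if (q[i].stopped && tick_after(now, q[i].trans_start + timeo))
            return (int)i;
    }
    return -1;
}
```

A queue that is merely idle is never reported in this model; only a queue that the driver stopped (typically because its ring is full) and that then sees no completions can time out.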

Re: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out

2013-12-08 Thread Ethan Zhao

Nick,
You could try 7.3.21-k8-NAPI in tree or the out-of-tree version, as
Bjorn mentioned.
Reading and debugging an old driver version is not an interesting thing
for anybody to do.

Thanks,
Ethan

On Tue, Dec 3, 2013 at 9:33 PM, Nick Pegg  wrote:
> On Mon, Dec 2, 2013 at 10:51 PM, Ethan Zhao  wrote:
>> Bjorn,
>>    This does not seem to be the same bug as
>> http://sourceforge.net/p/e1000/bugs/367/. Per the error log, Nick is
>> not running his kernel on bare metal; is he running it as an HVM DomU
>> guest or as Dom0 on Xen? If so, just a NULL check will not fix that.
>>
>
> Sorry, I neglected to say in my original email that the kernel is
> running as a Xen Dom0. Per Todd's request, I've opened a bug report on
> sourceforge and will follow up with this issue there:
> https://sourceforge.net/p/e1000/bugs/385/
>
> Thanks,
> Nick

Re: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out

2013-12-03 Thread Nick Pegg

On Mon, Dec 2, 2013 at 10:51 PM, Ethan Zhao  wrote:
> Bjorn,
>    This does not seem to be the same bug as
> http://sourceforge.net/p/e1000/bugs/367/. Per the error log, Nick is
> not running his kernel on bare metal; is he running it as an HVM DomU
> guest or as Dom0 on Xen? If so, just a NULL check will not fix that.
>

Sorry, I neglected to say in my original email that the kernel is
running as a Xen Dom0. Per Todd's request, I've opened a bug report on
sourceforge and will follow up with this issue there:
https://sourceforge.net/p/e1000/bugs/385/

Thanks,
Nick

Re: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out

2013-12-02 Thread Ethan Zhao

Bjorn,
   This does not seem to be the same bug as
http://sourceforge.net/p/e1000/bugs/367/. Per the error log, Nick is
not running his kernel on bare metal; is he running it as an HVM DomU
guest or as Dom0 on Xen? If so, just a NULL check will not fix that.

Thanks,
Ethan

On Thu, Nov 21, 2013 at 5:22 AM, Bjorn Helgaas  wrote:
> [+cc e1000-devel]
>
> On Wed, Nov 20, 2013 at 11:44 AM, Nick Pegg  wrote:
>> Hello,
>>
>> I've been seeing some servers hit a condition where they receive a
>> large number of packets (over 500,000 per second, for example) which
>> causes a kernel panic due to a null pointer dereference. I've included
>> the tracebacks below.
>>
>> I have not been able to reproduce this in my lab, but out in the field
>> I've seen this happen with kernel versions 3.7.6 through 3.9.2 (e1000e
>> driver versions 2.1.4-k through 2.2.14-k), running with Intel 82574L
>> NICs.
>>
>> I've seen previous posts to this mailing list suggesting that this is
>> a hardware issue (the mitigation being turning TSO/GSO off), however
>> those tracebacks didn't show the interface getting unexpectedly reset,
>> causing the null pointer dereference. Is this possibly a problem with
>> the e1000e driver where it's not gracefully handling the reset?
>>
>> Let me know if more information is needed. And please CC me in replies
>> since I'm not subscribed to this list. Thanks!
>
> Intel maintains newer drivers out-of-tree at
> http://sourceforge.net/projects/e1000/, and it's possible this is some
> bug that has already been fixed.  The current version there looks like
> e1000e-2.5.4, released 2013-09-05.
>
> Possible similar report: http://sourceforge.net/p/e1000/bugs/367/ (no
> real data there).
>
>> 
>> Nov 16 07:03:19 rx [ cut here ]--------
>> Nov 16 07:03:19 rx WARNING: at net/sched/sch_generic.c:255
>> dev_watchdog+0x25b/0x270()
>> Nov 16 07:03:19 rx Hardware name: X8DT6
>> Nov 16 07:03:19 rx NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
>> Nov 16 07:03:19 rx Modules linked in: xt_u32 xt_physdev
>> ip6table_mangle ip6_tables ebt_comment ebt_set ebt_arp ebt_limit
>> ebt_ip6 ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding
>> ebtable_filter 8021q mrp e1000e ptp pps_core
>> Nov 16 07:03:19 rx Pid: 14268, comm: kworker/0:0 Not tainted 3.9.2-1 #1
>> Nov 16 07:03:19 rx Call Trace:
>> Nov 16 07:03:19 rx[] 
>> warn_slowpath_common+0x7a/0xc0
>> Nov 16 07:03:19 rx  [] warn_slowpath_fmt+0x41/0x50
>> Nov 16 07:03:19 rx  [] dev_watchdog+0x25b/0x270
>> Nov 16 07:03:19 rx  [] ? __netdev_watchdog_up+0x80/0x80
>> Nov 16 07:03:19 rx  [] call_timer_fn+0x44/0x120
>> Nov 16 07:03:19 rx  [] run_timer_softirq+0x241/0x2b0
>> Nov 16 07:03:19 rx  [] ? __netdev_watchdog_up+0x80/0x80
>> Nov 16 07:03:19 rx  [] __do_softirq+0xef/0x270
>> Nov 16 07:03:19 rx  [] irq_exit+0xb5/0xc0
>> Nov 16 07:03:19 rx  [] xen_evtchn_do_upcall+0x2f/0x40
>> Nov 16 07:03:19 rx  [] xen_do_hypervisor_callback+0x1e/0x30
>> Nov 16 07:03:19 rx  <EOI>  [] ? delay_tsc+0x32/0x80
>> Nov 16 07:03:19 rx  [] ? delay_tsc+0x4a/0x80
>> Nov 16 07:03:19 rx  [] ? __const_udelay+0x28/0x30
>> Nov 16 07:03:19 rx  [] ?
>> e1000e_read_phy_reg_mdic+0xce/0x120 [e1000e]
>> Nov 16 07:03:19 rx  [] ?
>> e1000_get_hw_semaphore_82574+0x20/0x40 [e1000e]
>> Nov 16 07:03:19 rx  [] ?
>> e1000e_read_phy_reg_bm2+0x55/0xb0 [e1000e]
>> Nov 16 07:03:19 rx  [] ?
>> e1000e_flush_descriptors+0x96/0x270 [e1000e]
>> Nov 16 07:03:19 rx  [] ?
>> e1000_check_phy_82574+0x27/0x60 [e1000e]
>> Nov 16 07:03:19 rx  [] ?
>> e1000_watchdog_task+0x648/0x830 [e1000e]
>> Nov 16 07:03:19 rx  [] ? __schedule+0x3a7/0x7c0
>> Nov 16 07:03:19 rx  [] ? process_one_work+0x16e/0x430
>> Nov 16 07:03:19 rx  [] ? worker_thread+0x11c/0x410
>> Nov 16 07:03:19 rx  [] ? manage_workers+0x360/0x360
>> Nov 16 07:03:19 rx  [] ? kthread+0xc6/0xd0
>> Nov 16 07:03:19 rx  [] ? xen_end_context_switch+0x19/0x20
>> Nov 16 07:03:19 rx  [] ?
>> kthread_freezable_should_stop+0x70/0x70
>> Nov 16 07:03:19 rx  [] ? ret_from_fork+0x7c/0xb0
>> Nov 16 07:03:19 rx  [] ?
>> kthread_freezable_should_stop+0x70/0x70
>> Nov 16 07:03:19 rx ---[ end trace f896af6c9f44182d ]---
>> Nov 16 07:03:19 rx e1000e :03:00.0 eth0: Reset adapter unexpectedly
>> Nov 16 07:03:19 rx BUG: unable to handle kernel NULL pointer
>> dereference at 00d0
>> Nov 16 07:03:19 rx IP: []
>> e1000_clean_rx_irq+0x101/0x490 [e1000e]
>> Nov 16 07:03:19 rx PGD 4a6c3067 PUD 4f440067 PMD 0
>> Nov 16 07:03:19 rx Oops:  [#1] SMP
>> Nov 16 07:03:19 rx

RE: [E1000-devel] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out

2013-12-02 Thread Fujinaka, Todd

I'm having difficulty following this issue, most likely because of our email 
system. Can you file a new bug on sourceforge?

Thanks.

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565


-----Original Message-----
From: Nick Pegg [mailto:n...@nickpegg.com] 
Sent: Monday, December 02, 2013 2:57 PM
To: linux-kernel@vger.kernel.org; e1000-de...@lists.sourceforge.net
Subject: Re: [E1000-devel] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 
timed out

> Intel maintains newer drivers out-of-tree at 
> http://sourceforge.net/projects/e1000/, and it's possible this is some 
> bug that has already been fixed.  The current version there looks like 
> e1000e-2.5.4, released 2013-09-05.
>
> Possible similar report: http://sourceforge.net/p/e1000/bugs/367/ (no 
> real data there).

I've looked through the existing bug reports and version changelogs and didn't 
see anything that seemed very relevant.

I was able to debug the e1000e object file and get the specific code that's 
bugging out after the interface is unexpectedly reset:

(gdb) l *e1000_clean_rx_irq+0x101
0x19d81 is in e1000_clean_rx_irq (drivers/net/ethernet/intel/e1000e/netdev.c:933).
928             rmb();  /* read descriptor and rx_buffer_info after status DD */
929
930             skb = buffer_info->skb;
931             buffer_info->skb = NULL;
932
933             prefetch(skb->data - NET_IP_ALIGN);
934
935             i++;
936             if (i == rx_ring->count)
937                     i = 0;


The above code is from kernel version 3.9.2 and e1000e driver version 2.2.14-k. 
Should there be a check here to see if skb is NULL? I checked the latest e1000e 
release (2.5.4) and there is no check there either (near netdev.c:994).

___
E1000-devel mailing list
e1000-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit 
http://communities.intel.com/community/wired
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out

2013-12-02 Thread Nick Pegg

> Intel maintains newer drivers out-of-tree at
> http://sourceforge.net/projects/e1000/, and it's possible this is some
> bug that has already been fixed.  The current version there looks like
> e1000e-2.5.4, released 2013-09-05.
>
> Possible similar report: http://sourceforge.net/p/e1000/bugs/367/ (no
> real data there).

I've looked through the existing bug reports and version changelogs
and didn't see anything that seemed very relevant.

I was able to debug the e1000e object file and get the specific code
that's bugging out after the interface is unexpectedly reset:

(gdb) l *e1000_clean_rx_irq+0x101
0x19d81 is in e1000_clean_rx_irq (drivers/net/ethernet/intel/e1000e/netdev.c:933).
928             rmb();  /* read descriptor and rx_buffer_info after status DD */
929
930             skb = buffer_info->skb;
931             buffer_info->skb = NULL;
932
933             prefetch(skb->data - NET_IP_ALIGN);
934
935             i++;
936             if (i == rx_ring->count)
937                     i = 0;


The above code is from kernel version 3.9.2 and e1000e driver version
2.2.14-k. Should there be a check here to see if skb is NULL? I
checked the latest e1000e release (2.5.4) and there is no check there
either (near netdev.c:994).
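
For illustration only, here is a minimal userspace sketch of what such a NULL guard could look like in a ring-cleanup loop. It is not the e1000e code: `model_skb`, `model_buffer` and `model_clean_rx` are hypothetical stand-ins for the kernel structures and for e1000_clean_rx_irq(). A guard like this would only avoid the dereference; it would not explain why a slot is empty after the reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's sk_buff and e1000_buffer. */
struct model_skb {
	unsigned char *data;
};

struct model_buffer {
	struct model_skb *skb;
};

/*
 * Visit `budget` ring slots starting at `start`, taking ownership of
 * each skb. A slot whose skb is already NULL (for example because a
 * reset emptied the ring) is counted and skipped instead of being
 * dereferenced. Stores the number of slots that held an skb in
 * *cleaned and returns the number of slots that were skipped.
 */
static unsigned int model_clean_rx(struct model_buffer *ring,
				   unsigned int ring_count,
				   unsigned int start,
				   unsigned int budget,
				   unsigned int *cleaned)
{
	unsigned int i = start;
	unsigned int done;
	unsigned int skipped = 0;

	assert(ring_count > 0 && start < ring_count);
	*cleaned = 0;

	for (done = 0; done < budget; done++) {
		struct model_skb *skb = ring[i].skb;

		ring[i].skb = NULL;

		if (skb != NULL) {
			/* The driver would prefetch(skb->data - NET_IP_ALIGN) here. */
			(void)skb->data;
			(*cleaned)++;
		} else {
			skipped++;
		}

		/* Advance with wrap-around, as in the listing above. */
		i++;
		if (i == ring_count)
			i = 0;
	}

	return skipped;
}
```

In this model the caller can tell from the return value that it hit empty slots, which is the point where a real driver would have to decide whether to bail out of the poll or carry on.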

NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out

2013-11-20 Thread Nick Pegg

Hello,

I've been seeing some servers hit a condition where they receive a
large number of packets (over 500,000 per second, for example) which
causes a kernel panic due to a null pointer dereference. I've included
the tracebacks below.

I have not been able to reproduce this in my lab, but out in the field
I've seen this happen with kernel versions 3.7.6 through 3.9.2 (e1000e
driver versions 2.1.4-k through 2.2.14-k), running with Intel 82574L
NICs.

I've seen previous posts to this mailing list suggesting that this is
a hardware issue (the mitigation being turning TSO/GSO off), however
those tracebacks didn't show the interface getting unexpectedly reset,
causing the null pointer dereference. Is this possibly a problem with
the e1000e driver where it's not gracefully handling the reset?

Let me know if more information is needed. And please CC me in replies
since I'm not subscribed to this list. Thanks!

-Nick
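
As background for the warning in the trace below, the following is a simplified userspace sketch of the kind of check the netdev watchdog timer performs before it prints "transmit queue N timed out" and asks the driver to recover. It is a model under stated assumptions, not the code in net/sched/sch_generic.c: every `model_*` name is hypothetical, and the real check looks at more device state than the two flags shown here.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one transmit queue. */
struct model_txq {
	bool stopped;              /* the driver has stopped this queue */
	unsigned long trans_start; /* tick of the last transmit */
};

/* Hypothetical, simplified view of the device state the check uses. */
struct model_netdev {
	bool running;
	bool carrier_ok;
	unsigned long watchdog_timeo; /* timeout, in ticks */
	unsigned int num_tx_queues;
	struct model_txq *txq;
};

/* Wrap-safe "a is later than b", in the spirit of time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Return the index of the first queue that is stopped and has not
 * transmitted within watchdog_timeo ticks, or -1 if none has. A device
 * that is down or has no carrier is never reported.
 */
static int model_watchdog_check(const struct model_netdev *dev,
				unsigned long now)
{
	unsigned int i;

	if (!dev->running || !dev->carrier_ok)
		return -1;

	for (i = 0; i < dev->num_tx_queues; i++) {
		const struct model_txq *q = &dev->txq[i];

		if (q->stopped &&
		    model_time_after(now, q->trans_start + dev->watchdog_timeo))
			return (int)i;
	}

	return -1;
}
```

In this model a queue is reported only while the carrier is up, so a driver that stops its queues but leaves the carrier on for too long can trip the check even though nothing is wrong with the hardware.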



Nov 16 07:03:19 rx ------------[ cut here ]------------
Nov 16 07:03:19 rx WARNING: at net/sched/sch_generic.c:255
dev_watchdog+0x25b/0x270()
Nov 16 07:03:19 rx Hardware name: X8DT6
Nov 16 07:03:19 rx NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Nov 16 07:03:19 rx Modules linked in: xt_u32 xt_physdev
ip6table_mangle ip6_tables ebt_comment ebt_set ebt_arp ebt_limit
ebt_ip6 ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding
ebtable_filter 8021q mrp e1000e ptp pps_core
Nov 16 07:03:19 rx Pid: 14268, comm: kworker/0:0 Not tainted 3.9.2-1 #1
Nov 16 07:03:19 rx Call Trace:
Nov 16 07:03:19 rx  <IRQ>  [] warn_slowpath_common+0x7a/0xc0
Nov 16 07:03:19 rx  [] warn_slowpath_fmt+0x41/0x50
Nov 16 07:03:19 rx  [] dev_watchdog+0x25b/0x270
Nov 16 07:03:19 rx  [] ? __netdev_watchdog_up+0x80/0x80
Nov 16 07:03:19 rx  [] call_timer_fn+0x44/0x120
Nov 16 07:03:19 rx  [] run_timer_softirq+0x241/0x2b0
Nov 16 07:03:19 rx  [] ? __netdev_watchdog_up+0x80/0x80
Nov 16 07:03:19 rx  [] __do_softirq+0xef/0x270
Nov 16 07:03:19 rx  [] irq_exit+0xb5/0xc0
Nov 16 07:03:19 rx  [] xen_evtchn_do_upcall+0x2f/0x40
Nov 16 07:03:19 rx  [] xen_do_hypervisor_callback+0x1e/0x30
Nov 16 07:03:19 rx  <EOI>  [] ? delay_tsc+0x32/0x80
Nov 16 07:03:19 rx  [] ? delay_tsc+0x4a/0x80
Nov 16 07:03:19 rx  [] ? __const_udelay+0x28/0x30
Nov 16 07:03:19 rx  [] ?
e1000e_read_phy_reg_mdic+0xce/0x120 [e1000e]
Nov 16 07:03:19 rx  [] ?
e1000_get_hw_semaphore_82574+0x20/0x40 [e1000e]
Nov 16 07:03:19 rx  [] ?
e1000e_read_phy_reg_bm2+0x55/0xb0 [e1000e]
Nov 16 07:03:19 rx  [] ?
e1000e_flush_descriptors+0x96/0x270 [e1000e]
Nov 16 07:03:19 rx  [] ?
e1000_check_phy_82574+0x27/0x60 [e1000e]
Nov 16 07:03:19 rx  [] ?
e1000_watchdog_task+0x648/0x830 [e1000e]
Nov 16 07:03:19 rx  [] ? __schedule+0x3a7/0x7c0
Nov 16 07:03:19 rx  [] ? process_one_work+0x16e/0x430
Nov 16 07:03:19 rx  [] ? worker_thread+0x11c/0x410
Nov 16 07:03:19 rx  [] ? manage_workers+0x360/0x360
Nov 16 07:03:19 rx  [] ? kthread+0xc6/0xd0
Nov 16 07:03:19 rx  [] ? xen_end_context_switch+0x19/0x20
Nov 16 07:03:19 rx  [] ?
kthread_freezable_should_stop+0x70/0x70
Nov 16 07:03:19 rx  [] ? ret_from_fork+0x7c/0xb0
Nov 16 07:03:19 rx  [] ?
kthread_freezable_should_stop+0x70/0x70
Nov 16 07:03:19 rx ---[ end trace f896af6c9f44182d ]---
Nov 16 07:03:19 rx e1000e :03:00.0 eth0: Reset adapter unexpectedly
Nov 16 07:03:19 rx BUG: unable to handle kernel NULL pointer
dereference at 00d0
Nov 16 07:03:19 rx IP: []
e1000_clean_rx_irq+0x101/0x490 [e1000e]
Nov 16 07:03:19 rx PGD 4a6c3067 PUD 4f440067 PMD 0
Nov 16 07:03:19 rx Oops:  [#1] SMP
Nov 16 07:03:19 rx Modules linked in: xt_u32 xt_physdev
ip6table_mangle ip6_tables ebt_comment ebt_set ebt_arp ebt_limit
ebt_ip6 ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding
ebtable_filter 8021q mrp e1000e ptp pps_core
Nov 16 07:03:19 rx CPU 0
Nov 16 07:03:19 rx Pid: 14268, comm: kworker/0:0 Tainted: GW
 3.9.2-1 #1 Supermicro X8DT6/X8DT6
Nov 16 07:03:19 rx RIP: e030:[]
[] e1000_clean_rx_irq+0x101/0x490 [e1000e]
Nov 16 07:03:19 rx RSP: e02b:88008ea03d98  EFLAGS: 00010202
Nov 16 07:03:19 rx RAX: 001a RBX: c900115f9000 RCX:
88008ea03e64
Nov 16 07:03:19 rx RDX: 88008ea03e64 RSI: 880002dc6e00 RDI:
8800844026c0
Nov 16 07:03:19 rx RBP: 88008ea03e38 R08: 000169e0 R09:
ea0b7180
Nov 16 07:03:19 rx R10: 00020063 R11:  R12:

Nov 16 07:03:19 rx R13: 880081fb6000 R14: 88008016c700 R15:
880081fb6000
Nov 16 07:03:19 rx FS:  7fde5acb8700()
GS:88008ea0() knlGS:
Nov 16 07:03:19 rx CS:  e033 DS:  ES:  CR0: 8005003b
Nov 16 07:03:19 rx CR2: 00d0 CR3: 71e8 CR4:
2660
Nov 16 07:03:19 rx DR0:  DR1:  DR2:

Nov 16 07:03:19 rx DR3:  DR6: 0ff0 DR7:
0400
Nov 16 07:03:19 rx Process kworker/0:0 (pid: 14268, threadinfo
880045f8a000, task 88005c9c)
Nov 16 07:03:19 rx

Re: niu lock-up (Transmit timed out, resetting) and NETDEV WATCHDOG

2013-08-27 Thread Andrew Brooks

Hi

> On 26 March 2013 13:44, Andrew Brooks  wrote:
>> Using niu driver for this card: Oracle/SUN Multithreaded 10-Gigabit
>> Ethernet Network Controller and after a period the interface will hang
>> with errors every 5 seconds
>> "niu: xxx: eth2: Transmit timed out, resetting"

Here's more information about the problem:
When the interface hangs we see these messages from the driver:

[3408740.816032] niu: niu_interrupt() ldg[8807141d16d0](18)
v0[80] v1[0] v2[0]
[3408740.816036] niu :09:00.0: eth2: niu_txchan_intr() cs[b860b86c000]
[3408740.816038] niu :09:00.0: eth2: niu_poll_core() v0[0080]
[3408740.816040] niu :09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.816042] niu: niu_interrupt() ldg[8807141d16d0](18)
v0[80] v1[0] v2[0]
[3408740.820004] [sched_delayed] sched: RT throttling activated
[3408740.824021] niu :09:00.0: eth2: Disable interrupts
[3408740.824044] niu :09:00.0: eth2: Disable RX MAC
[3408740.824048] niu :09:00.0: eth2: Disable IPP
[3408740.824054] niu :09:00.0: eth2: Stop TX channels
[3408740.824641] niu :09:00.0: eth2: Stop RX channels
[3408740.824652] niu :09:00.0: eth2: Reset TX channels
[3408740.825212] niu :09:00.0: eth2: Reset RX channels
[3408740.825999] niu :09:00.0: eth2: Initialize TXC
[3408740.826002] niu :09:00.0: eth2: Initialize TX channels

However the interface doesn't recover :-(
Are there any clues there?

Thanks!

Re: niu lock-up (Transmit timed out, resetting) and NETDEV WATCHDOG

2013-06-24 Thread Andrew Brooks

On 26 March 2013 13:44, Andrew Brooks  wrote:
>
> Using niu driver for this card: Oracle/SUN Multithreaded 10-Gigabit
> Ethernet Network Controller and after a period the interface will hang
> with errors every 5 seconds
> "niu: xxx: eth2: Transmit timed out, resetting"
>
> Sometimes also in syslog are messages
> WARNING: at sch_generic:255 dev_watchdog
> NETDEV WATCHDOG: eth2 (niu): transmit queue 10 timed out

Do you think this could be caused by a problem I've seen reported
by other machines on the network
"received unsolicited ack for DL_UNITDATA_REQ on nxge0" ?
Is there some bad packet flying around that causes the
niu driver to lock up the kernel?

niu lock-up (Transmit timed out, resetting) and NETDEV WATCHDOG

2013-04-08 Thread Andrew Brooks

Hello

Using niu driver for this card: Oracle/SUN Multithreaded 10-Gigabit
Ethernet Network Controller
after a period (often less than 24 hours) the interface will hang with
errors every 5 seconds
"niu: xxx: eth2: Transmit timed out, resetting"

Sometimes also in syslog are messages
WARNING: at sch_generic:255 dev_watchdog
NETDEV WATCHDOG: eth2 (niu): transmit queue 10 timed out

Does anyone know which driver revision has fixed this problem or if
it's still buggy?

Thanks!

Andrew

P.S. My guess is the commit on 2012-10-02 ??

Date | Commit message | Author | Files | Lines
2013-02-04 | ethernet: Remove unnecessary alloc/OOM messages, alloc cleanups | Joe Perches | 1 | -1/+1
2013-01-09 | remove init of dev->perm_addr in drivers | Jiri Pirko | 1 | -26/+20
2012-12-07 | drivers/net: fix up function prototypes after __dev* removals | Greg Kroah-Hartman | 1 | -26/+17
2012-12-03 | net/sun: remove __dev* attributes | Bill Pemberton | 1 | -45/+45
2012-10-07 | drivers/net/ethernet/sun/niu.c: fix error return code | Peter Senna Tschudin | 1 | -0/+1
2012-10-02 | Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq | Linus Torvalds | 1 | -1/+1 [flush_work_sync is now flush_work]
2012-08-23 | niu: Use PCI Express Capability accessors | Jiang Liu | 1 | -12/+7
2012-08-20 | workqueue: deprecate flush[_delayed]_work_sync() | Tejun Heo | 1 | -1/+1
2012-07-23 | niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value | Shuah Khan | 1 | -1/+1
2012-07-23 | niu: Fix to check for dma mapping errors.
2012-06-08 | Revert "niu: Add support for byte queue limits." | David S. Miller | 1 | -11/+1
2012-05-03 | net/niu: remove one superfluous dma mask check | Sebastian Andrzej Siewior | 1 | -1/+1
2012-02-23 | ethernet: unify return value of .ndo_set_mac_address if address is invalid | Danny Kukawka | 1 | -1/+1
2012-01-31 | drivers/net: Remove alloc_etherdev error messages | Joe Perches | 1 | -3/+1
2011-12-02 | niu: Fix typo in comment. | David S. Miller | 1 | -1/+1
2011-12-02 | niu: Add support for byte queue limits. | David S. Miller | 1 | -1/+11
2011-12-02 | niu: Remove redundant PHY ID test. | David S. Miller | 1 | -2/+4
2011-11-21 | net: Change mii to ethtool advertisement function names | Matt Carlson | 1 | -2/+2
2011-11-16 | net: Add ethtool to mii advertisment conversion helpers | Matt Carlson | 1 | -13/+2
2011-11-14 | Sweep additional floors of strcpy in .get_drvinfo routines
2011-10-19 | net: add skb frag size accessors | Eric Dumazet | 1 | -3/+3
2011-10-14 | niu: fix skb truesize underestimation | Eric Dumazet | 1 | -8/+4
2011-09-16 | ethtool: Update ethtool_rxnfc::rule_cnt on return from ETHTOOL_GRXCLSRLALL | Ben Hutchings | 1 | -0/+2
2011-09-16 | ethtool: Clean up definitions of rule location arrays in RX NFC | Ben Hutchings | 1 | -2/+2
2011-09-15 | niu: convert to SKB paged frag API. | Ian Campbell | 1 | -5/+2
2011-08-18 | net: introduce IFF_UNICAST_FLT private flag | Jiri Pirko | 1 | -1/+4
2011-08-11 | cassini/niu/sun*: Move the Sun drivers
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

BUG: ipheth - NETDEV WATCHDOG: eth1 (ipheth): transmit queue 0 timed out

2013-03-20 Thread Joerg Mayer

Hello,

ipheth does not work for me at all with iPhone5 and iOS 6.

Setup:
Thinkpad T60 (32 bit) with openSUSE 12.1 (with current kernel repo)
iPhone5 with iOS 6.1.2
USB-Cable between laptop and iphone

The moment I enable Hotspot with the USB cable connected to phone and
laptop, I get the following messages:


[Wed Mar 20 04:51:24 2013] usb 1-2: new high-speed USB device number 4 using 
ehci-pci
[Wed Mar 20 04:51:24 2013] usb 1-2: New USB device found, idVendor=05ac, 
idProduct=12a8
[Wed Mar 20 04:51:24 2013] usb 1-2: New USB device strings: Mfr=1, Product=2, 
SerialNumber=3
[Wed Mar 20 04:51:24 2013] usb 1-2: Product: iPhone
[Wed Mar 20 04:51:24 2013] usb 1-2: Manufacturer: Apple Inc.
[Wed Mar 20 04:51:24 2013] usb 1-2: SerialNumber: XXX
[Wed Mar 20 04:51:24 2013] ipheth 1-2:4.2: Apple iPhone USB Ethernet device 
attached
[Wed Mar 20 04:51:24 2013] usbcore: registered new interface driver ipheth
[Wed Mar 20 04:52:46 2013] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[Wed Mar 20 04:55:28 2013] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes 
ready
[Wed Mar 20 04:55:33 2013] [ cut here ]
[Wed Mar 20 04:55:33 2013] WARNING: at 
/home/abuild/rpmbuild/BUILD/kernel-desktop-3.8.3/linux-3.8/net/sched/sch_generic.c:254
 dev_watchdog+0x1e0/0x1f0()
[Wed Mar 20 04:55:33 2013] Hardware name: 200763G
[Wed Mar 20 04:55:33 2013] NETDEV WATCHDOG: eth1 (ipheth): transmit queue 0 
timed out
[Wed Mar 20 04:55:33 2013] Modules linked in: ipheth md4 md5 nls_utf8 cifs 
fscache ppdev parport_pc lp parport af_packet rfcomm bnep binfmt_misc 
tp_smapi(OF) thinkpad_ec(OF) cpufreq_conservative cpufreq_userspace 
cpufreq_powersave sha256_generic cbc dm_crypt dm_mod snd_hda_codec_analog 
snd_hda_intel btusb snd_hda_codec bluetooth acpi_cpufreq arc4 pcmcia iwl3945 
mperf iwlegacy coretemp snd_hwdep snd_pcm mac80211 kvm_intel iTCO_wdt 
iTCO_vendor_support snd_timer kvm sg lpc_ich snd_page_alloc mfd_core 
yenta_socket cfg80211 thinkpad_acpi pcmcia_rsrc pcmcia_core sr_mod cdrom 
i2c_i801 rfkill snd joydev irda tpm_tis e1000e tpm tpm_bios crc_ccitt microcode 
video battery soundcore ac button autofs4 radeon ttm drm_kms_helper drm 
i2c_algo_bit scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh_rdac scsi_dh edd 
fan ata_generic ata_piix thermal processor thermal_sys
[Wed Mar 20 04:55:33 2013] Pid: 0, comm: swapper/0 Tainted: GF  O 
3.8.3-1-desktop #1
[Wed Mar 20 04:55:33 2013] Call Trace:
[Wed Mar 20 04:55:33 2013]  [<c0205709>] try_stack_unwind+0x199/0x1b0
[Wed Mar 20 04:55:33 2013]  [<c0204417>] dump_trace+0x47/0xf0
[Wed Mar 20 04:55:33 2013]  [<c020576b>] show_trace_log_lvl+0x4b/0x60
[Wed Mar 20 04:55:33 2013]  [<c0205798>] show_trace+0x18/0x20
[Wed Mar 20 04:55:33 2013]  [<c071a6cb>] dump_stack+0x6d/0x72
[Wed Mar 20 04:55:33 2013]  [<c02391c8>] warn_slowpath_common+0x78/0xb0
[Wed Mar 20 04:55:33 2013]  [<c0239293>] warn_slowpath_fmt+0x33/0x40
[Wed Mar 20 04:55:33 2013]  [<c0666350>] dev_watchdog+0x1e0/0x1f0
[Wed Mar 20 04:55:33 2013]  [<c0247ed4>] call_timer_fn+0x24/0x120
[Wed Mar 20 04:55:33 2013]  [<c0248172>] run_timer_softirq+0x1a2/0x240
[Wed Mar 20 04:55:33 2013]  [<c0240fb9>] __do_softirq+0x99/0x1e0
[Wed Mar 20 04:55:33 2013]  [<c02042f6>] do_softirq+0x76/0xb0
[Wed Mar 20 04:55:33 2013]  [<000f>] 0xe
[Wed Mar 20 04:55:33 2013] ---[ end trace dd5a449765340b63 ]---
[Wed Mar 20 04:55:33 2013] ipheth 1-2:4.2: ipheth_tx_timeout: TX timeout
[Wed Mar 20 04:55:38 2013] ipheth 1-2:4.2: ipheth_tx_timeout: TX timeout
[Wed Mar 20 04:55:48 2013] ipheth 1-2:4.2: ipheth_tx_timeout: TX timeout
[Wed Mar 20 04:55:58 2013] ipheth 1-2:4.2: ipheth_tx_timeout: TX timeout
[Wed Mar 20 04:56:08 2013] ipheth 1-2:4.2: ipheth_tx_timeout: TX timeout
[Wed Mar 20 04:56:18 2013] ipheth 1-2:4.2: ipheth_tx_timeout: TX timeout


The bug also occurs
- when I turn on the hotspot first and plug in the cable afterwards.
- with the vanilla kernel from the same repo (and without loading
  the tp_smapi module which taints the kernel in the log above).
- for others (https://bugzilla.novell.com/show_bug.cgi?id=779643).
  That link also contains messages from another system with an earlier
  kernel, another iPhone model and an earlier iOS.

What other information is required to get this problem resolved?

Thanks
Jörg
-- 
Joerg Mayer   
We are stuck with technology when what we really want is just stuff that
works. Some say that should read Microsoft instead of technology.
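
For readers of this report, the check behind the "NETDEV WATCHDOG ... timed out" message can be modelled roughly as below. This is a simplified userspace sketch, not the kernel's code: struct txq_model, tick_after() and find_timed_out_queue() are hypothetical names, and the tick values stand in for jiffies. It assumes the watchdog only complains when the carrier is up, a queue is stopped, and the last transmit is older than the driver's timeout. In the log above the first warning shows up about five seconds after "link becomes ready".

```c
#include <stdbool.h>

/* Hypothetical, simplified model of one transmit queue. */
struct txq_model {
	bool stopped;              /* the driver has stopped this queue */
	unsigned long trans_start; /* tick of the last transmit on this queue */
};

/* Wraparound-safe "a is later than b", in the spirit of time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Return the index of the first queue that would trigger the warning,
 * or -1 if none does. With the carrier down nothing is reported.
 */
static int find_timed_out_queue(const struct txq_model *q, int nq,
				unsigned long now, unsigned long timeout,
				bool carrier_ok)
{
	if (!carrier_ok)
		return -1;

	for (int i = 0; i < nq; i++) {
		if (q[i].stopped && tick_after(now, q[i].trans_start + timeout))
			return i;
	}
	return -1;
}
```

In this model a queue that stays stopped is reported once the timeout has passed since its last transmit, and keeps being reported on later checks, which would match the repeating ipheth_tx_timeout lines in the log.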

Re: [3.8-rc] regression: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

2013-02-06 Thread Francois Romieu

Jörg Otte  :
[...]
> To summarize: two net regressions were introduced in v3.8 (driver r8169):
> 
> 1) NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> was introduced by commit
> e0c075577965d1c01b30038d38bf637b027a1df3
> ("r8169: enable ALDPS for power saving")

Hayes Wang  authored it. You should ask him
why commit e0c075577965d1c01b30038d38bf637b027a1df3 sometimes chokes
with the 8168evl. 

And you can ask him if there is a chance that the non-8168evl that are
handled by the patch (mis-)behave the same too.

-- 
Ueimor

Re: [3.8-rc] regression: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

2013-02-06 Thread Jörg Otte

2013/2/3 Jörg Otte :
> 2013/1/6 Jörg Otte :
>> 2013/1/5 Francois Romieu :
>>> Can you check if things improve with v3.8-rc2 after removing :
>>>
>>> 2. d64ec841517a25f6d468bde9f67e5b4cffdc67c7
>>>r8169: enable internal ASPM and clock request settings
>>
>> this fixes a second issue for me:
>> In 3.7.1 at startup the link came up after 15 sec.:
>> grep r8169 dmesg.3.7.1
>> [1.956842] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
>> [1.957059] r8169 :02:00.0: irq 42 for MSI/MSI-X
>> [1.957161] r8169 :02:00.0 eth0: RTL8168evl/8111evl at..
>> [1.957163] r8169 :02:00.0 eth0: jumbo features [frames..
>> [   13.575452] r8169 :02:00.0 eth0: link down
>> [   13.575475] r8169 :02:00.0 eth0: link down
>> [   15.181317] r8169 :02:00.0 eth0: link up
>>
>> In 3.8rc the time increased to 24 seconds:
>> grep r8169 dmesg.3.8.0
>> [1.852546] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
>> [1.852765] r8169 :02:00.0: irq 42 for MSI/MSI-X
>> [1.852872] r8169 :02:00.0 eth0: RTL8168evl/8111evl at...
>> [1.852874] r8169 :02:00.0 eth0: jumbo features [frames...
>> [   14.150212] r8169 :02:00.0 eth0: link down
>> [   14.150229] r8169 :02:00.0 eth0: link down
>> [   24.140263] r8169 :02:00.0 eth0: link up
>>
>> But with this revert I get the old performance:
>> dmesg | grep r8169
>> [1.816613] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
>> [1.816832] r8169 :02:00.0: irq 42 for MSI/MSI-X
>> [1.816947] r8169 :02:00.0 eth0: RTL8168evl/8111evl at...
>> [1.816948] r8169 :02:00.0 eth0: jumbo features [frames...
>> [   13.986401] r8169 :02:00.0 eth0: link down
>> [   13.986422] r8169 :02:00.0 eth0: link down
>> [   15.623631] r8169 :02:00.0 eth0: link up
>>
>>
>>> 3. e0c075577965d1c01b30038d38bf637b027a1df3
>>>r8169: enable ALDPS for power saving
>>
>> That's it! This fixes the problem for me!
>>
>> Thanks, Jörg
>
>
> We are close to the v3.8 release and I haven't seen a solution
> so far.
> What is the plan regarding these issues?
>
> Thanks, Jörg

No response, so I am Cc'ing Linus:

To summarize: two net regressions were introduced in v3.8 (driver r8169):

1) NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
was introduced by commit
e0c075577965d1c01b30038d38bf637b027a1df3
("r8169: enable ALDPS for power saving")

2) Boot-time increased from 15sec (V3.7) to 24sec (V3.8)
by commit:
d64ec841517a25f6d468bde9f67e5b4cffdc67c7
("r8169: enable internal ASPM and clock request settings")

Reverting the commits resolves the problems entirely.

As long as the issues are not fixed, the commits should be reverted.

Thanks, Jörg

Re: [3.8-rc] regression: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

2013-02-03 Thread Jörg Otte

2013/1/6 Jörg Otte :
> 2013/1/5 Francois Romieu :
>> Can you check if things improve with v3.8-rc2 after removing :
>>
>> 1. 9ecb9aabaf634677c77af467f4e3028b09d7bcda
>>r8169: workaround for missing extended GigaMAC registers
>> 2. d64ec841517a25f6d468bde9f67e5b4cffdc67c7
>>r8169: enable internal ASPM and clock request settings
>
> Doesn't help for this problem.
>
> However this fixes a second issue for me:
> In 3.7.1 at startup the link came up after 15 sec.:
> grep r8169 dmesg.3.7.1
> [1.956842] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [1.957059] r8169 :02:00.0: irq 42 for MSI/MSI-X
> [1.957161] r8169 :02:00.0 eth0: RTL8168evl/8111evl at..
> [1.957163] r8169 :02:00.0 eth0: jumbo features [frames..
> [   13.575452] r8169 :02:00.0 eth0: link down
> [   13.575475] r8169 :02:00.0 eth0: link down
> [   15.181317] r8169 :02:00.0 eth0: link up
>
> In 3.8rc the time increased to 24 seconds:
> grep r8169 dmesg.3.8.0
> [1.852546] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [1.852765] r8169 :02:00.0: irq 42 for MSI/MSI-X
> [1.852872] r8169 :02:00.0 eth0: RTL8168evl/8111evl at...
> [1.852874] r8169 :02:00.0 eth0: jumbo features [frames...
> [   14.150212] r8169 :02:00.0 eth0: link down
> [   14.150229] r8169 :02:00.0 eth0: link down
> [   24.140263] r8169 :02:00.0 eth0: link up
>
> But with this revert I get the old performance:
> dmesg | grep r8169
> [1.816613] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [1.816832] r8169 :02:00.0: irq 42 for MSI/MSI-X
> [1.816947] r8169 :02:00.0 eth0: RTL8168evl/8111evl at...
> [1.816948] r8169 :02:00.0 eth0: jumbo features [frames...
> [   13.986401] r8169 :02:00.0 eth0: link down
> [   13.986422] r8169 :02:00.0 eth0: link down
> [   15.623631] r8169 :02:00.0 eth0: link up
>
> Thus I recommend reverting this too.
>
>> 3. e0c075577965d1c01b30038d38bf637b027a1df3
>>r8169: enable ALDPS for power saving
>
> That's it! This fixes the problem for me!
>


We are close to the v3.8 release and I haven't seen a solution
so far.
What is the plan regarding these issues?

Thanks, Jörg

Re: [3.8-rc] regression: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

2013-01-06 Thread Jörg Otte

2013/1/5 Francois Romieu :
> Can you check if things improve with v3.8-rc2 after removing :
>
> 1. 9ecb9aabaf634677c77af467f4e3028b09d7bcda
>r8169: workaround for missing extended GigaMAC registers
> 2. d64ec841517a25f6d468bde9f67e5b4cffdc67c7
>r8169: enable internal ASPM and clock request settings

Doesn't help for this problem.

However this fixes a second issue for me:
In 3.7.1 at startup the link came up after 15 sec.:
grep r8169 dmesg.3.7.1
[1.956842] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[1.957059] r8169 :02:00.0: irq 42 for MSI/MSI-X
[1.957161] r8169 :02:00.0 eth0: RTL8168evl/8111evl at..
[1.957163] r8169 :02:00.0 eth0: jumbo features [frames..
[   13.575452] r8169 :02:00.0 eth0: link down
[   13.575475] r8169 :02:00.0 eth0: link down
[   15.181317] r8169 :02:00.0 eth0: link up

In 3.8rc the time increased to 24 seconds:
grep r8169 dmesg.3.8.0
[1.852546] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[1.852765] r8169 :02:00.0: irq 42 for MSI/MSI-X
[1.852872] r8169 :02:00.0 eth0: RTL8168evl/8111evl at...
[1.852874] r8169 :02:00.0 eth0: jumbo features [frames...
[   14.150212] r8169 :02:00.0 eth0: link down
[   14.150229] r8169 :02:00.0 eth0: link down
[   24.140263] r8169 :02:00.0 eth0: link up

But with this revert I get the old performance:
dmesg | grep r8169
[1.816613] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[1.816832] r8169 :02:00.0: irq 42 for MSI/MSI-X
[1.816947] r8169 :02:00.0 eth0: RTL8168evl/8111evl at...
[1.816948] r8169 :02:00.0 eth0: jumbo features [frames...
[   13.986401] r8169 :02:00.0 eth0: link down
[   13.986422] r8169 :02:00.0 eth0: link down
[   15.623631] r8169 :02:00.0 eth0: link up

Thus I recommend reverting this too.

> 3. e0c075577965d1c01b30038d38bf637b027a1df3
>r8169: enable ALDPS for power saving

That's it! This fixes the problem for me!


Thanks, Jörg
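
The comparison in this message (link up at roughly 15 s versus 24 s after boot) can be repeated mechanically. The following is only a sketch under assumptions: link_up_seconds() and first_link_up() are hypothetical helper names, and the input is assumed to be dmesg lines of the shape quoted above, with a bracketed timestamp and the text "link up".

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the dmesg timestamp (seconds since boot) of a "link up" line,
 * or -1.0 if the line is not a link-up message or carries no timestamp.
 */
static double link_up_seconds(const char *line)
{
	double t;

	if (strstr(line, "link up") == NULL)
		return -1.0;
	/* Accepts "[   15.181317] ..." as well as "[15.181317] ...". */
	if (sscanf(line, " [ %lf", &t) != 1)
		return -1.0;
	return t;
}

/* Return the timestamp of the first link-up line, or -1.0 if there is none. */
static double first_link_up(const char *const *lines, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		double t = link_up_seconds(lines[i]);

		if (t >= 0.0)
			return t;
	}
	return -1.0;
}
```

Running first_link_up() over the 3.7.1 and 3.8 excerpts and subtracting the two results gives the extra delay directly, without reading the timestamps by eye.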

Re: [3.8-rc] regression: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

2013-01-05 Thread Francois Romieu

Jörg Otte  :
[...]
> jojo@ahorn:~$ dmesg | grep XID
> [1.808847] r8169 :02:00.0 eth0: RTL8168evl/8111evl at
> 0xc9054000, 5c:9a:d8:69:2b:39, XID 0c900800 IRQ 42

Can you check if things improve with v3.8-rc2 after removing :

1. 9ecb9aabaf634677c77af467f4e3028b09d7bcda 
   r8169: workaround for missing extended GigaMAC registers
2. d64ec841517a25f6d468bde9f67e5b4cffdc67c7
   r8169: enable internal ASPM and clock request settings
3. e0c075577965d1c01b30038d38bf637b027a1df3
   r8169: enable ALDPS for power saving

(you can directly try v3.7 r8169.c with v3.8-rc2 if it worked for you
so far) 

If the regression is still there, please apply the patch below to both
v3.8-rc2 unpatched and a known working version then send me their dmesg
after you 'ip link set dev eth0 up'.

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index ed96f30..3d2d2446 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -90,10 +90,28 @@ static const int multicast_filter_limit = 32;
 #define RTL8169_TX_TIMEOUT (6*HZ)
 #define RTL8169_PHY_TIMEOUT(10*HZ)
 
+static void rw8(void __iomem *ioaddr, u8 b)
+{
+   printk(KERN_DEBUG PFX "w %p %02x\n", ioaddr, b);
+   writeb(b, ioaddr);
+}
+
+static void rw16(void __iomem *ioaddr, u16 w)
+{
+   printk(KERN_DEBUG PFX "w %p %04x\n", ioaddr, w);
+   writew(w, ioaddr);
+}
+
+static void rw32(void __iomem *ioaddr, u32 d)
+{
+   printk(KERN_DEBUG PFX "w %p %08x\n", ioaddr, d);
+   writel(d, ioaddr);
+}
+
 /* write/read MMIO register */
-#define RTL_W8(reg, val8)   writeb ((val8), ioaddr + (reg))
-#define RTL_W16(reg, val16) writew ((val16), ioaddr + (reg))
-#define RTL_W32(reg, val32) writel ((val32), ioaddr + (reg))
+#define RTL_W8(reg, val8)   rw8(ioaddr + (reg), (val8))
+#define RTL_W16(reg, val16) rw16(ioaddr + (reg), (val16))
+#define RTL_W32(reg, val32) rw32(ioaddr + (reg), (val32))
 #define RTL_R8(reg)         readb (ioaddr + (reg))
 #define RTL_R16(reg)        readw (ioaddr + (reg))
 #define RTL_R32(reg)        readl (ioaddr + (reg))
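
The patch above redirects the RTL_W8/W16/W32 macros to small wrappers that log each register write before performing it. A minimal user-space sketch of the same pattern, assuming a plain byte array in place of the device's MMIO window; fake_regs, logged_writes, log_w8, log_w16, W8 and W16 are hypothetical names for illustration and are not part of r8169.c:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the device's MMIO window (not r8169 code). */
static uint8_t fake_regs[16];

/* Number of writes that went through the logging wrappers. */
static unsigned int logged_writes;

/* Log offset and value, then perform the 8-bit write. */
static void log_w8(unsigned int reg, uint8_t val)
{
    assert(reg < sizeof fake_regs);
    fprintf(stderr, "w %02x %02x\n", reg, (unsigned int)val);
    logged_writes++;
    fake_regs[reg] = val;
}

/* 16-bit variant; stored low byte first, one byte at a time. */
static void log_w16(unsigned int reg, uint16_t val)
{
    assert(reg + 1 < sizeof fake_regs);
    fprintf(stderr, "w %02x %04x\n", reg, (unsigned int)val);
    logged_writes++;
    fake_regs[reg] = (uint8_t)(val & 0xff);
    fake_regs[reg + 1] = (uint8_t)(val >> 8);
}

/* Same macro redirection as in the patch: call sites stay unchanged. */
#define W8(reg, val)  log_w8((reg), (val))
#define W16(reg, val) log_w16((reg), (val))
```

Calling `W8(0, 0x12); W16(2, 0xbeef);` emits one log line per write on stderr and leaves 0x12, 0xef and 0xbe in bytes 0, 2 and 3 of the array. Comparing such write logs from a working and a failing kernel is what the request for both dmesg outputs is after.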
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [3.8-rc] regression: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

2013-01-05 Thread Jörg Otte

2013/1/5 Francois Romieu :
> Jörg Otte  :
> [...]
>> It's a regression, it never happened before 3.8-rc.
>
> Please check that 'dmesg | grep XID' exhibits an 8168evl.

jojo@ahorn:~$ dmesg | grep XID
[1.808847] r8169 :02:00.0 eth0: RTL8168evl/8111evl at
0xc9054000, 5c:9a:d8:69:2b:39, XID 0c900800 IRQ 42
jojo@ahorn:~$

>
> I'll showe and dig it. It's epidemic.
>

Thanks, Jörg

Re: [3.8-rc] regression: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

2013-01-05 Thread Francois Romieu

Jörg Otte  :
[...]
> It's a regression, it never happened before 3.8-rc.

Please check that 'dmesg | grep XID' exhibits an 8168evl.

I'll showe and dig it. It's epidemic.

-- 
Ueimor

[3.8-rc] regression: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

2013-01-05 Thread Jörg Otte

I frequently see the following in the syslog:

[  184.552914] ------------[ cut here ]------------
[  184.552927] WARNING: at /data/kernel/linux/net/sched/sch_generic.c:254 dev_watchdog+0xf2/0x151()
[  184.552929] Hardware name: LIFEBOOK AH532
[  184.552932] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[  184.552937] Pid: 0, comm: swapper/1 Not tainted 3.8.0-rc2-b11-00221-gd1c3ed6 #15
[  184.552939] Call Trace:
[  184.552941]  <IRQ>  [8138d4a2] ? dev_watchdog+0xf2/0x151
[  184.552953]  [81025c8a] ? warn_slowpath_common+0x73/0x87
[  184.552956]  [8138d3b0] ? netif_tx_unlock+0x49/0x49
[  184.552961]  [81025d02] ? warn_slowpath_fmt+0x45/0x4a
[  184.552967]  [8138d332] ? netif_tx_lock+0x40/0x75
[  184.552971]  [8138d4a2] ? dev_watchdog+0xf2/0x151
[  184.552977]  [8102f1a1] ? call_timer_fn.isra.32+0x1d/0x73
[  184.552981]  [8102f34b] ? run_timer_softirq+0x154/0x194
[  184.552988]  [8104cb84] ? timekeeping_get_ns.constprop.6+0xd/0x31
[  184.552992]  [8102b4a5] ? __do_softirq+0x96/0x139
[  184.552997]  [8146b00c] ? call_softirq+0x1c/0x26
[  184.553002]  [81003cf4] ? do_softirq+0x2e/0x62
[  184.553006]  [8102b615] ? irq_exit+0x3d/0x98
[  184.553011]  [810184ad] ? smp_apic_timer_interrupt+0x73/0x80
[  184.553018]  [8146aa0a] ? apic_timer_interrupt+0x6a/0x70
[  184.553020]  <EOI>  [81326f2b] ? cpuidle_wrap_enter+0x38/0x69
[  184.553033]  [81326f27] ? cpuidle_wrap_enter+0x34/0x69
[  184.553039]  [81326d81] ? cpuidle_enter_state+0xa/0x31
[  184.553044]  [81326e41] ? cpuidle_idle_call+0x99/0xb9
[  184.553050]  [81009059] ? cpu_idle+0x99/0xe0
[  184.553056]  [8145e3a4] ? start_secondary+0x1d6/0x1dc
[  184.553059] ---[ end trace 54db26a54b22f673 ]---
[  184.587487] r8169 :02:00.0 eth0: link up
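
The warning comes from the networking core's per-device watchdog: roughly, it fires when the driver has stopped a transmit queue and nothing has been transmitted for longer than the driver's timeout. A simplified user-space sketch of that check, not the kernel's actual code; struct txq_state, tick_after and txq_timed_out are hypothetical names, and the 32-bit tick counter stands in for jiffies:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is later than b" for a free-running tick counter,
 * the same idea as the kernel's time_after() helper. */
static bool tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Hypothetical per-queue state; field names are illustrative only. */
struct txq_state {
    bool stopped;         /* driver has stopped the queue */
    uint32_t trans_start; /* tick of the last transmit */
};

/* True when a stopped queue has made no progress for more than
 * `timeout` ticks, i.e. when a watchdog would report it. */
static bool txq_timed_out(const struct txq_state *q, uint32_t now,
                          uint32_t timeout)
{
    assert(q);
    return q->stopped && tick_after(now, q->trans_start + timeout);
}
```

With a timeout of 6 ticks, a queue stopped since tick 100 is not reported at tick 106 but is reported from tick 107 on; a queue that is not stopped is never reported, however old its last transmit is.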

It's a regression; it never happened before 3.8-rc.

-- Jörg

Re: NETDEV WATCHDOG (3Com 3c905 adapter)

2007-11-24 Thread BERTRAND Joël


Steffen Klassert wrote:
> On Fri, Nov 23, 2007 at 04:52:39PM +0100, BERTRAND Joël wrote:
>> BERTRAND Joël wrote:
>>> Hello,
>>>
>>> Since I have installed a 2.6.23.1 linux kernel on my U60, I can see
>>> several NETDEV WATCHDOG. This trouble never occurs with 2.6.23-rc4.
>>> This bug occurs after a random uptime.
>>
>> I have made the same constation this evening on a amd64/up with two
>> 3C905 and a 2.6.21.3 linux kernel... This evening, I have rebooted my
>> U60 with a 2.6.23.8 kernel and the same 3C905 runs fine. Wait and see
>> (but I suspect a kernel bug...).
>>
>> For main linux kernel list:
>>
>>> End of dmesg :
>>>
>>> NETDEV WATCHDOG: eth2: transmit timed out
>>> eth2: transmit timed out, tx_status 00 status 8601.
>>>   diagnostics: net 0ccc media 8880 dma 003a fifo
>>> eth2: Interrupt posted but not delivered -- IRQ blocked by another device?
>
> This looks like a problem that several network drivers had recently.
> The problem was genirq related and should be fixed in 2.6.23,
> so I'm surprised that this happens with 2.6.23.1 but not with 2.6.23-rc4.

I can confirm that my U60 worked fine sixty days with 2.6.23-rc4
without any trouble. With 2.6.23.1, 3com NIC hangs after three or four
days...


Regards,

JKB

Re: NETDEV WATCHDOG (3Com 3c905 adapter)

2007-11-24 Thread Steffen Klassert

On Fri, Nov 23, 2007 at 04:52:39PM +0100, BERTRAND Joël wrote:
> BERTRAND Joël wrote:
> >Hello,
> >
> >Since I have installed a 2.6.23.1 linux kernel on my U60, I can see 
> >several NETDEV WATCHDOG. This trouble never occurs with 2.6.23-rc4.
> >This bug occurs after a random uptime.
> 
>   I have made the same constation this evening on a amd64/up with two 
> 3C905 and a 2.6.21.3 linux kernel... This evening, I have rebooted my 
> U60 with a 2.6.23.8 kernel and the same 3C905 runs fine. Wait and see 
> (but I suspect a kernel bug...).
> 
>   For main linux kernel list:
> 
> >End of dmesg :
> >
> >NETDEV WATCHDOG: eth2: transmit timed out
> >eth2: transmit timed out, tx_status 00 status 8601.
> >  diagnostics: net 0ccc media 8880 dma 003a fifo 
> >eth2: Interrupt posted but not delivered -- IRQ blocked by another device?

This looks like a problem that several network drivers had recently.
The problem was genirq related and should be fixed in 2.6.23,
so I'm surprised that this happens with 2.6.23.1 but not with 2.6.23-rc4.

Steffen

Re: NETDEV WATCHDOG (3Com 3c905 adapter)

2007-11-23 Thread BERTRAND Joël


BERTRAND Joël wrote:

Hello,

Since I have installed a 2.6.23.1 linux kernel on my U60, I can see 
several NETDEV WATCHDOG. This trouble never occurs with 2.6.23-rc4.

This bug occurs after a random uptime.


	I made the same observation this evening on an amd64/UP machine with two
3C905 cards and a 2.6.21.3 Linux kernel... This evening, I rebooted my
U60 with a 2.6.23.8 kernel and the same 3C905 runs fine. Wait and see
(but I suspect a kernel bug...).


For main linux kernel list:


End of dmesg :

NETDEV WATCHDOG: eth2: transmit timed out
eth2: transmit timed out, tx_status 00 status 8601.
  diagnostics: net 0ccc media 8880 dma 003a fifo 
eth2: Interrupt posted but not delivered -- IRQ blocked by another device?
  Flags; bus-master 1, dirty 144(0) current 144(0)
  Transmit list  vs. f800a06b0200.
  0: @f800a06b0200  length 804a status 0001004a
  1: @f800a06b0260  length 8062 status 00010062
  2: @f800a06b02c0  length 8062 status 00010062
  3: @f800a06b0320  length 8062 status 00010062
  4: @f800a06b0380  length 8062 status 00010062
  5: @f800a06b03e0  length 8062 status 00010062
  6: @f800a06b0440  length 8062 status 00010062
  7: @f800a06b04a0  length 8062 status 00010062
  8: @f800a06b0500  length 8062 status 00010062
  9: @f800a06b0560  length 8062 status 00010062
  10: @f800a06b05c0  length 8062 status 00010062
  11: @f800a06b0620  length 8062 status 00010062
  12: @f800a06b0680  length 8062 status 00010062
  13: @f800a06b06e0  length 804a status 0001004a
  14: @f800a06b0740  length 8062 status 80010062
  15: @f800a06b07a0  length 8062 status 80010062
eth2: Resetting the Tx ring pointer.
eth2:  setting full-duplex.

Root rayleigh:[/proc] > cat interrupts
   CPU0   CPU2
  0:  318071624  318071584  <NULL>  timer
  1:  0  0  sun4u  PSYCHO_PCIERR
  2:  0  0  sun4u  PSYCHO_UE
  3:  0  0  sun4u  PSYCHO_CE
  8: 559916  0  sun4u  su(kbd)
  9:  04170845  sun4u  su(mouse)
 10:  0  0  sun4u  parport0
 11:  2  0  sun4u  floppy
 12:  0  0  sun4u  cs4231(capture)
 13: 789858  0  sun4u  cs4231(play)
 14:   1329   17123911  sun4u  eth0
 15:  0   15905664  sun4u  sym53c8xx
 16: 30  0  sun4u  sym53c8xx
 17: 364465 223693  sun4u  eth2
 18:   11043701  0  sun4u  aic7xxx
 19:  1  50585  sun4u  ohci_hcd:usb2
 20:  0  0  sun4u  ohci_hcd:usb3
 21:  1  1  sun4u  ehci_hcd:usb1
 22:  0  0  sun4u  PSYCHO_PCIERR
 24:   17346674332  sun4u  eth1
Root rayleigh:[/proc] > lspci
:00:00.0 Host bridge: Sun Microsystems Computer Corp. Psycho PCI Bus 
Module

:00:01.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
:00:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy 
Meal (rev 01)
:00:02.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M 
[Tornado] (rev 78)
:00:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 
(rev 14)
:00:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875 
(rev 14)

:00:04.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
:00:05.0 USB Controller: NEC Corporation USB (rev 43)
:00:05.1 USB Controller: NEC Corporation USB (rev 43)
:00:05.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
0001:00:00.0 Host bridge: Sun Microsystems Computer Corp. Psycho PCI Bus 
Module

0001:80:01.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
0001:80:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy 
Meal (rev 01)

Root rayleigh:[/proc] > cat irq/17/smp_affinity
f
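
The `f` above is a hexadecimal CPU bitmask: bit N set means CPU N may service IRQ 17. A small sketch of decoding such a mask, assuming it fits in an unsigned long and contains no comma-separated groups (larger systems print several groups); parse_affinity and cpu_allowed are hypothetical helper names:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a hex CPU mask such as "f" (as read from smp_affinity).
 * Only handles masks that fit in one unsigned long. */
static unsigned long parse_affinity(const char *hex)
{
    assert(hex);
    return strtoul(hex, NULL, 16);
}

/* Return 1 if the given CPU number is set in the mask, else 0. */
static int cpu_allowed(unsigned long mask, unsigned int cpu)
{
    if (cpu >= sizeof mask * CHAR_BIT)
        return 0; /* outside the representable range */
    return (int)((mask >> cpu) & 1UL);
}
```

For `f`, CPUs 0 to 3 are all permitted, so the mask itself does not pin the eth2 interrupt to a single CPU.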
Root rayleigh:[/proc/net] > cat dev
Inter-|   Receive|  
Transmit
 face |bytespackets errs drop fifo frame compressed multicast|bytes 
   packets errs drop fifo colls carrier compressed
  eth1:9421385785 9787261000 0  0 0 
2418854910 7465439000 0   0  0
  eth0:3977175515 8320824000 0  0 0 
8252404266 9204851000 0   0  0
  eth2:47410723  34451500   51 0  0 0 
31875309  278022  87200 0   0  0
lo:240810303 1749211000 0  0 0 
240810303 1749211000 0   0  0


Regards,

JKB


Re: PROBLEM: 2.6.23-rc "NETDEV WATCHDOG: eth0: transmit timed out"

2007-10-01 Thread Karl Meyer

Hi,

after reading about issues with the NICs on Kontron boards I did a
BIOS upgrade, but this did not change anything.
However, yesterday the onboard NIC I was using died: no link at all.
After switching to the next onboard NIC I got a NETDEV transmit timeout
with that one on kernel 2.6.22-r2.
It seems the whole thing is a hardware issue. I will try to sort it out
with Kontron.

Sorry :(

Karl

2007/9/12, Francois Romieu <[EMAIL PROTECTED]>:
> Karl Meyer <[EMAIL PROTECTED]> :
> [...]
> > am am looking for this issue for some time now, but there where no
> > errors in 2.6.22-r2 (gentoo speak, I guess this is 2.6.22.2
> > officially), I also ran git-bisect (for more information see the older
> > messages in this thread).
>
> 2.6.22-r2 in gentoo is based on 2.6.22.1. It is way before
> 0e4851502f846b13b29b7f88f1250c980d57e944 that you reported to work.
> Thus it is not surprizing that it works.
>
> Any update regarding the patchkit that I sent on 2007/08/16 ?
>
> It would help to narrow the culprit.
>
> --
> Ueimor
>

Re: PROBLEM: 2.6.23-rc "NETDEV WATCHDOG: eth0: transmit timed out"

2007-09-26 Thread Karl Meyer

Hi Francois,

this is what I found and sent:

The error exists from patch 2 on. I did some network testing with
patch 1 and currently use it and have no errors so far.
From my experience so far, patch 1 should be error free.

Do you need additional info?
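
Testing a patch series this way is a bisection: given that the tree with no patches applied is good and the tree with all of them applied is bad, a binary search finds the first patch that introduces the failure. A minimal sketch under those assumptions, and assuming the bug stays once it has appeared; first_bad_patch, is_bad_fn and bad_from are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Caller-supplied test: nonzero if the tree with patches 1..n_applied
 * shows the bug. */
typedef int (*is_bad_fn)(size_t n_applied, void *ctx);

/* Return the 1-based index of the first patch that introduces the bug.
 * Assumes 0 patches is good, all n patches is bad, and badness is
 * monotonic in the number of patches applied. */
static size_t first_bad_patch(size_t n, is_bad_fn is_bad, void *ctx)
{
    size_t good = 0, bad = n;

    assert(n > 0 && is_bad);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Example predicate: the bug shows up from patch number *ctx onwards. */
static int bad_from(size_t n_applied, void *ctx)
{
    return n_applied >= *(const size_t *)ctx;
}
```

For a hypothetical five-patch series whose predicate fails from patch 2 on, `first_bad_patch(5, bad_from, &first)` with `first` set to 2 returns 2, which matches the result reported here (patch 1 clean, error present from patch 2).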

2007/9/12, Francois Romieu <[EMAIL PROTECTED]>:
> Karl Meyer <[EMAIL PROTECTED]> :
> [...]
> > am am looking for this issue for some time now, but there where no
> > errors in 2.6.22-r2 (gentoo speak, I guess this is 2.6.22.2
> > officially), I also ran git-bisect (for more information see the older
> > messages in this thread).
>
> 2.6.22-r2 in gentoo is based on 2.6.22.1. It is way before
> 0e4851502f846b13b29b7f88f1250c980d57e944 that you reported to work.
> Thus it is not surprizing that it works.
>
> Any update regarding the patchkit that I sent on 2007/08/16 ?
>
> It would help to narrow the culprit.
>
> --
> Ueimor
>

Re: PROBLEM: 2.6.23-rc "NETDEV WATCHDOG: eth0: transmit timed out"

2007-09-12 Thread Francois Romieu

Karl Meyer <[EMAIL PROTECTED]> :
[...]
> am am looking for this issue for some time now, but there where no
> errors in 2.6.22-r2 (gentoo speak, I guess this is 2.6.22.2
> officially), I also ran git-bisect (for more information see the older
> messages in this thread).

2.6.22-r2 in gentoo is based on 2.6.22.1. It is way before
0e4851502f846b13b29b7f88f1250c980d57e944 that you reported to work.
Thus it is not surprising that it works.

Any update regarding the patchkit that I sent on 2007/08/16?

It would help to narrow down the culprit.

-- 
Ueimor

Re: PROBLEM: 2.6.23-rc "NETDEV WATCHDOG: eth0: transmit timed out"

2007-09-02 Thread Karl Meyer

Hi,

I have been looking into this issue for some time now, but there were no
errors in 2.6.22-r2 (Gentoo speak; I guess this is 2.6.22.2
officially). I also ran git-bisect (for more information see the older
messages in this thread).

2007/9/3, Michal Piotrowski <[EMAIL PROTECTED]>:
> Hi,
>
> On 01/09/07, Karl Meyer <[EMAIL PROTECTED]> wrote:
> > This is what happened today:
> >
> > Sep  1 21:08:01 frege NETDEV WATCHDOG: eth0: transmit timed out
> > frege ~ # uname -r
> > 2.6.22.5-cfs-v20.5
>
> Can you reproduce this on 2.6.22 (not 2.6.22.x - it might be a -stable
> regression)?
>
> Regards,
> Michal
>
> --
> LOG
> http://www.stardust.webpages.pl/log/
>

Re: PROBLEM: 2.6.23-rc "NETDEV WATCHDOG: eth0: transmit timed out"

2007-09-02 Thread Michal Piotrowski

Hi,

On 01/09/07, Karl Meyer <[EMAIL PROTECTED]> wrote:
> This is what happened today:
>
> Sep  1 21:08:01 frege NETDEV WATCHDOG: eth0: transmit timed out
> frege ~ # uname -r
> 2.6.22.5-cfs-v20.5

Can you reproduce this on 2.6.22 (not 2.6.22.x - it might be a -stable
regression)?

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: PROBLEM: 2.6.23-rc "NETDEV WATCHDOG: eth0: transmit timed out"

2007-09-01 Thread Karl Meyer

This is what happened today:

Sep  1 21:08:01 frege NETDEV WATCHDOG: eth0: transmit timed out
frege ~ # uname -r
2.6.22.5-cfs-v20.5


2007/8/16, Francois Romieu <[EMAIL PROTECTED]>:
> (please do not remove the netdev Cc:)
>
> Francois Romieu <[EMAIL PROTECTED]> :
> [...]
> > If it does not work I'll dissect 0e4851502f846b13b29b7f88f1250c980d57e944
> > tomorrow.
>
> You will find attached a tgz archive which contains a series of patches
> (0001-... to 0005-...) to walk from 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2
> to 0e4851502f846b13b29b7f88f1250c980d57e944 in smaller steps.
>
> Please apply 0001 on top of 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2. If it
> still works, apply 0002 on top of 0001, etc.
>
> --
> Ueimor
>
>
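
The procedure quoted above (apply 0001 on top of the known-good commit, test,
then apply 0002 on top of 0001, and so on) amounts to a linear walk to the
first step that shows the timeout. A minimal sketch in Python follows. The
function name, the `still_works` callback, and the short step labels are
hypothetical and only illustrate the idea; they are not part of the patchkit,
and the real test is building and booting each step and watching the log.

```python
from typing import Callable, List, Optional, Sequence


def first_failing_step(
    patches: Sequence[str],
    still_works: Callable[[List[str]], bool],
) -> Optional[str]:
    """Apply patches cumulatively, in order, and return the first one after
    which the tree stops working, or None if every step still works."""
    applied: List[str] = []
    for patch in patches:
        applied.append(patch)         # apply this patch on top of the previous ones
        if not still_works(applied):  # hypothetical: build, boot, check for the timeout
            return patch              # first step at which the problem appears
    return None


# Hypothetical stand-in for a real test run: pretend the third step
# is the one that introduces the problem.
def fake_test(applied: List[str]) -> bool:
    return "0003" not in applied


steps = ["0001", "0002", "0003", "0004", "0005"]
print(first_failing_step(steps, fake_test))
```

With only five steps a linear walk is enough; a bisection would save little
and the cumulative order matches the instructions quoted above.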

Re: PROBLEM: 2.6.23-rc "NETDEV WATCHDOG: eth0: transmit timed out"

2007-08-27 Thread Jarek Poplawski

On 21-08-2007 12:56, Karl Meyer wrote:
> fyi:
> I do not know whether it is related to the problem, but since using
> the version you told me to use, there are these entries in my log:
> frege Hangcheck: hangcheck value past margin!
...

BTW, I don't know whether it's related either, but I think you should first
try to get rid of these errors:

> Freeing unused kernel memory: 220k freed
> usb_id[1320]: segfault at  eip b7e25db2 esp bfd1d734 error 4
> usb_id[1329]: segfault at  eip b7e1bdb2 esp bf9c9224 error 4
> usb_id[1322]: segfault at  eip b7df3db2 esp bfcb66c4 error 4
> usb_id[1321]: segfault at  eip b7e11db2 esp bf8f4b04 error 4

Regards,
Jarek P.