Re: [3.5.4] rcu_sched self-detected stall on CPU { 1} (t=54862991 jiffies)

2012-09-25 Thread Greg KH
On Tue, Sep 25, 2012 at 07:04:19PM +0200, Paweł Sikora wrote:
> On Tuesday 25 of September 2012 09:44:54 Greg KH wrote:
> > On Tue, Sep 25, 2012 at 06:31:36PM +0200, Paweł Sikora wrote:
> > > On Monday 24 of September 2012 10:36:33 Greg KH wrote:
> > > > On Mon, Sep 24, 2012 at 10:05:23AM +0200, Paweł Sikora wrote:
> > > > > Hi,
> > > > > 
> > > > > with the new stable line i'm observing strange locks on my old 
> > > > > amd-phenom-II mini-server.
> > > > > here's a dmesg:
> > > > 
> > > > Did this show up in 3.5.3?  If not, can you run 'git bisect' to find the
> > > > problem patch?
> > > 
> > > heh, the old good kernel put some light on this issue.
> > > 
> > > Sep 25 08:50:24 nexus kernel: [60330.301639] Clocksource tsc unstable (delta = -474690884 ns)
> > > Sep 25 08:50:24 nexus kernel: [60330.325477] ------------[ cut here ]------------
> > > Sep 25 08:50:24 nexus kernel: [60330.325484] WARNING: at /home/users/builder/rpm/BUILD/kernel-2.6.37.6/linux-2.6.37/net/sched/sch_generic.c:258 dev_watchdog+0x25d/0x270()
> > > Sep 25 08:50:24 nexus kernel: [60330.325486] Hardware name: GA-MA785GMT-UD2H
> > > Sep 25 08:50:24 nexus kernel: [60330.325487] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> > > (...)
> > > Sep 25 08:50:25 nexus kernel: [60330.851093] Switching to clocksource acpi_pm
> > > 
> > > afaics, this amd-phenom cpu does cpu frequency scaling, which makes the
> > > plain 'tsc' timer unstable and leads to the network card watchdog
> > > timeout (i can log in via the local console while all network traffic
> > > is dead). on the recent 3.5.x kernel the 'clocksource unstable' message
> > > appears *after* the 'task blocked' flood and there's no clear info
> > > about a watchdog timeout.
> > > currently i'm testing the hpet clocksource because the better tsc modes
> > > (constant_tsc, nonstop_tsc) aren't present in
> > > /sys/devices/system/clocksource/clocksource0/available_clocksource
> > > even though the cpu supports them.
> > 
> > I'm sorry, I don't understand, that's a 2.6.37 kernel you are comparing
> > this to.  Where did this problem show up?  In 3.5.4 where 3.5.3 was
> > fine?
> 
> the 'cpu-stall' from the subject line first appeared in 3.5.2 (after
> upgrading from 3.4.10).

So, can you run 'git bisect' between 3.4.10 and 3.5.2 to find the commit
causing the problem?
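
For reference, a bisect between those two releases might be driven roughly as in the sketch below. It assumes a clone of the stable tree in which the releases are tagged `v3.4.10` and `v3.5.2`; the helper names (`bisect_start_commands`, `verdict_command`, `run_all`) are hypothetical and exist only for this illustration. Every step still means building and booting the kernel git checks out and waiting long enough to see whether the stall comes back.

```python
import subprocess


def bisect_start_commands(good, bad):
    """Return the git invocations that open a bisect session.

    `good` is a revision known to work, `bad` one known to show the stall.
    """
    return [
        ["git", "bisect", "start"],
        ["git", "bisect", "bad", bad],
        ["git", "bisect", "good", good],
    ]


def verdict_command(stalled):
    """Return the command that records the result of testing one kernel."""
    return ["git", "bisect", "bad" if stalled else "good"]


def run_all(commands, repo_dir):
    """Run each command inside the kernel tree, stopping on the first error."""
    for argv in commands:
        subprocess.run(argv, cwd=repo_dir, check=True)


# Assumed tag names; adjust them to whatever the local tree uses.
START = bisect_start_commands(good="v3.4.10", bad="v3.5.2")
```

After each test boot, `run_all([verdict_command(stalled)], repo_dir)` records the result and git checks out the next candidate. Since the two tags sit on different stable branches, git may ask for their common ancestor to be tested first.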

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.5.4] rcu_sched self-detected stall on CPU { 1} (t=54862991 jiffies)

2012-09-25 Thread Paweł Sikora
On Tuesday 25 of September 2012 09:44:54 Greg KH wrote:
> On Tue, Sep 25, 2012 at 06:31:36PM +0200, Paweł Sikora wrote:
> > On Monday 24 of September 2012 10:36:33 Greg KH wrote:
> > > On Mon, Sep 24, 2012 at 10:05:23AM +0200, Paweł Sikora wrote:
> > > > Hi,
> > > > 
> > > > with the new stable line i'm observing strange locks on my old 
> > > > amd-phenom-II mini-server.
> > > > here's a dmesg:
> > > 
> > > Did this show up in 3.5.3?  If not, can you run 'git bisect' to find the
> > > problem patch?
> > 
> > heh, the old good kernel put some light on this issue.
> > 
> > Sep 25 08:50:24 nexus kernel: [60330.301639] Clocksource tsc unstable (delta = -474690884 ns)
> > Sep 25 08:50:24 nexus kernel: [60330.325477] ------------[ cut here ]------------
> > Sep 25 08:50:24 nexus kernel: [60330.325484] WARNING: at /home/users/builder/rpm/BUILD/kernel-2.6.37.6/linux-2.6.37/net/sched/sch_generic.c:258 dev_watchdog+0x25d/0x270()
> > Sep 25 08:50:24 nexus kernel: [60330.325486] Hardware name: GA-MA785GMT-UD2H
> > Sep 25 08:50:24 nexus kernel: [60330.325487] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> > (...)
> > Sep 25 08:50:25 nexus kernel: [60330.851093] Switching to clocksource acpi_pm
> > 
> > afaics, this amd-phenom cpu does cpu frequency scaling, which makes the
> > plain 'tsc' timer unstable and leads to the network card watchdog
> > timeout (i can log in via the local console while all network traffic
> > is dead). on the recent 3.5.x kernel the 'clocksource unstable' message
> > appears *after* the 'task blocked' flood and there's no clear info
> > about a watchdog timeout.
> > currently i'm testing the hpet clocksource because the better tsc modes
> > (constant_tsc, nonstop_tsc) aren't present in
> > /sys/devices/system/clocksource/clocksource0/available_clocksource
> > even though the cpu supports them.
> 
> I'm sorry, I don't understand, that's a 2.6.37 kernel you are comparing
> this to.  Where did this problem show up?  In 3.5.4 where 3.5.3 was
> fine?

the 'cpu-stall' from the subject line first appeared in 3.5.2 (after
upgrading from 3.4.10). 3.5.4 has the same problem as 3.5.2, so i've gone
back to the initial 2.6.37.6, which had worked fine for many months. now
i'm pretty sure that all these problems are related to tsc instability and
appear on different kernels in different forms.



Re: [3.5.4] rcu_sched self-detected stall on CPU { 1} (t=54862991 jiffies)

2012-09-25 Thread Greg KH
On Tue, Sep 25, 2012 at 06:31:36PM +0200, Paweł Sikora wrote:
> On Monday 24 of September 2012 10:36:33 Greg KH wrote:
> > On Mon, Sep 24, 2012 at 10:05:23AM +0200, Paweł Sikora wrote:
> > > Hi,
> > > 
> > > with the new stable line i'm observing strange locks on my old 
> > > amd-phenom-II mini-server.
> > > here's a dmesg:
> > 
> > Did this show up in 3.5.3?  If not, can you run 'git bisect' to find the
> > problem patch?
> 
> heh, the old good kernel put some light on this issue.
> 
> Sep 25 08:50:24 nexus kernel: [60330.301639] Clocksource tsc unstable (delta = -474690884 ns)
> Sep 25 08:50:24 nexus kernel: [60330.325477] ------------[ cut here ]------------
> Sep 25 08:50:24 nexus kernel: [60330.325484] WARNING: at /home/users/builder/rpm/BUILD/kernel-2.6.37.6/linux-2.6.37/net/sched/sch_generic.c:258 dev_watchdog+0x25d/0x270()
> Sep 25 08:50:24 nexus kernel: [60330.325486] Hardware name: GA-MA785GMT-UD2H
> Sep 25 08:50:24 nexus kernel: [60330.325487] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> (...)
> Sep 25 08:50:25 nexus kernel: [60330.851093] Switching to clocksource acpi_pm
> 
> afaics, this amd-phenom cpu does cpu frequency scaling, which makes the
> plain 'tsc' timer unstable and leads to the network card watchdog
> timeout (i can log in via the local console while all network traffic
> is dead). on the recent 3.5.x kernel the 'clocksource unstable' message
> appears *after* the 'task blocked' flood and there's no clear info
> about a watchdog timeout.
> currently i'm testing the hpet clocksource because the better tsc modes
> (constant_tsc, nonstop_tsc) aren't present in
> /sys/devices/system/clocksource/clocksource0/available_clocksource
> even though the cpu supports them.

I'm sorry, I don't understand, that's a 2.6.37 kernel you are comparing
this to.  Where did this problem show up?  In 3.5.4 where 3.5.3 was
fine?

Or somewhere else?

confused,

greg k-h


Re: [3.5.4] rcu_sched self-detected stall on CPU { 1} (t=54862991 jiffies)

2012-09-25 Thread Paweł Sikora
On Monday 24 of September 2012 10:36:33 Greg KH wrote:
> On Mon, Sep 24, 2012 at 10:05:23AM +0200, Paweł Sikora wrote:
> > Hi,
> > 
> > with the new stable line i'm observing strange locks on my old 
> > amd-phenom-II mini-server.
> > here's a dmesg:
> 
> Did this show up in 3.5.3?  If not, can you run 'git bisect' to find the
> problem patch?

heh, the old good kernel put some light on this issue.

Sep 25 08:50:24 nexus kernel: [60330.301639] Clocksource tsc unstable (delta = -474690884 ns)
Sep 25 08:50:24 nexus kernel: [60330.325477] ------------[ cut here ]------------
Sep 25 08:50:24 nexus kernel: [60330.325484] WARNING: at /home/users/builder/rpm/BUILD/kernel-2.6.37.6/linux-2.6.37/net/sched/sch_generic.c:258 dev_watchdog+0x25d/0x270()
Sep 25 08:50:24 nexus kernel: [60330.325486] Hardware name: GA-MA785GMT-UD2H
Sep 25 08:50:24 nexus kernel: [60330.325487] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
(...)
Sep 25 08:50:25 nexus kernel: [60330.851093] Switching to clocksource acpi_pm
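
To compare the order of these events across kernels, the log can be reduced to a timeline. The following is only a sketch: the event labels and the `order_events` helper are made up for this illustration, and the sample lines are the ones from the log above, each joined onto a single line.

```python
import re

# Matches syslog-style kernel lines such as
# "Sep 25 08:50:24 nexus kernel: [60330.301639] Clocksource tsc unstable ..."
LINE_RE = re.compile(r"kernel: \[\s*(?P<ts>\d+\.\d+)\]\s*(?P<msg>.*)$")

# Message substrings of interest mapped to hypothetical labels.
EVENTS = {
    "Clocksource tsc unstable": "tsc_unstable",
    "NETDEV WATCHDOG": "netdev_watchdog",
    "Switching to clocksource": "clocksource_switch",
}

SAMPLE = """\
Sep 25 08:50:24 nexus kernel: [60330.301639] Clocksource tsc unstable (delta = -474690884 ns)
Sep 25 08:50:24 nexus kernel: [60330.325487] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Sep 25 08:50:25 nexus kernel: [60330.851093] Switching to clocksource acpi_pm
"""


def order_events(log_text):
    """Return (timestamp, label) pairs for the known events, oldest first."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # not a kernel line with a [seconds.micros] stamp
        for needle, label in EVENTS.items():
            if needle in match.group("msg"):
                found.append((float(match.group("ts")), label))
    return sorted(found)


for stamp, label in order_events(SAMPLE):
    print(f"{stamp:.6f} {label}")
```

Running the same function over a 3.5.x dmesg, with an extra entry in `EVENTS` for the 'task blocked' messages, would show whether the tsc warning really comes after the flood there.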

afaics, this amd-phenom cpu does cpu frequency scaling, which makes the
plain 'tsc' timer unstable and leads to the network card watchdog timeout
(i can log in via the local console while all network traffic is dead).
on the recent 3.5.x kernel the 'clocksource unstable' message appears
*after* the 'task blocked' flood and there's no clear info about a
watchdog timeout.
currently i'm testing the hpet clocksource because the better tsc modes
(constant_tsc, nonstop_tsc) aren't present in
/sys/devices/system/clocksource/clocksource0/available_clocksource
even though the cpu supports them.
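
A small sketch of how both places could be checked. The sysfs file lists clocksource names (typically tsc, hpet, acpi_pm), whereas constant_tsc and nonstop_tsc are reported as CPU feature flags in /proc/cpuinfo, so they would not be expected to show up in available_clocksource. The function names are hypothetical; only the file paths are real.

```python
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")
TSC_FLAGS = ("constant_tsc", "nonstop_tsc")


def parse_clocksources(text):
    """Split the whitespace-separated contents of available_clocksource."""
    return text.split()


def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(available_text, current_text, cpuinfo_text):
    """Summarise the clocksources and the TSC-related CPU flags."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {
        "available": parse_clocksources(available_text),
        "current": current_text.strip(),
        "tsc_flags": {name: name in flags for name in TSC_FLAGS},
    }


def read_live():
    """Read the real files; only meaningful on a running Linux system."""
    return report(
        (CLOCKSOURCE_DIR / "available_clocksource").read_text(),
        (CLOCKSOURCE_DIR / "current_clocksource").read_text(),
        Path("/proc/cpuinfo").read_text(),
    )
```

Calling `read_live()` on the affected machine shows which clocksource is active and whether the CPU advertises either flag. The active clocksource can be changed by writing a name such as hpet to current_clocksource in the same directory, or at boot with the clocksource= kernel parameter.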



Re: [3.5.4] rcu_sched self-detected stall on CPU { 1} (t=54862991 jiffies)

2012-09-24 Thread Greg KH
On Mon, Sep 24, 2012 at 10:05:23AM +0200, Paweł Sikora wrote:
> Hi,
> 
> with the new stable line i'm observing strange locks on my old amd-phenom-II 
> mini-server.
> here's a dmesg:

Did this show up in 3.5.3?  If not, can you run 'git bisect' to find the
problem patch?

thanks,

greg k-h

