Re: 2.6.1[01] freeze on x86_64

2005-03-22 Thread Sean Russell
Jan Engelhardt wrote:
>> Er... by serial console, I assume you mean via a serial cable and some other
>> device.  If so, then no, I don't have that capability.  I didn't know about
>> netconsole before you mentioned it; I'll do some research and set it up.  I do
>
> Serial console -- only requires a serial cable, available in the next computer
> store -- also works with non-Linux, non-x86 and (mostly) systems-w/o-compiler.

Well, that and the knowledge of how to monitor it on the other end, 
which I lack :-).  And does one need a null-modem cable?  I haven't used 
serial cables since USB was introduced.  Is serial monitoring preferred 
over netconsole?

In any case, I've figured out how to get netconsole working, and have 
started monitoring it from my wife's laptop.  I just need to reboot and 
make sure it is working (that I got the UDP addresses right), and then 
wait for a crash.  I won't get to this until this evening, probably.
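
For anyone else wondering what to run on the receiving end: netconsole just sends kernel messages as plain UDP datagrams, so a very small listener is enough. The following is only a sketch under assumptions, not necessarily what is running on the laptop here. It assumes the sender targets UDP port 6666 (netconsole's documented default target port); the function names and the log file name are made up for illustration.

```python
import socket

NETCONSOLE_PORT = 6666  # netconsole's default target port; change to match the sender


def open_listener(bind_addr="0.0.0.0", port=NETCONSOLE_PORT):
    """Bind a UDP socket that the netconsole sender can deliver to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def receive_messages(sock, count, timeout=None):
    """Return up to `count` datagrams as (sender_ip, text) tuples.

    Stops early if `timeout` seconds pass without a datagram arriving.
    """
    sock.settimeout(timeout)
    messages = []
    try:
        for _ in range(count):
            data, (sender_ip, _port) = sock.recvfrom(65535)
            messages.append((sender_ip, data.decode("utf-8", errors="replace")))
    except socket.timeout:
        pass
    return messages


def log_forever(path="netconsole.log"):
    """Append every received message to `path` (hypothetical file name).

    The file is flushed after each datagram so it stays current even if
    the listener is interrupted.
    """
    sock = open_listener()
    with open(path, "a") as log:
        while True:
            for _sender_ip, text in receive_messages(sock, 1):
                log.write(text)
                log.flush()
```

Calling log_forever() on the monitoring machine leaves it waiting for messages. To check the addresses before waiting for a real crash, sending any UDP datagram to the listener's address and port from the other machine should make a line show up in the log file.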

Mark, I'm going to start CC'ing you, and I'll forward you the previous 
emails.

Thanks, everybody, for responding.
--- SER
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.1[01] freeze on x86_64

2005-03-22 Thread Mark Nipper
On 22 Mar 2005, Jan Engelhardt wrote:
> >> >acpi_thermal-0400 [23] acpi_thermal_get_trip_: Invalid active
> >> > threshold [0]
> >> 
> >> You mean you got this in /var/log/messages?
> >
> > Yes, in /var/log/messages.  The lock up occurs without warning, so the only
> > opportunity I have to look for error messages is in the syslogs.
> >
> >> Can you connect a serial console or netconsole and see if that 
> >> 
> > Er... by serial console, I assume you mean via a serial cable and some other
> > device.  If so, then no, I don't have that capability.  I didn't know about
> > netconsole before you mentioned it; I'll do some research and set it up.  I do
> > have a second computer (well, my wife's laptop is also running Linux) that I
> > could use to monitor UDP traffic, if I can figure out what to use as a client
> > to capture the messages.  This may take me a couple of days.
> 
> Serial console -- only requires a serial cable, available in the next computer
> store -- also works with non-Linux, non-x86 and (mostly) systems-w/o-compiler.
> 
> 
> Jan Engelhardt

I've actually got old dumb terminals sitting around.
I'll hook one up and set the oops=panic option also.  Maybe we
can nail this down as I've pretty much avoided using my x86-64
desktop ever since.  I'd been torn trying to decide whether or
not to migrate to a different file system.

-- 
Mark Nipper                                e-contacts:
4475 Carter Creek Parkway                  [EMAIL PROTECTED]
Apartment 724                              http://nipsy.bitgnome.net/
Bryan, Texas, 77802-4481                   AIM/Yahoo: texasnipsy ICQ: 66971617
(979)575-3193                              MSN: [EMAIL PROTECTED]

-BEGIN GEEK CODE BLOCK-
Version: 3.1
GG/IT d- s++:+ a- C++$ UBL$ P--->+++ L+++$ !E---
W++(--) N+ o K++ w(---) O++ M V(--) PS+++(+) PE(--)
Y+ PGP t+ 5 X R tv b+++@ DI+(++) D+ G e h r++ y+(**)
--END GEEK CODE BLOCK--

---begin random quote of the moment---
He hoped and prayed that there wasn't an afterlife. Then he
realized there was a contradiction involved here and merely
hoped that there wasn't an afterlife.
 -- Douglas Adams
end random quote of the moment


Re: 2.6.1[01] freeze on x86_64

2005-03-22 Thread Jan Engelhardt
>> >acpi_thermal-0400 [23] acpi_thermal_get_trip_: Invalid active
>> > threshold [0]
>> 
>> You mean you got this in /var/log/messages?
>
> Yes, in /var/log/messages.  The lock up occurs without warning, so the only
> opportunity I have to look for error messages is in the syslogs.
>
>> Can you connect a serial console or netconsole and see if that 
>> 
> Er... by serial console, I assume you mean via a serial cable and some other
> device.  If so, then no, I don't have that capability.  I didn't know about
> netconsole before you mentioned it; I'll do some research and set it up.  I do
> have a second computer (well, my wife's laptop is also running Linux) that I
> could use to monitor UDP traffic, if I can figure out what to use as a client
> to capture the messages.  This may take me a couple of days.

Serial console -- only requires a serial cable, available in the next computer 
store -- also works with non-Linux, non-x86 and (mostly) systems-w/o-compiler.


Jan Engelhardt
-- 


Re: 2.6.1[01] freeze on x86_64

2005-03-22 Thread Sean Russell
Andi Kleen wrote:
> Sean Russell <[EMAIL PROTECTED]> writes:
>
>> acpi_thermal-0400 [23] acpi_thermal_get_trip_: Invalid active
>> threshold [0]
>
> You mean you got this in /var/log/messages?

Yes, in /var/log/messages.  The lock up occurs without warning, so the 
only opportunity I have to look for error messages is in the syslogs.

> Can you connect a serial console or netconsole and see if that

Er... by serial console, I assume you mean via a serial cable and some 
other device.  If so, then no, I don't have that capability.  I didn't 
know about netconsole before you mentioned it; I'll do some research and 
set it up.  I do have a second computer (well, my wife's laptop is also 
running Linux) that I could use to monitor UDP traffic, if I can figure 
out what to use as a client to capture the messages.  This may take me a 
couple of days.

I didn't post to the list earlier specifically because I knew the 
debugging process would rapidly exceed my knowledge about kernel 
debugging.  I apologize for making you walk me through the process.

> catches anything?  Also boot with oops=panic

As a boot parameter?  I'll give that a try.
--- SER


Re: 2.6.1[01] freeze on x86_64

2005-03-22 Thread Andi Kleen
Sean Russell <[EMAIL PROTECTED]> writes:


> appear to be related to the lockup.  In my logs, the last message
> before the crash is always (that I've noticed) an ACPI error:
>
> acpi_thermal-0400 [23] acpi_thermal_get_trip_: Invalid active
> threshold [0]

You mean you got this in /var/log/messages?

Can you connect a serial console or netconsole and see if that 
catches anything?  Also boot with oops=panic

-Andi


2.6.1[01] freeze on x86_64

2005-03-21 Thread Sean Russell
Hello,
One liner:  I'm getting mysterious (to me), almost random hard freezes 
of the kernel running 2.6.10 and 2.6.11.
Kernel version: Linux version 2.6.11-gentoo-r3 ([EMAIL PROTECTED]) (gcc version 
3.4.2 (Gentoo Linux 3.4.2-r2, ssp-3.4.1-1, pie-8.7.6.5))

Mark Nipper posted a message on March 5 regarding some mysterious kernel 
lockups which he didn't get a response to (I've contacted him about 
it).  Since I'm having what I think is the same problem, I thought I'd 
post a message so he's not just a single lonely voice in the dark.

Mark and I have similar set-ups.  We're both running x86_64 kernels, and 
ReiserFS3.  He's running Debian, I'm running Gentoo.  We haven't 
compared kernel config files yet; it might mean something to him, but to 
be honest, I barely know enough to compile my own kernels and wouldn't 
know where to begin to look for the problem.  Mark has only encountered 
this on 2.6.11, but I don't think he's tried any other kernel versions 
on x86_64; I get this problem on both 2.6.10 and 2.6.11.  I didn't see 
the problem on 2.6.9.

In both of our cases, the kernel is locking up, and requires a power 
cycle to get it back.  We're not able to SSH into our machines, and we 
get no response from any of the input devices.  Furthermore, even with 
full debugging turned on, there are no messages in the log file that 
appear to be related to the lockup.  In my logs, the last message before 
the crash is always (that I've noticed) an ACPI error:

   acpi_thermal-0400 [23] acpi_thermal_get_trip_: Invalid active 
threshold [0]

but this message appears a lot in my logs, so I think it is 
coincidence.  For Mark, the last message was some ReiserFS message.  
Mark feels like the error is ReiserFS related, and I was pretty sure it 
was swap related, until I turned off all swap partitions and the problem 
still occurred.  I *may* try converting all of my filesystems to 
something else if somebody knowledgeable thinks it could be the problem, 
but I'm guessing it is something deeper in; I've never seen a filesystem 
related problem that caused a lock-up like this.
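
A rough way to test the "it appears a lot, so it's probably coincidence" reasoning is to compare how often a message shows up overall with how often it is the last line before a reboot. The sketch below is only illustrative: it assumes the kernel's "Linux version" banner is the first line logged after a reboot, which may not hold if the syslog daemon writes its own startup line first, and both function names are made up.

```python
def base_rate(lines, needle):
    """Fraction of all log lines that contain `needle`."""
    lines = list(lines)
    if not lines:
        return 0.0
    return sum(needle in line for line in lines) / len(lines)


def lines_before_boot(lines, boot_marker="Linux version"):
    """Return the line logged immediately before each boot banner.

    Assumes the banner is the first thing logged after a reboot, so the
    preceding line is the last one written before the freeze.
    """
    previous = None
    result = []
    for line in lines:
        if boot_marker in line and previous is not None:
            result.append(previous)
        previous = line
    return result
```

If the message makes up a large share of all lines, finding it last before a freeze says very little; if it is rare overall but always last, it deserves a closer look.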

I still feel that this may be memory related.  When I turn off swap, or 
when I drastically reduce my memory use, my laptop can run for hours, or 
even days with little use.  On the other hand, it can freeze up after 
five minutes, even before KDE has finished loading completely, with the 
swap on.  However, I haven't found a situation where it won't, 
eventually, lock up.  But I can't really pin it down, so I don't know 
where the problem is.

I haven't noticed the lockups without X, but I haven't run for any great 
length of time without X.  I'm running the ATI proprietary drivers, but 
even when I revert to the XOrg ATI drivers (non-proprietary), I still 
get the lockups.

I'm really sorry that I can't provide more information; I'm usually not 
totally incompetent at narrowing down problems in software, but I have 
no idea where to even start looking for the problem here.  If there are 
any things I should try that might provide more information, please let 
me know.

I'm attaching my kernel config, plus all of the info from /proc that is 
suggested by the FAQ to be included.  I'll be happy to recompile my 
kernel with other options, if I can get some hints at starting points; I 
doubt my changing flags at random will help much.

Thanks, in advance.
Sean Russell
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 4
model name      : AMD Athlon(tm) 64 Processor 3400+
stepping        : 10
cpu MHz         : 801.849
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm 3dnowext 3dnow
bogomips        : 1572.86
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp


-0009f7ff : System RAM
0009f800-0009 : reserved
000a-000b : Video RAM area
000c-000cefff : Video ROM
000cf000-000c : Adapter ROM
000f-000f : System ROM
0010-3fee : System RAM
  0010-003ae374 : Kernel code
  003ae375-004e0d27 : Kernel data
3fef-3fef9fff : ACPI Tables
3fefa000-3fef : ACPI Non-volatile Storage
3ff0-3fff : reserved
4000-4fff : :00:05.0
  4000-4fff : ipw2200
40001000-40001fff : :00:0c.0
d000-d0003fff : :00:06.0
d0004000-d0004fff : :00:0e.0
d0005000-d0005fff : :00:0e.0
d0006000-d0006fff : :00:0e.1
d0007000-d0007fff : :00:0e.1
d0008000-d00087ff : :00:06.0
  d0008000-d00087ff : ohci1394
d0008800-d00088ff : :00:08.0
  d0008800-d00088ff : r8169
d0008c00-d0008cff : :00:10.3
  d0008c00-d0008cff : ehci_hcd
d010-d01f : PCI Bus #01
  d010-d010 : :01:00.0
d010-d010 : radeonfb
