Re: Netra X1 strange lock up Debian 3.1, kernel 2.6.12.3

2005-09-06 Thread Jurij Smakov

On Tue, 6 Sep 2005, Luigi Gangitano wrote:

> On 05 Sep 2005, at 22:25, William Herrin wrote:
>
> > I'm getting a weird partial-lockup under Debian 3.1 (sarge) on a Netra X1
> > with a 2.6.11.3 kernel compiled fresh from the sources.
>
> I had the same problem on my Sun Blade 100. I never managed to catch a
> lock-up while still logged in, so I could not debug this issue.
>
> This happened with kernels from 2.6.9 to 2.6.11. It appears to have been
> fixed in 2.6.12: no lock-up in the last 20 days (before, it was 6 days
> between each freeze).


Hi,

Did you ever try the stock Debian kernels? There is a bug report (322960) 
stating that stock 2.6.12 kernels do not boot on Netra T1. I wonder 
whether the same problem is present for X1.


Thanks and best regards,

Jurij Smakov  [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC


--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: Netra X1 strange lock up Debian 3.1, kernel 2.6.12.3

2005-09-05 Thread Luigi Gangitano


On 05 Sep 2005, at 22:25, William Herrin wrote:

> I'm getting a weird partial-lockup under Debian 3.1 (sarge) on a Netra X1
> with a 2.6.11.3 kernel compiled fresh from the sources.

I had the same problem on my Sun Blade 100. I never managed to catch a
lock-up while still logged in, so I could not debug this issue.

This happened with kernels from 2.6.9 to 2.6.11. It appears to have been
fixed in 2.6.12: no lock-up in the last 20 days (before, it was 6 days
between each freeze).


Regards,

Luigi Gangitano





Re: Netra X1 strange lock up Debian 3.1, kernel 2.6.12.3

2005-09-05 Thread William Herrin
> > I'm getting a weird partial-lockup under Debian 3.1 (sarge) on a Netra X1
> > with a 2.6.11.3 kernel compiled fresh from the sources.
> 
> This happened with kernels from 2.6.9 to 2.6.11. It appears to have  
> been fixed in 2.6.12, no lockup in the last 20 days (before it was 6  
> days between each freeze).

I'm sorry, that was a typo on my part. It's a 2.6.12.3 kernel, not 2.6.11.3.
I used 2.6.11.7 and when that failed I tried 2.6.12.3. The latter also failed.

Did you have good results with 2.6.8? Are there any serious bugs I need to
watch out for if I drop back to that version?

Thanks,
Bill

-- 
William D. Herrin  [EMAIL PROTECTED]  [EMAIL PROTECTED]
3005 Crane Dr.                  Web:
Falls Church, VA 22042-3004





Re: Netra X1 strange lock up Debian 3.1, kernel 2.6.12.3

2005-09-05 Thread William Herrin
> The box is up for weeks on end, you say? But you have no swap added.
> Could it be Linux 2.6's MM refusing to spawn a new shell due to low memory?

I disabled the swap after the second time it happened on the theory that it
might be thrashing (the machine is remotely located so I can't just listen
for the drive). It dies the same way with or without swap.

Anyway, I tested a program which allocated and initialized memory until there
was none left. The kernel killed it like it's supposed to and the system date
didn't change. So, it doesn't seem to be an out-of-memory condition.

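/* malloc memory until I croak */

#include <stdio.h>   /* printf, fflush */
#include <stdlib.h>  /* malloc */
#include <string.h>  /* memset */
#include <unistd.h>  /* sleep */

int main (void)
{
  char *p;

  while (1) {
    /* Deliberately never freed: the goal is to exhaust memory. */
    p = (char*) malloc (sizeof(char) * 1);
    if (!p) {
      printf ("Out of memory hit; waiting.\n");
      fflush (stdout);
      sleep (300);
      return 1;
    }
    /* Write to the allocation (memset replaces the legacy bzero). */
    memset (p, 0, sizeof(char) * 1);
  }
  return 0;
}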

{lily:herrin:/home/h/herrin:!} ./a.out
Killed
{lily:herrin:/home/h/herrin:!}


I have observed that if I have more running it dies sooner. For example, when
I had sendmail and apache running too it would die after a week or ten days.
Now it only dies every few weeks.

-Bill


-- 
William D. Herrin  [EMAIL PROTECTED]  [EMAIL PROTECTED]
3005 Crane Dr.                  Web:
Falls Church, VA 22042-3004





Re: Netra X1 strange lock up Debian 3.1, kernel 2.6.12.3

2005-09-05 Thread Scott Walker
The box is up for weeks on end, you say? But you have no swap added.
Could it be Linux 2.6's MM refusing to spawn a new shell due to low memory?









Re: Netra X1 strange lock up Debian 3.1, kernel 2.6.12.3

2005-09-05 Thread Admar Schoonen
On Mon, Sep 05, 2005 at 04:25:34PM -0400, William Herrin wrote:
> I'm getting a weird partial-lockup under Debian 3.1 (sarge) on a Netra X1
> with a 2.6.11.3 kernel compiled fresh from the sources.



> Has anyone seen anything like this before? Any suggestions for what to try to
> track down the problem?

I have exactly the same problems on my Blade 100 (same CPU, but at 500 MHz, 640
MB RAM), running a custom 2.6.9 kernel (from kernel.org sources, I believe).
Unfortunately, I have no clue what is causing the problem, nor do I know of any
workaround.

As this is my workstation, and usually I'm the only user, I just power cycle the
box, but it can be very inconvenient.

Admar





Netra X1 strange lock up Debian 3.1, kernel 2.6.12.3

2005-09-05 Thread William Herrin
I'm getting a weird partial-lockup under Debian 3.1 (sarge) on a Netra X1
with a 2.6.11.3 kernel compiled fresh from the sources.

The system will run fine for several weeks. Then it will refuse to run new
shells. Running daemons will continue to run but an attempt to start a new
shell will fail. I put some echo statements in /etc/profile to see where it
stops. The login stops responding when running the "id -u" command in
/etc/profile. When I remove that command it makes it to the end of
/etc/profile but never starts $HOME/.profile.

The other symptom is that the system clock jumps forward by 3 days, 6 hours,
11 minutes and 15 seconds. Every time. NTP is not running, nor is anything
else that should modify the date.

Examples from /var/log/messages:

Jul 27 21:03:19 lily -- MARK --
Jul 27 21:04:19 lily -- MARK --
Jul 27 21:05:19 lily -- MARK --
Jul 31 03:17:34 lily -- MARK --
Jul 31 03:18:34 lily -- MARK --
Jul 31 03:19:34 lily -- MARK --

Sep  2 10:27:32 lily -- MARK --
Sep  2 10:28:32 lily -- MARK --
Sep  2 10:29:32 lily -- MARK --
Sep  5 16:41:47 lily -- MARK --
Sep  5 16:42:47 lily -- MARK --
Sep  5 16:43:47 lily -- MARK --

So, the hard drive is still writing. The Bind daemon (named) continues to run
and respond to queries. That and ssh are the only network services I have
running on the box. The kernel continues to output log messages from
iptables (with the wrong date) but outputs no other messages. Ssh will
connect, but it may or may not get past public key authentication. The
console will accept a login and make it past /etc/profile, but never
makes it to a prompt.

uname -a
Linux lily 2.6.12.3-lily #1 Mon Aug 1 18:40:53 EDT 2005 sparc64 GNU/Linux

cat /proc/cpuinfo
cpu          : TI UltraSparc IIe (Hummingbird)
fpu          : UltraSparc IIe integrated FPU
promlib      : Version 3 Revision 0
prom         : 4.0.6
type         : sun4u
ncpus probed : 1
ncpus active : 1
Cpu0Bogo     : 794.62
Cpu0ClkTck   : 17d78400
MMU Type     : Spitfire

free
             total       used       free     shared    buffers     cached
Mem:        512496      49408     463088          0       8400      24976
-/+ buffers/cache:      16032     496464
Swap:            0          0          0


I had the same problem with a 2.6.11.7 kernel, also compiled fresh from the
source on kernel.org. I haven't tried the stock Sarge kernel; that's next on my
list.

Has anyone seen anything like this before? Any suggestions for what to try to
track down the problem?

Thanks in advance,
Bill Herrin


-- 
William D. Herrin  [EMAIL PROTECTED]  [EMAIL PROTECTED]
3005 Crane Dr.                  Web:
Falls Church, VA 22042-3004

