Re: Debian ADM64 Etch (testing/unstable) system freeze

2006-02-04 Thread Rami Saarinen
On Friday 03 February 2006 15:47, Anthony DeRobertis wrote:
> Rami Saarinen wrote:
> > Anyway, I am glad to inform that yes it really was the memory that was
> > causing the trouble. I let the machine run the memtest86+ last night and
> > after 10 hours it had found four memory errors. Apparently I was too
> > hasty at the first time.
>
> Well, now you get the next fun step... verifying that the bad memory
> didn't corrupt your system install, or your data. I think you said you
> have ECC memory, so you're probably safe, but you should run debsums,
> making sure it checks every package installed on your system (you'll
> have to download copies of a bunch of the .deb's that don't include
> md5sum information in them).

Hey, thanks! I almost forgot this. Just can't wait for that fun to begin... :)

Thanks to all for the good help!

-- 
Rami Saarinen


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: Debian ADM64 Etch (testing/unstable) system freeze

2006-02-03 Thread Anthony DeRobertis
Rami Saarinen wrote:

> Anyway, I am glad to inform that yes it really was the memory that was
> causing the trouble. I let the machine run the memtest86+ last night and
> after 10 hours it had found four memory errors. Apparently I was too
> hasty at the first time.

Well, now you get the next fun step... verifying that the bad memory
didn't corrupt your system install, or your data. I think you said you
have ECC memory, so you're probably safe, but you should run debsums,
making sure it checks every package installed on your system (you'll
have to download copies of a bunch of the .deb's that don't include
md5sum information in them).
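
For readers who want to see what such a check amounts to, here is a minimal Python sketch of the kind of verification debsums performs. It is not debsums itself. It assumes each listing line has the form "<md5 hex>  <path relative to the root>", which is how the per-package *.md5sums files under /var/lib/dpkg/info are usually laid out; the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5sums(listing: str, root: Path) -> list[str]:
    """Return the paths from an md5sums listing that are missing or altered.

    Assumed line format: "<md5 hex>  <path relative to root>".
    """
    bad = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        expected, relative = line.split(None, 1)
        relative = relative.strip()
        target = root / relative
        if not target.is_file() or md5_of(target) != expected:
            bad.append(relative)
    return bad
```

On a Debian system one could feed it the text of a single package's md5sums file with Path("/") as the root. Packages that ship no such list are not covered, which is why the .deb files have to be fetched for them.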





Re: Debian ADM64 Etch (testing/unstable) system freeze

2006-01-31 Thread Andrew Sharp
On Tue, Jan 31, 2006 at 12:08:28AM -0800, Corey Hickey wrote:
> Rami Saarinen wrote:
> > Anyway, I am glad to inform that yes it really was the memory that was 
> > causing the trouble. I let the machine run the memtest86+ last night and 
> > after 10 hours it had found four memory errors. Apparently I was too 
> > hasty at the first time.
> > 
> > I have one more stupid question: as it may take couple of days for me to 
> > get the new memory. Is there any way to block / reserve the faulty 
> > memory area so that it would not be available for use?
> 
> If memtest86+ is consistently reporting a few addresses, then you can
> use the badram kernel patch:
> 
> http://rick.vanrein.org/linux/badram/
> 
> I had some very slight stability issues with my machine after I built
> it, and memtest86+ reported one memory failure after I ran it for a
> while. The problem turned out to be that my BIOS was, for some reason,
> setting the memory timing (CAS/RAS/etc. -- I don't remember which) more
> aggressively than the values at which the RAM was specced to operate.
> So, if memtest86+ seems to be reporting random, sporadic failures, you
> might try checking and increasing your memory timings.

You might also want to try reseating the memory once or twice, and
checking the cooling to make sure it isn't a heat problem.  If you
haven't already, that is.

Cheers,

a





Re: Debian ADM64 Etch (testing/unstable) system freeze

2006-01-31 Thread Corey Hickey
Rami Saarinen wrote:
> Anyway, I am glad to inform that yes it really was the memory that was 
> causing the trouble. I let the machine run the memtest86+ last night and 
> after 10 hours it had found four memory errors. Apparently I was too 
> hasty at the first time.
> 
> I have one more stupid question: as it may take couple of days for me to 
> get the new memory. Is there any way to block / reserve the faulty 
> memory area so that it would not be available for use?

If memtest86+ is consistently reporting a few addresses, then you can
use the badram kernel patch:

http://rick.vanrein.org/linux/badram/
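
As a rough illustration of how an address/mask pattern can single out faulty memory, here is a small Python sketch. It assumes the semantics I understand the BadRAM documentation to describe: an address counts as bad when it agrees with the pattern's base address on every bit set in the mask. The 4 KiB page size, the 64-bit address width and all function names are assumptions for the example, so check the patch's own documentation before relying on any of it.

```python
ADDRESS_BITS = 0xFFFFFFFFFFFFFFFF  # assumed 64-bit physical addresses


def matches_pattern(address: int, base: int, mask: int) -> bool:
    """True when address agrees with base on every bit set in mask."""
    return (address & mask) == (base & mask)


def pattern_for(address: int, page_size: int = 4096) -> tuple[int, int]:
    """Build a (base, mask) pair covering the single page holding address.

    Assumes page_size is a power of two.
    """
    mask = ~(page_size - 1) & ADDRESS_BITS
    return address & mask, mask


def patterns_for(addresses: list[int]) -> list[tuple[int, int]]:
    """Return one (base, mask) pair per distinct faulty page, sorted."""
    return sorted({pattern_for(address) for address in addresses})
```

The idea is that a handful of consistently failing addresses collapses into a few page-sized patterns, each of which excludes a whole page rather than a single byte.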

I had some very slight stability issues with my machine after I built
it, and memtest86+ reported one memory failure after I ran it for a
while. The problem turned out to be that my BIOS was, for some reason,
setting the memory timing (CAS/RAS/etc. -- I don't remember which) more
aggressively than the values at which the RAM was specced to operate.
So, if memtest86+ seems to be reporting random, sporadic failures, you
might try checking and increasing your memory timings.

-Corey





Re: Debian ADM64 Etch (testing/unstable) system freeze

2006-01-30 Thread Rami Saarinen

Anthony DeRobertis wrote:
> Rami Saarinen wrote:
>
> > Well, somehow I assumed that if the fault is in memory, it is probably in a
> > fixed location
>
> Depends on the type of memory problem. Memory problems can cover
> everything from "this one certain bit is stuck at 0" (what you're
> thinking of) to "the memory timings/voltage/whatever are off, memory
> functions as a hardware random number generator as a result."

Yes, very true.

> Oh, and memory allocation is not random. The kernel is going to wind up
> in a certain spot every time. So will, e.g., init.

Yes. Somehow I ended up thinking that if the fault is in the memory area
the kernel uses, the faulty behaviour would be more devastating and
would occur more often. After all, I have run the system for hours
without a problem.


Anyway, I am glad to report that yes, it really was the memory that was
causing the trouble. I let the machine run memtest86+ last night, and
after 10 hours it had found four memory errors. Apparently I was too
hasty the first time.


I have one more stupid question: as it may take a couple of days for me to
get the new memory, is there any way to block / reserve the faulty
memory area so that it would not be available for use?


Thanks again for help!

--
Rami Saarinen







Re: Debian ADM64 Etch (testing/unstable) system freeze

2006-01-30 Thread Anthony DeRobertis
Rami Saarinen wrote:

> Well, somehow I assumed that if the fault is in memory, it is probably in a 
> fixed location

Depends on the type of memory problem. Memory problems can cover
everything from "this one certain bit is stuck at 0" (what you're
thinking of) to "the memory timings/voltage/whatever are off, memory
functions as a hardware random number generator as a result."

Oh, and memory allocation is not random. The kernel is going to wind up
in a certain spot every time. So will, e.g., init.
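
To make the "stuck at 0" case concrete, here is a toy Python simulation. It is only a sketch under simplifying assumptions (a byte-addressable memory with exactly one faulty bit) and says nothing about how memtest86+ is actually implemented; the class and function names are invented for the example.

```python
class FaultyMemory:
    """Byte-addressable toy memory with one bit stuck at 0."""

    def __init__(self, size: int, bad_address: int, bad_bit: int) -> None:
        self.cells = bytearray(size)
        self.bad_address = bad_address
        self.clear_mask = 0xFF & ~(1 << bad_bit)

    def write(self, address: int, value: int) -> None:
        if address == self.bad_address:
            value &= self.clear_mask  # the stuck bit can never hold a 1
        self.cells[address] = value

    def read(self, address: int) -> int:
        return self.cells[address]


def pattern_test(memory: FaultyMemory, size: int,
                 patterns=(0x00, 0xFF, 0x55, 0xAA)) -> list[tuple[int, int, int]]:
    """Write each pattern everywhere, read it back, collect mismatches.

    Each mismatch is reported as (address, expected, actual).
    """
    errors = []
    for pattern in patterns:
        for address in range(size):
            memory.write(address, pattern)
        for address in range(size):
            actual = memory.read(address)
            if actual != pattern:
                errors.append((address, pattern, actual))
    return errors
```

With a stuck bit, every reported error lands on the same address, and only for patterns that try to set that bit. A timing or voltage problem would instead be expected to produce errors scattered over changing addresses from run to run.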





Re: Debian ADM64 Etch (testing/unstable) system freeze

2006-01-30 Thread Rami Saarinen
On Friday 27 January 2006 23:58, Andrew Sharp wrote:
> On Fri, Jan 27, 2006 at 01:57:59AM +0200, Rami Saarinen wrote:
> > On Thursday 26 January 2006 15:21, Andrew Syrewicze wrote:
> > > I wouldn't rule out the possibility of your processor getting too hot.
> > > The newcastle cores aren't as solid as venice cores, and i hear they
> > > run a little hotter too. I use a venice core and i can overclock the
> > > crap out of that thing. (with a huge thermaltake fan on it of course )
> > > :-P.
> > >
> > > Anyway i would start by checking your cpu temp. I would first check in
> > > BIOS.
> >
> > Froze two times again today. First time I was moving a 2.1 GB file to
> > another location on the disk and the second time I was doing the same as
> > in my previous post. This time I was lucky as there was actually some
> > output.
> >
> > First time froze with: "kernel stack segment  [1]"
> > and the second: "general protection fault "
> >
> > After reboot I checked the temperature from BIOS - 32 celsius, so it is
> > not overheating issue.
> >
> > I doubt the memory issue also as I'd expect alternating symptoms like
> > programs crashing etc. not just full system freeze every time. (?)
> > Thanks for everyone for help.
>
> I don't know why you would assume that.  Memory problems can cause
> any/all of these symptoms, but don't have to cause any particular one.
> It sure sounds like a hardware/memory problem to me.
>

Well, somehow I assumed that if the fault is in memory, it is probably in a
fixed location, and that it would vary which program gets the faulty part.
For example, I might assume that a typical memory error is that the value
stored in memory is changed when it is fetched, which would cause various
symptoms from rampant crashes to a system freeze. But then again, I am no
memory expert. (Firefox does seem to be unstable at the moment.)

Anyway, I am going to run memtest seriously this time, and I am also trying to
borrow some other memory to see if the problems persist.

Thanks all for help.

-- 
Rami Saarinen





Re: Debian ADM64 Etch (testing/unstable) system freeze

2006-01-27 Thread Andrew Sharp
On Fri, Jan 27, 2006 at 01:57:59AM +0200, Rami Saarinen wrote:
> On Thursday 26 January 2006 15:21, Andrew Syrewicze wrote:
> > I wouldn't rule out the possibility of your processor getting too hot. The
> > newcastle cores aren't as solid as venice cores, and i hear they run a
> > little hotter too. I use a venice core and i can overclock the crap out of
> > that thing. (with a huge thermaltake fan on it of course ) :-P.
> >
> > Anyway i would start by checking your cpu temp. I would first check in
> > BIOS.
> 
> Froze two times again today. First time I was moving a 2.1 GB file to another
> location on the disk and the second time I was doing the same as in my
> previous post. This time I was lucky as there was actually some output.
>
> First time froze with: "kernel stack segment  [1]"
> and the second: "general protection fault "
>
> After reboot I checked the temperature from BIOS - 32 celsius, so it is not
> overheating issue.
>
> I doubt the memory issue also as I'd expect alternating symptoms like
> programs crashing etc. not just full system freeze every time. (?)
> Thanks for everyone for help.

I don't know why you would assume that.  Memory problems can cause
any/all of these symptoms, but don't have to cause any particular one.
It sure sounds like a hardware/memory problem to me.

Cheers,

a






Re: Debian ADM64 Etch (testing/unstable) system freeze

2006-01-26 Thread Rami Saarinen
On Thursday 26 January 2006 15:21, Andrew Syrewicze wrote:
> I wouldn't rule out the possibility of your processor getting too hot. The
> newcastle cores aren't as solid as venice cores, and i hear they run a
> little hotter too. I use a venice core and i can overclock the crap out of
> that thing. (with a huge thermaltake fan on it of course ) :-P.
>
> Anyway i would start by checking your cpu temp. I would first check in
> BIOS.

It froze two more times today. The first time I was moving a 2.1 GB file to
another location on the disk, and the second time I was doing the same as in
my previous post. This time I was lucky, as there was actually some output.

First time froze with: "kernel stack segment  [1]"
and the second: "general protection fault "

After the reboot I checked the temperature in the BIOS: 32 Celsius, so it is
not an overheating issue.

I also doubt it is a memory issue, as I'd expect varying symptoms like
programs crashing etc., not just a full system freeze every time. (?)
Thanks to everyone for the help.

--
Rami Saarinen





Re: Debian ADM64 Etch (testing/unstable) system freeze

2006-01-26 Thread Andrew Syrewicze
I wouldn't rule out the possibility of your processor getting too hot. The
Newcastle cores aren't as solid as Venice cores, and I hear they run a
little hotter too. I use a Venice core and I can overclock the crap out of
that thing (with a huge Thermaltake fan on it, of course) :-P.

Anyway, I would start by checking your CPU temp. I would first check in the
BIOS. You might also try installing gkrellm. It's a nice program for system
monitoring. Make sure you have acpi installed as well!

You could also try UNDERclocking your processor, and if none of this works,
try putting in another video card. Worst case, it's your system board (which
I highly doubt).

Good luck,
-Andy