Re: ThinkPad R51 creeping segmentation faults

2015-09-11 Thread Paul Ausbeck
I finally got back to this ThinkPad R51 stability problem and was able 
to definitively assign blame to a defective (at least in this box) 
memory module. Defective memory was suggested on list as a probable 
cause, so thank you. I first used the "mem=1G" kernel boot parameter to 
limit memory used to the lowest addressed 1GB part out of two. When that 
configuration proved stable, I then swapped the memory module in the 
expansion bay, assumed higher address, with one in another laptop that I 
have. That configuration proved stable as well. So problem solved for 
the ThinkPad R51.
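
For anyone repeating the experiment, the "mem=1G" limit can be set through GRUB's configuration rather than typed at each boot. A minimal sketch, assuming a stock Debian /etc/default/grub whose default command line was just "quiet" (that existing value is an assumption; keep whatever options are already there):

```shell
# /etc/default/grub (fragment)
# "quiet" is assumed to be the existing value. mem=1G tells the kernel to
# use only the lowest 1GB of physical memory.
GRUB_CMDLINE_LINUX_DEFAULT="quiet mem=1G"
```

After editing, run "sudo update-grub" and reboot; "free -m" should then report roughly 1GB in total. Remove the parameter again once testing is done.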


Both modules mentioned are by Crucial. Thus far, the module that didn't 
function reliably in the ThinkPad R51 seems to have no problems in my 
Compaq NX6110. Of course I'm still tracking that module and have a 
couple of backup modules on order. The modules on order are not by Crucial.




Re: Re: ThinkPad R51 creeping segmentation faults

2015-06-19 Thread Sven Arvidsson
On Fri, 2015-06-19 at 11:55 -0700, Paul Ausbeck wrote:
> for emacs23, or at least I can't find any. I'm not yet ready to install 
> emacs24 because I have some confidence that the problem won't occur with 
> emacs24, just as it doesn't occur with my built emacs23. But I'll still 
> have the problem so I'm going to keep the native emacs23 for a bit 
> longer to see if I can come up with a more general solution.

Oh, my mistake then, you are running wheezy? 
I guess it didn't have -dbg for emacs back then. 

> seem to indicate some significant differences. Just for posterity, does
> anyone have any insight into how one can build the identical Debian
> binary to that installed?

How did you build it, from upstream source? Debian might add patches
and use different configuration options. 

Even so, it's not really a guarantee of an identical binary, and I'm
guessing that compiling and then stripping the debug info (to load it
later with gdb) might also interfere.

-- 
Cheers,
Sven Arvidsson
http://www.whiz.se
PGP Key ID 6FAB5CD5






Re: Re: ThinkPad R51 creeping segmentation faults

2015-06-19 Thread Paul Ausbeck
I apologize, Sven, for not following up on your suggestion. Or rather 
for not mentioning my followup in my last post. I did look at the 
available symbols packages. However, there aren't any symbols available 
for emacs23, or at least I can't find any. I'm not yet ready to install 
emacs24 because I have some confidence that the problem won't occur with 
emacs24, just as it doesn't occur with my built emacs23. But I'll still 
have the problem so I'm going to keep the native emacs23 for a bit 
longer to see if I can come up with a more general solution.


Regards,

Paul Ausbeck


--
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org

Archive: https://lists.debian.org/55846585.3080...@alumni.cse.ucsc.edu



Re: ThinkPad R51 creeping segmentation faults

2015-06-18 Thread Sven Arvidsson
On Wed, 2015-06-17 at 16:06 -0700, Paul Ausbeck wrote:
> anyone have any insight into how one can build the identical Debian
> binary to that installed?

My previous reply:

"It definitely sounds like a hardware problem, but I just wanted to
address the above. Debian have quite a few -dbg packages. For emacs
there is emacs24-dbg"

-- 
Cheers,
Sven Arvidsson
http://www.whiz.se
PGP Key ID 6FAB5CD5






Re: ThinkPad R51 creeping segmentation faults

2015-06-17 Thread Paul Ausbeck
Thanks to everyone who read and/or responded to my query. I've got some 
additional information that may prompt some additional discussion.


It seems that there is some chance that the problem is due to a RAM 
fault. I had run memtest86+ before I made the initial posting and hadn't 
gotten any failure indication. More recently I've run a program called 
memtester that runs not at boot time, but under Linux. It can't test all 
memory, only about 90%. Linux itself appears to be quite stable on this 
machine, with no other problems after more than four days of uptime and 
quite a bit of activity. So I'm not that concerned about the memory that 
can't be tested, as it doesn't seem to be the source of the problem. 
Anyhow, memtester hasn't found any failures at all thus far.


Running memtester showed that the segmentation fault problem could be 
cleared by allocating a large block of memory and then freeing it, 
thereby kind of resetting the system heap and minimizing the memory used 
for buffers and cached pages. So the problem really isn't that critical 
any more, in that if it occurs I can just free up system buffers and 
cached pages:


echo 3 | sudo tee /proc/sys/vm/drop_caches

and everything just hums along until some other future time, when said 
incantation can just be done again.
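
To see how much memory the buffers and page cache hold before and after that command, the figures can be read from /proc/meminfo. A small sketch; the helper name cached_kb is made up for illustration, and it only sums the Buffers and Cached fields:

```shell
# cached_kb [FILE]: print the sum of the Buffers and Cached fields (in kB)
# from a meminfo-format file. FILE defaults to /proc/meminfo.
cached_kb() {
    awk '$1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}

# usage:
#   cached_kb        # before
#   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
#   cached_kb        # after
```

Running "sync" first lets dirty pages be written out, so that they can be dropped as well.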


I've also run across a system tool called "pmap" that shows the memory 
map of an indicated process. As an example here's the memory map of the 
"ed" editor. Quite a bit smaller than emacs, ~145M, and vi, ~45M.


pmap 4493
4493:   ed
08048000     40K r-x--  /bin/ed
08052000      4K r----  /bin/ed
08053000      4K rw---  /bin/ed
08c86000    132K rw---    [ anon ]
b749f000   1500K r----  /usr/lib/locale/locale-archive
b7616000      4K rw---    [ anon ]
b7617000   1404K r-x--  /lib/i386-linux-gnu/i686/cmov/libc-2.13.so
b7776000      8K r----  /lib/i386-linux-gnu/i686/cmov/libc-2.13.so
b7778000      4K rw---  /lib/i386-linux-gnu/i686/cmov/libc-2.13.so
b7779000     12K rw---    [ anon ]
b7786000      4K rw---    [ anon ]
b7787000     28K r--s-  /usr/lib/i386-linux-gnu/gconv/gconv-modules.cache
b778e000      8K rw---    [ anon ]
b7790000      4K r-x--    [ anon ]
b7791000      8K r----    [ anon ]
b7793000    112K r-x--  /lib/i386-linux-gnu/ld-2.13.so
b77af000      4K r----  /lib/i386-linux-gnu/ld-2.13.so
b77b0000      4K rw---  /lib/i386-linux-gnu/ld-2.13.so
bfe04000    132K rw---    [ stack ]
 total     3416K
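
As a cross-check, the per-mapping sizes can be summed and compared with the total line that pmap prints. A sketch, assuming the default pmap output format shown above (address, size ending in K, permissions, mapping name); pmap_sum is a hypothetical helper name:

```shell
# pmap_sum: read `pmap PID` output on stdin and print the summed size of
# all mappings. The "PID: command" header has no size field and the final
# "total" line is excluded by name, so neither is counted.
pmap_sum() {
    awk '$1 != "total" && $2 ~ /^[0-9]+K$/ {
             sub(/K$/, "", $2)   # strip the unit so the field is numeric
             sum += $2
         }
         END { printf "%dK\n", sum }'
}

# usage: pmap 4493 | pmap_sum
```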

I'm thinking that I can use this tool together with the grub BADRAM 
facility, /etc/default/grub, to maybe find and map out failing memory 
locations. If anyone has any experience with such things please post any 
time saving hints that you may have.
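
On the BADRAM idea, note that pmap reports the virtual addresses of one process, while GRUB_BADRAM takes physical address,mask pairs, so in practice the pairs would more likely come from a memory tester that reports physical addresses. A sketch of how a pair is interpreted, as I understand the GRUB documentation: a page is filtered out when its address agrees with the base on every bit set in the mask. The pair below is purely hypothetical and not a diagnosis of this machine; badram_matches is a made-up helper:

```shell
# /etc/default/grub takes comma-separated address,mask pairs, e.g.
#   GRUB_BADRAM="0x7c000000,0xfc000000"    (hypothetical values)

# badram_matches ADDR BASE MASK: succeed if ADDR agrees with BASE on every
# bit that is set in MASK, i.e. the page at ADDR would be filtered out.
badram_matches() {
    [ $(( $1 & $3 )) -eq $(( $2 & $3 )) ]
}

# usage: badram_matches 0x7c123000 0x7c000000 0xfc000000 && echo filtered
```

With this pair the top six address bits must match, so it would hide the 64MB from 0x7c000000 up to 0x80000000, the top of a 2GB address range. As with any change to /etc/default/grub, "sudo update-grub" is needed afterwards.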


It also turns out that building emacs is not that difficult, with the 
proper incantations:


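# build tools needed to compile Debian source packages
sudo aptitude install dpkg-dev
# build dependencies of the emacs23 source package
sudo apt-get build-dep emacs23
cd ~/src/emacs
# fetch the Debian source package and build it
apt-get source emacs23 --compile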

But it also turns out that the resulting emacs binary is not the same as 
the installed binary and indeed does not fault when the installed binary 
does fault. The sizes of the two different binaries:


-rwxr-xr-x 1 root root 6731016 Sep  9  2012 /usr/bin/emacs23-x
-rwxr-xr-x 1 paula paula 6825224 Jun 16 17:05 emacs23-x

seem to indicate some significant differences. Just for posterity, does 
anyone have any insight into how one can build the identical Debian 
binary to that installed?


Paul Ausbeck



Archive: https://lists.debian.org/5581fd8c.5060...@alumni.cse.ucsc.edu



Re: ThinkPad R51 creeping segmentation faults

2015-06-16 Thread Sven Arvidsson
On Sun, 2015-06-14 at 14:49 -0700, Paul Ausbeck wrote:
> I've looked at
> compiling a debug version of emacs but that isn't trivial, still in 
> progress. 

It definitely sounds like a hardware problem, but I just wanted to
address the above. Debian have quite a few -dbg packages. For emacs
there is emacs24-dbg

-- 
Cheers,
Sven Arvidsson
http://www.whiz.se
PGP Key ID 6FAB5CD5





Re: ThinkPad R51 creeping segmentation faults

2015-06-15 Thread Gary Dale

On 15/06/15 07:52 PM, Bob Proulx wrote:
> Martin Read wrote:
>> Bob Proulx wrote:
>>> In the old days computers would use ECC ram throughout.
>>
>> ECC (in the strict sense) has never been ubiquitous.
>
> At one time every computer I interfaced with had ECC.  It was very
> popular with me and everyone else I knew. :-)
>
>> Parity was quite common in certain timeframes, but parity won't stop your
>> system crashing if you get bitflips - it'll just make it crash
>> *immediately*.
>
> Parity would at least provide for better error messages and
> diagnosability.  I just tossed out a bad one-year-old 4G 204-pin RAM
> module just TODAY that caused really weird errors on the system.  I
> pulled it and ran memtest86 on it in another system, and it threw
> errors on an overnight run, fortunately confirming the problem.
>
> Bob

At one year old, it probably was still under warranty. I've sent back 
lots of memory for replacement when it failed prematurely.


Memtest86 can pick up a lot of memory errors, but I've also seen 
combined memory/disk errors, where the memory checks out OK even on 
overnight runs of memtest and the disk checks out fine, but when you use 
both together you get weird errors. Klaus Knopper also reported on this 
a few years back. I suspect it's why motherboard manufacturers only 
certify certain RAM with their boards.




Archive: https://lists.debian.org/557f70ee.6050...@torfree.net



Re: ThinkPad R51 creeping segmentation faults

2015-06-15 Thread Bob Proulx
Martin Read wrote:
> Bob Proulx wrote:
> >In the old days computers would use ECC ram throughout.
> 
> ECC (in the strict sense) has never been ubiquitous.

At one time every computer I interfaced with had ECC.  It was very
popular with me and everyone else I knew. :-)

> Parity was quite common in certain timeframes, but parity won't stop your
> system crashing if you get bitflips - it'll just make it crash
> *immediately*.

Parity would at least provide for better error messages and
diagnosability.  I just tossed out a bad one-year-old 4G 204-pin RAM
module just TODAY that caused really weird errors on the system.  I
pulled it and ran memtest86 on it in another system, and it threw
errors on an overnight run, fortunately confirming the problem.

Bob




Re: ThinkPad R51 creeping segmentation faults

2015-06-15 Thread Martin Read

On 14/06/15 23:40, Bob Proulx wrote:

> In the old days computers would use ECC ram throughout.


ECC (in the strict sense) has never been ubiquitous.

Parity was quite common in certain timeframes, but parity won't stop 
your system crashing if you get bitflips - it'll just make it crash 
*immediately*.




Archive: https://lists.debian.org/557ef1b4.8000...@zen.co.uk



Re: ThinkPad R51 creeping segmentation faults

2015-06-14 Thread Bob Proulx
Paul Ausbeck wrote:
> I recently replaced the hard disk in my ThinkPad R51 with a solid
> state drive

The ThinkPad R51 is a solid machine.  Don't let anyone tell you
otherwise.

> The symptom is that as time goes on more and more programs will cause a
> segmentation fault while loading. For instance, emacs commonly is the first
> program to go. Then maybe iceweasel. Just today iceweasel wouldn't load at
> all but then following another suspend/resume cycle it now loads to a point
> where it presents a safe mode dialog but then crashes if the mouse pointer
> is moved over the dialog box.

That sounds very much like a hardware fault.  Probably a ram failure.
Which is the best type to have because ram is the cheapest to swap out.

If it isn't a ram failure then unfortunately it would most likely be a
cpu failure.  It would be possible to swap the cpu but much more
inconvenient.  Third likely would be some failure on the motherboard.

The root cause of a segmentation fault that isn't a software bug is
that bits are getting flipped.  Let's say a pointer to some piece of
memory is being accessed but a bit of the pointer value is
flipped.  That will cause it to access the array out of bounds and
cause a segmentation violation.  Those will be random because the
location of the program is different at different times and bits being
flipped could be anywhere.  This is most likely to occur when running
programs that use a lot of memory.  That is why you are seeing it on
Iceweasel, which is a true memory hog, and ahem, my favorite editor
Emacs too.  Those programs are making the most use of your memory and
are therefore the most likely to suffer from flipped bits.
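
To make the flipped-bit picture concrete: a single flipped bit can move a pointer a long way. A sketch using shell arithmetic; flip_bit is a made-up helper, and the address is an arbitrary 32-bit value chosen for illustration:

```shell
# flip_bit ADDR BIT: print ADDR with bit number BIT (0 = least significant)
# inverted, as an 8-digit hex value.
flip_bit() {
    printf '0x%08x\n' $(( $1 ^ (1 << $2) ))
}

flip_bit 0x08c86000 27    # prints 0x00c86000
```

The result lies 128MB below the original address, almost certainly outside whatever object the pointer referred to, which is how one bad bit turns into a segmentation fault.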

In the old days computers would use ECC ram throughout.  The ECC would
protect you from these problems.  For years however we have suffered
under MS quality hardware.  It doesn't make financial sense to make
hardware more reliable than the OS sold with it and most machines have
been sold with MS.

> I've looked around a bit on the internet for similar problems and come up
> short. In fact, this class of problem seems inherently difficult to drive to
> ground, at least with the knowledge that I currently possess. So what I hope
> is that the Debian mailing list can give me some good seeds for new
> knowledge to acquire. In particular I'd be interested in how others might
> have approached similar situations.

I would start by running memtest86+ overnight.

  apt-get install memtest86+

Then rebooting to the memtest system and letting it run overnight.
Hopefully it will indicate a problem.  That would be the best result.

> I've tried loading emacs and iceweasel with gdb to get stack
> backtraces.

If random programs are segfaulting then it is very unlikely to be a
problem with any of those programs.

> One last specific question that sort of embarrasses me to ask, is
> where should segmentation fault messages be logged?

/var/log/syslog logs all system messages.  I always look there.  Red
Hat calls it /var/log/messages, and Debian logs there too.  The
/var/log/kern.log file holds the subset that are kernel messages.

To understand the difference, look at /etc/rsyslog.conf and see what
gets logged in which places.  /var/log/syslog contains pretty much
everything and the other logs contain more specific things.  Mostly.
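
As a quick way to see which programs have been faulting, the kernel's segfault lines can be tallied per program. A sketch, assuming the usual kernel message shape "name[pid]: segfault at ..." and uncompressed log files (older rotations ending in .gz would need zcat first); segfault_tally is a hypothetical helper name:

```shell
# segfault_tally FILE...: count kernel "segfault at" messages per program
# name and list the most frequent first.
segfault_tally() {
    awk '{
             for (i = 2; i < NF; i++)
                 if ($i == "segfault" && $(i + 1) == "at") {
                     name = $(i - 1)                 # e.g. "emacs[4493]:"
                     sub(/\[[0-9]+\]:$/, "", name)   # drop the "[pid]:" suffix
                     count[name]++
                 }
         }
         END { for (n in count) print count[n], n }' "$@" | sort -rn
}

# usage: segfault_tally /var/log/kern.log /var/log/kern.log.1
```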

Do you have mcelog installed?  If not then install it.

  apt-get install mcelog

> I've grepped around and there are a few segfault messages from maybe
> a week ago in kern.log.1 and messages.1, but nothing in kern.log or
> messages.log. Perhaps these are still in a memory ring buffer
> somewhere? Is there some sort of tool for viewing user space log
> messages, I mean other than dmesg which doesn't appear to show any
> user space messages?

What I have told you applies to the Wheezy 7 release you are running,
which uses sysvinit.  A lot of flamewar has been spent on the new
systemd binary file logging in Jessie 8.  I mention this only to give
you a heads up that everything you have previously learned about the
system up through Wheezy 7 has changed in Jessie 8.  If you decide to
stick with sysvinit then what you learn about /etc/rsyslog.conf
applies.  If you go with the new systemd journal in Jessie 8 then the
entire universe is a different place and you will need to learn it all
anew for systemd.  Just to let you know, there was a major change that
rolled out with the Jessie 8 release.

Bob




Re: ThinkPad R51 creeping segmentation faults

2015-06-14 Thread Andrew M.A. Cater
On Sun, Jun 14, 2015 at 02:49:22PM -0700, Paul Ausbeck wrote:
> I recently replaced the hard disk in my ThinkPad R51 with a solid state
> drive and when I did so I installed Debian Wheezy LXDE updated with a 3.16
> kernel as one of the boot options. I really am pleased with how the system
> looks and acts except for a curious instability that occurs increasingly
> frequently as uptime and/or suspend/resume cycles increase.
> 

This is a machine from 2004 or so, so about 10 years old. Is it still running 
on the original power supply and any of the original memory?

It may just be that the machine is nearing end of life.

> The symptom is that as time goes on more and more programs will cause a
> segmentation fault while loading. For instance, emacs commonly is the first
> program to go. Then maybe iceweasel. Just today iceweasel wouldn't load at
> all but then following another suspend/resume cycle it now loads to a point
> where it presents a safe mode dialog but then crashes if the mouse pointer
> is moved over the dialog box.
> 

Top is good to see what's running at any one time.

> The machine has 2GB of dram, an Intel 2200BG wireless card, and an ATI/AMD
> mobility graphics subsystem. I mention the ram to show that it has plenty,
> the 2200BG as its driver will occasionally start using 100% of the CPU and
> must be reset by unloading and reloading ipw2200, and the graphics subsystem
> as this machine is the only machine that I have that contains an ATI/AMD
> graphics subsystem and the first where I've used the open source radeon
> driver. Also, both the ipw2200 driver and radeon driver require binary
> firmware blobs. One other item of interest at least to me, is that I've
> configured the machine with a smaller swap file, 1GB, than the size of
> physical memory. I'm not positive, but this may be the only machine that
> I've personally configured that has a swap file smaller than physical
> memory.
> 

2G and 1G swap may be a touch tight for some programs - stuff has got bigger 
over the last 10 years.  I have a netbook with that sort of memory - but a 
dying RTC - and it struggles with load.

> This is the eighth machine where I've installed Debian Wheezy or Jessie and
> I've not previously encountered a similar problem. I was getting to think I
> understood linux a bit but now I'm thinking I need a whole new layer of
> debug/diagnostic techniques. The reason that I'm posting about this is that
> I'm reasonably convinced that this is not a symptom of flaky hardware. I've
> checked the system memory with various tools and there is no obvious
> problem. Significantly for me, Windows XP and 7 both run on this system
> without any problems, well, no vaguely similar problems. And I've been using
> this machine with Windows XP for more than 10 years. Just as an aside,
> Windows 7 is not really an option on this machine as there is no available
> Radeon Mobility graphics driver, making videos not really playable.
> 
> I'm reasonably certain that the problem is not configuration related. I've
> used 3.2, 3.12 and 3.14 kernels on this box and all behave similarly to the
> 3.16 kernel. I've also used Debian Jessie and though segmentation faults are
> not reported, in the same creeping fashion the loader will begin to refuse
> to load certain programs, and though right now I can't remember the exact
> cryptic error the whole problem feels as if it is just a different
> manifestation of the segfault problem on Wheezy.
> 

Memtest run for a significant period might help to flush out problems.

> I've looked around a bit on the internet for similar problems and come up
> short. In fact, this class of problem seems inherently difficult to drive to
> ground, at least with the knowledge that I currently possess. So what I hope
> is that the Debian mailing list can give me some good seeds for new
> knowledge to acquire. In particular I'd be interested in how others might
> have approached similar situations. I've tried loading emacs and iceweasel
> with gdb to get stack backtraces. With emacs, absolutely no symbols. With
> iceweasel, a few symbols and it appears that the crash happens during a
> memory free operation. I've looked at compiling a debug version of emacs but
> that isn't trivial, still in progress. The whole exercise has got me
> wondering if there are any other debug/diagnostic options to try before
> recompiling various parts of the system. One last specific question that
> sort of embarrasses me to ask, is where should segmentation fault messages
> be logged? I've grepped around and there are a few segfault messages from
> maybe a week ago in kern.log.1 and messages.1, but nothing in kern.log or
> messages.log. Perhaps these are still in a memory ring buffer somewhere? Is
> there some sort of tool for viewing user space log messages, I mean other
> than dmesg which doesn't appear to show any user space messages?
> 
> 

All the best

AndyC


ThinkPad R51 creeping segmentation faults

2015-06-14 Thread Paul Ausbeck
I recently replaced the hard disk in my ThinkPad R51 with a solid state 
drive and when I did so I installed Debian Wheezy LXDE updated with a 
3.16 kernel as one of the boot options. I really am pleased with how the 
system looks and acts except for a curious instability that occurs 
increasingly frequently as uptime and/or suspend/resume cycles increase.


The symptom is that as time goes on more and more programs will cause a 
segmentation fault while loading. For instance, emacs commonly is the 
first program to go. Then maybe iceweasel. Just today iceweasel wouldn't 
load at all but then following another suspend/resume cycle it now loads 
to a point where it presents a safe mode dialog but then crashes if the 
mouse pointer is moved over the dialog box.


The machine has 2GB of dram, an Intel 2200BG wireless card, and an 
ATI/AMD mobility graphics subsystem. I mention the ram to show that it 
has plenty, the 2200BG as its driver will occasionally start using 
100% of the CPU and must be reset by unloading and reloading ipw2200, 
and the graphics subsystem as this machine is the only machine that I 
have that contains an ATI/AMD graphics subsystem and the first where 
I've used the open source radeon driver. Also, both the ipw2200 driver 
and radeon driver require binary firmware blobs. One other item of 
interest at least to me, is that I've configured the machine with a 
smaller swap file, 1GB, than the size of physical memory. I'm not 
positive, but this may be the only machine that I've personally 
configured that has a swap file smaller than physical memory.


This is the eighth machine where I've installed Debian Wheezy or Jessie 
and I've not previously encountered a similar problem. I was getting to 
think I understood linux a bit but now I'm thinking I need a whole new 
layer of debug/diagnostic techniques. The reason that I'm posting about 
this is that I'm reasonably convinced that this is not a symptom of 
flaky hardware. I've checked the system memory with various tools and 
there is no obvious problem. Significantly for me, Windows XP and 7 both 
run on this system without any problems, well, no vaguely similar 
problems. And I've been using this machine with Windows XP for more than 
10 years. Just as an aside, Windows 7 is not really an option on this 
machine as there is no available Radeon Mobility graphics driver, making 
videos not really playable.


I'm reasonably certain that the problem is not configuration related. 
I've used 3.2, 3.12 and 3.14 kernels on this box and all behave 
similarly to the 3.16 kernel. I've also used Debian Jessie and though 
segmentation faults are not reported, in the same creeping fashion the 
loader will begin to refuse to load certain programs, and though right 
now I can't remember the exact cryptic error the whole problem feels as 
if it is just a different manifestation of the segfault problem on Wheezy.


I've looked around a bit on the internet for similar problems and come 
up short. In fact, this class of problem seems inherently difficult to 
drive to ground, at least with the knowledge that I currently possess. 
So what I hope is that the Debian mailing list can give me some good 
seeds for new knowledge to acquire. In particular I'd be interested in 
how others might have approached similar situations. I've tried loading 
emacs and iceweasel with gdb to get stack backtraces. With emacs, 
absolutely no symbols. With iceweasel, a few symbols and it appears that 
the crash happens during a memory free operation. I've looked at 
compiling a debug version of emacs but that isn't trivial, still in 
progress. The whole exercise has got me wondering if there are any other 
debug/diagnostic options to try before recompiling various parts of the 
system. One last specific question that sort of embarrasses me to ask, 
is where should segmentation fault messages be logged? I've grepped 
around and there are a few segfault messages from maybe a week ago in 
kern.log.1 and messages.1, but nothing in kern.log or messages.log. 
Perhaps these are still in a memory ring buffer somewhere? Is there some 
sort of tool for viewing user space log messages, I mean other than 
dmesg which doesn't appear to show any user space messages?




Archive: https://lists.debian.org/557df6e2.6010...@alumni.cse.ucsc.edu