Re: ThinkPad R51 creeping segmentation faults
I finally got back to this ThinkPad R51 stability problem and was able to definitively assign blame to a defective (at least in this box) memory module. Defective memory was suggested on list as a probable cause, so thank you.

I first used the "mem=1G" kernel boot parameter to limit the kernel to the lowest-addressed 1GB of the 2GB installed. When that configuration proved stable, I swapped the memory module in the expansion bay, which I assume holds the higher addresses, with one from another laptop that I have. That configuration proved stable as well. So problem solved for the ThinkPad R51.

Both modules mentioned are by Crucial. Thus far, the module that didn't function reliably in the ThinkPad R51 seems to have no problems in my Compaq NX6110. Of course I'm still tracking that module and have a couple of backup modules on order. The modules on order are not by Crucial.
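For anyone repeating the "mem=1G" experiment, here is a minimal sketch of how one might confirm that the limit is actually in effect after a reboot. The helper name has_mem_limit is made up for illustration; it only checks whether a kernel command line carries a mem= parameter, nothing more.

```shell
# Hypothetical helper (not a standard tool): succeeds if the given kernel
# command line contains a mem= parameter.
has_mem_limit() {
    case " $1 " in
        *" mem="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use, reading the running kernel's command line:
#   has_mem_limit "$(cat /proc/cmdline)" && echo "memory limit active"
#
# To make the limit persistent (as root), one option is to add mem=1G to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, for example
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet mem=1G"
# and then run update-grub and reboot.
```

The "quiet" in the example line is only an assumption about what the existing setting contains; keep whatever options are already there.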
Re: Re: ThinkPad R51 creeping segmentation faults
On Fri, 2015-06-19 at 11:55 -0700, Paul Ausbeck wrote:
> for emacs23, or at least I can't find any. I'm not yet ready to install
> emacs24 because I have some confidence that the problem won't occur with
> emacs24, just as it doesn't occur with my built emacs23. But I'll still
> have the problem so I'm going to keep the native emacs23 for a bit
> longer to see if I can come up with a more general solution.

Oh, my mistake then, you are running wheezy? I guess it didn't have -dbg for emacs back then.

> seem to indicate some significant differences. Just for posterity, does
> anyone have any insight into how one can build the identical Debian
> binary to that installed?

How did you build it, from upstream source? Debian might add patches and use different configuration options. Even so, it's not really a guarantee for an identical binary, and I'm guessing that compiling and stripping debug info (to later load it with gdb) might also interfere.

-- 
Cheers,
Sven Arvidsson
http://www.whiz.se
PGP Key ID 6FAB5CD5
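As a small aid to the "identical binary" question, the sketch below checks whether two files are byte-for-byte the same. The helper name same_binary and the example paths are illustrative assumptions, not anything from the thread. A "different" result says nothing about why the files differ; embedded timestamps, build paths, or toolchain versions could each be enough.

```shell
# Hypothetical helper: report whether two files are byte-for-byte identical.
same_binary() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Example (paths are illustrative only):
#   same_binary /usr/bin/emacs23-x ~/src/emacs/emacs23-x
```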
Re: Re: ThinkPad R51 creeping segmentation faults
I apologize, Sven, for not following up on your suggestion. Or rather for not mentioning my followup in my last post. I did look at the available symbols packages. However, there aren't any symbols available for emacs23, or at least I can't find any. I'm not yet ready to install emacs24 because I have some confidence that the problem won't occur with emacs24, just as it doesn't occur with my built emacs23. But I'll still have the problem so I'm going to keep the native emacs23 for a bit longer to see if I can come up with a more general solution. Regards, Paul Ausbeck -- To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/55846585.3080...@alumni.cse.ucsc.edu
Re: ThinkPad R51 creeping segmentation faults
On Wed, 2015-06-17 at 16:06 -0700, Paul Ausbeck wrote:
> anyone have any insight into how one can build the identical Debian
> binary to that installed?

My previous reply:

"It definitively sounds like a hardware problem, but I just wanted to address the above. Debian have quite a few -dbg packages. For emacs there is emacs24-dbg"

-- 
Cheers,
Sven Arvidsson
http://www.whiz.se
PGP Key ID 6FAB5CD5
Re: ThinkPad R51 creeping segmentation faults
Thanks to everyone who read and/or responded to my query. I've got some additional information that may prompt some additional discussion.

It seems that there is some chance that the problem is due to a RAM fault. I had run memtest86+ before I made the initial posting and hadn't gotten any failure indication. More recently I've run a program called memtester that runs not at boot time, but under linux. It can't test all memory, but about 90%. Linux itself appears to be quite stable on this machine with no other problems after more than four days of uptime and quite a bit of activity on the machine. So I'm not that concerned about the memory that can't be tested as it doesn't seem to be the source of the problem. Anyhow, memtester hasn't found any failures at all thus far.

Running memtester showed that the segmentation fault problem could be cleared by allocating a large block of memory and then freeing it, thereby kind of resetting the system heap and minimizing the memory used for buffers and cached pages. So the problem really isn't that critical any more, in that if it occurs I can just free up system buffers and cached pages:

    echo 3 | sudo tee /proc/sys/vm/drop_caches

and everything just hums along until some other future time, when said incantation can just be done again.

I've also run across a system tool called "pmap" that shows the memory map of an indicated process. As an example here's the memory map of the "ed" editor. Quite a bit smaller than emacs, ~145M, and vi, ~45M.
    pmap 4493
    4493:   ed
    08048000     40K r-x--  /bin/ed
    08052000      4K r----  /bin/ed
    08053000      4K rw---  /bin/ed
    08c86000    132K rw---    [ anon ]
    b749f000   1500K r----  /usr/lib/locale/locale-archive
    b7616000      4K rw---    [ anon ]
    b7617000   1404K r-x--  /lib/i386-linux-gnu/i686/cmov/libc-2.13.so
    b7776000      8K r----  /lib/i386-linux-gnu/i686/cmov/libc-2.13.so
    b7778000      4K rw---  /lib/i386-linux-gnu/i686/cmov/libc-2.13.so
    b7779000     12K rw---    [ anon ]
    b7786000      4K rw---    [ anon ]
    b7787000     28K r--s-  /usr/lib/i386-linux-gnu/gconv/gconv-modules.cache
    b778e000      8K rw---    [ anon ]
    b7790000      4K r-x--    [ anon ]
    b7791000      8K r----    [ anon ]
    b7793000    112K r-x--  /lib/i386-linux-gnu/ld-2.13.so
    b77af000      4K r----  /lib/i386-linux-gnu/ld-2.13.so
    b77b0000      4K rw---  /lib/i386-linux-gnu/ld-2.13.so
    bfe04000    132K rw---    [ stack ]
     total     3416K

I'm thinking that I can use this tool together with the grub BADRAM facility, /etc/default/grub, to maybe find and map out failing memory locations. If anyone has any experience with such things please post any time saving hints that you may have.

It also turns out that building emacs is not that difficult, with the proper incantations:

    sudo aptitude install dpkg-dev
    sudo apt-get build-dep emacs23
    cd ~/src/emacs
    apt-get source emacs23 --compile

But it also turns out that the resulting emacs binary is not the same as the installed binary and indeed does not fault when the installed binary does fault. The sizes of the two different binaries:

    -rwxr-xr-x 1 root  root  6731016 Sep  9  2012 /usr/bin/emacs23-x
    -rwxr-xr-x 1 paula paula 6825224 Jun 16 17:05 emacs23-x

seem to indicate some significant differences. Just for posterity, does anyone have any insight into how one can build the identical Debian binary to that installed?

Paul Ausbeck

Archive: https://lists.debian.org/5581fd8c.5060...@alumni.cse.ucsc.edu
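On the BADRAM idea, a rough sketch of how a single bad location might be turned into a GRUB_BADRAM entry. Everything specific here is an assumption: the helper name badram_entry is invented, the address 0x7ddf0a40 is made up, and the sketch assumes a 4 KiB page size and a shell with 64-bit arithmetic. Note also that GRUB needs a physical address, such as memtest86+ reports, whereas pmap shows per-process virtual addresses, so pmap output alone would not be enough.

```shell
# Hypothetical helper: turn one failing *physical* address into an
# "address,mask" pair covering the 4 KiB page that contains it.
badram_entry() {
    page=$(( $1 & 0xfffff000 ))          # round down to the page boundary
    printf '0x%08x,0xfffff000\n' "$page" # mask selects exactly one page
}

# Example with a made-up address:
#   badram_entry 0x7ddf0a40
# prints
#   0x7ddf0000,0xfffff000
```

The resulting pair would go into /etc/default/grub as GRUB_BADRAM="0x7ddf0000,0xfffff000", followed by update-grub and a reboot. Several pairs can be listed, separated by commas.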
Re: ThinkPad R51 creeping segmentation faults
On Sun, 2015-06-14 at 14:49 -0700, Paul Ausbeck wrote:
> I've looked at
> compiling a debug version of emacs but that isn't trivial, still in
> progress.

It definitively sounds like a hardware problem, but I just wanted to address the above. Debian have quite a few -dbg packages. For emacs there is emacs24-dbg

-- 
Cheers,
Sven Arvidsson
http://www.whiz.se
PGP Key ID 6FAB5CD5
Re: ThinkPad R51 creeping segmentation faults
On 15/06/15 07:52 PM, Bob Proulx wrote:
> Martin Read wrote:
> > Bob Proulx wrote:
> > > In the old days computers would use ECC ram throughout.
> >
> > ECC (in the strict sense) has never been ubiquitous.
>
> At one time every computer I interfaced with had ECC. It was very
> popular with me and everyone else I knew. :-)
>
> > Parity was quite common in certain timeframes, but parity won't stop your
> > system crashing if you get bitflips - it'll just make it crash
> > *immediately*.
>
> Parity would at least provide for better error messages and
> diagnosability.
>
> I just tossed out a bad one year old 4G 204 pin ram just TODAY that
> caused really weird errors on the system. I pulled it and ran
> memtest86 on it in another system and it threw errors on an overnight
> run fortunately confirming the problem.
>
> Bob

At one year old, it probably was still under warranty. I've sent back lots of memory for replacement when it failed prematurely.

Memtest86 can pick up a lot of memory errors but I've also seen memory/disk errors occur, where the memory checks out OK even on overnight runs of memtest and the disk checks out fine, but when you use both together, you get weird errors. Klaus Knopper also reported on this a few years back. I suspect it's why motherboard manufacturers only certify certain RAM with their boards.

Archive: https://lists.debian.org/557f70ee.6050...@torfree.net
Re: ThinkPad R51 creeping segmentation faults
Martin Read wrote:
> Bob Proulx wrote:
> >In the old days computers would use ECC ram throughout.
>
> ECC (in the strict sense) has never been ubiquitous.

At one time every computer I interfaced with had ECC. It was very popular with me and everyone else I knew. :-)

> Parity was quite common in certain timeframes, but parity won't stop your
> system crashing if you get bitflips - it'll just make it crash
> *immediately*.

Parity would at least provide for better error messages and diagnosability.

I just tossed out a bad one year old 4G 204 pin ram just TODAY that caused really weird errors on the system. I pulled it and ran memtest86 on it in another system and it threw errors on an overnight run fortunately confirming the problem.

Bob
Re: ThinkPad R51 creeping segmentation faults
On 14/06/15 23:40, Bob Proulx wrote:
> In the old days computers would use ECC ram throughout.

ECC (in the strict sense) has never been ubiquitous.

Parity was quite common in certain timeframes, but parity won't stop your system crashing if you get bitflips - it'll just make it crash *immediately*.

Archive: https://lists.debian.org/557ef1b4.8000...@zen.co.uk
Re: ThinkPad R51 creeping segmentation faults
Paul Ausbeck wrote:
> I recently replaced the hard disk in my ThinkPad R51 with a solid
> state drive

The ThinkPad R51 is a solid machine. Don't let anyone tell you otherwise.

> The symptom is that as time goes on more and more programs will cause a
> segmentation fault while loading. For instance, emacs commonly is the first
> program to go. Then maybe iceweasel. Just today iceweasel wouldn't load at
> all but then following another suspend/resume cycle it now loads to a point
> where it presents a safe mode dialog but then crashes if the mouse pointer
> is moved over the dialog box.

That sounds very much like a hardware fault. Probably a ram failure. Which is the best type to have because ram is the cheapest to swap out. If it isn't a ram failure then unfortunately it would most likely be a cpu failure. It would be possible to swap the cpu but much more inconvenient. Third likely would be some failure on the motherboard.

The root cause of a segmentation fault that isn't a software bug is that bits are getting flipped. Let's say a pointer to some piece of memory is being accessed but a bit of the pointer value is flipped. That will cause it to access the array out of bounds and cause a segmentation violation. Those will be random because the location of the program is different at different times and bits being flipped could be anywhere.

This is most likely to occur when running programs that use a lot of memory. That is why you are seeing it on Iceweasel, which is a true memory hog, and ahem, my favorite editor Emacs too. Those programs are making the most use of your memory and are therefore the most likely to suffer from flipped bits.

In the old days computers would use ECC ram throughout. The ECC would protect you from these problems. For years however we have suffered under MS quality hardware. It doesn't make financial sense to make hardware more reliable than the OS sold with it and most machines have been sold with MS.
> I've looked around a bit on the internet for similar problems and come up
> short. In fact, this class of problem seems inherently difficult to drive to
> ground, at least with the knowledge that I currently possess. So what I hope
> is that the Debian mailing list can give me some good seeds for new
> knowledge to acquire. In particular I'd be interested in how others might
> have approached similar situations.

I would start by running memtest86+ overnight.

    apt-get install memtest86+

Then reboot into the memtest system and let it run overnight. Hopefully it will indicate a problem. That would be the best result.

> I've tried loading emacs and iceweasel with gdb to get stack
> backtraces.

If random programs are segfaulting then it is very unlikely to be a problem with any of those programs.

> One last specific question that sort of embarrasses me to ask, is
> where should segmentation fault messages be logged?

/var/log/syslog logs all system messages. I always look there. Red Hat calls it /var/log/messages and Debian logs there too. The /var/log/kern.log is for the subset that are kernel messages. To understand the difference look at /etc/rsyslog.conf and see what gets logged in different places. /var/log/syslog contains pretty much everything and the other logs contain more specific things. Mostly.

Do you have mcelog installed? If not then install it.

    apt-get install mcelog

> I've grepped around and there are a few segfault messages from maybe
> a week ago in kern.log.1 and messages.1, but nothing in kern.log or
> messages.log. Perhaps these are still in a memory ring buffer
> somewhere? Is there some sort of tool for viewing user space log
> messages, I mean other than dmesg which doesn't appear to show any
> user space messages?

What I have told you applies to the Wheezy 7 you are running, which uses sysvinit. A lot of flamewar has been spent on the new systemd binary file logging in Jessie 8.
I mention this only to give you a heads up that everything you have previously learned about the system up through Wheezy 7 is all changed in Jessie 8. If you decide to stick with sysvinit then what you learn about /etc/rsyslog.conf applies. If you go with the new systemd journal in Jessie 8 then the entire universe is a different place and you will need to learn it all new for systemd. Just to let you know there was a major change that rolled out with the Jessie 8 release.

Bob
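To see which programs are segfaulting and how often, a sketch along these lines may help. It assumes the usual kernel message shape "prog[pid]: segfault at ..." and plain-text log files; the function name count_segfaults is made up for illustration, and compressed rotations (.gz) would need zgrep instead of grep.

```shell
# Hypothetical helper: count kernel-logged segfaults per program name
# across the given plain-text log files, most frequent first.
count_segfaults() {
    grep -h 'segfault at' "$@" 2>/dev/null |
        sed -n 's/.* \([^ ]*\)\[[0-9]*\]: segfault at.*/\1/p' |
        sort | uniq -c | sort -rn
}

# Example (reading the logs may require root or membership in group adm):
#   count_segfaults /var/log/kern.log /var/log/kern.log.1
```

A list dominated by many unrelated programs would fit the hardware theory better than one where a single program accounts for every entry.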
Re: ThinkPad R51 creeping segmentation faults
On Sun, Jun 14, 2015 at 02:49:22PM -0700, Paul Ausbeck wrote:
> I recently replaced the hard disk in my ThinkPad R51 with a solid state
> drive and when I did so I installed Debian Wheezy LXDE updated with a 3.16
> kernel as one of the boot options. I really am pleased with how the system
> looks and acts except for a curious instability that occurs increasingly
> frequently as uptime and/or suspend/resume cycles increase.

This is a machine from 2004 or so, so 10 years old. Is this running on the original power supply and any of the original memory? It may just be that the machine is nearing end of life.

> The symptom is that as time goes on more and more programs will cause a
> segmentation fault while loading. For instance, emacs commonly is the first
> program to go. Then maybe iceweasel. Just today iceweasel wouldn't load at
> all but then following another suspend/resume cycle it now loads to a point
> where it presents a safe mode dialog but then crashes if the mouse pointer
> is moved over the dialog box.

Top is good to see what's running at any one time.

> The machine has 2GB of dram, an Intel 2200BG wireless card, and an ATI/AMD
> mobility graphics subsystem. I mention the ram to show that it has plenty,
> the 2200BG as its driver will occasionally start using 100% of the CPU and
> must be reset by unloading and reloading ipw2200, and the graphics subsystem
> as this machine is the only machine that I have that contains an ATI/AMD
> graphics subsystem and the first where I've used the open source radeon
> driver. Also, both the ipw2200 driver and radeon driver require binary
> firmware blobs. One other item of interest at least to me, is that I've
> configured the machine with a smaller swap file, 1GB, than the size of
> physical memory. I'm not positive, but this may be the only machine that
> I've personally configured that has a swap file smaller than physical
> memory.

2G and 1G swap may be a touch tight for some programs - stuff has got bigger over the last 10 years. I have a netbook with that sort of memory - but a dying RTC - and it struggles with load.

> This is the eighth machine where I've installed Debian Wheezy or Jessie and
> I've not previously encountered a similar problem. I was getting to think I
> understood linux a bit but now I'm thinking I need a whole new layer of
> debug/diagnostic techniques. The reason that I'm posting about this is that
> I'm reasonably convinced that this is not a symptom of flaky hardware. I've
> checked the system memory with various tools and there is no obvious
> problem. Significantly for me, Windows XP and 7 both run on this system
> without any problems, well no vaguely similar problems. And I've been using
> this machine with Windows XP for more than 10 years. Just as an aside,
> Windows 7 is not really an option on this machine as there is no available
> Radeon Mobility graphics driver, making videos not really playable.
>
> I'm reasonably certain that the problem is not configuration related. I've
> used 3.2, 3.12 and 3.14 kernels on this box and all behave similarly to the
> 3.16 kernel. I've also used Debian Jessie and though segmentation faults are
> not reported, in the same creeping fashion the loader will begin to refuse
> to load certain programs, and though right now I can't remember the exact
> cryptic error the whole problem feels as if it is just a different
> manifestation of the segfault problem on Wheezy.

Memtest run for a significant period might help to flush out problems.

> I've looked around a bit on the internet for similar problems and come up
> short. In fact, this class of problem seems inherently difficult to drive to
> ground, at least with the knowledge that I currently possess. So what I hope
> is that the Debian mailing list can give me some good seeds for new
> knowledge to acquire.
> In particular I'd be interested in how others might
> have approached similar situations. I've tried loading emacs and iceweasel
> with gdb to get stack backtraces. With emacs, absolutely no symbols. With
> iceweasel, a few symbols and it appears that the crash happens during a
> memory free operation. I've looked at compiling a debug version of emacs but
> that isn't trivial, still in progress. The whole exercise has got me
> wondering if there are any other debug/diagnostic options to try before
> recompiling various parts of the system. One last specific question that
> sort of embarrasses me to ask, is where should segmentation fault messages
> be logged? I've grepped around and there are a few segfault messages from
> maybe a week ago in kern.log.1 and messages.1, but nothing in kern.log or
> messages.log. Perhaps these are still in a memory ring buffer somewhere? Is
> there some sort of tool for viewing user space log messages, I mean other
> than dmesg which doesn't appear to show any user space messages?

All the best

AndyC
ThinkPad R51 creeping segmentation faults
I recently replaced the hard disk in my ThinkPad R51 with a solid state drive and when I did so I installed Debian Wheezy LXDE updated with a 3.16 kernel as one of the boot options. I really am pleased with how the system looks and acts except for a curious instability that occurs increasingly frequently as uptime and/or suspend/resume cycles increase.

The symptom is that as time goes on more and more programs will cause a segmentation fault while loading. For instance, emacs commonly is the first program to go. Then maybe iceweasel. Just today iceweasel wouldn't load at all but then following another suspend/resume cycle it now loads to a point where it presents a safe mode dialog but then crashes if the mouse pointer is moved over the dialog box.

The machine has 2GB of dram, an Intel 2200BG wireless card, and an ATI/AMD mobility graphics subsystem. I mention the ram to show that it has plenty, the 2200BG as its driver will occasionally start using 100% of the CPU and must be reset by unloading and reloading ipw2200, and the graphics subsystem as this machine is the only machine that I have that contains an ATI/AMD graphics subsystem and the first where I've used the open source radeon driver. Also, both the ipw2200 driver and radeon driver require binary firmware blobs. One other item of interest at least to me, is that I've configured the machine with a smaller swap file, 1GB, than the size of physical memory. I'm not positive, but this may be the only machine that I've personally configured that has a swap file smaller than physical memory.

This is the eighth machine where I've installed Debian Wheezy or Jessie and I've not previously encountered a similar problem. I was getting to think I understood linux a bit but now I'm thinking I need a whole new layer of debug/diagnostic techniques. The reason that I'm posting about this is that I'm reasonably convinced that this is not a symptom of flaky hardware.
I've checked the system memory with various tools and there is no obvious problem. Significantly for me, Windows XP and 7 both run on this system without any problems, well no vaguely similar problems. And I've been using this machine with Windows XP for more than 10 years. Just as an aside, Windows 7 is not really an option on this machine as there is no available Radeon Mobility graphics driver, making videos not really playable. I'm reasonably certain that the problem is not configuration related. I've used 3.2, 3.12 and 3.14 kernels on this box and all behave similarly to the 3.16 kernel. I've also used Debian Jessie and though segmentation faults are not reported, in the same creeping fashion the loader will begin to refuse to load certain programs, and though right now I can't remember the exact cryptic error the whole problem feels as if it is just a different manifestation of the segfault problem on Wheezy. I've looked around a bit on the internet for similar problems and come up short. In fact, this class of problem seems inherently difficult to drive to ground, at least with the knowledge that I currently possess. So what I hope is that the Debian mailing list can give me some good seeds for new knowledge to acquire. In particular I'd be interested in how others might have approached similar situations. I've tried loading emacs and iceweasel with gdb to get stack backtraces. With emacs, absolutely no symbols. With iceweasel, a few symbols and it appears that the crash happens during a memory free operation. I've looked at compiling a debug version of emacs but that isn't trivial, still in progress. The whole exercise has got me wondering if there are any other debug/diagnostic options to try before recompiling various parts of the system. One last specific question that sort of embarrasses me to ask, is where should segmentation fault messages be logged? 
I've grepped around and there are a few segfault messages from maybe a week ago in kern.log.1 and messages.1, but nothing in kern.log or messages.log. Perhaps these are still in a memory ring buffer somewhere? Is there some sort of tool for viewing user space log messages, I mean other than dmesg which doesn't appear to show any user space messages?

Archive: https://lists.debian.org/557df6e2.6010...@alumni.cse.ucsc.edu