Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
I've also had a problem with signal 11. Here's a great page explaining the signal 11 error from gcc: http://www.bitwizard.nl/sig11/

Signal 11 is usually a hardware problem, as the article points out. I found a sloppy solution by playing with my BIOS settings: there was an option called "Memory Hole at 15Mb Addr." I enabled it and got no more sig11. However, when I boot up, Linux now only recognizes about 13Mb of my 64Mb of RAM.

Anyway, those are my 2 cents.

Luis

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
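Editorial sketch, not part of the original message: one way to check how much RAM the kernel actually sees, as in the 13Mb-of-64Mb report above, is to read the MemTotal field from /proc/meminfo. The helper name `mem_total_kb` and the sample text are made up for illustration, and the `MemTotal:` line layout is assumed from current kernels; older kernels may format the file differently.

```python
def mem_total_kb(meminfo_text):
    """Return the MemTotal value in kB from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Assumed line shape: "MemTotal:   13312 kB"
            return int(line.split()[1])
    return None


# Hypothetical sample, shaped like a machine that only sees 13 MB.
sample = "MemTotal:        13312 kB\nMemFree:          1024 kB\n"
seen_mb = mem_total_kb(sample) / 1024
```

On a real system the text would come from `open("/proc/meminfo").read()`. A value far below the installed RAM suggests the BIOS setting is hiding memory from the kernel.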
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
Riley Williams wrote:
> Hi Peter.
>
> >> Wasn't 2.2.12 the kernel that included the `lock halt` bug patch?
>
> > Perhaps, but it has absolutely nothing to do with the rest of
> > this discussion.
>
> The `lock halt` bug patch was specific to the Cyrix processors (not to
> be confused with the `lock registers` patch for the Intel processors),
> and I noted that the processor in question was a Cyrix one, hence the
> comment.

Oh. Sorry, I don't know about "lock halt" and its effects. However, if it refers to the instruction sequence LOCK HLT, I find it hard to believe it would have the symptoms described.

-hpa
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
Riley Williams wrote:
> Wasn't 2.2.12 the kernel that included the `lock halt` bug patch?

Perhaps, but it has absolutely nothing to do with the rest of this discussion.

-hpa
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
Followup to: <[EMAIL PROTECTED]>
By author: szonyi calin <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel

> Almost always?
> It seems like gcc is THE ONLY program which gets signal 11.
> Why doesn't the X server get signal 11?
> Why don't other programs get signal 11?

gcc happens to be one of the best memory testers known to man -- much better than most other programs. A big reason for that is that it accesses lots of memory in funny patterns, *AND* errors in those accesses are likely to be fatal.

It is just the way it is. gcc getting signal 11 is HIGHLY correlated with the hardware you are running on, which means it's *usually* hardware-related.

> [... Lots of M$ flames ignored ...]
> Some time ago I installed Linux (Redhat 6.0) on my PC (Cx486, 8M RAM)
> and gcc had a lot of signal 11 (a couple every hour). I was upgrading
> the kernel every time there was a new kernel, and from 2.2.12 (or 14)
> no more signal 11 (very rare). Is this still a hardware problem?
> Was it a bug in the kernel?
>
> I think the last answer is more obvious (or gcc had a bug and the
> kernel a workaround).

Most likely your *hardware* had a bug and the new kernel has a workaround (this is quite common), but without more detail it is very hard to know.

-hpa
--
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
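Editorial sketch, not part of the original message: for contrast with gcc's "funny patterns", this is roughly what a naive pattern test looks like. The function name `pattern_test` and the choice of patterns are illustrative assumptions. A user-space loop like this cannot choose physical pages and is largely served from CPU caches, so a clean result proves very little. That is part of why a heavy real workload such as a kernel compile tends to find faults that simple tests miss.

```python
def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill a buffer with each byte pattern, read it back, and return
    a list of (pattern, offset) pairs where the readback differed."""
    buf = bytearray(size_bytes)
    errors = []
    for p in patterns:
        expected = bytes([p]) * size_bytes
        buf[:] = expected                 # write the pattern
        if buf != expected:               # read it back and compare
            errors.extend((p, i) for i, b in enumerate(buf) if b != p)
    return errors


# On healthy hardware this is expected to be empty.
bad_offsets = pattern_test(1 << 20)
```

A dedicated memory tester run outside the operating system is the more reliable check; this sketch only illustrates the idea of write-then-verify.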
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
> Almost always?
> It seems like gcc is THE ONLY program which gets signal 11.
> Why doesn't the X server get signal 11?
> Why don't other programs get signal 11?
...
> Some time ago I installed Linux (Redhat 6.0) on my PC (Cx486, 8M RAM)
> and gcc had a lot of signal 11 (a couple every hour). I was upgrading
> the kernel every time there was a new kernel, and from 2.2.12 (or 14)
> no more signal 11 (very rare).
> Is this still a hardware problem?

It could be. One possible way:

1. your system is clogged with dust
2. gcc runs the CPU hard, generating lots of heat
3. the heat causes crashes
4. you install a new Linux version that sets a Cyrix-specific power-saving mode
5. your heat problems go away, and so do the crashes

Another possible way:

1. you have buggy motherboard or disk hardware
2. when you swap, gcc gets corrupted by the hardware
3. you get a new Linux kernel that has a bug work-around
4. your problems go away

Yet another way:

1. your room is hot, your computer is near a huge motor...
2. you upgrade to Linux 2.2.12 and move your computer
3. soon you realize that the crashes are gone
4. you credit the kernel, but location was the problem
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
- Received message begins Here -

> --- Jesse Pollard <[EMAIL PROTECTED]> wrote:
> > > "This is almost always the result of flakiness in your hardware -
> > > either RAM (most likely), or motherboard (less likely)."
> > >
> > > I cannot understand this. There are many other things that I
> > > compiled with gcc without any problem. Again, compilation is only
> > > an application. It only parses and generates object files. How can
> > > RAM or the motherboard make a difference?
> >
> > It's most likely flaky memory.
> >
> > Remember: a single bit that drops can cause the signal 11. It
> > doesn't have to happen consistently either. I had the same problem
> > until I slowed down memory access (that seemed to cover the
> > borderline chip).
> >
> > The compiler uses different amounts of memory depending on the
> > source file and the number of symbols defined (via include headers).
> > When the multiple passes occur simultaneously, there is higher
> > memory pressure, and more of the free space is used. One of the
> > pages may flake out. Compiling the kernel puts more pressure on
> > memory than compiling most applications.
>
> Almost always?
> It seems like gcc is THE ONLY program which gets signal 11.
> Why doesn't the X server get signal 11?
> Why don't other programs get signal 11?

Load the system down with lots of processes/large image windows. Unless the bit in question is in a pointer, or in data used in pointer arithmetic or a function call, it won't segfault. Applications (if an instruction page gets hit) may get an illegal instruction.

> I remember that once Bill Gates was asked about crashes in Windows
> and he said: It's a hardware problem.
> There was also a joke on that subject:
> Winerr xxx: Hardware problem (it's not our fault, it's
> not, it's not, it's not, it's not...)

Yup - because it crashed VERY frequently when it was obviously a software bug.

> Seems to me like the Micro$oft way of handling problems.
>
> We must agree that gcc is full of bugs (xanim does not run correctly
> if it is compiled with gcc 2.95.3, and other programs which use
> floating point calculations do the same (spice 3f5)).

Generating wrong code is different from a segfault. Currently I'm using egcs-2.91.66 on a 486, without problems. (I don't do floating point on a 486... too slow.)

> Some time ago I installed Linux (Redhat 6.0) on my PC (Cx486, 8M RAM)
> and gcc had a lot of signal 11 (a couple every hour). I was upgrading
> the kernel every time there was a new kernel, and from 2.2.12 (or 14)
> no more signal 11 (very rare).
> Is this still a hardware problem?
> Was it a bug in the kernel?

Not likely - it could just depend on whether all of the available memory was used. If the physical page with the problem doesn't get used very often, the fault won't show up. If the bit in question is not part of a pointer, or used in pointer arithmetic (actually, any operation on addresses), again it won't show up. Wrong, or slightly wrong, results MAY show up.

> I think the last answer is more obvious (or gcc had a bug and the
> kernel a workaround).
>
> Sorry for bothering you, but in every piece of Linux documentation
> signal 11 seems to be __identical__ with hardware problem.
> Bye

Only when it appears in random locations. GCC is a fairly well debugged program and doesn't segfault unless you run out of memory or have flaky memory.

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
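Editorial sketch, not part of the original message: the point about pointers versus plain data can be imitated in a few lines. List indices stand in for pointers and an `IndexError` stands in for the segfault; the names `walk`, `values` and `next_index` are invented for this illustration. It is a toy model of the argument, not a description of how gcc stores its data.

```python
def walk(next_index, values, start=0):
    """Sum the values along a chain of 'pointers' (list indices).
    An index of -1 marks the end of the chain."""
    total, i = 0, start
    while i != -1:
        total += values[i]      # a wild index raises IndexError here
        i = next_index[i]
    return total


values = [10, 20, 30, 40]
next_index = [1, 2, 3, -1]
good = walk(next_index, values)

# Flip one bit in a data word: no crash, just a quietly wrong answer.
bad_values = list(values)
bad_values[2] ^= 1 << 3
silent = walk(next_index, bad_values)

# Flip one bit in a 'pointer': the walk leaves the valid range and dies.
bad_next = list(next_index)
bad_next[1] ^= 1 << 6
try:
    walk(bad_next, values)
    crashed = False
except IndexError:
    crashed = True
```

Here `good` is 100, `silent` is 92, and `crashed` is true. A flipped pointer bit can also land on another valid index, in which case the result is wrong but nothing crashes, which matches the remark that slightly wrong results may show up instead.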
Re: gcc: internal compiler error: program cc1 got fatal signal 11
At 10:20 AM 6/29/01, you wrote:
>Almost always?
>It seems like gcc is THE ONLY program which gets signal 11.
>Why doesn't the X server get signal 11?
>Why don't other programs get signal 11?
>
>I remember that once Bill Gates was asked about crashes in Windows
>and he said: It's a hardware problem.
>There was also a joke on that subject:
>Winerr xxx: Hardware problem (it's not our fault, it's
>not, it's not, it's not, it's not...)
>
>Seems to me like the Micro$oft way of handling problems.
>
>We must agree that gcc is full of bugs (xanim does not run correctly
>if it is compiled with gcc 2.95.3, and other programs which use
>floating point calculations do the same (spice 3f5)).

All versions of gcc have bugs. They generally show up as incorrect complaints about the source code, or as generated code that is less than optimal or flat out wrong. With this kind of bug, if you compile the program twice you'll get the same (buggy) result.

Sig 11 is a bit different. With a compiler bug causing the sig 11, the problem will happen EVERY time you compile the given file - because the compiler is busted. This kind of problem is detected early in the compiler's life cycle and gets fixed.

Then there are the intermittent sig 11 errors. If the software were broken, the sig 11 would happen whenever you do the same thing. Being able to compile a bunch of files, get a sig 11, compile a bunch more, sig 11, a bunch more ... is a sign that the problem isn't the compiler. People's experience over the years has shown that symptoms of this type are caused by (intermittent) hardware problems.

Why does this affect gcc more than other programs? Because gcc uses gazillions of pointers, and bad memory causes bad pointers, which cause sig 11.

Hope this helps.

David

P.S. Years ago, installing OS/2 on an apparently 100% working system would show similar problems. OS/2 was the first widely used 32 bit operating system on Intel hardware. It exercised the hardware differently from DOS, Windows, etc., and flaky memory would make itself known. The usual reaction was "But my system worked fine before OS/2". The response was "different software exercises the hardware differently and may reveal unsuspected problems".

David Relson
Osage Software Systems, Inc.
[EMAIL PROTECTED]
Ann Arbor, MI 48103
www.osagesoftware.com
tel: 734.821.8800
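Editorial sketch, not part of the original message: the "same file, compiled repeatedly" test described above is easy to automate. The function names `classify` and `repeat_command` and the wording of the verdicts are assumptions made for this illustration. It only looks at exit statuses, so an ordinary syntax error will also count as "fails every time".

```python
import subprocess
import sys


def classify(return_codes):
    """Turn exit statuses from repeated identical runs into a rough verdict."""
    failures = sum(1 for rc in return_codes if rc != 0)
    if failures == 0:
        return "no failure seen"
    if failures == len(return_codes):
        return "fails every time: suspect the compiler or the source file"
    return "fails intermittently: suspect hardware (RAM, heat, motherboard)"


def repeat_command(argv, runs=5):
    """Run the same command several times and collect its exit statuses."""
    return [subprocess.run(argv, capture_output=True).returncode
            for _ in range(runs)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 repeat.py gcc -c somefile.c
    codes = repeat_command(sys.argv[1:])
    print(codes, "->", classify(codes))
```

A handful of runs can easily miss a rare fault, so "no failure seen" is weak evidence. A mix of passes and failures on identical input is the informative outcome.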
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
--- Jesse Pollard <[EMAIL PROTECTED]> wrote:
> > "This is almost always the result of flakiness in your hardware -
> > either RAM (most likely), or motherboard (less likely)."
> >
> > I cannot understand this. There are many other things that I
> > compiled with gcc without any problem. Again, compilation is only
> > an application. It only parses and generates object files. How can
> > RAM or the motherboard make a difference?
>
> It's most likely flaky memory.
>
> Remember: a single bit that drops can cause the signal 11. It
> doesn't have to happen consistently either. I had the same problem
> until I slowed down memory access (that seemed to cover the
> borderline chip).
>
> The compiler uses different amounts of memory depending on the
> source file and the number of symbols defined (via include headers).
> When the multiple passes occur simultaneously, there is higher
> memory pressure, and more of the free space is used. One of the
> pages may flake out. Compiling the kernel puts more pressure on
> memory than compiling most applications.
>
> -
> Jesse I Pollard, II
> Email: [EMAIL PROTECTED]
>
> Any opinions expressed are solely my own.

Almost always? It seems like gcc is THE ONLY program which gets signal 11. Why doesn't the X server get signal 11? Why don't other programs get signal 11?

I remember that once Bill Gates was asked about crashes in Windows and he said: It's a hardware problem. There was also a joke on that subject: Winerr xxx: Hardware problem (it's not our fault, it's not, it's not, it's not, it's not...)

Seems to me like the Micro$oft way of handling problems.

We must agree that gcc is full of bugs (xanim does not run correctly if it is compiled with gcc 2.95.3, and other programs which use floating point calculations do the same (spice 3f5)).

Some time ago I installed Linux (Redhat 6.0) on my PC (Cx486, 8M RAM) and gcc had a lot of signal 11 (a couple every hour). I was upgrading the kernel every time there was a new kernel, and from 2.2.12 (or 14) no more signal 11 (very rare). Is this still a hardware problem? Was it a bug in the kernel?

I think the last answer is more obvious (or gcc had a bug and the kernel a workaround).

Sorry for bothering you, but in every piece of Linux documentation signal 11 seems to be __identical__ with hardware problem.

Bye
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
> "This is almost always the result of flakiness in your hardware -
> either RAM (most likely), or motherboard (less likely)."
>
> I cannot understand this. There are many other things that I
> compiled with gcc without any problem. Again, compilation is only
> an application. It only parses and generates object files. How can
> RAM or the motherboard make a difference?

It's most likely flaky memory.

Remember: a single bit that drops can cause the signal 11. It doesn't have to happen consistently either. I had the same problem until I slowed down memory access (that seemed to cover the borderline chip).

The compiler uses different amounts of memory depending on the source file and the number of symbols defined (via include headers). When the multiple passes occur simultaneously, there is higher memory pressure, and more of the free space is used. One of the pages may flake out. Compiling the kernel puts more pressure on memory than compiling most applications.

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
On Thu, Jun 28, 2001 at 11:23:37PM -0600, Blesson Paul wrote:
> "This is almost always the result of flakiness in your hardware -
> either RAM (most likely), or motherboard (less likely)."
>
> I cannot understand this. There are many other things that I
> compiled with gcc without any problem. Again, compilation is only
> an application. It only parses and generates object files. How can
> RAM or the motherboard make a difference?

Please read the complete Sig11 FAQ (http://www.bitwizard.nl/sig11/); your question is discussed in it as well.

Erik

--
J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department
of Electrical Engineering, Faculty of Information Technology and Systems,
Delft University of Technology, PO BOX 5031, 2600 GA Delft, The Netherlands
Phone: +31-15-2783635 Fax: +31-15-2781843 Email: [EMAIL PROTECTED]
WWW: http://www-ict.its.tudelft.nl/~erik/
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
On Thu, Jun 28, 2001 at 11:23:37PM -0600, Blesson Paul wrote: This is almost always the result of flakiness in your hardware - either RAM (most likely), or motherboard (less likely). I cannot understand this. There are many other stuffs that I compiled with gcc without any problem. Again compilation is only a application. It only parse and gernerates object files. How can RAM or motherboard makes different Please read the complete Sig11 FAQ (http://www.bitwizard.nl/sig11/ ), your question is discussed in it as well. Erik -- J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department of Electrical Engineering, Faculty of Information Technology and Systems, Delft University of Technology, PO BOX 5031, 2600 GA Delft, The Netherlands Phone: +31-15-2783635 Fax: +31-15-2781843 Email: [EMAIL PROTECTED] WWW: http://www-ict.its.tudelft.nl/~erik/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
This is almost always the result of flakiness in your hardware - either RAM (most likely), or motherboard (less likely). I cannot understand this. There are many other stuffs that I compiled with gcc without any problem. Again compilation is only a application. It only parse and gernerates object files. How can RAM or motherboard makes different It's most likely flackey memory. Remember- a single bit that dropps can cause the signal 11. It doesn't have to happen consistently either. I had the same problem until I slowed down memory access (that seemd to cover the borderline chip). The compiler uses different amounts of memory depending on the source file, number of symbols defined (via include headers). When the multiple passes occur simultaneously, there is higher memory pressure, and more of the free space used. One of the pages may flake out. Compiling the kernel puts more pressure on memory than compiling most applications. - Jesse I Pollard, II Email: [EMAIL PROTECTED] Any opinions expressed are solely my own. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
--- Jesse Pollard [EMAIL PROTECTED] wrote: This is almost always the result of flakiness in your hardware - either RAM (most likely), or motherboard (less likely). I cannot understand this. There are many other stuffs that I compiled with gcc without any problem. Again compilation is only a application. It only parse and gernerates object files. How can RAM or motherboard makes different It's most likely flackey memory. Remember- a single bit that dropps can cause the signal 11. It doesn't have to happen consistently either. I had the same problem until I slowed down memory access (that seemd to cover the borderline chip). The compiler uses different amounts of memory depending on the source file, number of symbols defined (via include headers). When the multiple passes occur simultaneously, there is higher memory pressure, and more of the free space used. One of the pages may flake out. Compiling the kernel puts more pressure on memory than compiling most applications. - Jesse I Pollard, II Email: [EMAIL PROTECTED] Any opinions expressed are solely my own. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ Almost always ? It seems like gcc is THE ONLY program which gets signal 11 Why the X server doesn't get signal 11 ? Why others programs don't get signal 11 ? I remember that once Bill Gates was asked about crashes in windows and he said: It's a hardware problem. It was also a joke on that subject: Winerr xxx: Hardware problem (it's not our fault, it's not, it's not, it's not, it's not...) Seems to me like Micro$oft way of handling problems. 
We must agree that gcc is full of bugs (xanim does not run corectly if it is compiled with gcc 2.95.3 and other programs which use floating point calculations do the same (spice 3f5)) Some time ago I installed Linux (Redhat 6.0) on my pc (Cx486 8M RAM) and gcc had a lot of signal 11 (a couple every hour) I was upgrading the kernel every time there was a new kernel and from 2.2.12(or 14) no more signal 11 (very rare) Is this still a hardware problem ? Was a bug in kernel ? I think the last answer is more obvious.(or the gcc had a bug and the kernel -- a workaround). Sorry for bothering you but in every piece of linux documentation signal 11 seems to be __identic__ with hardware problem. Bye __ Do You Yahoo!? Get personalized email addresses from Yahoo! Mail http://personal.mail.yahoo.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: gcc: internal compiler error: program cc1 got fatal signal 11
At 10:20 AM 6/29/01, you wrote: Almost always ? It seems like gcc is THE ONLY program which gets signal 11 Why the X server doesn't get signal 11 ? Why others programs don't get signal 11 ? I remember that once Bill Gates was asked about crashes in windows and he said: It's a hardware problem. It was also a joke on that subject: Winerr xxx: Hardware problem (it's not our fault, it's not, it's not, it's not, it's not...) Seems to me like Micro$oft way of handling problems. We must agree that gcc is full of bugs (xanim does not run corectly if it is compiled with gcc 2.95.3 and other programs which use floating point calculations do the same (spice 3f5)) All versions of gcc have bugs. They generally show up as incorrect complaints about the source code, as generated code that is less than optimal or that is flat out wrong. With this kind of bug, if you compile the program twice you'll get the same (buggy) result. Sig 11 is a bit different. With a compiler bug causing the sig 11, the problem will happen EVERY time you compile the given file - because the compiler is busted. This kind of problem is detected early in the compiler's life cycle and gets fixed. Then there are the intermittent sig 11 errors. If the software was broken, the sig 11 would happen whenever you do the same thing. Being able to compile a bunch of files, get a sig 11, compile a bunch more, sig 11, a bunch more ... is a sign that the problem isn't the compiler. Peoples' experience over the years has shown that symptoms of this type are cause by (intermittent) hardware problems. Why does this affect gcc more than other programs? Because gcc uses gazillions of pointers and bad memory causes bad pointers causes sig 11. Hope this helps. David P.S. Years ago, installing OS/2 on an apparently 100% working system would show similar problems. OS/2 was the first widely used 32 bit operating system on Intel hardware. 
It exercised the hardware differently from DOS, Windows, etc and flaky memory would make itself known. The usual reaction was But my system worked fine before OS/2 The response was different software exercises the hardware differently and may reveal unsuspected problems. David Relson Osage Software Systems, Inc. [EMAIL PROTECTED] Ann Arbor, MI 48103 www.osagesoftware.com tel: 734.821.8800 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
- Received message begins Here - --- Jesse Pollard [EMAIL PROTECTED] wrote: This is almost always the result of flakiness in your hardware - either RAM (most likely), or motherboard (less likely). I cannot understand this. There are many other stuffs that I compiled with gcc without any problem. Again compilation is only a application. It only parse and gernerates object files. How can RAM or motherboard makes different It's most likely flackey memory. Remember- a single bit that dropps can cause the signal 11. It doesn't have to happen consistently either. I had the same problem until I slowed down memory access (that seemd to cover the borderline chip). The compiler uses different amounts of memory depending on the source file, number of symbols defined (via include headers). When the multiple passes occur simultaneously, there is higher memory pressure, and more of the free space used. One of the pages may flake out. Compiling the kernel puts more pressure on memory than compiling most applications. - Jesse I Pollard, II Email: [EMAIL PROTECTED] Any opinions expressed are solely my own. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ Almost always ? It seems like gcc is THE ONLY program which gets signal 11 Why the X server doesn't get signal 11 ? Why others programs don't get signal 11 ? Load the system down with lots of processes/large image windows. Unless the bit in question is in a pointer, or data used in pointer arithmetic or function call it won't segfault. Applications (if an instruction page gets hit) may get an illegal instruction. I remember that once Bill Gates was asked about crashes in windows and he said: It's a hardware problem. It was also a joke on that subject: Winerr xxx: Hardware problem (it's not our fault, it's not, it's not, it's not, it's not...) 
Yup - because it crashed VERY frequently when it was obviously a software bug. Seems to me like Micro$oft way of handling problems. We must agree that gcc is full of bugs (xanim does not run corectly if it is compiled with gcc 2.95.3 and other programs which use floating point calculations do the same (spice 3f5)) Generating wrong code is different than a segfault. Currently I'm using egcs-2.91.66 on a 486, without problems. (I don't do floating point on a 486... too slow). Some time ago I installed Linux (Redhat 6.0) on my pc (Cx486 8M RAM) and gcc had a lot of signal 11 (a couple every hour) I was upgrading the kernel every time there was a new kernel and from 2.2.12(or 14) no more signal 11 (very rare) Is this still a hardware problem ? Was a bug in kernel ? Not likely - It could just depend on whether all of available was used. If the physical page with the problem doesn't get used very often, it won't show up. If the bit in question is not part of a pointer, or used in pointer arithmetic, again it won't show up (actually, any operation on addresses). Wrong, or slightly wrong results MAY show up. I think the last answer is more obvious.(or the gcc had a bug and the kernel -- a workaround). Sorry for bothering you but in every piece of linux documentation signal 11 seems to be __identic__ with hardware problem. Bye Only when it appears in random location. GCC is a fairly well debugged program and doesn't segfault unless you run out of memory, or flakey memory. - Jesse I Pollard, II Email: [EMAIL PROTECTED] Any opinions expressed are solely my own. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
> Almost always? It seems like gcc is THE ONLY program which gets signal 11. Why doesn't the X server get signal 11? Why don't other programs get signal 11?
> ...
> Some time ago I installed Linux (Redhat 6.0) on my pc (Cx486, 8M RAM) and gcc had a lot of signal 11 (a couple every hour). I was upgrading the kernel every time there was a new kernel, and from 2.2.12 (or 14) no more signal 11 (very rare). Is this still a hardware problem?

It could be. One possible way:

1. your system is clogged with dust
2. gcc runs the CPU hard, generating lots of heat
3. the heat causes crashes
4. you install a new Linux version that sets a Cyrix-specific power-saving mode
5. your heat problems go away, and so do the crashes

Another possible way:

1. you have buggy motherboard or disk hardware
2. when you swap, gcc gets corrupted by the hardware
3. you get a new Linux kernel that has a bug work-around
4. your problems go away

Yet another way:

1. your room is hot, your computer is near a huge motor...
2. you upgrade to Linux 2.2.12 and move your computer
3. soon you realize that the crashes are gone
4. you credit the kernel, but location was the problem
Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]
> This is almost always the result of flakiness in your hardware - either RAM (most likely), or motherboard (less likely).

I cannot understand this. There are many other things that I compiled with gcc without any problem. Again, compilation is only an application. It only parses and generates object files. How can RAM or motherboard make a difference?
gcc: internal compiler error: program cc1 got fatal signal 11
hi

I am trying to compile the kernel 2.4.5 source code. Presently I have kernel 2.2.14 and Redhat 6.2. I have egcs1.2.2. Now when I compile I get the following error:

    gcc: Internal compiler error: program cc1 got fatal signal 11
    make Error 1
    Leaving directory ...
    ..
    .
    Assembler messages
    Warning: end of file not at end of file: newline inserted
    cpp: output pipe has been closed
    Error: suffix or operands invalid for mov

The confusing part is that when I recompile, the same part where this error occurred compiles perfectly. But after some more compilation, the same error shows up in some other place. The last line of the error output may be different the second time.

My cpu info is given below. I have selected processor i486 in the kernel configuration. Is there any particular choice that should be made to compile the kernel source code?

    processor       : 0
    vendor_id       : AuthenticAMD
    cpu family      : 5
    model           : 8
    model name      : AMD-K6(tm) 3D processor
    stepping        : 12
    cpu MHz         : 400.921117
    fdiv_bug        : no
    hlt_bug         : no
    sep_bug         : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 1
    wp              : yes
    flags           : fpu vme de pse tsc msr mce cx8 sep mtrr pge mmx 3dnow
    bogomips        : 799.54
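The pattern described here (the same file compiles on a retry, and the failure then moves elsewhere) is the one usually read as pointing at hardware rather than at the compiler. A minimal sketch of that rule of thumb follows; the function name, the input format and the file names in the usage note are assumptions made for illustration, and the classification is a heuristic, not a diagnosis.

```python
from collections import Counter
from typing import Optional, Sequence


def classify_failures(failed_at: Sequence[Optional[str]]) -> str:
    """Classify repeated build attempts of the same source tree.

    failed_at holds one entry per attempt: the file where the build died,
    or None if that attempt completed without error.
    """
    failures = [f for f in failed_at if f is not None]
    if not failures:
        return "no failures observed"
    distinct = Counter(failures)
    every_attempt_failed = len(failures) == len(failed_at)
    if len(distinct) == 1 and every_attempt_failed:
        # Same input, same crash site, every time.
        return "reproducible: more consistent with a software bug"
    # Crash site moves around, or a retry of the same file succeeds.
    return "intermittent: more consistent with flaky hardware"
```

For example, `classify_failures(["fs/inode.c", None, "mm/filemap.c"])` (hypothetical file names) reports an intermittent pattern, while three failures in the same file report a reproducible one.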
RE: Is this kernel related (signal 11)?
Hi all, Well, I upgraded my system to glibc 2.2.1 with few problems. Unfortunately, there are no improvements in my stability problems. X still dies. So, I ask again, how can I debug this? How can I determine if this is a kernel problem or not? Thanks, --Rainer
RE: Is this kernel related (signal 11)?
As per Russell King's suggestion, I ran memtest86 on my system for about 12 hours last night. I found no memory errors. Note that the tests did not complete because I had to stop them this morning. I'll continue them tonight. They got through test 9 of 11.

As per David Ford's suggestion, I am looking into upgrading to glibc 2.2.1. Can someone please give hints on doing this? I tried to upgrade to 2.2 a few weeks ago, and after the 'make install' and a reboot my system was very broken; I had to reinstall the RedHat glibc RPM from CD to recover. I found a howto but it seems pretty old. How do other people do this?

I've also done a strace on X. Now what do I do with this 4 MB log file?

Thanks, --Rainer
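One possible way to get something out of a large strace log is to look only at the last few system calls before the signal is delivered. strace marks signal delivery with a line containing `--- SIGSEGV`, so a short script can keep a rolling window and stop there. This is a sketch under that assumption; the function name, the window size and the log file name in the usage note are hypothetical.

```python
from collections import deque
from typing import Iterable, List


def context_before_signal(
    lines: Iterable[str], signal: str = "SIGSEGV", context: int = 20
) -> List[str]:
    """Return the last `context` log lines before the first delivery of `signal`,
    followed by the signal line itself. Returns [] if the signal never appears."""
    recent = deque(maxlen=context)   # rolling window of the most recent lines
    marker = "--- " + signal         # strace prints signal delivery as "--- SIGSEGV ..."
    for raw in lines:
        line = raw.rstrip("\n")
        if marker in line:
            return list(recent) + [line]
        recent.append(line)
    return []
```

Usage would be along the lines of `with open("X.strace") as f: print("\n".join(context_before_signal(f)))`. The calls just before the signal show what the server was doing when it died, which may be more useful to the XFree86 list than the full 4 MB.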
RE: Is this kernel related (signal 11)?
Thanks for all the info, comments below:

First, I ran X in gdb and got the following via 'bt' after X died. This is my first experience with gdb so if I should do anything in particular, please tell me.

#0 0x401addeb in __sigsuspend (set=0xb930) at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
#1 0x80495a4 in startServer ()
#2 0x804922c in main ()
#3 0x401a79cb in __libc_start_main (main=0x8048ee0 <main>, argc=5, argv=0xbacc, init=0x8048a64 <_init>, fini=0x8049a44 <_fini>, rtld_fini=0x4000ae60 <_dl_fini>, stack_end=0xbac4) at ../sysdeps/generic/libc-start.c:92

> David Ford:
>
> Upgrade -past- 2.2, get 2.2.1. 2.2 causes numerous segfaults, notably sendmail and apache stop working.

I'm willing. Are there any good how-tos on doing this without killing your system? The last time I manually upgraded libc was about 5 years ago.

> Russell King:
>
> In answer to the original poster's question, the first step would be to grab a copy of memtest86 (iirc it's a program that is run from floppy disk) and run that on your system. That /should/ (and I stress should there) detect any RAM problems you have.

I'll try this.

> Barry K. Nathan:
>
> Does it always happen when you are moving the mouse over a button or windowbar or some other on-screen object like that?

Nope. If anything I'd say it happens during blitting (scrolling, screen refreshing, etc). Also, I'm not overclocking anything.
Re: oops, signal 11
On Sat, Jan 20, 2001 at 01:46:50PM +0100, [EMAIL PROTECTED] wrote:

> I know that signal 11 with gcc is a sign of bad hardware; however it strikes me that I don't get random oopses - a whole bunch of them is appended.

The compiler tends to hammer harder on the memory than the kernel; this is a sign of the great effort which was taken to optimize the kernel's cache usage.

Ralf
Re: Is this kernel related (signal 11)?
Rainer Mager wrote:

> > Would this be an SMP IA32 box with glibc 2.2? I have two such boxen showing exactly the same behaviour, although I can't reproduce it at will.
>
> Close, it is actually an SMP IA32 box with glibc 2.1.3. But you've now convinced me to not upgrade glibc yet ;-)

Upgrade -past- 2.2, get 2.2.1. 2.2 causes numerous segfaults, notably sendmail and apache stop working.

-d

--
There is a natural aristocracy among men. The grounds of this are virtue and talents. Thomas Jefferson
The good thing about standards is that there are so many to choose from. Andrew S. Tanenbaum
Re: Is this kernel related (signal 11)?
On Mon, 22 Jan 2001, Russell King wrote:

> Evidence: I recently had a bad 128MB SDRAM which *always* failed at byte address 0x220068,

and X is likely to be the biggest process by far on a box, so statistically will be the process that hits this bad byte the most. no?

regards,
--
Paul Jakma [EMAIL PROTECTED] [EMAIL PROTECTED]
PGP5 key: http://www.clubi.ie/jakma/publickey.txt
---
Fortune: The bomb will never go off. I speak as an expert in explosives. -- Admiral William Leahy, U.S. Atomic Bomb Project
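The statistical argument can be put as a back-of-the-envelope estimate. The sketch below assumes 4 KiB pages, a single bad page, and physical pages handed out uniformly at random, which is a simplification; it also assumes the bad page is one that can be given to a user process at all. The 40 MB resident size used for X in the usage note is an invented figure.

```python
PAGE_SIZE = 4096  # bytes; assumed i386 page size


def page_index(byte_address: int) -> int:
    """Physical page number containing the given byte address."""
    return byte_address // PAGE_SIZE


def chance_of_holding_bad_page(resident_pages: int, total_pages: int) -> float:
    """Probability that a process holding `resident_pages` of `total_pages`
    physical pages holds one particular bad page, assuming uniform placement."""
    if total_pages <= 0 or not 0 <= resident_pages <= total_pages:
        raise ValueError("need 0 <= resident_pages <= total_pages and total_pages > 0")
    return resident_pages / total_pages


if __name__ == "__main__":
    total = (128 * 1024 * 1024) // PAGE_SIZE      # 128 MB of RAM
    x_server = (40 * 1024 * 1024) // PAGE_SIZE    # hypothetical 40 MB resident X server
    print("bad byte 0x220068 is in page", page_index(0x220068))
    print("chance X holds that page:", chance_of_holding_bad_page(x_server, total))
```

Under these assumptions the chance simply scales with resident size, so the largest process is the likeliest to trip over a single bad location.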
Re: Is this kernel related (signal 11)?
Rogier Wolff writes:

> Hardware problems are normally not reproducible. Can you attach a debugger to your X server, and catch it when things go bad? (And give the XFree86 people a backtrace)

Bad RAM can be extremely reproducible though, and can certainly produce SEGVs.

Evidence: I recently had a bad 128MB SDRAM which *always* failed at byte address 0x220068, which was the middle of the mem_map array. All I needed to do was 'dd if=/dev/hda of=/dev/null' and the machine would die within 5 minutes due to an invalid buffer_head pointer.

The SDRAM naturally passed each and every single memory test I could throw at it. However, a new SDRAM fixed the problem.

It is quite common for SDRAMs to fail in this way - think about the failure mode. Some of the silicon in the SDRAM is damaged. This isn't going to move about, so it's going to be in a fixed position. A fixed position means a specific set of transistors and gates, and therefore a specific memory location.

In answer to the original poster's question, the first step would be to grab a copy of memtest86 (iirc it's a program that is run from floppy disk) and run that on your system. That /should/ (and I stress should there) detect any RAM problems you have.

--
Russell King ([EMAIL PROTECTED]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
Re: Is this kernel related (signal 11)?
Rainer Mager wrote:

> particular problem still exists. In brief, X windows dies with signal 11. I
[snip]

Does it always happen when you are moving the mouse over a button or windowbar or some other on-screen object like that? Usually, when I have that happen, it's because I'm overclocking the machine too much...

I have no idea if that helps, but I thought I'd go ahead and throw in my two cents, just in case it does.

-Barry K. Nathan <[EMAIL PROTECTED]>
RE: Is this kernel related (signal 11)?
> Would this be an SMP IA32 box with glibc 2.2? I have two such boxen showing exactly the same behaviour, although I can't reproduce it at will.

Close, it is actually an SMP IA32 box with glibc 2.1.3. But you've now convinced me to not upgrade glibc yet ;-)

--Rainer
Re: Is this kernel related (signal 11)?
Rainer Mager wrote:

> that it is likely a hardware or kernel problem. So, my question is, how can I pin point the problem? Is this likely to be a kernel issue?

No, not hardware. No, not kernel. Hardware problems are normally not reproducible.

Can you attach a debugger to your X server, and catch it when things go bad? (And give the XFree86 people a backtrace)

Roger.

--
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.
Re: Is this kernel related (signal 11)?
On Mon, 22 Jan 2001, Rainer Mager wrote:

> I brought up this issue last month and had some response but as of yet my particular problem still exists. In brief, X windows dies with signal 11. I have done quite a bit of testing and this does not seem to be a hardware issue. Also, I have never managed to get a signal 11 error when not running X.

Would this be an SMP IA32 box with glibc 2.2? I have two such boxen showing exactly the same behaviour, although I can't reproduce it at will.

It happens even when I use the same kernel and XFree86 binaries which were working perfectly before the upgrade.

The LDT handling fixes which were added between 2.4.0-prerelease and the real 2.4.0 appeared to make this _slightly_ less frequent, but I still rarely have an X server uptime of more than a few days.

--
dwmw2
Is this kernel related (signal 11)?
Hi all,

I brought up this issue last month and had some response, but as of yet my particular problem still exists. In brief, X windows dies with signal 11. I have done quite a bit of testing and this does not seem to be a hardware issue. Also, I have never managed to get a signal 11 error when not running X. I posted on the XFree86 mailing lists and the consensus there seems to be that it is likely a hardware or kernel problem. So, my question is, how can I pinpoint the problem? Is this likely to be a kernel issue?

Recently I have been able to reproduce the problem reliably in a few ways. First, if I use an app that uses ncurses (like 'make menuconfig' on the Linux kernel) from within Gnome-terminal then X dies instantly. For now I have gone to using only xterm. I can also cause the error from xmms by scrolling the playlist repeatedly. This will happen within a few seconds but not instantly like above. I have also seen the error in other cases but none that I am yet able to reproduce on demand.

PLEASE, any suggestions?

--Rainer
oops, signal 11
I know that signal 11 with gcc is a sign of bad hardware; however it strikes me that I don't get random oopses - a whole bunch of them is appended. I used 2.4.0 with alsa, kmp3player running and an endless loop compiling the kernel. Mirko Kloppstech ksymoops 2.3.7 on i686 2.4.0. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.0/ (default) -m /boot/System.map (specified) Stack: 27ad 08140b88 bfffe03c c125abd8 cff7ea64 0001 0001 cca70824 cca70780 c01245f3 ccaafcc0 ccaafce0 cb09df90 c0124530 ffea ccaafcc0 27ad 1000 17ad 08141b88 Call Trace: [] [] [] [] Code: 39 7b 08 75 f0 8b 74 24 24 39 73 0c 75 e7 53 e8 4d 4d 00 00 Using defaults from ksymoops -t elf32-i386 -a i386 Trace; c01245f3 Trace; c0124530 Trace; c013029e Trace; c0108f27 Code; Before first symbol <_EIP>: Code; Before first symbol 0: 39 7b 08 cmpl %edi,0x8(%ebx) Code; 0003 Before first symbol 3: 75 f0 jnefff5 <_EIP+0xfff5> fff5 Code; 0005 Before first symbol 5: 8b 74 24 24 movl 0x24(%esp,1),%esi Code; 0009 Before first symbol 9: 39 73 0c cmpl %esi,0xc(%ebx) Code; 000c Before first symbol c: 75 e7 jnefff5 <_EIP+0xfff5> fff5 Code; 000e Before first symbol e: 53pushl %ebx Code; 000f Before first symbol f: e8 4d 4d 00 00call 4d61 <_EIP+0x4d61> 4d61 Before first symbol Unable to handle kernel paging request at virtual address 3640 c012414f *pde = Oops: CPU:0 EIP:0010:[] EFLAGS: 00010202 eax: cff4 ebx: 3638 ecx: 0010 edx: cff7ea64 esi: cca70780 edi: cca70824 ebp: 1000 esp: cad1ff40 ds: 0018 es: 0018 ss: 0018 Process cpp (pid: 15018, stackpage=cad1f000) Stack: 27ad 0809ab20 bfffd900 c125abd8 cff7ea64 0001 0001 cca70824 cca70780 c01245f3 cbe42340 cbe42360 cad1ff90 c0124530 ffea cbe42340 27ad 1000 17ad 0809bb20 Call Trace: [] [] [] [] Code: 39 7b 08 75 f0 8b 74 24 24 39 73 0c 75 e7 53 e8 4d 4d 00 00 >>EIP; c012414f<= Trace; c01245f3 Trace; c0124530 Trace; c013029e Trace; c0108f27 Code; c012414f <_EIP>: Code; c012414f<= 0: 39 7b 08 cmpl %edi,0x8(%ebx) <= Code; c0124152 
3: 75 f0 jnefff5 <_EIP+0xfff5> c0124144 Code; c0124154 5: 8b 74 24 24 movl 0x24(%esp,1),%esi Code; c0124158 9: 39 73 0c cmpl %esi,0xc(%ebx) Code; c012415b c: 75 e7 jnefff5 <_EIP+0xfff5> c0124144 Code; c012415d e: 53pushl %ebx Code; c012415e f: e8 4d 4d 00 00call 4d61 <_EIP+0x4d61> c0128eb0 Unable to handle kernel paging request at virtual address 3659 c012414f *pde = Oops: CPU:0 EIP:0010:[] EFLAGS: 00010202 eax: cff4 ebx: 3651 ecx: 0010 edx: cff7ea64 esi: cca70780 edi: cca70824 ebp: 1000 esp: ca31df40 ds: 0018 es: 0018 ss: 0018 Process cpp (pid: 15039, stackpage=ca31d000) Stack: 27ad 08140b88 bfffe03c c125abd8 cff7ea64 0001 0001 cca70824 cca70780 c01245f3 cc5fed40 cc5fed60 ca31df90 c0124530 ffea cc5fed40 27ad 1000 17ad 08141b88 Call Trace: [] [] [] [] Code: 39 7b 08 75 f0 8b 74 24 24 39 73 0c 75 e7 53 e8 4d 4d 00 00 >>EIP; c012414f<= Trace; c01245f3 Trace; c0124530 Trace; c013029e Trace; c0108f27 Code; c012414f <_EIP>: Code; c012414f<= 0: 39 7b 08 cmpl %edi,0x8(%ebx) <= Code; c0124152 3: 75 f0 jnefff5 <_EIP+0xfff5> c0124144 Code; c0124154 5: 8b 74 24 24 movl 0x24(%esp,1),%esi Code; c0124158 9: 39 73 0c cmpl %esi,0xc(%ebx) Code; c012415b c: 75 e7 jnefff5 <_EIP+0xfff5> c0124144 Code; c012415d e: 53pushl %ebx Code; c012415e f: e8 4d 4d 00 00call 4d61 <_EIP+0x4d61> c0128eb0 Unable to handle kernel paging request at virtual address 3663 c012414f *pde = Oops: CPU:0 EIP:0010:[] EFLAGS: 00010202 eax: cff4 ebx: 365b ecx: 0010 edx: cff7ea64 esi: cca70780 edi: cca70824 ebp: 1000 esp: cb09df40 ds: 0018 es: 0018 ss: 0018 Process cpp (pid: 15089, stackpage=cb09d000) Stack: 27ad 0809ab20 bfffd900 c125abd8 cff7ea64 0001
oops, signal 11
I know that signal 11 with gcc is a sign of bad hardware; however it strikes me that I don't get random oopses - a whole bunch of them is appended. I used 2.4.0 with alsa, kmp3player running and an endless loop compiling the kernel.

Mirko Kloppstech

ksymoops 2.3.7 on i686 2.4.0. Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0/ (default)
 -m /boot/System.map (specified)

Stack: 27ad 08140b88 bfffe03c c125abd8 cff7ea64 0001 0001 cca70824 cca70780 c01245f3 ccaafcc0 ccaafce0 cb09df90 c0124530 ffea ccaafcc0 27ad 1000 17ad 08141b88
Call Trace: [c01245f3] [c0124530] [c013029e] [c0108f27]
Code: 39 7b 08 75 f0 8b 74 24 24 39 73 0c 75 e7 53 e8 4d 4d 00 00
Using defaults from ksymoops -t elf32-i386 -a i386

Trace; c01245f3 generic_file_read+63/80
Trace; c0124530 file_read_actor+0/60
Trace; c013029e sys_read+8e/d0
Trace; c0108f27 system_call+33/38
Code; Before first symbol
_EIP:
Code; Before first symbol
 0: 39 7b 08 cmpl %edi,0x8(%ebx)
Code; 0003 Before first symbol
 3: 75 f0 jne fff5 _EIP+0xfff5 fff5 END_OF_CODE+2f7904e2/
Code; 0005 Before first symbol
 5: 8b 74 24 24 movl 0x24(%esp,1),%esi
Code; 0009 Before first symbol
 9: 39 73 0c cmpl %esi,0xc(%ebx)
Code; 000c Before first symbol
 c: 75 e7 jne fff5 _EIP+0xfff5 fff5 END_OF_CODE+2f7904e2/
Code; 000e Before first symbol
 e: 53 pushl %ebx
Code; 000f Before first symbol
 f: e8 4d 4d 00 00 call 4d61 _EIP+0x4d61 4d61 Before first symbol

Unable to handle kernel paging request at virtual address 3640
c012414f
*pde =
Oops:
CPU:0
EIP:0010:[c012414f]
EFLAGS: 00010202
eax: cff4 ebx: 3638 ecx: 0010 edx: cff7ea64
esi: cca70780 edi: cca70824 ebp: 1000 esp: cad1ff40
ds: 0018 es: 0018 ss: 0018
Process cpp (pid: 15018, stackpage=cad1f000)
Stack: 27ad 0809ab20 bfffd900 c125abd8 cff7ea64 0001 0001 cca70824 cca70780 c01245f3 cbe42340 cbe42360 cad1ff90 c0124530 ffea cbe42340 27ad 1000 17ad 0809bb20
Call Trace: [c01245f3] [c0124530] [c013029e] [c0108f27]
Code: 39 7b 08 75 f0 8b 74 24 24 39 73 0c 75 e7 53 e8 4d 4d 00 00

EIP; c012414f do_generic_file_read+1af/590 =
Trace; c01245f3 generic_file_read+63/80
Trace; c0124530 file_read_actor+0/60
Trace; c013029e sys_read+8e/d0
Trace; c0108f27 system_call+33/38
Code; c012414f do_generic_file_read+1af/590
_EIP:
Code; c012414f do_generic_file_read+1af/590 =
 0: 39 7b 08 cmpl %edi,0x8(%ebx) =
Code; c0124152 do_generic_file_read+1b2/590
 3: 75 f0 jne fff5 _EIP+0xfff5 c0124144 do_generic_file_read+1a4/590
Code; c0124154 do_generic_file_read+1b4/590
 5: 8b 74 24 24 movl 0x24(%esp,1),%esi
Code; c0124158 do_generic_file_read+1b8/590
 9: 39 73 0c cmpl %esi,0xc(%ebx)
Code; c012415b do_generic_file_read+1bb/590
 c: 75 e7 jne fff5 _EIP+0xfff5 c0124144 do_generic_file_read+1a4/590
Code; c012415d do_generic_file_read+1bd/590
 e: 53 pushl %ebx
Code; c012415e do_generic_file_read+1be/590
 f: e8 4d 4d 00 00 call 4d61 _EIP+0x4d61 c0128eb0 age_page_up+0/30

Unable to handle kernel paging request at virtual address 3659
c012414f
*pde =
Oops:
CPU:0
EIP:0010:[c012414f]
EFLAGS: 00010202
eax: cff4 ebx: 3651 ecx: 0010 edx: cff7ea64
esi: cca70780 edi: cca70824 ebp: 1000 esp: ca31df40
ds: 0018 es: 0018 ss: 0018
Process cpp (pid: 15039, stackpage=ca31d000)
Stack: 27ad 08140b88 bfffe03c c125abd8 cff7ea64 0001 0001 cca70824 cca70780 c01245f3 cc5fed40 cc5fed60 ca31df90 c0124530 ffea cc5fed40 27ad 1000 17ad 08141b88
Call Trace: [c01245f3] [c0124530] [c013029e] [c0108f27]
Code: 39 7b 08 75 f0 8b 74 24 24 39 73 0c 75 e7 53 e8 4d 4d 00 00

EIP; c012414f do_generic_file_read+1af/590 =
Trace; c01245f3 generic_file_read+63/80
Trace; c0124530 file_read_actor+0/60
Trace; c013029e sys_read+8e/d0
Trace; c0108f27 system_call+33/38
Code; c012414f do_generic_file_read+1af/590
_EIP:
Code; c012414f do_generic_file_read+1af/590 =
 0: 39 7b 08 cmpl %edi,0x8(%ebx) =
Code; c0124152 do_generic_file_read+1b2/590
 3: 75 f0 jne fff5 _EIP+0xfff5 c0124144 do_generic_file_read+1a4/590
Code; c0124154 do_generic_file_read+1b4/590
 5: 8b 74 24 24 movl 0x24(%esp,1),%esi
Code
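A quick consistency check on the two oopses that carry a register dump, offered as a sketch rather than a diagnosis. The faulting instruction is cmpl %edi,0x8(%ebx), so the address it touches is ebx + 8. Assuming the archive dropped leading zeros from the register values (so ebx is 0x00003638 and 0x00003651), the sums reproduce the reported fault addresses 3640 and 3659. The helper name effective_address is made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: effective address of a base+displacement operand
 * such as 0x8(%ebx) on 32-bit x86. Unsigned arithmetic wraps modulo 2^32,
 * which matches how the CPU forms the address.
 */
static uint32_t effective_address(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;
}
```

effective_address(0x3638, 8) gives 0x3640 and effective_address(0x3651, 8) gives 0x3659. That suggests ebx held a small bogus value both times, where the neighbouring registers (esi, edi) hold addresses in the 0xcc...... range.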
Signal 11 - revisited
I was wondering if anyone had any new info or suggestions for the Signal 11 problem. I think I last reported that I had tried 2.4.0-test12 with AGPGart and DRM turned off. This seemed a bit more stable, but X did crash with Signal 11 after about 1.5 days. I'd really appreciate any advice on how to diagnose this.

Thanks,
--Rainer
Re: Signal 11
On Thu, 14 Dec 2000, Linus Torvalds wrote:
> Yes.
>
> And I realize that somebody inside RedHat really wanted to use a snapshot in order to get some C++ code to compile right.
>
> But it at the same time threw C stability out the window, by using a not-very-widely-tested snapshot for a major new release.
>
> Are you seriously saying that you think it was a good trade-off? Or are you just ashamed of admitting that RH did something stupid?

Pardon the poking in here, but I must say I agree. RH did a VERY dumb thing.

> I have a report from a Sony VAIO user that couldn't compile the CVS X at all on his picturebook (and you need to compile the CVS tree in order to get required fixes for the ATI Rage Mobility in that machine). I don't know the details, but they were apparently due to RH 7 issues.

It's not in the X tree or anything, but here's a personal example.

Machine: Dual P3 550
HDD: Dual Ultra2Wide Seagate 18GB
OS: RedHat 7
Compile Target: Linux Kernel 2.2.17
Result with gcc 2.96: Failure (syntax errors in the i386 branch of the arch tree)
Result with compat-egcs-62: Success on the first try.
Re: Signal 11
   Date: Fri, 15 Dec 2000 01:09:29 +0000 (GMT)
   From: Alan Cox <[EMAIL PROTECTED]>

   > > o We tell vendors to build RPMv3 , glibc 2.1.x
   > Curious HOW do you tell vendors??

   When they ask. More usefully Dan Quinlan and most vendors put together a recommended set of things to build with and use. It warns about library pitfalls, kernel changes and what packaging is supported. It is far from perfect and nothing like the LSB goals but it's a start and following it does give you applications that with a bit of care run on everything.

In the interests of making sure everyone understands the history:

The Linux Development Platform Specification (LDPS) was started as a result of an informal evening post-LSB-meeting gathering in June --- to which by the way Red Hat didn't send any representatives(*) --- the discussion at the restaurant started along the lines of "Oh, my *GOD* RedHat is about to do something stupid --- they're releasing Red Hat 7.0 with beta/snapshots of just about every single critical system component except the kernel --- and vendors who fall into the trap developing against Red Hat 7.0 won't work with any other distribution. This is going to be *bad* for Linux."

So yes, the reason why LDPS was formed was to recommend to vendors what they should build and use --- but while Alan gave comments about the LDPS once it was announced that a group of people were working on the LDPS, there is no way that the LDPS could even vaguely be considered a Red Hat initiative. (The LDPS is a separate work group which is part of the FSG, so it is a sister group to the LSB effort.)

- Ted

(*) Ever since Jim Kingdon left Red Hat (he was at VA Linux for a while, and is now at SGI), as far as I know no one at Red Hat is actively participating in the LSB activities --- they haven't sent anyone to the physical LSB meetings, or participated in the bi-weekly phone conferences, or taken work items to help finish the LSB. Alan does participate on the mailing lists, and makes quite helpful comments, but as far as I know that's about the limit of Red Hat's participation in either the LSB or the LDPS specification work. Speaking as someone who has been contributing time and effort to the LSB, it would be great if Red Hat were to become more fully involved in the LSB; I (and I'm sure all the other LSB volunteers) would welcome a greater level of participation by Red Hat.
Re: Signal 11
> > o We tell vendors to build RPMv3 , glibc 2.1.x
> Curious HOW do you tell vendors??

When they ask. More usefully Dan Quinlan and most vendors put together a recommended set of things to build with and use. It warns about library pitfalls, kernel changes and what packaging is supported. It is far from perfect and nothing like the LSB goals but it's a start and following it does give you applications that with a bit of care run on everything.

> > o Vendors not being stupid understand that they have a bigger market share if they do that.
> Ummm.. I remember Oracle's first release... wasn't it JUST redhat??

I believe so, and Adabas was SuSE only, and I doubt either vendor wanted it that way. Both actually ran fine on the other but were not supported.

Alan
Re: Signal 11
Sticking my nose where it doesn't belong...

On Thu, 14 Dec 2000, Alan Cox wrote:
> > Yes, but 2.96 is also binary incompatible with all non-redhat distro's. And since redhat is _the_ distro that commercial entities use to release software for, this was very arguably a bad move.
> o We tell vendors to build RPMv3 , glibc 2.1.x

Curious HOW do you tell vendors??

> o Vendors not being stupid understand that they have a bigger market share if they do that.

Ummm.. I remember Oracle's first release... wasn't it JUST redhat??

--
Michael Peddemors - Senior Consultant Unix Administration - WebSite Hosting Network Services - Programming Wizard Internet Services http://www.wizard.ca Linux Support Specialist - http://www.linuxmagic.com (604) 589-0037 Beautiful British Columbia, Canada
Re: Signal 11
In article <[EMAIL PROTECTED]>, Alan Cox <[EMAIL PROTECTED]> wrote:
>> Yes, but 2.96 is also binary incompatible with all non-redhat distro's. And since redhat is _the_ distro that commercial entities use to release software for, this was very arguably a bad move.
>
>Except you conveniently ignore a few facts

Doesn't everyone. I should have included a smiley with a comment that I was only half-joking. Anyway this is the kernel list, and as such this is becoming off-topic.

Mike.
Re: Signal 11
> Yes, but 2.96 is also binary incompatible with all non-redhat distro's. And since redhat is _the_ distro that commercial entities use to release software for, this was very arguably a bad move.

Except you conveniently ignore a few facts

o Someone else moved to 2.95, not RH. In fact some of us felt 2.95 wasn't fit to ship at the time.
o We tell vendors to build RPMv3, glibc 2.1.x
o Vendors not being stupid understand that they have a bigger market share if they do that.

Alan
Re: Signal 11
I had tons of problems with K6III/450s in ASUS P5A motherboards with various kinds of 128MB SIMMs. There were multiple different symptoms, including plain sig11s on compiles, corrupted input (leading to syntax errors) in compiles, and corrupted input in the buffer cache (same crash over and over, but dd if=/dev/hda of=/dev/null bs=1024k count=128 fixed it). Swapping the memory would sometimes get rid of the problem, but then it would come back weeks to months later.

I saw a bizarre problem once in a Tyan dual-processor PIII/500 box with 2x256MB ECC RAM: one of the ECC RAM sticks was bad, and repeated kernel compiles would hang after about 24 hours. Strange problem, but in troubleshooting it I found that the problem followed this stick of RAM around to different machines. I blamed the RAM but don't understand what the underlying problem was...

On Fri, 8 Dec 2000 [EMAIL PROTECTED] wrote:
> On Thu, 7 Dec 2000, Jeff V. Merkey wrote:
>
> > It's related to some change in 2.4 vs. 2.2. There are other programs affected other than X, SSH also gets spurious signal 11's now and again with 2.4 and glibc <= 2.1 and it does not occur on 2.2.
>
> I've begun to get a bit paranoid about my K6-2 500 box.
>
> Various processes have been getting random signals after heavy CPU usage. Playing an MPEG movie, kernel compile, or even just some small apps compiling sometimes. Just for the record, this isn't an OOM situation, I've watched this box with half its memory free or in buffers left unattended, and suddenly a compile will just die.
>
> I replaced the CPU with a brand new K6-2. Problem remained.
> Next suspect was faulty RAM. Despite having passed a memtest, I swapped out the DIMMs for some known good ones.
> Suspecting cooling problems, I added some case fans.
> Next came a bigger power supply. Still the problems.
> The latest last ditch attempt to make this box stable has been to attach the biggest fan I could find that would fit a socket 7 CPU.
>
> And still the problems are there.
> The only remaining suspect would be a flaky motherboard.
> But then comes the real killer : This box is rock solid under 2.2
>
> *boggle*
>
> I'm not sure exactly when this started, but I think I first noticed it around test5 or so, but didn't suspect the kernel at the time.
>
> I've tried kernels compiled with everything from 2.91.66 when this was a Redhat box, to gcc 2.95.2 (from Debian woody) when I installed debian on it. If this is a compiler bug, it's one that no compiler I've tried seems to be immune from.
>
> regards,
>
> Davej.
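For anyone chasing the same kind of intermittent corruption, a user-space pattern test is one cheap extra data point. The sketch below is an illustration under stated assumptions, not a tool mentioned in this thread: the names pattern_word, fill_pattern, count_mismatches and pattern_test are made up, it only exercises whatever pages the kernel hands the process, and a clean run proves nothing about marginal RAM. A non-zero result, on the other hand, is hard to blame on the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Expected value of word i for a given seed (arbitrary mixing constant). */
static uint32_t pattern_word(size_t i, uint32_t seed)
{
    return ((uint32_t)i * 2654435761u) ^ seed;
}

/* Write the pattern; volatile keeps the stores from being optimised away. */
static void fill_pattern(volatile uint32_t *buf, size_t n, uint32_t seed)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = pattern_word(i, seed);
}

/* Re-derive the pattern and count the words that no longer match. */
static size_t count_mismatches(const volatile uint32_t *buf, size_t n,
                               uint32_t seed)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++)
        if (buf[i] != pattern_word(i, seed))
            bad++;
    return bad;
}

/*
 * Run `passes` fill/verify rounds over n 32-bit words.
 * Returns the total number of mismatching words, or (size_t)-1 if the
 * buffer could not be allocated.
 */
static size_t pattern_test(size_t n, unsigned passes)
{
    uint32_t *buf;
    size_t bad = 0;

    assert(n > 0);
    buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return (size_t)-1;

    for (unsigned p = 0; p < passes; p++) {
        uint32_t seed = 0x5a5a5a5au + p;   /* different pattern each pass */

        fill_pattern(buf, n, seed);
        bad += count_mismatches(buf, n, seed);
    }
    free(buf);
    return bad;
}
```

Called as pattern_test(16u * 1024 * 1024, 100), it sweeps a 64MB buffer 100 times. Running it alongside the kernel-compile loop is closer to the load described above than running it on an idle box.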
Re: Signal 11
In article <[EMAIL PROTECTED]>, Bernhard Rosenkraenzer <[EMAIL PROTECTED]> wrote:
>The same thing is true of *any* gcc release.
>For example, C++-ABI wise, 2.95.x is incompatible BOTH with egcs 1.1.x _and_ the upcoming 3.0 release.

Yes, but 2.96 is also binary incompatible with all non-redhat distros. And since redhat is _the_ distro that commercial entities use to release software for, this was very arguably a bad move.

There's simply no excuse. It's too obvious.

Mike.
Re: Signal 11
On Thu, 14 Dec 2000, Jakub Jelinek wrote:
> On Thu, Dec 14, 2000 at 11:11:28AM -0800, Linus Torvalds wrote:
> > user applications and (b) gcc-2.96 is so broken that it requires special libraries for C++ vtable chunks handling that is different, so the _working_ gcc can only be used with programs that do not need such library support.
>
> Every major g++ release had incompatible libstdc++, even g++ 2.95.2 if bootstrapped under glibc 2.1.x is binary incompatible with g++ 2.95.2 bootstrapped under glibc 2.2.x (libstdc++ uses different soname then; even if we used g++ 2.95.2 we would not have C++ binary compatible with other distributions).

Yes.

And I realize that somebody inside RedHat really wanted to use a snapshot in order to get some C++ code to compile right.

But it at the same time threw C stability out the window, by using a not-very-widely-tested snapshot for a major new release.

Are you seriously saying that you think it was a good trade-off? Or are you just ashamed of admitting that RH did something stupid?

> > compiler to something that works better RSN. It apparently has problems compiling stuff like the CVS snapshots of X etc too (and obviously, anything you compile under gcc-2.96 is not likely to work anywhere else except with the broken libraries).
>
> Can you point to things in X which were actually miscompiled because of bugs in gcc 2.96?

I have a report from a Sony VAIO user that couldn't compile the CVS X at all on his picturebook (and you need to compile the CVS tree in order to get required fixes for the ATI Rage Mobility in that machine). I don't know the details, but they were apparently due to RH 7 issues.

> So far I was aware about X bugs (already fixed in X CVS) which were triggered with -fstrict-aliasing which is now the default while gcc 2.95.2 had -fstrict-aliasing disabled by default.

I hope that's another thing that the gcc people fix by the time they do a _real_ release. Anybody who thinks that "-fstrict-aliasing" being on by default is a good idea is probably a compiler person who hasn't seen real code.

> That is not to say there were not bugs in the gcc we shipped, but the bugs which were reported against it have been fixed already.

That's good. It's even better if you don't play quite as fast-and-loose with your shipping compiler.

Linus
Re: Signal 11
On Thu, Dec 14, 2000 at 11:11:28AM -0800, Linus Torvalds wrote:
> user applications and (b) gcc-2.96 is so broken that it requires special libraries for C++ vtable chunks handling that is different, so the _working_ gcc can only be used with programs that do not need such library support.

Every major g++ release had incompatible libstdc++, even g++ 2.95.2 if bootstrapped under glibc 2.1.x is binary incompatible with g++ 2.95.2 bootstrapped under glibc 2.2.x (libstdc++ uses different soname then; even if we used g++ 2.95.2 we would not have C++ binary compatible with other distributions). This will change once 3.0 is out, but it will still take some time.

> compiler to something that works better RSN. It apparently has problems compiling stuff like the CVS snapshots of X etc too (and obviously, anything you compile under gcc-2.96 is not likely to work anywhere else except with the broken libraries).

Can you point to things in X which were actually miscompiled because of bugs in gcc 2.96? So far I was aware about X bugs (already fixed in X CVS) which were triggered with -fstrict-aliasing which is now the default while gcc 2.95.2 had -fstrict-aliasing disabled by default. That is not to say there were not bugs in the gcc we shipped, but the bugs which were reported against it have been fixed already.

Jakub
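To make the -fstrict-aliasing point concrete, here is a minimal sketch of the kind of type punning at issue. It is a generic illustration, not code taken from XFree86, and the function name float_bits is hypothetical. It assumes float is a 32-bit IEEE 754 type. Reading an object through a pointer to an unrelated type is undefined in ISO C, and a compiler that enables strict aliasing may reorder or drop such accesses; copying the bytes with memcpy is well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * The problematic pattern, shown only as a comment so this file stays
 * free of undefined behaviour:
 *
 *     uint32_t bits = *(uint32_t *)&f;    // float read via uint32_t lvalue
 *
 * The function below gets the same bits without breaking aliasing rules.
 */
static uint32_t float_bits(float f)
{
    uint32_t bits;

    assert(sizeof f == sizeof bits);   /* assumption: float is 32 bits wide */
    memcpy(&bits, &f, sizeof bits);    /* copy the object representation */
    return bits;
}
```

With IEEE 754 floats, float_bits(1.0f) is 0x3f800000. Code that cannot be fixed this way can instead be built with GCC's -fno-strict-aliasing option.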
Re: Signal 11
> If you ask any gcc folks, the main reason they think this was a really stupid thing to do was exactly that the 2.96 thing is incompatible BOTH with the 2.95.x release _and_ the upcoming 3.0 release.

And with egcs 1.1.2. So

egcs is a different format to all others
2.95 is a different format to all others
2.96 is a different format to all others

and 2.96 is a C++ compiler

> gcc-2.95.2 is at least a real release, from a branch that is actively maintained - so a 2.95.3 is likely to happen reasonably soon, fixing as many problems as possible _without_ being incompatible like the snapshots are.

The 2.96 tree is maintained actively. Updates for the Red Hat 7 packages are being worked on and CygnusHat people are working on both that maintenance and on feeding all they find back to the core gcc team. In fact we have sufficient faith in it that we sell packages and support based around that and our preparedness to support it.

> As to X compile problems - neither egcs nor 2.95.2 appears to have any trouble with the CVS tree. Possibly because they got fixed, because, after all, at least those were real releases.

I asked Jakub. He's confused as to your report. As far as he is aware the only X problems in the CVS tree were related to XFree86 source code bugs misusing type punning. If you have a case to look at, Jakub would love to hear about it and fix either X or gcc.

> I'd applaud RedHat for making snapshots available, but they should be marked as SNAPSHOTS, and not as the main compiler with no way to fix the damn problems it causes.

That it was confusing and mistaken by some as an official GNU group release is something we never intended and have already apologised for. It was done without malice or ill intent.

> As it is, anybody doing development is probably better off at RH-6.2. That is doubly true if they intend to release binaries.

We strongly recommend that people use 6.2 for developing binaries for general release unless they have specific requirements for glibc 2.2. That's the same guideline the LSB 'oops we haven't finished yet here is a quickie for now' documentation recommends. Similarly RPM packaging using RPMv3 is recommended.

Alan
Re: Signal 11
On Thu, 14 Dec 2000, Bernhard Rosenkraenzer wrote:
> >
> > gcc-2.95.2 is at least a real release, from a branch that is actively maintained
>
> Not very actively.
> Please take the time to compare the activity in gcc_2_95_branch with the patches in the current "2.96" version in rawhide.

Take a look at the differences in linux-2.2.x and linux-2.3.x.

linux-2.3.x was a h*ll of a lot more "actively maintained".

But nobody really considers that to be an argument for RedHat (or anybody else) to install a 2.3.x kernel by default. Sure, most distributions have a "hacker kernel", but it's NOT installed by default, and it is clearly marked as experimental.

Your arguments make no sense. The compiler is often _more_ important to system stability than the kernel. A "real release" implies that it at least had testing, and that people know what the problem spots tend to be.

Note that the "know what the problem spots tend to be" is important.

> > As to X compile problems - neither egcs nor 2.95.2 appears to have any trouble with the CVS tree.
>
> Neither does 2.96-68.

Good. Maybe you'd make it clearer to everybody who installed from your CD's that they had better upgrade. Pronto.

Linus
Re: Signal 11
On Thu, 14 Dec 2000, Linus Torvalds wrote:
> If you ask any gcc folks, the main reason they think this was a really stupid thing to do was exactly that the 2.96 thing is incompatible BOTH with the 2.95.x release _and_ the upcoming 3.0 release.

The same thing is true of *any* gcc release. For example, C++-ABI wise, 2.95.x is incompatible BOTH with egcs 1.1.x _and_ the upcoming 3.0 release.

> > Like what - gcc 2.5.8 ? The problem is not in general that the snapshot is any buggier than before, but that the bugs are in different places. egcs and gcc295 both caused X compile problems too.
>
> gcc-2.95.2 is at least a real release, from a branch that is actively maintained

Not very actively. Please take the time to compare the activity in gcc_2_95_branch with the patches in the current "2.96" version in rawhide.

> - so a 2.95.3 is likely to happen reasonably soon, fixing as many problems as possible _without_ being incompatible like the snapshots are.

It will be incompatible with any non-2.95.x-version, and I don't think 2.96-68 is any more buggy than the current 2.95 branch. The initial 2.96 "release" did have some odd bugs; all the known ones have been fixed.

> Or just stay at 2.91.66 (egcs).

This may be good for the kernel, but it's not acceptable for C++. Also, there's no support for some of the platforms we have to work with, such as ia64 and S/390 - using different compilers for different architectures isn't a real solution either.

> As to X compile problems - neither egcs nor 2.95.2 appears to have any trouble with the CVS tree.

Neither does 2.96-68.

LLaP
bero
Re: Signal 11
On Thu, 14 Dec 2000, Alan Cox wrote:
>
> > user applications and (b) gcc-2.96 is so broken that it requires special libraries for C++ vtable chunks handling that is different, so the
>
> Wrong - the C++ vtable format change is part of the intended progression of the compiler and needed to meet standards compliance. gcc 295 also changed the internal formats. Unfortunately the gcc295 and 296 formats are both probably not the final format. The compiler folks are not willing to guarantee anything until gcc 3.0, which may actually be out by the time 2.4 is stable.

If you ask any gcc folks, the main reason they think this was a really stupid thing to do was exactly that the 2.96 thing is incompatible BOTH with the 2.95.x release _and_ the upcoming 3.0 release.

Nobody asked the people who knew this, apparently.

> > unusable as a development platform, and I hope RH downgrades their compiler to something that works better RSN. It apparently has problems
>
> Like what - gcc 2.5.8 ? The problem is not in general that the snapshot is any buggier than before, but that the bugs are in different places. egcs and gcc295 both caused X compile problems too.

gcc-2.95.2 is at least a real release, from a branch that is actively maintained - so a 2.95.3 is likely to happen reasonably soon, fixing as many problems as possible _without_ being incompatible like the snapshots are.

Or just stay at 2.91.66 (egcs).

As to X compile problems - neither egcs nor 2.95.2 appears to have any trouble with the CVS tree. Possibly because they got fixed, because, after all, at least those were real releases.

I'd applaud RedHat for making snapshots available, but they should be marked as SNAPSHOTS, and not as the main compiler with no way to fix the damn problems it causes.

As it is, anybody doing development is probably better off at RH-6.2. That is doubly true if they intend to release binaries.

Linus
Re: Signal 11
On Thu, Dec 14, 2000 at 04:42:03AM -0800, Clayton Weaver wrote:
> There has been a thread on the teTeX mailing list the last few days about a (RedHat, but probably more general than just their rpms) gcc-2.9.6 w/glibc-2.2.x bug. At -O2, it can miscompile
>
> unsigned varname; /* "unsigned int varname;" is ok */
>
> (no problem at -O or no optimization at all, and doesn't happen if teTeX is compiled with kgcc).

That one has been fixed for some time already; it was a bug in loop unrolling (the patch is still pending review for the mainline CVS though).

Jakub
Re: Signal 11
> I don't know why RH decided to do their idiotic gcc-2.96 release (it certainly wasn't approved by any technical gcc people - the gcc people

Every single patch in that release barring I believe 2 was accepted into the main tree. So they liked the code. The naming did upset people and was unfortunate, but done talking to the compiler folks at Red Hat with the best of intentions behind it. If we had called it 'Red Hat cc' I think people would have been even more offended at the way they had been discredited.

I do understand why they got peeved, I do understand why they feel no urge to support the 296 codebase (nor would I want them to). I hit 'd' when I see 'I have 2.2.18 patched with [reiserfs|ext3|bigmem|lfs]' for the same reasons.

> They included another (non-broken) compiler, and called it "kgcc". "kgcc" stands for "kernel gcc", apparently because (a) they realised

kgcc is a convention invented a long time ago by Conectiva. Debian also used to have gcc272. It is done because

gcc272 is useless at C++, has lots of bugs
egcs is no better at C++ and has lots of bugs
gcc295 is a little better at C++ and is _Crawling_ with bugs
gcc296(redhat) is a lot better at C++ and doesn't appear to be any buggier.

In fact gcc296 is the first compiler that can compile 2.2.16 correctly. All the previous compilers miscompile the strstr() inline in some cases. That's why I had to hack the 2.2 kernel tree to make it work. (And the cases where you got compile time errors gcc was right to moan about - like using (...) in traditional

> user applications and (b) gcc-2.96 is so broken that it requires special libraries for C++ vtable chunks handling that is different, so the

Wrong - the C++ vtable format change is part of the intended progression of the compiler and needed to meet standards compliance. gcc 295 also changed the internal formats. Unfortunately the gcc295 and 296 formats are both probably not the final format. The compiler folks are not willing to guarantee anything until gcc 3.0, which may actually be out by the time 2.4 is stable.

> unusable as a development platform, and I hope RH downgrades their compiler to something that works better RSN. It apparently has problems

Like what - gcc 2.5.8 ? The problem is not in general that the snapshot is any buggier than before, but that the bugs are in different places. egcs and gcc295 both caused X compile problems too.

I still advise people: Use egcs-1.1.2 for Linux 2.2.x. You can build 2.2.18 with gcc 2.9.6 but I personally wouldn't be running production systems on a kernel built that way - NOT because gcc296 is buggier, but because the bugs are going to be in different places and I firmly believe production system people should let the loons find them ;)

Alan
Re: Signal 11
In article <[EMAIL PROTECTED]>, Clayton Weaver <[EMAIL PROTECTED]> wrote:
>
>There has been a thread on the teTeX mailing list the last few days about a (RedHat, but probably more general than just their rpms) gcc-2.9.6 w/glibc-2.2.x bug. At -O2, it can miscompile

Quite frankly, anybody who uses RedHat 7.0 and their broken compiler for _anything_ is going to have trouble.

I don't know why RH decided to do their idiotic gcc-2.96 release (it certainly wasn't approved by any technical gcc people - the gcc people were upset about it too), and I find it even more surprising that they apparently KNEW that the compiler they were using was completely broken. They included another (non-broken) compiler, and called it "kgcc".

"kgcc" stands for "kernel gcc", apparently because (a) they realised that a miscompiled kernel is even worse than miscompiling some random user applications and (b) gcc-2.96 is so broken that it requires special libraries for C++ vtable chunks handling that is different, so the _working_ gcc can only be used with programs that do not need such library support. Namely the kernel.

In case it wasn't obvious yet, I consider RedHat-7.0 to be basically unusable as a development platform, and I hope RH downgrades their compiler to something that works better RSN. It apparently has problems compiling stuff like the CVS snapshots of X etc too (and obviously, anything you compile under gcc-2.96 is not likely to work anywhere else except with the broken libraries).

Linus
Re: Signal 11
This is unrelated to the signal 11 problem, but it is something to consider for "random crashes and segfaults", i.e. are you using this compiler and glibc version combination?

There has been a thread on the teTeX mailing list the last few days about a (RedHat, but probably more general than just their rpms) gcc-2.96 w/glibc-2.2.x bug. At -O2, it can miscompile

    unsigned varname; /* "unsigned int varname;" is ok */

(no problem at -O or no optimization at all, and doesn't happen if teTeX is compiled with kgcc). It showed up in the kpathsea library (which began to split paths on '-' as well as '/' after a user upgraded compiler and libc and recompiled teTeX).

Regards,
Clayton Weaver <mailto:[EMAIL PROTECTED]> (Seattle)

"Everybody's ignorant, just in different subjects." Will Rogers
Re: Signal 11
On Thu, 14 Dec 2000, Alan Cox wrote:
> > user applications and (b) gcc-2.96 is so broken that it requires special
> > libraries for C++ vtable chunks handling that is different, so the
>
> Wrong - the C++ vtable format change is part of the intended progression
> of the compiler and needed to meet standards compliance. gcc 295 also
> changed the internal formats. Unfortunately the gcc295 and 296 formats are
> both probably not the final format. The compiler folks are not willing to
> guarantee anything until gcc 3.0, which may actually be out by the time
> 2.4 is stable.

If you ask any gcc folks, the main reason they think this was a really stupid thing to do was exactly that the 2.96 thing is incompatible BOTH with the 2.95.x release _and_ the upcoming 3.0 release. Nobody asked the people who knew this, apparently.

> > unusable as a development platform, and I hope RH downgrades their
> > compiler to something that works better RSN. It apparently has problems
>
> Like what - gcc 2.5.8 ? The problem is not in general that the snapshot is
> any buggier than before, but that the bugs are in different places. egcs
> and gcc295 both caused X compile problems too.

gcc-2.95.2 is at least a real release, from a branch that is actively maintained - so a 2.95.3 is likely to happen reasonably soon, fixing as many problems as possible _without_ being incompatible like the snapshots are. Or just stay at 2.91.66 (egcs).

As to X compile problems - neither egcs nor 2.95.2 appears to have any trouble with the CVS tree. Possibly because they got fixed, because, after all, at least those were real releases.

I'd applaud RedHat for making snapshots available, but they should be marked as SNAPSHOTS, and not as the main compiler with no way to fix the damn problems it causes.

As it is, anybody doing development is probably better off at RH-6.2. That is doubly true if they intend to release binaries.

Linus
Re: Signal 11
On Thu, Dec 14, 2000 at 04:42:03AM -0800, Clayton Weaver wrote:
> There has been a thread on the teTeX mailing list the last few days
> about a (RedHat, but probably more general than just their rpms)
> gcc-2.96 w/glibc-2.2.x bug. At -O2, it can miscompile
>
> unsigned varname; /* "unsigned int varname;" is ok */
>
> (no problem at -O or no optimization at all, and doesn't happen if teTeX
> is compiled with kgcc).

That one has been fixed for some time already; it was a bug in loop unrolling (the patch is still pending review for the mainline CVS, though).

Jakub
Re: Signal 11
On Thu, 14 Dec 2000, Linus Torvalds wrote:
> If you ask any gcc folks, the main reason they think this was a really
> stupid thing to do was exactly that the 2.96 thing is incompatible BOTH
> with the 2.95.x release _and_ the upcoming 3.0 release.

The same thing is true of *any* gcc release. For example, C++-ABI wise, 2.95.x is incompatible BOTH with egcs 1.1.x _and_ the upcoming 3.0 release.

> > Like what - gcc 2.5.8 ? The problem is not in general that the snapshot
> > is any buggier than before, but that the bugs are in different places.
> > egcs and gcc295 both caused X compile problems too.
>
> gcc-2.95.2 is at least a real release, from a branch that is actively
> maintained

Not very actively. Please take the time to compare the activity in gcc_2_95_branch with the patches in the current "2.96" version in rawhide.

> - so a 2.95.3 is likely to happen reasonably soon, fixing as many problems
> as possible _without_ being incompatible like the snapshots are.

It will be incompatible with any non-2.95.x-version, and I don't think 2.96-68 is any more buggy than the current 2.95 branch. The initial 2.96 "release" did have some odd bugs; all the known ones have been fixed.

> Or just stay at 2.91.66 (egcs).

This may be good for the kernel, but it's not acceptable for C++. Also, there's no support for some of the platforms we have to work with, such as ia64 and S/390 - using different compilers for different architectures isn't a real solution either.

> As to X compile problems - neither egcs nor 2.95.2 appears to have any
> trouble with the CVS tree.

Neither does 2.96-68.

LLaP
bero
Re: Signal 11
> If you ask any gcc folks, the main reason they think this was a really
> stupid thing to do was exactly that the 2.96 thing is incompatible BOTH
> with the 2.95.x release _and_ the upcoming 3.0 release.

And with egcs 1.1.2. So

egcs is a different format to all others
2.95 is a different format to all others
2.96 is a different format to all others and 2.96 is a C++ compiler

> gcc-2.95.2 is at least a real release, from a branch that is actively
> maintained - so a 2.95.3 is likely to happen reasonably soon, fixing as
> many problems as possible _without_ being incompatible like the snapshots
> are.

The 2.96 tree is maintained actively. Updates for the Red Hat 7 packages are being worked on, and CygnusHat people are working on both that maintenance and on feeding all they find back to the core gcc team. In fact we have sufficient faith in it that we sell packages and support based around that and our preparedness to support it.

> As to X compile problems - neither egcs nor 2.95.2 appears to have any
> trouble with the CVS tree. Possibly because they got fixed, because, after
> all, at least those were real releases.

I asked Jakub. He's confused as to your report. As far as he is aware, the only X problems in the CVS tree were related to XFree86 source code bugs misusing type punning. If you have a case to look at, Jakub would love to hear about it and fix either X or gcc.

> I'd applaud RedHat for making snapshots available, but they should be
> marked as SNAPSHOTS, and not as the main compiler with no way to fix the
> damn problems it causes.

That it was confusing and mistaken by some as an official GNU group release is something we never intended and have already apologised for. It was done without malice or ill intent.

> As it is, anybody doing development is probably better off at RH-6.2. That
> is doubly true if they intend to release binaries.

We strongly recommend that people use 6.2 for developing binaries for general release unless they have specific requirements for glibc 2.2. Those are the same guidelines the LSB 'oops we haven't finished yet here is a quickie for now' documentation recommends. Similarly, RPM packaging using RPMv3 is recommended.

Alan
Re: Signal 11
On Thu, Dec 14, 2000 at 11:11:28AM -0800, Linus Torvalds wrote:
> user applications and (b) gcc-2.96 is so broken that it requires special
> libraries for C++ vtable chunks handling that is different, so the
> _working_ gcc can only be used with programs that do not need such library
> support.

Every major g++ release had an incompatible libstdc++. Even g++ 2.95.2 bootstrapped under glibc 2.1.x is binary incompatible with g++ 2.95.2 bootstrapped under glibc 2.2.x (libstdc++ uses a different soname then; even if we used g++ 2.95.2 we would not be C++ binary compatible with other distributions). This will change once 3.0 is out, but it will still take some time.

> compiler to something that works better RSN. It apparently has problems
> compiling stuff like the CVS snapshots of X etc too (and obviously,
> anything you compile under gcc-2.96 is not likely to work anywhere else
> except with the broken libraries).

Can you point to things in X which were actually miscompiled because of bugs in gcc 2.96? So far I was only aware of X bugs (already fixed in X CVS) which were triggered by -fstrict-aliasing, which is now the default, while gcc 2.95.2 had -fstrict-aliasing disabled by default.

That is not to say there were no bugs in the gcc we shipped, but the bugs which were reported against it have been fixed already.

Jakub
Re: Signal 11
On Thu, 14 Dec 2000, Jakub Jelinek wrote:
> On Thu, Dec 14, 2000 at 11:11:28AM -0800, Linus Torvalds wrote:
> > user applications and (b) gcc-2.96 is so broken that it requires special
> > libraries for C++ vtable chunks handling that is different, so the
> > _working_ gcc can only be used with programs that do not need such
> > library support.
>
> Every major g++ release had incompatible libstdc++, even g++ 2.95.2 if
> bootstrapped under glibc 2.1.x is binary incompatible with g++ 2.95.2
> bootstrapped under glibc 2.2.x (libstdc++ uses different soname then; even
> if we used g++ 2.95.2 we would not have C++ binary compatible with other
> distributions).

Yes. And I realize that somebody inside RedHat really wanted to use a snapshot in order to get some C++ code to compile right.

But it at the same time threw C stability out the window, by using a not-very-widely-tested snapshot for a major new release.

Are you seriously saying that you think it was a good trade-off? Or are you just ashamed of admitting that RH did something stupid?

> > compiler to something that works better RSN. It apparently has problems
> > compiling stuff like the CVS snapshots of X etc too (and obviously,
> > anything you compile under gcc-2.96 is not likely to work anywhere else
> > except with the broken libraries).
>
> Can you point to things in X which were actually miscompiled because of
> bugs in gcc 2.96?

I have a report from a Sony VAIO user who couldn't compile the CVS X at all on his picturebook (and you need to compile the CVS tree in order to get required fixes for the ATI Rage Mobility in that machine). I don't know the details, but they were apparently due to RH 7 issues.

> So far I was aware about X bugs (already fixed in X CVS) which were
> triggered with -fstrict-aliasing which is now the default while gcc 2.95.2
> had -fstrict-aliasing disabled by default.

I hope that's another thing that the gcc people fix by the time they do a _real_ release. Anybody who thinks that "-fstrict-aliasing" being on by default is a good idea is probably a compiler person who hasn't seen real code.

> That is not to say there were not bugs in the gcc we shipped, but the bugs
> which were reported against it have been fixed already.

That's good. It's even better if you don't play quite as fast-and-loose with your shipping compiler.

Linus
Re: Signal 11
In article [EMAIL PROTECTED], Bernhard Rosenkraenzer [EMAIL PROTECTED] wrote:
> The same thing is true of *any* gcc release. For example, C++-ABI wise,
> 2.95.x is incompatible BOTH with egcs 1.1.x _and_ the upcoming 3.0 release.

Yes, but 2.96 is also binary incompatible with all non-redhat distros. And since redhat is _the_ distro that commercial entities use to release software for, this was very arguably a bad move.

There's simply no excuse. It's too obvious.

Mike.
Re: Signal 11
I had tons of problems with K6III/450s in ASUS P5A motherboards with various kinds of 128MB SIMMs. There were multiple different symptoms, including just sig11s on compiles, corrupted input (leading to syntax errors) in compiles, and corrupted input in the buffer cache (same crash over and over, but dd if=/dev/hda of=/dev/null bs=1024k count=128 fixed it). Swapping the memory would sometimes get rid of the problem, but then it would come back weeks or months later.

I saw a bizarre problem once in a Tyan dual proc PIII/500 box with 2x256MB ECC RAM: one of the ECC RAM sticks was bad, and repeated kernel compiles would hang after about 24 hours. Strange problem, but in troubleshooting it I found that the problem followed this stick of RAM around to different machines. Blamed the RAM but don't understand what the underlying problem was...

On Fri, 8 Dec 2000 [EMAIL PROTECTED] wrote:
> On Thu, 7 Dec 2000, Jeff V. Merkey wrote:
> > It's related to some change in 2.4 vs. 2.2. There are other programs
> > affected other than X, SSH also gets spurious signal 11's now and again
> > with 2.4 and glibc = 2.1 and it does not occur on 2.2.
>
> AOL
>
> I've begun to get a bit paranoid about my K6-2 500 box. Various processes
> have been getting random signals after heavy CPU usage: playing an MPEG
> movie, a kernel compile, or even just some small apps compiling sometimes.
> Just for the record, this isn't an OOM situation. I've watched this box
> with half its memory free or in buffers left unattended, and suddenly a
> compile will just die.
>
> I replaced the CPU with a brand new K6-2. Problem remained. Next suspect
> was faulty RAM. Despite having passed a memtest, I swapped out the DIMMs
> for some known good ones. Suspecting cooling problems, I added some case
> fans. Next came a bigger power supply. Still the problems. The latest last
> ditch attempt to make this box stable has been to attach the biggest fan I
> could find that would fit a socket 7 CPU. And still the problems are
> there.
>
> The only remaining suspect would be a flaky motherboard. But then comes
> the real killer: this box is rock solid under 2.2 *boggle*
>
> I'm not sure exactly when this started, but I think I first noticed it
> around test5 or so, but didn't suspect the kernel at the time.
>
> I've tried kernels compiled with everything from 2.91.66 when this was a
> Redhat box, to gcc 2.95.2 (from Debian woody) when I installed debian on
> it. If this is a compiler bug, it's one that no compiler I've tried seems
> to be immune from.
>
> regards,
> Davej.
Re: Signal 11
> Yes, but 2.96 is also binary incompatible with all non-redhat distros.
> And since redhat is _the_ distro that commercial entities use to release
> software for, this was very arguably a bad move.

Except you conveniently ignore a few facts

o Someone else moved to 2.95, not RH. In fact some of us felt 2.95 wasn't fit to ship at the time.
o We tell vendors to build RPMv3, glibc 2.1.x
o Vendors, not being stupid, understand that they have a bigger market share if they do that.

Alan
Re: Signal 11
In article [EMAIL PROTECTED], Alan Cox [EMAIL PROTECTED] wrote:
> > Yes, but 2.96 is also binary incompatible with all non-redhat distros.
> > And since redhat is _the_ distro that commercial entities use to release
> > software for, this was very arguably a bad move.
>
> Except you conveniently ignore a few facts

Doesn't everyone. I should have included a smiley, with a comment that I was only half-joking. Anyway, this is the kernel list, and as such this is becoming off-topic.

Mike.
Re: Signal 11
Sticking my nose where it doesn't belong...

On Thu, 14 Dec 2000, Alan Cox wrote:
> > Yes, but 2.96 is also binary incompatible with all non-redhat distros.
> > And since redhat is _the_ distro that commercial entities use to release
> > software for, this was very arguably a bad move.
>
> o We tell vendors to build RPMv3 , glibc 2.1.x

Curious, HOW do you tell vendors??

> o Vendors not being stupid understand that they have a bigger market share
> if they do that.

Ummm.. I remember Oracle's first release... wasn't it JUST redhat??

--
Michael Peddemors - Senior Consultant
Unix Administration - WebSite Hosting
Network Services - Programming
Wizard Internet Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
(604) 589-0037 Beautiful British Columbia, Canada
Re: Signal 11
> > o We tell vendors to build RPMv3 , glibc 2.1.x
>
> Curious HOW do you tell vendors??

When they ask. More usefully, Dan Quinlan and most vendors put together a recommended set of things to build with and use. It warns about library pitfalls, kernel changes and what packaging is supported. It is far from perfect and nothing like the LSB goals, but it's a start, and following it does give you applications that, with a bit of care, run on everything.

> > o Vendors not being stupid understand that they have a bigger market
> > share if they do that.
>
> Ummm.. I remember Oracle's first release... wasn't it JUST redhat??

I believe so, and Adabas was SuSE only, and I doubt either vendor wanted it that way. Both actually ran fine on the other but were not supported.

Alan
Re: Signal 11 - the continuing saga
On Wed, 13 Dec 2000, Linus Torvalds wrote:
> On Wed, 13 Dec 2000, Linus Torvalds wrote:
> >
> > Hint: "ptep_mkdirty()".

rather obvious oopsie.. once spotted.

> In case you wonder why the bug was so insidious, what this caused was two
> separate problems, both of them able to cause SIGSEGVs.
>
> One: we didn't mark the page table entry dirty like we were supposed to.
>
> Two: by making it writable, we also made the page shared, even if it
> wasn't supposed to be shared (so when the next process wrote to the page,
> if the swap page was shared with somebody else, the changes would show up
> even in the process that _didn't_ write to it).
>
> And "ptep_mkdirty()" is only used by swapoff, so nothing else would show
> this. Which was why it hadn't been immediately obvious that anything was
> broken.

The terminal OOM problem is now gone and I haven't seen a SIGSEGV yet running virgin source.

IOU 5 bogo$$

	-Mike

(I still see something with IKD that _could_ be timing related troubles. There are a couple of grubby fingerprints I need to wipe off, and some churn/burn hours to be sure)
Re: Signal 11 - the continuing saga
On Wed, 13 Dec 2000, Linus Torvalds wrote:
> On Wed, 13 Dec 2000, Mike Galbraith wrote:
> >
> > Not in my test tree. Same fault, and same trace leading up to it. no
>
> Ok.
>
> It definitely looks like a swapoff() problem.
>
> Have you ever seen the behaviour without running swapoff?

No.

> Also, can you re-create it without running swapon() (if it's something
> like a lost dirty bit, it should be possible to trigger even without the
> swapon, and I'd like to hear if that can happen - if it only happens with
> swapon() and you can't trigger it with just a swapoff() it might be a
> question of re-using some swap file stuff and delaying the writeout or
> whatever).

I'll try loading up swap, swapoff and then doing jobs that fit in ram.

(hmm.. what about inactive_clean list when you do swapoff.. might there be pages sitting there that are [were] swap cache? reclaim_page=kaboom?)

	-Mike
RE: Signal 11 - the continuing saga
Err, for those of us who aren't up to our elbows in the kernel code, is there a patch for this? Presumably this will be rolled into 2.4.0test13, but I'd like to try it out.

Also, can someone summarize the fix in English along with the expected, improved behavior (e.g. Linux will never have a signal 11 again and will never, ever crash ;-)

Finally, as soon as there is a patch, can other people who have seen this problem test it? My problem is so random that I'd need at least a few days to gain some confidence this is fixed.

Thanks all.

--Rainer

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Linus Torvalds
> Sent: Thursday, December 14, 2000 5:19 AM
> To: Mike Galbraith
> Cc: Kernel Mailing List
> Subject: Re: Signal 11 - the continuing saga
>
>
> On Wed, 13 Dec 2000, Linus Torvalds wrote:
> >
> > Hint: "ptep_mkdirty()".
>
> In case you wonder why the bug was so insidious, what this caused was two
> separate problems, both of them able to cause SIGSEGVs.
>
> One: we didn't mark the page table entry dirty like we were supposed to.
>
> Two: by making it writable, we also made the page shared, even if it
> wasn't supposed to be shared (so when the next process wrote to the page,
> if the swap page was shared with somebody else, the changes would show up
> even in the process that _didn't_ write to it).
>
> And "ptep_mkdirty()" is only used by swapoff, so nothing else would show
> this. Which was why it hadn't been immediately obvious that anything was
> broken.
>
> Linus
Re: Signal 11 - the continuing saga
On Wed, 13 Dec 2000, Linus Torvalds wrote:
> Ehh, I think I found it.
>
> Hint: "ptep_mkdirty()".
>
> Oops.
>
> I'll bet you $5 USD (and these days, that's about a gadzillion Euros) that

Poor European Gérard is as slim as 1.84 meter - 78 Kg these days. What about old days poor European Linus versus these days American Linus on these points? ;-)

> this explains it.

Really? :o)

> Linus

Gérard.
Re: Signal 11 - the continuing saga
On Wed, Dec 13, 2000 at 11:35:57AM -0800, Linus Torvalds wrote:
> Ehh, I think I found it.
>
> Hint: "ptep_mkdirty()".
>
> Oops.
>
> I'll bet you $5 USD (and these days, that's about a gadzillion Euros) that
> this explains it.
>
> Linus

Good. Sounds like you guys have a handle on it now.

:-)

Jeff
Re: Signal 11 - the continuing saga
On Wed, 13 Dec 2000, Linus Torvalds wrote:
>
> Hint: "ptep_mkdirty()".

In case you wonder why the bug was so insidious, what this caused was two separate problems, both of them able to cause SIGSEGVs.

One: we didn't mark the page table entry dirty like we were supposed to.

Two: by making it writable, we also made the page shared, even if it wasn't supposed to be shared (so when the next process wrote to the page, if the swap page was shared with somebody else, the changes would show up even in the process that _didn't_ write to it).

And "ptep_mkdirty()" is only used by swapoff, so nothing else would show this. Which was why it hadn't been immediately obvious that anything was broken.

Linus
Re: Signal 11 - the continuing saga
On Wed, 13 Dec 2000, Mike Galbraith wrote:
>
> Not in my test tree. Same fault, and same trace leading up to it. no

Ok.

It definitely looks like a swapoff() problem.

Have you ever seen the behaviour without running swapoff?

Also, can you re-create it without running swapon() (if it's something like a lost dirty bit, it should be possible to trigger even without the swapon, and I'd like to hear if that can happen - if it only happens with swapon() and you can't trigger it with just a swapoff() it might be a question of re-using some swap file stuff and delaying the writeout or whatever).

Linus
Re: Signal 11 - the continuing saga
On Wed, 13 Dec 2000, Linus Torvalds wrote:
> On Wed, 13 Dec 2000, Linus Torvalds wrote:
> >
> > Looking at "swapoff()", it could easily be something like
> >
> > - swapoff walks through the processes, marking the pages dirty
> >   (correctly)
> > - swapoff goes on to the next swap entry, and because it needs memory for
> >   this, the VM layer will swap out old entries by marking them dirty in
> >   the "struct page".
> > - final stages of swapoff() removes the swap cache entry, never minding
> >   the fact that it is marked dirty again in "struct page", and clean in
> >   various VM page tables.
> >
> > Ho humm.. I don't think that is it exactly, but something along those
> > lines.
>
> Actually, having thought about it for five more minutes, I actually think
> that that _is_ it.
>
> If so, the fix looks like it could be really simple. The whole problem
> arises from the fact that we remove the page from the swap cache only
> _after_ we've walked the page-tables to look at it. It looks like the
> fairly trivial fix is simply to remove it from the swap cache before,
> getting rid of all such races in swapoff().
>
> Mind trying out this patch?
>
> NOTE! It's untested. It might not work. It might trigger some sanity-test
> somewhere else. But it looks like it should do the right thing (the page
> might be moved to _another_ swap device early, if there are multiple swap
> areas, but even that should be fine - the unuse_process() stuff doesn't
> care about what swapcache this actually is any more).
>
> Does this patch make a difference (I moved the delete seven lines upwards,
> and removed the test - the test looks extraneous).

Not in my test tree. Same fault, and same trace leading up to it.

I'll run virgin source hard tomorrow to be sure. (No message means no change)

	-Mike
Re: Signal 11 - the continuing saga
On Wed, 13 Dec 2000, Linus Torvalds wrote:
>
> Looking at "swapoff()", it could easily be something like
>
>  - swapoff walks through the processes, marking the pages dirty
>    (correctly)
>  - swapoff goes on to the next swap entry, and because it needs memory for
>    this, the VM layer will swap out old entries by marking them dirty in
>    the "struct page".
>  - final stages of swapoff() removes the swap cache entry, never minding
>    the fact that it is marked dirty again in "struct page", and clean in
>    various VM page tables.
>
> Ho humm.. I don't think that is it exactly, but something along those
> lines.

Actually, having thought about it for five more minutes, I actually think
that that _is_ it.

If so, the fix looks like it could be really simple. The whole problem
arises from the fact that we remove the page from the swap cache only
_after_ we've walked the page-tables to look at it. It looks like the
fairly trivial fix is simply to remove it from the swap cache before,
getting rid of all such races in swapoff().

Mind trying out this patch?

NOTE! It's untested. It might not work. It might trigger some sanity-test
somewhere else. But it looks like it should do the right thing (the page
might be moved to _another_ swap device early, if there are multiple swap
areas, but even that should be fine - the unuse_process() stuff doesn't
care about what swapcache this actually is any more).

Does this patch make a difference (I moved the delete seven lines upwards,
and removed the test - the test looks extraneous).

		Linus

--- v2.4.0-test12/linux/mm/swapfile.c	Tue Oct 31 12:42:27 2000
+++ linux/mm/swapfile.c	Wed Dec 13 09:17:51 2000
@@ -370,6 +370,7 @@
 			swap_free(entry);
 			return -ENOMEM;
 		}
+		delete_from_swap_cache(page);
 		read_lock(&tasklist_lock);
 		for_each_task(p)
 			unuse_process(p->mm, entry, page);
@@ -377,8 +378,6 @@
 		shm_unuse(entry, page);
 		/* Now get rid of the extra reference to the temporary
 		   page we've been using. */
-		if (PageSwapCache(page))
-			delete_from_swap_cache(page);
 		page_cache_release(page);
 		/*
 		 * Check for and clear any overflowed swap map counts.
Re: Signal 11 - the continuing saga
On Tue, Dec 12, 2000 at 07:17:41PM -0800, Linus Torvalds wrote:
> In article <[EMAIL PROTECTED]>,
> Jeff V. Merkey <[EMAIL PROTECTED]> wrote:
> >On Wed, Dec 13, 2000 at 09:22:55AM +0900, Rainer Mager wrote:
> >>    I have a tiny bash script that launches a Java swing app. If I run my
> >> script from an xterm (or gnome-terminal or whatever) then it starts up fine.
> >> If, however, I try to launch it from my gnome taskbar's menu then it dies
> >> with signal 11 (the Java log is available upon request). This seems to be
> >> 100% consistent, since I noticed it yesterday, even across reboots.
> >> Interestingly, the same behavior occurs if I try to run the program from
> >> within JBuilder 4.
> >>    So, is this related to the larger signal 11 problems?
> >
> >There's a corruption bug in the page cache somewhere, and it's 100%
> >reproducible. Finding it will be tough
>
> Unlikely. If the actual program data was corrupted, it would SIGSEGV
> regardless of how it's executed.
>
> I'd guess that the program has a bug, and depending on the arguments and
> environment (especially the latter will be different), it shows up or
> not. Things like not having a LOCALE set in either case or similar.
>
> 		Linus

Linus,

I agree that there may be some problem in the code above -- the question
is what has changed to make this behavior emerge? I see it with a host
of programs (ssh, make, netscape) -- true, all are userspace. Time
permitting, I may attempt to track this down in ssh and make in jobserver
mode. It may be related to some interaction that changed underneath.

Jeff
RE: Signal 11 - the continuing saga
On Wed, 13 Dec 2000, Rainer Mager wrote:

> Mike et al,
>
> I have no idea what IKD is and I don't know what to do with any results I
> might find BUT I'd be happy to do this if it will help. Please pass on the
> info with the instructions. Who should I report the results to?

IKD is a debugging toolkit. The trap I have set up freezes the kernel
trace buffer at SIGSEGV time. From there you have to read it backward
looking for problems (which isn't particularly easy).

I was thinking you wanted to roll your shirt sleeves up and maybe this
would help ;-) If you want it, and do a trace, I'd be very interested
in the last couple of schedules to compare to my traces.

It's not something you can just run and report though.

	-Mike
R: Signal 11 - the continuing saga
>> From: CMA [mailto:[EMAIL PROTECTED]]
>> Did you already try to selectively disable L1 and L2 caches (if
>> your box has both) and see what happens?
>
>Anyone know how to do this?

If you own a P6-class machine (sorry, but I didn't find your hw specs in
previous messages) you should be able to enter setup and disable L1 and/or
L2, usually in "advanced setup".
If you disable L1, the machine will be *much* slower.
If you disable L2, you will notice it under heavy load.

Most of the time sig 11 is due to L1 cache overheating (on chip).
Just making sure that the CPU cooling fan is properly seated and spinning
solves the problem.

Regards.

Dr. Eng. Mauro Tassinari
www.c-m-a.it
RE: Signal 11 - the continuing saga
Mike et al,

I have no idea what IKD is and I don't know what to do with any results I
might find BUT I'd be happy to do this if it will help. Please pass on the
info with the instructions. Who should I report the results to?

--Rainer

> [mailto:[EMAIL PROTECTED]]On Behalf Of Mike Galbraith
> If you want, I can extract IKD.. which happens to have a trap in place
> for this (because I have a 100% reproducible swap related SIGSEGV that
> I'm trying to figure out).
>
> If you're interested, let me know and I'll extract it (quite large) and
> send it along with instructions on how to do the trap.
>
> 	-Mike
RE: Signal 11 - the continuing saga
Give that man a cigar... it was an env var (not LOCALE but LANG). I'd
actually checked this but I didn't think that made a difference in my case.

Thanks Linus, now can you fix the larger signal 11 problem?

--Rainer

> [mailto:[EMAIL PROTECTED]]On Behalf Of Linus Torvalds
> I'd guess that the program has a bug, and depending on the arguments and
> environment (especially the latter will be different), it shows up or
> not. Things like not having a LOCALE set in either case or similar.
>
> 		Linus
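
The thread does not say how the missing LANG variable led to the crash.
One common way an environment difference turns into a signal 11 is a
program passing the result of getenv() straight to a string function:
getenv() returns NULL when the variable is unset, and dereferencing that
faults. The following is only a minimal sketch of the defensive pattern
in C, not the code of the application discussed here; the helper names
lang_or_default(), language_prefix_len() and print_language() are
hypothetical and invented for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread): map an unset (NULL) or
 * empty value to the portable default "C", so callers never see NULL.
 */
const char *lang_or_default(const char *value)
{
	if (value == NULL || value[0] == '\0')
		return "C";
	return value;
}

/*
 * Hypothetical helper: length of the language part of a locale name,
 * i.e. everything before the first '_', '.' or '@'.
 * Calling strcspn() on a raw NULL from getenv() would fault; going
 * through lang_or_default() avoids that.
 */
size_t language_prefix_len(const char *value)
{
	return strcspn(lang_or_default(value), "_.@");
}

/*
 * Print the language part of LANG. This works whether the program is
 * started from a shell that exports LANG or from a launcher that
 * starts it with LANG unset.
 */
void print_language(void)
{
	const char *lang = lang_or_default(getenv("LANG"));

	assert(lang != NULL);	/* guaranteed by lang_or_default() */
	printf("language: %.*s\n", (int)language_prefix_len(lang), lang);
}
```

Running the same binary once with LANG exported and once with LANG
removed from the environment exercises both paths, which mirrors the
xterm versus taskbar-menu difference reported above.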