Dan Malek <dan at netx4.com> writes:

> Marcus Sundberg wrote:
>
> > Well, it also happens to be typical to the MM problem solved by
> > the patches at http://www.zeta.org.au/~linsol/.....
>
> That's pretty interesting, since the Software Emulation trap is
> caused by fetching trash from memory, and the MMU changes are to
> properly track dirty data pages.  I don't buy it.  All you did
> by adding these changes was eliminate (or change) the data TLB
> miss timing.

Well, I've verified the problem on MBX, ADS, FADS, RPX Lite/Classic and
3 different custom boards, with 823, 850, 860, 860T and 860P CPUs.
It has a 100% chance of crashing without the patch, and a 0% chance
of crashing with it. Pavel's description in particular sounded very
much like this problem.

All you have to do to experience the problem is allocate memory until
the RAM gets low (filling the buffer cache by copying files around is
enough).

If you are running a dynamically linked glibc2 app, what is most likely
to happen is that the (dirty) jump table is thrown out by the broken
MM code. When the application tries to call a function in a shared
library, the jump table will be read in from disk again, but as it's
not filled in properly the app will crash, usually by jumping into
some data area and getting an illegal instruction.

What happens next is usually that the app's parent will run, and crash
for the same reason. This will go on until we get up to init, which
cannot be killed, and so gets stuck in an endless loop of trying to
execute the same illegal instruction.

If you are running a statically linked app, or using libc 1.99, it is
more common to get a segfault, but in the end it all depends on which
pages have been erroneously thrown out and which are accessed first.

//Marcus
-- 
Signature under construction, please come back later.

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
