Eric Dumazet <[EMAIL PROTECTED]> wrote:
>
> Hi Andi
>
> I have very strange coredumps happening on a big 64-bit program.
>
> Some background :
> - This program is multi-threaded
> - Machine is a dual Opteron 248 machine, 12GB ram.
> - Kernel 2.6.6 (tried 2.6.10 too, same problems)
> - The program uses hugetlb pages.
> - The program uses prefetchnta
> - The program uses about 8GB of ram.
>
> After numerous different core dumps of this program, and gdb debugging
> I found :
>
> Every time the crash occurs when one thread is using some ram located at
> virtual address 0xffffe6xx

What does "using" mean? Is the program executing from that location?

> When examining the core image, the data saved on this page seems correct
> (i.e. contains coherent user data). But one register (%rbx) is usually
> corrupted and contains a small value (like 0x3c)
>
> The last instruction using this register is :
> prefetchnta 0x18(,%rbx,4)
>
>
> Examining linux sources, I found that 0xffffe000 is 'special' (ia32
> vsyscall) and 0xffffe600 is about the sigreturn subsection of this special area.
>
> Is it possible some vm trick just kicks in and corrupts my true 64-bit
> program ?
>

Interesting. IIRC, Opterons will very occasionally (and incorrectly) take a
fault when performing a prefetch against a dud pointer. The kernel will
fix that up.

At a guess, I'd say that the fixup code isn't doing the right thing when the
faulting EIP is in the vsyscall page.
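
For reference, here is a rough sketch of the arithmetic involved. It is not taken from the program in question or from the kernel. The helper names are made up for illustration, and the 4 KiB page size is an assumption. With %rbx = 0x3c, the operand 0x18(,%rbx,4) resolves to a low, almost certainly unmapped address, which is the kind of dud pointer a prefetch is supposed to ignore silently.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an x86 operand written as disp(,index,scale).
 * For 0x18(,%rbx,4): disp = 0x18, index = %rbx, scale = 4, no base register. */
static uint64_t effective_address(uint64_t disp, uint64_t index, unsigned scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return disp + index * scale;
}

/* Hypothetical helper: true if addr lies in the page starting at 0xffffe000,
 * the ia32 vsyscall page named in the report. Assumes 4 KiB pages. */
static int in_ia32_vsyscall_page(uint64_t addr)
{
    return (addr & ~UINT64_C(0xfff)) == UINT64_C(0xffffe000);
}

/* Issue a non-temporal prefetch hint through the GCC/Clang builtin.
 * The compiler documentation states that the prefetch itself does not
 * fault on an invalid address; rw = 0 (read) and locality = 0 (no temporal
 * locality) typically select prefetchnta on x86. */
static void prefetch_nta(uint64_t addr)
{
    __builtin_prefetch((const void *)(uintptr_t)addr, 0, 0);
}
```

Calling effective_address(0x18, 0x3c, 4) gives 0x108, and in_ia32_vsyscall_page(0xffffe600) is true, with 0x600 being the offset inside that page. Passing 0x108 to prefetch_nta() should be a no-op on a CPU that follows the architecture. The suspicion above is about the rare case where it faults anyway.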