Eric Dumazet <[EMAIL PROTECTED]> wrote:
>
> Hi Andi
> 
> I have very strange coredumps happening on a big 64-bit program.
> 
> Some background :
> - This program is multi-threaded
> - Machine is a dual Opteron 248 machine, 12GB ram.
> - Kernel 2.6.6  (tried 2.6.10 too, same problems)
> - The program uses hugetlb pages.
> - The program uses prefetchnta
> - The program uses about 8GB of ram.
> 
> After numerous different core dumps of this program, and gdb debugging 
> I found :
> 
> Every time the crash occurs when one thread is using some ram located at 
> virtual address 0xffffe6xx

What does "using" mean?  Is the program executing from that location?

> When examining the core image, the data saved on this page seems correct 
> (i.e. contains coherent user data). But one register (%rbx) is usually 
> corrupted and contains a small value (like 0x3c)
> 
> The last instruction using this register is :
>       prefetchnta 0x18(,%rbx,4)
> 
> 
> Examining linux sources, I found that 0xffffe000 is 'special' (ia32 
> vsyscall) and 0xffffe600 is about sigreturn subsection of this special area.
> 
> Is it possible some vm trick just kicks in and corrupts my true 64bits 
> program ?
> 

Interesting.  IIRC, Opterons will very occasionally (and incorrectly) take
a fault when performing a prefetch against a dud pointer.  The kernel will
fix that up.  At a guess, I'd say that the fixup code isn't doing the right
thing when the faulting EIP is in the vsyscall page.
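
For illustration only: a fixup of this kind has to decide whether the
faulting instruction is a prefetch before it can ignore the fault.  Below is
a minimal user-space sketch of that check, assuming the instruction bytes
have already been copied into a buffer.  The names looks_like_prefetch() and
is_prefix_byte() are hypothetical; this is not the kernel's code.

```c
#include <stddef.h>

/* Hypothetical helper: is this byte an x86-64 instruction prefix? */
static int is_prefix_byte(unsigned char b)
{
    switch (b) {
    case 0x26: case 0x2E: case 0x36: case 0x3E: /* ES/CS/SS/DS overrides */
    case 0x64: case 0x65:                       /* FS/GS overrides */
    case 0x66: case 0x67:                       /* operand/address size */
    case 0xF0: case 0xF2: case 0xF3:            /* LOCK, REPNE, REP */
        return 1;
    default:
        /* REX prefixes occupy 0x40..0x4F in 64-bit mode. */
        return b >= 0x40 && b <= 0x4F;
    }
}

/*
 * Return 1 if the bytes at insn decode as a prefetch instruction:
 * optional prefixes, then 0x0F 0x0D (3DNow! prefetch/prefetchw) or
 * 0x0F 0x18 (SSE prefetchnta/t0/t1/t2).  Return 0 otherwise.
 */
int looks_like_prefetch(const unsigned char *insn, size_t len)
{
    size_t i = 0;

    if (len > 15)           /* an x86 instruction is at most 15 bytes */
        len = 15;

    while (i < len && is_prefix_byte(insn[i]))
        i++;

    if (i + 1 >= len)       /* need two opcode bytes after the prefixes */
        return 0;

    return insn[i] == 0x0F && (insn[i + 1] == 0x0D || insn[i + 1] == 0x18);
}
```

The prefetchnta 0x18(,%rbx,4) quoted above should assemble to the bytes
0f 18 04 9d 18 00 00 00, which this check accepts.  The sketch takes a
buffer on purpose: it says nothing about how the bytes at the faulting
address get read, and that is the step I'm guessing goes wrong.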