This was actually caused by a bug in one of the older versions of the 
"mempool: use map_anon() for large allocations or when memory is 
fragmented" patch. It turns out I forgot that object_size() also needs to 
account for map_anon()-based allocations, and to do it properly ;-) My latest 
version of this patch (version 4) should work better, plus I added a unit test 
around it. But it still needs to be reviewed.
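To illustrate this class of bug, here is a minimal sketch built on a toy allocator. It is not OSv's actual mempool code. It assumes that each large block carries a header recording which backend produced it. All names (toy_malloc_large(), toy_object_size(), toy_realloc_large(), block_header) are hypothetical, and POSIX mmap() stands in for mmu::map_anon(). A realloc()-style copy trusts object_size(), so an object_size() that mishandles map_anon()-backed blocks can turn into an out-of-bounds memcpy().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical toy allocator, NOT OSv's mempool code. It only illustrates why
// object_size() must understand every backend a large malloc can use.
enum class backend : std::uint32_t { contiguous, anon_mapped };

struct alignas(16) block_header {
    std::size_t length;  // contiguous: usable bytes; anon_mapped: whole mapping length
    backend kind;        // which backend produced this block
};

static block_header* header_of(void* p) {
    return static_cast<block_header*>(p) - 1;
}

void* toy_malloc_large(std::size_t size, bool use_anon) {
    const std::size_t total = sizeof(block_header) + size;
    block_header* h = nullptr;
    if (use_anon) {
        // A POSIX anonymous mapping stands in for mmu::map_anon() here.
        const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        const std::size_t map_len = (total + page - 1) / page * page;
        void* raw = mmap(nullptr, map_len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (raw == MAP_FAILED) return nullptr;
        h = new (raw) block_header{map_len, backend::anon_mapped};
    } else {
        void* raw = std::malloc(total);
        if (raw == nullptr) return nullptr;
        h = new (raw) block_header{size, backend::contiguous};
    }
    return h + 1;  // payload starts right after the header
}

// Every backend needs its own case. Treating an anon_mapped block like a
// contiguous one would report sizeof(block_header) bytes too many, so a caller
// copying object_size() bytes would run off the end of the mapping.
std::size_t toy_object_size(void* p) {
    block_header* h = header_of(p);
    switch (h->kind) {
    case backend::contiguous:
        return h->length;
    case backend::anon_mapped:
        assert(h->length > sizeof(block_header));
        return h->length - sizeof(block_header);
    }
    std::abort();  // unknown backend: fail loudly rather than guess
}

void toy_free_large(void* p) {
    if (p == nullptr) return;
    block_header* h = header_of(p);
    if (h->kind == backend::anon_mapped) {
        munmap(h, h->length);
    } else {
        std::free(h);
    }
}

// realloc()-style move: the number of bytes copied is bounded by
// toy_object_size(), so a wrong size becomes an out-of-bounds memcpy().
void* toy_realloc_large(void* p, std::size_t new_size, bool use_anon) {
    void* q = toy_malloc_large(new_size, use_anon);
    if (q == nullptr) return nullptr;
    if (p != nullptr) {
        std::memcpy(q, p, std::min(toy_object_size(p), new_size));
        toy_free_large(p);
    }
    return q;
}
```

The header-per-block design keeps the dispatch local to the pointer. A real allocator might instead distinguish backends by address range, but either way object_size() has to cover each backend explicitly.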

On Wednesday, March 25, 2020 at 11:48:52 AM UTC-4, Waldek Kozaczuk wrote:
>
> This is really related to the "OOM query" thread, but I wanted to send a new 
> email as the other thread has gotten quite long.
>
> In any case, we are troubleshooting an app crash which happens almost 
> instantly after boot, and the stack trace of one of the threads looks like this:
>
> (gdb) bt
> #0  0x00000000403a7bea in processor::cli_hlt () at
> arch/x64/processor.hh:247
> #1  nmi (ef=0xffff80003fa1c068) at arch/x64/exceptions.cc:306
> #2  <signal handler called>
> #3  0x00000000403940a3 in memcpy_repmov_ssse3 (dest=0x2000415014c0,
> src=0x20004e7851d4, n=16) at /usr/include/c++/9/array:185
> #4  0x0000100001756a5b in ?? ()
> #5  0x0000000000000000 in ?? ()
>
> Also, this is with the last 2 patches, "[PATCH V2 1/2] mempool: fix a bug 
> in page_range_allocator() when handling worst case O(n) scenario" and 
> "[PATCH V2 2/2] mempool: use map_anon() for large allocations or when 
> memory is fragmented", applied. They address fragmentation by making 
> malloc_large() use mmu::map_anon() in certain cases.
>
> So as you can tell, memcpy() (or specifically memcpy_repmov_ssse3()) triggers 
> an NMI (non-maskable interrupt) exception when copying between memory areas 
> allocated with mmu::map_anon() (see dest=0x2000415014c0,
> src=0x20004e7851d4, n=16). I really have no idea why, but I have a hunch 
> that it happens because the mapping tables are not being refreshed or 
> flushed properly. Possibly the allocation is requested on one CPU and then 
> memcpy() is called on another one which does not see the mapping yet. 
> Or maybe the TLB needs to be flushed. From a cursory reading it looks like 
> mmu::map_anon() might be doing it (somewhere downstream), but I am not 100% sure.
>
> Or maybe this NMI is caused by a misaligned memory allocation (I raised a 
> question in my patch about whether it really handles that properly). Or maybe 
> it is a bug in my patch? Or maybe there is something fundamentally different 
> about memory allocated with map_anon() versus memory allocated from 
> contiguous physical memory.
>
> Does anybody have other smart ideas?
>
> Waldek
>
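On the misaligned-allocation question raised in the quoted message: one common way for an anonymous-mapping backend to honor an alignment larger than a page is to over-allocate by the alignment and unmap the unused head and tail. The sketch below assumes POSIX mmap()/munmap() as a stand-in for mmu::map_anon(). map_anon_aligned() is a hypothetical name, and this is not necessarily what the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper, not an OSv API: returns `size` bytes of anonymous,
// read/write memory whose start address is a multiple of `alignment`
// (a power of two). Returns nullptr on bad arguments or mmap() failure.
void* map_anon_aligned(std::size_t size, std::size_t alignment) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    if (size == 0 || alignment == 0 || (alignment & (alignment - 1)) != 0)
        return nullptr;
    size = (size + page - 1) & ~(page - 1);  // mappings are page granular

    if (alignment <= page) {
        // mmap() already returns page-aligned addresses.
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? nullptr : p;
    }

    // Over-allocate by `alignment` so an aligned start must fall inside the
    // mapping, then unmap the unused head and tail.
    const std::size_t span = size + alignment;
    void* raw = mmap(nullptr, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) return nullptr;

    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t aligned = (start + mask) & ~mask;
    const std::size_t head = aligned - start;
    const std::size_t tail = span - head - size;
    // Both trims are page multiples: start is page aligned and alignment is
    // a multiple of the page size on this path.
    assert(head % page == 0 && tail % page == 0);
    if (head != 0) munmap(raw, head);
    if (tail != 0) munmap(reinterpret_cast<void*>(aligned + size), tail);
    return reinterpret_cast<void*>(aligned);
}
```

For alignments up to a page nothing extra is needed, since mapped regions already start on a page boundary; only larger alignments pay for the temporary over-allocation.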

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/899b38b5-aed4-4497-ab83-c161a6b673ea%40googlegroups.com.
