Re: UVMHIST, pmap_get_physpage panic

2018-12-17 Thread Thomas Klausner
On Mon, Dec 17, 2018 at 08:29:47AM +0100, Maxime Villard wrote:
> Do you also get out-of-memory panics when you disable UVMHIST?

I added UVMHIST to debug the "X server being killed a lot" problem I
posted about on current-users over the last few months.

I didn't have kernel out-of-memory panics before, when I was running
with KASAN but without UVMHIST.
 Thomas


Re: UVMHIST, pmap_get_physpage panic

2018-12-16 Thread Maxime Villard

On 17/12/2018 at 08:10, Thomas Klausner wrote:

On Mon, Dec 17, 2018 at 08:06:36AM +0100, Maxime Villard wrote:

On 16/12/2018 at 09:09, Thomas Klausner wrote:

[ 16674.534547] panic: pmap_get_physpage: out of memory


Well, out of memory means out of memory. KASAN consumes a bit more than
1/8 of the KVA. So if your system would normally use 8GB of RAM,
KASAN adds an extra ~1.1GB.


So why doesn't it kill userland processes? I don't believe my kernel
needs all 32GB of RAM.


I don't know. In fact, I don't understand how it can be normal to get this:

[ 16674.544550] pmap_growkernel() at netbsd:pmap_growkernel
[ 16674.544550] kasan_shadow_map() at netbsd:kasan_shadow_map+0xff
[ 16674.544550] pmap_growkernel() at netbsd:pmap_growkernel+0x283

pmap_growkernel() does

mutex_enter(kpm->pm_lock);

So if it's called recursively I think we have a problem. The call
path is:

pmap_growkernel -> kasan_shadow_map -> pmap_get_physpage ->
[somewhere we need to allocate KVA] -> pmap_growkernel

This problem is not KASAN-specific, because KASAN just duplicates
the existing logic:

pmap_growkernel -> pmap_alloc_level -> pmap_get_physpage

Maybe KASAN makes the problem more visible.
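
To illustrate the hazard outside the kernel, here is a rough userland
sketch (pthreads standing in for the kernel mutex; the function names
are made up, not the actual pmap code). The nested lock acquisition
never returns, which is the analogue of deadlocking in mutex_enter()
or tripping a LOCKDEBUG assertion:

/* cc -o grow grow.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void grow(int);

static void
need_more_kva(int depth)
{
        /* Stands in for the shadow-map/pmap_get_physpage path needing
         * fresh KVA: it calls back into grow() while the lock is held. */
        if (depth == 0)
                grow(depth + 1);
}

static void
grow(int depth)
{
        /* Stands in for pmap_growkernel(): takes a non-recursive lock. */
        pthread_mutex_lock(&lock);      /* nested call blocks here forever */
        printf("grow(%d): lock held\n", depth);
        need_more_kva(depth);
        pthread_mutex_unlock(&lock);
}

int
main(void)
{
        grow(0);                        /* prints once, then self-deadlocks */
        return 0;
}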

Do you also get out-of-memory panics when you disable UVMHIST?


Re: UVMHIST, pmap_get_physpage panic

2018-12-16 Thread Thomas Klausner
On Mon, Dec 17, 2018 at 08:06:36AM +0100, Maxime Villard wrote:
> On 16/12/2018 at 09:09, Thomas Klausner wrote:
> > [ 16674.534547] panic: pmap_get_physpage: out of memory
> 
> Well, out of memory means out of memory. KASAN consumes a bit more than
> 1/8 of the KVA. So if your system would normally use 8GB of RAM,
> KASAN adds an extra ~1.1GB.

So why doesn't it kill userland processes? I don't believe my kernel
needs all 32GB of RAM.
 Thomas


Re: UVMHIST, pmap_get_physpage panic

2018-12-16 Thread Maxime Villard

On 16/12/2018 at 09:09, Thomas Klausner wrote:

[ 16674.534547] panic: pmap_get_physpage: out of memory


Well, out of memory means out of memory. KASAN consumes a bit more than
1/8 of the KVA. So if your system would normally use 8GB of RAM,
KASAN adds an extra ~1.1GB.
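
To put a rough number on it (assuming the usual 1:8 shadow ratio, i.e.
one shadow byte per 8 bytes of mapped KVA; the remainder is page-table
and rounding overhead):

    8GB used  =>  8GB / 8 = 1GB of shadow  +  ~0.1GB overhead  ~=  1.1GB extra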


UVMHIST, pmap_get_physpage panic

2018-12-16 Thread Thomas Klausner
Hi!

I've added UVMHIST to my kernel config (now it's GENERIC + KASAN +
UVMHIST). I noticed that UVMHIST slows the machine down noticeably for
bulk builds (not quite by a factor of two, but in that ballpark). And
I've had two panics since.
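
Roughly, the config delta looks like this (option spellings from
memory; double-check against options(4) and the commented-out lines in
your arch's GENERIC):

include "arch/amd64/conf/GENERIC"

makeoptions     KASAN=1         # instrument the kernel for KASAN
options         KASAN           # kernel address sanitizer
options         UVMHIST         # keep per-CPU UVM history records
#options        UVMHIST_PRINT   # optionally print the history to console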

The machine is doing a bulk build (in a tmpfs) and some file I/O (via
NFS mostly).

The first panic was the usual SPL NOT LOWERED gibberish (attached).

The second was:

[ 16674.534547] panic: pmap_get_physpage: out of memory
[ 16674.534547] cpu10: Begin traceback...
[ 16674.534547] vpanic() at netbsd:vpanic+0x221
[ 16674.534547] snprintf() at netbsd:snprintf
[ 16674.544550] pmap_growkernel() at netbsd:pmap_growkernel
[ 16674.544550] kasan_shadow_map() at netbsd:kasan_shadow_map+0xff
[ 16674.544550] pmap_growkernel() at netbsd:pmap_growkernel+0x283
[ 16674.554553] uvm_map_prepare() at netbsd:uvm_map_prepare+0xe14
[ 16674.554553] uvm_map() at netbsd:uvm_map+0xec
[ 16674.564557] uvm_km_alloc() at netbsd:uvm_km_alloc+0x466
[ 16674.564557] pool_grow() at netbsd:pool_grow+0xbb
[ 16674.574561] pool_catchup() at netbsd:pool_catchup+0x46
[ 16674.574561] pool_get() at netbsd:pool_get+0x7e1
[ 16674.584564] allocbuf() at netbsd:allocbuf+0x119
[ 16674.584564] getblk() at netbsd:getblk+0x185
[ 16674.584564] bio_doread() at netbsd:bio_doread+0x1b
[ 16674.594568] bread() at netbsd:bread+0x18
[ 16674.594568] ffs_init_vnode() at netbsd:ffs_init_vnode+0x1cd
[ 16674.604572] ffs_loadvnode() at netbsd:ffs_loadvnode+0xc8
[ 16674.604572] vcache_get() at netbsd:vcache_get+0x4f4
[ 16674.604572] ufs_lookup() at netbsd:ufs_lookup+0x1320
[ 16674.614575] VOP_LOOKUP() at netbsd:VOP_LOOKUP+0xb6
[ 16674.614575] lookup_once() at netbsd:lookup_once+0x34b
[ 16674.624579] namei_tryemulroot() at netbsd:namei_tryemulroot+0x87d
[ 16674.624579] namei() at netbsd:namei+0x65
[ 16674.634583] fd_nameiat.isra.2() at netbsd:fd_nameiat.isra.2+0xd1
[ 16674.634583] do_sys_statat() at netbsd:do_sys_statat+0x111
[ 16674.644586] sys___lstat50() at netbsd:sys___lstat50+0x85
[ 16674.644586] syscall() at netbsd:syscall+0x308
[ 16674.644586] --- syscall (number 441) ---
[ 16674.644586] 761a961145aa:
[ 16674.644586] cpu10: End traceback...

I have a kernel core dump for this one.

Is this a bug or do I need to get more RAM?

Comments on UVMHIST performance cost and the first panic are also
appreciated.

Thanks,
 Thomas


panic.gz
Description: application/gunzip