Unkillable process in "vm map (user)"

2017-12-10 Thread Peter Jeremy
I was experimenting with ports/devel/libmill (which is a library that
provides Go-style functionality for C programs) and managed to create
an unkillable process by spawning 100 "goroutines" (think very
cheap "thread" or "coroutine") joined by "channels" (think message
passing pipes).  (The program ran basically instantaneously with 1
or 10 "goroutines", and the Go version has no problems with 100
goroutines on a much smaller system).

According to SIGINFO, it's blocked on "vm map (user)" but I can't kill
it.  Can anyone suggest a way to unwedge it?

This is on a system running FreeBSD/amd64 11.1-STABLE r324494.

server% procstat -kk 452
  PID    TID COMM             TDNAME           KSTACK
  452 102382 chain            -                mi_switch+0x17c sleepq_switch+0x118 sleepq_wait+0x43 _sx_slock_hard+0x34e _sx_slock+0xd4 vm_map_lookup+0xbd vm_fault_hold+0x194b vm_fault+0x75 trap_pfault+0x107 trap+0x382 calltrap+0x8
server% ps -wal -p 452
UID PID  PPID CPU PRI NI       VSZ  RSS MWCHAN   STAT TT     TIME COMMAND
204 452 53567   0  20  0 244064932 2180 vm map ( DL+  13  0:10.31 ./chain 100
server% cat src/mill/chain.c
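#include <stdio.h>
#include <stdlib.h>
#include <libmill.h>

/* Receive a value from the right neighbour, add one, pass it left. */
coroutine void f(chan left, chan right) {
    chs(left, int, 1 + chr(right, int));
}

int main(int argc, char **argv) {
    /* Number of coroutines in the chain; defaults to 1. */
    int i, n = argv[1] ? atoi(argv[1]) : 1;
    chan leftmost = chmake(int, 0);
    chan left = NULL;
    chan right = leftmost;
    /* Build a chain of n coroutines linked by unbuffered channels. */
    for (i = 0; i < n; i++) {
        left = right;
        right = chmake(int, 0);
        go(f(left, right));
    }
    /* Inject 0 at the right end and read the sum at the left end. */
    chs(right, int, 0);
    i = chr(leftmost, int);
    printf("result = %d\n", i);
    return 0;
}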
server%

-- 
Peter Jeremy




Re: Unkillable process in "vm map (user)"

2017-12-10 Thread Konstantin Belousov
On Mon, Dec 11, 2017 at 07:09:31AM +1100, Peter Jeremy wrote:
> This is on a system running FreeBSD/amd64 11.1-STABLE r324494.
Ensure that you use at least r326188.

> 
> server% procstat -kk 452
>   PID    TID COMM             TDNAME           KSTACK
>   452 102382 chain            -                mi_switch+0x17c sleepq_switch+0x118 sleepq_wait+0x43 _sx_slock_hard+0x34e _sx_slock+0xd4 vm_map_lookup+0xbd vm_fault_hold+0x194b vm_fault+0x75 trap_pfault+0x107 trap+0x382 calltrap+0x8

There is another thread owning the map lock, and seeing what that thread
does is the next step.

Can you provide a binary to reproduce which does not depend on any
library except the base libs ?
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Unkillable process in "vm map (user)"

2017-12-22 Thread Peter Holm
On Sun, Dec 10, 2017 at 10:42:17PM +0200, Konstantin Belousov wrote:
> There is another thread owning the map lock, and seeing what that thread
> does is the next step.
> 
> Can you provide a binary to reproduce which does not depend on any
> library except the base libs ?

Here's some more info, using the original scenario:
https://people.freebsd.org/~pho/stress/log/kostik1070.txt

- Peter


Re: Unkillable process in "vm map (user)"

2017-12-22 Thread Konstantin Belousov
On Fri, Dec 22, 2017 at 10:26:07AM +0100, Peter Holm wrote:
> Here's some more info, using the original scenario:
> https://people.freebsd.org/~pho/stress/log/kostik1070.txt

This is somewhat weird but also not too puzzling.

The vmdaemon (pid 41) is running and trying to reduce the count of resident
pages in some pmap, most likely that of pid 20655.  This process seems to
be huge: according to the v_stats, there are 15681264 inactive pages, and
the pagedaemon is trying to obtain a vm object lock that is owned by
vmdaemon.  The resident count for that object is 15897170 (~64GB).

So basically almost all memory belongs to a single object, and vmdaemon is
processing it.  Since the object's queue is huge, the map and the object
locks are held for a long time, preventing other processes that touch them
from making progress.

Maybe try this (it combines new changes with the OOM patch).  I am not
sure that calling should_yield() unconditionally in
vm_swapout_object_deactivate_pages() is a good idea, but it might be
better than the current situation.

diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index ece496407c2..ce6208569c6 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -134,6 +134,16 @@ static void vm_fault_dontneed(const struct faultstate *fs, vm_offset_t vaddr,
 static void vm_fault_prefault(const struct faultstate *fs, vm_offset_t addra,
int backward, int forward);
 
+static int vm_pfault_oom_attempts = 3;
+SYSCTL_INT(_vm, OID_AUTO, pfault_oom_attempts, CTLFLAG_RWTUN,
+&vm_pfault_oom_attempts, 0,
+"");
+
+static int vm_pfault_oom_wait = 10;
+SYSCTL_INT(_vm, OID_AUTO, pfault_oom_wait, CTLFLAG_RWTUN,
+&vm_pfault_oom_wait, 0,
+"");
+
 static inline void
 release_page(struct faultstate *fs)
 {
@@ -531,7 +541,7 @@ vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
vm_pindex_t retry_pindex;
vm_prot_t prot, retry_prot;
int ahead, alloc_req, behind, cluster_offset, error, era, faultcount;
-   int locked, nera, result, rv;
+   int locked, nera, oom, result, rv;
u_char behavior;
boolean_t wired;/* Passed by reference. */
bool dead, hardfault, is_first_object_locked;
@@ -542,7 +552,9 @@ vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
nera = -1;
hardfault = false;
 
-RetryFault:;
+RetryFault:
+   oom = 0;
+RetryFault_oom:
 
/*
 * Find the backing store object and offset into it to begin the
@@ -787,7 +799,17 @@ RetryFault:;
}
if (fs.m == NULL) {
unlock_and_deallocate(&fs);
-   VM_WAITPFAULT;
+   if (vm_pfault_oom_attempts < 0 ||
+   oom < vm_pfault_oom_attempts) {
+   oom++;
+   vm_waitpfault(vm_pfault_oom_wait);
+   goto RetryFault_oom;
+   }
+   if (bootverbose)
+   printf(
+   "proc %d (%s) failed to alloc page on fault, starting OOM\n",
+   curproc->p_pid, curproc->p_comm);
+   vm_pageout_oom(VM_OOM_MEM_PF);
goto RetryFault;
}
}
diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c
index 0397dfef457..7d403497550 100644
--- a/sys/vm/vm_page.c
+++ b/sys/vm/vm_page.c
@@ -2652,7 +2652,7 @@ vm_page_reclaim_contig(int req, u_long npages, vm_paddr_t low, vm_paddr_t high,
  * - Called in various places before memory allocations.
  */
 static void
-_vm_wait(void)
+_vm_wait(int timo)
 {
 
mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
@@ -2669,16 +2669,16 @@ _vm_wait(void)
}
vm_pages_needed = true;
msleep(&vm_cnt.v_free_count, &vm_page_queue_free_mtx, PDROP | PVM,
-   "vmwait", 0);
+   "vmwait", timo * hz);
}
 }
 
 void
-vm_wait(void)
+vm_wait(int timo)
 {
 
mtx_lock(&vm_page_queue_free_mtx);
-   _vm_wait();
+   _vm_wait(timo);
 }
 
 /*
@@ -2703,7 +2703,7 @@ vm_page_alloc_fail(vm_object_t object, int req)
if (req & (VM_ALLOC_WAITOK | VM_ALLOC_WAITFAIL)) {
if (object != NULL) 
VM_OBJECT_WUNLOCK(object);
-   _vm_wait();
+   _vm_wait(0);
if (object != NULL) 
VM_OBJECT_WLOCK(object);
if (req & VM_ALLOC_WAITOK)
@@ -2724,7 +2724,7 @@ vm_page_alloc_fail(vm_object_t object, int req)
  *   this balance without careful testing first.
  */
 void
-vm_waitpfault(void)
+vm_waitpfault(int timo)
 {
 
mtx_lock(&vm_page_queue_free_mtx);
@@ -2734,7 +2734,7 @@ vm_waitpfault(void)
}
vm_pages_needed = tr

Re: Unkillable process in "vm map (user)"

2017-12-22 Thread Peter Holm
On Fri, Dec 22, 2017 at 02:45:21PM +0200, Konstantin Belousov wrote:
> Maybe try this (it combines new changes with the OOM patch).  I am not
> sure that calling should_yield() unconditionally in
> vm_swapout_object_deactivate_pages() is a good idea, but it might be
> better than the current situation.

The patch fixes the problem I got with this scenario.

- Peter