Re: NFS related panic? (was: Re: Killing a zombie process?)

2015-11-03 Thread Rhialto
On Fri 23 Oct 2015 at 00:46:57 +0200, Rhialto wrote: > This problem is very repeatable, usually within a few hours, just now it > happened within half an hour. > > It seems to me that somehow the nfs_reqq list gets corrupted. Then > either there is a crash when traversing it in nfs_timer()

Re: NFS related panic? (was: Re: Killing a zombie process?)

2015-10-22 Thread Rhialto
This problem is very repeatable, usually within a few hours, just now it happened within half an hour. It seems to me that somehow the nfs_reqq list gets corrupted. Then either there is a crash when traversing it in nfs_timer() (occurring in nfs_sigintr() due to being called with a bogus

Re: NFS related panic? (was: Re: Killing a zombie process?)

2015-10-19 Thread Rhialto
On Tue 20 Oct 2015 at 01:04:59 +0200, Rhialto wrote: > with a rebuilt netbsd.gdb (hopefully the addresses match) > > #5 0x806b94b4 in nfs_sigintr (nmp=0x0, rep=0xfe81163730a8, > l=0x0) at ../../../../nfs/nfs_socket.c:871 nmp should not be NULL here... let's look at rep, where it

Re: NFS related panic? (was: Re: Killing a zombie process?)

2015-10-19 Thread Rhialto
with a rebuilt netbsd.gdb (hopefully the addresses match) (gdb) target kvm netbsd.5.core 0x8063d735 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:671 671 dumpsys(); (gdb) bt #0 0x8063d735 in

NFS related panic? (was: Re: Killing a zombie process?)

2015-10-19 Thread Rhialto
On Fri 16 Oct 2015 at 16:31:18 +0200, J. Hannken-Illjes wrote: > On 16 Oct 2015, at 13:44, Rhialto wrote: > > > "Interesting" results: it built packages overnight (from around 22:30 to > > 12:13, so for nearly 14 hours), then, when I didn't look, it rebooted. > > With panic? I

Re: Killing a zombie process?

2015-10-16 Thread Rhialto
On Thu 15 Oct 2015 at 20:12:44 +0200, Rhialto wrote: > On Thu 15 Oct 2015 at 06:57:42 +0700, Robert Elz wrote: > > Do you really need that mounted twice like that, and if not, can you try > > with one of them missing and see if the problem remains ? > > Good idea, I'll try that later!

Re: Killing a zombie process?

2015-10-16 Thread Rhialto
On Fri 16 Oct 2015 at 16:29:55 +0200, J. Hannken-Illjes wrote: > Looks like we are waiting for a NFS operation to complete. > > Did the machine hang here? No, but I didn't try specifically to access the nfs volumes. Interestingly enough, after the reboot (which used the stock 7.0 GENERIC

Re: Killing a zombie process?

2015-10-16 Thread J. Hannken-Illjes
On 15 Oct 2015, at 00:21, Rhialto wrote: > On Wed 14 Oct 2015 at 09:39:40 +0200, J. Hannken-Illjes wrote: >> Looks like a deadlock, two threads in tstile. >> >> Please take a backtrace (with arguments) of these threads. > > I've got a whole lot more in tstile, and that is even

Re: Killing a zombie process?

2015-10-16 Thread Rhialto
On Fri 16 Oct 2015 at 16:31:18 +0200, J. Hannken-Illjes wrote: > On 16 Oct 2015, at 13:44, Rhialto wrote: > > > On Thu 15 Oct 2015 at 20:12:44 +0200, Rhialto wrote: > >> On Thu 15 Oct 2015 at 06:57:42 +0700, Robert Elz wrote: > >>> Do you really need that mounted twice like

Re: Killing a zombie process?

2015-10-16 Thread J. Hannken-Illjes
On 16 Oct 2015, at 13:44, Rhialto wrote: > On Thu 15 Oct 2015 at 20:12:44 +0200, Rhialto wrote: >> On Thu 15 Oct 2015 at 06:57:42 +0700, Robert Elz wrote: >>> Do you really need that mounted twice like that, and if not, can you try >>> with one of them missing and see if the

Re: Killing a zombie process?

2015-10-15 Thread Rhialto
On Thu 15 Oct 2015 at 06:57:42 +0700, Robert Elz wrote: > I do wonder about ... > > | procfs on /usr/pkg/emul/linux32/proc type procfs (read-only, local) > | procfs on /usr/pkg/emul/linux32/proc type procfs (local) Ah good catch. That seems to be a botched attempt to mount the linux procfs

Re: Killing a zombie process?

2015-10-14 Thread J. Hannken-Illjes
On 14 Oct 2015, at 00:20, Rhialto wrote: > I may have something similar; with 7.0/amd64 GENERIC kernel. > > I've been doing builds in pkg_comp with the chroot directory and /usr/pkgsrc > mounted over nfs. After some packages, some processes simply don't terminate. > > Some of

Re: Killing a zombie process?

2015-10-14 Thread Rhialto
On Thu 15 Oct 2015 at 00:21:55 +0200, Rhialto wrote: > I've got a whole lot more in tstile, and that is even just from running > pkg_comp in the chroot. I didn't try to interrupt anything yet. I forgot to mention that this is with a kernel cvs'ed about 24 hours ago. So this issue isn't the same

Re: Killing a zombie process?

2015-10-14 Thread Robert Elz
Date: Thu, 15 Oct 2015 00:21:55 +0200 From: Rhialto Message-ID: <20151014222155.ga25...@falu.nl> First, I agree this has nothing at all to do with the zombie refcount issue (nothing to do with zombies, or process lists or anything slightly related). I

Re: Killing a zombie process?

2015-10-13 Thread Rhialto
I may have something similar; with 7.0/amd64 GENERIC kernel. I've been doing builds in pkg_comp with the chroot directory and /usr/pkgsrc mounted over nfs. After some packages, some processes simply don't terminate. Some of my processes are now (after trying to exit pkg_comp which hangs) UID

Re: Killing a zombie process?

2015-10-04 Thread Paul Goyette
On Sun, 4 Oct 2015, Robert Elz wrote: Date: Sun, 4 Oct 2015 17:25:21 +0800 (PHT) From: Paul Goyette Message-ID: | I'm pretty much convinced that the p_nstopchild accounting is screwed up |

Re: Killing a zombie process?

2015-10-04 Thread Paul Goyette
I'm pretty much convinced that the p_nstopchild accounting is screwed up somewhere. I'm planning on adding the following code in "optimization" in kern_exit so I can catch it as soon as it happens. Basically, if the optimization would cause us to stop looking for a process to report, this
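The snippet above is cut off, so the actual patch is not shown here. As a purely generic illustration of the idea (cross-checking a cached counter against a full scan so that an accounting error is caught the moment a shortcut would hide it), here is a minimal user-space sketch. Every name in it (`struct parent`, `nstopchild` as a plain field, `find_reportable`, and so on) is hypothetical and is not the NetBSD kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; NOT the kernel's struct proc. */
enum child_state { CHILD_RUNNING, CHILD_STOPPED, CHILD_ZOMBIE };

struct child {
    enum child_state state;
    bool reported;              /* already returned to a waiter */
};

struct parent {
    struct child *children;
    size_t nchildren;
    int nstopchild;             /* cached count of children worth reporting */
};

/* Slow path: count reportable children by scanning all of them. */
static int count_reportable(const struct parent *p)
{
    int n = 0;
    for (size_t i = 0; i < p->nchildren; i++) {
        const struct child *c = &p->children[i];
        if (c->state != CHILD_RUNNING && !c->reported)
            n++;
    }
    return n;
}

/* True when the cached counter agrees with a full scan. */
bool counter_is_consistent(const struct parent *p)
{
    return p->nstopchild == count_reportable(p);
}

/*
 * Look for a child to report. When the counter is zero the scan is
 * skipped (the "optimization"); the debug check scans anyway and sets
 * *mismatch if the shortcut would have hidden a reportable child.
 */
struct child *find_reportable(struct parent *p, bool *mismatch)
{
    *mismatch = false;
    if (p->nstopchild == 0) {
        if (count_reportable(p) != 0)
            *mismatch = true;   /* accounting is wrong: flag it now */
        return NULL;
    }
    for (size_t i = 0; i < p->nchildren; i++) {
        struct child *c = &p->children[i];
        if (c->state != CHILD_RUNNING && !c->reported)
            return c;
    }
    return NULL;
}
```

In a kernel the mismatch would more likely trigger a diagnostic message or panic than set a flag; the flag is used here only so the behaviour can be observed from ordinary code.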

Re: Killing a zombie process?

2015-10-04 Thread Robert Elz
Date: Sun, 4 Oct 2015 17:25:21 +0800 (PHT) From: Paul Goyette Message-ID: | I'm pretty much convinced that the p_nstopchild accounting is screwed up | somewhere. I think I agree. |

Re: Killing a zombie process?

2015-10-04 Thread Paul Goyette
On Sun, 4 Oct 2015, Paul Goyette wrote: | 1. Is it correct for init's p_nstopchild to be zero when it has several | children whose p_state is SSTOP? Depends whether those children have previously been waited for or not. Stopped children don't go away when they're waited for, so there

Re: Killing a zombie process?

2015-10-04 Thread Robert Elz
Date: Sun, 4 Oct 2015 20:52:43 +0800 (PHT) From: Paul Goyette Message-ID: | I do occasionally switch to another wsdisplay screen (away from the X | one), but not frequently. I

Re: Killing a zombie process?

2015-10-03 Thread Robert Elz
Date: Fri, 2 Oct 2015 15:26:42 +0800 (PHT) From: Paul Goyette Message-ID: | 1. Is it correct for init's p_nstopchild to be zero when it has several | children whose p_state is SSTOP?

Re: Killing a zombie process?

2015-10-03 Thread Paul Goyette
On Sun, 4 Oct 2015, Robert Elz wrote: Date: Fri, 2 Oct 2015 15:26:42 +0800 (PHT) From: Paul Goyette Message-ID: | 1. Is it correct for init's p_nstopchild to be zero when it has several |

Re: Killing a zombie process?

2015-10-02 Thread Paul Goyette
On Fri, 2 Oct 2015, Paul Goyette wrote: For now, I took a quick look into the zombie's struct proc. p_exitsig = 0x14 = SIGCHLD p_flag = 0x0 p_sflag = 0x2000 = PS_WEXIT p_slflag = 0x0 p_lflag = 0x2 = PL_CONTROLT p_stflag = 0x0

Re: Killing a zombie process?

2015-10-02 Thread Paul Goyette
On Fri, 2 Oct 2015, Paul Goyette wrote: On Fri, 2 Oct 2015, Paul Goyette wrote: For now, I took a quick look into the zombie's struct proc. p_exitsig = 0x14 = SIGCHLD p_flag = 0x0 p_sflag = 0x2000 = PS_WEXIT p_slflag = 0x0 p_lflag = 0x2 =

Re: Killing a zombie process?

2015-10-01 Thread Paul Goyette
On Fri, 2 Oct 2015, Paul Goyette wrote: Still trying to track this down. A modified version of ps(1) shows that the process state is clearly LSZOMB and not LSDEAD. Furthermore, "ps -s" doesn't show any LWP for the zombie process, so it would seem that process clean up has progressed

Re: Killing a zombie process?

2015-10-01 Thread Paul Goyette
Still trying to track this down. A modified version of ps(1) shows that the process state is clearly LSZOMB and not LSDEAD. Furthermore, "ps -s" doesn't show any LWP for the zombie process, so it would seem that process clean up has progressed relatively far. I was able to use "ps axl

Re: Killing a zombie process?

2015-09-30 Thread Brian Buhrow
, 3:55pm, Paul Goyette wrote: } Subject: Re: Killing a zombie process? } On Wed, 30 Sep 2015, Paul Goyette wrote: } } >> # kill -HUP 1 } >> # ps axl | grep ' Z ' } >> 0 27237 1 0 0 0 0 0 - Zpts/2- 0:00.00 } >> (sh) } > } > Well

Re: Killing a zombie process?

2015-09-30 Thread Paul Goyette
On Wed, 30 Sep 2015, Paul Goyette wrote: # kill -HUP 1 # ps axl | grep ' Z ' 0 27237 1 0 0 0 0 0 - Zpts/2- 0:00.00 (sh) Well, it happened again! I rebooted earlier today, and then deinstalled and rebuilt about 40 packages within the

Re: Killing a zombie process?

2015-09-30 Thread Robert Elz
Date: Wed, 30 Sep 2015 15:55:04 +0800 (PHT) From: Paul Goyette Message-ID: | So there must be some difference in how init(8) waits during normal | operation and how it waits during the

Re: Killing a zombie process?

2015-09-30 Thread Robert Elz
Date: Wed, 30 Sep 2015 18:29:20 +0800 (PHT) From: Paul Goyette Message-ID: | Well, a quick read through sbin/init.c shows that sometimes it waits | with WNOHANG and sometimes it doesn't.
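For readers unfamiliar with the distinction mentioned above: waitpid(2) with WNOHANG returns immediately when no child has exited yet, while without it the caller blocks until one does. The following is a minimal sketch of a non-blocking reap loop, not the code from sbin/init.c; the function names `reap_exited_children` and `spawn_short_lived_child` are made up for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Reap every child that has already exited, without blocking.
 * Returns the number of children reaped, or -1 on an unexpected error.
 */
int reap_exited_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid > 0) {          /* one zombie collected, look for more */
            reaped++;
            continue;
        }
        if (pid == 0)           /* children exist, none has exited yet */
            break;
        if (errno == EINTR)     /* interrupted by a signal, retry */
            continue;
        if (errno == ECHILD)    /* no children left at all */
            break;
        return -1;
    }
    return reaped;
}

/* Hypothetical helper: fork a child that exits immediately. */
pid_t spawn_short_lived_child(void)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(0);               /* child: becomes a zombie until reaped */
    return pid;                 /* parent: child's pid, or -1 on failure */
}
```

Until the parent calls a wait function, an exited child stays in the process table as a zombie, which is the "Z" state shown in the ps output quoted elsewhere in this thread. Dropping WNOHANG (passing 0 as the options argument) turns the same call into a blocking wait.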

Re: Killing a zombie process?

2015-09-30 Thread Paul Goyette
On Wed, 30 Sep 2015, Robert Elz wrote: Date: Wed, 30 Sep 2015 15:55:04 +0800 (PHT) From: Paul Goyette Message-ID: | So there must be some difference in how init(8) waits during normal |

Re: Killing a zombie process?

2015-09-30 Thread Paul Goyette
: } Subject: Re: Killing a zombie process? } On Wed, 30 Sep 2015, Paul Goyette wrote: } } >> # kill -HUP 1 } >> # ps axl | grep ' Z ' } >> 0 27237 1 0 0 0 0 0 - Zpts/2- 0:00.00 } >> (sh) } > } > Well, it happened again! }

Killing a zombie process?

2015-09-24 Thread Paul Goyette
I'm not sure how I got to this point (but see high-level steps below). I have this zombie process: root27237 0.0 0.0 0 0 pts/2- Z - 0:00.00 (sh) Various web resources say "kill the parent" and the zombie child will die, too. But that's probably not a good idea here,

Re: Killing a zombie process?

2015-09-24 Thread Gary Duzan
In Message , Paul Goyette wrote: =>I'm not sure how I got to this point (but see high-level steps below). =>I have this zombie process: => =>root27237 0.0 0.0 0 0 pts/2- Z - 0:00.00 (sh) =>

Re: Killing a zombie process?

2015-09-24 Thread Paul Goyette
On Thu, 24 Sep 2015, Gary Duzan wrote: In Message , Paul Goyette wrote: =>I'm not sure how I got to this point (but see high-level steps below). =>I have this zombie process: => =>root27237 0.0 0.0 0

Re: Killing a zombie process?

2015-09-24 Thread Paul Goyette
On Thu, 24 Sep 2015, Greg Troxel wrote: Paul Goyette writes: On Thu, 24 Sep 2015, Gary Duzan wrote: Yup, my zombie's parent PPID==1 If init is really its parent, check its "ps axl" output and check its WCHAN. If it isn't "wait", maybe run "ktruss -p 1" to get an

Re: Killing a zombie process?

2015-09-24 Thread Greg Troxel
Paul Goyette writes: > On Thu, 24 Sep 2015, Gary Duzan wrote: > Yup, my zombie's parent PPID==1 > >> If init is really its parent, check its "ps axl" output and >> check its WCHAN. If it isn't "wait", maybe run "ktruss -p 1" to >> get an idea of what it is doing