On Fri 23 Oct 2015 at 00:46:57 +0200, Rhialto wrote:
This problem is very repeatable, usually within a few hours, just now it
happened within half an hour.
It seems to me that somehow the nfs_reqq list gets corrupted. Then
either there is a crash when traversing it in nfs_timer() (occurring in
nfs_sigintr() due to being called with a bogus
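To make "the list gets corrupted" concrete: assuming nfs_reqq is a `<sys/queue.h>` TAILQ (an assumption here; check nfs/nfs_socket.c for the real declaration), one common debugging aid is to walk the queue and verify each element's back-pointer before trusting its forward link, so a damaged entry is reported instead of dereferenced. A minimal user-space sketch; `struct req`, `reqhead` and `reqq_consistent()` are hypothetical names, not NetBSD code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a queued NFS request; not NetBSD's struct nfsreq. */
struct req {
    int id;
    TAILQ_ENTRY(req) r_chain;
};
TAILQ_HEAD(reqhead, req);

/*
 * Walk the queue checking that every element's tqe_prev points at the
 * link that leads to it, and that tqh_last points at the final link.
 * Returns false on the first mismatch, or once more than max_len
 * elements have been visited (which suggests a loop in the list).
 */
static bool reqq_consistent(struct reqhead *head, size_t max_len)
{
    struct req **expected_prev = &head->tqh_first;
    size_t seen = 0;

    for (struct req *r = head->tqh_first; r != NULL; r = r->r_chain.tqe_next) {
        if (r->r_chain.tqe_prev != expected_prev)
            return false;          /* back-pointer does not match */
        if (++seen > max_len)
            return false;          /* too long: probably a cycle */
        expected_prev = &r->r_chain.tqe_next;
    }
    /* The head's tail pointer must point at the last forward link. */
    return head->tqh_last == expected_prev;
}
```

In a kernel a check like this would have to run with the queue's lock held; the sketch ignores locking entirely.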
On Tue 20 Oct 2015 at 01:04:59 +0200, Rhialto wrote:
> with a rebuilt netbsd.gdb (hopefully the addresses match)
>
> #5 0x806b94b4 in nfs_sigintr (nmp=0x0, rep=0xfe81163730a8,
> l=0x0) at ../../../../nfs/nfs_socket.c:871
nmp should not be NULL here... let's look at rep, where it
with a rebuilt netbsd.gdb (hopefully the addresses match)
(gdb) target kvm netbsd.5.core
0x8063d735 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:671
671 dumpsys();
(gdb) bt
#0 0x8063d735 in
On Fri 16 Oct 2015 at 16:31:18 +0200, J. Hannken-Illjes wrote:
> On 16 Oct 2015, at 13:44, Rhialto wrote:
>
> > "Interesting" results: it built packages overnight (from around 22:30 to
> > 12:13, so for nearly 14 hours), then, when I didn't look, it rebooted.
>
> With panic?
I
On Thu 15 Oct 2015 at 20:12:44 +0200, Rhialto wrote:
> On Thu 15 Oct 2015 at 06:57:42 +0700, Robert Elz wrote:
> > Do you really need that mounted twice like that, and if not, can you try
> > with one of them missing and see if the problem remains ?
>
> Good idea, I'll try that later!
On Fri 16 Oct 2015 at 16:29:55 +0200, J. Hannken-Illjes wrote:
> Looks like we are waiting for a NFS operation to complete.
>
> Did the machine hang here?
No, but I didn't try specifically to access the nfs volumes.
Interestingly enough, after the reboot (which used the stock 7.0 GENERIC
On 15 Oct 2015, at 00:21, Rhialto wrote:
> On Wed 14 Oct 2015 at 09:39:40 +0200, J. Hannken-Illjes wrote:
>> Looks like a deadlock, two threads in tstile.
>>
>> Please take a backtrace (with arguments) of these threads.
>
> I've got a whole lot more in tstile, and that is even
On 16 Oct 2015, at 13:44, Rhialto wrote:
> On Thu 15 Oct 2015 at 20:12:44 +0200, Rhialto wrote:
>> On Thu 15 Oct 2015 at 06:57:42 +0700, Robert Elz wrote:
>>> Do you really need that mounted twice like that, and if not, can you try
>>> with one of them missing and see if the
On Thu 15 Oct 2015 at 06:57:42 +0700, Robert Elz wrote:
> I do wonder about ...
>
> | procfs on /usr/pkg/emul/linux32/proc type procfs (read-only, local)
> | procfs on /usr/pkg/emul/linux32/proc type procfs (local)
Ah good catch. That seems to be a botched attempt to mount the linux
procfs
On 14 Oct 2015, at 00:20, Rhialto wrote:
> I may have something similar; with 7.0/amd64 GENERIC kernel.
>
> I've been doing builds in pkg_comp with the chroot directory and /usr/pkgsrc
> mounted over nfs. After some packages, some processes simply don't terminate.
>
> Some of
On Thu 15 Oct 2015 at 00:21:55 +0200, Rhialto wrote:
> I've got a whole lot more in tstile, and that is even just from running
> pkg_comp in the chroot. I didn't try to interrupt anything yet.
I forgot to mention that this is with a kernel cvs'ed about 24 hours
ago. So this issue isn't the same
Date:Thu, 15 Oct 2015 00:21:55 +0200
From:Rhialto
Message-ID: <20151014222155.ga25...@falu.nl>
First, I agree this has nothing at all to do with the zombie refcount
issue (nothing to do with zombies, or process lists or anything slightly
related).
I
I may have something similar; with 7.0/amd64 GENERIC kernel.
I've been doing builds in pkg_comp with the chroot directory and /usr/pkgsrc
mounted over nfs. After some packages, some processes simply don't terminate.
Some of my processes are now (after trying to exit pkg_comp which hangs)
UID
On Sun, 4 Oct 2015, Robert Elz wrote:
Date:Sun, 4 Oct 2015 17:25:21 +0800 (PHT)
From:Paul Goyette
Message-ID:
| I'm pretty much convinced that the p_nstopchild accounting is screwed up
|
I'm pretty much convinced that the p_nstopchild accounting is screwed up
somewhere. I'm planning on adding the following code in "optimization"
in kern_exit so I can catch it as soon as it happens.
Basically, if the optimization would cause us to stop looking for a
process to report, this
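The idea behind such a check can be modelled outside the kernel. As a simplification (not a description of NetBSD's actual kern_exit code), assume the parent caches a count of children that a wait would still report, meaning zombies plus stopped children whose stop has not yet been reported, and skips scanning its child list when that count is zero. The types and function names below are hypothetical; they only show what "the counter disagrees with reality" means:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not NetBSD's structures. */
enum child_state { CHILD_ACTIVE, CHILD_STOPPED, CHILD_ZOMBIE };

struct child {
    enum child_state state;
    bool waited;            /* stop already reported to the parent */
};

struct parent {
    int nstopchild;         /* cached counter, in the spirit of p_nstopchild */
    const struct child *kids;
    size_t nkids;
};

/* Recount from scratch: children a wait would still have news about. */
static int count_reportable(const struct parent *p)
{
    int n = 0;
    for (size_t i = 0; i < p->nkids; i++) {
        const struct child *c = &p->kids[i];
        if (c->state == CHILD_ZOMBIE ||
            (c->state == CHILD_STOPPED && !c->waited))
            n++;
    }
    return n;
}

/*
 * The shortcut gives up when the counter is zero. Flag the case where
 * giving up would hide a child that should have been reported.
 */
static bool shortcut_misses_child(const struct parent *p)
{
    return p->nstopchild == 0 && count_reportable(p) > 0;
}
```

Note that a stopped child whose stop was already reported is deliberately not counted, so a zero counter with stopped children present is not by itself proof of a bug.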
Date:Sun, 4 Oct 2015 17:25:21 +0800 (PHT)
From:Paul Goyette
Message-ID:
| I'm pretty much convinced that the p_nstopchild accounting is screwed up
| somewhere.
I think I agree.
|
On Sun, 4 Oct 2015, Paul Goyette wrote:
| 1. Is it correct for init's p_nstopchild to be zero when it has several
| children whose p_state is SSTOP?
Depends whether those children have previously been waited for or not.
Stopped children don't go away when they're waited for, so there
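The point about stopped children can be checked from user space with POSIX waitpid(2): a stop is reported once, a later WUNTRACED wait does not report the same stop again, and the stopped child still exists until it is continued or killed. A small sketch; the helper name `probe_stopped_child()` and its bitmask result are hypothetical:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that stops itself, then probe how waitpid() reports it.
 * Returns -1 if fork fails, otherwise a bitmask:
 *   bit 0: the first WUNTRACED wait saw the child as stopped
 *   bit 1: a second WNOHANG|WUNTRACED wait had nothing new (returned 0)
 *   bit 2: after SIGKILL the child was reaped as killed by that signal
 */
static int probe_stopped_child(void)
{
    int status, result = 0;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);   /* child: stop ourselves */
        _exit(0);         /* not reached: the parent kills us while stopped */
    }
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        result |= 1;
    /* The stop has already been reported, so it is not reported again. */
    if (waitpid(pid, &status, WNOHANG | WUNTRACED) == 0)
        result |= 2;
    /* The stopped child still exists; kill it and reap the zombie. */
    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) == pid && WIFSIGNALED(status) &&
        WTERMSIG(status) == SIGKILL)
        result |= 4;
    return result;
}
```

On a conforming system all three bits should be set, i.e. the function returns 7.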
Date:Sun, 4 Oct 2015 20:52:43 +0800 (PHT)
From:Paul Goyette
Message-ID:
| I do occasionally switch to another wsdisplay screen (away from the X
| one), but not frequently. I
Date:Fri, 2 Oct 2015 15:26:42 +0800 (PHT)
From:Paul Goyette
Message-ID:
| 1. Is it correct for init's p_nstopchild to be zero when it has several
| children whose p_state is SSTOP?
On Sun, 4 Oct 2015, Robert Elz wrote:
Date:Fri, 2 Oct 2015 15:26:42 +0800 (PHT)
From:Paul Goyette
Message-ID:
| 1. Is it correct for init's p_nstopchild to be zero when it has several
|
On Fri, 2 Oct 2015, Paul Goyette wrote:
For now, I took a quick look into the zombie's struct proc.
p_exitsig = 0x14 = SIGCHLD
p_flag    = 0x0
p_sflag   = 0x2000 = PS_WEXIT
p_slflag  = 0x0
p_lflag   = 0x2 = PL_CONTROLT
p_stflag  = 0x0
On Fri, 2 Oct 2015, Paul Goyette wrote:
Still trying to track this down
A modified version of ps(1) shows that the process state is clearly
LSZOMB and not LSDEAD. Furthermore, "ps -s" doesn't show any LWP for
the zombie process, so it would seem that process clean up has
progressed relatively far.
I was able to use "ps axl
, 3:55pm, Paul Goyette wrote:
} Subject: Re: Killing a zombie process?
} On Wed, 30 Sep 2015, Paul Goyette wrote:
}
} >> # kill -HUP 1
} >> # ps axl | grep ' Z '
} >> 0 27237 1 0 0 0 0 0 - Z pts/2- 0:00.00 (sh)
} >
} > Well
On Wed, 30 Sep 2015, Paul Goyette wrote:
# kill -HUP 1
# ps axl | grep ' Z '
0 27237 1 0 0 0 0 0 - Z pts/2- 0:00.00 (sh)
Well, it happened again!
I rebooted earlier today, and then deinstalled and rebuilt about 40
packages within the
Date:Wed, 30 Sep 2015 15:55:04 +0800 (PHT)
From:Paul Goyette
Message-ID:
| So there must be some difference in how init(8) waits during normal
| operation and how it waits during the
Date:Wed, 30 Sep 2015 18:29:20 +0800 (PHT)
From:Paul Goyette
Message-ID:
| Well, a quick read through sbin/init.c shows that sometimes it waits
| with WNOHANG and sometimes it doesn't.
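The difference matters because the two kinds of wait behave differently when nothing is ready: a blocking wait sleeps, while a WNOHANG wait returns 0 at once. A common reaping pattern, sketched below on the assumption that it is called when SIGCHLD arrives (this is not init(8)'s actual code, and `reap_exited_children()` is a hypothetical name), drains every exited child with WNOHANG until waitpid reports that none are left:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Reap every child that has already exited, without blocking.
 * Returns how many children were reaped.
 */
static int reap_exited_children(void)
{
    int status, reaped = 0;
    pid_t pid;

    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;              /* one zombie collected */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;              /* interrupted by a signal: retry */
        break;                     /* 0: none ready; -1/ECHILD: no children */
    }
    return reaped;
}
```

The loop is needed because standard signals are not queued: several children exiting close together may produce only one SIGCHLD, so a single signal can correspond to several zombies.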
On Wed, 30 Sep 2015, Robert Elz wrote:
Date:Wed, 30 Sep 2015 15:55:04 +0800 (PHT)
From:Paul Goyette
Message-ID:
| So there must be some difference in how init(8) waits during normal
|
I'm not sure how I got to this point (but see high-level steps below).
I have this zombie process:
root 27237 0.0 0.0 0 0 pts/2- Z - 0:00.00 (sh)
Various web resources say "kill the parent" and the zombie child will
die, too. But that's probably not a good idea here,
In Message ,
Paul Goyette wrote:
=>I'm not sure how I got to this point (but see high-level steps below).
=>I have this zombie process:
=>
=>root 27237 0.0 0.0 0 0 pts/2- Z - 0:00.00 (sh)
=>
On Thu, 24 Sep 2015, Gary Duzan wrote:
In Message ,
Paul Goyette wrote:
=>I'm not sure how I got to this point (but see high-level steps below).
=>I have this zombie process:
=>
=>root 27237 0.0 0.0 0
On Thu, 24 Sep 2015, Greg Troxel wrote:
Paul Goyette writes:
> On Thu, 24 Sep 2015, Gary Duzan wrote:
> Yup, my zombie's parent PPID==1
>
>> If init is really its parent, check its "ps axl" output and
>> check its WCHAN. If it isn't "wait", maybe run "ktruss -p 1" to
>> get an idea of what it is doing