Re: Blocked process

2009-09-08 Thread Daniel O'Connor
On Thu, 20 Aug 2009, Daniel O'Connor wrote:
> Unfortunately the system is in Finland and I'm in Australia so I
> can't sit at the console :(

Someone visited the site and determined that the floppy drive cable was 
intermittently fouling the CPU fan. I believe this was causing the CPU 
to overheat and be throttled by the BIOS.

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: Blocked process

2009-08-22 Thread Daniel O'Connor
On Sat, 22 Aug 2009, Robert Watson wrote:
> On Sat, 22 Aug 2009, Daniel O'Connor wrote:
> > On Fri, 21 Aug 2009, CmdLnKid wrote:
> >> came back or the machine was rebooted. I continued for a while
> >> using /var/mail over NFS while setting or unset mail variables for
> >> the shell. You may also want to check into whether something is
> >> trying to acquire a lock on a file over that NFS mount which could
> >> accrue some extra time making it seem like a process is hung.
> >
> > We don't have any NFS mounts so I don't think that's it :(
>
> A number of issues were corrected over the course of the 6.x life
> span involving scheduling, including some relating to "lost wakeups". 
> Many bug fixes relating to threading were also introduced (not sure
> if that's relevant to your workload).  While it's never a
> particularly fun recommendation, I think I'd suggest sliding forward
> to the most recent 6.x kernel (but otherwise identical
> configuration), perhaps sticking with your current userspace, and
> seeing if that resolves the issue.

OK, should be feasible to do that I think. Luckily people at this site 
don't need to drive for a few hours to fix the PC :)

Thanks.

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: Blocked process

2009-08-22 Thread Robert Watson


On Sat, 22 Aug 2009, Daniel O'Connor wrote:
> On Fri, 21 Aug 2009, CmdLnKid wrote:
> > came back or the machine was rebooted. I continued for a while using
> > /var/mail over NFS while setting or unset mail variables for the shell. You
> > may also want to check into whether something is trying to acquire a lock
> > on a file over that NFS mount which could accrue some extra time making it
> > seem like a process is hung.
>
> We don't have any NFS mounts so I don't think that's it :(


Hi Daniel--

A number of issues were corrected over the course of the 6.x life span 
involving scheduling, including some relating to "lost wakeups".  Many bug 
fixes relating to threading were also introduced (not sure if that's relevant 
to your workload).  While it's never a particularly fun recommendation, I 
think I'd suggest sliding forward to the most recent 6.x kernel (but otherwise 
identical configuration), perhaps sticking with your current userspace, and 
seeing if that resolves the issue.


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Blocked process

2009-08-22 Thread Daniel O'Connor
On Fri, 21 Aug 2009, CmdLnKid wrote:
> came back or the machine was rebooted. I continued for a while using
> /var/mail over NFS while setting or unset mail variables for the
> shell. You may also want to check into whether something is trying to
> acquire a lock on a file over that NFS mount which could accrue some
> extra time making it seem like a process is hung.

We don't have any NFS mounts so I don't think that's it :(

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: Blocked process

2009-08-20 Thread CmdLnKid

On Thu, 20 Aug 2009 08:42 -, doconnor wrote:
> On Thu, 20 Aug 2009, Kostik Belousov wrote:
> > > Things like ls on the console might take several seconds to respond
> > > when the box didn't seem to be very busy (but wasn't idle, maybe
> > > serving a little NFS). It wasn't the shell getting swapped out or
> > > anything else obvious. This was on SMP, not using X. The problem
> > > went away with 6.4R (had to stay with 6.x for unrelated reasons).
> >
> > 6.1 was released with a bug in NFS server, causing serious slowdown
> > when non-MPSAFE fs was exported.
>
> Hmmm.. this is 6.2 (and a half) so I guess that's not my problem.
>
> Next! ;)



I had a problem like this once when an NFS mount stopped responding and 
any command I issued seemed to hang. I wasn't paying attention to the 
NFS mount at the time; it covered various directories under /var, 
including /var/mail. After digging a little deeper I eventually found 
the cause, which made me feel pretty stupid: the "$SHELL", whether it be 
csh, ksh, bash or sh, checks for mail on command completion or 
invocation, so once the NFS mount stopped responding the process would 
hang until the mount came back or the machine was rebooted. I continued 
for a while using /var/mail over NFS while setting or unsetting the mail 
variables for the shell. You may also want to check whether something is 
trying to acquire a lock on a file over that NFS mount, which could add 
some extra time and make it seem like a process is hung.
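
A hedged aside for anyone chasing the same symptom: the sketch below,
assuming a shell that exports MAIL or MAILPATH in its environment, lists
the paths the shell would poll for mail and times a stat() on each, so
that an unresponsive mount shows up as a timeout rather than a hung
prompt. The names mail_paths and timed_stat and the 2-second timeout are
illustrative choices, not anything from this thread. csh keeps its
setting in the shell variable "mail" rather than in the environment, so
it would not be seen here.

```python
import os
import threading
import time


def mail_paths(env=None):
    """Collect the mailbox paths named by MAIL and MAILPATH (hypothetical helper)."""
    env = os.environ if env is None else env
    paths = []
    if env.get("MAIL"):
        paths.append(env["MAIL"])
    for entry in env.get("MAILPATH", "").split(":"):
        # An entry may carry a message after '?' (bash, ksh) or '%' (POSIX sh).
        path = entry.split("?")[0].split("%")[0]
        if path:
            paths.append(path)
    return paths


def timed_stat(path, timeout=2.0):
    """Return how long os.stat(path) took in seconds, or None if it did not finish in time."""
    result = {}

    def worker():
        start = time.monotonic()
        try:
            os.stat(path)
        except OSError:
            pass  # a missing file still answers quickly; only the latency matters here
        result["elapsed"] = time.monotonic() - start

    # Daemon thread: a stat() stuck on a dead mount must not keep the script alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    return result.get("elapsed")


if __name__ == "__main__":
    for p in mail_paths():
        elapsed = timed_stat(p)
        status = "no answer within timeout" if elapsed is None else f"{elapsed:.3f}s"
        print(f"{p}: {status}")
```

Running the stat() in a separate thread is a deliberate choice: a call
blocked on a hard-mounted NFS path generally cannot be interrupted, so
the main thread only waits for the timeout and then reports.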


--

 - (2^(N-1))


Re: Blocked process

2009-08-20 Thread Daniel O'Connor
On Thu, 20 Aug 2009, Kostik Belousov wrote:
> > Things like ls on the console might take several seconds to respond
> > when the box didn't seem to be very busy (but wasn't idle, maybe
> > serving a little NFS). It wasn't the shell getting swapped out or
> > anything else obvious. This was on SMP, not using X. The problem
> > went away with 6.4R (had to stay with 6.x for unrelated reasons).
>
> 6.1 was released with a bug in NFS server, causing serious slowdown
> when non-MPSAFE fs was exported.

Hmmm.. this is 6.2 (and a half) so I guess that's not my problem.

Next! ;)

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: Blocked process

2009-08-20 Thread Kostik Belousov
On Thu, Aug 20, 2009 at 12:35:46PM +0100, Bob Bishop wrote:
> 
> On 20 Aug 2009, at 12:06, Daniel O'Connor wrote:
> 
> >On Thu, 20 Aug 2009, Daniel O'Connor wrote:
> >>On Thu, 20 Aug 2009, Bob Bishop wrote:
> >>>On 20 Aug 2009, at 08:25, Daniel O'Connor wrote:
> >>>>This is running in 6.2 ish using 4BSD, I was under the impression
> >>>>ULE wasn't very stable in 6.2.
> >>>>
> >>>>I could probably try it though...
> >>>
> >>>Hmm. ISTR having similar problems around the 6.1-2 era. You might
> >>>try 6.4 if that's possible for you.
> >>
> >>Someone is going to visit it, but if I can't solve it remotely I'll
> >>probably just update it to 7.2 or so.
> >
> >What sort of problems did you have BTW?
> 
> Things like ls on the console might take several seconds to respond  
> when the box didn't seem to be very busy (but wasn't idle, maybe  
> serving a little NFS). It wasn't the shell getting swapped out or  
> anything else obvious. This was on SMP, not using X. The problem went  
> away with 6.4R (had to stay with 6.x for unrelated reasons).

6.1 was released with a bug in NFS server, causing serious slowdown
when non-MPSAFE fs was exported.




Re: Blocked process

2009-08-20 Thread Bob Bishop


On 20 Aug 2009, at 12:06, Daniel O'Connor wrote:

> On Thu, 20 Aug 2009, Daniel O'Connor wrote:
>> On Thu, 20 Aug 2009, Bob Bishop wrote:
>>> On 20 Aug 2009, at 08:25, Daniel O'Connor wrote:
>>>> This is running in 6.2 ish using 4BSD, I was under the impression
>>>> ULE wasn't very stable in 6.2.
>>>>
>>>> I could probably try it though...
>>>
>>> Hmm. ISTR having similar problems around the 6.1-2 era. You might
>>> try 6.4 if that's possible for you.
>>
>> Someone is going to visit it, but if I can't solve it remotely I'll
>> probably just update it to 7.2 or so.
>
> What sort of problems did you have BTW?


Things like ls on the console might take several seconds to respond  
when the box didn't seem to be very busy (but wasn't idle, maybe  
serving a little NFS). It wasn't the shell getting swapped out or  
anything else obvious. This was on SMP, not using X. The problem went  
away with 6.4R (had to stay with 6.x for unrelated reasons).


--
Bob Bishop
r...@gid.co.uk






Re: Blocked process

2009-08-20 Thread Daniel O'Connor
On Thu, 20 Aug 2009, Daniel O'Connor wrote:
> On Thu, 20 Aug 2009, Bob Bishop wrote:
> > On 20 Aug 2009, at 08:25, Daniel O'Connor wrote:
> > > This is running in 6.2 ish using 4BSD, I was under the impression
> > > ULE wasn't very stable in 6.2.
> > >
> > > I could probably try it though...
> >
> > Hmm. ISTR having similar problems around the 6.1-2 era. You might
> > try 6.4 if that's possible for you.
>
> Someone is going to visit it, but if I can't solve it remotely I'll
> probably just update it to 7.2 or so.

What sort of problems did you have BTW?

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: Blocked process

2009-08-20 Thread Daniel O'Connor
On Thu, 20 Aug 2009, Bob Bishop wrote:
> On 20 Aug 2009, at 08:25, Daniel O'Connor wrote:
> > This is running in 6.2 ish using 4BSD, I was under the impression
> > ULE wasn't very stable in 6.2.
> >
> > I could probably try it though...
>
> Hmm. ISTR having similar problems around the 6.1-2 era. You might try
> 6.4 if that's possible for you.

Someone is going to visit it, but if I can't solve it remotely I'll 
probably just update it to 7.2 or so.

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: Blocked process

2009-08-20 Thread Bob Bishop

On 20 Aug 2009, at 08:25, Daniel O'Connor wrote:

> This is running in 6.2 ish using 4BSD, I was under the impression ULE
> wasn't very stable in 6.2.
>
> I could probably try it though...



Hmm. ISTR having similar problems around the 6.1-2 era. You might try  
6.4 if that's possible for you.


--
Bob Bishop
r...@gid.co.uk






Re: Blocked process

2009-08-20 Thread Daniel O'Connor
On Thu, 20 Aug 2009, Bob Bishop wrote:
> Hi,
>
> On 20 Aug 2009, at 03:34, Daniel O'Connor wrote:
> > [...]
> > The problem appears to now be that the userland process that reads data
> > out of the kernel is being stalled for over 4 seconds. This process
> > reads from the kernel and does some minor processing and then writes it
> > out to a child process to do some more work on it.
> >
> > [...]
> > Given that renice'ing has an effect it seems to be a scheduler
> > problem, [etc]
>
> Which scheduler are you using? Have you tried the other one?

This is running in 6.2 ish using 4BSD, I was under the impression ULE 
wasn't very stable in 6.2.

I could probably try it though...
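
As a hedged aside, the scheduler compiled into a running kernel can
usually be confirmed rather than assumed. The sketch below reads the
kern.sched.name sysctl, which exists on current FreeBSD releases;
whether a 6.2-era kernel exposes it should be verified on the machine
itself. The function name scheduler_name is an illustrative choice.

```python
import subprocess


def scheduler_name():
    """Return the value of the kern.sched.name sysctl, or None if it cannot be read."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "kern.sched.name"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # not FreeBSD, or sysctl is unavailable
    return out.stdout.strip() or None


if __name__ == "__main__":
    print(scheduler_name() or "scheduler name not available")
```

Trying the other scheduler is not a runtime switch: it means changing
"options SCHED_4BSD" to "options SCHED_ULE" in the kernel configuration
and rebuilding the kernel.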

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: Blocked process

2009-08-20 Thread Bob Bishop

Hi,

On 20 Aug 2009, at 03:34, Daniel O'Connor wrote:

> [...]
> The problem appears to now be that the userland process that reads data
> out of the kernel is being stalled for over 4 seconds. This process
> reads from the kernel and does some minor processing and then writes it
> out to a child process to do some more work on it.
>
> [...]
> Given that renice'ing has an effect it seems to be a scheduler
> problem, [etc]


Which scheduler are you using? Have you tried the other one?
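
A stall of the kind quoted above can be confirmed from userland without
touching the real reader process. The following is a minimal sketch, not
the poster's program: it records a monotonic timestamp on every pass of
a short sleep loop and reports any gap larger than a threshold. The
names find_stalls and sample_loop, the 10 ms interval and the thresholds
are all assumptions made for illustration.

```python
import time


def find_stalls(timestamps, threshold=4.0):
    """Return (index, gap) pairs where consecutive timestamps differ by more than threshold seconds."""
    stalls = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap > threshold:
            stalls.append((i, gap))
    return stalls


def sample_loop(iterations, interval=0.01):
    """Record a monotonic timestamp on each iteration, sleeping `interval` seconds in between."""
    stamps = []
    for _ in range(iterations):
        stamps.append(time.monotonic())
        time.sleep(interval)
    return stamps


if __name__ == "__main__":
    # About two seconds of sampling; lengthen the run to catch rare stalls.
    stamps = sample_loop(200)
    for index, gap in find_stalls(stamps, threshold=1.0):
        print(f"iteration {index}: stalled for {gap:.2f}s")
```

Running it at different nice levels while the problem is occurring would
show whether an otherwise idle process sees the same multi-second gaps,
which helps separate a scheduling problem from something specific to the
reader process.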

--
Bob Bishop
r...@gid.co.uk



