Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Marc G. Fournier



--On Tuesday, April 24, 2007 23:53:16 -0400 Kris Kennaway
[EMAIL PROTECTED] 
wrote:

 On Wed, Apr 25, 2007 at 10:53:08AM +0800, LI Xin wrote:
 Hi, Oleg,

 Oleg Derevenetz wrote:
  Quoting LI Xin [EMAIL PROTECTED]:
 [...]
  I'm not very sure if this is specific to one disk controller.  Actually
  I got some occasional reports about similar hangs on amd64 6.2-RELEASE
  (slightly patched version) in which most processes were stuck in the 'ufs'
  state under very light load; the box was equipped with an amr(4) RAID.
 
  I was not able to reproduce the problem in my lab, though; it's still
  unknown how to trigger the livelock :-(  It still needs some
  investigation on their production system.
 
  I reported a similar issue for FreeBSD 6.2 in the audit trail for kern/104406:
 
  http://www.freebsd.org/cgi/query-pr.cgi?pr=104406cat=
 
  and there should be a thread related to this. Briefly, I suspect that
  this is related to the nullfs filesystems on my server: when I cvsupped to
  FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced the
  nullfs-mounted filesystems with unionfs-mounted ones (that was done on
  10.03.07), the problem went away (or seems to have, at least).

 Hmm...  These seem to be different issues.  The problem reported to me was
 on a pgsql server (no nullfs/unionfs involved), and the hang always happens
 when it is not heavily loaded (usually in the morning, for
 instance, and there is no special configuration, like scheduled tasks
 which could generate disk load, etc., only entropy harvesting), so
 this is quite confusing.

 Yes, a large part of the confusion is the unfortunate tendency of
 people to do the following:

 user1 my system hangs/panics/etc
 user2 my system hangs/panics/etc too; it must be the same problem!

 What we really need is for every FreeBSD user who encounters a
 hang/panic/etc to avoid jumping to conclusions -- no matter how many
 superficial similarities there may seem to you -- and instead go
 through the relevant steps described here:


 http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

 Until you (or a developer) have analyzed the resulting information,
 you cannot definitively determine whether or not your problem is the
 same as a given random other problem, and you may just confuse the
 issue by making claims of similarity when you are really reporting a
 completely separate problem.

What about those that don't have the benefit of being able to access the
console? :(  I've recently started buying servers that have built-in, full
remote console (i.e. the HP servers), but, for instance, I have one box that I
have to consistently reboot every 3 days due to a 'No Buffer Space Available'
...

A thought: how hard would it be to add some method of forcing a system crash, 
that would dump core, from the command line?  Something that, by default, would 
be disabled, but for remote debugging purposes, one could enable in the kernel 
and do a 'sysctl kernel.force_core_crash=1' to have it do it?  I imagine that 
having a core to analyze would provide more information than nothing at
all, no?


-- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Nicolas Rachinsky
* Marc G. Fournier [EMAIL PROTECTED] [2007-04-27 16:03 -0300]:
 A thought: how hard would it be to add some method of forcing a system crash, 
 that would dump core, from the command line?  Something that, by default, would

Doesn't 'kill -6 1' work anymore?

Nicolas

-- 
http://www.rachinsky.de/nicolas


Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Marc G. Fournier



--On Friday, April 27, 2007 22:57:29 +0200 Nicolas Rachinsky
[EMAIL PROTECTED] wrote:

 * Marc G. Fournier [EMAIL PROTECTED] [2007-04-27 16:03 -0300]:
 A thought: how hard would it be to add some method of forcing a system
 crash,  that would dump core, from the command line?  Something that, by
 default, would

 Doesn't 'kill -6 1' work anymore?

I'd never heard of that one ... will it dump core if I do that?

Please note, in my case, with the Buffer Space issue ... I can log in and
cleanly reboot the server, so doing something like the above to get a core dump 
is definitely doable, I'd just never seen a reference to a 'kill -6 1' before 
for doing that ...

Side question to this though ... I remember a while back using a 'client-server'
mechanism that allowed me to dump core to a separate server ... it was so long
ago that my memory is faint, but there was a reason why I couldn't dump to the
local server ... not sure whatever happened to that code, but, if one can do
that for dumping core, shouldn't there be some way to connect to
DDB over the Ethernet without having to have a serial console in place?  For
the core dump case, the Ethernet obviously stayed up while it dumped, so couldn't
some sort of 'ddb.conf' file be set up that would allow it to ifconfig an IP
within that shell so that you could connect to it remotely?  Say, with a
'from-ip' directive?

Just a thought ...


-- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664



Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Daniel O'Connor
On Saturday 28 April 2007 04:33, Marc G. Fournier wrote:
 A thought: how hard would it be to add some method of forcing a
 system crash, that would dump core, from the command line?  Something
 that, by default, would be disabled, but for remote debugging
 purposes, one could enable in the kernel and do a 'sysctl
 kernel.force_core_crash=1' to have it do it?  I imagine that having a
 core to analyze would provide more information than nothing
 at all, no?

I think you can do this:
sysctl debug.kdb.panic=1

Alas, that appears to be a -current thing. 6.x has debug.kdb.enter
though.

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
The nice thing about standards is that there
are so many of them to choose from.
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Oleg Derevenetz
Quoting Kris Kennaway [EMAIL PROTECTED]:

  Oleg Derevenetz wrote:
   Quoting LI Xin [EMAIL PROTECTED]:
  [...]
   I'm not very sure if this is specific to one disk controller.  Actually
   I got some occasional reports about similar hangs on amd64 6.2-RELEASE
   (slightly patched version) in which most processes were stuck in the
   'ufs' state under very light load; the box was equipped with an amr(4)
   RAID.
  
   I was not able to reproduce the problem in my lab, though; it's still
   unknown how to trigger the livelock :-(  It still needs some
   investigation on their production system.
   
   I reported a similar issue for FreeBSD 6.2 in the audit trail for
   kern/104406:
   
   http://www.freebsd.org/cgi/query-pr.cgi?pr=104406cat=
   
   and there should be a thread related to this. Briefly, I suspect that
   this is related to the nullfs filesystems on my server: when I cvsupped
   to FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced
   the nullfs-mounted filesystems with unionfs-mounted ones (that was done
   on 10.03.07), the problem went away (or seems to have, at least).
  
  Hmm...  These seem to be different issues.  The problem reported to me
  was on a pgsql server (no nullfs/unionfs involved), and the hang always
  happens when it is not heavily loaded (usually in the morning, for
  instance, and there is no special configuration, like scheduled tasks
  which could generate disk load, etc., only entropy harvesting), so
  this is quite confusing.
 
 Yes, a large part of the confusion is the unfortunate tendency of
 people to do the following:
 
 user1 my system hangs/panics/etc
 user2 my system hangs/panics/etc too; it must be the same problem!
 
 What we really need is for every FreeBSD user who encounters a
 hang/panic/etc to avoid jumping to conclusions -- no matter how many
 superficial similarities there may seem to you -- and instead go
 through the relevant steps described here:
 
  
 http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
 
 Until you (or a developer) have analyzed the resulting information,
 you cannot definitively determine whether or not your problem is the
 same as a given random other problem, and you may just confuse the
 issue by making claims of similarity when you are really reporting a
 completely separate problem.

Not all people can do deadlock debugging, though. In my case, turning on
INVARIANTS and WITNESS leads to an unacceptable performance penalty because
the server is heavily loaded. So I can only describe my case, actions and
results without providing any debug information.




Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread LI Xin
Oleg Derevenetz wrote:
[snip]
 Not all people can do deadlock debugging, though. In my case, turning on
 INVARIANTS and WITNESS leads to an unacceptable performance penalty because
 the server is heavily loaded. So I can only describe my case, actions and
 results without providing any debug information.

I'd say that I completely agree with Kris, because it's very hard
for developers to investigate problems if there is no detailed
information available, especially problems that cannot be easily
reproduced.  Of course, deadlock debugging can be tricky, but having a
backtrace can usually save a lot of time (and fortunately that is not
that hard even for average users :)

What I wanted to suggest is that we hope the submitter can provide
detailed steps to reliably reproduce the problem whenever possible, if
they are not able to diagnose the problem themselves, so that we can
extract more information in the lab and possibly reach a fix.

The problem I have is that the reporter of the issue is not as
cooperative as they were before, and what I wanted to say is that it's
possible to trigger the livelock without nullfs/unionfs, and I have not
figured out why (yet) because I cannot reproduce it in my environment :-(

Cheers,
-- 
Xin LI [EMAIL PROTECTED]  http://www.delphij.net/
FreeBSD - The Power to Serve!





Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Kris Kennaway
On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:

  Until you (or a developer) have analyzed the resulting information,
  you cannot definitively determine whether or not your problem is the
  same as a given random other problem, and you may just confuse the
  issue by making claims of similarity when you are really reporting a
  completely separate problem.
 
 Not all people can do deadlock debugging, though. In my case, turning on
 INVARIANTS and WITNESS leads to an unacceptable performance penalty because
 the server is heavily loaded. So I can only describe my case, actions and
 results without providing any debug information.

But you can still do *some* things, e.g. backtraces and/or a coredump:
every little bit helps.
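Even without a debugging kernel, a reporter who can still log in can record which wait channel the stuck processes are sleeping on, as in the ps listings with wmesg=ufs reported elsewhere in this thread. The following is a minimal Python sketch, not a FreeBSD tool: it assumes FreeBSD ps(1) accepts "-axo pid,wchan,comm", and the helper names (parse_ps, summarize, collect) and the sample output are hypothetical. It complements the backtraces and core dumps mentioned above; it does not replace them.

```python
import subprocess
from collections import Counter


def parse_ps(output):
    """Parse 'ps -axo pid,wchan,comm' output into (pid, wchan, command) tuples."""
    rows = []
    for line in output.splitlines()[1:]:   # first line is the column header
        parts = line.split(None, 2)        # command names may contain spaces
        if len(parts) == 3:
            rows.append((int(parts[0]), parts[1], parts[2]))
    return rows


def summarize(rows, watch=("ufs",)):
    """Count processes per wait channel and list those sleeping on 'watch'."""
    per_channel = Counter(wchan for _, wchan, _ in rows)
    stuck = [(pid, comm) for pid, wchan, comm in rows if wchan in watch]
    return per_channel, stuck


def collect():
    """Take a live snapshot (assumes FreeBSD ps(1) and its wchan keyword)."""
    result = subprocess.run(["ps", "-axo", "pid,wchan,comm"],
                            capture_output=True, text=True, check=True)
    return parse_ps(result.stdout)


# Invented sample output, for illustration only.
SAMPLE = """\
  PID WCHAN  COMMAND
    1 wait   init
 1201 select mysqld
  700 ufs    sendmail
  701 ufs    cron
  702 -      ps
"""

if __name__ == "__main__":
    per_channel, stuck = summarize(parse_ps(SAMPLE))
    for wchan, n in per_channel.most_common():
        print(f"{wchan:10} {n}")
    print("sleeping on ufs:", stuck)
```

On a live machine, summarize(collect()) gives the same tally. A pile of processes on 'ufs' is exactly the kind of symptom that looks alike across different bugs, so such a snapshot is context for a report, not a diagnosis.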

Ultimately, though, you have to understand and accept that the less
information you provide, the less chance there is that a developer
will be able to track down your problem.  In fact a developer may have
to effectively ignore your problem report altogether, because of what
I explained about symptoms usually not being enough to tell one bug
from another.
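The point that symptoms rarely distinguish one bug from another can be made concrete with backtraces. As a minimal, hypothetical sketch in Python (not a FreeBSD tool), the code below reduces a kernel backtrace to its first few frames below the generic scheduler/sleep path and compares two reports by that signature. The GENERIC_FRAMES list, the regular expression for DDB-style and gdb-style frame lines, and both sample traces are invented for illustration.

```python
import re

# Frames found near the top of almost every sleeping thread's stack: they
# show that a thread sleeps, not why. Illustrative list, not exhaustive.
GENERIC_FRAMES = {"sched_switch", "mi_switch", "sleepq_switch",
                  "sleepq_wait", "msleep", "fork_exit", "fork_trampoline"}

# Captures the function name in a DDB-style line ("foo(a,b) at foo+0x1a")
# or a gdb-style line ("#3  0xc0651234 in foo (a=1)").
FRAME_RE = re.compile(
    r"^\s*(?:#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?)?([A-Za-z_]\w*)\s*\(")


def signature(trace, depth=4):
    """Return the first 'depth' non-generic function names, innermost first."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match and match.group(1) not in GENERIC_FRAMES:
            frames.append(match.group(1))
    return tuple(frames[:depth])


def same_bug_candidate(trace_a, trace_b, depth=4):
    """True if both traces reduce to the same non-empty signature."""
    sig_a = signature(trace_a, depth)
    return bool(sig_a) and sig_a == signature(trace_b, depth)


# Invented traces for illustration; not taken from any report in this thread.
TRACE_DDB = """\
Tracing pid 1201 tid 100123 td 0xc5f9d780
sched_switch(c5f9d780,0,1) at sched_switch+0x183
mi_switch(1,0) at mi_switch+0x280
sleepq_switch(c0a1b2c3) at sleepq_switch+0xda
sleepq_wait(c0a1b2c3) at sleepq_wait+0x46
msleep(c0a1b2c3,0,10) at msleep+0x2f0
bwait(d1234567,10) at bwait+0x45
bufwait(d1234567) at bufwait+0x2a
breadn(c6000000,10) at breadn+0x10c
ffs_read(e8badc0c) at ffs_read+0x27e
"""

TRACE_GDB = """\
#0  sched_switch (td=0xc6000000)
#1  0xc0651234 in mi_switch (flags=1)
#2  0xc0652345 in sleepq_switch (wchan=0xc7000000)
#3  0xc0653456 in sleepq_wait (wchan=0xc7000000)
#4  0xc0654567 in msleep (ident=0xc7000000)
#5  0xc0655678 in acquire (lkpp=0xe8000000)
#6  0xc0656789 in lockmgr (lkp=0xc7000000)
#7  0xc0657890 in ffs_lock (ap=0xe8000000)
#8  0xc0658901 in VOP_LOCK_APV (vop=0xc0800000)
"""

if __name__ == "__main__":
    print(signature(TRACE_DDB))
    print(signature(TRACE_GDB))
    print(same_bug_candidate(TRACE_DDB, TRACE_GDB))
```

Two hung systems whose processes all sleep on 'ufs' could look identical from ps alone, yet these two invented traces end in a buffer read and a vnode lock respectively. A matching signature is only a reason to look closer: in a deadlock many threads queue behind the same lock, so the steps in the developers' handbook are still needed.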

In general, when you encounter a bug in FreeBSD, you have a little bit
of work to do on your side before we can start doing the rest.  I
understand that you may not be in a position to do that work, but that
means you also need to understand that we can't do it either.

Kris




Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Oleg Derevenetz
Quoting Kris Kennaway [EMAIL PROTECTED]:

 On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:
 
   Until you (or a developer) have analyzed the resulting information,
   you cannot definitively determine whether or not your problem is the
   same as a given random other problem, and you may just confuse the
   issue by making claims of similarity when you are really reporting a
   completely separate problem.
  
  Not all people can do deadlock debugging, though. In my case, turning on
  INVARIANTS and WITNESS leads to an unacceptable performance penalty
  because the server is heavily loaded. So I can only describe my case,
  actions and results without providing any debug information.
 
 But you can still do *some* things, e.g. backtraces and/or a coredump:
 every little bit helps.
 
 Ultimately, though, you have to understand and accept that the less
 information you provide, the less chance there is that a developer
 will be able to track down your problem.  In fact a developer may have
 to effectively ignore your problem report altogether, because of what
 I explained about symptoms usually not being enough to tell one bug
 from another.
 
 In general, when you encounter a bug in FreeBSD, you have a little bit
 of work to do on your side before we can start doing the rest.  I
 understand that you may not be in a position to do that work, but that
 means you also need to understand that we can't do it either.

In fact, I solved (or worked around) this problem for myself, so in this thread I
offer my workaround as a possible workaround for users who experience the
same problem. This is only a hint for them, not a bug report for you. I cannot
provide full (or even partial) debug information because I will not back out
the cvsupped sources, will not replace unionfs with nullfs again, and will not
wait a week or more for another hang.


Re: 6.2-STABLE deadlock?

2007-04-25 Thread Adrian Wontroba
On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
 Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
 rather sooner after the hang.  Processes with wmesg=ufs feature often in
 the ps output.

Thanks for the assorted replies. I'll try using INVARIANTS, as I'd much
prefer a panic and automatic reboot rather than creeping death for this
server which is only attended 06:00-22:00 weekdays yet is a critical
point in our 24*7 monitoring. Sigh. Champagne tastes on a beer budget.

Other background information (from memory as I can't access it from
here):
I'm not using unionfs / nullfs.
I am using MFS and softupdates.
NFS is in occasional use.
NTFS is not used.
The server is ancient. A 4-way Xeon, state of the art in 1998.
From 5.5-STABLE through to 6.2-STABLE, the problem has gone from absent,
through reproducible if you try hard enough, to striking according to
Murphy.

Once the sendmail milter ABI damage is fixed I'll bring the machine up
to date - it is also the build box for a dozen or so machines running
something close to 6.2-RELEASE, and I'd rather not upgrade all of them
when the next ClamAV release appears.

If / when it hangs again, I'll include more information and maybe even
raise a real PR.

-- 
Adrian Wontroba
It's always a long day; 86400 doesn't fit into a short.


Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Kris Kennaway
On Wed, Apr 25, 2007 at 10:20:25PM +0400, Oleg Derevenetz wrote:
 Quoting Kris Kennaway [EMAIL PROTECTED]:
 
  On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:
  
    Until you (or a developer) have analyzed the resulting information,
    you cannot definitively determine whether or not your problem is the
    same as a given random other problem, and you may just confuse the
    issue by making claims of similarity when you are really reporting a
    completely separate problem.
   
   Not all people can do deadlock debugging, though. In my case, turning on
   INVARIANTS and WITNESS leads to an unacceptable performance penalty
   because the server is heavily loaded. So I can only describe my case,
   actions and results without providing any debug information.
  
  But you can still do *some* things, e.g. backtraces and/or a coredump:
  every little bit helps.
  
  Ultimately, though, you have to understand and accept that the less
  information you provide, the less chance there is that a developer
  will be able to track down your problem.  In fact a developer may have
  to effectively ignore your problem report altogether, because of what
  I explained about symptoms usually not being enough to tell one bug
  from another.
  
  In general, when you encounter a bug in FreeBSD, you have a little bit
  of work to do on your side before we can start doing the rest.  I
  understand that you may not be in a position to do that work, but that
  means you also need to understand that we can't do it either.
 
 In fact, I solved (or worked around) this problem for myself, so in this
 thread I offer my workaround as a possible workaround for users who
 experience the same problem. This is only a hint for them, not a bug report
 for you. I cannot provide full (or even partial) debug information because
 I will not back out the cvsupped sources, will not replace unionfs with
 nullfs again, and will not wait a week or more for another hang.

OK.  FYI I use nullfs on a few dozen heavily loaded machines without
issue for the past year or so, so if you are seeing a nullfs issue it
is probably an obscure one.

Kris




Re: 6.2-STABLE deadlock?

2007-04-25 Thread Kris Kennaway
On Thu, Apr 26, 2007 at 02:01:57AM +0100, Adrian Wontroba wrote:
 On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
  Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
  rather sooner after the hang.  Processes with wmesg=ufs feature often in
  the ps output.
 
 Thanks for the assorted replies. I'll try using INVARIANTS, as I'd much
 prefer a panic and automatic reboot rather than creeping death for this
 server which is only attended 06:00-22:00 weekdays yet is a critical
 point in our 24*7 monitoring. Sigh. Champagne tastes on a beer budget.
 
 Other background information (from memory as I can't access it from
 here):
 I'm not using unionfs / nullfs.
 I am using MFS and softupdates.
 NFS is in occasional use.
 NTFS is not used.
 The server is ancient. A 4-way Xeon, state of the art in 1998.
 From 5.5-STABLE through to 6.2-STABLE, the problem has gone from absent,
 through reproducible if you try hard enough, to striking according to
 Murphy.
 
 Once the sendmail milter ABI damage is fixed I'll bring the machine up
 to date - it is also the build box for a dozen or so machines running
 something close to 6.2-RELEASE, and I'd rather not upgrade all of them
 when the next ClamAV release appears.

Fixed the other day, FYI.

 If / when it hangs again, I'll include more information and maybe even
 raise a real PR.

Thanks.

Kris


Re: 6.2-STABLE deadlock?

2007-04-24 Thread LI Xin
Kostik Belousov wrote:
 On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
 On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
 At work, amongst my stable of old computers running FreeBSD, I have a
 Fujitsu M800 - a 4-way Xeon SMP system with 4 GB of memory. This
 primarily runs Nagios and a small and lightly used MySQL database, along
 with a few inbound FTP transfers per minute. It has a Mylex card based
 disc subsystem, ruling out crash dumps.

 At some point during 5.5-STABLE this machine started to occasionally hang ...
 Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
 rather sooner after the hang.  Processes with wmesg=ufs feature often in
 the ps output.

 http://www.stade.co.uk/crash1/
 
 I would suspect the mlx controller. There are several processes (for instance,
 988, 50918) waiting for completion of a block read, and processes in the ufs
 state are the result of the lock cascade, IMHO.

I'm not very sure if this is specific to one disk controller.  Actually
I got some occasional reports about similar hangs on amd64 6.2-RELEASE
(slightly patched version) in which most processes were stuck in the 'ufs'
state under very light load; the box was equipped with an amr(4) RAID.

I was not able to reproduce the problem in my lab, though; it's still
unknown how to trigger the livelock :-(  It still needs some
investigation on their production system.

Cheers,
-- 
Xin LI [EMAIL PROTECTED]  http://www.delphij.net/
FreeBSD - The Power to Serve!





Re: 6.2-STABLE deadlock?

2007-04-24 Thread Oleg Derevenetz
Quoting LI Xin [EMAIL PROTECTED]:

 Kostik Belousov wrote:
  On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
  On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
  At work, amongst my stable of old computers running FreeBSD, I have a
  Fujitsu M800 - a 4-way Xeon SMP system with 4 GB of memory. This
  primarily runs Nagios and a small and lightly used MySQL database, along
  with a few inbound FTP transfers per minute. It has a Mylex card based
  disc subsystem, ruling out crash dumps.
 
  At some point during 5.5-STABLE this machine started to occasionally
  hang ...
  Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
  rather sooner after the hang.  Processes with wmesg=ufs feature often in
  the ps output.
 
  http://www.stade.co.uk/crash1/
  
  I would suspect the mlx controller. There are several processes (for
  instance, 988, 50918) waiting for completion of a block read, and
  processes in the ufs state are the result of the lock cascade, IMHO.
 
 I'm not very sure if this is specific to one disk controller.  Actually
 I got some occasional reports about similar hangs on amd64 6.2-RELEASE
 (slightly patched version) in which most processes were stuck in the 'ufs'
 state under very light load; the box was equipped with an amr(4) RAID.
 
 I was not able to reproduce the problem in my lab, though; it's still
 unknown how to trigger the livelock :-(  It still needs some
 investigation on their production system.

I reported a similar issue for FreeBSD 6.2 in the audit trail for kern/104406:

http://www.freebsd.org/cgi/query-pr.cgi?pr=104406cat=

and there should be a thread related to this. Briefly, I suspect that this is
related to the nullfs filesystems on my server: when I cvsupped to FreeBSD
6.2-STABLE with Daichi's unionfs-related patches and replaced the
nullfs-mounted filesystems with unionfs-mounted ones (that was done on
10.03.07), the problem went away (or seems to have, at least).


RE: 6.2-STABLE deadlock?

2007-04-24 Thread Jan Mikkelsen
LI Xin wrote:
 Kostik Belousov wrote:
  On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
  On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
  At work, amongst my stable of old computers running FreeBSD, I have a
  Fujitsu M800 - a 4-way Xeon SMP system with 4 GB of memory. This
  primarily runs Nagios and a small and lightly used MySQL database, along
  with a few inbound FTP transfers per minute. It has a Mylex card based
  disc subsystem, ruling out crash dumps.
 
  At some point during 5.5-STABLE this machine started to occasionally
  hang ...
  Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
  rather sooner after the hang.  Processes with wmesg=ufs feature often in
  the ps output.
 
  http://www.stade.co.uk/crash1/
  
  I would suspect the mlx controller. There are several processes (for
  instance, 988, 50918) waiting for completion of a block read, and
  processes in the ufs state are the result of the lock cascade, IMHO.
 
 I'm not very sure if this is specific to one disk controller.  Actually
 I got some occasional reports about similar hangs on amd64 6.2-RELEASE
 (slightly patched version) in which most processes were stuck in the 'ufs'
 state under very light load; the box was equipped with an amr(4) RAID.
 
 I was not able to reproduce the problem in my lab, though; it's still
 unknown how to trigger the livelock :-(  It still needs some
 investigation on their production system.

I have seen something similar once, on a machine with an Areca (arcmsr)
controller, running 6.2-RELEASE (with unionfs patches).  Processes stuck in
ufs, and the machine needed physical intervention to reboot.  I haven't
seen it since.  From memory, it happened during startup of the applications
and jails on the machine.

Jan.



RE: 6.2-STABLE deadlock?

2007-04-24 Thread Jan Mikkelsen
Oleg Derevenetz wrote:
 [ ... ] 
 I reported a similar issue for FreeBSD 6.2 in the audit trail for
 kern/104406:
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=104406cat=
 
 and there should be a thread related to this. Briefly, I suspect that
 this is related to the nullfs filesystems on my server: when I cvsupped
 to FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced
 the nullfs-mounted filesystems with unionfs-mounted ones (that was done
 on 10.03.07), the problem went away (or seems to have, at least).

Interesting.  In the instance I saw, there were also nullfs filesystems
mounted.

Regards,

Jan.



Re: 6.2-STABLE deadlock?

2007-04-24 Thread Eugene Grosbein
Kostik Belousov wrote:

 I would suspect the mlx controller. There are several processes (for instance,
 988, 50918) waiting for completion of a block read, and processes in the ufs
 state are the result of the lock cascade, IMHO.

It is possible that the controller is not to blame.

You can easily reproduce the lock-up in the ufs state with the commands from
the How-To-Repeat section of:
http://www.FreeBSD.org/cgi/query-pr.cgi?pr=kern/107439

The PR is closed, but the problem still exists in recent 6.2-STABLE.
GENERIC has the problem too; GENERIC+INVARIANTS panics at once
instead of producing locked processes.

Eugene Grosbein.


Re: 6.2-STABLE deadlock?

2007-04-24 Thread LI Xin
Hi, Oleg,

Oleg Derevenetz wrote:
 Quoting LI Xin [EMAIL PROTECTED]:
[...]
 I'm not very sure if this is specific to one disk controller.  Actually
 I got some occasional reports about similar hangs on amd64 6.2-RELEASE
 (slightly patched version) in which most processes were stuck in the 'ufs'
 state under very light load; the box was equipped with an amr(4) RAID.

 I was not able to reproduce the problem in my lab, though; it's still
 unknown how to trigger the livelock :-(  It still needs some
 investigation on their production system.
 
 I reported a similar issue for FreeBSD 6.2 in the audit trail for kern/104406:
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=104406cat=
 
 and there should be a thread related to this. Briefly, I suspect that this
 is related to the nullfs filesystems on my server: when I cvsupped to FreeBSD
 6.2-STABLE with Daichi's unionfs-related patches and replaced the
 nullfs-mounted filesystems with unionfs-mounted ones (that was done on
 10.03.07), the problem went away (or seems to have, at least).

Hmm...  These seem to be different issues.  The problem reported to me was
on a pgsql server (no nullfs/unionfs involved), and the hang always happens
when it is not heavily loaded (usually in the morning, for
instance, and there is no special configuration, like scheduled tasks
which could generate disk load, etc., only entropy harvesting), so
this is quite confusing.

Cheers,
-- 
Xin LI [EMAIL PROTECTED]  http://www.delphij.net/
FreeBSD - The Power to Serve!





Re: 6.2-STABLE deadlock?

2007-04-24 Thread Kris Kennaway
On Wed, Apr 25, 2007 at 11:53:32AM +1000, Jan Mikkelsen wrote:
 LI Xin wrote:
  Kostik Belousov wrote:
   On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
   On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
   At work, amongst my stable of old computers running FreeBSD, I have a
   Fujitsu M800 - a 4-way Xeon SMP system with 4 GB of memory. This
   primarily runs Nagios and a small and lightly used MySQL database, along
   with a few inbound FTP transfers per minute. It has a Mylex card based
   disc subsystem, ruling out crash dumps.
  
   At some point during 5.5-STABLE this machine started to occasionally
   hang ...
   Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
   rather sooner after the hang.  Processes with wmesg=ufs feature often in
   the ps output.
  
   http://www.stade.co.uk/crash1/
   
   I would suspect the mlx controller. There are several processes (for
   instance, 988, 50918) waiting for completion of a block read, and
   processes in the ufs state are the result of the lock cascade, IMHO.
  
  I'm not very sure if this is specific to one disk controller.  Actually
  I got some occasional reports about similar hangs on amd64 6.2-RELEASE
  (slightly patched version) in which most processes were stuck in the 'ufs'
  state under very light load; the box was equipped with an amr(4) RAID.
  
  I was not able to reproduce the problem in my lab, though; it's still
  unknown how to trigger the livelock :-(  It still needs some
  investigation on their production system.
 
 I have seen something similar once, on a machine with an Areca (arcmsr)
 controller, running 6.2-RELEASE (with unionfs patches).  Processes stuck in
 ufs, and the machine needed physical intervention to reboot.  I haven't
 seen it since.  From memory, it happened during startup of the applications
 and jails on the machine.

Sounds like one of the known unionfs bugs.

Kris


How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-24 Thread Kris Kennaway
On Wed, Apr 25, 2007 at 10:53:08AM +0800, LI Xin wrote:
 Hi, Oleg,
 
 Oleg Derevenetz wrote:
  ??? LI Xin [EMAIL PROTECTED]:
 [...]
  I'm not very sure if this is specific to one disk controller.  Actually
  I have had occasional reports of similar hangs on amd64 6.2-RELEASE
  (a slightly patched version), in which most processes were stuck in the
  'ufs' state under very light load; the box was equipped with amr(4) RAID.
 
  I was not able to reproduce the problem in my lab, though, and it's still
  unknown how to trigger the livelock :-(  I still need to investigate
  further on their production system.
  
  I reported a similar issue for FreeBSD 6.2 in the audit trail for kern/104406:
  
  http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
  
  and there should be a thread related to this. Briefly, I suspect that this
  is related to the nullfs filesystems on my server: when I cvsupped to
  FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced the
  nullfs-mounted filesystems with unionfs-mounted ones (done on 10.03.07),
  the problem went away (or at least seems to have).
 
 Hmm...  These seem to be different issues.  The problem reported to me was
 on a pgsql server (no nullfs/unionfs involved), and the hang always happens
 when it is not heavily loaded (usually in the morning, for instance).
 There is no special configuration, such as scheduled tasks that could
 generate disk load; only entropy harvesting.  So this is quite confusing.

Yes, a large part of the confusion is the unfortunate tendency of
people to do the following:

<user1> my system hangs/panics/etc
<user2> my system hangs/panics/etc too; it must be the same problem!

What we really need is for every FreeBSD user who encounters a
hang/panic/etc to avoid jumping to conclusions -- no matter how many
superficial similarities there may seem to be -- and instead go
through the relevant steps described here:

  http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

Until you (or a developer) have analyzed the resulting information,
you cannot definitively determine whether or not your problem is the
same as a given random other problem, and you may just confuse the
issue by making claims of similarity when you are really reporting a
completely separate problem.

Thanks,
Kris



Re: 6.2-STABLE deadlock?

2007-04-23 Thread Kostik Belousov
On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
 On Tue, Mar 13, 2007 at 02:08:48PM +0000, Adrian Wontroba wrote:
  At work, amongst my stable of old computers running FreeBSD, I have a
  Fujitsu M800 - a 4-processor Xeon SMP machine with 4 GB of memory. This
  primarily runs Nagios and a small and lightly used MySQL database, along
  with a few inbound FTP transfers per minute. It has a Mylex card based
  disc subsystem, ruling out crash dumps.
  
  At some point during 5.5-STABLE this machine started to occasionally hang ...
 
 Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
 rather sooner after the hang.  Processes with wmesg=ufs feature often in
 the ps output.
 
 http://www.stade.co.uk/crash1/

I would suspect the mlx controller. There are several processes (for instance,
988, 50918) waiting for completion of block read, and processes in the ufs
states are the result of the lock cascade, IMHO.
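One way to check that reading against a saved ddb ps dump is to tally the wait messages and see whether a handful of processes sleeping on block I/O sit beneath a larger pile-up in ufs. The following is only a minimal Python sketch under stated assumptions: the dump is whitespace-separated, the pid is the first column, every row carries a wait-message field, and that field's index (the hypothetical wmesg_col parameter of the hypothetical tally_wmesg helper) must be adjusted to match the header of the actual dump. The sample rows, including the wmesg names biord and ufs, are invented for illustration.

```python
from collections import Counter


def tally_wmesg(lines, wmesg_col=5):
    """Count processes per wait message in saved ddb `ps` output.

    Assumes whitespace-separated columns with the pid first and the wait
    message at index `wmesg_col`. The default of 5 is hypothetical; check
    the header line of your own dump, as the layout varies by release.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header line and any row too short to hold a wmesg.
        if len(fields) <= wmesg_col or not fields[0].isdigit():
            continue
        counts[fields[wmesg_col]] += 1
    return counts


# Invented sample rows (hypothetical layout): pid ppid pgrp uid state wmesg cmd
sample = [
    "  pid  ppid  pgrp   uid  state  wmesg  cmd",
    "  101     1   101     0  D      biord  mysqld",
    "  102     1   102     0  D      biord  httpd",
    "  201   150   201  1001  D      ufs    ftpd",
    "  202   150   202  1001  D      ufs    ftpd",
    "  203   150   203  1001  D      ufs    ftpd",
]

# Print each wait message with its process count, most common first.
for wmesg, n in tally_wmesg(sample).most_common():
    print(f"{wmesg:8} {n}")
```

A large ufs count with only a few processes in an I/O wait would be consistent with a lock cascade behind a stalled disk request, but a tally like this cannot by itself show which driver is at fault.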




Re: 6.2-STABLE deadlock?

2007-04-22 Thread Adrian Wontroba
On Tue, Mar 13, 2007 at 02:08:48PM +0000, Adrian Wontroba wrote:
 At work, amongst my stable of old computers running FreeBSD, I have a
 Fujitsu M800 - a 4-processor Xeon SMP machine with 4 GB of memory. This
 primarily runs Nagios and a small and lightly used MySQL database, along
 with a few inbound FTP transfers per minute. It has a Mylex card based
 disc subsystem, ruling out crash dumps.
 
 At some point during 5.5-STABLE this machine started to occasionally hang ...

Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
rather sooner after the hang.  Processes with wmesg=ufs feature often in
the ps output.

http://www.stade.co.uk/crash1/

-- 
Adrian Wontroba


6.2-STABLE deadlock?

2007-03-13 Thread Adrian Wontroba
At work, amongst my stable of old computers running FreeBSD, I have a
Fujitsu M800 - a 4-processor Xeon SMP machine with 4 GB of memory. This
primarily runs Nagios and a small and lightly used MySQL database, along
with a few inbound FTP transfers per minute. It has a Mylex card based
disc subsystem, ruling out crash dumps.

At some point during 5.5-STABLE this machine started to occasionally hang while
performing its daily application housekeeping - closing and restarting
Apache and Nagios, and dumping the database. Upgrading to 6.2-STABLE
appeared to solve the problem, with no problems visible while running
1,000 cycles of the sequence which seemed to provoke the problem.

cvsup for this version of the kernel and userland was run at 01:20 GMT
on 06 March.

However, shortly after 15:15 last Sunday afternoon the machine hung
again out of the blue. kdb diagnostics were taken some 12 hours later,
and look somewhat odd. Maybe it was left to fester for too long.

The ps etc. output is at http://www.stade.co.uk/crash/console, which contains
boot-to-boot serial console output, including some output from test
cycles. I'd be grateful for any expert comments on the ps etc. output.

Supporting stuff. 

[EMAIL PROTECTED] ~/crash]# df
Filesystem     1K-blocks     Used    Avail Capacity  Mounted on
/dev/mlxd0s1a     507630    70074   396946    15%    /
devfs                  1        1        0   100%    /dev
/dev/mlxd0s1f   63541498 44355014 14103166    76%    /home
/dev/mlxd0s1e   16244334  6784900  8159888    45%    /usr
/dev/mlxd0s1d    1012974   117456   814482    13%    /var
/dev/md0            1646       32     1484     2%    /home/topftp/instances
/dev/md1          253678      132   233252     0%    /tmp

[EMAIL PROTECTED] ~]# find /var -inum 23 -ls
   23        4 -rw-r--r--    1 daemon   daemon         60 Mar 12 20:22 /var/rwho/whod.xjamesfriis

The problem stopped HTTP and FTP logging soon after 15:14 on Sunday 11 March;
diagnostics were taken and the machine rebooted around 04:30 on Monday 12 March.

172.19.112.92 - - [11/Mar/2007:15:14:53 +0000] "GET / HTTP/1.0" 200 688 "-" "check_http/1.89 (nagios-plugins 1.4.3)"
<time passes>
172.19.112.92 - - [12/Mar/2007:04:44:14 +0000] "GET / HTTP/1.0" 200 688 "-" "check_http/1.89 (nagios-plugins 1.4.3)"

Mar 11 15:15:35 beastie ftpd[91652]: connection from appsupcen (10.208.1.134)
Mar 11 15:15:35 beastie ftpd[91652]: FTP LOGIN FROM appsupcen as topftp
Mar 11 15:15:35 beastie ftpd[91652]: session root changed to /home/topftp/instances
Mar 11 15:15:35 beastie ftpd[91652]: put in.env_status.html.gz = 592 bytes (wd: /topftp/appsupcen; chrooted)
<time passes>
Mar 11 15:15:35 beastie ftpd[91652]: rename in.env_status.html.gz env_status.html.gz (wd: /topftp/appsupcen; chrooted)
Mar 12 04:44:31 beastie ftpd[1161]: connection from appsupcen (10.208.1.134)
Mar 12 04:44:31 beastie ftpd[1161]: FTP LOGIN FROM appsupcen as topftp
Mar 12 04:44:31 beastie ftpd[1161]: session root changed to /home/topftp/instances
Mar 12 04:44:31 beastie ftpd[1161]: mkdir topftp/appsupcen (wd: /; chrooted)

Support diary:

15:20
Beastie seems like it's crashed and down;

16:54
Beastie is no longer pingable by rjmon1;

04:30 - 04:43
(support person quoting from the documentation I'd provided about what
to do after a hang)
Type return, tilde, hash (CR~#), which will make cu send a break signal to
beastie and should cause beastie to drop into the ddb kernel debugger.
In the following, you may see more prompts. Type space at each for the next
page.
Type these debugger commands
ps
show pcpu
show allpcpu
show locks
show alllocks
show lockedvnods
trace
alltrace
04:43 - beastie is now back up and working; it was rebooted by typing
call cpu_reset() after the above commands.

AW: I preserved and inspected the diagnostic output. It looks very unlike
that from previous crashes (which had no serial console), where a noticeable
feature was many ftpd processes in a UFS state. Possibly things happened
during the 12-hour period between the onset of the problem on Sunday
afternoon and the diagnostics being taken on Monday morning.

-- 
Adrian Wontroba