Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Daniel O'Connor
On Saturday 28 April 2007 04:33, Marc G. Fournier wrote:
> A thought: how hard would it be to add some method of forcing a
> system crash, that would dump core, from the command line?  Something
> that, by default, would be disabled, but for remote debugging
> purposes, one could enable in the kernel and do a 'sysctl
> kernel.force_core_crash=1' to have it do it?  I imagine that having a
> core to analyze would allow providing more information than nothing
> at all, no?

I think you can already do this:
sysctl debug.kdb.panic=1

Alas, that appears to be a -CURRENT thing. 6.x has debug.kdb.enter,
though.
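
For illustration, the two knobs named above could be tried in order of
preference. This is a minimal sketch, not a tested procedure: the function
names pick_crash_knob and crash_command are made up for this example, and it
only prints the command instead of running it. It assumes a dump device has
already been configured (for example with dumpon(8)); without one, a forced
panic leaves no core to analyze. Note also that debug.kdb.enter should drop
into the debugger rather than panic, so it is of limited use on a box with no
console access.

```shell
#!/bin/sh
# Sketch only: choose a "force a crash" sysctl from the two named in the thread.
# pick_crash_knob reads a list of sysctl names on stdin (one per line) and
# prints the preferred knob that is present; it fails if neither is listed.
pick_crash_knob() {
    knobs=$(cat)
    for want in debug.kdb.panic debug.kdb.enter; do
        if printf '%s\n' "$knobs" | grep -qxF "$want"; then
            printf '%s\n' "$want"
            return 0
        fi
    done
    return 1
}

# crash_command prints the sysctl invocation for the chosen knob.
# It deliberately does NOT run it; running it takes the machine down.
crash_command() {
    knob=$(pick_crash_knob) || {
        echo "no debug.kdb crash knob found" >&2
        return 1
    }
    printf 'sysctl %s=1\n' "$knob"
}
```

On a live system the list of names could come from something like
'sysctl -N debug.kdb | crash_command'.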

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Marc G. Fournier



--On Friday, April 27, 2007 22:57:29 +0200 Nicolas Rachinsky
<[EMAIL PROTECTED]> wrote:

> * "Marc G. Fournier" <[EMAIL PROTECTED]> [2007-04-27 16:03 -0300]:
>> A thought: how hard would it be to add some method of forcing a system
>> crash,  that would dump core, from the command line?  Something that, by
>> default, would
>
> Doesn't 'kill -6 1' work anymore?

I'd never heard of that one ... will it dump core if I do that?

Please note, in my case, with the Buffer Space issue ... I can log in and
cleanly reboot the server, so doing something like the above to get a core dump
is definitely doable; I'd just never seen a reference to a 'kill -6 1' before
for doing that ...

Side question to this, though ... I remember a while back using a 'client-server'
mechanism that allowed me to dump core to a separate server ... it was so long
ago that my memory is faint, but there was a reason why I couldn't dump to the
local server ... not sure whatever happened to that code, but, if one can do
that for dumping core, shouldn't there be some method possible to connect to
DDB over Ethernet without having to have a serial console in place?  For
the core dump case, the Ethernet obviously stayed up while it dumped; couldn't
some sort of 'ddb.conf' file be set up that would allow it to ifconfig an IP
within that shell so that you could connect to it remotely?  Say, with a
'from-ip' directive?

Just a thought ...


-- 
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy           Skype: hub.org    ICQ . 7615664

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Nicolas Rachinsky
* "Marc G. Fournier" <[EMAIL PROTECTED]> [2007-04-27 16:03 -0300]:
> A thought: how hard would it be to add some method of forcing a system crash, 
> that would dump core, from the command line?  Something that, by default, 
> would 

Doesn't 'kill -6 1' work anymore?

Nicolas

-- 
http://www.rachinsky.de/nicolas


Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Marc G. Fournier



--On Tuesday, April 24, 2007 23:53:16 -0400 Kris Kennaway
<[EMAIL PROTECTED]> wrote:

> On Wed, Apr 25, 2007 at 10:53:08AM +0800, LI Xin wrote:
>> Hi, Oleg,
>>
>> Oleg Derevenetz wrote:
>> > Quoting LI Xin <[EMAIL PROTECTED]>:
>> [...]
>> >> I'm not very sure if this is specific to one disk controller.  Actually
>> >> I got some occasional reports about similar hangs on amd64 6.2-RELEASE
>> >> (slightly patched version) that most of processes stuck in the 'ufs'
>> >> state, under very light load, the box was equipped with amr(4) RAID.
>> >>
>> >> I was not able to reproduce the problem at my lab, though, it's still
>> >> unknown that how to trigger the livelock :-(  Still need some
>> >> investigate on their production system.
>> >
>> > I reported similar issue for FreeBSD 6.2 in audit-trail for kern/104406:
>> >
>> > http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
>> >
>> > and there should be a thread related to this. Briefly, I suspect that
>> > this is related to nullfs filesystems on my server and when I cvsuped to
>> > FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced
>> > nullfs-mounted fs with unionfs-mounted (that was done 10.03.07) problem
>> > is gone (seems to be so, at least).
>>
>> Hmm...  Seems to be different issues.  The problem I have received was a
>> pgsql server (no nullfs/unionfs involved), and the hang always happen
>> when it is not being heavily loaded (usually in the morning, for
>> instance, and there is no special configuration, like scheduled tasks
>> which can generate disk load, etc., only the entropy harvesting), so
>> this is quite confusing.
>
> Yes, a large part of the confusion is the unfortunate tendency of
> people to do the following:
>
>  my system hangs/panics/etc
>  my system hangs/panics/etc too; it must be the same problem!
>
> What we really need is for every FreeBSD user who encounters a
> hang/panic/etc to avoid jumping to conclusions -- no matter how many
> superficial similarities there may seem to you -- and instead go
> through the relevant steps described here:
>
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>
> Until you (or a developer) have analyzed the resulting information,
> you cannot definitively determine whether or not your problem is the
> same as a given random other problem, and you may just confuse the
> issue by making claims of similarity when you are really reporting a
> completely separate problem.

What about those that don't have the benefit of being able to access the
console? :(  I've recently started buying servers that have built-in, full
remote console (i.e. the HP servers), but, for instance, I have one box that I
have to consistently reboot every 3 days due to a 'No Buffer Space Available'
...

A thought: how hard would it be to add some method of forcing a system crash,
that would dump core, from the command line?  Something that, by default, would
be disabled, but for remote debugging purposes, one could enable in the kernel
and do a 'sysctl kernel.force_core_crash=1' to have it do it?  I imagine that
having a core to analyze would allow providing more information than nothing at
all, no?


-- 
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy           Skype: hub.org    ICQ . 7615664



Re: 6.2-STABLE deadlock?

2007-04-25 Thread Kris Kennaway
On Thu, Apr 26, 2007 at 02:01:57AM +0100, Adrian Wontroba wrote:
> On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> > Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> > rather sooner after the hang.  Processes with wmesg=ufs feature often in
> > the ps output.
> 
> Thanks for the assorted replies. I'll try using INVARIANTS, as I'd much
> prefer a panic and automatic reboot rather than creeping death for this
> server which is only attended 06:00-22:00 weekdays yet is a critical
> point in our 24*7 monitoring. Sigh. Champagne tastes on a beer budget.
> 
> Other background information (from memory as I can't access it from
> here):
> I'm not using unionfs / nullfs.
> I am using MFS and softupdates.
> NFS is in occasional use.
> NTFS is not used.
> The server is ancient. A 4-way Xeon, state of the art in 1998.
> The problem has gone from "absent" through "reproducible if you
> try hard enough" to "strikes according to Murphy" through 5.5-STABLE to
> 6.2-STABLE.
> 
> Once the sendmail milter ABI damage is fixed I'll bring the machine up
> to date - it is also the build box for a dozen or so machines running
> something close to 6.2-RELEASE, and I'd rather not upgrade all of them
> when the next ClamAV release appears.

Fixed the other day, FYI.

> If / when it hangs again, I'll include more information and maybe even
> raise a real PR.

Thanks.

Kris


Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Kris Kennaway
On Wed, Apr 25, 2007 at 10:20:25PM +0400, Oleg Derevenetz wrote:
> Quoting Kris Kennaway <[EMAIL PROTECTED]>:
> 
> > On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:
> > 
> > > > Until you (or a developer) have analyzed the resulting information,
> > > > you cannot definitively determine whether or not your problem is the
> > > > same as a given random other problem, and you may just confuse the
> > > > issue by making claims of similarity when you are really reporting a
> > > > completely separate problem.
> > > 
> > > Not all people can do deadlock debugging, though. In my case turning on
> > > INVARIANTS and WITNESS leads to unacceptable performance penalty due to
> > > heavily loaded server. So I can only describe my case, actions and result
> > > without providing any debug information.
> > 
> > But you can still do *some* things, e.g. backtraces and/or a coredump:
> > every little bit helps.
> > 
> > Ultimately, though, you have to understand and accept that the less
> > information you provide, the less chance there is that a developer
> > will be able to track down your problem.  In fact a developer may have
> > to effectively ignore your problem report altogether, because of what
> > I explained about "symptoms" usually not being enough to tell one bug
> > from another.
> > 
> > In general, when you encounter a bug in FreeBSD, you have a little bit
> > of work to do on your side before we can start doing the rest.  I
> > understand that you may not be in a position to do that work, but that
> > means you also need to understand that we can't do it either.
> 
> In fact, I solved (or workarounded) this problem for me, so in this thread I
> provide my workaround as possible workaround for users that experiences the
> same problem. This only hint for them, and not a bugreport for you. I could
> not provide a full (or only partial) debug information because I will not
> back out cvsuped sources, will not replace unionfs with nullfs again and will
> not wait week or more for another stuck.

OK.  FYI I use nullfs on a few dozen heavily loaded machines without
issue for the past year or so, so if you are seeing a nullfs issue it
is probably an obscure one.

Kris




Re: 6.2-STABLE deadlock?

2007-04-25 Thread Adrian Wontroba
On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> rather sooner after the hang.  Processes with wmesg=ufs feature often in
> the ps output.

Thanks for the assorted replies. I'll try using INVARIANTS, as I'd much
prefer a panic and automatic reboot rather than creeping death for this
server which is only attended 06:00-22:00 weekdays yet is a critical
point in our 24*7 monitoring. Sigh. Champagne tastes on a beer budget.

Other background information (from memory as I can't access it from
here):
I'm not using unionfs / nullfs.
I am using MFS and softupdates.
NFS is in occasional use.
NTFS is not used.
The server is ancient. A 4-way Xeon, state of the art in 1998.
The problem has gone from "absent" through "reproducible if you
try hard enough" to "strikes according to Murphy" through 5.5-STABLE to
6.2-STABLE.

Once the sendmail milter ABI damage is fixed I'll bring the machine up
to date - it is also the build box for a dozen or so machines running
something close to 6.2-RELEASE, and I'd rather not upgrade all of them
when the next ClamAV release appears.

If / when it hangs again, I'll include more information and maybe even
raise a real PR.

-- 
Adrian Wontroba
It's always a long day; 86400 doesn't fit into a short.


Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Oleg Derevenetz
Quoting Kris Kennaway <[EMAIL PROTECTED]>:

> On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:
> 
> > > Until you (or a developer) have analyzed the resulting information,
> > > you cannot definitively determine whether or not your problem is the
> > > same as a given random other problem, and you may just confuse the
> > > issue by making claims of similarity when you are really reporting a
> > > completely separate problem.
> > 
> > Not all people can do deadlock debugging, though. In my case turning on
> > INVARIANTS and WITNESS leads to unacceptable performance penalty due to
> > heavily loaded server. So I can only describe my case, actions and result
> > without providing any debug information.
> 
> But you can still do *some* things, e.g. backtraces and/or a coredump:
> every little bit helps.
> 
> Ultimately, though, you have to understand and accept that the less
> information you provide, the less chance there is that a developer
> will be able to track down your problem.  In fact a developer may have
> to effectively ignore your problem report altogether, because of what
> I explained about "symptoms" usually not being enough to tell one bug
> from another.
> 
> In general, when you encounter a bug in FreeBSD, you have a little bit
> of work to do on your side before we can start doing the rest.  I
> understand that you may not be in a position to do that work, but that
> means you also need to understand that we can't do it either.

In fact, I solved (or worked around) this problem for myself, so in this thread
I offer my workaround as a possible workaround for users who experience the
same problem. This is only a hint for them, and not a bug report for you. I
cannot provide full (or even partial) debug information because I will not back
out the cvsupped sources, will not replace unionfs with nullfs again, and will
not wait a week or more for another hang.


Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Kris Kennaway
On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:

> > Until you (or a developer) have analyzed the resulting information,
> > you cannot definitively determine whether or not your problem is the
> > same as a given random other problem, and you may just confuse the
> > issue by making claims of similarity when you are really reporting a
> > completely separate problem.
> 
> Not all people can do deadlock debugging, though. In my case turning on
> INVARIANTS and WITNESS leads to unacceptable performance penalty due to
> heavily loaded server. So I can only describe my case, actions and result
> without providing any debug information.

But you can still do *some* things, e.g. backtraces and/or a coredump:
every little bit helps.

Ultimately, though, you have to understand and accept that the less
information you provide, the less chance there is that a developer
will be able to track down your problem.  In fact a developer may have
to effectively ignore your problem report altogether, because of what
I explained about "symptoms" usually not being enough to tell one bug
from another.

In general, when you encounter a bug in FreeBSD, you have a little bit
of work to do on your side before we can start doing the rest.  I
understand that you may not be in a position to do that work, but that
means you also need to understand that we can't do it either.
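
As a rough illustration of the "coredump" half of that advice: a crash dump is
only saved if a dump device was configured before the panic, which on FreeBSD
is normally done through the dumpdev variable in /etc/rc.conf. The sketch below
only checks rc.conf-style text for that variable. The helper name
dumpdev_setting is invented for this example, and it does not consult
/etc/defaults/rc.conf, so "unset" means only that the text it read does not
set the variable.

```shell
#!/bin/sh
# Sketch: read rc.conf-style text on stdin and print the dumpdev value.
# The last dumpdev= line wins, as with ordinary shell variable assignment.
# Prints "unset" when no dumpdev= line is present.
dumpdev_setting() {
    awk -F= '/^dumpdev=/ { v = $2 }
             END { gsub(/"/, "", v); print (v == "" ? "unset" : v) }'
}
```

Typical use would be 'dumpdev_setting < /etc/rc.conf'. If the answer is "unset"
or "NO", a panic will most likely leave nothing behind; with a dump device in
place, savecore(8) should write the dump under /var/crash at the next boot,
where kgdb(1) can be used to pull a backtrace out of it.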

Kris




Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread LI Xin
Oleg Derevenetz wrote:
[snip]
> Not all people can do deadlock debugging, though. In my case turning on
> INVARIANTS and WITNESS leads to unacceptable performance penalty due to
> heavily loaded server. So I can only describe my case, actions and result
> without providing any debug information.

I'd say that I completely agree with Kris, because it's very hard
for developers to investigate problems if there is no detailed
information available, especially for problems that cannot easily be
reproduced.  Of course, deadlock debugging can be tricky, but having a
backtrace can usually save a lot of time (and fortunately that is not
that hard even for average users :)

What I wanted to suggest is that we hope the submitter can provide
detailed steps to reliably reproduce the problem whenever possible, if
they are not able to diagnose the problem themselves, so that we will be
able to extract more information in the lab, and possibly reach a fix.

The problem I have is that the reporter of the issue is not quite as
cooperative as they were before, and what I wanted to say is that it's
possible to trigger the livelock without nullfs/unionfs, and I have not
figured out why (yet) because I cannot reproduce it in my environment :-(

Cheers,
-- 
Xin LI <[EMAIL PROTECTED]>  http://www.delphij.net/
FreeBSD - The Power to Serve!





Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Oleg Derevenetz
Quoting Kris Kennaway <[EMAIL PROTECTED]>:

> > Oleg Derevenetz wrote:
> > > Quoting LI Xin <[EMAIL PROTECTED]>:
> > [...]
> > >> I'm not very sure if this is specific to one disk controller.  Actually
> > >> I got some occasional reports about similar hangs on amd64 6.2-RELEASE
> > >> (slightly patched version) that most of processes stuck in the 'ufs'
> > >> state, under very light load, the box was equipped with amr(4) RAID.
> > >>
> > >> I was not able to reproduce the problem at my lab, though, it's still
> > >> unknown that how to trigger the livelock :-(  Still need some
> > >> investigate on their production system.
> > > 
> > > I reported similar issue for FreeBSD 6.2 in audit-trail for kern/104406:
> > > 
> > > http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
> > > 
> > > and there should be a thread related to this. Briefly, I suspect that
> > > this is related to nullfs filesystems on my server and when I cvsuped to
> > > FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced
> > > nullfs-mounted fs with unionfs-mounted (that was done 10.03.07) problem
> > > is gone (seems to be so, at least).
> > 
> > Hmm...  Seems to be different issues.  The problem I have received was a
> > pgsql server (no nullfs/unionfs involved), and the hang always happen
> > when it is not being heavily loaded (usually in the morning, for
> > instance, and there is no special configuration, like scheduled tasks
> > which can generate disk load, etc., only the entropy harvesting), so
> > this is quite confusing.
> 
> Yes, a large part of the confusion is the unfortunate tendency of
> people to do the following:
> 
>  my system hangs/panics/etc
>  my system hangs/panics/etc too; it must be the same problem!
> 
> What we really need is for every FreeBSD user who encounters a
> hang/panic/etc to avoid jumping to conclusions -- no matter how many
> superficial similarities there may seem to you -- and instead go
> through the relevant steps described here:
> 
>  
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
> 
> Until you (or a developer) have analyzed the resulting information,
> you cannot definitively determine whether or not your problem is the
> same as a given random other problem, and you may just confuse the
> issue by making claims of similarity when you are really reporting a
> completely separate problem.

Not all people can do deadlock debugging, though. In my case, turning on
INVARIANTS and WITNESS leads to an unacceptable performance penalty because the
server is heavily loaded. So I can only describe my case, actions, and result
without providing any debug information.




How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-24 Thread Kris Kennaway
On Wed, Apr 25, 2007 at 10:53:08AM +0800, LI Xin wrote:
> Hi, Oleg,
> 
> Oleg Derevenetz wrote:
> > Quoting LI Xin <[EMAIL PROTECTED]>:
> [...]
> >> I'm not very sure if this is specific to one disk controller.  Actually
> >> I got some occasional reports about similar hangs on amd64 6.2-RELEASE
> >> (slightly patched version) that most of processes stuck in the 'ufs'
> >> state, under very light load, the box was equipped with amr(4) RAID.
> >>
> >> I was not able to reproduce the problem at my lab, though, it's still
> >> unknown that how to trigger the livelock :-(  Still need some
> >> investigate on their production system.
> > 
> > I reported similar issue for FreeBSD 6.2 in audit-trail for kern/104406:
> > 
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
> > 
> > and there should be a thread related to this. Briefly, I suspect that this
> > is related to nullfs filesystems on my server and when I cvsuped to FreeBSD
> > 6.2-STABLE with Daichi's unionfs-related patches and replaced nullfs-mounted
> > fs with unionfs-mounted (that was done 10.03.07) problem is gone (seems to
> > be so, at least).
> 
> Hmm...  Seems to be different issues.  The problem I have received was a
> pgsql server (no nullfs/unionfs involved), and the hang always happen
> when it is not being heavily loaded (usually in the morning, for
> instance, and there is no special configuration, like scheduled tasks
> which can generate disk load, etc., only the entropy harvesting), so
> this is quite confusing.

Yes, a large part of the confusion is the unfortunate tendency of
people to do the following:

 my system hangs/panics/etc
 my system hangs/panics/etc too; it must be the same problem!

What we really need is for every FreeBSD user who encounters a
hang/panic/etc to avoid jumping to conclusions -- no matter how many
superficial similarities there may seem to you -- and instead go
through the relevant steps described here:

  
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

Until you (or a developer) have analyzed the resulting information,
you cannot definitively determine whether or not your problem is the
same as a given random other problem, and you may just confuse the
issue by making claims of similarity when you are really reporting a
completely separate problem.

Thanks,
Kris



Re: 6.2-STABLE deadlock?

2007-04-24 Thread Kris Kennaway
On Wed, Apr 25, 2007 at 11:53:32AM +1000, Jan Mikkelsen wrote:
> LI Xin wrote:
> > Kostik Belousov wrote:
> > > On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> > >> On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
> > >>> At work, amongst my stable of old computers running FreeBSD, I have a
> > >>> Fujitsu M800 - a 4 Xeon SMP processor with 4 GB of memory. This
> > >>> primarily runs Nagios and a small and lightly used MySQL database, along
> > >>> with a few inbound FTP transfers per minute. It has a Mylex card based
> > >>> disc subsystem, ruling out crash dumps.
> > >>>
> > >>> At some point during 5.5-STABLE this machine started to occasionally hang ...
> > >> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> > >> rather sooner after the hang.  Processes with wmesg=ufs feature often in
> > >> the ps output.
> > >>
> > >> http://www.stade.co.uk/crash1/
> > > 
> > > I would suspect the mlx controller. There is several processes (for instance,
> > > 988, 50918) waiting for completion of block read, and processes in the "ufs"
> > > states are the result of the lock cascade, IMHO.
> > 
> > I'm not very sure if this is specific to one disk controller.  Actually
> > I got some occasional reports about similar hangs on amd64 6.2-RELEASE
> > (slightly patched version) that most of processes stuck in the 'ufs'
> > state, under very light load, the box was equipped with amr(4) RAID.
> > 
> > I was not able to reproduce the problem at my lab, though, it's still
> > unknown that how to trigger the livelock :-(  Still need some
> > investigate on their production system.
> 
> I have seen something similar once, on a machine with an Areca (arcmsr)
> controller, running 6.2-RELEASE (with unionfs patches).  Processes stuck in
> "ufs", and the machine needed physical intervention to reboot.  I haven't
> seen it since.  From memory, it happened during startup of the applications
> and jails on the machine.

Sounds like one of the known unionfs bugs.

Kris


Re: 6.2-STABLE deadlock?

2007-04-24 Thread LI Xin
Hi, Oleg,

Oleg Derevenetz wrote:
> Quoting LI Xin <[EMAIL PROTECTED]>:
[...]
>> I'm not very sure if this is specific to one disk controller.  Actually
>> I got some occasional reports about similar hangs on amd64 6.2-RELEASE
>> (slightly patched version) that most of processes stuck in the 'ufs'
>> state, under very light load, the box was equipped with amr(4) RAID.
>>
>> I was not able to reproduce the problem at my lab, though, it's still
>> unknown that how to trigger the livelock :-(  Still need some
>> investigate on their production system.
> 
> I reported similar issue for FreeBSD 6.2 in audit-trail for kern/104406:
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
> 
> and there should be a thread related to this. Briefly, I suspect that this
> is related to nullfs filesystems on my server and when I cvsuped to FreeBSD
> 6.2-STABLE with Daichi's unionfs-related patches and replaced nullfs-mounted
> fs with unionfs-mounted (that was done 10.03.07) problem is gone (seems to
> be so, at least).

Hmm...  These seem to be different issues.  The report I received was for
a pgsql server (no nullfs/unionfs involved), and the hang always happens
when it is not heavily loaded (usually in the morning, for instance).
There is no special configuration, like scheduled tasks which can
generate disk load, etc., only the entropy harvesting, so this is quite
confusing.

Cheers,
-- 
Xin LI <[EMAIL PROTECTED]>  http://www.delphij.net/
FreeBSD - The Power to Serve!





Re: 6.2-STABLE deadlock?

2007-04-24 Thread Eugene Grosbein
Kostik Belousov wrote:

> I would suspect the mlx controller. There is several processes (for instance,
> 988, 50918) waiting for completion of block read, and processes in the "ufs"
> states are the result of the lock cascade, IMHO.

It is possible that the controller is not at fault.

You can easily reproduce a lock in the "ufs" state with the commands from
the "How-To-Repeat" section of:
http://www.FreeBSD.org/cgi/query-pr.cgi?pr=kern/107439

The PR is closed but the problem still exists in recent 6.2-STABLE.
GENERIC has the problem too; GENERIC+INVARIANTS panics at once
instead of producing locked processes.
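
To check whether a box is showing the same symptom, the processes sleeping on
the "ufs" wait channel can be picked out of the ps output. A rough sketch,
assuming ps is asked for exactly three columns (pid, wchan, command); the
helper name stuck_in_wchan is invented for this example:

```shell
#!/bin/sh
# Sketch: filter `ps -axo pid,wchan,command` output (read from stdin) down to
# the processes whose wait channel equals the name given as $1.
# Prints "PID COMMAND-NAME" per match; the header line is skipped.
stuck_in_wchan() {
    awk -v chan="$1" 'NR > 1 && $2 == chan { print $1, $3 }'
}
```

On a live system this would be run as
'ps -axo pid,wchan,command | stuck_in_wchan ufs'. It only shows which
processes are waiting, not which one holds the lock they are waiting for.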

Eugene Grosbein.


RE: 6.2-STABLE deadlock?

2007-04-24 Thread Jan Mikkelsen
Oleg Derevenetz wrote:
> [ ... ] 
> I reported similar issue for FreeBSD 6.2 in audit-trail for kern/104406:
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
> 
> and there should be a thread related to this. Briefly, I suspect that this
> is related to nullfs filesystems on my server and when I cvsuped to FreeBSD
> 6.2-STABLE with Daichi's unionfs-related patches and replaced nullfs-mounted
> fs with unionfs-mounted (that was done 10.03.07) problem is gone (seems to
> be so, at least).

Interesting.  In the instance I saw, there were also nullfs filesystems
mounted.

Regards,

Jan.



RE: 6.2-STABLE deadlock?

2007-04-24 Thread Jan Mikkelsen
LI Xin wrote:
> Kostik Belousov wrote:
> > On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> >> On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
> >>> At work, amongst my stable of old computers running FreeBSD, I have a
> >>> Fujitsu M800 - a 4 Xeon SMP processor with 4 GB of memory. This
> >>> primarily runs Nagios and a small and lightly used MySQL database, along
> >>> with a few inbound FTP transfers per minute. It has a Mylex card based
> >>> disc subsystem, ruling out crash dumps.
> >>>
> >>> At some point during 5.5-STABLE this machine started to occasionally hang ...
> >> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> >> rather sooner after the hang.  Processes with wmesg=ufs feature often in
> >> the ps output.
> >>
> >> http://www.stade.co.uk/crash1/
> > 
> > I would suspect the mlx controller. There is several processes (for instance,
> > 988, 50918) waiting for completion of block read, and processes in the "ufs"
> > states are the result of the lock cascade, IMHO.
> 
> I'm not very sure if this is specific to one disk controller.  Actually
> I got some occasional reports about similar hangs on amd64 6.2-RELEASE
> (slightly patched version) that most of processes stuck in the 'ufs'
> state, under very light load, the box was equipped with amr(4) RAID.
> 
> I was not able to reproduce the problem at my lab, though, it's still
> unknown that how to trigger the livelock :-(  Still need some
> investigate on their production system.

I have seen something similar once, on a machine with an Areca (arcmsr)
controller, running 6.2-RELEASE (with unionfs patches).  Processes stuck in
"ufs", and the machine needed physical intervention to reboot.  I haven't
seen it since.  From memory, it happened during startup of the applications
and jails on the machine.

Jan.



Re: 6.2-STABLE deadlock?

2007-04-24 Thread Oleg Derevenetz
Quoting LI Xin <[EMAIL PROTECTED]>:

> Kostik Belousov wrote:
> > On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> >> On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
> >>> At work, amongst my stable of old computers running FreeBSD, I have a
> >>> Fujitsu M800 - a 4 Xeon SMP processor with 4 GB of memory. This
> >>> primarily runs Nagios and a small and lightly used MySQL database, along
> >>> with a few inbound FTP transfers per minute. It has a Mylex card based
> >>> disc subsystem, ruling out crash dumps.
> >>>
> >>> At some point during 5.5-STABLE this machine started to occasionally hang ...
> >> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> >> rather sooner after the hang.  Processes with wmesg=ufs feature often in
> >> the ps output.
> >>
> >> http://www.stade.co.uk/crash1/
> > 
> > I would suspect the mlx controller. There are several processes (for instance,
> > 988, 50918) waiting for completion of a block read, and processes in the "ufs"
> > state are the result of the lock cascade, IMHO.
> 
> I'm not very sure if this is specific to one disk controller.  Actually
> I got some occasional reports about similar hangs on amd64 6.2-RELEASE
> (slightly patched version) where most processes were stuck in the 'ufs'
> state under very light load; the box was equipped with amr(4) RAID.
> 
> I was not able to reproduce the problem in my lab, though, and it's still
> unknown how to trigger the livelock :-(  Still need to do some
> investigation on their production system.

I reported a similar issue for FreeBSD 6.2 in the audit trail for kern/104406:

http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=

and there should be a thread related to this. Briefly, I suspect that this is
related to the nullfs filesystems on my server: after I cvsupped to FreeBSD
6.2-STABLE with Daichi's unionfs-related patches and replaced the nullfs mounts
with unionfs mounts (done on 10.03.07), the problem went away (or seems to
have, at least).


Re: 6.2-STABLE deadlock?

2007-04-24 Thread LI Xin
Kostik Belousov wrote:
> On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
>> On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
>>> At work, amongst my stable of old computers running FreeBSD, I have a
>>> Fujitsu M800 - a 4 Xeon SMP processor with 4 GB of memory. This
>>> primarily runs Nagios and a small and lightly used MySQL database, along
>>> with a few inbound FTP transfers per minute. It has a Mylex card based
>>> disc subsystem, ruling out crash dumps.
>>>
>>> At some point during 5.5-STABLE this machine started to occasionally hang 
>>> ...
>> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
>> rather sooner after the hang.  Processes with wmesg=ufs feature often in
>> the ps output.
>>
>> http://www.stade.co.uk/crash1/
> 
> I would suspect the mlx controller. There are several processes (for instance,
> 988, 50918) waiting for completion of a block read, and processes in the "ufs"
> state are the result of the lock cascade, IMHO.

I'm not very sure if this is specific to one disk controller.  Actually
I got some occasional reports about similar hangs on amd64 6.2-RELEASE
(slightly patched version) where most processes were stuck in the 'ufs'
state under very light load; the box was equipped with amr(4) RAID.

I was not able to reproduce the problem in my lab, though, and it's still
unknown how to trigger the livelock :-(  Still need to do some
investigation on their production system.

Cheers,
-- 
Xin LI <[EMAIL PROTECTED]>  http://www.delphij.net/
FreeBSD - The Power to Serve!





Re: 6.2-STABLE deadlock?

2007-04-23 Thread Kostik Belousov
On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
> > At work, amongst my stable of old computers running FreeBSD, I have a
> > Fujitsu M800 - a 4 Xeon SMP processor with 4 GB of memory. This
> > primarily runs Nagios and a small and lightly used MySQL database, along
> > with a few inbound FTP transfers per minute. It has a Mylex card based
> > disc subsystem, ruling out crash dumps.
> > 
> > At some point during 5.5-STABLE this machine started to occasionally hang 
> > ...
> 
> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> rather sooner after the hang.  Processes with wmesg=ufs feature often in
> the ps output.
> 
> http://www.stade.co.uk/crash1/

I would suspect the mlx controller. There are several processes (for instance,
988, 50918) waiting for completion of a block read, and processes in the "ufs"
state are the result of the lock cascade, IMHO.




Re: 6.2-STABLE deadlock?

2007-04-22 Thread Adrian Wontroba
On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
> At work, amongst my stable of old computers running FreeBSD, I have a
> Fujitsu M800 - a 4 Xeon SMP processor with 4 GB of memory. This
> primarily runs Nagios and a small and lightly used MySQL database, along
> with a few inbound FTP transfers per minute. It has a Mylex card based
> disc subsystem, ruling out crash dumps.
> 
> At some point during 5.5-STABLE this machine started to occasionally hang ...

Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
rather sooner after the hang.  Processes with wmesg=ufs feature often in
the ps output.

http://www.stade.co.uk/crash1/
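
To put a number on "feature often", a saved copy of the ps output can be
tallied by wait message. The following is only a rough sketch: it assumes a
header row that names a "wmesg" column and whitespace-separated fields, which
may not match the real ddb layout exactly (rows with an empty wmesg field
would shift the columns). The sample rows are hypothetical and are not taken
from the console log.

```python
from collections import Counter


def tally_wmesg(ps_lines):
    """Count processes per wait message in ps-style text.

    Assumption: one header row contains a 'wmesg' column, and every field
    before that column is a single whitespace-free token.
    """
    counts = Counter()
    column = None
    for line in ps_lines:
        fields = line.split()
        if column is None:
            # Keep scanning until the header row tells us the column index.
            lowered = [f.lower() for f in fields]
            if "wmesg" in lowered:
                column = lowered.index("wmesg")
            continue
        if len(fields) > column:
            counts[fields[column]] += 1
    return counts


# Hypothetical sample rows, for illustration only.
SAMPLE = """\
pid ppid stat wmesg cmd
101 1 S ufs ftpd
102 1 S ufs ftpd
103 1 S biord mysqld
104 1 S ufs httpd
""".splitlines()

if __name__ == "__main__":
    for wmesg, n in tally_wmesg(SAMPLE).most_common():
        print(f"{n:4d} {wmesg}")
```

A large "ufs" count next to a handful of processes waiting on block I/O would
fit the lock cascade reading, but the tally alone does not prove it.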

-- 
Adrian Wontroba


6.2-STABLE deadlock?

2007-03-13 Thread Adrian Wontroba
At work, amongst my stable of old computers running FreeBSD, I have a
Fujitsu M800 - a 4 Xeon SMP processor with 4 GB of memory. This
primarily runs Nagios and a small and lightly used MySQL database, along
with a few inbound FTP transfers per minute. It has a Mylex card based
disc subsystem, ruling out crash dumps.

At some point during 5.5-STABLE this machine started to occasionally hang while
performing its daily "application" housekeeping - closing and restarting
Apache and Nagios, and dumping the database. Upgrading to 6.2-STABLE
appeared to solve the problem, with no problems visible while running
1,000 cycles of the sequence which seemed to provoke the problem.

cvsup for this version of the kernel and userland was run at 01:20 GMT
on 06 March.

However, shortly after 15:15 last Sunday afternoon the machine hung
again "out of the blue". kdb diagnostics were taken some 12 hours later,
and look somewhat odd. Maybe it was left to fester for too long.

ps etc output at http://www.stade.co.uk/crash/console which contains
boot to boot serial console output, including some output from test
cycles. I'd be grateful for any expert comments on the ps etc output.

Supporting stuff. 

[EMAIL PROTECTED] ~/crash]# df
Filesystem     1K-blocks     Used    Avail Capacity  Mounted on
/dev/mlxd0s1a     507630    70074   396946    15%    /
devfs                  1        1        0   100%    /dev
/dev/mlxd0s1f   63541498 44355014 14103166    76%    /home
/dev/mlxd0s1e   16244334  6784900  8159888    45%    /usr
/dev/mlxd0s1d    1012974   117456   814482    13%    /var
/dev/md0            1646       32     1484     2%    /home/topftp/instances
/dev/md1          253678      132   233252     0%    /tmp

[EMAIL PROTECTED] ~]# find /var -inum 23 -ls
    23    4 -rw-r--r--  1 daemon  daemon  60 Mar 12 20:22 /var/rwho/whod.xjamesfriis

The problem stopped HTTP and FTP logging soon after 15:14 on Sunday 11;
diagnostics were taken and the machine rebooted around 04:30 on Monday 12.

172.19.112.92 - - [11/Mar/2007:15:14:53 +] "GET / HTTP/1.0" 200 688 "-" "check_http/1.89 (nagios-plugins 1.4.3)"

172.19.112.92 - - [12/Mar/2007:04:44:14 +] "GET / HTTP/1.0" 200 688 "-" "check_http/1.89 (nagios-plugins 1.4.3)"

Mar 11 15:15:35 beastie ftpd[91652]: connection from appsupcen (10.208.1.134)
Mar 11 15:15:35 beastie ftpd[91652]: FTP LOGIN FROM appsupcen as topftp
Mar 11 15:15:35 beastie ftpd[91652]: session root changed to /home/topftp/instances
Mar 11 15:15:35 beastie ftpd[91652]: put in.env_status.html.gz = 592 bytes (wd: /topftp/appsupcen; chrooted)
Mar 11 15:15:35 beastie ftpd[91652]: rename in.env_status.html.gz env_status.html.gz (wd: /topftp/appsupcen; chrooted)
Mar 12 04:44:31 beastie ftpd[1161]: connection from appsupcen (10.208.1.134)
Mar 12 04:44:31 beastie ftpd[1161]: FTP LOGIN FROM appsupcen as topftp
Mar 12 04:44:31 beastie ftpd[1161]: session root changed to /home/topftp/instances
Mar 12 04:44:31 beastie ftpd[1161]: mkdir topftp/appsupcen (wd: /; chrooted)
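
The length of the logging gap can be read off the ftpd entries above. A small
sketch, assuming syslog-style "Mon DD HH:MM:SS" prefixes, the year 2007 taken
from the Apache entries, and an English/C locale for month names (the helper
name parse_syslog_stamp is made up for this example):

```python
from datetime import datetime

YEAR = 2007  # syslog lines omit the year; assumed from the Apache log entries


def parse_syslog_stamp(line, year=YEAR):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line into a datetime."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


# Last ftpd entry before the hang and first entry after the reboot.
LAST_BEFORE = (
    "Mar 11 15:15:35 beastie ftpd[91652]: rename in.env_status.html.gz "
    "env_status.html.gz (wd: /topftp/appsupcen; chrooted)"
)
FIRST_AFTER = (
    "Mar 12 04:44:31 beastie ftpd[1161]: connection from appsupcen (10.208.1.134)"
)

gap = parse_syslog_stamp(FIRST_AFTER) - parse_syslog_stamp(LAST_BEFORE)

if __name__ == "__main__":
    print(f"ftpd logging gap: {gap}")
```

This only measures the silence in the ftpd log; the hang itself may have begun
slightly later than the last logged entry.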

Support diary:

15:20
Beastie seems like it's crashed and down;

16:54
Beastie is no longer pingable by rjmon1;

04:30 - 04:43
(support person quoting from the documentation I'd provided about what
to do after a hang)
Type "return tilde hash" (CR~#) which will make cu send a break signal to 
beastie, and should cause beastie to drop into the ddb kernel debugger.
In the following, you may see "more" prompts. Type space at each for the next 
page.
Type these debugger commands
ps
show pcpu
show allpcpu
show locks
show alllocks
show lockedvnods
trace
alltrace
04:43 - beastie is back up and working, after typing call cpu_reset()
following the above commands to reboot it.

AW: preserved and inspected diagnostic output. It looks very unlike
that for previous crashes (without a serial console) where a noticeable
feature was many ftpd processes in a UFS state. Possibly "things
happened" in the 12 hour period between the onset of the problem on
Sunday afternoon and the diagnostics being taken on Monday morning.

-- 
Adrian Wontroba
Adrian's Birthday Celebration: Crewe Limelight, Saturday 17 March. David
Hughes and Tiny Tin Lady.  Free but ticketed - email me your postal
address if you want to come. No under 18s.