Re: How to report bugs (Re: 6.2-STABLE deadlock?)
On Saturday 28 April 2007 04:33, Marc G. Fournier wrote:
> A thought: how hard would it be to add some method of forcing a
> system crash, that would dump core, from the command line? Something
> that, by default, would be disabled, but for remote debugging
> purposes, one could enable in the kernel and do a 'sysctl
> kernel.force_core_crash=1' to have it do it? I imagine that having a
> core to analyze would allow providing more information than nothing
> at all, no?

I think you can do this:

sysctl debug.kdb.panic=1

Alas, that appears to be a -current thing. 6.x has debug.kdb.enter, though.

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
-- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
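The choice between the two knobs named above can be sketched as a small shell helper. This is only an illustration: the function name pick_crash_knob is hypothetical, and the assumption that the knob names can be listed with 'sysctl -N debug.kdb' should be checked against sysctl(8) on the release in question. The helper only prints a name; it never triggers a panic itself.

```shell
#!/bin/sh
# Hypothetical helper: read sysctl names on stdin (one per line) and print
# the preferred debugging knob. debug.kdb.panic is preferred because it
# forces a panic; debug.kdb.enter (drop into the debugger) is the fallback.
pick_crash_knob() {
    names=$(cat)
    for knob in debug.kdb.panic debug.kdb.enter; do
        # -x: match the whole line, -F: treat the dots literally
        if printf '%s\n' "$names" | grep -qxF "$knob"; then
            printf '%s\n' "$knob"
            return 0
        fi
    done
    return 1   # neither knob is available on this kernel
}

# Assumed usage on a live system (prints the command instead of running it):
#   knob=$(sysctl -N debug.kdb | pick_crash_knob) && echo "sysctl $knob=1"
```

Printing the command rather than executing it keeps the sketch safe to try on a production box; actually setting either knob to 1 stops the machine.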
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
--On Friday, April 27, 2007 22:57:29 +0200 Nicolas Rachinsky <[EMAIL PROTECTED]> wrote:

> * "Marc G. Fournier" <[EMAIL PROTECTED]> [2007-04-27 16:03 -0300]:
>> A thought: how hard would it be to add some method of forcing a system
>> crash, that would dump core, from the command line? Something that, by
>> default, would
>
> Doesn't 'kill -6 1' work anymore?

I'd never heard of that one ... will it dump core if I do that? Please note that in my case, with the Buffer Space issue, I can log in and cleanly reboot the server, so doing something like the above to get a core dump is definitely doable; I'd just never seen a reference to 'kill -6 1' for doing that.

Side question ... I remember a while back using a 'client-server' mechanism that allowed me to dump core to a separate server. It was so long ago that my memory is faint, but there was a reason why I couldn't dump to the local server. I'm not sure whatever happened to that code, but if one can do that for dumping core, shouldn't there be some way to connect to DDB over Ethernet without having to have a serial console in place? For the core dump case, the Ethernet obviously stayed up while it dumped; couldn't some sort of 'ddb.conf' file be set up that would allow it to ifconfig an IP within that shell so that you could connect to it remotely, say with a 'from-ip' directive?

Just a thought ...

- Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy  Skype: hub.org  ICQ . 7615664

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
* "Marc G. Fournier" <[EMAIL PROTECTED]> [2007-04-27 16:03 -0300]:
> A thought: how hard would it be to add some method of forcing a system crash,
> that would dump core, from the command line? Something that, by default,
> would

Doesn't 'kill -6 1' work anymore?

Nicolas
--
http://www.rachinsky.de/nicolas
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 - --On Tuesday, April 24, 2007 23:53:16 -0400 Kris Kennaway <[EMAIL PROTECTED]> wrote: > On Wed, Apr 25, 2007 at 10:53:08AM +0800, LI Xin wrote: >> Hi, Oleg, >> >> Oleg Derevenetz wrote: >> > ??? LI Xin <[EMAIL PROTECTED]>: >> [...] >> >> I'm not very sure if this is specific to one disk controller. Actually >> >> I got some occasional reports about similar hangs on amd64 6.2-RELEASE >> >> (slightly patched version) that most of processes stuck in the 'ufs' >> >> state, under very light load, the box was equipped with amr(4) RAID. >> >> >> >> I was not able to reproduce the problem at my lab, though, it's still >> >> unknown that how to trigger the livelock :-( Still need some >> >> investigate on their production system. >> > >> > I reported simular issue for FreeBSD 6.2 in audit-trail for kern/104406: >> > >> > http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat= >> > >> > and there should be a thread related to this. Briefly, I suspects that >> > this is related to nullfs filesystems on my server and when I cvsuped to >> > FreeBSD 6.2- STABLE with Daichi's unionfs-related patches and replaced >> > nullfs-mounted fs with unionfs-mounted (that was done 10.03.07) problem >> > is gone (seems to be so, at least). >> >> Hmm... Seems to be different issues. The problem I have received was a >> pgsql server (no nullfs/unionfs involved), and the hang always happen >> when it is not being heavily loaded (usually in the morning, for >> instance, and there is no special configuration, like scheduled tasks >> which can generate disk load, etc., only the entropy harvesting), so >> this is quite confusing. > > Yes, a large part of the confusion is the unfortunate tendency of > people to do the following: > > my system hangs/panics/etc > my system hangs/panics/etc too; it must be the same problem! 
>
> What we really need is for every FreeBSD user who encounters a
> hang/panic/etc to avoid jumping to conclusions -- no matter how many
> superficial similarities there may seem to you -- and instead go
> through the relevant steps described here:
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>
> Until you (or a developer) have analyzed the resulting information,
> you cannot definitively determine whether or not your problem is the
> same as a given random other problem, and you may just confuse the
> issue by making claims of similarity when you are really reporting a
> completely separate problem.

What about those that don't have the benefit of being able to access the console? :(

I've recently started buying servers that have built-in, full remote console (i.e. the HP servers), but, for instance, I have one box that I consistently have to reboot every 3 days due to a 'No Buffer Space Available' ...

A thought: how hard would it be to add some method of forcing a system crash, that would dump core, from the command line? Something that, by default, would be disabled, but for remote debugging purposes, one could enable in the kernel and do a 'sysctl kernel.force_core_crash=1' to have it do it? I imagine that having a core to analyze would allow providing more information than nothing at all, no?

- Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy  Skype: hub.org  ICQ . 7615664
Re: 6.2-STABLE deadlock?
On Thu, Apr 26, 2007 at 02:01:57AM +0100, Adrian Wontroba wrote:
> On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> > Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> > rather sooner after the hang. Processes with wmesg=ufs feature often in
> > the ps output.
>
> Thanks for the assorted replies. I'll try using INVARIANTS, as I'd much
> prefer a panic and automatic reboot rather than creeping death for this
> server which is only attended 06:00-22:00 weekdays yet is a critical
> point in our 24*7 monitoring. Sigh. Champagne tastes on a beer budget.
>
> Other background information (from memory as I can't access it from
> here):
> I'm not using unionfs / nullfs.
> I am using MFS and softupdates.
> NFS is in occasional use.
> NTFS is not used.
> The server is ancient. A 4 way Xeon, state of the art in 1998.
> The problem has gone from "absent" through "reproducible if you
> try hard enough" to "strikes according to Murphy" through 5.5-STABLE to
> 6.2-STABLE.
>
> Once the sendmail milter ABI damage is fixed I'll bring the machine up
> to date - it is also the build box for a dozen or so machines running
> something close to 6.2-RELEASE, and I'd rather not upgrade all of
> them when the next ClamAV release appears.

Fixed the other day, FYI.

> If / when it hangs again, I'll include more information and maybe even
> raise a real PR.

Thanks.

Kris
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
On Wed, Apr 25, 2007 at 10:20:25PM +0400, Oleg Derevenetz wrote: > ??? Kris Kennaway <[EMAIL PROTECTED]>: > > > On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote: > > > > > > Until you (or a developer) have analyzed the resulting information, > > > > you cannot definitively determine whether or not your problem is > > the > > > > same as a given random other problem, and you may just confuse the > > > > issue by making claims of similarity when you are really reporting > > a > > > > completely separate problem. > > > > > > Not all people can do deadlock debugging, though. In my case turning > > on > > > INVARIANTS and WITNESS leads to unacceptable performance penalty due > > to heavily > > > loaded server. So I can only describe my case, actions and result > > without > > > providing any debug information. > > > > But you can still do *some* things, e.g. backtraces and/or a coredump: > > every little bit helps. > > > > Ultimately, though, you have to understand and accept that the less > > information you provide, the less chance there is that a developer > > will be able to track down your problem. In fact a developer may have > > to effectively ignore your problem report altogether, because of what > > I explained about "symptoms" usually not being enough to tell one bug > > from another. > > > > In general, when you encounter a bug in FreeBSD, you have a little bit > > of work to do on your side before we can start doing the rest. I > > understand that you may not be in a position to do that work, but that > > means you also need to understand that we can't do it either. > > In fact, I solved (or workarounded) this problem for me, so in this thread I > provide my workaround as possible workaround for users that experiences the > same problem. This only hint for them, and not a bugreport for you. 
> I could not provide a full (or only partial) debug information because I
> will not back out cvsuped sources, will not replace unionfs with nullfs
> again and will not wait week or more for another stuck.

OK. FYI, I have used nullfs on a few dozen heavily loaded machines without issue for the past year or so, so if you are seeing a nullfs issue it is probably an obscure one.

Kris
Re: 6.2-STABLE deadlock?
On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> rather sooner after the hang. Processes with wmesg=ufs feature often in
> the ps output.

Thanks for the assorted replies. I'll try using INVARIANTS, as I'd much prefer a panic and automatic reboot to creeping death for this server, which is only attended 06:00-22:00 on weekdays yet is a critical point in our 24*7 monitoring. Sigh. Champagne tastes on a beer budget.

Other background information (from memory, as I can't access it from here):

I'm not using unionfs / nullfs.
I am using MFS and softupdates.
NFS is in occasional use.
NTFS is not used.
The server is ancient. A 4-way Xeon, state of the art in 1998.
The problem has gone from "absent" through "reproducible if you try hard enough" to "strikes according to Murphy" through 5.5-STABLE to 6.2-STABLE.

Once the sendmail milter ABI damage is fixed I'll bring the machine up to date - it is also the build box for a dozen or so machines running something close to 6.2-RELEASE, and I'd rather not upgrade all of them when the next ClamAV release appears.

If / when it hangs again, I'll include more information and maybe even raise a real PR.

--
Adrian Wontroba
It's always a long day; 86400 doesn't fit into a short.
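For readers who want to pull the "wmesg=ufs" processes out of ps output when a hang is in progress, a minimal sketch follows. The function name stuck_in_ufs is hypothetical, and the ps invocation in the usage comment assumes the mwchan output keyword described in FreeBSD's ps(1); it is not taken from the thread.

```shell
#!/bin/sh
# Hypothetical filter: expects "pid wait-channel command..." lines on stdin,
# with one header line, and prints only the rows whose wait channel is "ufs".
stuck_in_ufs() {
    awk 'NR > 1 && $2 == "ufs" { print }'
}

# Assumed usage on a live system:
#   ps -ax -o pid,mwchan,command | stuck_in_ufs
```

A list like this is only a symptom report. As discussed elsewhere in the thread, it does not by itself show which lock holder the other processes are queued behind.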
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
Quoting Kris Kennaway <[EMAIL PROTECTED]>:

> On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:
>
> > > Until you (or a developer) have analyzed the resulting information,
> > > you cannot definitively determine whether or not your problem is the
> > > same as a given random other problem, and you may just confuse the
> > > issue by making claims of similarity when you are really reporting a
> > > completely separate problem.
> >
> > Not all people can do deadlock debugging, though. In my case turning on
> > INVARIANTS and WITNESS leads to unacceptable performance penalty due to
> > heavily loaded server. So I can only describe my case, actions and result
> > without providing any debug information.
>
> But you can still do *some* things, e.g. backtraces and/or a coredump:
> every little bit helps.
>
> Ultimately, though, you have to understand and accept that the less
> information you provide, the less chance there is that a developer
> will be able to track down your problem. In fact a developer may have
> to effectively ignore your problem report altogether, because of what
> I explained about "symptoms" usually not being enough to tell one bug
> from another.
>
> In general, when you encounter a bug in FreeBSD, you have a little bit
> of work to do on your side before we can start doing the rest. I
> understand that you may not be in a position to do that work, but that
> means you also need to understand that we can't do it either.

In fact, I have solved (or worked around) this problem for myself, so in this thread I am offering my workaround as a possible workaround for users who experience the same problem. It is only a hint for them, not a bug report for you. I cannot provide full (or even partial) debug information because I will not back out the cvsupped sources, will not replace unionfs with nullfs again, and will not wait a week or more for another hang.
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:

> > Until you (or a developer) have analyzed the resulting information,
> > you cannot definitively determine whether or not your problem is the
> > same as a given random other problem, and you may just confuse the
> > issue by making claims of similarity when you are really reporting a
> > completely separate problem.
>
> Not all people can do deadlock debugging, though. In my case turning on
> INVARIANTS and WITNESS leads to unacceptable performance penalty due to
> heavily loaded server. So I can only describe my case, actions and result
> without providing any debug information.

But you can still do *some* things, e.g. backtraces and/or a coredump: every little bit helps.

Ultimately, though, you have to understand and accept that the less information you provide, the less chance there is that a developer will be able to track down your problem. In fact a developer may have to effectively ignore your problem report altogether, because of what I explained about "symptoms" usually not being enough to tell one bug from another.

In general, when you encounter a bug in FreeBSD, you have a little bit of work to do on your side before we can start doing the rest. I understand that you may not be in a position to do that work, but that means you also need to understand that we can't do it either.

Kris
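One prerequisite for the coredump mentioned above is that a dump device is configured before the machine hangs. The sketch below checks an rc.conf-style file for a dumpdev setting. The function name dumpdev_setting is hypothetical, and which values are valid (a device path, "AUTO", "NO") depends on the release, so treat those as assumptions to verify in rc.conf(5).

```shell
#!/bin/sh
# Hypothetical helper: read rc.conf-style text on stdin and print the value
# of the last dumpdev= assignment, with surrounding quotes removed.
# Prints nothing if dumpdev is not set. Trailing comments are not handled.
dumpdev_setting() {
    sed -n 's/^[[:space:]]*dumpdev=//p' | tail -n 1 | tr -d "\"'"
}

# Assumed usage on a live system:
#   dumpdev_setting < /etc/rc.conf
```

An empty result, or a value that disables dumps, suggests that a forced panic would reboot the box without leaving anything for a developer to analyze.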
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
Oleg Derevenetz wrote:
[snip]
> Not all people can do deadlock debugging, though. In my case turning on
> INVARIANTS and WITNESS leads to unacceptable performance penalty due to
> heavily loaded server. So I can only describe my case, actions and result
> without providing any debug information.

I completely agree with Kris: it is very hard for developers to investigate problems when no detailed information is available, especially problems that cannot easily be reproduced. Of course, deadlock debugging can be tricky, but having a backtrace can usually save a lot of time (and fortunately that is not hard even for average users :)

What I wanted to suggest is that, if submitters are not able to diagnose the problem themselves, we hope they can provide detailed steps to reliably reproduce it whenever possible, so that we can extract more information in the lab and possibly reach a fix.

The problem I have is that the reporter of the issue is not as cooperative as they were before. What I wanted to say is that it is possible to trigger the livelock without nullfs/unionfs, and I have not figured out why (yet) because I cannot reproduce it in my environment :-(

Cheers,
--
Xin LI <[EMAIL PROTECTED]> http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
Quoting Kris Kennaway <[EMAIL PROTECTED]>:

> > Oleg Derevenetz wrote:
> > > Quoting LI Xin <[EMAIL PROTECTED]>:
> > [...]
> > >> I'm not very sure if this is specific to one disk controller. Actually
> > >> I got some occasional reports about similar hangs on amd64 6.2-RELEASE
> > >> (slightly patched version) that most of processes stuck in the 'ufs'
> > >> state, under very light load, the box was equipped with amr(4) RAID.
> > >>
> > >> I was not able to reproduce the problem at my lab, though, it's still
> > >> unknown that how to trigger the livelock :-( Still need some
> > >> investigate on their production system.
> > >
> > > I reported similar issue for FreeBSD 6.2 in audit-trail for kern/104406:
> > >
> > > http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
> > >
> > > and there should be a thread related to this. Briefly, I suspects that
> > > this is related to nullfs filesystems on my server and when I cvsuped to
> > > FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced
> > > nullfs-mounted fs with unionfs-mounted (that was done 10.03.07) problem
> > > is gone (seems to be so, at least).
> >
> > Hmm... Seems to be different issues. The problem I have received was a
> > pgsql server (no nullfs/unionfs involved), and the hang always happen
> > when it is not being heavily loaded (usually in the morning, for
> > instance, and there is no special configuration, like scheduled tasks
> > which can generate disk load, etc., only the entropy harvesting), so
> > this is quite confusing.
>
> Yes, a large part of the confusion is the unfortunate tendency of
> people to do the following:
>
> my system hangs/panics/etc
> my system hangs/panics/etc too; it must be the same problem!
>
> What we really need is for every FreeBSD user who encounters a
> hang/panic/etc to avoid jumping to conclusions -- no matter how many
> superficial similarities there may seem to you -- and instead go
> through the relevant steps described here:
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>
> Until you (or a developer) have analyzed the resulting information,
> you cannot definitively determine whether or not your problem is the
> same as a given random other problem, and you may just confuse the
> issue by making claims of similarity when you are really reporting a
> completely separate problem.

Not everyone can do deadlock debugging, though. In my case, turning on INVARIANTS and WITNESS leads to an unacceptable performance penalty because the server is heavily loaded. So I can only describe my case, my actions and the result, without providing any debug information.
How to report bugs (Re: 6.2-STABLE deadlock?)
On Wed, Apr 25, 2007 at 10:53:08AM +0800, LI Xin wrote: > Hi, Oleg, > > Oleg Derevenetz wrote: > > ??? LI Xin <[EMAIL PROTECTED]>: > [...] > >> I'm not very sure if this is specific to one disk controller. Actually > >> I got some occasional reports about similar hangs on amd64 6.2-RELEASE > >> (slightly patched version) that most of processes stuck in the 'ufs' > >> state, under very light load, the box was equipped with amr(4) RAID. > >> > >> I was not able to reproduce the problem at my lab, though, it's still > >> unknown that how to trigger the livelock :-( Still need some > >> investigate on their production system. > > > > I reported simular issue for FreeBSD 6.2 in audit-trail for kern/104406: > > > > http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat= > > > > and there should be a thread related to this. Briefly, I suspects that this > > is > > related to nullfs filesystems on my server and when I cvsuped to FreeBSD > > 6.2- > > STABLE with Daichi's unionfs-related patches and replaced nullfs-mounted fs > > with unionfs-mounted (that was done 10.03.07) problem is gone (seems to be > > so, > > at least). > > Hmm... Seems to be different issues. The problem I have received was a > pgsql server (no nullfs/unionfs involved), and the hang always happen > when it is not being heavily loaded (usually in the morning, for > instance, and there is no special configuration, like scheduled tasks > which can generate disk load, etc., only the entropy harvesting), so > this is quite confusing. Yes, a large part of the confusion is the unfortunate tendency of people to do the following: my system hangs/panics/etc my system hangs/panics/etc too; it must be the same problem! 
What we really need is for every FreeBSD user who encounters a hang/panic/etc to avoid jumping to conclusions -- no matter how many superficial similarities there may seem to you -- and instead go through the relevant steps described here:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

Until you (or a developer) have analyzed the resulting information, you cannot definitively determine whether or not your problem is the same as a given random other problem, and you may just confuse the issue by making claims of similarity when you are really reporting a completely separate problem.

Thanks,
Kris
Re: 6.2-STABLE deadlock?
On Wed, Apr 25, 2007 at 11:53:32AM +1000, Jan Mikkelsen wrote: > LI Xin wrote: > > Kostik Belousov wrote: > > > On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote: > > >> On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote: > > >>> At work, amoungst my stable of old computers running > > FreeBSD, I have a > > >>> Fujitsu M800 - a 4 Zeon SMP processor with 4 GB of memory. This > > >>> primarily runs Nagios and a small and lightly used MySQL > > database, along > > >>> with a few inbound FTP transfers per minute. It has a > > Mylex card based > > >>> disc subsystem, ruling out crash dumps. > > >>> > > >>> At some point during 5.5-STABLE this machine started to > > occasionally hang ... > > >> Another 6-STABLE (cvsupped on 27/03/07) example, with > > diagnostics taken > > >> rather sooner after the hang. Processes with wmesg=ufs > > feature often in > > >> the ps output. > > >> > > >> http://www.stade.co.uk/crash1/ > > > > > > I would suspect the mlx controller. There is several > > processes (for instance, > > > 988, 50918) waiting for completion of block read, and > > processes in the "ufs" > > > states are the result of the lock cascade, IMHO. > > > > I'm not very sure if this is specific to one disk controller. > > Actually > > I got some occasional reports about similar hangs on amd64 6.2-RELEASE > > (slightly patched version) that most of processes stuck in the 'ufs' > > state, under very light load, the box was equipped with amr(4) RAID. > > > > I was not able to reproduce the problem at my lab, though, it's still > > unknown that how to trigger the livelock :-( Still need some > > investigate on their production system. > > I have seen something similar once, on a machine with an Areca (arcmsr) > controller, running 6.2-RELEASE (with unionfs patches). Processes stuck in > "ufs", and the machine needed physical intervention to reboot. I haven't > seen it since. 
> From memory, it happened during startup of the applications
> and jails on the machine.

Sounds like one of the known unionfs bugs.

Kris
Re: 6.2-STABLE deadlock?
Hi, Oleg,

Oleg Derevenetz wrote:
> Quoting LI Xin <[EMAIL PROTECTED]>:
[...]
>> I'm not very sure if this is specific to one disk controller. Actually
>> I got some occasional reports about similar hangs on amd64 6.2-RELEASE
>> (slightly patched version) that most of processes stuck in the 'ufs'
>> state, under very light load, the box was equipped with amr(4) RAID.
>>
>> I was not able to reproduce the problem at my lab, though, it's still
>> unknown that how to trigger the livelock :-( Still need some
>> investigate on their production system.
>
> I reported similar issue for FreeBSD 6.2 in audit-trail for kern/104406:
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
>
> and there should be a thread related to this. Briefly, I suspects that this
> is related to nullfs filesystems on my server and when I cvsuped to FreeBSD
> 6.2-STABLE with Daichi's unionfs-related patches and replaced nullfs-mounted
> fs with unionfs-mounted (that was done 10.03.07) problem is gone (seems to be
> so, at least).

Hmm... These seem to be different issues. The problem reported to me involved a pgsql server (no nullfs/unionfs involved), and the hang always happens when the box is not heavily loaded (usually in the morning, for instance; there is no special configuration such as scheduled tasks that could generate disk load, only the entropy harvesting), so this is quite confusing.

Cheers,
--
Xin LI <[EMAIL PROTECTED]> http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: 6.2-STABLE deadlock?
Kostik Belousov wrote:
> I would suspect the mlx controller. There is several processes (for instance,
> 988, 50918) waiting for completion of block read, and processes in the "ufs"
> states are the result of the lock cascade, IMHO.

It is possible that the controller is not guilty. You can easily reproduce a lock in the "ufs" state with the commands from the "How-To-Repeat" section of:

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=kern/107439

The PR is closed, but the problem still exists in recent 6.2-STABLE. GENERIC has the problem too; GENERIC+INVARIANTS panics at once instead of producing locked processes.

Eugene Grosbein.
RE: 6.2-STABLE deadlock?
Oleg Derevenetz wrote:
> [ ... ]
> I reported similar issue for FreeBSD 6.2 in audit-trail for kern/104406:
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
>
> and there should be a thread related to this. Briefly, I suspects that
> this is related to nullfs filesystems on my server and when I cvsuped to
> FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced
> nullfs-mounted fs with unionfs-mounted (that was done 10.03.07) problem
> is gone (seems to be so, at least).

Interesting. In the instance I saw, there were also nullfs filesystems mounted.

Regards,

Jan.
RE: 6.2-STABLE deadlock?
LI Xin wrote: > Kostik Belousov wrote: > > On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote: > >> On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote: > >>> At work, amoungst my stable of old computers running > FreeBSD, I have a > >>> Fujitsu M800 - a 4 Zeon SMP processor with 4 GB of memory. This > >>> primarily runs Nagios and a small and lightly used MySQL > database, along > >>> with a few inbound FTP transfers per minute. It has a > Mylex card based > >>> disc subsystem, ruling out crash dumps. > >>> > >>> At some point during 5.5-STABLE this machine started to > occasionally hang ... > >> Another 6-STABLE (cvsupped on 27/03/07) example, with > diagnostics taken > >> rather sooner after the hang. Processes with wmesg=ufs > feature often in > >> the ps output. > >> > >> http://www.stade.co.uk/crash1/ > > > > I would suspect the mlx controller. There is several > processes (for instance, > > 988, 50918) waiting for completion of block read, and > processes in the "ufs" > > states are the result of the lock cascade, IMHO. > > I'm not very sure if this is specific to one disk controller. > Actually > I got some occasional reports about similar hangs on amd64 6.2-RELEASE > (slightly patched version) that most of processes stuck in the 'ufs' > state, under very light load, the box was equipped with amr(4) RAID. > > I was not able to reproduce the problem at my lab, though, it's still > unknown that how to trigger the livelock :-( Still need some > investigate on their production system. I have seen something similar once, on a machine with an Areca (arcmsr) controller, running 6.2-RELEASE (with unionfs patches). Processes stuck in "ufs", and the machine needed physical intervention to reboot. I haven't seen it since. From memory, it happened during startup of the applications and jails on the machine. Jan. 
Re: 6.2-STABLE deadlock?
Quoting LI Xin <[EMAIL PROTECTED]>:

> Kostik Belousov wrote:
> > On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> >> On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
> >>> At work, amongst my stable of old computers running FreeBSD, I have a
> >>> Fujitsu M800 - a 4 Xeon SMP processor with 4 GB of memory. This
> >>> primarily runs Nagios and a small and lightly used MySQL database, along
> >>> with a few inbound FTP transfers per minute. It has a Mylex card based
> >>> disc subsystem, ruling out crash dumps.
> >>>
> >>> At some point during 5.5-STABLE this machine started to occasionally
> >>> hang ...
> >> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> >> rather sooner after the hang. Processes with wmesg=ufs feature often in
> >> the ps output.
> >>
> >> http://www.stade.co.uk/crash1/
> >
> > I would suspect the mlx controller. There is several processes (for
> > instance, 988, 50918) waiting for completion of block read, and processes
> > in the "ufs" states are the result of the lock cascade, IMHO.
>
> I'm not very sure if this is specific to one disk controller. Actually
> I got some occasional reports about similar hangs on amd64 6.2-RELEASE
> (slightly patched version) that most of processes stuck in the 'ufs'
> state, under very light load, the box was equipped with amr(4) RAID.
>
> I was not able to reproduce the problem at my lab, though, it's still
> unknown that how to trigger the livelock :-( Still need some
> investigate on their production system.

I reported a similar issue for FreeBSD 6.2 in the audit trail for kern/104406:

http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=

and there should be a thread related to this. Briefly, I suspect that it is related to the nullfs filesystems on my server: when I cvsupped to FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced the nullfs-mounted filesystems with unionfs-mounted ones (that was done on 10.03.07), the problem went away (or seems to have, at least).
Re: 6.2-STABLE deadlock?
Kostik Belousov wrote:
> On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
>> On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
>>> At work, amongst my stable of old computers running FreeBSD, I have a
>>> Fujitsu M800 - a 4-processor Xeon SMP machine with 4 GB of memory. This
>>> primarily runs Nagios and a small and lightly used MySQL database, along
>>> with a few inbound FTP transfers per minute. It has a Mylex card based
>>> disc subsystem, ruling out crash dumps.
>>>
>>> At some point during 5.5-STABLE this machine started to occasionally hang
>>> ...
>> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
>> rather sooner after the hang. Processes with wmesg=ufs feature often in
>> the ps output.
>>
>> http://www.stade.co.uk/crash1/
>
> I would suspect the mlx controller. There are several processes (for
> instance, 988, 50918) waiting for completion of a block read, and the
> processes in the "ufs" state are the result of the lock cascade, IMHO.

I'm not sure this is specific to one disk controller. Actually, I have had occasional reports of similar hangs on amd64 6.2-RELEASE (a slightly patched version) where most processes were stuck in the 'ufs' state under very light load; the box was equipped with amr(4) RAID.

I was not able to reproduce the problem in my lab, though, and it is still unknown how to trigger the livelock :-( I still need to investigate on their production system.

Cheers,
--
Xin LI <[EMAIL PROTECTED]> http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: 6.2-STABLE deadlock?
On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
> > At work, amongst my stable of old computers running FreeBSD, I have a
> > Fujitsu M800 - a 4-processor Xeon SMP machine with 4 GB of memory. This
> > primarily runs Nagios and a small and lightly used MySQL database, along
> > with a few inbound FTP transfers per minute. It has a Mylex card based
> > disc subsystem, ruling out crash dumps.
> >
> > At some point during 5.5-STABLE this machine started to occasionally hang
> > ...
>
> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> rather sooner after the hang. Processes with wmesg=ufs feature often in
> the ps output.
>
> http://www.stade.co.uk/crash1/

I would suspect the mlx controller. There are several processes (for instance, 988, 50918) waiting for completion of a block read, and the processes in the "ufs" state are the result of the lock cascade, IMHO.
Re: 6.2-STABLE deadlock?
On Tue, Mar 13, 2007 at 02:08:48PM +, Adrian Wontroba wrote:
> At work, amongst my stable of old computers running FreeBSD, I have a
> Fujitsu M800 - a 4-processor Xeon SMP machine with 4 GB of memory. This
> primarily runs Nagios and a small and lightly used MySQL database, along
> with a few inbound FTP transfers per minute. It has a Mylex card based
> disc subsystem, ruling out crash dumps.
>
> At some point during 5.5-STABLE this machine started to occasionally hang ...

Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken rather sooner after the hang. Processes with wmesg=ufs feature often in the ps output.

http://www.stade.co.uk/crash1/

--
Adrian Wontroba
6.2-STABLE deadlock?
At work, amongst my stable of old computers running FreeBSD, I have a Fujitsu M800 - a 4-processor Xeon SMP machine with 4 GB of memory. This primarily runs Nagios and a small and lightly used MySQL database, along with a few inbound FTP transfers per minute. It has a Mylex card based disc subsystem, ruling out crash dumps.

At some point during 5.5-STABLE this machine started to occasionally hang while performing its daily "application" housekeeping - closing and restarting Apache and Nagios, and dumping the database.

Upgrading to 6.2-STABLE appeared to solve the problem, with no problems visible while running 1,000 cycles of the sequence which seemed to provoke it. cvsup for this version of the kernel and userland was run at 01:20 GMT on 06 March.

However, shortly after 15:15 last Sunday afternoon the machine hung again "out of the blue". kdb diagnostics were taken some 12 hours later, and look somewhat odd. Maybe it was left to fester for too long.

The ps etc. output is at http://www.stade.co.uk/crash/console, which contains boot-to-boot serial console output, including some output from test cycles.

I'd be grateful for any expert comments on the ps etc. output.

Supporting stuff.

[EMAIL PROTECTED] ~/crash]# df
Filesystem     1K-blocks     Used    Avail Capacity  Mounted on
/dev/mlxd0s1a     507630    70074   396946    15%    /
devfs                  1        1        0   100%    /dev
/dev/mlxd0s1f   63541498 44355014 14103166    76%    /home
/dev/mlxd0s1e   16244334  6784900  8159888    45%    /usr
/dev/mlxd0s1d    1012974   117456   814482    13%    /var
/dev/md0            1646       32     1484     2%    /home/topftp/instances
/dev/md1          253678      132   233252     0%    /tmp

[EMAIL PROTECTED] ~]# find /var -inum 23 -ls
23 4 -rw-r--r-- 1 daemon daemon 60 Mar 12 20:22 /var/rwho/whod.xjamesfriis

The problem stopped HTTP and FTP logging soon after 15:14 on Sunday 11; diagnostics were taken and the machine rebooted around 04:30 on Monday 12.
172.19.112.92 - - [11/Mar/2007:15:14:53 +] "GET / HTTP/1.0" 200 688 "-" "check_http/1.89 (nagios-plugins 1.4.3)"
172.19.112.92 - - [12/Mar/2007:04:44:14 +] "GET / HTTP/1.0" 200 688 "-" "check_http/1.89 (nagios-plugins 1.4.3)"

Mar 11 15:15:35 beastie ftpd[91652]: connection from appsupcen (10.208.1.134)
Mar 11 15:15:35 beastie ftpd[91652]: FTP LOGIN FROM appsupcen as topftp
Mar 11 15:15:35 beastie ftpd[91652]: session root changed to /home/topftp/instances
Mar 11 15:15:35 beastie ftpd[91652]: put in.env_status.html.gz = 592 bytes (wd: /topftp/appsupcen; chrooted)
Mar 11 15:15:35 beastie ftpd[91652]: rename in.env_status.html.gz env_status.html.gz (wd: /topftp/appsupcen; chrooted)
Mar 12 04:44:31 beastie ftpd[1161]: connection from appsupcen (10.208.1.134)
Mar 12 04:44:31 beastie ftpd[1161]: FTP LOGIN FROM appsupcen as topftp
Mar 12 04:44:31 beastie ftpd[1161]: session root changed to /home/topftp/instances
Mar 12 04:44:31 beastie ftpd[1161]: mkdir topftp/appsupcen (wd: /; chrooted)

Support diary:

15:20 Beastie seems like it's crashed and down;
16:54 Beastie is no longer pingable by rjmon1;
04:30 - 04:43 (support person quoting from the documentation I'd provided about what to do after a hang)

Type "return tilde hash" (CR~#) which will make cu send a break signal to beastie, and should cause beastie to drop into the ddb kernel debugger. In the following, you may see "more" prompts. Type space at each for the next page. Type these debugger commands:

ps
show pcpu
show allpcpu
show locks
show alllocks
show lockedvnods
trace
alltrace

04:43 - beastie now back up and working, after typing call cpu_reset() following the above commands to reboot beastie.

AW: preserved and inspected diagnostic output. It looks very unlike that for previous crashes (without a serial console), where a noticeable feature was many ftpd processes in a UFS state.
Possibly "things happened" in the 12 hour period between the onset of the problem on Sunday afternoon and the diagnostics being taken on Monday morning.

--
Adrian Wontroba
Adrian's Birthday Celebration: Crewe Limelight, Saturday 17 March. David Hughes and Tiny Tin Lady. Free but ticketed - email me your postal address if you want to come. No under 18s.
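As a rough cross-check of the timeline, the gap in ftpd logging can be worked out from the two syslog timestamps quoted in the log excerpt above. This is only a minimal sketch in Python, not something used in the original report: the helper names are illustrative, and the year 2007 is an assumption carried over from the Apache log dates, since syslog lines omit the year.

```python
from datetime import datetime, timedelta

# Last ftpd entry before the hang and first entry after the reboot,
# copied from the syslog excerpt in the message above.
LAST_BEFORE = "Mar 11 15:15:35"
FIRST_AFTER = "Mar 12 04:44:31"


def parse_syslog_stamp(stamp: str, year: int = 2007) -> datetime:
    """Parse a 'Mon DD HH:MM:SS' syslog timestamp.

    syslog omits the year, so it is supplied separately; 2007 is an
    assumption taken from the Apache access-log dates.
    """
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def logging_gap(before: str, after: str) -> timedelta:
    """Return the time elapsed between two syslog timestamps."""
    return parse_syslog_stamp(after) - parse_syslog_stamp(before)


gap = logging_gap(LAST_BEFORE, FIRST_AFTER)
print(gap)  # 13:28:56
```

That is a silence of roughly thirteen and a half hours in the FTP log, broadly in line with the "some 12 hours" estimate of how long the machine sat hung before the diagnostics were taken.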