Re: Possible scheduler (SCHED_ULE) bug?
On 10/23/09, Jaime Bozza jbo...@mindsites.com wrote:
> I believe I found a problem with the ULE scheduler, or at least evidence that there is a problem, but I'm not sure where to go from here. The system locks all processes but doesn't panic, so I have no output to give. I was able to duplicate this on three different machines, and solved it by switching the scheduler to 4BSD.
>
> Here's the environment:
> FreeBSD 7.2 i386, installed from the bootonly ISO. Custom install, minimal, no changes other than setting the timezone, changing the root password, and turning on sshd (allowing root and password connections).
> Ran portsnap (fetch, then extract) to get the latest ports tree.
> From ports, make installs of lang/php5 and www/lighttpd, using defaults for all ports installed.
> Modified lighttpd.conf for PHP (attached diff), created a short script called uploadfile.php (attached). The file was installed at /usr/local/www/data/uploadfile.php
> Start lighttpd (lighttpd_enable=YES in rc.conf, /usr/local/etc/rc.d/lighttpd start), connect and run the script.
>
> As long as I upload a file smaller than 64K, everything works fine. If I try to upload something larger than 64K, the system no longer responds. The console login prompt will let me enter a username/password, but nothing happens after that. A console that is already logged in will let me type a single line, but nothing happens after I press Enter. No errors get written anywhere: console, logs, etc.
>
> I'm at a loss as to what to do next. Can anyone give me ideas of what else I can do?

Superficially, this seems identical to a deadlock I reported for 7.1-RC1. Would you mind compiling a kernel with these options:

options DDB
options KDB
options SW_WATCHDOG
options DEBUG_VFS_LOCKS

then add the following to /etc/rc.conf:

watchdogd_enable="YES"
watchdogd_flags="-e 'ls -al /etc'"

This should force a panic when the lockup happens again, which will drop to a debugger.
Please check the backtrace, and tell me if the call stack is the same as this one (between the --- interrupt and --- syscall sections):

KDB: stack backtrace:
db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
Xtimerint() at Xtimerint+0x1f
--- interrupt, eip = 0xc07ff29d, esp = 0xe66e0b48, ebp = 0xe66e0c34 ---
kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1
sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
syscall(e66e0d38) at syscall+0x335
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (393, FreeBSD ELF32, sendfile), eip = 0x282cb0cb, esp = 0xbfbfc7cc, ebp = 0xbfbfe848 ---
KDB: enter: watchdog timeout

You can type 'reboot' to reboot the machine (in my case, panic would not work, so a useful dump wasn't in the cards).

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
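The comparison requested above (are the frames between the --- interrupt and --- syscall markers the same?) can also be done mechanically on a saved console log. The following is only a sketch: the function names `frames_between_markers` and `same_call_stack` are made up for illustration, and it assumes the log uses the same "function(args) at function+0xoffset" frame layout as the backtrace quoted in this thread.

```python
import re
from typing import List

# Matches the "at <function>+0x<offset>" part of a DDB frame line, e.g.
# "kern_sendfile(c41a7690,...) at kern_sendfile+0x90d".
FRAME_RE = re.compile(r"\bat\s+([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-fA-F]+")

# Frames between the two markers in the backtrace quoted in this thread.
REFERENCE_FRAMES = [
    "kern_sendfile",
    "do_sendfile",
    "sendfile",
    "syscall",
    "Xint0x80_syscall",
]


def frames_between_markers(trace: str) -> List[str]:
    """Return the function names found between '--- interrupt' and '--- syscall ('."""
    start = trace.find("--- interrupt")
    if start == -1:
        return []
    end = trace.find("--- syscall (", start)
    if end == -1:
        return []
    return FRAME_RE.findall(trace[start:end])


def same_call_stack(trace: str) -> bool:
    """True if the trace shows the same frames, in the same order, as the reference."""
    return frames_between_markers(trace) == REFERENCE_FRAMES
```

Only function names are compared; argument values and offsets are ignored, since those can differ between kernel builds even when the call path is the same.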
Re: Mounting / using /dev/ufs/name
On Wed, Jan 14, 2009 at 1:00 PM, Václav Haisman v.hais...@sh.cvut.cz wrote:
> Hi,
> I tried to mount the root slice using the device nodes provided in the /dev/ufs directory. It works fine for other slices but not for the root slice. If I try it, I get a prompt asking for the root slice at boot time. Is this not possible at all, or am I doing something wrong?

You should add

vfs.root.mountfrom=ufs:ufs/whatever

to /boot/loader.conf

This will short-circuit the bootloader's attempts to resolve the root device from the /etc/fstab entry for /, which occasionally fails on an fstab containing an otherwise legal ufs label. I noticed it a few years ago when I moved every machine I had to using geom_label to find the root device, but I was unable to find the source of the bug in the code (src/sys/boot/common/boot.c, the getrootmount routine).
Re: Big problems with 7.1 locking up :-(
On Sun, Jan 11, 2009 at 11:27 AM, Pete French petefre...@ticketswitch.com wrote:
>> My kernconf is below, try building the kernel, and send an email containing the backtrace from any process that has blocked (in my
>
> Well, I haven't managed to get a backtrace, but immediately upon booting the system halts with the following:
>
> http://www.twisted.org.uk/~pete/71_lor1.jpg

Not Found
Re: Hard lock on 7.1-RC1
On Sun, Dec 21, 2008 at 3:05 PM, Dylan Cochran a134q...@gmail.com wrote:
> I'm hitting a strange lockup on 7.1-RC1, where some socket operations seem to stall, as well as basic file operations. The only reproducible way I have of triggering it is by doing multiple inserts into phpmyadmin on lighttpd+fastcgi php5 + mysql51-server, though this isn't the only thing which triggers it, just the only one which is semi-reliable. I've also reproduced this on another machine, set up specifically to rule out any machine-specific problems (as they have different drive controllers, one uses gjournal, etc).
>
> I initially built a kernel with SW_WATCHDOG, and attempted to use watchdogd and DDB to get an output from show locks, but the watchdogd hasn't panicked the machine, so at least devfs is still unlocked; I'm not able to get physical access to the machine until Monday. The bug was introduced, as far as I can tell, between 7.1-BETA2 and 7.1-RC1.
>
> Any suggestions on what I can test for tomorrow?

I updated the kernel source to RELENG_7_1 as of a few hours ago, and built with DEBUG_VFS_LOCKS as well. Luckily the backtrace included the operation it was in before the watchdog fired, which seems to be kern_sendfile(). I'm no expert at kernel debugging, so any assistance on tracking this down further would be greatly appreciated.

And, as promised, here is the output of script after the watchdog-induced panic:

Script started on Tue Dec 23 01:05:56 2008
# cu -l cua01
interrupt                   total
irq4: sio0                    623
irq6: fdc0                      1
irq17: fwohci0                  3
irq18: rl0 uhci2++          60718
irq23: rl1 ehci0              206
cpu0: timer                514596
Total                      576147
KDB: stack backtrace:
db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
Xtimerint() at Xtimerint+0x1f
--- interrupt, eip = 0xc07ff29d, esp = 0xe66e0b48, ebp = 0xe66e0c34 ---
kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1
sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
syscall(e66e0d38) at syscall+0x335
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (393, FreeBSD ELF32, sendfile), eip = 0x282cb0cb, esp = 0xbfbfc7cc, ebp = 0xbfbfe848 ---
KDB: enter: watchdog timeout
[thread pid 1288 tid 100060 ]
Stopped at      kdb_enter_why+0x3a:     movl    $0,kdb_why
db> show lock
db> p show all proc
  pid  ppid  pgrp   uid  state  wmesg   wchan       cmd
 1600   902   902     0  R                          watchdogd
 1470  1469  1470     0  S+     ttyin   0xc418fc10  csh
 1469     1  1469     0  Ss+    wait    0xc46032b8  login
 1468     1  1468     0  Ss+    ttyin   0xc41ac810  getty
 1427     1  1427     0  Ss     nanslp  0xc0c7dc44  cron
 1420     1  1420     0  Ss     select  0xc0c88eb8  sshd
 1419  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1418  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1417  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1416  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1415  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1414  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1413  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1412  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1411  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1410  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1409  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1408  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1407  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1406  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1405  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1404  1289  1289    80  SJ     accept  0xc445ab9a  php-cgi
 1403  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1402  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1401  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1400  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1399  1300  1300    80  RJ                         php-cgi
 1398  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1397  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1396  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1395  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1394  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1393  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1392  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1391  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
 1390  1300  1300    80  SJ     accept  0xc445a6ba  php-cgi
Hard lock on 7.1-RC1
I'm hitting a strange lockup on 7.1-RC1, where some socket operations seem to stall, as well as basic file operations. The only reproducible way I have of triggering it is by doing multiple inserts into phpmyadmin on lighttpd+fastcgi php5 + mysql51-server, though this isn't the only thing which triggers it, just the only one which is semi-reliable. I've also reproduced this on another machine, set up specifically to rule out any machine-specific problems (as they have different drive controllers, one uses gjournal, etc).

I initially built a kernel with SW_WATCHDOG, and attempted to use watchdogd and DDB to get an output from show locks, but the watchdogd hasn't panicked the machine, so at least devfs is still unlocked; I'm not able to get physical access to the machine until Monday. The bug was introduced, as far as I can tell, between 7.1-BETA2 and 7.1-RC1.

Any suggestions on what I can test for tomorrow?
Re: Benefits of multiple release branches (Was: Re: Upcoming Releases Schedule...)
On Mon, Sep 22, 2008 at 5:37 PM, Doug Barton [EMAIL PROTECTED] wrote:
> Dylan Cochran wrote:
>> One of the biggest (and most prominent, though not obviously so) issues is the lack of concurrency with regards to releases. With the default system, having multiple freebsd releases side by side (both different versions, and different architectures) is infeasible. This makes the choice more critical, while hindering flexibility. The necessity of long support schedules is one of the symptoms.
>
> While on the one hand I can understand the users' frustration on this point, IMO having at least 2 release branches is necessary. We are trying to walk the fine line between pleasing those who want new features (including new drivers), better performance, etc. that a newer release branch offers (in this case 7.x) and those that want long-term API stability, and other forms of stability that an established release offers. The only practical way to accomplish both of those goals is with 2 release branches.

I agree completely. My point is that, as of right now, a large number of collisions prevent installs of 6.3-RELEASE and 7.0-RELEASE from existing at the same time on the same drive, with a trivial switch between the two if need be. The same goes for i386/amd64. This is an artificial construct, and doesn't /have/ to continue existing. Removing it will, IMO, be far more useful than expecting a large amount of time to be expended on dead-end branches of code that are well past their expiration date and beginning to suffer from massive bitrot.

At the very least, it will make the default system more robust by moving the majority of the upgrade procedure from being /replacements/ of files to creating new files coupled with an atomic activation 'switch'.

Please don't misinterpret my ideas as supporting his viewpoint; I'm merely pointing out a perspective on the problem which hasn't been mentioned.
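To make the "new files plus an atomic activation switch" idea concrete, here is a minimal sketch. It is not the environment described in this thread, whose details are not given; it assumes a hypothetical layout in which each release is installed into its own directory under a common root, and a symlink (here called `current`, an invented name) selects the active one. On POSIX systems the switch is a single rename(2), so anything reading the link sees either the old release or the new one, never a missing link.

```python
import os


def activate_release(root: str, release: str, link_name: str = "current") -> None:
    """Atomically repoint root/link_name at root/release (hypothetical layout)."""
    target = os.path.join(root, release)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)

    link = os.path.join(root, link_name)
    tmp_link = link + ".new"

    # Clear a leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)

    # Build the new link beside the old one; nothing installed is overwritten.
    os.symlink(release, tmp_link)

    # rename(2) replaces the old link in one step: this is the activation switch.
    os.replace(tmp_link, link)
```

With directories named, for example, 6.3-RELEASE and 7.0-RELEASE under the same root, calling `activate_release(root, "7.0-RELEASE")` switches over and `activate_release(root, "6.3-RELEASE")` switches back, without touching either tree.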
Re: Upcoming Releases Schedule...
On Thu, Sep 18, 2008 at 12:25 AM, Jo Rhett [EMAIL PROTECTED] wrote:
> I understand what you mean, but the statement is blatantly false as stated. Anyone selling software to the US Government *must* specify (or meet, depending) a minimum support period, and must also specify a cost the agency can pay to extend the support period. Not relevant to FreeBSD -- just qualifying the statement as it stands.
>
> For the obvious comparison, Solaris versions have well-published release and support periods, usually upwards of 8 years. Obviously they have more resources to do this, I'm just pointing out that the statement you made is incorrect as stated.
>
>> and I'm not sure you could do it differently -- no one plans to ship a lemon, but once in a while you discover that things don't go as planned.
>
> I am amazed at the preposterously large elephant in the room that none of you are willing to address. Watching each of you dance around it would be terribly funny if it didn't affect my job so badly. (and if I wasn't going to have to bail on FreeBSD and go to some crap form of Linux because the FreeBSD developers appear to be unwilling to consider the idea of getting more help)

My opinion on this matter may be considered radical, but I do think it should be at least recorded, if not impartially considered.

While this problem can't be solved just by extending time with the hope that the resources will be allocated (no offense to your character, but that promise is made by a lot of people, and it doesn't always work out that way; particularly in environments with ingrained and blind politics where the money flows can change based on pride and/or sheer ignorance), it may be advantageous to treat the root causes.

One of the biggest (and most prominent, though not obviously so) issues is the lack of concurrency with regards to releases. With the default system, having multiple FreeBSD releases side by side (both different versions, and different architectures) is infeasible. This makes the choice more critical, while hindering flexibility. The necessity of long support schedules is one of the symptoms.

The fact that you have to choose, and then to change the choice you must clean up, back up, and create a new environment in order to test on a different release/architecture (release in this context includes the kernel; a chroot is incomplete for testing), has two major effects: it hinders users from being able to selectively test newer releases with their software stack/hardware selection, with no adverse (within reason; obviously bugs like disk corruption will still happen) changes that will prevent them from reverting.

While it may not please the accountants, cleaning up the namespace and allowing safe concurrency of releases will increase the /legitimate/ feasibility of using FreeBSD on a large scale.

Oh, I forgot to mention, this is far from a pipe dream. I have a working environment with this capability, and I use it whenever I am able.

This isn't to say it is the only cause; it is one of many, and I would never claim it was a magic bullet. But it is my opinion that this problem is best solved not by arguing how to work around the symptoms, but by analyzing and solving the parent problems that may not be so obvious.

There's my two cents on the matter.