Re: Possible scheduler (SCHED_ULE) bug?

2009-10-23 Thread Dylan Cochran
On 10/23/09, Jaime Bozza jbo...@mindsites.com wrote:
 I believe I have found a problem with the ULE scheduler, or at least that
 there is a problem, but I'm not sure where to go from here. The system
 locks all processes but doesn't panic, so I have no output to give.

 I was able to duplicate this on three different machines and solved it by
 switching the scheduler to 4BSD.

 Here's the environment:

 FreeBSD 7.2 i386, installed from bootonly ISO, Custom install, minimal, no
 changes other than setting the timezone, changing the root password, and
 turning on sshd (allowing root and password connections).

 Running portsnap (fetch, then extract) to get latest ports tree.

 From ports, make installs of lang/php5 and www/lighttpd, using defaults for
 all ports installed.

 Modified lighttpd.conf for PHP (attached diff), created a short script
 called uploadfile.php (attached).  File was installed at
 /usr/local/www/data/uploadfile.php

 Start lighttpd (lighttpd_enable=YES in rc.conf,
 /usr/local/etc/rc.d/lighttpd start), connect and run script.

 As long as I upload a file smaller than 64K, everything works fine. If I try
 to upload something larger than 64K, the system no longer responds. At the
 console login prompt I can enter a username and password, but nothing
 happens after that. At a logged-in console prompt I can type a single line,
 but once I press Enter, nothing happens.

 No errors get written anywhere - console, logs, etc.

 I'm at a loss as to what to do next. Can anyone give me ideas of what else I
 can try?

Superficially, this seems identical to a deadlock I reported for
7.1-RC1. Would you mind compiling a kernel with these options:

options DDB
options KDB
options SW_WATCHDOG
options DEBUG_VFS_LOCKS


then add the following to /etc/rc.conf:

watchdogd_enable="YES"
watchdogd_flags="-e 'ls -al /etc'"

This should force a watchdog panic when the lockup happens again, which
will drop you into the debugger.
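For illustration only: the reason for "ls -al /etc" is that it touches the filesystem, so if VFS locking deadlocks, the command never completes and watchdogd stops patting the watchdog. Below is a rough user-space model of that check. probe_ok() is a hypothetical helper, not watchdogd's code, and its explicit timeout stands in for the kernel's SW_WATCHDOG timer, which is what actually fires in the setup above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical model of a watchdog probe: run cmd via /bin/sh and
 * report 1 only if it exits with status 0 within timeout_ms.
 * A probe that hangs (e.g. blocked on a vnode lock) is killed and
 * counted as a failure, i.e. "do not pat the watchdog". */
int probe_ok(const char *cmd, long timeout_ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    long waited = 0;
    int status;
    pid_t pid;

    assert(cmd != NULL && timeout_ms > 0);
    pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* exec failed */
    }
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) && WEXITSTATUS(status) == 0;
        if (r < 0)
            return 0;
        if (waited >= timeout_ms) {
            /* Still running: treat as hung, clean up, report failure. */
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            return 0;
        }
        nanosleep(&tick, NULL);
        waited += 10;
    }
}
```

A loop built on this would pat the watchdog only while probe_ok("ls -al /etc", ...) keeps returning 1.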

Please check the backtrace and tell me whether the call stack is the same
as this one (between the "--- interrupt" and "--- syscall" lines):

KDB: stack backtrace:
db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...)
at db_trace_self_wrapper+0x26
kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
Xtimerint() at Xtimerint+0x1f
--- interrupt, eip = 0xc07ff29d, esp = 0xe66e0b48, ebp = 0xe66e0c34 ---
kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1
sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
syscall(e66e0d38) at syscall+0x335
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (393, FreeBSD ELF32, sendfile), eip = 0x282cb0cb, esp =
0xbfbfc7cc, ebp = 0xbfbfe848 ---
KDB: enter: watchdog timeout

You can type 'reboot' to reboot the machine (in my case, panic would
not work, so a useful dump wasn't in the cards)
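If several traces need comparing, the check described above (same frames between the "--- interrupt" and "--- syscall" lines) can be automated. A minimal sketch, assuming the trace has been saved as plain text in the DDB format shown, one frame per line. extract_frames() and same_call_stack() are hypothetical names; only function names are compared, because argument values and addresses differ from run to run.

```c
#include <assert.h>
#include <string.h>

#define MAX_FRAMES 32
#define MAX_NAME   64

/* Copy the function name of every frame between the "--- interrupt"
 * and "--- syscall" marker lines into names[]; return how many. */
int extract_frames(const char *trace, char names[][MAX_NAME], int max)
{
    const char *line = trace;
    int inside = 0, n = 0;

    assert(trace != NULL && names != NULL && max > 0);
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end != NULL ? (size_t)(end - line) : strlen(line);

        if (strncmp(line, "--- interrupt", 13) == 0) {
            inside = 1;
        } else if (strncmp(line, "--- syscall", 11) == 0) {
            inside = 0;
        } else if (inside && n < max) {
            /* A frame line looks like "name(args,...) at name+offset". */
            const char *paren = memchr(line, '(', len);
            size_t nlen = paren != NULL ? (size_t)(paren - line) : 0;

            if (nlen > 0 && nlen < MAX_NAME && memchr(line, ' ', nlen) == NULL) {
                memcpy(names[n], line, nlen);
                names[n][nlen] = '\0';
                n++;
            }
        }
        line = end != NULL ? end + 1 : line + len;
    }
    return n;
}

/* Return 1 if both traces list the same frames, in the same order,
 * between the interrupt and syscall markers; 0 otherwise. */
int same_call_stack(const char *a, const char *b)
{
    char fa[MAX_FRAMES][MAX_NAME], fb[MAX_FRAMES][MAX_NAME];
    int na = extract_frames(a, fa, MAX_FRAMES);
    int nb = extract_frames(b, fb, MAX_FRAMES);

    if (na != nb || na == 0)
        return 0;
    for (int i = 0; i < na; i++)
        if (strcmp(fa[i], fb[i]) != 0)
            return 0;
    return 1;
}
```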
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Mounting / using /dev/ufs/name

2009-01-14 Thread Dylan Cochran
On Wed, Jan 14, 2009 at 1:00 PM, Václav Haisman v.hais...@sh.cvut.cz wrote:
 Hi,
 I tried to mount root slice using the device nodes provided in /dev/ufs
 directory. It works fine for other slices but not for the root slice. If I
 try it I get prompt asking for root slice at boot time. It this not possible
 at all or am I doing something wrong?

You should add vfs.root.mountfrom="ufs:ufs/whatever" to /boot/loader.conf

This will short-circuit the bootloader's attempt to resolve it from
the rootdev:/etc/fstab entry for /, which occasionally is
unable to deal with an fstab containing an otherwise legal ufs label. I
noticed it a few years ago when I moved every machine I had to using
geom_label to find the root device, but I was unable to find the
source of the bug in the code (src/sys/boot/common/boot.c, the
getrootmount routine).
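To illustrate the format involved: the value is "fstype:device", with the device named without its /dev/ prefix, as in the ufs:ufs/whatever example above. A hypothetical helper that builds that string from an fstab root entry might look like the following. It is not the loader's getrootmount() code, and it assumes the usual fstab field order (device, mount point, type).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if fstab_line describes "/", write the
 * "fstype:device" string for vfs.root.mountfrom into out.
 * Returns 1 on success, 0 if the line is not the root entry
 * or the result does not fit. */
int root_mountfrom(const char *fstab_line, char *out, size_t outlen)
{
    char dev[128], mnt[128], fstype[32];
    const char *name;
    int n;

    assert(fstab_line != NULL && out != NULL && outlen > 0);
    if (fstab_line[0] == '#')
        return 0; /* comment line */
    if (sscanf(fstab_line, "%127s %127s %31s", dev, mnt, fstype) != 3)
        return 0;
    if (strcmp(mnt, "/") != 0)
        return 0;
    /* Match the "ufs:ufs/whatever" form: drop a leading /dev/. */
    name = strncmp(dev, "/dev/", 5) == 0 ? dev + 5 : dev;
    n = snprintf(out, outlen, "%s:%s", fstype, name);
    return n >= 0 && (size_t)n < outlen;
}
```

For an fstab entry "/dev/ufs/rootfs / ufs rw 1 1" this yields "ufs:ufs/rootfs", which is the value one would put in /boot/loader.conf.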


Re: Big problems with 7.1 locking up :-(

2009-01-11 Thread Dylan Cochran
On Sun, Jan 11, 2009 at 11:27 AM, Pete French
petefre...@ticketswitch.com wrote:
 My kernconf is below, try building the kernel, and send an email
 containing the backtrace from any process that has blocked (in my

 Well, I haven't managed to get a backtrace, but immediately upon
 booting, the system halts with the following:

http://www.twisted.org.uk/~pete/71_lor1.jpg

Not Found


Re: Hard lock on 7.1-RC1

2008-12-22 Thread Dylan Cochran
On Sun, Dec 21, 2008 at 3:05 PM, Dylan Cochran a134q...@gmail.com wrote:
 I'm hitting a strange lockup on 7.1-RC1, where some socket operations
 seem to stall, as well as basic file operations. The only reproducible
 way I have of triggering it is by doing multiple inserts into
 phpmyadmin on lighttpd+fastcgi php5 + mysql51-server, though this
 isn't the only thing which triggers it, just the only one which is
 semi-reliable. I've also reproduced this on another machine, set up
 specifically to rule out any machine-specific problems (as they have
 different drive controllers, one uses gjournal, etc).

 I initially built a kernel with SW_WATCHDOG, and attempted to use
 watchdogd and DDB to get an output from show locks, but the watchdogd
 hasn't panicked the machine, so at least devfs is still unlocked; I'm
 not able to get physical access to the machine until Monday.

 As far as I can tell, the bug was introduced between 7.1-BETA2 and 7.1-RC1.

 Any suggestions on what I can test for tomorrow?

I updated the kernel source to RELENG_7_1 as of a few hours ago, and
built with DEBUG_VFS_LOCKS as well.

Luckily, the backtrace shows the operation the process was in when the
watchdog fired, which seems to be kern_sendfile(). I'm no expert at kernel
debugging, so any assistance on tracking this down further would be
greatly appreciated.

And, as promised, here is the output of script after the watchdog-induced panic:

Script started on Tue Dec 23 01:05:56 2008
# cu -l cua01
interrupt                total
irq4: sio0                 623
irq6: fdc0                   1
irq17: fwohci0               3
irq18: rl0 uhci2++       60718
irq23: rl1 ehci0           206
cpu0: timer             514596
Total                   576147
KDB: stack backtrace:
db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...)
at db_trace_self_wrapper+0x26
kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
Xtimerint() at Xtimerint+0x1f
--- interrupt, eip = 0xc07ff29d, esp = 0xe66e0b48, ebp = 0xe66e0c34 ---
kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1
sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
syscall(e66e0d38) at syscall+0x335
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (393, FreeBSD ELF32, sendfile), eip = 0x282cb0cb, esp =
0xbfbfc7cc, ebp = 0xbfbfe848 ---
KDB: enter: watchdog timeout
[thread pid 1288 tid 100060 ]
Stopped at      kdb_enter_why+0x3a:     movl    $0,kdb_why
db> show lock
db> p show all proc
  pid  ppid  pgrp   uid  state   wmesg    wchan      cmd
 1600   902   902     0  R                           watchdogd
 1470  1469  1470     0  S+      ttyin    0xc418fc10 csh
 1469     1  1469     0  Ss+     wait     0xc46032b8 login
 1468     1  1468     0  Ss+     ttyin    0xc41ac810 getty
 1427     1  1427     0  Ss      nanslp   0xc0c7dc44 cron
 1420     1  1420     0  Ss      select   0xc0c88eb8 sshd
 1419  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1418  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1417  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1416  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1415  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1414  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1413  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1412  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1411  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1410  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1409  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1408  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1407  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1406  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1405  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1404  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1403  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1402  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1401  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1400  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1399  1300  1300    80  RJ                          php-cgi
 1398  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1397  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1396  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1395  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1394  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1393  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1392  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1391  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1390  1300  1300    80  SJ      accept   0xc445a6ba php-cgi

Hard lock on 7.1-RC1

2008-12-21 Thread Dylan Cochran
I'm hitting a strange lockup on 7.1-RC1, where some socket operations
seem to stall, as well as basic file operations. The only reproducible
way I have of triggering it is by doing multiple inserts into
phpmyadmin on lighttpd+fastcgi php5 + mysql51-server, though this
isn't the only thing which triggers it, just the only one which is
semi-reliable. I've also reproduced this on another machine, set up
specifically to rule out any machine-specific problems (as they have
different drive controllers, one uses gjournal, etc).

I initially built a kernel with SW_WATCHDOG, and attempted to use
watchdogd and DDB to get an output from show locks, but the watchdogd
hasn't panicked the machine, so at least devfs is still unlocked; I'm
not able to get physical access to the machine until Monday.

As far as I can tell, the bug was introduced between 7.1-BETA2 and 7.1-RC1.

Any suggestions on what I can test for tomorrow?


Re: Benefits of multiple release branches (Was: Re: Upcoming Releases Schedule...)

2008-09-22 Thread Dylan Cochran
On Mon, Sep 22, 2008 at 5:37 PM, Doug Barton [EMAIL PROTECTED] wrote:
 Dylan Cochran wrote:
 One of the biggest (and most prominent, though not obviously so)
 issues is the lack of concurrency with regard to releases. With the
 default system, having multiple FreeBSD releases side by side (both
 different versions, and different architectures) is infeasible. This
 makes the choice more critical, while hindering flexibility. The
 necessity of long support schedules is one of the symptoms.

 While on the one hand I can understand the users' frustration on this
 point, IMO having at least 2 release branches is necessary. We are
 trying to walk the fine line between pleasing those who want new
 features (including new drivers), better performance, etc. that a newer
 release branch offers (in this case 7.x) and those that want long-term
 API stability, and other forms of stability that an established
 release offers. The only practical way to accomplish both of those goals
 is with 2 release branches.

I agree completely. My point is that, as of right now, there are a large
number of file collisions that prevent an install of 6.3-RELEASE and
7.0-RELEASE from coexisting on the same drive, with a trivial way to
switch between the two if need be. The same goes for i386 and amd64.

This is an artificial construct, and doesn't /have/ to continue
existing. Removing it would, IMO, be far more useful than expecting a
large amount of time to be spent on dead-end branches of code that are
well past their expiration date and are beginning to suffer from
massive bitrot.

At the very least, it will make the default system more robust by
moving the majority of the upgrade procedure away from /replacing/
files and toward creating new files coupled with an atomic activation
'switch'.
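A minimal sketch of such an activation switch, assuming each release is unpacked into its own directory and the active one is selected through a symlink. The layout, the /releases paths and activate_release() are hypothetical; this is not how a default FreeBSD install is arranged, nor necessarily how my own environment does it. rename(2) replaces the destination atomically, so anything resolving the link sees either the old release or the new one, never a half-updated state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: point link_path at target by creating a temporary
 * symlink next to it and rename()-ing it over the old link.
 * Returns 0 on success, -1 on failure (old link left untouched). */
int activate_release(const char *link_path, const char *target)
{
    char tmp[1024];
    size_t len;

    assert(link_path != NULL && target != NULL);
    len = strlen(link_path);
    if (len + sizeof ".new" > sizeof tmp)
        return -1;
    memcpy(tmp, link_path, len);
    memcpy(tmp + len, ".new", sizeof ".new"); /* copies the NUL too */

    (void)unlink(tmp); /* leftover from an interrupted switch */
    if (symlink(target, tmp) != 0)
        return -1;
    /* The atomic step: the old link is replaced in one operation. */
    if (rename(tmp, link_path) != 0) {
        (void)unlink(tmp);
        return -1;
    }
    return 0;
}
```

Switching what the loader boots would need an equivalent switch on the boot side, which this sketch does not cover.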

Please don't misinterpret my ideas as supporting his viewpoint; I am
merely pointing out a perspective on the problem which hasn't been
mentioned.


Re: Upcoming Releases Schedule...

2008-09-18 Thread Dylan Cochran
On Thu, Sep 18, 2008 at 12:25 AM, Jo Rhett [EMAIL PROTECTED] wrote:
 I understand what you mean, but the statement is blatantly false as stated.
  Anyone selling software to the US Government *must* specify (or meet,
 depending) a minimum support period, and must also specify a cost the agency
 can pay to extend the support period.

 Not relevant to FreeBSD -- just qualifying the statement as it stands.  For
 the obvious comparison, Solaris versions have well-published release and
 support periods, usually upwards of 8 years.  Obviously they have more
 resources to do this, I'm just pointing out that the statement you made is
 incorrect as stated.

 and I'm not sure you could do it differently -- no one plans to ship a
 lemon, but once in a while you discover that things don't go as planned.


 I am amazed at the preposterously large elephant in the room that none of
 you are willing to address.  Watching each of you dance around it would be
 terribly funny if it didn't affect my job so badly.  (and if I wasn't going
 to have to bail on FreeBSD and go to some crap form of Linux because the
 FreeBSD developers appear to be unwilling to consider the idea of getting
 more help)

My opinion on this matter may be considered radical, but I do think it
should at least be recorded, if not impartially considered. While this
problem can't be solved just by extending support periods in the hope that
the resources will be allocated (no offense to your character, but that
promise is made by a lot of people, and it doesn't always work out
that way, particularly in environments with ingrained and blind
politics where the money flows can change based on pride and/or sheer
ignorance), it may be advantageous to treat the root causes.

One of the biggest (and most prominent, though not obviously so)
issues is the lack of concurrency with regard to releases. With the
default system, having multiple FreeBSD releases side by side (both
different versions and different architectures) is infeasible. This
makes the choice more critical, while hindering flexibility. The
necessity of long support schedules is one of the symptoms.

The fact that you have to choose, and that changing the choice means you
must clean up, back up, and create a new environment in order to test
on a different release/architecture (release in this context includes
the kernel; a chroot is incomplete for testing), has two major effects: it
hinders users from selectively testing newer releases with their own
software stack and hardware, in a way that makes no adverse (within
reason; obviously bugs like disk corruption will still happen) changes
that would prevent them from reverting.

While it may not please the accountants, cleaning up the namespace and
allowing safe concurrency of releases will increase the /legitimate/
feasibility of using FreeBSD on a large scale. Oh, I forgot to
mention, this is far from a pipe dream. I have a working environment
with this capability, and I use it whenever I am able.

This isn't to say it is the only cause; it is one of many, and I would
never even claim it was a magic bullet. But it is my opinion that this
problem is best solved not by arguing over how to work around the
symptoms, but by analyzing and solving the parent problems that may not
be so obvious.

There's my two cents on the matter.