Re: FreeBSD 9 recompile ports
On Fri, Jan 13, 2012 at 04:11:22PM +0200, Andriy Gapon wrote: on 13/01/2012 14:57 George Kontostanos said the following: Still the question remains regarding COMPAT_FREEBSD8 and how this affects ports/misc/compat8x/ Looks like all the previous hints have not been clear enough. There is no direct relation between COMPAT_FREEBSD8 and misc/compat8x. COMPAT_FREEBSDX options are only needed when, going from release X to release X+1, there was a change to an existing system call at the kernel-userland boundary. A side note: kernel options affect only what's in the kernel, quite obviously. misc/compatXx contains versions of shared libraries from release X that are no longer present in X+1. An additional twist is that not every change at the kernel/usermode boundary is covered with backward-compatibility shims. A recent example is the CAM ABI change, which makes libcam.so.5 from compat8x useless.
Re: Mystery panic, FreeBSD 7.2-PRE
On Thu, Dec 22, 2011 at 04:04:48PM -0700, Charlie Martin wrote: We've got another mystery panic in 7.2-PRE. Upgrading is not an option; however, if this is familiar to anyone, backporting a patch would be. The stack trace is:
db_trace_self_wrapper() at 0x8019120a = db_trace_self_wrapper+0x2a
panic() at 0x80308797 = panic+0x187
devfs_populate_loop() at 0x802a45c8 = devfs_populate_loop+0x548
devfs_populate() at 0x802a46ab = devfs_populate+0x3b
devfs_lookup() at 0x802a7824 = devfs_lookup+0x264
[24165][irq261: plx0] DEBUG (hasc_sv_rcv_cb): rcvd hrtbt ts 24051, 7/9, rc 0
VOP_LOOKUP_APV() at 0x804d5995 = VOP_LOOKUP_APV+0x95
lookup() at 0x80384a3e = lookup+0x4ce
namei() at 0x80385768 = namei+0x2c8
vn_open_cred() at 0x8039b283 = vn_open_cred+0x1b3
kern_open() at 0x8039a4a0 = kern_open+0x110
syscall() at 0x804b0e3c = syscall+0x1ec
Xfast_syscall() at 0x80494ecb = Xfast_syscall+0xab
--- syscall (5, FreeBSD ELF64, open), rip = 0x800e022fc, rsp = 0x7fbfa128, rbp = 0x801002240 ---
KDB: enter: panic
It is impossible to diagnose the real cause of the panic from the backtrace above. 99.99% of the issues causing that backtrace are problems in specific drivers which failed to dev_ref() the newly created cdev, e.g. in the clone handler. My interest in the issue is limited to the slight possibility that the bug is not yet fixed in HEAD or 9/8. The usual suspects are ttys, which were completely rototilled in 8.
Re: directory listing hangs in ufs state
On Wed, Dec 21, 2011 at 09:03:02PM +0400, Andrey Zonov wrote: On 15.12.2011 17:01, Kostik Belousov wrote: On Thu, Dec 15, 2011 at 03:51:02PM +0400, Andrey Zonov wrote: On Thu, Dec 15, 2011 at 12:42 AM, Jeremy Chadwick free...@jdc.parodius.com wrote: On Wed, Dec 14, 2011 at 11:47:10PM +0400, Andrey Zonov wrote: On 14.12.2011 22:22, Jeremy Chadwick wrote: On Wed, Dec 14, 2011 at 10:11:47PM +0400, Andrey Zonov wrote: Hi Jeremy, This is not a hardware problem, I've already checked that. I also ran fsck today and got no errors. After some more exploration of how mongodb works, I found that when the listing hangs, one of the mongodb threads is in biowr state for a long time. It periodically calls msync(MS_SYNC), according to the ktrace output. If I remove the msync() calls from mongodb, how often will the OS sync the data? -- Andrey Zonov On 14.12.2011 2:15, Jeremy Chadwick wrote: On Wed, Dec 14, 2011 at 01:11:19AM +0400, Andrey Zonov wrote: Have you any ideas what is going on? or how to catch the problem? Assuming this isn't a file on the root filesystem, try booting the machine in single-user mode and using fsck -f on the filesystem in question. Can you verify there are no problems with the disk this file lives on as well (smartctl -a /dev/disk)? I'm doubting this is the problem, but thought I'd mention it. I have no real answer, I'm sorry. msync(2) indicates it's effectively deprecated (see BUGS). It looks like this is effectively an mmap-version of fsync(2). I replaced msync(2) with fsync(2). Unfortunately, from the man pages it is not obvious that I can do this. Anyway, thanks. Sorry, that wasn't what I was implying. Let me try to explain differently. msync(2) looks, to me, like an mmap-specific version of fsync(2). Based on the man page, it seems that with msync() you can effectively guarantee flushing of certain pages within an mmap()'d region to disk. fsync() would cause **all** buffers/internal pages to be flushed to disk.
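A minimal, self-contained sketch of the two flushing styles being compared, assuming a POSIX system: it writes through a MAP_SHARED mapping and then flushes either with msync(MS_SYNC) over the whole mapping (as mongodb is said to do) or with fsync() on the descriptor. make_scratch_file() and write_via_mapping() are hypothetical helpers, not mongodb code, and the FreeBSD-specific MAP_NOSYNC flag is deliberately left out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file that disappears on close(). */
static int make_scratch_file(void)
{
    char path[] = "/tmp/msync-demo-XXXXXX";
    int fd = mkstemp(path);

    if (fd != -1)
        unlink(path);   /* the open descriptor keeps the file alive */
    return fd;
}

/*
 * Hypothetical helper: store msg at the start of a len-byte MAP_SHARED
 * mapping of fd, then flush it either with msync(MS_SYNC) over the whole
 * mapping (use_fsync == 0) or with fsync() on the descriptor.
 * Returns 0 on success, -1 on error.
 */
static int write_via_mapping(int fd, size_t len, const char *msg, int use_fsync)
{
    void *p;
    int rc;

    assert(strlen(msg) <= len);
    if (ftruncate(fd, (off_t)len) == -1)
        return -1;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, msg, strlen(msg));

    /* msync() covers only the pages in [p, p + len); fsync() covers all
     * dirty data of the file, however it was dirtied. */
    rc = use_fsync ? fsync(fd) : msync(p, len, MS_SYNC);

    if (munmap(p, len) == -1)
        rc = -1;
    return rc;
}
```

Functionally the two paths are interchangeable for a whole-file flush, which fits the later remark that msync() over the whole file should make no difference; what the thread goes on to measure is how differently the kernel turns each flush into disk writes, which a functional check like this cannot show.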
One would need to look at the mongodb code to find out what it's actually doing with msync(). That is to say, if it's doing something like this (I probably have the semantics wrong -- I've never spent much time with mmap()):
fd = open("/some/file", O_RDWR);
ptr = mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
ret = msync(ptr, 65536, MS_SYNC);
/* or alternatively, this: ret = msync(ptr, 0, MS_SYNC); */
Then this, to me, would be mostly the equivalent to:
fd = open("/some/file", O_RDWR);
ret = fsync(fd);
Otherwise, if it's calling msync() only on an address/location within the region ptr points to, then that may be more efficient (fewer pages to flush). They call msync() for the whole file. So, there will not be any difference. The mmap() arguments -- specifically flags (see man page) -- also play a role here. The one that catches my attention is MAP_NOSYNC. So you may need to look at the mongodb code to figure out what its mmap() call is. One might wonder why they don't just use open() with O_SYNC. I imagine that has to do with, again, performance; possibly they don't want all I/O synchronous, and would rather flush certain pages in the mmap'd region to disk as needed. I see the legitimacy in that approach (vs. just using O_SYNC). There's really no easy way for me to tell you which is more efficient, better, blah blah without spending a lot of time with a benchmarking program that tests all of this, *plus* an entire system (world) built with profiling. I ran mongodb for two hours with fsync() and got the following:
STARTED                  INBLK   OUBLK MAJFLT  MINFLT
Thu Dec 15 10:34:52 2011     3  192744    314 3080182
This is the output of `ps -o lstart,inblock,oublock,majflt,minflt -U mongodb'. Then I ran it with the default msync():
STARTED                  INBLK   OUBLK MAJFLT  MINFLT
Thu Dec 15 12:34:53 2011     0 7241555     79 5401945
There are also two graphs of disk busyness [1] [2]. The difference is significant: 37 times! That is what I expected to get.
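The "37 times" figure can be checked against the OUBLK column. This sketch reads the flattened ps(1) rows as INBLK 3, OUBLK 192744, MAJFLT 314, MINFLT 3080182 for the fsync() run and INBLK 0, OUBLK 7241555, MAJFLT 79, MINFLT 5401945 for the msync() run; that column split is an editorial reconstruction, chosen because it reproduces the reported factor. It also multiplies a 4 KiB page by the ratio, the same back-of-the-envelope step used to estimate the size of one write. oublk_ratio() and implied_write_size() are hypothetical names.

```c
#include <assert.h>

/* Ratio between the OUBLK counts of two runs of the same length. */
static double oublk_ratio(long msync_oublk, long fsync_oublk)
{
    assert(fsync_oublk > 0);
    return (double)msync_oublk / (double)fsync_oublk;
}

/* Assumption: if the msync run issues one page-sized write where the fsync
 * run issues one larger write, that larger write is about page_size * ratio
 * bytes.  Only whole pages are counted. */
static long implied_write_size(long page_size, double ratio)
{
    return page_size * (long)ratio;
}
```

With the numbers above the ratio is about 37.6, and 4096 * 37 = 151552 bytes (148 KiB), matching the "about 150K" estimate. The sketch only checks the arithmetic; it does not by itself show that each msync() write is a single page.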
In the comments for vm_object_page_clean() I found this: * When stuffing pages asynchronously, allow clustering. XXX we need a * synchronous clustering mode implementation. To me this means that msync(MS_SYNC) flushes each page to disk in its own I/O transaction. If we multiply 4K by 37 we get 150K. In my experience, this number is the size of a single transaction. +alc@, kib@ Am I right? Is there any plan to implement this? The current buffer clustering code can only do async writes. In fact, I am not quite sure what would constitute sync clustering, because the ability to delay the write is important to be able to cluster at all. Also, I am not sure that the lack of clustering is the biggest problem. IMO, the fact that each write is sync is the first problem there. It would be quite a lot of work to add the tracking of the issued writes
Re: fsck_ufs out of swapspace
On Tue, Dec 20, 2011 at 09:51:43AM +1100, Peter Jeremy wrote: On 2011-Dec-19 22:27:49 +0100, Michiel Boland bolan...@xs4all.nl wrote: Problem solved - it was indeed an endian thing. The problem is that fsck uses a real_dev_bsize variable that is declared long, but the DIOCGSECTORSIZE ioctl takes a u_int argument. To be accurate, this isn't an endian problem; it's a general problem of passing a pointer to an incorrectly sized object. The bug is masked on amd64 and ia64 because real_dev_bsize is statically allocated and therefore initialised to zero. This means the failure to assign the top 32 bits in the ioctl doesn't affect the final result. A PR has been submitted. sparc64/163460 for the record. Thank you for tracking that down. The easier fix is to change the type of real_dev_bsize. I used long only because the other variables keeping the sector size are long, but there is not much reason to use long there. Peter, would you please retest +J on non-512 byte sectors, with the patch attached?
diff --git a/sbin/fsck_ffs/fsck.h b/sbin/fsck_ffs/fsck.h
index 8091d0f..4e30a7e 100644
--- a/sbin/fsck_ffs/fsck.h
+++ b/sbin/fsck_ffs/fsck.h
@@ -268,7 +268,7 @@ char	snapname[BUFSIZ];	/* when doing snapshots, the name of the file */
 char	*cdevname;		/* name of device being checked */
 long	dev_bsize;		/* computed value of DEV_BSIZE */
 long	secsize;		/* actual disk sector size */
-long	real_dev_bsize;
+u_int	real_dev_bsize;		/* actual disk sector size, not overriden */
 char	nflag;			/* assume a no response */
 char	yflag;			/* assume a yes response */
 int	bkgrdflag;		/* use a snapshot to run on an active system */
diff --git a/sbin/fsck_ffs/suj.c b/sbin/fsck_ffs/suj.c
index ec8b5ab..b784519 100644
--- a/sbin/fsck_ffs/suj.c
+++ b/sbin/fsck_ffs/suj.c
@@ -206,7 +206,7 @@ opendisk(const char *devnam)
 	    &real_dev_bsize) == -1)
 		real_dev_bsize = secsize;
 	if (debug)
-		printf("dev_bsize %ld\n", real_dev_bsize);
+		printf("dev_bsize %u\n", real_dev_bsize);
 }
 
 /*
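A minimal userland sketch of the failure mode described above, assuming a 32-bit u_int. fake_get_sector_size() is a hypothetical stand-in for the kernel side of DIOCGSECTORSIZE: it only knows that its argument is a u_int, so it writes exactly sizeof(unsigned int) bytes through whatever pointer it is handed. No real ioctl is issued.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the kernel side of DIOCGSECTORSIZE: it writes
 * sizeof(unsigned int) bytes, whatever the caller's object really is. */
static void fake_get_sector_size(void *arg, unsigned int size)
{
    assert(arg != NULL);
    memcpy(arg, &size, sizeof(size));
}

/* The old fsck declaration: a long, only part of which gets written. */
static long sector_size_into_long(long initial)
{
    long v = initial;

    fake_get_sector_size(&v, 4096);
    return v;
}

/* The patched declaration: the object has exactly the ioctl's type. */
static unsigned int sector_size_into_uint(void)
{
    unsigned int v = 0;

    fake_get_sector_size(&v, 4096);
    return v;
}
```

On a 64-bit little-endian machine sector_size_into_long(0) still returns 4096, which is how the zero-initialised static hid the bug on amd64; with any other starting value, or on a big-endian 64-bit machine such as sparc64, the long comes back wrong. sector_size_into_uint() is correct everywhere, which is what changing the type in the patch achieves.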
Re: directory listing hangs in ufs state
On Thu, Dec 15, 2011 at 03:51:02PM +0400, Andrey Zonov wrote: On Thu, Dec 15, 2011 at 12:42 AM, Jeremy Chadwick free...@jdc.parodius.com wrote: On Wed, Dec 14, 2011 at 11:47:10PM +0400, Andrey Zonov wrote: On 14.12.2011 22:22, Jeremy Chadwick wrote: On Wed, Dec 14, 2011 at 10:11:47PM +0400, Andrey Zonov wrote: Hi Jeremy, This is not a hardware problem, I've already checked that. I also ran fsck today and got no errors. After some more exploration of how mongodb works, I found that when the listing hangs, one of the mongodb threads is in biowr state for a long time. It periodically calls msync(MS_SYNC), according to the ktrace output. If I remove the msync() calls from mongodb, how often will the OS sync the data? -- Andrey Zonov On 14.12.2011 2:15, Jeremy Chadwick wrote: On Wed, Dec 14, 2011 at 01:11:19AM +0400, Andrey Zonov wrote: Have you any ideas what is going on? or how to catch the problem? Assuming this isn't a file on the root filesystem, try booting the machine in single-user mode and using fsck -f on the filesystem in question. Can you verify there are no problems with the disk this file lives on as well (smartctl -a /dev/disk)? I'm doubting this is the problem, but thought I'd mention it. I have no real answer, I'm sorry. msync(2) indicates it's effectively deprecated (see BUGS). It looks like this is effectively an mmap-version of fsync(2). I replaced msync(2) with fsync(2). Unfortunately, from the man pages it is not obvious that I can do this. Anyway, thanks. Sorry, that wasn't what I was implying. Let me try to explain differently. msync(2) looks, to me, like an mmap-specific version of fsync(2). Based on the man page, it seems that with msync() you can effectively guarantee flushing of certain pages within an mmap()'d region to disk. fsync() would cause **all** buffers/internal pages to be flushed to disk. One would need to look at the mongodb code to find out what it's actually doing with msync().
That is to say, if it's doing something like this (I probably have the semantics wrong -- I've never spent much time with mmap()):
fd = open("/some/file", O_RDWR);
ptr = mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
ret = msync(ptr, 65536, MS_SYNC);
/* or alternatively, this: ret = msync(ptr, 0, MS_SYNC); */
Then this, to me, would be mostly the equivalent to:
fd = open("/some/file", O_RDWR);
ret = fsync(fd);
Otherwise, if it's calling msync() only on an address/location within the region ptr points to, then that may be more efficient (fewer pages to flush). They call msync() for the whole file. So, there will not be any difference. The mmap() arguments -- specifically flags (see man page) -- also play a role here. The one that catches my attention is MAP_NOSYNC. So you may need to look at the mongodb code to figure out what its mmap() call is. One might wonder why they don't just use open() with O_SYNC. I imagine that has to do with, again, performance; possibly they don't want all I/O synchronous, and would rather flush certain pages in the mmap'd region to disk as needed. I see the legitimacy in that approach (vs. just using O_SYNC). There's really no easy way for me to tell you which is more efficient, better, blah blah without spending a lot of time with a benchmarking program that tests all of this, *plus* an entire system (world) built with profiling. I ran mongodb for two hours with fsync() and got the following:
STARTED                  INBLK   OUBLK MAJFLT  MINFLT
Thu Dec 15 10:34:52 2011     3  192744    314 3080182
This is the output of `ps -o lstart,inblock,oublock,majflt,minflt -U mongodb'. Then I ran it with the default msync():
STARTED                  INBLK   OUBLK MAJFLT  MINFLT
Thu Dec 15 12:34:53 2011     0 7241555     79 5401945
There are also two graphs of disk busyness [1] [2]. The difference is significant: 37 times! That is what I expected to get. In the comments for vm_object_page_clean() I found this: * When stuffing pages asynchronously, allow clustering.
XXX we need a * synchronous clustering mode implementation. To me this means that msync(MS_SYNC) flushes each page to disk in its own I/O transaction. If we multiply 4K by 37 we get 150K. In my experience, this number is the size of a single transaction. +alc@, kib@ Am I right? Is there any plan to implement this? The current buffer clustering code can only do async writes. In fact, I am not quite sure what would constitute sync clustering, because the ability to delay the write is important to be able to cluster at all. Also, I am not sure that the lack of clustering is the biggest problem. IMO, the fact that each write is sync is the first problem there. It would be quite a lot of work to add the tracking of the issued writes to the
Re: tmpfs deadlock on stable/9
On Wed, Dec 07, 2011 at 01:57:08PM +0400, Dmitry Morozovsky wrote: Dear colleagues, I have a ports tinderbox running on stable/9-amd64, with working directories on tmpfs. I have had two consecutive tmpfs deadlocks like this:
root@beaver:/usr/local/tb/scripts# ps t2
PID TT STAT TIME COMMAND
2337 2 Is 0:00.04 /bin/tcsh
3079 2 I 0:00.01 sudo -sE
3260 2 I 0:00.02 /bin/tcsh
20309 2 I+ 0:00.06 /bin/sh ./tc tinderbuild -nullfs -norebuild -b 9-i386-RiNet
27035 2 S+ 0:00.13 make PACKAGES=/usr/local/tb/packages/9-i386-RiNet -k -j1 all
46470 2 I+ 0:00.00 sh -ev
46471 2 I+ 0:00.01 /bin/sh /usr/local/tb/scripts/lib/portbuild 9-i386-RiNet 9-i386 RiNet -nullfs gsm-1.0.13.tbz /usr/ports/audio/gsm
46677 2 I+ 0:00.00 /bin/sh /buildscript /usr/ports/audio/gsm 2
46766 2 I+ 0:00.00 /pnohang 7200 /tmp/make.log4 gsm-1.0.13 make build
46767 2 I+ 0:00.02 make build
46768 2 I+ 0:00.00 /pnohang 7200 /tmp/make.log4 gsm-1.0.13 make build
46789 2 I+ 0:00.00 [sh]
46790 2 D+ 0:00.01 make -f Makefile -j4 all
46918 2 I+ 0:00.00 sh -ev
46926 2 I+ 0:00.00 sh -ev
46928 2 Z+ 0:00.09 <defunct>
46938 2 D+ 0:00.00 mv gsm_create.o ./src/gsm_create.o
46940 2 D+ 0:00.00 mv gsm_print.o ./src/gsm_print.o
(this is a parallel build; the last 2 rm's are deadlocked on tmpfs) What kind of additional info should I send? I have debugging turned on in the kernel, if it's needed. http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html No, I do not promise to look into it.
Re: tmpfs deadlock on stable/9
On Wed, Dec 07, 2011 at 09:59:32PM +0400, Dmitry Morozovsky wrote: On Wed, 7 Dec 2011, Kostik Belousov wrote: I have a ports tinderbox running on stable/9-amd64, with working directories on tmpfs. I have had two consecutive tmpfs deadlocks like this:
root@beaver:/usr/local/tb/scripts# ps t2
PID TT STAT TIME COMMAND
2337 2 Is 0:00.04 /bin/tcsh
3079 2 I 0:00.01 sudo -sE
3260 2 I 0:00.02 /bin/tcsh
20309 2 I+ 0:00.06 /bin/sh ./tc tinderbuild -nullfs -norebuild -b 9-i386-RiNet
27035 2 S+ 0:00.13 make PACKAGES=/usr/local/tb/packages/9-i386-RiNet -k -j1 all
46470 2 I+ 0:00.00 sh -ev
46471 2 I+ 0:00.01 /bin/sh /usr/local/tb/scripts/lib/portbuild 9-i386-RiNet 9-i386 RiNet -nullfs gsm-1.0.13.tbz /usr/ports/audio/gsm
46677 2 I+ 0:00.00 /bin/sh /buildscript /usr/ports/audio/gsm 2
46766 2 I+ 0:00.00 /pnohang 7200 /tmp/make.log4 gsm-1.0.13 make build
46767 2 I+ 0:00.02 make build
46768 2 I+ 0:00.00 /pnohang 7200 /tmp/make.log4 gsm-1.0.13 make build
46789 2 I+ 0:00.00 [sh]
46790 2 D+ 0:00.01 make -f Makefile -j4 all
46918 2 I+ 0:00.00 sh -ev
46926 2 I+ 0:00.00 sh -ev
46928 2 Z+ 0:00.09 <defunct>
46938 2 D+ 0:00.00 mv gsm_create.o ./src/gsm_create.o
46940 2 D+ 0:00.00 mv gsm_print.o ./src/gsm_print.o
(this is a parallel build; the last 2 rm's are deadlocked on tmpfs) What kind of additional info should I send? I have debugging turned on in the kernel, if it's needed. http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html No, I do not promise to look into it. It is available at http://bsd.woozle.net/tmpfs-lock-20111207.txt (~260k) BTW, at least some of the debugger commands referenced (show locks, show alllocks) no longer exist. This means that you do not have WITNESS in your kernel. Look once more at the reference I pointed you to.
Re: Something missing in truss
On Sat, Dec 03, 2011 at 01:54:58PM -0600, Dan Nelson wrote: In the last episode (Dec 02), Eivind Evensen said: Does anybody else see this or know why? The machine here is running:
uname -a
FreeBSD elg.hjerdalen.lokalnett 8.2-STABLE FreeBSD 8.2-STABLE #36: Wed Nov 30 22:03:07 CET 2011 rumrunner@elg.hjerdalen.lokalnett:/usr/obj/usr/src/sys/RUM amd64
While trying to weed out some firefox problems, I've noticed that truss doesn't recognise certain syscalls:
getpid() = 1519 (0x5ef)
clock_gettime(4,{48496.335142903 }) = 0 (0x0)
kevent(20,{0x23,EVFILT_READ,EV_ADD,0,0x0,0x809ec9d80},1,{0x15,EVFILT_READ,0x0,0,0x1,0x809ec9e80},64,0x0) = 1 (0x1)
clock_gettime(4,{48496.335293202 }) = 0 (0x0)
read(21,"\0",1) = 1 (0x1)
clock_gettime(4,{48496.335382599 }) = 0 (0x0)
umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
-- UNKNOWN SYSCALL -14704864 --
syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 (0x1c6)
umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
-- UNKNOWN SYSCALL -14704864 --
syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 (0x1c6)
umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
-- UNKNOWN SYSCALL -14704864 --
syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 (0x1c6)
umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
-- UNKNOWN SYSCALL -14704864 --
syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 (0x1c6)
umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
-- UNKNOWN SYSCALL -14704864 --
syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 (0x1c6)
Two problems: truss gets confused when you attach to a process that's currently executing a syscall, and it gets even more confused when you have a threaded process waiting in many syscalls at once. The following patch fixes problem #1, but problem #2 involves keeping more per-thread state and ends up touching a lot of the truss code.
See http://www.evoy.net/FreeBSD/truss.diff for one solution (and more syscall decodes).
Index: setup.c
===================================================================
--- setup.c	(revision 228242)
+++ setup.c	(working copy)
@@ -202,8 +202,10 @@
 	find_thread(info, lwpinfo.pl_lwpid);
 	switch(WSTOPSIG(waitval)) {
 	case SIGTRAP:
-		info->pr_why = info->curthread->in_syscall?S_SCX:S_SCE;
-		info->curthread->in_syscall = 1 - info->curthread->in_syscall;
+		if ((lwpinfo.pl_flags & (PL_FLAG_SCE|PL_FLAG_SCX)) == 0)
+			err(1,"pl_flags=%x contains neither PL_FLAG_SCE or PL_FLAG_SCX", lwpinfo.pl_flags);
+		info->pr_why = (lwpinfo.pl_flags & PL_FLAG_SCE) ? S_SCE:S_SCX;
+		info->curthread->in_syscall = (info->pr_why == S_SCE) ? 1:0;
 		break;
 	default:
 		info->pr_why = S_SIG;
I started a similar but bigger patch to handle syscall entry and exit using explicit kernel hints. The patch is bigger because it also aims to handle execve(2)-like syscalls, to properly change the ABI decoder, and forks, to attach to the children in a race-free manner. Unfortunately, it is stalled. I just committed a similar change based on your patch, adding your assertion for the case when neither PL_FLAG_SCE nor PL_FLAG_SCX is provided. I think that assertion is in fact not quite right, and the code should fall through to the default case in the switch. The reason is that SIGTRAP may be sent as a normal signal. But this change is more controversial, and the patch should be an improvement over the current situation. Also, I should note that the patch cannot be merged even to stable/9, because MIPS and ARM still do not properly support the PL_FLAG_XXX flags. I hope to handle the merges after 9.0 is released.
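A minimal sketch of the stop classification argued for above: a SIGTRAP stop that carries neither the syscall-entry nor the syscall-exit flag falls through to the ordinary signal case instead of aborting. DEMO_FLAG_SCE, DEMO_FLAG_SCX, enum stop_reason and classify_stop() are hypothetical stand-ins; real truss code would test PL_FLAG_SCE and PL_FLAG_SCX from FreeBSD's <sys/ptrace.h> as filled in by ptrace(PT_LWPINFO), which this sketch does not call.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-ins for PL_FLAG_SCE / PL_FLAG_SCX; real code must
 * take the values from <sys/ptrace.h>. */
#define DEMO_FLAG_SCE 0x1
#define DEMO_FLAG_SCX 0x2

enum stop_reason { STOP_SCE, STOP_SCX, STOP_SIG };

/* Classify a stop from its signal and the lwpinfo-style flag word.  Unlike
 * the committed assertion, a SIGTRAP with neither flag set is treated as an
 * ordinary signal, because SIGTRAP can also be delivered as a normal signal. */
static enum stop_reason classify_stop(int stopsig, int pl_flags)
{
    assert(stopsig > 0);
    if (stopsig == SIGTRAP) {
        if (pl_flags & DEMO_FLAG_SCE)
            return STOP_SCE;    /* syscall entry */
        if (pl_flags & DEMO_FLAG_SCX)
            return STOP_SCX;    /* syscall exit */
    }
    return STOP_SIG;            /* the switch's default case */
}
```

Entry is checked before exit, mirroring the patch's `(pl_flags & PL_FLAG_SCE) ? S_SCE : S_SCX` order.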
Re: 8.2 + apache == a LOT of sigprocmask
On Fri, Nov 18, 2011 at 12:07:51PM -0800, Doug Barton wrote: On 11/18/2011 01:19, Kostik Belousov wrote: On Fri, Nov 18, 2011 at 12:00:57AM -0800, Doug Barton wrote: On 11/17/2011 02:57, Kostik Belousov wrote: It's not catching there though:
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
0x28183b2d in accept () at accept.S:3
3 RSYSCALL(accept)
(gdb) c
Continuing.
no thread to satisfy query
0x28183b2d in accept () at accept.S:3
3 RSYSCALL(accept)
(gdb) info threads
Cannot get thread info: invalid key
(gdb)
Err, the other part of my message was that you shall set the breakpoint on sigprocmask. I'm sorry I'm not making myself clear. We are setting the breakpoint on sigprocmask. But, maybe I'm doing it wrong. Can you give precise instructions as to what you want me to do, from the beginning? Sorry to be so dense. Find the pid of the process issuing an excessive number of sigprocmask calls. Do
$ gdb /usr/local/bin/httpd
(gdb) attach pid
(gdb) b _sigprocmask
(gdb) c
Bah ! Breakpoint fired.
(gdb) bt
(gdb) c
... Repeat ...
Right, so we're on the same page at least. I've been abbreviating the output of gdb to make it easier to see the problem, but here is a (nearly) complete transcript:
gdb /usr/local/bin/httpd
Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd"...
(gdb) attach 1380
Attaching to program: /usr/local/bin/httpd, process 1380
Reading symbols from (lots of symbol-reading snipped)
3 RSYSCALL(accept)
Current language: auto; currently asm
(gdb) b _sigprocmask
Breakpoint 1 at 0x282d9055: file /usr/src/lib/libthr/thread/thr_sig.c, line 210.
(gdb) c
Continuing.
no thread to satisfy query
0x28183b2d in accept () at accept.S:3
3 RSYSCALL(accept)
(gdb) c
Continuing.
no thread to satisfy query
0x28183b2d in accept () at accept.S:3
3 RSYSCALL(accept)
(gdb) c
Continuing.
no thread to satisfy query
0x28183b2d in accept () at accept.S:3
3 RSYSCALL(accept)
etc.
This is an issue with either your environment or your gdb, or a bug in gdb. It seems that 'continue' did not work for you at all. I tried to reproduce this locally, but was not able to. And, I am unable to hit sigprocmask for my apache anywhere except rtld. I also have libthr linked in. So the way forward to catch the sigprocmask callers is one of:
- figure out why your gdb does not work and fix it; perhaps try the gdb from ports;
- or add libunwind backtraces into sigprocmask;
- or use dtrace (I doubt that 8.2 has the necessary usermode bits, and seriously doubt its stability).
Re: 8.2 + apache == a LOT of sigprocmask
On Fri, Nov 18, 2011 at 12:00:57AM -0800, Doug Barton wrote: On 11/17/2011 02:57, Kostik Belousov wrote: It's not catching there though:
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
0x28183b2d in accept () at accept.S:3
3 RSYSCALL(accept)
(gdb) c
Continuing.
no thread to satisfy query
0x28183b2d in accept () at accept.S:3
3 RSYSCALL(accept)
(gdb) info threads
Cannot get thread info: invalid key
(gdb)
Err, the other part of my message was that you shall set the breakpoint on sigprocmask. I'm sorry I'm not making myself clear. We are setting the breakpoint on sigprocmask. But, maybe I'm doing it wrong. Can you give precise instructions as to what you want me to do, from the beginning? Sorry to be so dense. Find the pid of the process issuing an excessive number of sigprocmask calls. Do
$ gdb /usr/local/bin/httpd
(gdb) attach pid
(gdb) b _sigprocmask
(gdb) c
Bah ! Breakpoint fired.
(gdb) bt
(gdb) c
... Repeat ...
I want to see a backtrace from the breakpoint hit. Several times. Me too. :) Meanwhile, in response to one of the other questions, we are using mpm_prefork. Also, the particular problem we're seeing does not appear related to fork(). The pattern of sigprocmask() calls is different from the pattern you see with fork(). I am sure that your sigprocmask calls do not come from rtld; most likely they come from some use of setjmp() or of sigsetjmp() with a nonzero savemask. I am not aware of any significant users of setjmp or sigprocmask in our system libraries.
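A minimal sketch of the trade-off behind the setjmp()/sigsetjmp() suggestion, using only POSIX calls and no Apache or pcre code. Per Kostik's explanation, setjmp() on FreeBSD saves the signal mask, and saving it means a sigprocmask() query of the SIG_BLOCK,0,<address> shape seen in the ktrace; sigsetjmp(env, 0) skips that save, but then a siglongjmp() back to it no longer restores the mask. That lost restore is the "signal-safety" that has to be unnecessary before the swap is safe. sigusr1_blocked() and blocked_after_jump() are hypothetical helpers.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

static sigjmp_buf env;

/* A NULL new set makes sigprocmask() a pure query of the current mask,
 * the same call shape as the lines in the ktrace output. */
static bool sigusr1_blocked(void)
{
    sigset_t cur;

    sigprocmask(SIG_BLOCK, NULL, &cur);
    return sigismember(&cur, SIGUSR1) == 1;
}

/*
 * Take a sigsetjmp() save point with SIGUSR1 unblocked, block SIGUSR1,
 * then siglongjmp() back.  Returns whether SIGUSR1 is still blocked:
 * false when savemask != 0 (mask saved and restored), true when it is 0.
 */
static bool blocked_after_jump(int savemask)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    assert(!sigusr1_blocked());

    if (sigsetjmp(env, savemask) == 0) {
        sigprocmask(SIG_BLOCK, &set, NULL);   /* change after the save point */
        siglongjmp(env, 1);
    }
    return sigusr1_blocked();
}
```

blocked_after_jump(1) returns false and blocked_after_jump(0) returns true, so replacing setjmp() with sigsetjmp(?, 0) is only safe where nothing between the save point and the jump changes the signal mask.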
Re: 8.2 + apache == a LOT of sigprocmask
On Wed, Nov 16, 2011 at 11:59:06PM -0800, Doug Barton wrote: On 11/16/2011 23:49, Kostik Belousov wrote: On Wed, Nov 16, 2011 at 10:46:27PM -0800, Doug Barton wrote: On 11/15/2011 02:09, Jeremy Chadwick wrote: On Tue, Nov 15, 2011 at 11:07:45AM +0200, Kostik Belousov wrote: On Mon, Nov 14, 2011 at 12:51:35PM -0800, Doug Barton wrote: On 11/14/2011 12:31, Doug Barton wrote: Trying to track down a load problem we're seeing on 8.2-RELEASE-p4 i386 in a busy web hosting environment, I came across the following post: http://lists.freebsd.org/pipermail/freebsd-questions/2011-October/234520.html That basically describes what we're seeing as well, including the "doesn't happen on Linux" part. Does anyone have any ideas about this? With incredibly similar stuff running on 7.x we didn't see this problem, so it seems to be something new in 8. Just took a closer look at our ktrace, and actually our pattern is slightly different than the one in that post. In ours the second argument is null, but the third is set:
74195 httpd 0.17 RET sigprocmask 0
74195 httpd 0.13 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
74195 httpd 0.09 RET sigprocmask 0
74195 httpd 0.13 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
74195 httpd 0.09 RET sigprocmask 0
74195 httpd 0.12 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
But repeated hundreds of times in a row. The calls cannot come from rtld; they are generated by some setjmp() invocation. If signal-safety is not needed, sigsetjmp() should be used instead. A quick grep of the apache httpd source shows a single setjmp() in their copy of pcre. No idea if it is safe to change setjmp() into sigsetjmp(?, 0). I hate cross-posting, but: adding freebsd-apache@ to the list. Some of the Apache folks (not just port committers) may have some insight into Kostik's findings. Thanks to everyone for the responses. We tried Kostik's suggestion and unfortunately it didn't reduce the number of sigprocmask() calls to a statistically significant degree.
Does anyone have any other ideas on ways to debug this? We're sort of running out of things to test. :-/ Given how important (and prevalent) the Apache + FreeBSD combination is, I'm kind of disturbed that we're seeing this performance problem, and if it's something in 8.x that's also in 9.x, it would be better to fix it prior to 9.0-RELEASE. Since my guess turned out not to be useful, Well I wouldn't say that they weren't useful, we eliminated the obvious candidate. So, not good news certainly, but not unhelpful. :) the way forward is to identify the location of the call(s) that cause the issue. I suggest compiling at least apache itself, libc, rtld and libthr (if used) with debugging information. Then, attach to the running apache worker with gdb and Note this part. set a breakpoint on sigprocmask. Several backtraces from the hit breakpoint should give enough data. We tried that, and got this:
Loaded symbols for /libexec/ld-elf.so.1
0x28183a5d in accept () from /lib/libc.so.7
(gdb) b sigprocmask
Breakpoint 1 at 0x282d8f84
(gdb) c
Continuing.
no thread to satisfy query
0x28183a5d in accept () from /lib/libc.so.7
(gdb)
It seems your libc has no debugging information. accept() is a pure syscall wrapper; it cannot call sigprocmask. If gdb caught the PLT trampoline instead of the real accept(), we would see the rtld frames. So install libc, libthr and rtld with debug. Also, having debug symbols for apache itself can be useful. Of course I'm not the world's greatest gdb'er, so maybe there is a better way to do it? The high-tech solution is to link with libunwind and add code into sigprocmask() to gather the stacks. But I expect that gdb attach is enough. Ok, we'll look into that, thanks.
Re: 8.2 + apache == a LOT of sigprocmask
On Thu, Nov 17, 2011 at 01:26:49AM -0800, Doug Barton wrote: On 11/17/2011 00:12, Kostik Belousov wrote: On Wed, Nov 16, 2011 at 11:59:06PM -0800, Doug Barton wrote: On 11/16/2011 23:49, Kostik Belousov wrote: On Wed, Nov 16, 2011 at 10:46:27PM -0800, Doug Barton wrote: On 11/15/2011 02:09, Jeremy Chadwick wrote: On Tue, Nov 15, 2011 at 11:07:45AM +0200, Kostik Belousov wrote: On Mon, Nov 14, 2011 at 12:51:35PM -0800, Doug Barton wrote: On 11/14/2011 12:31, Doug Barton wrote: Trying to track down a load problem we're seeing on 8.2-RELEASE-p4 i386 in a busy web hosting environment, I came across the following post: http://lists.freebsd.org/pipermail/freebsd-questions/2011-October/234520.html That basically describes what we're seeing as well, including the "doesn't happen on Linux" part. Does anyone have any ideas about this? With incredibly similar stuff running on 7.x we didn't see this problem, so it seems to be something new in 8. Just took a closer look at our ktrace, and actually our pattern is slightly different than the one in that post. In ours the second argument is null, but the third is set:
74195 httpd 0.17 RET sigprocmask 0
74195 httpd 0.13 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
74195 httpd 0.09 RET sigprocmask 0
74195 httpd 0.13 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
74195 httpd 0.09 RET sigprocmask 0
74195 httpd 0.12 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
But repeated hundreds of times in a row. The calls cannot come from rtld; they are generated by some setjmp() invocation. If signal-safety is not needed, sigsetjmp() should be used instead. A quick grep of the apache httpd source shows a single setjmp() in their copy of pcre. No idea if it is safe to change setjmp() into sigsetjmp(?, 0). I hate cross-posting, but: adding freebsd-apache@ to the list. Some of the Apache folks (not just port committers) may have some insight into Kostik's findings. Thanks to everyone for the responses.
We tried Kostik's suggestion and unfortunately it didn't reduce the number of sigprocmask() calls to a statistically significant degree. Does anyone have any other ideas on ways to debug this? We're sort of running out of things to test. :-/ Given how important (and prevalent) the Apache + FreeBSD combination is, I'm kind of disturbed that we're seeing this performance problem, and if it's something in 8.x that's also in 9.x, it would be better to fix it prior to 9.0-RELEASE. Since my guess turned out not to be useful, Well I wouldn't say that they weren't useful, we eliminated the obvious candidate. So, not good news certainly, but not unhelpful. :) the way forward is to identify the location of the call(s) that cause the issue. I suggest compiling at least apache itself, libc, rtld and libthr (if used) with debugging information. Then, attach to the running apache worker with gdb and Note this part. Right, we attached to a worker, that's why it's in accept(). :) It seems your libc has no debugging information. accept() is a pure syscall wrapper; it cannot call sigprocmask. If gdb caught the PLT trampoline instead of the real accept(), we would see the rtld frames. So install libc, libthr and rtld with debug. It's not catching there though:
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
0x28183b2d in accept () at accept.S:3
3 RSYSCALL(accept)
(gdb) c
Continuing.
no thread to satisfy query
0x28183b2d in accept () at accept.S:3
3 RSYSCALL(accept)
(gdb) info threads
Cannot get thread info: invalid key
(gdb)
Err, the other part of my message was that you shall set the breakpoint on sigprocmask. I want to see a backtrace from the breakpoint hit. Several times. The backtrace at attach time is of no use.
Re: 8.2 + apache == a LOT of sigprocmask
On Wed, Nov 16, 2011 at 10:46:27PM -0800, Doug Barton wrote: On 11/15/2011 02:09, Jeremy Chadwick wrote: On Tue, Nov 15, 2011 at 11:07:45AM +0200, Kostik Belousov wrote: On Mon, Nov 14, 2011 at 12:51:35PM -0800, Doug Barton wrote: On 11/14/2011 12:31, Doug Barton wrote: Trying to track down a load problem we're seeing on 8.2-RELEASE-p4 i386 in a busy web hosting environment I came across the following post: http://lists.freebsd.org/pipermail/freebsd-questions/2011-October/234520.html That basically describes what we're seeing as well, including the doesn't happen on Linux part. Does anyone have any ideas about this? With incredibly similar stuff running on 7.x we didn't see this problem, so it seems to be something new in 8. Just took a closer look at our ktrace, and actually our pattern is slightly different than the one in that post. In ours the second option is null, but the third is set: 74195 httpd0.17 RET sigprocmask 0 74195 httpd0.13 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4) 74195 httpd0.09 RET sigprocmask 0 74195 httpd0.13 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4) 74195 httpd0.09 RET sigprocmask 0 74195 httpd0.12 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4) But repeated hundreds of times in a row. The calls cannot come from rtld, they are generated by some setjmp() invocation. If signal-safety is not needed, sigsetjmp() should be used instead. Quick grep of the apache httpd source shows a single setjmp() in their copy of pcre. No idea is it to safe to change setjmp() into sigsetjmp(?, 0). I hate cross-posting, but: adding freebsd-apache@ to the list. Some of the Apache folks (not just port committers) may have some insight to Kostik's findings. Thanks to everyone for the responses. We tried Kostik's suggestion and unfortunately it didn't reduce the number of sigprocmask() calls to a statistically significant degree. Does anyone have any other ideas on ways to debug this? We're sort of running out of things to test. 
:-/ Given how important (and prevalent) the Apache + FreeBSD combination is, I'm kind of disturbed that we're seeing this performance problem, and if it's something in 8.x that's also in 9.x, it would be better to fix it prior to 9.0-RELEASE. Since my guess turned out not to be useful, the way forward is to identify the location of the call(s) that cause the issue. I suggest compiling at least apache itself, libc, rtld and libthr (if used) with debugging information. Then attach to the running apache worker with gdb and set a breakpoint on sigprocmask. Several backtraces from the breakpoint hits should give enough data. A high-tech solution is to link with libunwind and add code to sigprocmask() to gather the stacks, but I expect that a gdb attach is enough.
Re: 8.2 + apache == a LOT of sigprocmask
On Mon, Nov 14, 2011 at 12:51:35PM -0800, Doug Barton wrote: On 11/14/2011 12:31, Doug Barton wrote: Trying to track down a load problem we're seeing on 8.2-RELEASE-p4 i386 in a busy web hosting environment I came across the following post: http://lists.freebsd.org/pipermail/freebsd-questions/2011-October/234520.html That basically describes what we're seeing as well, including the doesn't happen on Linux part. Does anyone have any ideas about this? With incredibly similar stuff running on 7.x we didn't see this problem, so it seems to be something new in 8. Just took a closer look at our ktrace, and actually our pattern is slightly different than the one in that post. In ours the second option is null, but the third is set: 74195 httpd0.17 RET sigprocmask 0 74195 httpd0.13 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4) 74195 httpd0.09 RET sigprocmask 0 74195 httpd0.13 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4) 74195 httpd0.09 RET sigprocmask 0 74195 httpd0.12 CALL sigprocmask(SIG_BLOCK,0,0xbfbf89d4) But repeated hundreds of times in a row. The calls cannot come from rtld; they are generated by some setjmp() invocation. If signal safety is not needed, sigsetjmp() with a zero savemask should be used instead. A quick grep of the apache httpd source shows a single setjmp() in their copy of pcre. I have no idea whether it is safe to change that setjmp() into sigsetjmp(?, 0).
Re: valgrind on FreeBSD?
On Sun, Oct 09, 2011 at 04:18:48PM +0200, Václav Zeman wrote: Jeremy Chadwick wrote, On 9.10.2011 16:11: On Sun, Oct 09, 2011 at 03:48:57PM +0200, Václav Zeman wrote: Václav Zeman wrote, On 9.10.2011 15:25: Bakul Shah wrote, On 6.10.2011 8:40: On Wed, 05 Oct 2011 23:06:04 +0200 Václav Zeman v.hais...@sh.cvut.cz wrote: Hi. No matter what I try, valgrind on 7.3-STABLE is giving me this, both Valgrind ports: valgrind: Startup or configuration error: Can't establish current working directory at startup valgrind: Unable to start up properly. Giving up. What do I need to do to make it work? Try running valgrind under ktrace (and view the output with kdump). That will tell you what directory it is trying to access or what syscall fails and why. Hi. So I have done that and more. I have first updated from 7.3 to 8.2 (RELENG_8 actually). I have not managed to recompile all of the installed Ports yet, but I made sure to recompile valgrind and its dependencies. The same thing has happened! 
As I have said, I have done the ktrace and here is the interesting bit: 78028 valgrind NAMI /usr/local/lib/valgrind/memcheck-amd64-freebsd 78028 memcheck-amd64-free RET execve 0 78028 memcheck-amd64-free CALL getpid 78028 memcheck-amd64-free RET getpid 78028/0x130cc 78028 memcheck-amd64-free CALL __sysctl(0x39a91450,0x4,0x389a3800,0x39a91468,0,0) 78028 memcheck-amd64-free SCTL kern.proc.vmmap.78028 78028 memcheck-amd64-free RET __sysctl 0 78028 memcheck-amd64-free CALL mmap(0x49000,0x40,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_FIXED|MAP_ANON,0x,0) 78028 memcheck-amd64-free RET mmap 17179906048/0x49000 78028 memcheck-amd64-free CALL getrlimit(RLIMIT_DATA,0x39e6a780) 78028 memcheck-amd64-free RET getrlimit 0 78028 memcheck-amd64-free CALL setrlimit(RLIMIT_DATA,0x39a919e0) 78028 memcheck-amd64-free RET setrlimit 0 78028 memcheck-amd64-free CALL getrlimit(RLIMIT_STACK,0x39e6a790) 78028 memcheck-amd64-free RET getrlimit 0 78028 memcheck-amd64-free CALL __getcwd(0x3882d700,0x3ff) 78028 memcheck-amd64-free NAMI .. 78028 memcheck-amd64-free RET __getcwd -1 errno 2 No such file or directory 78028 memcheck-amd64-free CALL write(0x2,0x3830b060,0x6c) 78028 memcheck-amd64-free GIO fd 2 wrote 108 bytes valgrind: Startup or configuration error: valgrind:Can't establish current working directory at startup 78028 memcheck-amd64-free RET write 108/0x6c 78028 memcheck-amd64-free CALL write(0x2,0x3830b060,0x33) 78028 memcheck-amd64-free GIO fd 2 wrote 51 bytes valgrind: Unable to start up properly. Giving up. 78028 memcheck-amd64-free RET write 51/0x33 78028 memcheck-amd64-free CALL exit(0x1) Now what? Why would the __getcwd call be failing with No such file or directory? It is the nullfs! 
I have /home mounted using nullfs to /usr/home: /usr/home /home nullfs rw,multilabel,acls 0 0 When I run valgrind from the /usr based directory, it works: shell::wilx:/usr/home/users/wilx/tmp/yttool valgrind --tool=memcheck ./yttool ==34679== Memcheck, a memory error detector ==34679== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==34679== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info ==34679== Command: ./yttool ==34679== ==34679== ==34679== HEAP SUMMARY: ==34679== in use at exit: 20,395 bytes in 119 blocks ==34679== total heap usage: 6,719 allocs, 6,600 frees, 716,787 bytes allocated ==34679== ==34679== LEAK SUMMARY: ==34679==definitely lost: 0 bytes in 0 blocks ==34679==indirectly lost: 0 bytes in 0 blocks ==34679== possibly lost: 134 bytes in 4 blocks ==34679==still reachable: 20,261 bytes in 115 blocks ==34679== suppressed: 0 bytes in 0 blocks ==34679== Rerun with --leak-check=full to see details of leaked memory ==34679== ==34679== For counts of detected and suppressed errors, rerun with: -v ==34679== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) But when I run it from the nullfs mount, it fails: shell::wilx:/usr/home/users/wilx/tmp/yttool cd $HOME/tmp/yttool shell::wilx:~/tmp/yttool valgrind --tool=memcheck ./yttool valgrind: Startup or configuration error: valgrind:Can't establish current working directory at startup valgrind: Unable to start up properly. Giving up. Amazing how userland utilities behave differently depending upon the underlying filesystem type, eh? Good thing I asked what your underlying filesystem types were. Don't ever think that it'll all just work. :-) I believe there are other issues/stipulations with nullfs (some have been reported over the years), so I'm not too surprised by this issue. I have no idea who currently
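valgrind's startup failure comes down to the __getcwd call in the ktrace returning ENOENT inside the nullfs mount. A minimal sketch for checking that one call in isolation, assuming only POSIX getcwd(3); probe_getcwd() is a hypothetical helper, not part of valgrind. Running it once from the /usr/home/... path and once from the nullfs /home/... path should show whether getcwd() itself is what differs between the two.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 and fills buf if getcwd() can resolve
 * the current directory, otherwise the errno value (ENOENT in the ktrace
 * above, ERANGE if buf is too small for the path).
 */
int probe_getcwd(char *buf, size_t len)
{
    errno = 0;
    if (getcwd(buf, len) == NULL)
        return errno;
    return 0;
}
```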
Re: stable/9 r225827 i386 panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero
On Thu, Sep 29, 2011 at 02:52:31PM +0300, Alexandr Kovalenko wrote: Hello! I'm running 9.0-BETA3 (r225827) and now rebuilding all my 1215 ports (I've upgraded from 8.2). I'm getting panic. Is it known problem/already fixed somewhere? FreeBSD mile.xxx.ua 9.0-BETA3 FreeBSD 9.0-BETA3 #0 r225827: Wed Sep 28 17:11:17 EEST 2011 r...@mile.xxx.ua:/usr/obj/usr/src/sys/mile-9 i386 Unread portion of the kernel message buffer: panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero cpuid = 1 Uptime: 16h6m53s Physical memory: 1904 MB Dumping 367 MB: 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16 #0 doadump (textdump=1) at pcpu.h:244 #1 0xc071e5cb in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:442 #2 0xc071e82b in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:607 #3 0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0) at /usr/src/sys/vm/vm_page.c:1905 Please do frame 2, then p/x *m and show the result. #4 0xc0796b80 in vfs_vmio_release (bp=0xde8bcbf4) at /usr/src/sys/kern/vfs_bio.c:1638 #5 0xc0798813 in getnewbuf (vp=0xc6ea3550, slpflag=0, slptimeo=0, size=16384, maxsize=16384, gbflags=0) at /usr/src/sys/kern/vfs_bio.c:1949 #6 0xc0799f2a in getblk (vp=0xc6ea3550, blkno=2520, size=16384, slpflag=0, slptimeo=0, flags=Variable flags is not available. ) at /usr/src/sys/kern/vfs_bio.c:2788 #7 0xc079d49c in cluster_rbuild (vp=0xc6ea3550, filesize=44505088, lbn=2520, blkno=1209440, size=16384, run=Variable run is not available. 
) at /usr/src/sys/kern/vfs_cluster.c:332 #8 0xc079e145 in cluster_read (vp=0xc6ea3550, filesize=44505088, lblkno=2520, size=16384, cred=0x0, totread=1024, seqcount=7, bpp=0xf5824b60) at /usr/src/sys/kern/vfs_cluster.c:254 #9 0xc0934cf5 in ffs_read (ap=0xf5824bac) at /usr/src/sys/ufs/ffs/ffs_vnops.c:514 #10 0xc09ccb92 in VOP_READ_APV (vop=0xc0aa6a80, a=0xf5824bac) at vnode_if.c:887 #11 0xc07c1120 in vn_read (fp=0xc5474508, uio=0xf5824c48, active_cred=0xc56a4d80, flags=1, td=0xc5b76b80) at vnode_if.h:384 #12 0xc076380e in dofileread (td=0xc5b76b80, fd=3, fp=0xc5474508, auio=0xf5824c48, offset=41189376, flags=1) at file.h:254 #13 0xc07639f5 in kern_preadv (td=0xc5b76b80, fd=3, auio=0xf5824c48, offset=41189376) at /usr/src/sys/kern/sys_generic.c:288 #14 0xc0763b0d in sys_pread (td=0xc5b76b80, uap=0xf5824cec) at /usr/src/sys/kern/sys_generic.c:189 #15 0xc09accf5 in syscall (frame=0xf5824d28) at subr_syscall.c:131 #16 0xc0996db1 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:266 #17 0x0033 in ?? () Previous frame inner to this frame (corrupt stack?) -- Alexandr Kovalenko http://uafug.org.ua/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: stable/9 r225827 i386 panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero
On Thu, Sep 29, 2011 at 03:47:19PM +0300, Alexandr Kovalenko wrote: On Thu, Sep 29, 2011 at 3:30 PM, Kostik Belousov kostik...@gmail.com wrote: On Thu, Sep 29, 2011 at 02:52:31PM +0300, Alexandr Kovalenko wrote: Hello! I'm running 9.0-BETA3 (r225827) and now rebuilding all my 1215 ports (I've upgraded from 8.2). I'm getting panic. Is it known problem/already fixed somewhere? FreeBSD mile.xxx.ua 9.0-BETA3 FreeBSD 9.0-BETA3 #0 r225827: Wed Sep 28 17:11:17 EEST 2011 r...@mile.xxx.ua:/usr/obj/usr/src/sys/mile-9 i386 Unread portion of the kernel message buffer: panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero cpuid = 1 Uptime: 16h6m53s Physical memory: 1904 MB Dumping 367 MB: 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16 #0 doadump (textdump=1) at pcpu.h:244 #1 0xc071e5cb in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:442 #2 0xc071e82b in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:607 #3 0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0) at /usr/src/sys/vm/vm_page.c:1905 Please do frame 2, then p/x *m and show the result. (kgdb) frame 2 frame 3, sorry. p/x *(struct vm_page *)0xc2a38dc8 will do it as well. #2 0xc071e82b in panic (fmt=Variable fmt is not available.) at /usr/src/sys/kern/kern_shutdown.c:607 607 kern_reboot(bootopt); (kgdb) p/x *m No symbol m in current context. #4 0xc0796b80 in vfs_vmio_release (bp=0xde8bcbf4) at /usr/src/sys/kern/vfs_bio.c:1638 #5 0xc0798813 in getnewbuf (vp=0xc6ea3550, slpflag=0, slptimeo=0, size=16384, maxsize=16384, gbflags=0) at /usr/src/sys/kern/vfs_bio.c:1949 #6 0xc0799f2a in getblk (vp=0xc6ea3550, blkno=2520, size=16384, slpflag=0, slptimeo=0, flags=Variable flags is not available. ) at /usr/src/sys/kern/vfs_bio.c:2788 #7 0xc079d49c in cluster_rbuild (vp=0xc6ea3550, filesize=44505088, lbn=2520, blkno=1209440, size=16384, run=Variable run is not available. 
) at /usr/src/sys/kern/vfs_cluster.c:332 #8 0xc079e145 in cluster_read (vp=0xc6ea3550, filesize=44505088, lblkno=2520, size=16384, cred=0x0, totread=1024, seqcount=7, bpp=0xf5824b60) at /usr/src/sys/kern/vfs_cluster.c:254 #9 0xc0934cf5 in ffs_read (ap=0xf5824bac) at /usr/src/sys/ufs/ffs/ffs_vnops.c:514 #10 0xc09ccb92 in VOP_READ_APV (vop=0xc0aa6a80, a=0xf5824bac) at vnode_if.c:887 #11 0xc07c1120 in vn_read (fp=0xc5474508, uio=0xf5824c48, active_cred=0xc56a4d80, flags=1, td=0xc5b76b80) at vnode_if.h:384 #12 0xc076380e in dofileread (td=0xc5b76b80, fd=3, fp=0xc5474508, auio=0xf5824c48, offset=41189376, flags=1) at file.h:254 #13 0xc07639f5 in kern_preadv (td=0xc5b76b80, fd=3, auio=0xf5824c48, offset=41189376) at /usr/src/sys/kern/sys_generic.c:288 #14 0xc0763b0d in sys_pread (td=0xc5b76b80, uap=0xf5824cec) at /usr/src/sys/kern/sys_generic.c:189 #15 0xc09accf5 in syscall (frame=0xf5824d28) at subr_syscall.c:131 #16 0xc0996db1 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:266 #17 0x0033 in ?? () Previous frame inner to this frame (corrupt stack?) -- Alexandr Kovalenko http://uafug.org.ua/ -- Alexandr Kovalenko http://uafug.org.ua/
Re: stable/9 r225827 i386 panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero
On Thu, Sep 29, 2011 at 03:51:53PM +0300, Alexandr Kovalenko wrote: 2011/9/29 Kostik Belousov kostik...@gmail.com: On Thu, Sep 29, 2011 at 03:47:19PM +0300, Alexandr Kovalenko wrote: On Thu, Sep 29, 2011 at 3:30 PM, Kostik Belousov kostik...@gmail.com wrote: On Thu, Sep 29, 2011 at 02:52:31PM +0300, Alexandr Kovalenko wrote: Hello! I'm running 9.0-BETA3 (r225827) and now rebuilding all my 1215 ports (I've upgraded from 8.2). I'm getting panic. Is it known problem/already fixed somewhere? Do you use custom kernel config ? Is there a chance you have ZERO_COPY_SOCKETS option enabled ? FreeBSD mile.xxx.ua 9.0-BETA3 FreeBSD 9.0-BETA3 #0 r225827: Wed Sep 28 17:11:17 EEST 2011 r...@mile.xxx.ua:/usr/obj/usr/src/sys/mile-9 i386 Unread portion of the kernel message buffer: panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero cpuid = 1 Uptime: 16h6m53s Physical memory: 1904 MB Dumping 367 MB: 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16 #0 doadump (textdump=1) at pcpu.h:244 #1 0xc071e5cb in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:442 #2 0xc071e82b in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:607 #3 0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0) at /usr/src/sys/vm/vm_page.c:1905 Please do frame 2, then p/x *m and show the result. (kgdb) frame 2 frame 3, sorry. p/x *(struct vm_page *)0xc2a38dc8 will do it as well. 
(kgdb) frame 3 #3 0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0) at /usr/src/sys/vm/vm_page.c:1905 1905 panic("vm_page_unwire: page %p's wire count is zero", m); (kgdb) p/x *(struct vm_page *)0xc2a38dc8 $1 = {pageq = {tqe_next = 0xc2a38e10, tqe_prev = 0xc282a2b0}, listq = {tqe_next = 0xc2a38e10, tqe_prev = 0xc282a2b8}, left = 0x0, right = 0x0, object = 0xc5725770, pindex = 0xbd3, phys_addr = 0x56a32000, md = {pv_list = {tqh_first = 0xc3cc6418, tqh_last = 0xc3cc641c}, pat_mode = 0x6}, queue = 0x1, segind = 0x2, hold_count = 0x0, order = 0xb, pool = 0x0, cow = 0x0, wire_count = 0x0, aflags = 0x3, flags = 0x0, oflags = 0x0, act_count = 0x5, busy = 0x0, valid = 0xff, dirty = 0xff} Please show the output of p *(struct vm_object *)0xc5725770 from kgdb. #2 0xc071e82b in panic (fmt=Variable fmt is not available.) at /usr/src/sys/kern/kern_shutdown.c:607 607 kern_reboot(bootopt); (kgdb) p/x *m No symbol m in current context. #4 0xc0796b80 in vfs_vmio_release (bp=0xde8bcbf4) at /usr/src/sys/kern/vfs_bio.c:1638 #5 0xc0798813 in getnewbuf (vp=0xc6ea3550, slpflag=0, slptimeo=0, size=16384, maxsize=16384, gbflags=0) at /usr/src/sys/kern/vfs_bio.c:1949 #6 0xc0799f2a in getblk (vp=0xc6ea3550, blkno=2520, size=16384, slpflag=0, slptimeo=0, flags=Variable flags is not available. ) at /usr/src/sys/kern/vfs_bio.c:2788 #7 0xc079d49c in cluster_rbuild (vp=0xc6ea3550, filesize=44505088, lbn=2520, blkno=1209440, size=16384, run=Variable run is not available. 
) at /usr/src/sys/kern/vfs_cluster.c:332 #8 0xc079e145 in cluster_read (vp=0xc6ea3550, filesize=44505088, lblkno=2520, size=16384, cred=0x0, totread=1024, seqcount=7, bpp=0xf5824b60) at /usr/src/sys/kern/vfs_cluster.c:254 #9 0xc0934cf5 in ffs_read (ap=0xf5824bac) at /usr/src/sys/ufs/ffs/ffs_vnops.c:514 #10 0xc09ccb92 in VOP_READ_APV (vop=0xc0aa6a80, a=0xf5824bac) at vnode_if.c:887 #11 0xc07c1120 in vn_read (fp=0xc5474508, uio=0xf5824c48, active_cred=0xc56a4d80, flags=1, td=0xc5b76b80) at vnode_if.h:384 #12 0xc076380e in dofileread (td=0xc5b76b80, fd=3, fp=0xc5474508, auio=0xf5824c48, offset=41189376, flags=1) at file.h:254 #13 0xc07639f5 in kern_preadv (td=0xc5b76b80, fd=3, auio=0xf5824c48, offset=41189376) at /usr/src/sys/kern/sys_generic.c:288 #14 0xc0763b0d in sys_pread (td=0xc5b76b80, uap=0xf5824cec) at /usr/src/sys/kern/sys_generic.c:189 #15 0xc09accf5 in syscall (frame=0xf5824d28) at subr_syscall.c:131 #16 0xc0996db1 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:266 #17 0x0033 in ?? () Previous frame inner to this frame (corrupt stack?) -- Alexandr Kovalenko http://uafug.org.ua/ -- Alexandr Kovalenko http://uafug.org.ua/ -- Alexandr Kovalenko http://uafug.org.ua/
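The frame 3 dump shows wire_count = 0x0 on the page being unwired, which is exactly the condition vm_page_unwire() panics on: the page is being unwired more times than it was wired. A heavily simplified model of that invariant, not the kernel code: struct page_model, page_wire() and page_unwire_checked() are hypothetical, and where the kernel panics the sketch returns an error so that one extra unwire can be observed.

```c
/* Hypothetical, heavily simplified model of a wired page. */
struct page_model {
    unsigned int wire_count;   /* number of outstanding wirings */
};

void page_wire(struct page_model *m)
{
    m->wire_count++;
}

/*
 * Returns 0 on success, -1 if the page is not wired at all.
 * The kernel panics in that case ("wire count is zero"); the model
 * reports it to the caller instead.
 */
int page_unwire_checked(struct page_model *m)
{
    if (m->wire_count == 0)
        return -1;
    m->wire_count--;
    return 0;
}
```

A balanced wire/unwire pair succeeds and the second unwire is rejected. Later in the thread the panic was attributed to the ZERO_COPY_SOCKETS kernel option.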
Re: stable/9 r225827 i386 panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero
On Thu, Sep 29, 2011 at 04:12:19PM +0300, Alexandr Kovalenko wrote: 2011/9/29 Kostik Belousov kostik...@gmail.com: On Thu, Sep 29, 2011 at 03:51:53PM +0300, Alexandr Kovalenko wrote: 2011/9/29 Kostik Belousov kostik...@gmail.com: On Thu, Sep 29, 2011 at 03:47:19PM +0300, Alexandr Kovalenko wrote: On Thu, Sep 29, 2011 at 3:30 PM, Kostik Belousov kostik...@gmail.com wrote: On Thu, Sep 29, 2011 at 02:52:31PM +0300, Alexandr Kovalenko wrote: Hello! I'm running 9.0-BETA3 (r225827) and now rebuilding all my 1215 ports (I've upgraded from 8.2). I'm getting panic. Is it known problem/already fixed somewhere? Do you use custom kernel config ? Is there a chance you have ZERO_COPY_SOCKETS option enabled ? Yes, ZERO_COPY_SOCKETS is there. Ok, this is the cause. Remove it. I asked for some additional data below, which you ignored, but I believe that I will not see anything new there, after we found the ZERO_COPY_SOCKETS in kernel config. FreeBSD mile.xxx.ua 9.0-BETA3 FreeBSD 9.0-BETA3 #0 r225827: Wed Sep 28 17:11:17 EEST 2011 r...@mile.xxx.ua:/usr/obj/usr/src/sys/mile-9 i386 Unread portion of the kernel message buffer: panic: vm_page_unwire: page 0xc2a38dc8's wire count is zero cpuid = 1 Uptime: 16h6m53s Physical memory: 1904 MB Dumping 367 MB: 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16 #0 doadump (textdump=1) at pcpu.h:244 #1 0xc071e5cb in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:442 #2 0xc071e82b in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:607 #3 0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0) at /usr/src/sys/vm/vm_page.c:1905 Please do frame 2, then p/x *m and show the result. (kgdb) frame 2 frame 3, sorry. p/x *(struct vm_page *)0xc2a38dc8 will do it as well. 
(kgdb) frame 3 #3 0xc0966903 in vm_page_unwire (m=0xc2a38dc8, activate=0) at /usr/src/sys/vm/vm_page.c:1905 1905 panic("vm_page_unwire: page %p's wire count is zero", m); (kgdb) p/x *(struct vm_page *)0xc2a38dc8 $1 = {pageq = {tqe_next = 0xc2a38e10, tqe_prev = 0xc282a2b0}, listq = {tqe_next = 0xc2a38e10, tqe_prev = 0xc282a2b8}, left = 0x0, right = 0x0, object = 0xc5725770, pindex = 0xbd3, phys_addr = 0x56a32000, md = {pv_list = {tqh_first = 0xc3cc6418, tqh_last = 0xc3cc641c}, pat_mode = 0x6}, queue = 0x1, segind = 0x2, hold_count = 0x0, order = 0xb, pool = 0x0, cow = 0x0, wire_count = 0x0, aflags = 0x3, flags = 0x0, oflags = 0x0, act_count = 0x5, busy = 0x0, valid = 0xff, dirty = 0xff} Please show the output of p *(struct vm_object *)0xc5725770 from kgdb. #2 0xc071e82b in panic (fmt=Variable fmt is not available.) at /usr/src/sys/kern/kern_shutdown.c:607 607 kern_reboot(bootopt); (kgdb) p/x *m No symbol m in current context. #4 0xc0796b80 in vfs_vmio_release (bp=0xde8bcbf4) at /usr/src/sys/kern/vfs_bio.c:1638 #5 0xc0798813 in getnewbuf (vp=0xc6ea3550, slpflag=0, slptimeo=0, size=16384, maxsize=16384, gbflags=0) at /usr/src/sys/kern/vfs_bio.c:1949 #6 0xc0799f2a in getblk (vp=0xc6ea3550, blkno=2520, size=16384, slpflag=0, slptimeo=0, flags=Variable flags is not available. ) at /usr/src/sys/kern/vfs_bio.c:2788 #7 0xc079d49c in cluster_rbuild (vp=0xc6ea3550, filesize=44505088, lbn=2520, blkno=1209440, size=16384, run=Variable run is not available. 
) at /usr/src/sys/kern/vfs_cluster.c:332 #8 0xc079e145 in cluster_read (vp=0xc6ea3550, filesize=44505088, lblkno=2520, size=16384, cred=0x0, totread=1024, seqcount=7, bpp=0xf5824b60) at /usr/src/sys/kern/vfs_cluster.c:254 #9 0xc0934cf5 in ffs_read (ap=0xf5824bac) at /usr/src/sys/ufs/ffs/ffs_vnops.c:514 #10 0xc09ccb92 in VOP_READ_APV (vop=0xc0aa6a80, a=0xf5824bac) at vnode_if.c:887 #11 0xc07c1120 in vn_read (fp=0xc5474508, uio=0xf5824c48, active_cred=0xc56a4d80, flags=1, td=0xc5b76b80) at vnode_if.h:384 #12 0xc076380e in dofileread (td=0xc5b76b80, fd=3, fp=0xc5474508, auio=0xf5824c48, offset=41189376, flags=1) at file.h:254 #13 0xc07639f5 in kern_preadv (td=0xc5b76b80, fd=3, auio=0xf5824c48, offset=41189376) at /usr/src/sys/kern/sys_generic.c:288 #14 0xc0763b0d in sys_pread (td=0xc5b76b80, uap=0xf5824cec) at /usr/src/sys/kern/sys_generic.c:189 #15 0xc09accf5 in syscall (frame=0xf5824d28) at subr_syscall.c:131 #16 0xc0996db1 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:266 #17 0x0033 in ?? () Previous frame inner to this frame (corrupt stack?) -- Alexandr Kovalenko http://uafug.org.ua/
Re: NFSD hang
On Tue, Sep 27, 2011 at 08:06:50PM -0400, Rick Macklem wrote: I suspect something is making those threads loop, but I don't know how to figure out where? (In the bad old days, I would have exscaped to a debugger and looked where the program counter was, then repeated after a cont a few times, to see where they were executing. However, I have no idea how to do that on a multicore system, even if you still had the system sitting there? If someone does know how to do this, please feel free to chime in;-) Absolutely the same. Break into the debugger, use ps to find the pid/tid of the looping threads, then do bt <id> to get a backtrace for each of them. Repeat several times to get some idea of where the loop is located.
Re: luit -encoding gbk causes Segmentation fault (core dumped) in 9-stable
On Wed, Sep 28, 2011 at 09:09:30PM +0800, Adrian Chadd wrote: Hm, it's not all that useful. But it first calls dlopen(). Does it have shared modules? Do you have old copies of those somewhere lying around? You need to build everything, i.e. both base and the port, with debug symbols. Otherwise, the backtrace gives no useful information.
Re: /usr/bin/script eating 100% cpu with portupgrade and xargs
On Sun, Sep 18, 2011 at 02:54:34PM +0300, Mikolaj Golub wrote: On Sun, 18 Sep 2011 13:25:26 +0200 Ronald Klop wrote: RK It is a while since I programmed C, but why will writing 0 bytes give RK the reader an end-of-file? Shouldn't the fd be closed to indicate RK end-of-file? AFAIR, this trick with writing 0 to emulate EOF because we can't close the fd -- we still want to read from it. Poor shutdown(2) for non-socket :-). Colin might tell more... Please note that treating a zero-byte read from the terminal as EOF is only a convention. Done absolutely properly, script should not interpret a zero-byte read as EOF. Perhaps the reasonable thing to do would be to look at stdin only once per second after a zero-byte read, and to switch back to normal mode as soon as something is read.
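The proposal above (treat a zero-byte read as "nothing for now" and recheck stdin once per second until real input returns) can be sketched as a tiny state machine. This is not Mikolaj's patch, which is not reproduced here; struct input_state and read_and_schedule() are hypothetical, and the returned value is meant as a poll(2)-style timeout in milliseconds, with -1 meaning "wait indefinitely".

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-descriptor state for the proposed script(1) behaviour. */
struct input_state {
    int idle;    /* nonzero after a zero-byte read, until input arrives again */
};

/*
 * Reads one chunk from fd into buf and stores read()'s result in *nread.
 * Returns the timeout (ms) to use before polling fd again:
 *   -1    normal mode, wait for input indefinitely;
 *   1000  after a zero-byte read, so a descriptor that keeps returning 0
 *         is rechecked once per second instead of in a busy loop.
 * A zero-byte read is deliberately not treated as EOF.
 */
int read_and_schedule(int fd, char *buf, size_t len,
                      struct input_state *st, ssize_t *nread)
{
    *nread = read(fd, buf, len);
    if (*nread == 0)
        st->idle = 1;
    else if (*nread > 0)
        st->idle = 0;          /* real input: back to normal mode */
    return st->idle ? 1000 : -1;
}
```

The one-second recheck is also where the possible 1-second pause in handling user input comes from.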
Re: /usr/bin/script eating 100% cpu with portupgrade and xargs
On Sun, Sep 18, 2011 at 11:57:57PM +0300, Mikolaj Golub wrote: On Sun, 18 Sep 2011 20:24:23 +0300 Kostik Belousov wrote: KB On Sun, Sep 18, 2011 at 02:54:34PM +0300, Mikolaj Golub wrote: On Sun, 18 Sep 2011 13:25:26 +0200 Ronald Klop wrote: RK It is a while since I programmed C, but why will writing 0 bytes give RK the reader an end-of-file? Shouldn't the fd be closed to indicate RK end-of-file? AFAIR, this trick with writing 0 to emulate EOF because we can't close the fd -- we still want to read from it. Poor shutdown(2) for non-socket :-). Colin might tell more... KB Please note that interpreting the receiving of 0 bytes on the terminal KB as EOF is only a convention. If done absolutely properly, script shall KB not interpret zero-byte read as EOF. Might be, the reasonable thing to KB do would be to only look at the stdin once in a second after receiving KB zero-bytes, and switching it back to normal mode if something is read. Ok. I see. Below is the patch that does something like this. Looks fine to me, but I did not test it. I would also suggest documenting this behaviour, which can cause a 1-second pause in processing user input, somewhere in the script(1) manpage, perhaps under BUGS.
Re: panic: spin lock held too long (RELENG_8 from today)
On Sat, Sep 03, 2011 at 12:05:47PM +0200, Attilio Rao wrote: This should be enough for someone NFS-aware to look into it. Were you also able to get a core? I'll try to look into it in the next days, in particular about the softclock state. I am absolutely sure that this is a zfs deadlock.
Re: sigwait return 4
On Sat, Aug 27, 2011 at 04:25:36PM +0200, Jilles Tjoelker wrote: On Thu, Aug 25, 2011 at 12:29:29AM +0300, Kostik Belousov wrote: On Wed, Aug 24, 2011 at 10:56:09PM +0200, Jilles Tjoelker wrote: sigwait() was fixed not to return EINTR in 9-current in r212405 (fixed up in r219709). The discussion started at http://lists.freebsd.org/pipermail/freebsd-threads/2010-September/004892.html Solaris is simply wrong in the same way we were wrong. Although POSIX may not be as clear on this as one may like, its intention is clear and additionally not returning EINTR reduces subtle portability problems. Can you, please, describe why do you consider the behaviour prohibiting return of EINTR reasonable ? I do consider that the Solaris behaviour is useful. Applications need to cope with EINTR returns (usually by retrying the call); if they do not do this, bugs arise in uncommon cases. In the case of sigwait(), applications do not really need EINTR: they can include the respective signal into the signal set and do the work inline that was originally in the signal handler. This might require additional pthread_sigmask() calls. This also fixes the race condition almost always associated with EINTR. Historically, this is because sigwait() came with POSIX threads, which also explains why it returns an error number rather than setting errno. The threads group considered EINTR errors not useful enough, given that they may lead to subtle bugs. This is fully standardized for functions like pthread_cond_wait() and pthread_mutex_lock(). In the case of sigwait(), it also plays a role that glibc has decided not to return EINTR, so that returning EINTR may lead to subtle bugs appearing on FreeBSD in software originally written for GNU/Linux. The functions sigwaitinfo() and sigtimedwait() came with POSIX realtime and therefore follow different conventions. I think I finally realized what problem Slawa was trying to fix.
The fix from r212405 indeed prevents sigwait() in the new libc from returning EINTR, but the compat libc and libthr can still return EINTR. Below is the patch that I provided to Slawa to handle the EINTR condition in the kernel. The meat is the two lines in kern_sig.c; everything else is the revert of r212405.
diff --git a/lib/libc/sys/Makefile.inc b/lib/libc/sys/Makefile.inc
index fe5061d..aa0959b 100644
--- a/lib/libc/sys/Makefile.inc
+++ b/lib/libc/sys/Makefile.inc
@@ -24,9 +24,6 @@ SRCS+=${SYSCALL_COMPAT_SRCS}
 NOASM+=${SYSCALL_COMPAT_SRCS:S/.c/.o/}
 PSEUDO+= _fcntl.o
 .endif
-SRCS+= sigwait.c
-NOASM+= sigwait.o
-PSEUDO+= _sigwait.o
 
 # Add machine dependent asm sources:
 SRCS+=${MDASM}
diff --git a/lib/libc/sys/Symbol.map b/lib/libc/sys/Symbol.map
index 095751a..2ba1f8f 100644
--- a/lib/libc/sys/Symbol.map
+++ b/lib/libc/sys/Symbol.map
@@ -937,7 +937,6 @@ FBSDprivate_1.0 {
 	_sigtimedwait;
 	__sys_sigtimedwait;
 	_sigwait;
-	__sigwait;
 	__sys_sigwait;
 	_sigwaitinfo;
 	__sys_sigwaitinfo;
diff --git a/lib/libc/sys/sigwait.c b/lib/libc/sys/sigwait.c
deleted file mode 100644
index 2fdffdd..000
--- a/lib/libc/sys/sigwait.c
+++ /dev/null
@@ -1,46 +0,0 @@
-/*-
- * Copyright (c) 2010 davi...@freebsd.org
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- *    notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *    notice, this list of conditions and the following disclaimer in the
- *    documentation and/or other materials provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
- * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- */
-
-#include <sys/cdefs.h>
-__FBSDID("$FreeBSD$");
-
-#include <errno.h>
-#include <signal.h>
-
-int __sys_sigwait(const sigset_t * restrict, int * restrict);
-
-__weak_reference(__sigwait, sigwait);
-
-int
-__sigwait(const sigset_t * restrict set, int * restrict sig)
-{
-	int ret;
-
-	/* POSIX does not allow EINTR to be returned */
-	do {
-		ret = __sys_sigwait(set, sig
Re: -m32 on freeBSD 8.2r amd64
On Wed, Aug 24, 2011 at 01:57:02PM +0100, Tom Evans wrote:
On Wed, Aug 24, 2011 at 12:11 PM, Michael Hoffmann benz...@arcor.de wrote:
Maybe off topic?

1: echo "int main(void) { return 0; }" > t.c
2: setenv LDEMULATION elf_i386_fbsd
3: gcc -c -m32 -o t.o t.c
4: gcc -nostartfiles -o a.out t.o -L/usr/lib32 /usr/lib32/crt1.o /usr/lib32/crti.o
5: file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked (uses shared libs), for FreeBSD 8.2, not stripped
6: uname -m
amd64

2: q.v. info binutils - Selecting The Target System

Maybe there is a more comfortable way.
Michael

You don't need to go to all that effort:

$ uname -m
amd64
$ echo "int main(void) { return 0; }" > t.c
$ gcc -c -m32 -o t.o t.c
$ gcc -m32 -o t t.o -B/usr/lib32
$ file t
t: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked (uses shared libs), for FreeBSD 8.2 (802510), not stripped

A well-known problem is that /usr/include/machine/*.h still contains the amd64 architecture definitions, so the resulting binary is broken in quite subtle ways.
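A cheap sanity check for a -m32 build is to confirm the data model the compiler actually used. The sketch below assumes the usual ILP32 (i386) and LP64 (amd64) models and uses hypothetical function names. It only catches gross mismatches. It cannot detect the subtler problems caused by amd64 definitions in /usr/include/machine/*.h leaking into a 32-bit build.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Width of a data pointer in bits for this compilation. */
static int
pointer_bits(void)
{
	return ((int)(sizeof(void *) * CHAR_BIT));
}

/*
 * Name the data model this translation unit was compiled for.
 * A -m32 build on amd64 is expected to report "ILP32";
 * a native amd64 build reports "LP64".
 */
static const char *
data_model_name(void)
{
	if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
		return ("ILP32");
	if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
		return ("LP64");
	return ("other");
}

/* Abort early if the build is not the expected data model. */
static void
require_data_model(const char *expected)
{
	assert(strcmp(data_model_name(), expected) == 0);
}
```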
Re: sigwait return 4
On Wed, Aug 24, 2011 at 10:19:07PM +0400, Slawa Olhovchenkov wrote:
System is 8.2-RELEASE (GENERIC), amd64. Application -- i386 for freebsd7.
In the ktrace dump I find a strange result:

22951 100556 kas-milter CALL sigwait(0xffdfdf80,0xffdfdf7c)
22951 100556 kas-milter RET sigwait 4
22951 100556 kas-milter PSIG SIGUSR2 caught handler=0x804c0f0 mask=0x4003 code=0x0

"RET sigwait 4" confused me and, I think, confused the application too.

man sigwait:

ERRORS
     The sigwait() system call will fail if:

     [EINVAL]  The set argument specifies one or more invalid signal numbers.

     [EFAULT]  Any arguments point outside the allocated address space or there is a memory protection fault.

How can sigwait return '4'? Maybe EINTR, converted from ERESTART? But kern_sigtimedwait must be called from sigwait with timeout == NULL...

What should the system do for a delivered signal not present in the set? I guess this is the case in your ktrace. Looking at SUSv4, I see no mention of the situation, but the Oracle SunOS 5.10 man page for sigwait(2) says explicitly:

EINTR  The wait was interrupted by an unblocked, caught signal.

So I think that we have a bug in the man page.

diff --git a/lib/libc/sys/sigwait.2 b/lib/libc/sys/sigwait.2
index 8c00cf4..b462201 100644
--- a/lib/libc/sys/sigwait.2
+++ b/lib/libc/sys/sigwait.2
@@ -27,7 +27,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 11, 2005
+.Dd August 24, 2011
 .Dt SIGWAIT 2
 .Os
 .Sh NAME
@@ -94,6 +94,8 @@ The
 .Fn sigwait
 system call will fail if:
 .Bl -tag -width Er
+.It Bq Er EINTR
+The system call was interrupted by an unblocked, caught signal.
 .It Bq Er EINVAL
 The
 .Fa set
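Part of the confusion comes from sigwait()'s error convention. It returns 0 or an error number directly, instead of returning -1 and setting errno, which fits the "RET sigwait 4" line (EINTR is 4 on FreeBSD). The sketch below shows one way an application might classify the return value. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classification of a sigwait() return value. */
enum sigwait_action {
	SIGWAIT_HANDLE,		/* rc == 0: *sig holds the delivered signal */
	SIGWAIT_RETRY,		/* rc == EINTR: wait again */
	SIGWAIT_FAIL		/* anything else, e.g. EINVAL: give up */
};

/*
 * sigwait() reports failure through its return value, not errno,
 * so rc is compared against errno constants directly.
 */
static enum sigwait_action
classify_sigwait_rc(int rc)
{
	if (rc == 0)
		return (SIGWAIT_HANDLE);
	if (rc == EINTR)
		return (SIGWAIT_RETRY);
	return (SIGWAIT_FAIL);
}
```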
Re: sigwait return 4
On Wed, Aug 24, 2011 at 11:24:46PM +0400, Slawa Olhovchenkov wrote:
On Wed, Aug 24, 2011 at 10:07:03PM +0300, Kostik Belousov wrote:
On Wed, Aug 24, 2011 at 10:19:07PM +0400, Slawa Olhovchenkov wrote:
System is 8.2-RELEASE (GENERIC), amd64. Application -- i386 for freebsd7.
In the ktrace dump I find a strange result:

22951 100556 kas-milter CALL sigwait(0xffdfdf80,0xffdfdf7c)
22951 100556 kas-milter RET sigwait 4
22951 100556 kas-milter PSIG SIGUSR2 caught handler=0x804c0f0 mask=0x4003 code=0x0

"RET sigwait 4" confused me and, I think, confused the application too.

man sigwait:

ERRORS
     The sigwait() system call will fail if:

     [EINVAL]  The set argument specifies one or more invalid signal numbers.

     [EFAULT]  Any arguments point outside the allocated address space or there is a memory protection fault.

How can sigwait return '4'? Maybe EINTR, converted from ERESTART? But kern_sigtimedwait must be called from sigwait with timeout == NULL...

What should the system do for a delivered signal not present in the set? I guess this is the case in your ktrace. Looking at SUSv4, I see no mention of the situation, but the Oracle SunOS 5.10 man page for sigwait(2) says explicitly:

EINTR  The wait was interrupted by an unblocked, caught signal.

I don't think you are right in this case. This is kas-milter, and in this thread (it is a multi-threaded application) kas-milter waits for USR2 to reload its config. The system returns from sigwait only on USR2, but not every return comes with a non-zero return code. On freebsd7 this application doesn't complain about sigwait's return value.

Could it be that some other thread has the signal unblocked? (You can verify this with procstat -j.) Can you write a self-contained test case that demonstrates the behaviour?
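procstat -j shows per-thread signal masks from outside the process. From inside the application, each thread can query its own mask. A minimal sketch follows, assuming POSIX threads. The function name signal_blocked_here() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/*
 * Return 1 if signo is blocked in the calling thread's signal mask,
 * 0 if it is not, or -1 if the mask could not be read.
 * A NULL new set makes pthread_sigmask() only report the current mask.
 */
static int
signal_blocked_here(int signo)
{
	sigset_t cur;

	if (pthread_sigmask(SIG_BLOCK, NULL, &cur) != 0)
		return (-1);
	return (sigismember(&cur, signo) == 1);
}
```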
Re: sigwait return 4
On Wed, Aug 24, 2011 at 11:42:29PM +0400, Slawa Olhovchenkov wrote:
On Wed, Aug 24, 2011 at 10:32:02PM +0300, Kostik Belousov wrote:
What should the system do for a delivered signal not present in the set? I guess this is the case in your ktrace. Looking at SUSv4, I see no mention of the situation, but the Oracle SunOS 5.10 man page for sigwait(2) says explicitly:

EINTR  The wait was interrupted by an unblocked, caught signal.

I don't think you are right in this case. This is kas-milter, and in this thread (it is a multi-threaded application) kas-milter waits for USR2 to reload its config. The system returns from sigwait only on USR2, but not every return comes with a non-zero return code. On freebsd7 this application doesn't complain about sigwait's return value.

Could it be that some other thread has the signal unblocked? (You can verify this with procstat -j.) Can you write a self-contained test case that demonstrates the behaviour?

This is closed-source software.

How is this statement related to the creation of a standalone test case?

# procstat -j
  PID    TID COMM             SIG     FLAGS
 1395 100199 kas-milter       USR2    --
 1395 100232 kas-milter       USR2    --

Both threads have the signal unblocked. This is not definitive, since the signal must be blocked during the call to sigwait(2). Note that SUSv4 says: "The signals defined by set shall have been blocked at the time of the call to sigwait(); otherwise, the behavior is undefined."
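The SUSv4 requirement quoted above is usually met by blocking the signal in the initial thread before any other thread is created. Every later thread then inherits the blocked mask, and one dedicated thread collects the signal with sigwait(). Below is a minimal sketch of that pattern for SIGUSR2. It is not kas-milter's code, and the names start_usr2_waiter(), signal_waiter() and received_sig are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Signal received by the waiter thread, or -error on failure. */
static int received_sig;

static void *
signal_waiter(void *arg)
{
	const sigset_t *set = arg;
	int sig, rc;

	/* Retry on EINTR for systems where sigwait() may still return it. */
	do {
		rc = sigwait(set, &sig);
	} while (rc == EINTR);
	received_sig = (rc == 0) ? sig : -rc;
	return (NULL);
}

/*
 * Block SIGUSR2 in the calling thread before creating the waiter, so the
 * new thread (and any created later) inherits the blocked mask.
 * *set must stay valid until the waiter thread has been joined.
 * Returns 0 on success or an error number.
 */
static int
start_usr2_waiter(pthread_t *tid, sigset_t *set)
{
	int error;

	sigemptyset(set);
	sigaddset(set, SIGUSR2);
	error = pthread_sigmask(SIG_BLOCK, set, NULL);
	if (error != 0)
		return (error);
	return (pthread_create(tid, NULL, signal_waiter, set));
}
```

Because the waiter inherits the blocked mask, a signal sent before it reaches sigwait() stays pending and is still collected.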
Re: sigwait return 4
On Wed, Aug 24, 2011 at 10:56:09PM +0200, Jilles Tjoelker wrote:
On Wed, Aug 24, 2011 at 10:07:03PM +0300, Kostik Belousov wrote:
On Wed, Aug 24, 2011 at 10:19:07PM +0400, Slawa Olhovchenkov wrote:
System is 8.2-RELEASE (GENERIC), amd64. Application -- i386 for freebsd7.
In the ktrace dump I find a strange result:

22951 100556 kas-milter CALL sigwait(0xffdfdf80,0xffdfdf7c)
22951 100556 kas-milter RET sigwait 4
22951 100556 kas-milter PSIG SIGUSR2 caught handler=0x804c0f0 mask=0x4003 code=0x0

"RET sigwait 4" confused me and, I think, confused the application too.

man sigwait:

ERRORS
     The sigwait() system call will fail if:

     [EINVAL]  The set argument specifies one or more invalid signal numbers.

     [EFAULT]  Any arguments point outside the allocated address space or there is a memory protection fault.

How can sigwait return '4'? Maybe EINTR, converted from ERESTART? But kern_sigtimedwait must be called from sigwait with timeout == NULL...

What should the system do for a delivered signal not present in the set? I guess this is the case in your ktrace. Looking at SUSv4, I see no mention of the situation, but the Oracle SunOS 5.10 man page for sigwait(2) says explicitly:

EINTR  The wait was interrupted by an unblocked, caught signal.

So I think that we have a bug in the man page.

diff --git a/lib/libc/sys/sigwait.2 b/lib/libc/sys/sigwait.2
index 8c00cf4..b462201 100644
--- a/lib/libc/sys/sigwait.2
+++ b/lib/libc/sys/sigwait.2
@@ -27,7 +27,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 11, 2005
+.Dd August 24, 2011
 .Dt SIGWAIT 2
 .Os
 .Sh NAME
@@ -94,6 +94,8 @@ The
 .Fn sigwait
 system call will fail if:
 .Bl -tag -width Er
+.It Bq Er EINTR
+The system call was interrupted by an unblocked, caught signal.
 .It Bq Er EINVAL
 The
 .Fa set

This patch would be wrong, except to document existing behaviour in -stable branches. sigwait() was fixed not to return EINTR in 9-current in r212405 (fixed up in r219709).
The discussion started at http://lists.freebsd.org/pipermail/freebsd-threads/2010-September/004892.html

Solaris is simply wrong in the same way we were wrong. Although POSIX may not be as clear on this as one may like, its intention is clear, and additionally not returning EINTR reduces subtle portability problems.

Can you, please, describe why you consider the behaviour prohibiting the return of EINTR reasonable? I do consider the Solaris behaviour useful.

Since we went the other route, the addition to the sigwait(2) man page that clarifies this looks useful. And sigwait(2) should become sigwait(3). Also, the sentence "the sigwaitinfo() function is equivalent to sigwait() ..." in sigwaitinfo(2) is not complete, due to EINTR.

Note that sigwaitinfo() and sigtimedwait() may return EINTR. SA_RESTART applies to sigwaitinfo() but not to sigtimedwait() (because the timeout cannot be restarted).

-- 
Jilles Tjoelker
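Because SA_RESTART does not restart sigtimedwait(), a caller that wants EINTR-free behaviour with a timeout has to retry by hand with the time that is left. A minimal sketch follows. It assumes CLOCK_MONOTONIC is available and uses the hypothetical names sigtimedwait_retry() and timespec_sub(). Unlike sigwait(), sigtimedwait() returns the signal number, or -1 with errno set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>

/* a - b, with tv_nsec normalized; the result may be negative. */
static struct timespec
timespec_sub(struct timespec a, struct timespec b)
{
	struct timespec r;

	r.tv_sec = a.tv_sec - b.tv_sec;
	r.tv_nsec = a.tv_nsec - b.tv_nsec;
	if (r.tv_nsec < 0) {
		r.tv_sec--;
		r.tv_nsec += 1000000000L;
	}
	return (r);
}

/*
 * Wait up to *timeout for a signal in set, restarting after EINTR with
 * the remaining time.  Returns the signal number, or -1 with errno set
 * (EAGAIN when the timeout expires).
 */
static int
sigtimedwait_retry(const sigset_t *set, siginfo_t *info,
    const struct timespec *timeout)
{
	struct timespec now, deadline, left;
	int sig;

	clock_gettime(CLOCK_MONOTONIC, &now);
	deadline.tv_sec = now.tv_sec + timeout->tv_sec;
	deadline.tv_nsec = now.tv_nsec + timeout->tv_nsec;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	left = *timeout;
	for (;;) {
		sig = sigtimedwait(set, info, &left);
		if (sig != -1 || errno != EINTR)
			return (sig);
		/* Interrupted: recompute the time left until the deadline. */
		clock_gettime(CLOCK_MONOTONIC, &now);
		left = timespec_sub(deadline, now);
		if (left.tv_sec < 0) {
			errno = EAGAIN;
			return (-1);
		}
	}
}
```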
Re: sigwait return 4
On Thu, Aug 25, 2011 at 12:29:29AM +0300, Kostik Belousov wrote:
Solaris is simply wrong in the same way we were wrong. Although POSIX may not be as clear on this as one may like, its intention is clear, and additionally not returning EINTR reduces subtle portability problems.

Can you, please, describe why you consider the behaviour prohibiting the return of EINTR reasonable? I do consider the Solaris behaviour useful.

Since we went the other route, the addition to the sigwait(2) man page that clarifies this looks useful. And sigwait(2) should become sigwait(3). Also, the sentence "the sigwaitinfo() function is equivalent to sigwait() ..." in sigwaitinfo(2) is not complete, due to EINTR.

Like this (svn cp to be applied).

diff --git a/lib/libc/sys/sigwait.2 b/lib/libc/sys/sigwait.2
index 8c00cf4..a9e605c 100644
--- a/lib/libc/sys/sigwait.2
+++ b/lib/libc/sys/sigwait.2
@@ -27,7 +27,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 11, 2005
+.Dd August 24, 2011
 .Dt SIGWAIT 2
 .Os
 .Sh NAME
@@ -82,6 +82,14 @@ selected, it will be the lowest numbered one.
 The selection order between realtime
 and non-realtime signals, or between multiple
 pending non-realtime signals, is unspecified.
+.Sh IMPLEMENTATION NOTES
+The
+.Fn sigwait
+function is implemented as a wrapper around the
+.Fn __sys_sigwait
+system call, which retries the call on
+.Er EINTR
+error.
 .Sh RETURN VALUES
 If successful,
 .Fn sigwait
diff --git a/lib/libc/sys/sigwaitinfo.2 b/lib/libc/sys/sigwaitinfo.2
index 41be9e2..a83de06 100644
--- a/lib/libc/sys/sigwaitinfo.2
+++ b/lib/libc/sys/sigwaitinfo.2
@@ -27,7 +27,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 11, 2005
+.Dd August 24, 2011
 .Dt SIGTIMEDWAIT 2
 .Os
 .Sh NAME
@@ -116,6 +116,16 @@ except that the selected signal number shall be stored in the
 member, and the cause of the signal shall be stored in the
 .Va si_code
 member.
+Besides this, the
+.Fn sigwaitinfo
+and
+.Fn sigtimedwait
+system calls may return
+.Er EINTR
+if interrupted by signal, which is not allowed for the
+.Fn sigwait
+function.
+.Pp
 If any value is queued to the selected signal, the first such queued value
 is dequeued and, if the info argument is
 .Pf non- Dv NULL ,
Re: 32GB limit per swap device?
On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov melif...@ipfw.ru wrote:
On 10.08.2011 19:16, per...@pluto.rain.com wrote:
Chuck Swiger cswi...@mac.com wrote:
On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
I am trying to set up 64GB partitions for swap for a system that has 64GB of RAM (with the idea of dumping kernel cores etc.). But on 8-stable as of today I get:

WARNING: reducing size to maximum of 67108864 blocks per swap unit

Is there a workaround for this limitation?

Another interesting question: the swap pager operates in page blocks (PAGE_SIZE=4k on common architectures). The block device size is passed to swaponsomething() as a number of _disk_ blocks (e.g. DEV_BSIZE=512). After that, the maximum object check of the kernel b-lists (on top of which the swap pager is built) is enforced. The (possible) problem is that the real object count we will operate on is not the value passed to swaponsomething(), since that value is calculated in the wrong units. We should check the b-list limit on the (X * DEV_BSIZE(512) / PAGE_SIZE) value, which is roughly (X / 8), so we should be able to address 32*8=256G. The code should look like this:

Index: vm/swap_pager.c
===================================================================
--- vm/swap_pager.c	(revision 223877)
+++ vm/swap_pager.c	(working copy)
@@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
 	u_long mblocks;
 
 	/*
+	 * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
+	 * First chop nblks off to page-align it, then convert.
+	 *
+	 * sw->sw_nblks is in page-sized chunks now too.
+	 */
+	nblks &= ~(ctodb(1) - 1);
+	nblks = dbtoc(nblks);
+
+	/*
 	 * If we go beyond this, we get overflows in the radix
 	 * tree bitmap code.
 	 */
@@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
 		    mblocks);
 		nblks = mblocks;
 	}
-	/*
-	 * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
-	 * First chop nblks off to page-align it, then convert.
-	 *
-	 * sw->sw_nblks is in page-sized chunks now too.
-	 */
-	nblks &= ~(ctodb(1) - 1);
-	nblks = dbtoc(nblks);
 
 	sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
 	sp->sw_vp = vp;

(move the page recalculation before the b-list check)

Can someone comment on this?

I believe that you are correct. Have you tried testing this change on a large swap device?

I probably agree too, but I am in the process of re-reading the swap code, and I do not quite believe in the limit. When the initial code was committed, our daddr_t was 32bit; I checked the RELENG_4 sources. The current code uses int64_t for daddr_t. My impression right now is that we only utilize the low 32 bits of daddr_t. Especially interesting is the following typedef:

typedef	uint32_t	u_daddr_t;	/* unsigned disk address */

which (correctly) means that the typical mask (u_daddr_t)-1 is 0x.

I wonder whether we could just use the full 64 bits and de facto remove the limitation on the swap partition size.
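The unit mismatch Alexander describes can be checked with plain arithmetic. In this sketch the hypothetical EX_* constants stand in for DEV_BSIZE, PAGE_SIZE and the cap printed in the warning; they are not the kernel's macros. The same cap of 67108864 units is 32 GiB when counted in 512-byte disk blocks and 256 GiB when counted in 4 KiB pages, which matches the "32*8=256G" estimate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical constants matching the values quoted in the thread:
 * 512-byte disk blocks, 4 KiB pages, and the 67108864-unit cap from
 * "reducing size to maximum of 67108864 blocks per swap unit".
 */
#define EX_DEV_BSIZE	512ULL
#define EX_PAGE_SIZE	4096ULL
#define EX_MAX_UNITS	67108864ULL

/* Largest swap device in bytes if the cap counts 512-byte blocks. */
static uint64_t
max_bytes_cap_on_disk_blocks(void)
{
	return (EX_MAX_UNITS * EX_DEV_BSIZE);
}

/* Largest swap device in bytes if the cap counts pages. */
static uint64_t
max_bytes_cap_on_pages(void)
{
	return (EX_MAX_UNITS * EX_PAGE_SIZE);
}

/*
 * Page-align a size given in disk blocks, then convert it to pages,
 * in the spirit of "nblks &= ~(ctodb(1) - 1); nblks = dbtoc(nblks);".
 */
static uint64_t
disk_blocks_to_pages(uint64_t nblks)
{
	const uint64_t per_page = EX_PAGE_SIZE / EX_DEV_BSIZE;	/* 8 */

	nblks &= ~(per_page - 1);
	return (nblks / per_page);
}
```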
Re: 32GB limit per swap device?
On Sat, Aug 20, 2011 at 10:42:28PM +0400, Alexander V. Chernikov wrote:
Alan Cox wrote:
On 08/20/2011 12:41, Kostik Belousov wrote:
On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov melif...@ipfw.ru wrote:
On 10.08.2011 19:16, per...@pluto.rain.com wrote:
Chuck Swiger cswi...@mac.com wrote:
On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
I am trying to set up 64GB partitions for swap for a system that has 64GB of RAM (with the idea of dumping kernel cores etc.). But on 8-stable as of today I get:

WARNING: reducing size to maximum of 67108864 blocks per swap unit

Is there a workaround for this limitation?

Another interesting question: the swap pager operates in page blocks (PAGE_SIZE=4k on common architectures). The block device size is passed to swaponsomething() as a number of _disk_ blocks (e.g. DEV_BSIZE=512). After that, the maximum object check of the kernel b-lists (on top of which the swap pager is built) is enforced. The (possible) problem is that the real object count we will operate on is not the value passed to swaponsomething(), since that value is calculated in the wrong units. We should check the b-list limit on the (X * DEV_BSIZE(512) / PAGE_SIZE) value, which is roughly (X / 8), so we should be able to address 32*8=256G. The code should look like this:

Index: vm/swap_pager.c
===================================================================
--- vm/swap_pager.c	(revision 223877)
+++ vm/swap_pager.c	(working copy)
@@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
 	u_long mblocks;
 
 	/*
+	 * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
+	 * First chop nblks off to page-align it, then convert.
+	 *
+	 * sw->sw_nblks is in page-sized chunks now too.
+	 */
+	nblks &= ~(ctodb(1) - 1);
+	nblks = dbtoc(nblks);
+
+	/*
 	 * If we go beyond this, we get overflows in the radix
 	 * tree bitmap code.
 	 */
@@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
 		    mblocks);
 		nblks = mblocks;
 	}
-	/*
-	 * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
-	 * First chop nblks off to page-align it, then convert.
-	 *
-	 * sw->sw_nblks is in page-sized chunks now too.
-	 */
-	nblks &= ~(ctodb(1) - 1);
-	nblks = dbtoc(nblks);
 
 	sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
 	sp->sw_vp = vp;

(move the page recalculation before the b-list check)

Can someone comment on this?

I believe that you are correct. Have you tried testing this change on a large swap device?

I will try tomorrow.

I probably agree too, but I am in the process of re-reading the swap code, and I do not quite believe in the limit.

I'm uncertain whether the current limit, 0x4000 / BLIST_META_RADIX, is exact or not, but I doubt that it is too large.

It is not exact. It is a rough estimate of sizeof(blmeta_t) * X < 4G (blist_create() assumes that malloc() is not able to allocate more than 4G; I'm not sure if that is true these days), where X is the number of blocks we need to store. The actual number, however, is X / (1 + 1/BLIST_META_RADIX + 1/BLIST_META_RADIX^2 + ...), but it does not differ from X very much.

A blist can be seen as a tree of radix trees, with the metainformation for all those radix trees allocated by a single allocation, which imposes this limit. The metainformation is used to find free blocks more quickly. The single linear allocation is required to advance to the next radix tree on the same level very fast:

[ASCII tree diagram; its layout was lost in extraction]

Some kind of schema with 3 levels in the tree and BLIST_META_RADIX=2 (instead of 16).

When the initial code was committed, our daddr_t was 32bit; I checked the RELENG_4 sources. The current code uses int64_t for daddr_t. My impression right now is that we only utilize the low 32 bits of daddr_t. Especially interesting is the following typedef:

typedef	uint32_t	u_daddr_t;	/* unsigned disk address */

which (correctly) means that the typical mask (u_daddr_t)-1 is 0x.

I wonder whether we could just use the full 64 bits and de facto remove the limitation on the swap partition size.

This will double the size of struct blmeta_t and cause 2*X memory usage for every swap configuration.

No, daddr_t is already 64bit. Nothing will increase. My point is that the current limitation is artificial.

I think Alan's note referred to the amount of radix tree nodes required to cover a large swap partition. But it could be a good temporary measure. I expect to be able to provide some numeric evidence later. I would rather argue first that the subr_list code
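Alexander's estimate divides X by a geometric series. Assuming the series continues indefinitely with ratio 1/R, where R is BLIST_META_RADIX, it has a closed form that shows how close the result stays to X:

$$
\frac{X}{1 + \frac{1}{R} + \frac{1}{R^2} + \cdots}
= \frac{X}{\;\frac{1}{1 - 1/R}\;}
= X\left(1 - \frac{1}{R}\right)
= \frac{R-1}{R}\,X,
\qquad R = 16 \;\Rightarrow\; \frac{15}{16}\,X \approx 0.94\,X .
$$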
Re: debugging frequent kernel panics on 8.2-RELEASE
On Wed, Aug 17, 2011 at 11:21:42PM +0300, Andriy Gapon wrote:
[skip]
But I would also like to use this opportunity to discuss how we can make it easier to debug issues such as this one. I think that this problem demonstrates that when we treat certain junk in a kernel address value as a userland address, we throw additional heaps of irrelevant stuff on top of the actual problem. One solution could be to use a special flag that would mark all actual attempts to access userland addresses (e.g. setting the flag on entrance to copyin and clearing it upon return), so that in the page fault handler we could distinguish actual faults on userland addresses from faults on garbage kernel addresses. I am sure that there could be other clever techniques to catch such garbage addresses early.

We already have such a mechanism: kernel code that is aware it accesses usermode pages sets pcb_onfault. See the end of the trap_pfault() handler. In fact, we can catch it earlier, before even calling vm_fault(). BTW, I think this is especially useful in combination with support for SMEP in recent Intel CPUs.

commit 2e1b36fa93f9499e37acf04a66ff0646d4f13536
Author: Konstantin Belousov kos...@pooma.home
Date:   Thu Aug 18 00:08:50 2011 +0300

    Assert that the exiting process does not return to usermode.
    
    On x86, do not call vm_fault() when the kernel is not prepared to
    handle unsuccessful page fault.

diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 4e5f8b8..55e1e5a 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -674,6 +674,19 @@ trap_pfault(frame, usermode)
 			goto nogo;
 
 		map = &vm->vm_map;
+
+		/*
+		 * When accessing a usermode address, kernel must be
+		 * ready to accept the page fault, and provide a
+		 * handling routine.  Since accessing the address
+		 * without the handler is a bug, do not try to handle
+		 * it normally, and panic immediately.
+		 */
+		if (!usermode && (td->td_intr_nesting_level != 0 ||
+		    PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+			trap_fatal(frame, eva);
+			return (-1);
+		}
 	}
 
 	/*
diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
index 5a8016c..e6d2b5a 100644
--- a/sys/i386/i386/trap.c
+++ b/sys/i386/i386/trap.c
@@ -831,6 +831,11 @@ trap_pfault(frame, usermode, eva)
 			goto nogo;
 
 		map = &vm->vm_map;
+		if (!usermode && (td->td_intr_nesting_level != 0 ||
+		    PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+			trap_fatal(frame, eva);
+			return (-1);
+		}
 	}
 
 	/*
diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
index 3527ed1..a69b7b8 100644
--- a/sys/kern/subr_trap.c
+++ b/sys/kern/subr_trap.c
@@ -99,6 +99,8 @@ userret(struct thread *td, struct trapframe *frame)
 
 	CTR3(KTR_SYSC, "userret: thread %p (pid %d, %s)", td, p->p_pid,
             td->td_name);
+	KASSERT((p->p_flag & P_WEXIT) == 0,
+	    ("Exiting process returns to usermode"));
 #if 0
 #ifdef DIAGNOSTIC
 	/* Check that we called signotify() enough. */
Re: dtrace ustack kernel panic
On Sat, Jul 30, 2011 at 12:05:33PM -0700, maestro something wrote:
Hi,
Have you started kgdb with the correct kernel and core file? If yes, then I am out of ideas.

I hope so. I only recompiled the kernel once according to the DTrace wiki instructions, and I certainly have only one /var/crash/vmcore.* file. I'll try recompiling the kernel with -O1 and try again. In the meantime, I'm wondering whether I'm really the only/first one who ran into this problem, or whether there are people who have actually used the ustack() target successfully on freebsd-8.2?

I could not get the information even after recompiling the kernel. Here is the relevant (I think) information.

fb82i386# cat /etc/make.conf
CFLAGS= -O

(according to man make.conf, only -O and -O2 are supported for CFLAGS anyway)

kernel.debug is the newly compiled kernel (according to the timestamp).

fb82i386# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x108
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc1100847
stack pointer           = 0x28:0xcd39a9e4
frame pointer           = 0x28:0xcd39a9fc
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 1060 (nc)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xc09036a7 at kdb_backtrace+0x47
#1 0xc08d1a07 at panic+0x117
#2 0xc0c158c3 at trap_fatal+0x323
#3 0xc0c15bc0 at trap_pfault+0x2f0
#4 0xc0c1612a at trap+0x48a
#5 0xc0bfc97c at calltrap+0x6
#6 0xc10e99db at dtrace_panic+0x1b
#7 0xc10e9a0d at dtrace_assfail+0x2d
#8 0xc10fa6a6 at dtrace_probe+0xfd6
#9 0xc1237ce4 at systrace_probe+0x84
#10 0xc090f63f at syscallenter+0x47f
#11 0xc0c15c14 at syscall+0x34
#12 0xc0bfca11 at Xint0x80_syscall+0x21
Uptime: 2m39s
Physical memory: 239 MB
Dumping 78 MB: 63 47 31 15

(kgdb) where
#0  doadump () at pcpu.h:231
#1  0xc08d17a3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
#2  0xc08d1a40 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:592
#3  0xc0c158c3 in trap_fatal (frame=0xcd39a9a4, eva=264) at /usr/src/sys/i386/i386/trap.c:946
#4  0xc0c15bc0 in trap_pfault (frame=0xcd39a9a4, usermode=0, eva=264) at /usr/src/sys/i386/i386/trap.c:859
#5  0xc0c1612a in trap (frame=0xcd39a9a4) at /usr/src/sys/i386/i386/trap.c:532
#6  0xc0bfc97c in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7  0xc1100847 in dtrace_panic_trigger () from /boot/kernel/dtrace.ko
Previous frame inner to this frame (corrupt stack?)
(kgdb) list *dtrace_probe+0xfd6
No source file for address 0xc10fa6a6.

So I'm stuck at the same point. Any other ideas?

This is i386, right? I think the cause is that the assembler routine panic_trigger does not establish the standard i386 frame.
Basically, you need either this or DWARF annotations for gdb to be able to walk over the frame. You need to add the standard prologue

	pushl	%ebp
	movl	%esp,%ebp

and the standard epilogue

	leave

to the function. No idea whether it will continue to operate correctly after that.
Re: Sleeping thread owns a nonsleepable lock panic ( lor)
On Tue, Jul 26, 2011 at 07:12:23PM -0400, Rick Macklem wrote:
Kostik Belousov wrote:
On Tue, Jul 26, 2011 at 01:17:52PM +0200, Herve Boulouis wrote:
On 26/07/2011 12:06, Kostik Belousov wrote:
On Tue, Jul 26, 2011 at 11:49:13AM +0200, Herve Boulouis wrote:
On 25/07/2011 11:59, Kostik Belousov wrote:
Ok, the patched server crashed this morning strangely: all httpd processes were stuck in nfs or vmopar and were unkillable. Below is the full ps.

Please see http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html for the information required to debug deadlocks.

The box was not strictly deadlocked, since I was able to interact with it, but I suppose you want me to break into the debugger when the symptoms appear again and report the output of all the commands listed in the handbook deadlock section?

Exactly.

I think everything that accessed an NFS mount point was hung.

From usermode, procstat -kk could catch some interesting information, but it is redundant if the ddb output is captured.

Would it be worth considering reverting r223054? (Note that I don't understand the VM side, so this may be completely wrong:-)

The sleeps on vmopar could be happening because a dirty page is busy, and r223054 changes the VM_PAGER_xx value that is set in a couple of ways:
1 - When it returns VM_PAGER_ERROR instead of VM_PAGER_AGAIN, the return value of runlen from vm_pageout_flush() changes.
2 - I'm not sure, but I think the pre-r223054 code marked a partially written page as VM_PAGER_OK instead of VM_PAGER_AGAIN? (I'm wondering about this one, since the problem seems to happen when the file's size has been truncated.)

Herve Boulouis, if you want to see what r223054 changes, just go to http://svn.freebsd.org/viewvc/stable/8/sys/nfsclient and then click on nfs_bio.c. (The changes are small and could easily be reverted with a manual edit.)

Since r223054 went into stable/8 on Jun 13, it seems a possible explanation?

rick

I doubt it.
The ps output makes it plausible that the reporter got a LOR between the vnode lock and the page busy flag. The correct order is vnode lock -> busy bit, and vmopar is a wait for the busy page state. The mentioned revision does not change the lock order. Anyway, this is only speculation until the requested data is provided.
Re: Sleeping thread owns a nonsleepable lock panic ( lor)
On Tue, Jul 26, 2011 at 11:49:13AM +0200, Herve Boulouis wrote:
On 25/07/2011 11:59, Kostik Belousov wrote:
Ok, the patched server crashed this morning strangely: all httpd processes were stuck in nfs or vmopar and were unkillable. Below is the full ps.

Please see http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html for the information required to debug deadlocks.
Re: Sleeping thread owns a nonsleepable lock panic ( lor)
On Tue, Jul 26, 2011 at 01:17:52PM +0200, Herve Boulouis wrote:
On 26/07/2011 12:06, Kostik Belousov wrote:
On Tue, Jul 26, 2011 at 11:49:13AM +0200, Herve Boulouis wrote:
On 25/07/2011 11:59, Kostik Belousov wrote:
Ok, the patched server crashed this morning strangely: all httpd processes were stuck in nfs or vmopar and were unkillable. Below is the full ps.

Please see http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html for the information required to debug deadlocks.

The box was not strictly deadlocked, since I was able to interact with it, but I suppose you want me to break into the debugger when the symptoms appear again and report the output of all the commands listed in the handbook deadlock section?

Exactly.

I think everything that accessed an NFS mount point was hung.

From usermode, procstat -kk could catch some interesting information, but it is redundant if the ddb output is captured.
Re: Sleeping thread owns a nonsleepable lock panic ( lor)
On Mon, Jul 25, 2011 at 12:21:07PM +0200, Herve Boulouis wrote:
Hi list,

We have 2 FreeBSD 8.2-STABLE boxes (cvsupped June 22) that keep crashing in a bad way: they are doing heavy apache / php4 web serving from an NFS mount and panic at least once a day with the following message (no crash dump produced, hand copied from the console):

Sleeping on vmopar with the following non-sleepable locks held:
exclusive sleep mutex NFSnode lock (NFSnode lock) r = 0 (0xff0201798000) locked @ nfsclient/nfs_subs.c:538
lock order reversal:
 1st 0x018ff6da80 turnstile lock (turnstile lock) @ kern/subr_turnstile.c:190
 2nd 0xff80b52b10 scrlock (scrlock) @ dev/syscons.c:2570
lock order reversal:
 1st 0x018ff6da80 turnstile lock (turnstile lock) @ kern/subr_turnstile.c:190
 2nd 0xff80b78ef8 sleepq chain (sleepq chain) @ kern/subr_turnstile.c:203
lock order reversal:
 1st 0xff80b78ef8 sleepq chain (sleepq chain) @ kern/subr_turnstile.c:203
 2nd 0xff80b52b10 scrlock (scrlock) @ dev/syscons.c:2570
Sleeping thread (tid 100998, pid 20700) owns a non-sleepable lock
panic: sleeping thread
cpuid = 1
panic: bufwrite: buffer is not busy???
cpuid = 1

The 2 servers share the same load and panic consistently. I enabled WITNESS on both in the hope that it would let the boxes auto-reboot after a panic and give extra debug info. I got debug info, but the servers still hang after the double panic :(

Try this. Calling vnode_pager_setsize() while holding a mutex is prohibited.

On the other hand, I remember that my attempt to add a strict assert that the vnode is exclusively locked in vnode_pager_setsize() had to be reverted, because nfs_loadattrcache() is sometimes called without the vnode lock held.

commit 2aa7d15c38b0c01e3f724f04d7ed02ce11c82cc0
Author: Konstantin Belousov kostik...@gmail.com
Date:   Mon Jul 25 11:56:04 2011 +0300

    Postpone the vnode_pager_setsize() call until the nfs node mutex is dropped.
diff --git a/sys/nfsclient/nfs_subs.c b/sys/nfsclient/nfs_subs.c
index 19fde06..351885a 100644
--- a/sys/nfsclient/nfs_subs.c
+++ b/sys/nfsclient/nfs_subs.c
@@ -478,7 +478,9 @@ nfs_loadattrcache(struct vnode **vpp, struct mbuf **mdp, caddr_t *dposp,
 	struct timespec mtime, mtime_save;
 	int v3 = NFS_ISV3(vp);
 	int error = 0;
+	int do_setsize;
 
+	do_setsize = 0;
 	md = *mdp;
 	t1 = (mtod(md, caddr_t) + md->m_len) - *dposp;
 	cp2 = nfsm_disct(mdp, dposp, NFSX_FATTR(v3), t1, M_WAIT);
@@ -606,7 +608,7 @@ nfs_loadattrcache(struct vnode **vpp, struct mbuf **mdp, caddr_t *dposp,
 				np->n_size = vap->va_size;
 				np->n_flag |= NSIZECHANGED;
 			}
-			vnode_pager_setsize(vp, np->n_size);
+			do_setsize = 1;
 		} else {
 			np->n_size = vap->va_size;
 		}
@@ -643,6 +645,8 @@ nfs_loadattrcache(struct vnode **vpp, struct mbuf **mdp, caddr_t *dposp,
 	KDTRACE_NFS_ATTRCACHE_LOAD_DONE(vp, &np->n_vattr, 0);
 #endif
 	mtx_unlock(&np->n_mtx);
+	if (do_setsize)
+		vnode_pager_setsize(vp, np->n_size);
 out:
 #ifdef KDTRACE_HOOKS
 	if (error)
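The patch uses a common locking pattern: under the mutex, only record that the sleepable operation is needed, then do it after the mutex is dropped. Below is a userland sketch of that pattern with POSIX threads. struct node, pager_setsize() and load_attr() are hypothetical stand-ins for the NFS node, vnode_pager_setsize() and nfs_loadattrcache(). One difference from the patch: this sketch snapshots the size while the lock is still held, whereas the patch reads np->n_size after the unlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an NFS node: a size protected by a mutex. */
struct node {
	pthread_mutex_t	mtx;
	long		size;
};

static bool setsize_saw_mutex_held;
static long pager_size;

/*
 * Hypothetical stand-in for vnode_pager_setsize(): it may sleep, so it
 * must not run with n->mtx held.  trylock succeeds only if the mutex is free.
 */
static void
pager_setsize(struct node *n, long size)
{
	if (pthread_mutex_trylock(&n->mtx) == 0)
		pthread_mutex_unlock(&n->mtx);
	else
		setsize_saw_mutex_held = true;
	pager_size = size;
}

static void
load_attr(struct node *n, long new_size)
{
	bool do_setsize = false;
	long size;

	assert(n != NULL);
	pthread_mutex_lock(&n->mtx);
	if (n->size != new_size) {
		n->size = new_size;
		do_setsize = true;	/* remember the work, do not do it yet */
	}
	size = n->size;
	pthread_mutex_unlock(&n->mtx);

	/* The sleepable call happens only after the mutex is dropped. */
	if (do_setsize)
		pager_setsize(n, size);
}
```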
Re: disable 64-bit dma for one PCI slot only?
On Wed, Jul 20, 2011 at 11:54:06AM +0200, Stefan Esser wrote: The Rev column is required for devices that are not uniquely identified by their Vnd/Dev IDs. (These used to exist, e.g. the Symbios SCSI controllers, though I'm not aware of any device that needed a different driver depending on the PCI revision number.)

There may indeed be no device that requires a different driver depending on its revision, but there are definitely devices that require different workarounds in the driver depending on the revision. Seeing the revision in the pciconf output greatly helps to reduce mail round-trips when analyzing user reports.
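To illustrate why a driver cares about the revision, here is a minimal, hypothetical quirk table keyed on vendor ID, device ID and an inclusive revision range. All IDs, revision numbers and quirk flags below are invented for illustration and do not describe any real device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk flags a driver might apply. */
#define	QUIRK_NONE		0x0u
#define	QUIRK_NO_64BIT_DMA	0x1u
#define	QUIRK_NO_MSI		0x2u

struct pci_quirk {
	uint16_t	vendor;
	uint16_t	device;
	uint8_t		rev_min;	/* inclusive */
	uint8_t		rev_max;	/* inclusive */
	unsigned	quirks;
};

/* Invented entries: same vendor/device, different workarounds per revision. */
static const struct pci_quirk quirk_table[] = {
	{ 0x1234, 0x5678, 0x00, 0x01, QUIRK_NO_64BIT_DMA | QUIRK_NO_MSI },
	{ 0x1234, 0x5678, 0x02, 0x02, QUIRK_NO_MSI },
};

/* Return the quirks for a device; revisions not listed get none. */
static unsigned
pci_lookup_quirks(uint16_t vendor, uint16_t device, uint8_t rev)
{
	size_t i;

	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		const struct pci_quirk *q = &quirk_table[i];

		if (q->vendor == vendor && q->device == device &&
		    rev >= q->rev_min && rev <= q->rev_max)
			return (q->quirks);
	}
	return (QUIRK_NONE);
}
```

Without the revision in a user report, there is no way to tell which of these rows applies to the reporter's hardware.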
Re: panic: spin lock held too long (RELENG_8 from today)
On Thu, Jul 07, 2011 at 10:36:42AM +0300, Andriy Gapon wrote: on 07/07/2011 08:55 Mike Tancsa said the following: I did a buildworld on this box to bring it up to RELENG_8 for the BIND fixes. Unfortunately, the formerly solid box (April 13th kernel) panicked tonight with:

Unread portion of the kernel message buffer:
spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac8a0 (tid 100107) too long
panic: spin lock held too long
cpuid = 0
Uptime: 13h30m4s
Physical memory: 2035 MB

It's a somewhat busy box taking in mail as well as backups for a few servers over NFS. At the time, it would have been getting about 250Mb/s inbound on its gigabit interface. Full core.txt file at http://www.tancsa.com/core-jul8-2011.txt

I thought that this was supposed to contain the output of 'thread apply all bt' in kgdb. Anyway, I think that the stack trace for tid 100107 may have some useful information.

#0  doadump () at pcpu.h:231
231	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0  doadump () at pcpu.h:231
#1  0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429
#2  0xc06fd937 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:602
#3  0xc06ed95f in _mtx_lock_spin_failed (m=0x0) at /usr/src/sys/kern/kern_mutex.c:490
#4  0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:526
#5  0xc0720254 in sched_add (td=0xc5dac5c0, flags=0) at /usr/src/sys/kern/sched_ule.c:1119
#6  0xc07203f9 in sched_wakeup (td=0xc5dac5c0) at /usr/src/sys/kern/sched_ule.c:1950
#7  0xc07061f8 in setrunnable (td=0xc5dac5c0) at /usr/src/sys/kern/kern_synch.c:499
#8  0xc07362af in sleepq_resume_thread (sq=0xca0da300, td=0xc5dac5c0, pri=Variable pri is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:751
#9  0xc0736e18 in sleepq_signal (wchan=0xc5fafe50, flags=1, pri=0, queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:825
#10 0xc06b6764 in cv_signal (cvp=0xc5fafe50) at /usr/src/sys/kern/kern_condvar.c:422
#11 0xc08eaa0d in xprt_assignthread (xprt=Variable xprt is not available.
) at /usr/src/sys/rpc/svc.c:342
#12 0xc08ec502 in xprt_active (xprt=0xc95d9600) at /usr/src/sys/rpc/svc.c:378
#13 0xc08ee051 in svc_vc_soupcall (so=0xc6372ce0, arg=0xc95d9600, waitflag=1) at /usr/src/sys/rpc/svc_vc.c:747
#14 0xc075bbb1 in sowakeup (so=0xc6372ce0, sb=0xc6372d34) at /usr/src/sys/kern/uipc_sockbuf.c:191
#15 0xc08447bc in tcp_do_segment (m=0xcaa8d200, th=0xca6aa824, so=0xc6372ce0, tp=0xc63b4d20, drop_hdrlen=52, tlen=1448, iptos=0 '\0', ti_locked=2) at /usr/src/sys/netinet/tcp_input.c:1775
#16 0xc0847930 in tcp_input (m=0xcaa8d200, off0=20) at /usr/src/sys/netinet/tcp_input.c:1329
#17 0xc07ddaf7 in ip_input (m=0xcaa8d200) at /usr/src/sys/netinet/ip_input.c:787
#18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xcaa8d200) at /usr/src/sys/net/netisr.c:859
#19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xcaa8d200) at /usr/src/sys/net/netisr.c:946
#20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xcaa8d200) at /usr/src/sys/net/if_ethersubr.c:894
#21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xcaa8d200) at /usr/src/sys/net/if_ethersubr.c:753
#22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1) at /usr/src/sys/dev/nfe/if_nfe.c:2187
#23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440) at /usr/src/sys/kern/subr_taskqueue.c:248
#24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130) at /usr/src/sys/kern/subr_taskqueue.c:385
#25 0xc06d1027 in fork_exit (callout=0xc07388a0 <taskqueue_thread_loop>, arg=0xc56ff130, frame=0xc538ed28) at /usr/src/sys/kern/kern_fork.c:861
#26 0xc09a5c24 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275
(kgdb)

BTW, we had a similar panic (spin lock held too long, the spin lock being sched lock N) on a busy 8-core box recently upgraded to stable/8. Unfortunately, the machine hung while dumping core, so the stack trace for the owner thread was not available. I was unable to draw any conclusion from the data that was present.

If the situation is reproducible, you could try to revert r221937. This is pure speculation, though.
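The "spin lock held too long" panic comes from the kernel giving up after spinning on a spin mutex for too long, and reporting the owner it was waiting on. A userland sketch of that idea with C11 atomics follows. The spin limit, the names and the choice to return false instead of panicking are assumptions made for illustration; this is not the actual _mtx_lock_spin() logic, which also involves delays and debugger hooks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical spin lock: 0 means free, otherwise the owner's thread id. */
struct spin_lock {
	atomic_uintptr_t owner;
};

/*
 * Try to acquire the lock, spinning at most 'limit' times.  Where the
 * kernel would panic ("spin lock held too long"), this sketch returns
 * false and leaves the last observed owner in *holder for reporting.
 */
static bool
spin_lock_bounded(struct spin_lock *sl, uintptr_t tid, unsigned long limit,
    uintptr_t *holder)
{
	unsigned long spins;

	*holder = 0;
	for (spins = 0; spins < limit; spins++) {
		uintptr_t expected = 0;

		if (atomic_compare_exchange_strong(&sl->owner, &expected, tid))
			return (true);
		*holder = expected;	/* CAS failure loads the current owner */
	}
	return (false);
}

static void
spin_unlock(struct spin_lock *sl)
{
	atomic_store(&sl->owner, 0);
}
```

As in the panic message, the interesting datum on failure is the holder: the backtrace of that thread, not of the spinning one, usually explains the hang.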
Re: csh Cannot open /etc/termcap after starting screen
On Sat, Jun 18, 2011 at 10:14:32PM +0200, Stefan `Sec` Zehl wrote: Hi, On Thu, Jun 16, 2011 at 13:15 -0700, Jeremy Chadwick wrote: Example: run mutt from within GNU screen while connected to the system with PuTTY, then copy some of the terminal content and paste it somewhere. Wow, look at all those extraneous spaces at the end of lines, which you now gloriously have to manually remove.

While I don't want to stand in the way of your rant, this is actually a bug/problem of mutt. -- mutt is really printing spaces there, so it is (IMHO) correct that copy&paste copies spaces.

This is caused by the default termcap entry for screen. Try TERM=screen-bce mutt.
Re: doscmd under 8-stable, anyone?
On Wed, Jun 15, 2011 at 03:57:05PM +0200, Joerg Wunsch wrote: When trying to use doscmd on 8-stable, all I get is:

Error mapping HMA, HMA disabled: : Invalid argument
Segmentation fault (core dumped)

The segfault happens at the end of mem_init(), when the allocated DOS memory (which is located at virtual address 0) is written to. Apparently, the mmap() failure that causes the "HMA disabled" message is actually a fatal error rather than a benign one that could be ignored, as it results in no valid DOS memory allocation at all. Right now, the only older system I could test it against runs FreeBSD 5.x, where the mmap() works as expected. So does anyone have an idea why this mmap() call:

	if (mmap((caddr_t)0x00, 0x10,
	    PROT_EXEC | PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_FIXED | MAP_SHARED,
	    -1, 0) == MAP_FAILED) {
		perror("Error mapping HMA, HMA disabled: ");
		HMA_a20 = -1;
		close(HMA_fd_off);
		close(HMA_fd_on);
		return;
	}

yields EINVAL now under 8-stable?

Do sysctl security.bsd.map_at_zero=1
Re: doscmd under 8-stable, anyone?
On Wed, Jun 15, 2011 at 04:44:55PM +0200, Joerg Wunsch wrote: As Kostik Belousov wrote: Do sysctl security.bsd.map_at_zero=1

Just for the record, this sysctl also makes my really, really old utree binary work again. The binary dates back to 386BSD 0.0, and I'm only keeping it out of curiosity:

j@uriah 66% ls -l /usr/local/bin/utree
-rwxr-xr-x  1 bin  bin  179639 Apr 30  1992 /usr/local/bin/utree*

The only thing needed to make it run is a termcap entry smaller than 1024 bytes, as this used to be a hard-coded limitation in the termcap library of those days, and the binary is statically linked. TERM=vt100 works, xterm no longer does. The ability to run this binary only serves as proof that no backward compatibility has ever been broken in FreeBSD. ;-) (Obviously, all the various COMPAT_* options must be present in the kernel config.)

Yes, doscmd and N-magic a.out binaries were the arguments for implementing the sysctl instead of outright disabling the mapping at address 0. You are the first documented case of the wisdom of that decision :). BTW, I semi-jokingly committed support for the FreeBSD-1.0/i386 ABI on amd64 on April 1. It would be interesting to see how your binary behaves.
Re: automoc4 processes lock again
On Mon, May 09, 2011 at 12:40:56PM +0400, Max Brazhnikov wrote: Hi, After the recent Qt-4.7.3 update I can't build KDE4 ports anymore (tested on 8.2-STABLE amd64 only). The problem is always reproducible with x11/kdelibs4. The build stalls with hanging automoc4 processes. Any help is appreciated.

# ps | grep automoc
18636  3  IN+    0:00.02 /usr/local/bin/automoc4 /usr/obj/usr/local/tinderbox/portstrees/FreeBSD/ports/x11/kdelibs4/work/kdelibs-4.6.3/build/kde3support/
18640  3  IN+    0:00.00 /usr/local/bin/automoc4 /usr/obj/usr/local/tinderbox/portstrees/FreeBSD/ports/x11/kdelibs4/work/kdelibs-4.6.3/build/kde3support/
# gdb automoc4 18636
...
Reading symbols from /lib/libthr.so.3...done.
[New Thread 801c0ae40 (LWP 100660/automoc4)]
[New Thread 801c041c0 (LWP 100590/initial thread)]
...
[Switching to Thread 801c0ae40 (LWP 100660/automoc4)]
0x00080104c99c in select () at select.S:3
3	RSYSCALL(select)
(gdb) bt
#0  0x00080104c99c in select () at select.S:3
#1  0x0008008502cd in QProcessManager::run (this=0x800b196e0) at io/qprocess_unix.cpp:245
#2  0x000800749bde in QThreadPrivate::start (arg=0x800b196e0) at thread/qthread_unix.cpp:320
#3  0x0008017985e1 in thread_start (curthread=0x801c0ae40) at /usr/freebsd/8/src/lib/libthr/thread/thr_create.c:288
#4  0x in ?? ()
Error accessing memory address 0x7fbff000: Bad address.
Current language:  auto; currently asm
# gdb automoc4 18640
...
0x0008017a24cc in _umtx_op_err () at /usr/freebsd/8/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
37	RSYSCALL_ERR(_umtx_op)
(gdb) bt
#0  0x0008017a24cc in _umtx_op_err () at /usr/freebsd/8/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
#1  0x0008017a21bc in __thr_umutex_lock (mtx=0x8018a7380, id=100590) at /usr/freebsd/8/src/lib/libthr/thread/thr_umtx.c:58
#2  0x00080179d04a in init_static (thread=0x801c041c0, mutex=0x801166c78) at thr_umtx.h:88
#3  0x00080179d7ad in __pthread_mutex_lock (mutex=0x801166c78) at /usr/freebsd/8/src/lib/libthr/thread/thr_mutex.c:441
#4  0x00080104b21e in _flockfile (fp=0x801166be0) at /usr/freebsd/8/src/lib/libc/stdio/_flock_stub.c:70
#5  0x000801021515 in fileno (fp=0x801166be0) at /usr/freebsd/8/src/lib/libc/stdio/fileno.c:52
#6  0x00080084f109 in QProcessPrivate::execChild (this=0x801c51600, workingDir=0x0, path=0x0, argv=0x801c5b7c0, envp=0x0) at io/qprocess_unix.cpp:712
#7  0x000800851fc3 in QProcessPrivate::startProcess (this=0x801c51600) at io/qprocess_unix.cpp:665
#8  0x000800802248 in QProcess::start (this=0x7fffcd10, program=@0x7fffd8f8, arguments=@0x7fffcd00, mode=@0x7fffcd20) at io/qprocess.cpp:1960
#9  0x0040acd2 in AutoMoc::echoColor (this=0x7fffd8d0, msg=@0x7fffce80) at /usr/obj/usr/ports/devel/automoc4/work/automoc4-0.9.88/kde4automoc.cpp:73
#10 0x0040517c in AutoMoc::generateMoc (this=0x7fffd8d0, sourceFile=@0x801c0f910, mocFileName=@0x801c0f918) at /usr/obj/usr/ports/devel/automoc4/work/automoc4-0.9.88/kde4automoc.cpp:569
#11 0x00408011 in AutoMoc::run (this=0x7fffd8d0) at /usr/obj/usr/ports/devel/automoc4/work/automoc4-0.9.88/kde4automoc.cpp:470
#12 0x00409135 in main (argc=6, argv=0x7fffd9a8) at /usr/obj/usr/ports/devel/automoc4/work/automoc4-0.9.88/kde4automoc.cpp:114
Current language:  auto; currently asm

You did not supply enough information. Which of the processes is the parent, and which is the child? Note that there are other threads in pid 18636. What do they do?

If you allow me to make a guess, I would assume that pid 18640 is the child. Note that the child is waiting for the pthread mutex which protects the stdio FILE structure. Now, assume additionally that the parent had the FILE locked in one thread while another thread did the fork. Then the child process would never be able to obtain the lock, because the lock was acquired by a thread that no longer exists (in the child process, only the thread that called fork is duplicated).

In fact, I believe that you already reported a similar problem with malloc(3) some time ago. The root of the problem is the undefined (and permitted by POSIX) behaviour of calling non-async-signal-safe functions in a multithreaded process after fork. For malloc(3), this can be argued to be a quality-of-implementation issue, but there is no reason to specially handle random mutexes, even from libc. If the mutex was locked at fork time, the protected data structure is arguably in an inconsistent state in the child after the fork.
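A common way to stay within the POSIX rules is to do all formatting and allocation in the parent before fork(), and to restrict the child to async-signal-safe calls such as write(2), close(2), execve(2) and _exit(2). The sketch below is a hypothetical helper, not code from Qt or automoc4: the child reports a prepared message over a pipe and exits without touching stdio (so no FILE lock, unlike fileno() in frame #5 above) or malloc(3).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that writes 'msg' to a pipe and exits.  Everything the
 * child needs is prepared before fork(), so the child only calls the
 * async-signal-safe close(), write() and _exit().
 * Returns the number of bytes the parent read, or -1 on setup failure.
 */
static ssize_t
fork_and_report(const char *msg, char *buf, size_t buflen)
{
	size_t len = strlen(msg);	/* computed in the parent */
	ssize_t total = 0, n;
	int fds[2], status;
	pid_t pid;

	if (pipe(fds) == -1)
		return (-1);
	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Child: no stdio, no malloc, no locks another thread may hold. */
		close(fds[0]);
		if (write(fds[1], msg, len) != (ssize_t)len)
			_exit(1);
		_exit(0);
	}
	close(fds[1]);
	while ((size_t)total < buflen &&
	    (n = read(fds[0], buf + total, buflen - (size_t)total)) != 0) {
		if (n == -1) {
			if (errno == EINTR)
				continue;
			break;
		}
		total += n;
	}
	close(fds[0]);
	while (waitpid(pid, &status, 0) == -1 && errno == EINTR)
		;
	return (total);
}
```

When the child only needs to run another program, posix_spawn(3) avoids hand-written child code altogether.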
Re: automoc4 processes lock again
On Mon, May 09, 2011 at 07:39:46PM +0400, Max Brazhnikov wrote: On Mon, 9 May 2011 15:41:05 +0300, Kostik Belousov wrote: You did not supply enough information. Which of the processes is the parent, and which is the child? Note that there are other threads in pid 18636. What do they do?

Here are backtraces from all threads: http://people.freebsd.org/~makc/automoc4.bt 63373 is the parent now, 63374 is the child. There were no related changes in the Qt4 and automoc4 sources; probably my update from 8.2-PRERELEASE to STABLE a week ago triggered the issue.

It is obviously an application bug; yes, I think my guess was right. Thou shalt not call non-async-signal-safe functions in thy child of a multithreaded process. Since it is a race, I find it more curious that it did not manifest itself previously.
Re: Kernel memory leak in 8.2-PRERELEASE?
On Sat, Apr 02, 2011 at 10:17:27AM -0400, Boris Kochergin wrote: Ahoy. This morning, I awoke to the following on one of my servers:

pid 59630 (httpd), uid 80, was killed: out of swap space
pid 59341 (find), uid 0, was killed: out of swap space
pid 23134 (irssi), uid 1001, was killed: out of swap space
pid 49332 (sshd), uid 1001, was killed: out of swap space
pid 69074 (httpd), uid 0, was killed: out of swap space
pid 11879 (eggdrop-1.6.19), uid 1001, was killed: out of swap space
...

And so on. The machine is:

FreeBSD exodus.poly.edu 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #2: Thu Dec 2 11:39:21 EST 2010 sp...@exodus.poly.edu:/usr/obj/usr/src/sys/EXODUS amd64
10:13AM  up 120 days, 20:06, 2 users, load averages: 0.00, 0.01, 0.00

The memory line from top intrigued me:

Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, 828M Buf, 605M Free

The machine has 8 gigs of memory, and I don't know what all that wired memory is being used for. There is a large-ish (6 x 1.5-TB) ZFS RAID-Z2 on it which has had a disk in the UNAVAIL state for a few months:

# zpool status
  pool: home
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        home        DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    UNAVAIL  08511  experienced I/O failures

errors: No known data errors

vmstat -m and vmstat -z output:
http://acm.poly.edu/~spawk/vmstat-m.txt
http://acm.poly.edu/~spawk/vmstat-z.txt

Anyone have a clue? I know it's just going to happen again if I reboot the machine. It is still up in case there are diagnostics for me to run.

Try r218795. Most likely, your issue is not a leak.
Re: Problem using POSIX message queues
On Mon, Mar 28, 2011 at 01:19:38PM -0400, Derek Tattersall wrote: While trying to develop an understanding of the use of POSIX message queues, I found that issuing an mq_open(2) call resulted in a "Bad system call: 12" error message. I have tried to run the tools/regression/mqueue tests, but they fail in mq_open with the bad system call error. In addition, the mq_open(2) man page refers to mq_timedreceive(3) and mq_timedsend(3), which exist as section 2 man pages, and to an mq_unlink(3) man page, which I can't find at all.

Try kldload mqueuefs before running the tests.
Re: [releng_7 tinderbox] failure on ia64/ia64
On Sun, Mar 13, 2011 at 01:04:05PM +, FreeBSD Tinderbox wrote:

TB --- 2011-03-13 11:05:21 - tinderbox 2.6 running on freebsd-legacy.sentex.ca
TB --- 2011-03-13 11:05:21 - starting RELENG_7 tinderbox run for ia64/ia64
TB --- 2011-03-13 11:05:21 - cleaning the object tree
TB --- 2011-03-13 11:05:35 - cvsupping the source tree
TB --- 2011-03-13 11:05:35 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca -s /usr/home/tinderbox/RELENG_7/ia64/ia64/supfile
TB --- 2011-03-13 11:05:42 - building world
TB --- 2011-03-13 11:05:42 - MAKEOBJDIRPREFIX=/obj
TB --- 2011-03-13 11:05:42 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2011-03-13 11:05:42 - TARGET=ia64
TB --- 2011-03-13 11:05:42 - TARGET_ARCH=ia64
TB --- 2011-03-13 11:05:42 - TZ=UTC
TB --- 2011-03-13 11:05:42 - __MAKE_CONF=/dev/null
TB --- 2011-03-13 11:05:42 - cd /src
TB --- 2011-03-13 11:05:42 - /usr/bin/make -B buildworld
World build started on Sun Mar 13 11:05:44 UTC 2011
Rebuilding the temporary build tree
stage 1.1: legacy release compatibility shims
stage 1.2: bootstrap tools
stage 2.1: cleaning up the object tree
stage 2.2: rebuilding the object tree
stage 2.3: build tools
stage 3: cross tools
stage 4.1: building includes
stage 4.2: building libraries
stage 4.3: make dependencies
stage 4.4: building everything
World build completed on Sun Mar 13 12:50:12 UTC 2011
TB --- 2011-03-13 12:50:12 - generating LINT kernel config
TB --- 2011-03-13 12:50:12 - cd /src/sys/ia64/conf
TB --- 2011-03-13 12:50:12 - /usr/bin/make -B LINT
TB --- 2011-03-13 12:50:12 - building LINT kernel
TB --- 2011-03-13 12:50:12 - MAKEOBJDIRPREFIX=/obj
TB --- 2011-03-13 12:50:12 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2011-03-13 12:50:12 - TARGET=ia64
TB --- 2011-03-13 12:50:12 - TARGET_ARCH=ia64
TB --- 2011-03-13 12:50:12 - TZ=UTC
TB --- 2011-03-13 12:50:12 - __MAKE_CONF=/dev/null
TB --- 2011-03-13 12:50:12 - cd /src
TB --- 2011-03-13 12:50:12 - /usr/bin/make -B buildkernel KERNCONF=LINT
Kernel build for LINT started on Sun Mar 13 12:50:12 UTC 2011
stage 1: configuring the kernel
stage 2.1: cleaning up the object tree
stage 2.2: rebuilding the object tree
stage 2.3: build tools
stage 3.1: making dependencies
stage 3.2: building everything
[...]
cc -c -O2 -pipe -fno-strict-aliasing -std=c99 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc -I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -finline-limit=15000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 -mfixed-range=f32-f127 -fpic -ffreestanding -Werror /src/sys/kern/imgact_shell.c
/src/sys/kern/imgact_shell.c: In function 'exec_shell_imgact':
/src/sys/kern/imgact_shell.c:238: error: invalid storage class for function 'shell_modevent'
cc1: warnings being treated as errors
/src/sys/kern/imgact_shell.c:238: warning: no previous prototype for 'shell_modevent'
/src/sys/kern/imgact_shell.c:238: error: initializer element is not constant
/src/sys/kern/imgact_shell.c:238: error: (near initialization for 'shell_mod.evhand')
/src/sys/kern/imgact_shell.c:238: error: expected declaration or statement at end of input
*** Error code 1

Stop in /obj/ia64/src/sys/LINT.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2011-03-13 13:04:05 - WARNING: /usr/bin/make returned exit code 1
TB --- 2011-03-13 13:04:05 - ERROR: failed to build lint kernel
TB --- 2011-03-13 13:04:05 - 5823.39 user 719.80 system 7123.70 real

I committed from the wrong tree, sorry. Should be fixed now.
Re: Linker set issues with ath(4) HALs
On Sat, Mar 05, 2011 at 07:50:05PM +1100, Peter Jeremy wrote: I have an Atheros AR5424 and so, based on the 8.2-STABLE i386 NOTES and some rummaging in the sources, I tried to build a kernel with:

device		ath		# Atheros pci/cardbus NIC's
device		ath_ar5212	# HAL for Atheros AR5212 and derived chips
device		ath_rate_sample	# SampleRate tx rate control for ath

and this died during the kernel linking with:

linking kernel.debug
ah.o(.text+0x23c): In function `ath_hal_rfprobe':
/usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__start_set_ah_rfs'
ah.o(.text+0x241):/usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__stop_set_ah_rfs'
ah.o(.text+0x25a):/usr/src/sys/dev/ath/ath_hal/ah.c:142: undefined reference to `__stop_set_ah_rfs'

Following a suggestion by a friend, I changed that to:

device		ath		# Atheros pci/cardbus NIC's
options 	AH_SUPPORT_AR5416
device		ath_hal		# Atheros HAL
device		ath_rate_sample	# SampleRate tx rate control for ath

and it worked. Normally, I would leave it at that, but I'd like to understand what is actually going on... In both cases, ah.o contains the following 4 references:

         U __start_set_ah_chips
         U __start_set_ah_rfs
         U __stop_set_ah_chips
         U __stop_set_ah_rfs

generated by:

/* linker set of registered chips */
OS_SET_DECLARE(ah_chips, struct ath_hal_chip);

/* linker set of registered RF backends */
OS_SET_DECLARE(ah_rfs, struct ath_hal_rf);

These symbols do not appear in any other .o files, though there are a variety of other __{start,stop}_set_* symbols, all of which show up as 'A' (absolute) values in the final kernel. My questions are: How are these linker set references resolved? I can't find anything that defines these symbols, either in .o files or in ldscript files. In the first case, there are 2 pairs of undefined linker set variables, but the linker only reports one pair as unresolved. Why don't both sets show up as resolved or unresolved? Why does using the generic ath_hal, rather than the hardware-specific HAL, fix the problem?

The linker synthesizes the symbols provided that the following two conditions are met:
- the symbols are referenced;
- there exists an ELF section named `set_ah_rfs'.
It assigns the (relocated) start of the section to __start_sectionname, and the end to __stop_sectionname. Most likely, omitting the option causes some SET_ENTRY() macro, which puts a symbol into the linker set, to be omitted. Then no section is created, and the linker does not synthesize the missing symbols.
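The mechanism described above can be reproduced outside the kernel. The sketch below, assuming GCC or Clang with a GNU-compatible ELF linker (GNU ld or LLD), places pointers to entries in a section named set_demo and walks them through the linker-synthesized __start_set_demo and __stop_set_demo symbols. Like the kernel's linker-set macros it stores pointers rather than the objects themselves, but the DEMO_SET_ENTRY macro and all other names here are hypothetical. If every DEMO_SET_ENTRY() line were removed, no set_demo section would exist and the link would fail with the same kind of undefined __start_/__stop_ references seen in the ath(4) build.

```c
#include <assert.h>
#include <stddef.h>

struct demo_item {
	const char	*name;
	int		value;
};

/*
 * Hypothetical analogue of a SET_ENTRY()-style macro: emit a pointer to
 * 'sym' into the ELF section "set_<set>".  'used' keeps the compiler
 * from discarding the otherwise unreferenced pointer.
 */
#define	DEMO_SET_ENTRY(set, sym)					\
	static const struct demo_item *const __set_##set##_##sym	\
	    __attribute__((used, section("set_" #set))) = &sym

static const struct demo_item alpha = { "alpha", 1 };
static const struct demo_item beta = { "beta", 2 };
static const struct demo_item gamma_item = { "gamma", 3 };

DEMO_SET_ENTRY(demo, alpha);
DEMO_SET_ENTRY(demo, beta);
DEMO_SET_ENTRY(demo, gamma_item);

/* Synthesized by the linker because a section named set_demo exists. */
extern const struct demo_item *const __start_set_demo[];
extern const struct demo_item *const __stop_set_demo[];

/* Walk the set and sum the values; entry order is not guaranteed. */
static int
demo_set_sum(size_t *count)
{
	const struct demo_item *const *p;
	int sum = 0;

	*count = 0;
	for (p = __start_set_demo; p < __stop_set_demo; p++) {
		sum += (*p)->value;
		(*count)++;
	}
	return (sum);
}
```

The section name must be a valid C identifier for the linker to provide the __start_/__stop_ symbols, which is why the kernel names its sections set_<name>.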
Re: svn commit: r219178 - head/sys/crypto/aesni
On Wed, Mar 02, 2011 at 02:56:58PM +, Konstantin Belousov wrote:

Author: kib
Date: Wed Mar  2 14:56:58 2011
New Revision: 219178
URL: http://svn.freebsd.org/changeset/base/219178

Log:
  Fix a bug in the result of manual assembly.

  Reported by:	Stefan Grundmann sg2342 googlemail com
  PR:		kern/155118
  MFC after:	3 days

The end result of this bug should affect only the AES256 variants, causing an incorrect key schedule to be calculated. If you have a geli partition with a 256-bit key that worked with the previous version of aesni(4), the best strategy is to back up, reinitialize the geli volume with the new driver, then restore. Sorry.

Modified:
  head/sys/crypto/aesni/aeskeys_amd64.S
  head/sys/crypto/aesni/aeskeys_i386.S

Modified: head/sys/crypto/aesni/aeskeys_amd64.S
==
--- head/sys/crypto/aesni/aeskeys_amd64.S	Wed Mar  2 14:39:26 2011	(r219177)
+++ head/sys/crypto/aesni/aeskeys_amd64.S	Wed Mar  2 14:56:58 2011	(r219178)
@@ -162,7 +162,7 @@ ENTRY(aesni_set_enckey)
 	.byte 0x66,0x0f,0x3a,0xdf,0xc8,0x20
 	call	_key_expansion_256b
 	// aeskeygenassist $0x40,%xmm2,%xmm1	# round 7
-	.byte 0x66,0x0f,0x3a,0xdf,0xca,0x20
+	.byte 0x66,0x0f,0x3a,0xdf,0xca,0x40
 	call	_key_expansion_256a
 	retq
 .Lenc_key192:

Modified: head/sys/crypto/aesni/aeskeys_i386.S
==
--- head/sys/crypto/aesni/aeskeys_i386.S	Wed Mar  2 14:39:26 2011	(r219177)
+++ head/sys/crypto/aesni/aeskeys_i386.S	Wed Mar  2 14:56:58 2011	(r219178)
@@ -167,7 +167,7 @@ ENTRY(aesni_set_enckey)
 	.byte 0x66,0x0f,0x3a,0xdf,0xc8,0x20
 	call	_key_expansion_256b
 	// aeskeygenassist $0x40,%xmm2,%xmm1	# round 7
-	.byte 0x66,0x0f,0x3a,0xdf,0xca,0x20
+	.byte 0x66,0x0f,0x3a,0xdf,0xca,0x40
 	call	_key_expansion_256a
 	.cfi_adjust_cfa_offset -4
 	leave
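For context on the one-byte fix: the last byte of each hand-encoded `.byte 0x66,0x0f,0x3a,0xdf,...` sequence is the imm8 operand of aeskeygenassist, i.e. the AES round constant (Rcon). AES-256 key expansion uses the Rcon values 0x01 through 0x40, so the round-7 call must pass 0x40, while the buggy encoding passed 0x20, the constant for round 6. The following sketch (an illustration, not part of the aesni(4) driver) computes the Rcon sequence by repeated doubling in GF(2^8) with the AES reduction polynomial.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by x (i.e. by 2) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t
gf_xtime(uint8_t b)
{
	return ((uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00)));
}

/* Round constant for key-expansion round i (1-based): Rcon(1) = 0x01. */
static uint8_t
aes_rcon(unsigned int i)
{
	uint8_t r = 0x01;

	assert(i >= 1);
	while (--i > 0)
		r = gf_xtime(r);
	return (r);
}
```

Rounds 1 through 7 yield 0x01, 0x02, 0x04, 0x08, 0x10, 0x20 and 0x40; the reduction only kicks in from round 9 (0x1b) onward, which AES-256 never reaches.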
Re: FYI: Userspace DTrace MFC to stable/8
On Tue, Mar 01, 2011 at 11:03:07AM +0200, Nikolay Denev wrote: On 1 Mar, 2011, at 01:33, Robert Watson wrote: Dear all: Just an FYI that I've gone ahead and merged userspace DTrace support to FreeBSD 8.x from 9.x. While it appeared to pass build tests locally, boot and run, etc., this is a non-trivial merge, and it's possible I've messed up. If so, apologies in advance, and I'll try to resolve any problems as quickly as I can! And of course, many thanks go to Rui Paulo, who did the port of userspace DTrace to FreeBSD 9.x with support from the FreeBSD Foundation! Thanks, Robert N M Watson Computer Laboratory University of Cambridge

That's great news! Many thanks to all that made this possible! I have a quick question though: do I now have to rebuild my world with WITH_CTF? I'm asking because I did that by mistake some months ago on a RELENG_8 machine, and the world that was built had some problems, like gcc segfaulting (signal 11) while compiling world or some ports.

It was a known issue that ctfconvert (I think it is ctfconvert) damages statically linked binaries. Most likely, it has not been fixed yet.
Re: FreeBSD 8.2 Release, ZFS + Samba, running out of memory
On Tue, Feb 22, 2011 at 10:55:37PM +0100, Henner Heck wrote: Hello, I experience freezing of my FreeBSD machine when performing certain operations on a Samba share.

Technical info:
- FreeBSD 8.2 Release 64 Bit (it also happened with 8.2 RC3)
- Samba 3.5.6.1
- Athlon II Quadcore, 4 GB RAM
- 1 SSD with a ZFS pool (No.0) containing the FreeBSD system
- 12x2TB RaidZ2 pool (No.1) for data, created on 12 GEOM eli encrypted partitions on 12 disks, shared to a Windows 7 PC with Samba; 8 of the disks are attached to 2 Marvell SATA controllers, 4 to the onboard controller
- ZPool v15, ZFS v4

Scenarios (checked using top):
A: When copying files from one directory in pool 1 to another, the free memory drops from about 3700M to about 200M in the process, but then seems to stabilize.
B: When copying the files onto a Windows machine using the Samba share, the free memory seems to stabilize at about 100M.
C: When computing a hash value of files from the share on Windows, or doing a binary compare to copies of the files stored on the Windows PC (using Total Commander), the free memory on the FreeBSD machine drops even lower, and shortly afterwards the BSD system freezes.
Here is the last top output I got via ssh:

last pid:  1328;  load averages:  4.53,  2.23,  0.99    up 0+00:04:39  22:07:50
263 processes: 43 running, 201 sleeping, 19 waiting
CPU:  0.9% user,  0.0% nice, 23.1% system,  4.2% interrupt, 71.9% idle
Mem: 720K Active, 516M Wired, 144K Cache, 320K Buf, *39M Free*
Swap: 4096M Total, 12M Used, 4084M Free, 3008K In, 5124K Out

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root        4 171 ki31     0K    64K RUN     0  15:54 303.61% idle
 1321 root        1  52    0 27812K   704K swread  1   0:24  14.26% smbd
   12 root       19 -60    -     0K   304K WAIT    0   0:21  12.45% intr
   16 root        1  48    -     0K    16K psleep  2   0:01   3.76% pagedaemon
    3 root        1  -8    -     0K    16K RUN     0   0:06   3.27% g_up
    4 root        1  -8    -     0K    16K -       3   0:05   2.69% g_down
    0 root      108  -8    0     0K  1712K -       0   1:02   1.86% kernel
    8 root        6  -8    -     0K    88K tx-tx   1   0:00   1.27% zfskern
 1268 root        1  44    -     0K    16K geli:w  1   0:03   0.98% g_eli[1] gpt
 1225 root        1  45    -     0K    16K RUN     3   0:02   0.98% g_eli[3] gpt
 1267 root        1  44    -     0K    16K geli:w  0   0:02   0.98% g_eli[0] gpt
 1237 root        1  44    -     0K    16K RUN     0   0:02   0.88% g_eli[0] gpt
 1214 root        1  44    -     0K    16K RUN     2   0:02   0.88% g_eli[2] gpt
 1244 root        1  44    -     0K    16K RUN     2   0:02   0.78% g_eli[2] gpt
 1243 root        1  44    -     0K    16K RUN     1   0:02   0.78% g_eli[1] gpt
 1212 root        1  44    -     0K    16K RUN     0   0:02   0.78% g_eli[0] gpt
 1215 root        1  44    -     0K    16K RUN     3   0:02   0.78% g_eli[3] gpt
 1213 root        1  44    -     0K    16K RUN     1   0:02   0.78% g_eli[1] gpt
 1240 root        1  44    -     0K    16K RUN     3   0:02   0.78% g_eli[3] gpt
 1217 root        1  44    -     0K    16K RUN     0   0:02   0.78% g_eli[0] gpt
 1242 root        1  44    -     0K    16K RUN     0   0:02   0.68% g_eli[0] gpt
 1238 root        1  44    -     0K    16K RUN     1   0:02   0.68% g_eli[1] gpt
 1248 root        1  44    -     0K    16K RUN     1   0:02   0.68% g_eli[1] gpt
 1252 root        1  44    -     0K    16K RUN     0   0:02   0.68% g_eli[0] gpt
 1249 root        1  44    -     0K    16K RUN     2   0:02   0.68% g_eli[2] gpt
 1269 root        1  44    -     0K    16K geli:w  2   0:02   0.68% g_eli[2] gpt

It looks like a caching problem to me, but I don't know how to fix it. I am also a bit confused, since I don't see an obvious difference between scenarios B and C. I had a similar setup with a 5-disk RaidZ1 and Samba running on 8.1 Release, and never experienced such a freeze.
Does anyone have advice on how to get rid of this problem?

Try the patch from rev. 218795. If it indeed helps, we would need an errata notice.
Re: About panic: bufwrite: buffer is not busy???
On Sun, Feb 20, 2011 at 10:30:52AM -0500, Mike Tancsa wrote: On 2/20/2011 9:33 AM, Andrey Smagin wrote: With a week-old -current I have the same problem; my box panicked every 2-15 minutes. I resolved the problem with the following step: unplugging the network connectors from 2 Intel em (82574L) cards. Last time I thought it was an mpd5-related panic, but mpd5 works with another re interface integrated on the motherboard. I think it may be an em-related panic, or em+mpd5.

The latest panic I saw didn't have anything to do with em. Are you sure your crashes are caused by the NIC driver? The latest I saw was on Friday.

# kgdb /usr/obj/usr/src/sys/router/kernel.debug vmcore.11
(kgdb) bt
#0  doadump () at pcpu.h:231
#1  0xc04a51f9 in db_fncall (dummy1=1, dummy2=0, dummy3=-106856, dummy4=0xc6b9696c ) at /usr/src/sys/ddb/db_command.c:548
#2  0xc04a55f1 in db_command (last_cmdp=0xc096f73c, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:445
#3  0xc04a574a in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xc04a764d in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:229
#5  0xc068ba7e in kdb_trap (type=12, code=0, tf=0xc6b96b94) at /usr/src/sys/kern/subr_kdb.c:546
#6  0xc088056f in trap_fatal (frame=0xc6b96b94, eva=52) at /usr/src/sys/i386/i386/trap.c:937
#7  0xc0880830 in trap_pfault (frame=0xc6b96b94, usermode=0, eva=52) at /usr/src/sys/i386/i386/trap.c:859
#8  0xc0880d4a in trap (frame=0xc6b96b94) at /usr/src/sys/i386/i386/trap.c:532
#9  0xc086716c in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#10 0xc0657a16 in uihold (uip=0x0) at /usr/src/sys/kern/kern_resource.c:1248
#11 0xc0654ec9 in crcopy (dest=0xce3ee800, src=0xce3ee600) at /usr/src/sys/kern/kern_prot.c:1873
#12 0xc0654fd1 in crcopysafe (p=0xc90cc810, cr=0xce3ee800) at /usr/src/sys/kern/kern_prot.c:1950
#13 0xc0656d7f in seteuid (td=0xc9196b80, uap=0xc6b96cec) at /usr/src/sys/kern/kern_prot.c:615
#14 0xc06985ff in syscallenter (td=0xc9196b80, sa=0xc6b96ce4) at /usr/src/sys/kern/subr_trap.c:315
#15 0xc0880884 in syscall (frame=0xc6b96d28) at /usr/src/sys/i386/i386/trap.c:1061
#16 0xc08671d1 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:264
#17 0x0033 in ?? ()
(kgdb) frame 10
#10 0xc0657a16 in uihold (uip=0x0) at /usr/src/sys/kern/kern_resource.c:1248
1248	{
(kgdb) list
1243	 * Place another refcount on a uidinfo struct.
1244	 */
1245	void
1246	uihold(uip)
1247		struct uidinfo *uip;
1248	{
1249	
1250		refcount_acquire(&uip->ui_ref);
1251	}
1252	
(kgdb) p *uip
Cannot access memory at address 0x0
(kgdb) p uip
$1 = (struct uidinfo *) 0x0
(kgdb)

Is this reproducible? What system version is it? Could you please go to frame 12 and show the output of p *p and p *(p->p_ucred)?
Re: minor data-typing error in 8.1 fs/devfs/devfs_vnops.c
On Mon, Feb 07, 2011 at 12:53:14AM -0800, per...@pluto.rain.com wrote:
Noticed while digging through devfs_read_f() and devfs_write_f() in the course of investigating some unexpected (by me) geom behavior:

	...
	int ioflag, error, resid;
	...
	resid = uio->uio_resid;
	...
	if (uio->uio_resid != resid || ...

IOW resid (an int) is being assigned from and compared with uio->uio_resid (an ssize_t). I suppose it's probably harmless on any arch where an (int) is at least as large as an (ssize_t), but strictly speaking it does look like a bug -- or am I missing something?

The only consequence of resid truncating uio_resid would be a failure to update the access time of the devfs node, which is probably not a big issue. In fact, HEAD cannot generate a request for I/O greater than 4GB anyway. The type of uio_resid was increased from int to ssize_t so as not to break the KBI later and to ease the intended fix that supports full size_t arguments for read(2)/write(2). That change requires lots of careful review, and thus stalled. I integrated your fix into the patch, see http://people.freebsd.org/~kib/misc/uio_resid.4.patch
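To make the truncation concern concrete, here is a small user-space sketch in C. It is not the devfs code: `resid_fits_int()` and `io_made_progress()` are hypothetical helpers, and the sketch only assumes POSIX `ssize_t` and `SSIZE_MAX`. On an LP64 platform a residual count above `INT_MAX` cannot be held in an `int`, while keeping the saved copy as `ssize_t` keeps the before/after comparison exact.

```c
#include <limits.h>
#include <stdbool.h>
#include <sys/types.h>

/* True if an ssize_t residual count can be stored in an int without loss. */
static bool
resid_fits_int(ssize_t resid)
{
	return (resid >= INT_MIN && resid <= INT_MAX);
}

/*
 * Hypothetical helper mirroring the devfs pattern: the residual count is
 * saved before the I/O and compared afterwards to detect progress.  Saving
 * it as ssize_t (the type of uio_resid) makes the comparison exact.
 */
static bool
io_made_progress(ssize_t resid_before, ssize_t resid_after)
{
	return (resid_after != resid_before);
}
```

On a 32-bit platform where `ssize_t` and `int` have the same width, `resid_fits_int()` is always true, which matches the "probably harmless" remark above.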
Re: Xorg in swwrt
On Sun, Feb 06, 2011 at 03:19:12PM +1030, Daniel O'Connor wrote:
I updated ports (portmaster -a basically) on this 8.2-PRE box and now I find X takes a long, long time to start up and uses lots of CPU. It shows the wchan as swwrt, e.g.:

last pid: 21791;  load averages:  0.12,  0.29,  0.23  up 0+16:09:07  15:16:15
496 processes: 2 running, 494 sleeping
CPU:  0.0% user,  0.0% nice, 46.7% system,  0.0% interrupt, 53.3% idle
Mem: 190M Active, 33M Inact, 3217M Wired, 198M Cache, 15M Buf, 171M Free
Swap: 4096M Total, 621M Used, 3475M Free, 15% Inuse, 212K Out

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
21787 fiona       1  76    0   168M   134M swwrt   0   0:04 32.37% Xorg

swwrt means waiting for the synchronous swap-out to finish. This is consistent with top indicating a non-trivial amount of swap space in use and a swap-out happening right now. Look at the working set of the application you are starting. Another thing that stands out is the huge wired count.

21788 darius      1  76    0 31860K  4868K pause   1   0:00  1.17% zsh
 2081 darius      4  44    0   113M 11620K ucond   1   9:45  0.10% python2.6
  656 root        1  44    0 24392K  1096K select  1   3:44  0.00% ppp
 1881 darius     32  52    0   135M  8804K uwait   0   2:24  0.00% python2.6

Does anyone else see this?
If it matters I am using the xf86-video-ati driver:

(II) RADEON(0): vgaHWGetIOBase: hwp->IOBase is 0x03d0, hwp->PIOOffset is 0x
(==) RADEON(0): RGB weight 888
(II) RADEON(0): Using 8 bits per RGB (8 bit DAC)
(--) RADEON(0): Chipset: ATI Radeon HD 4200 (ChipID = 0x9710)
(--) RADEON(0): Linear framebuffer at 0xd800
(II) RADEON(0): PCI card detected
[snip]
(II) RADEON(0): MC_AGP_LOCATION  : 0x003f
(II) RADEON(0): Depth moves disabled by default
(II) RADEON(0): Allocating from a screen of 131008 kb
(II) RADEON(0): Will use 32 kb for hardware cursor 0 at offset 0x00b7c000
(II) RADEON(0): Will use 32 kb for hardware cursor 1 at offset 0x00b8
(II) RADEON(0): Will use 11760 kb for front buffer at offset 0x
(II) RADEON(0): Will use 64 kb for PCI GART at offset 0x07ff
(II) RADEON(0): Will use 11760 kb for back buffer at offset 0x00b84000
(II) RADEON(0): Will use 11760 kb for depth buffer at offset 0x0170
(II) RADEON(0): Will use 47616 kb for textures at offset 0x0227c000
(II) RADEON(0): Will use 48080 kb for X Server offscreen at offset 0x050fc000
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 10, (OK)
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 10, (OK)
drmOpenByBusid: Searching for BusID pci::01:05.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 10, (OK)
drmOpenByBusid: drmOpenMinor returns 10
drmOpenByBusid: drmGetBusid reports pci::01:05.0
(II) [drm] DRM interface version 1.2
(II) [drm] DRM open master succeeded.
(II) RADEON(0): [drm] Using the DRM lock SAREA also for drawables.
(II) RADEON(0): [drm] framebuffer handle = 0xd800
(II) RADEON(0): [drm] added 1 reserved context for kernel
(II) RADEON(0): X context handle = 0x3
(II) RADEON(0): [drm] installed DRM signal handler
[in swwrt]

Does anyone else see this? Thanks.
-- 
Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: tmpfs is zero bytes (no free space), maybe a zfs bug?
On Wed, Jan 19, 2011 at 05:27:38PM +0100, Ivan Voras wrote:
On 19 January 2011 16:02, Kostik Belousov kostik...@gmail.com wrote:
http://people.freebsd.org/~ivoras/diffs/tmpfs.h.patch I don't think this is a complete solution but it's a start. If you can, try it and see if it helps.
This is not a start, and actually a step in the wrong direction. Tmpfs is wrong now, but the patch would make the wrongness even bigger. The issue is that the current tmpfs calculation should not depend on the length of the inactive queue or the amount of free pages. This data only measures the pressure on the pagedaemon, and has absolutely no relation to the amount of data that can be put into anonymous objects before the system runs out of swap. The vm_lowmem handler is invoked in two situations:
- when KVA cannot satisfy the request for the space allocation;
- when the pagedaemon has to start the scan.
Neither situation has any direct correlation with the question tmpfs needs to answer, that is, "Is there enough swap to keep all my future anonymous memory requests?". Maybe the swap reservation numbers can be useful for tmpfs reporting. Also, maybe tmpfs should reserve the swap explicitly on start, instead of attempting to guess how much can be allocated at a random moment.
Thank you for your explanation! I'm still not very familiar with VM and VFS. Could you also read my report at http://www.mail-archive.com/freebsd-current@freebsd.org/msg126491.html ? I'm curious about the fact that there is lots of 'free' memory here in the same situation.

This is another ugliness in the dynamic calculation. Your wired count is around 15GB, which is always greater than available swap + free + inactive. As a result, tmpfs_mem_info() always returns 0. In this situation TMPFS_PAGES_MAX() seems to return a negative value, and then TMPFS_PAGES_AVAIL() clamps it at 0.

Do you think that there is something which can be done as a band-aid without a major modification to tmpfs?
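The clamping effect described above can be modeled in a few lines of C. This is a hypothetical sketch built only from the description in this thread (swap + free + inactive pages, less wired pages, never below zero); it is not copied from tmpfs.h, and the function name and arguments are invented for illustration. Once wired pages exceed the sum, the estimate saturates at 0 no matter how much memory is actually free.

```c
#include <stddef.h>

/*
 * Hypothetical model of the availability estimate discussed above.
 * All arguments are page counts.
 */
static size_t
mem_info_sketch(size_t swap_avail, size_t free_pages, size_t inactive,
    size_t wired)
{
	size_t size = swap_avail + free_pages + inactive;

	/* Unsigned subtraction would wrap around, so clamp at zero instead. */
	return (size > wired ? size - wired : 0);
}
```

This also illustrates the point made in the reply: the result tracks pagedaemon state (free and inactive queues) and wired memory, not how much anonymous memory swap could actually back.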
Re: Living on gmirror: need to reincarnate /etc/rc.early
On Tue, Jan 25, 2011 at 01:20:53PM +0600, Eugene Grosbein wrote:
Hi! In RELENG_8, gmirror is good enough to keep a whole HDD pair within the mirror. Its performance, stability and pretty easy maintenance allow it to be used widely. With wide deployment of gmirror in production I've faced the inability of RELENG_8 to store kernel crashdumps out of the box. The gmirror manual page documents a way to set up FreeBSD so that it would store crashdumps again, but that way involves /etc/rc.early, which was removed from RELENG_8. I've read about the intentions - it was unsafe etc. But we still need working crashdump support. The easiest way is to reincarnate /etc/rc.d/early support, making it better and safer, and it should support gmirror's mechanics for crashdumps out of the box. Comments?

Yes, I have had this change for eons. Actually, from the moment rc.early was booted out.

diff --git a/etc/rc.d/Makefile b/etc/rc.d/Makefile
index 6f80b87..7981ce0 100755
--- a/etc/rc.d/Makefile
+++ b/etc/rc.d/Makefile
@@ -9,7 +9,7 @@ FILES=	DAEMON FILESYSTEMS LOGIN NETWORKING SERVERS \
 	ccd cleanvar cleartmp cron \
 	ddb defaultroute devd devfs dhclient \
 	dmesg dumpon \
-	encswap \
+	early encswap \
 	faith fsck ftp-proxy ftpd \
 	gbde geli geli2 gptboot gssd \
 	hastd hcsecd \
diff --git a/etc/rc.d/early b/etc/rc.d/early
new file mode 100755
index 000..8a863d0
--- /dev/null
+++ b/etc/rc.d/early
@@ -0,0 +1,29 @@
+#!/bin/sh
+#
+# $FreeBSD$
+#
+
+# PROVIDE: early
+# REQUIRE: disks localswap
+# BEFORE: fsck
+
+#
+# Support for legacy /etc/rc.early script
+#
+. /etc/rc.subr
+
+name="early"
+start_cmd="early_start"
+stop_cmd=":"
+
+early_start()
+{
+	if [ -r /etc/rc.early ]; then
+		echo -n 'Executing rc.early script:'
+		. /etc/rc.early
+		echo '.'
+	fi
+}
+
+load_rc_config $name
+run_rc_command "$1"
Re: Living on gmirror: need to reincarnate /etc/rc.early
On Tue, Jan 25, 2011 at 11:30:06AM -0800, Doug Barton wrote:
On 01/24/2011 23:20, Eugene Grosbein wrote:
Hi! In RELENG_8, gmirror is good enough to keep a whole HDD pair within the mirror. Its performance, stability and pretty easy maintenance allow it to be used widely. With wide deployment of gmirror in production I've faced the inability of RELENG_8 to store kernel crashdumps out of the box. The gmirror manual page documents a way to set up FreeBSD so that it would store crashdumps again, but that way involves /etc/rc.early, which was removed from RELENG_8. I've read about the intentions - it was unsafe etc. But we still need working crashdump support. The easiest way is to reincarnate /etc/rc.d/early support, making it better and safer, and it should support gmirror's mechanics for crashdumps out of the box.

I'll tell you the same thing I told Kostik way back when I removed it. This is the only thing that anyone has ever suggested a use for in /etc/rc.early, and the solution in the man page is a hack. :) If this is something that is necessary to do then I'd prefer to do it properly and add an /etc/rc.d/gmirror that runs in the proper (early) position, and then figure out the proper location in rc.d to handle the second half of the configuration.

No, my use for rc.early is different. I use it to load modules before filesystems are mounted.

I'm happy to review patches. :) Doug
-- 
Nothin' ever doesn't change, but nothin' changes much. -- OK Go
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: Living on gmirror: need to reincarnate /etc/rc.early
On Tue, Jan 25, 2011 at 12:30:37PM -0800, Doug Barton wrote:
On 01/25/2011 12:28, Kostik Belousov wrote:
No, my use for rc.early is different. I use it to load modules before filesystems are mounted.

Ok, I'll bite ... what is deficient about doing this in /boot/loader.conf?

The fact that, with a failing driver, I can still get to single-user mode reliably with boot -s, without doing loader command-line magic under stress. Or, not having to describe that magic over the phone to somebody who would prefer to play^H^H^H do something else instead.
Re: tmpfs is zero bytes (no free space), maybe a zfs bug?
On Wed, Jan 19, 2011 at 11:39:41AM +0100, Ivan Voras wrote:
On 19/01/2011 11:09, Attila Nagy wrote:
On 01/19/11 09:46, Jeremy Chadwick wrote:
On Wed, Jan 19, 2011 at 09:37:35AM +0100, Attila Nagy wrote:
I first noticed this problem on machines with more memory (32GB e.g.), but now it happens on 4G machines too:

tmpfs 0B 0B 0B 100% /tmp
FreeBSD builder 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #0: Sat Jan 8 22:11:54 CET 2011

Maybe it's related that I use zfs on these machines... Sometimes it grows and shrinks, but generally there is no space even for a small file, or for a socket to be created.

http://lists.freebsd.org/pipermail/freebsd-stable/2011-January/060867.html

Oh crap. :( I hope somebody can find the time to look into this, it's pretty annoying...

http://people.freebsd.org/~ivoras/diffs/tmpfs.h.patch I don't think this is a complete solution but it's a start. If you can, try it and see if it helps.

This is not a start, and actually a step in the wrong direction. Tmpfs is wrong now, but the patch would make the wrongness even bigger. The issue is that the current tmpfs calculation should not depend on the length of the inactive queue or the amount of free pages. This data only measures the pressure on the pagedaemon, and has absolutely no relation to the amount of data that can be put into anonymous objects before the system runs out of swap. The vm_lowmem handler is invoked in two situations:
- when KVA cannot satisfy the request for the space allocation;
- when the pagedaemon has to start the scan.
Neither situation has any direct correlation with the question tmpfs needs to answer, that is, "Is there enough swap to keep all my future anonymous memory requests?". Maybe the swap reservation numbers can be useful for tmpfs reporting. Also, maybe tmpfs should reserve the swap explicitly on start, instead of attempting to guess how much can be allocated at a random moment.
Re: 8.2-PRERELEASE: live deadlock, almost all processes in pfault state
On Sat, Jan 08, 2011 at 09:44:57PM +0300, Lev Serebryakov wrote:
Hello, Freebsd-stable. I've added the `transmission' BitTorrent client to my home server and now it deadlocks easily (after about 1 hour of intensive downloading and seeding). This server was upgraded from 7.x, and the last time I ran transmission on the 7.x system there were no problems. I have my home partition on a geom_raid5 device, so I cannot exclude this third-party module from experiments. My home filesystem has a 32KiB block size and all other filesystems (/, /var, /tmp, /usr) have the standard 16KiB block size. I know that the 7.x system had (has?) a deadlock when 16KiB and 64KiB file systems are mixed on one system, but I never experienced deadlocks with a 16KiB and 32KiB mixture. All filesystems (except root) are SU, but no gjournal or SU_J patches are in use. The same BitTorrent client on the same filesystem, but accessed via NFS (from another host), doesn't cause a deadlock and works rock-stable for days. I've built a kernel with all debug options, waited for the deadlock and collected all the information mentioned in Developer's Handbook / Debugging Deadlocks. A capture from the debug session is attached, together with the kernel config and dmesg from rebooting. As I can easily reproduce this deadlock, I can provide any additional information from the kernel debugger, if needed.
System: FreeBSD 8.2-PRERELEASE cvsup:2011-01-08 00:41:24 MSK (GMT+3) time
Platform: amd64

There is some weird backtrace at pid 20; what is g_raid5? If I am guessing right, this creature has a classic deadlock when bio processing requires memory allocation. It seems that tid 100079 is sleeping not even due to the free page shortage, but due to address space exhaustion. As a result, read/write requests are stalled. Then, the syncer is blocked waiting for some physical buffer (look at tid 100075), owning the vnode lock. Other processes also wait for the locked buffers, etc. So my belief is that this is plain i/o loss in the driver (g_raid5, whatever it is). Try the same load without it.
Re: 8.2-PRERELEASE: live deadlock, almost all processes in pfault state
On Sat, Jan 08, 2011 at 10:29:09PM +0300, Lev Serebryakov wrote:
Hello, Kostik. You wrote on 8 January 2011, 22:02:32:
There is some weird backtrace at pid 20; what is g_raid5?
It is geom_raid5, with two threads: a working one and one for processing finished bios.
If I am guessing right, this creature has a classic deadlock when bio processing requires memory allocation. It seems that tid 100079
tid 100079 sleeps waiting for some data in the queue.
is sleeping not even due to the free page shortage, but due to address space exhaustion. As a result, read/write requests are stalled.
tid 100078 sleeps in malloc(). But geom_raid5 never allocates more than 128MiB of memory, and it is a 64-bit system with a huge kmem_size/kmem_size_max... How can I examine allocations (like vmstat -m) from kdb to be sure it doesn't allocate more?

Use "show uma" and "show malloc" from ddb.

And, if it is a classic deadlock, is there any classical solution to it?

Do not allocate during bio processing.

Really, I'm the maintainer of geom_raid5 now, so I need to fix this deadlock, but I don't really understand why it occurs. I've hit a panic with "kernel memory exhausted" symptoms when the module allocated too much, but not a deadlock :(

Hm, I missed the kmem_back() in the stack. Yes, the thread is waiting for page allocation.

Then, the syncer is blocked waiting for some physical buffer (look at tid 100075), owning the vnode lock. Other processes also wait for the locked buffers, etc. So my belief is that this is plain i/o loss in the driver (g_raid5, whatever it is). Try the same load without it.
I cannot, because all the data is on this GEOM :)
-- 
// Black Lion AKA Lev Serebryakov l...@serebryakov.spb.ru
Re: 8.2-PRERELEASE: live deadlock, almost all processes in pfault state
On Sat, Jan 08, 2011 at 11:10:21PM +0300, Lev Serebryakov wrote:
Hello, Kostik. You wrote on 8 January 2011, 22:56:13:
And, if it is a classic deadlock, is there any classical solution to it?
Do not allocate during bio processing.
So, if a GEOM class needs some cache, it needs to pre-allocate it and implement a custom allocator over the allocated chunk? :( And what is "bio processing" in this context? geom_raid5 puts all bios into the (private, internal) queue and geom_start() exits immediately, and a bio can spend a rather long time in the queue (if it is a write request) before it is sent to the underlying provider. And, yes, it can be combined with other bios to form a new one (which is why allocation of a new bio is needed). So, is "bio processing" the whole time before the bio is complete, or only the geom_start() call, or what? Also, RAID5 needs to read data (other stripes) and write data (new checksum) when a write bio is processed. BTW, geom_raid3 and geom_vinum (with a raid5 volume) in the base system need to do the same to maintain checksums, so they could (in theory) deadlock too, if the problem is allocating memory during bio processing. And geom_mirror needs to allocate a bio for the second (third, ...) component on every write...
-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org

Bio processing == the whole time needed to finish the pageout. Pageout is often performed to clean pages in order to relieve a page shortage. If the pageout requires more free pages to finish during the shortage, then we get the deadlock. Also, it seems that you allocate not only bios (small objects; not every request causes a page allocation), but also huge buffers that require free pages each time.
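One way to read "do not allocate during bio processing" is the pre-allocation Lev asks about: reserve all per-request buffers when it is still safe to sleep, and have the I/O path take buffers only from that reserve, deferring the request when the reserve is empty instead of waiting for free pages. The C sketch below is a hypothetical, single-threaded user-space illustration of that idea; all names and sizes are invented, and a real GEOM class would also need locking and a way to restart deferred requests.

```c
#include <stddef.h>

#define POOL_NBUFS	4	/* hypothetical reserve size */
#define POOL_BUFSZ	512	/* hypothetical per-request buffer size */

struct pool_buf {
	struct pool_buf	*next;		/* free-list link, valid only while free */
	unsigned char	 data[POOL_BUFSZ];
};

struct buf_pool {
	struct pool_buf	 bufs[POOL_NBUFS];	/* storage reserved up front */
	struct pool_buf	*free_head;		/* singly linked free list */
};

/* Called once at attach time, when sleeping for memory is still safe. */
static void
pool_init(struct buf_pool *p)
{
	size_t i;

	p->free_head = NULL;
	for (i = 0; i < POOL_NBUFS; i++) {
		p->bufs[i].next = p->free_head;
		p->free_head = &p->bufs[i];
	}
}

/* I/O path: never calls the general allocator; NULL means "defer the bio". */
static struct pool_buf *
pool_get(struct buf_pool *p)
{
	struct pool_buf *b = p->free_head;

	if (b != NULL)
		p->free_head = b->next;
	return (b);
}

/* Completion path: return the buffer to the reserve. */
static void
pool_put(struct buf_pool *p, struct pool_buf *b)
{
	b->next = p->free_head;
	p->free_head = b;
}
```

The design choice is that running out of reserve only slows the request down (it waits for a completion to return a buffer) rather than making completion depend on free pages, which is the circular dependency described in the reply.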
Re: Hang in VOP_LOCK1_APV on 8-STABLE with NFS.
On Fri, Jan 07, 2011 at 02:37:25PM -0500, Rick Macklem wrote:
Hi, OpenOffice hangs on NFS when I try to save a file, or even when I try to open the save dialog in this case.

$ 17:25:35 ron...@ronald [~] procstat -kk 85575
  PID    TID COMM             TDNAME           KSTACK
85575 100322 soffice.bin      initial thread   mi_switch+0x176 sleepq_wait+0x3b __lockmgr_args+0x655 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x44 vget+0x67 vfs_hash_get+0xeb nfs_nget+0xa8 nfs_lookup+0x65e VOP_LOOKUP_APV+0x40 lookup+0x48a namei+0x518 kern_statat_vnhook+0x82 kern_statat+0x15 lstat+0x22 syscallenter+0x186 syscall+0x40
85575 100502 soffice.bin      -                mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _sleep+0x1a0 do_cv_wait+0x639 __umtx_op_cv_wait+0x51 syscallenter+0x186 syscall+0x40 Xfast_syscall+0xe2
85575 100576 soffice.bin      -                mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _sleep+0x1a0 do_cv_wait+0x639 __umtx_op_cv_wait+0x51 syscallenter+0x186 syscall+0x40 Xfast_syscall+0xe2
85575 100577 soffice.bin      -                mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _sleep+0x25d kern_accept+0x19c accept+0xfe syscallenter+0x186 syscall+0x40 Xfast_syscall+0xe2
85575 100578 soffice.bin      -                mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _cv_wait_sig+0x10e seltdwait+0xed poll+0x457 syscallenter+0x186 syscall+0x40 Xfast_syscall+0xe2
85575 100579 soffice.bin      -                mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d seltdwait+0x79 poll+0x457 syscallenter+0x186 syscall+0x40 Xfast_syscall+0xe2

$ 17:25:35 ron...@ronald [~] uname -a
FreeBSD ronald.office.base.nl 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #6: Mon Dec 27 23:49:30 CET 2010 r...@ronald.office.base.nl:/usr/obj/usr/src/sys/GENERIC amd64

I think all the above tells us is that the thread is waiting for a vnode lock. The question then becomes "what is holding a lock on that vnode, and why?".

It is not possible to exit or kill soffice.bin.
I had a slightly different procstat stack before, but that was fixed a couple of days ago.

Yes, it will be in an uninterruptible sleep when waiting for a vnode lock.

Any thoughts? Enabling local locks in NFS doesn't fix it.

Here are some things you could try:
1 - Apply the attached patch. It fixes a known problem w.r.t. the client side of the krpc. Not likely to fix this, but I can hope :-)

1a - Look around for other processes in the uninterruptible sleep state; quite possibly one of them owns the lock that OpenOffice is waiting for. Also see http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html Of particular interest are the witness output and the backtraces for all threads that witness reports as owning vnode locks.

2 - If #1 doesn't fix the problem:
- before making it hang, start capturing packets via:
# tcpdump -s 0 -w xxx host server
- then make it hang, kill the above and run:
# procstat -ka
# ps axHlww
and capture the output of both of these. Hopefully these 2 commands will indicate what is holding the vnode lock and maybe why. The xxx file can be looked at in wireshark to see what, if any, NFS traffic is happening. If you aren't comfortable looking at the above, you can email them to me and I'll take a stab at them someday.
3 - Try the experimental client to see if it behaves differently. The mount command is:
# mount -t newnfs -o nfsv3,<the options you already use> server:/path /mntpath
(This might identify whether the regular client has an infrequently executed code path that forgets to unlock the vnode, since the experimental client uses a somewhat different RPC layer. The buffer cache handling etc. is almost the same, but the RPC stuff is fairly different.)

The nfs server is an up-to-date Linux Debian 5 with kernel 2.6.26.

I'm afraid I can't blame Linux (at least not until we have more info ;-).

If more info is needed, I can easily reproduce this.

See above #2.
Good luck with it and let us know how it goes, rick
Re: FreeBSD 8.2-PRERELEASE hangs under load with live kernel
On Thu, Jan 06, 2011 at 01:31:45PM +0300, Lev Serebryakov wrote:
Hello, Freebsd-stable. I've added a torrent client (transmission) to the software on my home server and it started to hang in a very unusual way: the kernel works but userland doesn't. I can ping it (and it answers). I can scroll the console with the Scroll Lock button and keys. I can break into the debugger with Ctrl+SysReq and it shows that one CPU is occupied by the idle process and the other by the Giant taskq, but no userland process answers: I cannot ssh to it, I cannot log in on the console, samba is dead, etc. ps in the kernel debugger shows that many processes are in the pfault state, and nothing more special. memtest86+ doesn't show any errors after 8 passes of tests (about 10 hours), so RAM looks OK. What should I do in kdb to understand what happens? Kernel config and /var/run/dmesg.boot are attached.

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
Re: RFC: Upgrade BIND version in RELENG_7 to BIND 9.6.x
On Fri, Dec 17, 2010 at 09:41:54PM -0800, Doug Barton wrote:
Howdy, Traditionally for contributed software generally, and BIND in particular, we have tried to keep the major version of the contributed software consistent throughout a given RELENG_$N branch of FreeBSD. Hopefully the reasoning for this is obvious: we want to avoid POLA violations.

Actually not. My own POV is that we should follow the vendor release cycle, and not the FreeBSD release cycle, for the contributed software. I do not advocate an immediate upgrade of third-party software that has reached its EOL, but I think that we should do it without pushback if the maintainer considers the upgrade necessary.
Re: Following vendor release cycle (Was: Re: RFC: Upgrade BIND version in RELENG_7 to BIND 9.6.x)
On Sat, Dec 18, 2010 at 03:07:11PM -0800, Doug Barton wrote:
On 12/18/2010 03:15, Kostik Belousov wrote:
On Fri, Dec 17, 2010 at 09:41:54PM -0800, Doug Barton wrote:
Howdy, Traditionally for contributed software generally, and BIND in particular, we have tried to keep the major version of the contributed software consistent throughout a given RELENG_$N branch of FreeBSD. Hopefully the reasoning for this is obvious: we want to avoid POLA violations.
Actually not. My own POV is that we should follow the vendor release cycle, and not the FreeBSD release cycle, for the contributed software. I do not advocate an immediate upgrade of third-party software that has reached its EOL, but I think that we should do it without pushback if the maintainer considers the upgrade necessary.

Just to be clear, there were considerable discussions, over a long period of time, between myself, the release engineers, and the security-officer team regarding the subject of BIND 9.3 in RELENG_6. I was given the green light to upgrade if I felt it was necessary (as you're suggesting here), but the final decision to live with the status quo was mine, and I accept responsibility for it. My reasoning was as follows:
1. All the latest versions of BIND are available in ports, and I made sure that they worked in RELENG_6 so that users who wanted to stay at that OS level but had more serious DNS needs had an easy path.
2. Because BIND 9.3 lacked the ability to do modern DNSSEC, anyone who wanted that feature would have to upgrade anyway.
3. BIND 9.3 was still suitable for the (primary) stated purpose of BIND in the base, a basic local resolving name server.
4. BIND 9.3 was different enough that users migrating from it to more modern versions were experiencing problems.
5. Users were naturally migrating to RELENG_[78] at a pace which minimized the impact of the issue.
If any of those things had stopped being true my decision would have been different, but as it was I chose to grin and bear it in order to avoid the POLA violation for any users who were actually using BIND 9.3 in RELENG_6. However, the circumstances for BIND 9.4 and RELENG_7 are different, and much more amenable to the upgrade, which is why I'm proposing it.

I do not question your decision to upgrade or to leave the legacy version of BIND in the legacy branch of FreeBSD src. I only noted that my personal POV is that we develop the OS and are not the vendor of the third-party software, in this case BIND. As such, I think that following the vendor life cycle for contrib is less resource-intensive for the project, and should be the default. If anybody who does the real work feels that it is interesting/nice to the users/generally better to spend the time necessary to keep the upgrade path on the branch smoother, I am fine with this.
Re: vm.swap_reserved toooooo large?
On Wed, Dec 15, 2010 at 03:43:56PM +0200, George Mamalakis wrote:
On 15/12/2010 13:26, Trond Endrestøl wrote:
On Wed, 15 Dec 2010 13:04+0200, George Mamalakis wrote:
I was testing a program (in C++) that would exhaust all my memory, and when this happened, it would call set_new_handler() along with one of my functions that would inform the user about the lack of memory and then exit the program. Instead, the program was force-killed by the kernel (signal 9) and I was informed that:
If all your process' memory is exhausted, then there is no memory left for the runtime system for doing I/O and the other stuff you want. Next, unless I'm on drugs, maybe you should call set_new_handler() before you actually run out of memory. Just my $0.02. Trond.

Trond, my problem was not that the program was force-killed; my problem was that the system reserved 500G+ of swap, even though the total size is 4G.

Read tuning(7), in particular the description of the vm.overcommit sysctl.
Re: cryptodev cipher registration (aesni and padlock)
On Mon, Dec 13, 2010 at 10:27:00AM -0500, Mike Tancsa wrote:
While doing some testing with the aesni driver, it seems some ciphers are registered with openssl and some are not. E.g. if I start an ssh session using aes128, I see the following:

[pyroxene]% ssh -c aes128-cbc smarthost1 cryptostats | grep sym
654198 symmetric crypto ops (0 errors, 0 times driver blocked)
[pyroxene]% ssh -c aes128-cbc smarthost1 cryptostats | grep sym
654225 symmetric crypto ops (0 errors, 0 times driver blocked)
[pyroxene]%

i.e. it shows the hardware transformation count increasing. But if I use aes192 or aes256, it does not:

[pyroxene]% ssh -c aes256-cbc smarthost1 cryptostats | grep sym
654231 symmetric crypto ops (0 errors, 0 times driver blocked)
[pyroxene]% ssh -c aes192-cbc smarthost1 cryptostats | grep sym
654231 symmetric crypto ops (0 errors, 0 times driver blocked)
[pyroxene]% ssh -c aes192-cbc smarthost1 cryptostats | grep sym
654231 symmetric crypto ops (0 errors, 0 times driver blocked)
[pyroxene]% ssh -c aes192-cbc smarthost1 cryptostats | grep sym
654231 symmetric crypto ops (0 errors, 0 times driver blocked)
[pyroxene]%

Yet they are supposed to be supported, no? Where in openssl is this configured?
The padlock driver does the same thing:

% ssh -c aes256-cbc smarthost1 cryptotest -z
0.000 sec, 2 aes crypts,      16 bytes, 400 byte/sec, 30.5 Mb/sec
0.000 sec, 2 aes crypts,      32 bytes, 1600 byte/sec, 122.1 Mb/sec
0.000 sec, 2 aes crypts,      64 bytes, 3200 byte/sec, 244.1 Mb/sec
0.000 sec, 2 aes crypts,     128 bytes, 6400 byte/sec, 488.3 Mb/sec
0.000 sec, 2 aes crypts,     256 bytes, 12800 byte/sec, 976.6 Mb/sec
0.000 sec, 2 aes crypts,     512 bytes, 17067 byte/sec, 1302.1 Mb/sec
0.000 sec, 2 aes crypts,    1024 bytes, 292571429 byte/sec, 2232.1 Mb/sec
0.000 sec, 2 aes crypts,    2048 bytes, 45511 byte/sec, 3472.2 Mb/sec
0.000 sec, 2 aes crypts,    4096 bytes, 51200 byte/sec, 3906.2 Mb/sec
0.000 sec, 2 aes crypts,    8192 bytes, 420102564 byte/sec, 3205.1 Mb/sec
0.000 sec, 2 aes192 crypts,   16 bytes, 800 byte/sec, 61.0 Mb/sec
0.000 sec, 2 aes192 crypts,   32 bytes, 1600 byte/sec, 122.1 Mb/sec
0.000 sec, 2 aes192 crypts,   64 bytes, 3200 byte/sec, 244.1 Mb/sec
0.000 sec, 2 aes192 crypts,  128 bytes, 6400 byte/sec, 488.3 Mb/sec
0.000 sec, 2 aes192 crypts,  256 bytes, 12800 byte/sec, 976.6 Mb/sec
0.000 sec, 2 aes192 crypts,  512 bytes, 20480 byte/sec, 1562.5 Mb/sec
0.000 sec, 2 aes192 crypts, 1024 bytes, 34133 byte/sec, 2604.2 Mb/sec
0.000 sec, 2 aes192 crypts, 2048 bytes, 40960 byte/sec, 3125.0 Mb/sec
0.000 sec, 2 aes192 crypts, 4096 bytes, 54613 byte/sec, 4166.7 Mb/sec
0.000 sec, 2 aes192 crypts, 8192 bytes, 496484848 byte/sec, 3787.9 Mb/sec
0.000 sec, 2 aes256 crypts,   16 bytes, 1067 byte/sec, 81.4 Mb/sec
0.000 sec, 2 aes256 crypts,   32 bytes, 2133 byte/sec, 162.8 Mb/sec
0.000 sec, 2 aes256 crypts,   64 bytes, 3200 byte/sec, 244.1 Mb/sec
0.000 sec, 2 aes256 crypts,  128 bytes, 5120 byte/sec, 390.6 Mb/sec
0.000 sec, 2 aes256 crypts,  256 bytes, 10240 byte/sec, 781.2 Mb/sec
0.000 sec, 2 aes256 crypts,  512 bytes, 20480 byte/sec, 1562.5 Mb/sec
0.000 sec, 2 aes256 crypts, 1024 bytes, 292571429 byte/sec, 2232.1 Mb/sec
0.000 sec, 2 aes256 crypts, 2048 bytes, 40960 byte/sec, 3125.0 Mb/sec
0.000 sec, 2 aes256 crypts, 4096 bytes, 51200 byte/sec, 3906.2 Mb/sec
0.000 sec, 2 aes256 crypts, 8192 bytes, 442810811 byte/sec, 3378.4 Mb/sec

From my reading of src/crypto/openssl/crypto/engine/eng_cryptodev.c, and browsing http://cvs.openssl.org/rlog?f=openssl/crypto/engine/eng_cryptodev.c it seems that only OpenSSL HEAD and the 1.0 branch have support for AES-192 and AES-256 when working with /dev/crypto.
Re: aesni(?) corrupts data on 8.2-BETA1
On Sat, Dec 11, 2010 at 07:37:51PM -0500, Mike Tancsa wrote:
> On 12/11/2010 6:22 PM, Kostik Belousov wrote:
>> On Sat, Dec 11, 2010 at 06:08:08PM -0500, Mike Tancsa wrote:
>>> On 12/11/2010 11:01 AM, Kostik Belousov wrote:
>>>> I have no access to AESNI hardware. For a start, you may use
>>>> src/tools/tools/crypto/cryptotest to somewhat verify the sanity of
>>>> the driver.
>>>
>>> It doesn't happen every time, but one out of 5 or so
>>
>> First, which arch is it, amd64 or i386 ?
>> Also, please revert r216162 and do the same tests.
>
> Hi,
> It's AMD64, but i386 seems to be impacted too. I am not sure how to
> revert to a specific commit, but for now I csup'd with a date tag of
> *date=2010.12.02.23.00.00
> which is a day before
> http://lists.freebsd.org/pipermail/svn-src-stable-8/2010-December/004338.html
> And that seems to fix it! I have been running
> cryptotest -c -z -t 10
> in a loop for the past 10min and not one error.

Please try this patch on the latest HEAD or RELENG_8.

diff --git a/sys/amd64/amd64/fpu.c b/sys/amd64/amd64/fpu.c
index 482b5da..1b493b4 100644
--- a/sys/amd64/amd64/fpu.c
+++ b/sys/amd64/amd64/fpu.c
@@ -426,7 +426,9 @@ fpudna(void)
 		fxrstor(&fpu_initialstate);
 		if (pcb->pcb_initial_fpucw != __INITIAL_FPUCW__)
 			fldcw(&pcb->pcb_initial_fpucw);
-		fpuuserinited(curthread);
+		pcb->pcb_flags |= PCB_FPUINITDONE;
+		if (PCB_USER_FPU(pcb))
+			pcb->pcb_flags |= PCB_USERFPUINITDONE;
 	} else
 		fxrstor(pcb->pcb_save);
 	critical_exit();
diff --git a/sys/i386/isa/npx.c b/sys/i386/isa/npx.c
index 9ec5d25..f314e44 100644
--- a/sys/i386/isa/npx.c
+++ b/sys/i386/isa/npx.c
@@ -684,7 +684,9 @@ npxdna(void)
 		fpurstor(&npx_initialstate);
 		if (pcb->pcb_initial_npxcw != __INITIAL_NPXCW__)
 			fldcw(&pcb->pcb_initial_npxcw);
-		npxuserinited(curthread);
+		pcb->pcb_flags |= PCB_NPXINITDONE;
+		if (PCB_USER_FPU(pcb))
+			pcb->pcb_flags |= PCB_NPXUSERINITDONE;
 	} else {
 		/*
 		 * The following fpurstor() may cause an IRQ13 when the
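To make the flag change in the fpudna() hunk concrete, here is a small user-space model of the first-use branch after the patch. It is only a sketch: the flag values are made up, PCB_USER_FPU() is assumed to mean "no kernel FPU context is active" (modelled with a hypothetical PCB_KERNFPU bit), and none of this is kernel code.

```c
#include <assert.h>

/* Hypothetical flag values; only their roles follow the patch. */
#define PCB_FPUINITDONE     0x01u  /* active FPU context has been initialized */
#define PCB_USERFPUINITDONE 0x02u  /* user-mode FPU context has been initialized */
#define PCB_KERNFPU         0x04u  /* assumption: set while a kernel FPU context is active */

/* Assumed meaning of PCB_USER_FPU(): the active context is the user one. */
#define PCB_USER_FPU(flags) (((flags) & PCB_KERNFPU) == 0)

/*
 * Flags after the first-use branch of the patched fpudna().  That branch
 * only runs while PCB_FPUINITDONE is still clear.
 */
static unsigned
fpudna_first_use(unsigned flags)
{
	assert((flags & PCB_FPUINITDONE) == 0);
	flags |= PCB_FPUINITDONE;          /* the active context is now valid */
	if (PCB_USER_FPU(flags))
		flags |= PCB_USERFPUINITDONE;  /* mark the user context only when it is the active one */
	return (flags);
}
```

In this model, a first FPU use inside a kernel FPU context sets only PCB_FPUINITDONE and leaves the user-context flag alone, which is the distinction the inline code makes compared to calling fpuuserinited().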
Re: aesni(?) corrupts data on 8.2-BETA1
On Fri, Dec 10, 2010 at 08:43:18PM -0500, Mike Tancsa wrote:
> Actually, I just noticed something like this as well with ssh via
> cryptodev and rsync as well. It was erroring out. eg.
>
> Dec 10 16:50:01 backup3 sshd[13120]: Corrupted MAC on input.
> Dec 10 16:50:01 backup3 sshd[13120]: Finished discarding for 64.x.x.x
>
> I had a few ssh sessions die as well. It was working ok with a kernel /
> world from last week. I was going to try and see if I can narrow it
> down, but it seemed to have been working fine with world from last week.
> Not sure if it's the openssl update ? But if you are seeing issues with
> geli, then I doubt it's openssl.
>
> ---Mike
>
> On 12/10/2010 7:49 PM, Jan Henrik Sylvester wrote:
>> I just upgraded my main laptop from 8.1-RELEASE (GENERIC, amd64) to
>> 8.2-BETA1 and added aesni_load="YES" to my /boot/loader.conf.
>>
>> (If my interpretation is correct:) With aesni loaded, I see many files
>> corrupted on my geli encrypted volume. Without aesni loaded, they are ok.
>>
>> I have got a journaling UFS2 on gjournal on geli on a FreeBSD partition
>> on a MBR slice on a disk with ahci loaded.
>>
>> Story: First I noticed some weirdness of Thunderbird not showing the
>> upgraded message properly and reloading IMAP messages that had already
>> been read, but did not think of anything. Only during my usual rsyncing
>> of the encrypted volume, I saw that some files could not be read
>> (invalid file descriptor?). I rebooted without aesni and got a different
>> error message. I created checksums of all files on that encrypted volume
>> with and without aesni loaded (rebooting in between): 150 differences
>> (one file could not be read in either case). Just to make sure, I tried
>> to rsync with --checksum and --dry-run to the other machine that is
>> supposed to have the same files: With aesni, many files were scheduled
>> to be synced and one could not be read, but without aesni, only that one
>> file was scheduled to be synced -- it probably got corrupted for good
>> with aesni loaded. It is especially weird that I did not attempt to
>> write to the file that got corrupted on disk with aesni loaded.
>>
>> Is there anything I am doing wrong or is it really aesni or the
>> processor failing? The processor is a Core i7-M620 (with AESNI of course).
>>
>> Before I investigate any further, I have to make a real backup...
>> rsyncing does not prevent silent corruption. I am lucky that it was not
>> so silent after all.

I have no access to AESNI hardware. For a start, you may use
src/tools/tools/crypto/cryptotest to somewhat verify the sanity of the driver.
Re: aesni(?) corrupts data on 8.2-BETA1
On Sat, Dec 11, 2010 at 06:08:08PM -0500, Mike Tancsa wrote:
> On 12/11/2010 11:01 AM, Kostik Belousov wrote:
>> I have no access to AESNI hardware. For a start, you may use
>> src/tools/tools/crypto/cryptotest to somewhat verify the sanity of the
>> driver.
>
> It doesn't happen every time, but one out of 5 or so

First, which arch is it, amd64 or i386 ?
Also, please revert r216162 and do the same tests.
Re: top io mode
On Thu, Nov 25, 2010 at 05:28:30AM -0600, Adam Vande More wrote:
> top io doesn't seem to display stats when dealing directly with a block
> device like so:
>
> dd if=/dev/ada0 of=/dev/null
>
> However if dd runs on a regular file eg
>
> dd if=test.file of=/dev/null
>
> then stats are reported in top. Is this the expected behavior?

I do not think so, and the patch at the end of the message worked for me.
I cannot explain the

	if (!TD_IS_IDLETHREAD(curthread))
		curthread->td_ru.ru_inblock++;

checks that are done in vfs_bio.c.

diff --git a/sys/kern/kern_physio.c b/sys/kern/kern_physio.c
index d6be6e7..34072f3 100644
--- a/sys/kern/kern_physio.c
+++ b/sys/kern/kern_physio.c
@@ -57,10 +57,13 @@ physio(struct cdev *dev, struct uio *uio, int ioflag)
 	for (i = 0; i < uio->uio_iovcnt; i++) {
 		while (uio->uio_iov[i].iov_len) {
 			bp->b_flags = 0;
-			if (uio->uio_rw == UIO_READ)
+			if (uio->uio_rw == UIO_READ) {
 				bp->b_iocmd = BIO_READ;
-			else
+				curthread->td_ru.ru_inblock++;
+			} else {
 				bp->b_iocmd = BIO_WRITE;
+				curthread->td_ru.ru_oublock++;
+			}
 			bp->b_iodone = bdone;
 			bp->b_data = uio->uio_iov[i].iov_base;
 			bp->b_bcount = uio->uio_iov[i].iov_len;
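A rough user-space model of what the patched physio() loop charges: one ru_inblock or ru_oublock per buffer it issues, and (as we understand top's io mode) those are the counters shown in its READ and WRITE columns. The maxio parameter is an assumption standing in for a per-device transfer limit that may split one iovec into several buffers; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum model_rw { MODEL_READ, MODEL_WRITE };

/* Model of the two per-thread rusage counters touched by the patch. */
struct model_rusage {
	long ru_inblock;
	long ru_oublock;
};

/*
 * Charge one block operation per buffer issued, as the patched physio()
 * loop does.  maxio is a hypothetical per-device transfer limit.
 */
static void
model_physio(struct model_rusage *ru, enum model_rw rw,
    const size_t *iov_len, int iovcnt, size_t maxio)
{
	assert(maxio > 0);
	for (int i = 0; i < iovcnt; i++) {
		size_t left = iov_len[i];

		while (left > 0) {
			size_t chunk = left < maxio ? left : maxio;

			if (rw == MODEL_READ)
				ru->ru_inblock++;
			else
				ru->ru_oublock++;
			left -= chunk;
		}
	}
}
```

Under this model, dd with bs=64k against a device whose limit is at least 64k is charged exactly one operation per read(2).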
Re: top io mode
On Thu, Nov 25, 2010 at 04:35:53PM -0600, Adam Vande More wrote:
> On Thu, Nov 25, 2010 at 3:04 PM, Jeremy Chadwick <free...@jdc.parodius.com> wrote:
> Bad form to follow up to my own Email of course, but some discussion
> material:
>
> I'm a frequent offender myself so I won't be pointing any fingers.
>
> top -m io doesn't show any I/O writes, while gstat(8) does, and to
> numerous devices all which make up some form of ZFS pool.
>
> Yes, it's a generic ZFS mirror. If you do something like
> dd if=/dev/urandom of=/pool/file bs=64k, does top -m io show write I/O
> for the dd process? It does not on my ZFS STABLE system with Kostik's
> patch. So the patch fixes reads, but not writes. cc'ing to notify in
> case he has more ideas.

Can you show the exact command and describe some details about the setup
for the case where you still do not observe the counter in top ? (With
my patch applied.)

There are two things mixed in this thread:

First, there is (was) missing accounting for i/o to physical devices,
and the patch I posted should fix it.

Second, it is relatively well known that ZFS does not properly account
i/o. Maybe the patch by Andrew fixes it; I do not know.
Re: top io mode
On Thu, Nov 25, 2010 at 05:18:13PM -0600, Adam Vande More wrote:
> On Thu, Nov 25, 2010 at 4:44 PM, Kostik Belousov <kostik...@gmail.com> wrote:
>> Can you show exact command and describe some details about setup for
>> the case where you still do not observe the counter in top ? (With my
>> patch applied).
>
> - Still broken with patch applied;
> dd if=/dev/zero of=/tmp/delete.me bs=64k

What is /tmp/delete.me ? A file ? On what kind of filesystem is it located ?

Summoning some psychic power, I can predict that delete.me is located on
a ZFS or tmpfs filesystem. Is this right ? If yes, then the result is
expected and nothing is broken there (except ZFS, but I already described it).

> during this top -m io displays for dd:
> 2235 adam 14 24 0 0 0 0 0.00% dd
>
> - Fixed with patch applied;
> dd if=/dev/ada0 of=/dev/null bs=64k
>
> during this top -m io displays for dd:
> 2248 adam 3262 0 3262 0 0 3262 100.00% dd
>
> --
> Adam Vande More
Re: Call for testers: FPU changes
On Sat, Nov 20, 2010 at 01:30:54AM -0500, Mike Tancsa wrote:
> On 11/16/2010 4:43 AM, Kostik Belousov wrote:
>> On Mon, Nov 15, 2010 at 10:42:50PM -0500, Mike Tancsa wrote:
>>> On 11/15/2010 4:13 PM, Kostik Belousov wrote:
>>>> Patch is at http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch
>
> I did some more tests post commit today using the aesni kld taken
> directly from HEAD. BTW, do you plan to MFC this as well ?

Sure, I will merge aesni(4); it was the only reason to work on the
kern_fpu in stable/8. I want some pause between the KPI and driver MFCs,
to ease the handling of a possible mismerge or of latent bugs (since
stable has a much larger testing base than HEAD).

> Results at the bottom of
> http://www.tancsa.com/fpu.html
>
> It certainly makes a difference with geli. IPSEC and userland stuff, not
> so much. The CPU itself is crazy fast, so it's hard to see a difference
> in things like ssh and even ipsec didn't yield any differences. For ssh
> and userland stuff I guess once there is an aesni userland engine, this
> would probably help over the cryptodev interface.

Yes, encoding/decoding of small blocks has a large overhead from the loop
setup code.

Thank you.
Re: Call for testers: FPU changes
On Tue, Nov 16, 2010 at 08:46:23PM -0500, Mike Tancsa wrote:
> On 11/16/2010 5:19 PM, Kostik Belousov wrote:
>> Would your conclusion be that the patch seems to increase the
>> throughput of the aesni(4) ? I think that on small-sized blocks, when
>> using aesni(4), the dominating factor is the copyin/copyout of the data
>> to/from the kernel address space. Still would be interesting to compare
>> the full output of openssl speed on aesni(4) with and without the patch
>> I posted.
>
> Hi,
> There does seem to be some improvement on large blocks. But there are
> some freakishly fast times. On other sizes, there is no difference in
> speed it would seem. I did 20 runs. Updated stats at
> http://www.tancsa.com/fpu.html

Thank you. Indeed, I think that the test units are too small, so random
system events can cause the variation. Nonetheless, the patch seems to
help, so I committed it.

Meantime, a similar change may be beneficial for padlock(4) too. If you
are going to test it, please note that the openssl padlock engine most
likely does not use padlock(4); I do not know for sure.
diff --git a/sys/crypto/via/padlock.c b/sys/crypto/via/padlock.c
index 77e059b..ba63093 100644
--- a/sys/crypto/via/padlock.c
+++ b/sys/crypto/via/padlock.c
@@ -170,7 +170,7 @@ padlock_newsession(device_t dev, uint32_t *sidp, struct cryptoini *cri)
 	struct padlock_session *ses = NULL;
 	struct cryptoini *encini, *macini;
 	struct thread *td;
-	int error;
+	int error, saved_ctx;
 
 	if (sidp == NULL || cri == NULL)
 		return (EINVAL);
@@ -238,10 +238,18 @@ padlock_newsession(device_t dev, uint32_t *sidp, struct cryptoini *cri)
 
 	if (macini != NULL) {
 		td = curthread;
-		error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
+		if (!is_fpu_kern_thread(0)) {
+			error = fpu_kern_enter(td, &ses->ses_fpu_ctx,
+			    FPU_KERN_NORMAL);
+			saved_ctx = 1;
+		} else {
+			error = 0;
+			saved_ctx = 0;
+		}
 		if (error == 0) {
 			error = padlock_hash_setup(ses, macini);
-			fpu_kern_leave(td, &ses->ses_fpu_ctx);
+			if (saved_ctx)
+				fpu_kern_leave(td, &ses->ses_fpu_ctx);
 		}
 		if (error != 0) {
 			padlock_freesession_one(sc, ses, 0);
diff --git a/sys/crypto/via/padlock_cipher.c b/sys/crypto/via/padlock_cipher.c
index 0ae26c8..1456ddf 100644
--- a/sys/crypto/via/padlock_cipher.c
+++ b/sys/crypto/via/padlock_cipher.c
@@ -205,7 +205,7 @@ padlock_cipher_process(struct padlock_session *ses, struct cryptodesc *enccrd,
 	struct thread *td;
 	u_char *buf, *abuf;
 	uint32_t *key;
-	int allocated, error;
+	int allocated, error, saved_ctx;
 
 	buf = padlock_cipher_alloc(enccrd, crp, &allocated);
 	if (buf == NULL)
@@ -250,14 +250,21 @@ padlock_cipher_process(struct padlock_session *ses, struct cryptodesc *enccrd,
 	}
 
 	td = curthread;
-	error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
+	if (!is_fpu_kern_thread(0)) {
+		error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
+		saved_ctx = 1;
+	} else {
+		error = 0;
+		saved_ctx = 0;
+	}
 	if (error != 0)
 		goto out;
 	padlock_cbc(abuf, abuf, enccrd->crd_len / AES_BLOCK_LEN, key, cw,
 	    ses->ses_iv);
-	fpu_kern_leave(td, &ses->ses_fpu_ctx);
+	if (saved_ctx)
+		fpu_kern_leave(td, &ses->ses_fpu_ctx);
 
 	if (allocated) {
 		crypto_copyback(crp->crp_flags, crp->crp_buf, enccrd->crd_skip,
diff --git a/sys/crypto/via/padlock_hash.c b/sys/crypto/via/padlock_hash.c
index 58c58b2..0fe182b 100644
--- a/sys/crypto/via/padlock_hash.c
+++ b/sys/crypto/via/padlock_hash.c
@@ -366,17 +366,24 @@ padlock_hash_process(struct padlock_session *ses, struct cryptodesc *maccrd,
     struct cryptop *crp)
 {
 	struct thread *td;
-	int error;
+	int error, saved_ctx;
 
 	td = curthread;
-	error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
+	if (!is_fpu_kern_thread(0)) {
+		error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
+		saved_ctx = 1;
+	} else {
+		error = 0;
+		saved_ctx = 0;
+	}
 	if (error != 0)
 		return (error);
 	if ((maccrd->crd_flags & CRD_F_KEY_EXPLICIT) != 0)
 		padlock_hash_key_setup(ses, maccrd->crd_key, maccrd->crd_klen);
 
 	error = padlock_authcompute(ses, maccrd, crp->crp_buf, crp->crp_flags);
-	fpu_kern_leave(td, &ses->ses_fpu_ctx);
+	if (saved_ctx)
+		fpu_kern_leave(td, &ses->ses_fpu_ctx);
 	return (error
Re: Call for testers: FPU changes
On Wed, Nov 17, 2010 at 02:18:50PM -0500, Mike Tancsa wrote:
> On 11/17/2010 11:35 AM, Kostik Belousov wrote:
>> Meantime, the similar change may be beneficial for padlock(4) too.
>> If you are going to test it, please note that most likely, openssl
>> padlock engine does not use padlock(4), I do not know for sure.
>>
>> diff --git a/sys/crypto/via/padlock.c b/sys/crypto/via/padlock.c
>> index 77e059b..ba63093 100644
>> --- a/sys/crypto/via/padlock.c
>> +++ b/sys/crypto/via/padlock.c
>
> Patch applied cleanly
>
> Full results at the bottom of
> http://www.tancsa.com/fpu.html
>
> On large blocks, version 1 vs the above patch show no significant
> difference. This is with openssl using the cryptodev engine. I also
> compared to the openssl padlock engine which gave interesting results!
>
> 0(via)# cat version1.txt | sed -e 's/k//g' | awk '{print $6}' > 1
> 0(via)# cat version2.txt | sed -e 's/k//g' | awk '{print $6}' > 2
> 0(via)# ministat 1 2
> x 1
> + 2
>     N         Min         Max      Median         Avg      Stddev
> x  30   2591851.6   6645345.1   4326340.6   4227917.6   1083181.2
> +  30   2574883.9   8830282.8   4033610.4   4241195.6   1519334.8
> No difference proven at 95.0% confidence
> 0(via)# cat version1.txt | sed -e 's/k//g' | awk '{print $5}' > 1
> 0(via)# cat version2.txt | sed -e 's/k//g' | awk '{print $5}' > 2
> 0(via)# ministat 1 2
>     N         Min         Max      Median         Avg      Stddev
> x  30   1124673.3   2320883.7   1527677.1   1550631.9    295165.4
> +  30   1069788.2   2508865.7   1594506.2   1588193.2   389414.33
> No difference proven at 95.0% confidence
> 0(via)#

Thank you once more. If nothing new pops up, I will commit the MFC
tomorrow. Unfortunately, no suspend/resume testers appeared, so be it.
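The "No difference proven at 95.0% confidence" verdicts can be checked by hand from the summary rows. As we read ministat(1), it compares the two samples with a Student's t-test using a pooled variance; the sketch below recomputes t squared from N, Avg and Stddev. The critical value 2.002 for 58 degrees of freedom is taken from standard t tables and hard-coded as an assumption.

```c
#include <assert.h>

/*
 * Square of the two-sample Student's t statistic with pooled variance,
 * computed from each sample's size, mean and standard deviation.
 * Working with t*t avoids needing sqrt() and libm.
 */
static double
pooled_t_squared(int n1, double mean1, double sd1,
    int n2, double mean2, double sd2)
{
	assert(n1 > 1 && n2 > 1);
	double df = n1 + n2 - 2;
	double sp2 = ((n1 - 1) * sd1 * sd1 + (n2 - 1) * sd2 * sd2) / df;
	double diff = mean2 - mean1;

	return (diff * diff / (sp2 * (1.0 / n1 + 1.0 / n2)));
}

/* Two-sided 95% critical value of Student's t for 58 degrees of freedom. */
#define T_CRIT_95_DF58 2.002
```

For the first pair of samples t comes out at about 0.04 and for the second at about 0.42, both far below 2.002, which agrees with the two verdicts printed above.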
Re: Call for testers: FPU changes
On Mon, Nov 15, 2010 at 10:42:50PM -0500, Mike Tancsa wrote:
> On 11/15/2010 4:13 PM, Kostik Belousov wrote:
>> Patch is at http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch
>
> Hi,
> One small failure on the patch
>
> The text leading up to this was:
> --
> |Index: pc98/include/npx.h
> |===
> |--- pc98/include/npx.h (revision 215253)
> |+++ pc98/include/npx.h (working copy)
> --
> Patching file pc98/include/npx.h using Plan A...
> Hunk #1 failed at 1.
> 1 out of 1 hunks failed--saving rejects to pc98/include/npx.h.rej

This is because our patch(1) in base is somewhat old, I believe. The diff
was generated by svn diff from an up-to-date stable/8 checkout, and the
reason for the failure is the expanded $FreeBSD$ tags. Newer GNU patch,
available in ports, handles this correctly, reporting that the patches
were applied with fuzz.

> I tested with openssl and openvpn and all seems to work great on the via
> board and my i5 board!! Simple test details at
> http://www.tancsa.com/fpu.html
>
> I will try out geli and some more extensive tests tomorrow
>
> Thanks for porting this back to RELENG_8 !

This is actually somewhat puzzling. Does openssl in base automatically
use crypto(4) ?

Also, could you please redo the speed tests for aesni(4) with the
following patch applied over the driver sources ? Thank you !
diff --git a/sys/crypto/aesni/aesni_wrap.c b/sys/crypto/aesni/aesni_wrap.c
index 36c66ea..3fd397c 100644
--- a/sys/crypto/aesni/aesni_wrap.c
+++ b/sys/crypto/aesni/aesni_wrap.c
@@ -246,14 +246,21 @@ int
 aesni_cipher_setup(struct aesni_session *ses, struct cryptoini *encini)
 {
 	struct thread *td;
-	int error;
+	int error, saved_ctx;
 
 	td = curthread;
-	error = fpu_kern_enter(td, &ses->fpu_ctx, FPU_KERN_NORMAL);
+	if (!is_fpu_kern_thread(0)) {
+		error = fpu_kern_enter(td, &ses->fpu_ctx, FPU_KERN_NORMAL);
+		saved_ctx = 1;
+	} else {
+		error = 0;
+		saved_ctx = 0;
+	}
 	if (error == 0) {
 		error = aesni_cipher_setup_common(ses, encini->cri_key,
 		    encini->cri_klen);
-		fpu_kern_leave(td, &ses->fpu_ctx);
+		if (saved_ctx)
+			fpu_kern_leave(td, &ses->fpu_ctx);
 	}
 	return (error);
 }
@@ -264,16 +271,22 @@ aesni_cipher_process(struct aesni_session *ses, struct cryptodesc *enccrd,
 {
 	struct thread *td;
 	uint8_t *buf;
-	int error, allocated;
+	int error, allocated, saved_ctx;
 
 	buf = aesni_cipher_alloc(enccrd, crp, &allocated);
 	if (buf == NULL)
 		return (ENOMEM);
 
 	td = curthread;
-	error = fpu_kern_enter(td, &ses->fpu_ctx, FPU_KERN_NORMAL);
-	if (error != 0)
-		goto out;
+	if (!is_fpu_kern_thread(0)) {
+		error = fpu_kern_enter(td, &ses->fpu_ctx, FPU_KERN_NORMAL);
+		if (error != 0)
+			goto out;
+		saved_ctx = 1;
+	} else {
+		saved_ctx = 0;
+		error = 0;
+	}
 
 	if ((enccrd->crd_flags & CRD_F_KEY_EXPLICIT) != 0) {
 		error = aesni_cipher_setup_common(ses, enccrd->crd_key,
@@ -311,7 +324,8 @@ aesni_cipher_process(struct aesni_session *ses, struct cryptodesc *enccrd,
 			    ses->iv);
 		}
 	}
-	fpu_kern_leave(td, &ses->fpu_ctx);
+	if (saved_ctx)
+		fpu_kern_leave(td, &ses->fpu_ctx);
 	if (allocated)
 		crypto_copyback(crp->crp_flags, crp->crp_buf, enccrd->crd_skip,
 		    enccrd->crd_len, buf);
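The padlock(4) and aesni(4) patches apply the same pattern: enter a kernel FPU context only when the current thread is not already an FPU kernel thread, remember that in saved_ctx, and leave only if this call did the enter. A user-space sketch of that pairing follows; the model_* functions are hypothetical stand-ins that merely count entered contexts, and is_fpu_kern_thread(0) is modelled as a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for is_fpu_kern_thread(), fpu_kern_enter() and
 * fpu_kern_leave(); they only track how many contexts are entered.
 */
static bool model_is_fpu_kern_thread;
static int model_ctx_depth;

static int
model_fpu_kern_enter(void)
{
	model_ctx_depth++;
	return (0);
}

static void
model_fpu_kern_leave(void)
{
	assert(model_ctx_depth > 0);
	model_ctx_depth--;
}

/* Run work(arg) with the FPU usable, entering a context only when needed. */
static int
with_fpu(int (*work)(void *), void *arg)
{
	int error, saved_ctx;

	if (!model_is_fpu_kern_thread) {
		error = model_fpu_kern_enter();
		if (error != 0)
			return (error);
		saved_ctx = 1;
	} else
		saved_ctx = 0;
	error = work(arg);
	if (saved_ctx)
		model_fpu_kern_leave();  /* leave only what this call entered */
	return (error);
}

/* Example work function: records how many contexts were active. */
static int
record_depth(void *arg)
{
	*(int *)arg = model_ctx_depth;
	return (0);
}
```

In both cases the enter/leave calls stay balanced; the difference is only whether this call owns a context at all.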
Re: Call for testers: FPU changes
On Tue, Nov 16, 2010 at 05:08:30PM -0500, Mike Tancsa wrote:
> On 11/16/2010 4:43 AM, Kostik Belousov wrote:
>> On Mon, Nov 15, 2010 at 10:42:50PM -0500, Mike Tancsa wrote:
>>> On 11/15/2010 4:13 PM, Kostik Belousov wrote:
>>>> Patch is at http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch
>>>
>>> Hi,
>>> One small failure on the patch
>>>
>>> The text leading up to this was:
>>> --
>>> |Index: pc98/include/npx.h
>>> |===
>>> |--- pc98/include/npx.h (revision 215253)
>>> |+++ pc98/include/npx.h (working copy)
>>> --
>>> Patching file pc98/include/npx.h using Plan A...
>>> Hunk #1 failed at 1.
>>> 1 out of 1 hunks failed--saving rejects to pc98/include/npx.h.rej
>>
>> This is because our patch(1) in base is somewhat old, I believe. The
>> diff was generated by svn diff from the up to date stable/8 checkout,
>> and the reason for failure is expanded $FreeBSD$ tags. Newer gnu patch,
>> available in ports, handles this correctly, reporting about patches
>> applied with fuzz.
>>
>>> I tested with openssl and openvpn and all seems to work great on the
>>> via board and my i5 board!! Simple test details at
>>> http://www.tancsa.com/fpu.html
>>>
>>> I will try out geli and some more extensive tests tomorrow
>>>
>>> Thanks for porting this back to RELENG_8 !
>>
>> This is actually somewhat puzzling. Does openssl in base automatically
>> use crypto(4) ?
>
> I force it via ssl.cnf
>
> 0(achinetboot)% tail -11 /etc/ssl/openssl.cnf
> openssl_conf = openssl_def
>
> [openssl_def]
> engines = openssl_engines
>
> [openssl_engines]
> padlock = cryptodev_engine
>
> [cryptodev_engine]
> default_algorithms = ALL
> 0(achinetboot)%

Ah, that explains the results.

> The limiting factor here for ssh seems to be the 100Mb link my i5 box
> is on.
> Here is with and without aesni loaded
>
> 0(achinetboot)% /usr/bin/time scp -c aes128-cbc test.bin mdtan...@10.255.255.1:/dev/null
> test.bin            100%   88MB  11.0MB/s   00:08
>         8.14 real         0.44 user         0.57 sys
> 0(achinetboot)% /usr/bin/time scp -c aes128-cbc test.bin mdtan...@10.255.255.1:/dev/null
> test.bin            100%   88MB  11.0MB/s   00:08
>         8.15 real         1.46 user         0.36 sys
> 0(achinetboot)%
>
> I will move it to gigabit to get a better test shortly.
>
>> Also, could you, please redo the speed tests for aesni(4) with the
>> following patch applied over the driver sources ? Thank you !
>>
>> diff --git a/sys/crypto/aesni/aesni_wrap.c b/sys/crypto/aesni/aesni_wrap.c
>> index 36c66ea..3fd397c 100644
>> --- a/sys/crypto/aesni/aesni_wrap.c
>> +++ b/sys/crypto/aesni/aesni_wrap.c
>> @@ -246,14 +246,21 @@ int
>
> patch -p2 < a
> Hmm...  Looks like a unified diff to me...
> The text leading up to this was:
> --
> |diff --git a/sys/crypto/aesni/aesni_wrap.c b/sys/crypto/aesni/aesni_wrap.c
> |index 36c66ea..3fd397c 100644
> |--- a/sys/crypto/aesni/aesni_wrap.c
> |+++ b/sys/crypto/aesni/aesni_wrap.c
> --
> Patching file crypto/aesni/aesni_wrap.c using Plan A...
> Hunk #1 succeeded at 246.
> Hunk #2 succeeded at 271.
> Hunk #3 succeeded at 324.
> Hmm...  Ignoring the trailing garbage.
> done
>
> Seems to work ok
>
> 0(achinetboot)# kldload aesni
> 0(achinetboot)# openssl speed -evp aes-128-cbc
> To get the most accurate results, try to run this
> program when this computer is idle.
> Doing aes-128-cbc for 3s on 16 size blocks: 2587085 aes-128-cbc's in 0.39s
> Doing aes-128-cbc for 3s on 64 size blocks: 2425301 aes-128-cbc's in 0.38s
> Doing aes-128-cbc for 3s on 256 size blocks: 1925353 aes-128-cbc's in 0.19s
> Doing aes-128-cbc for 3s on 1024 size blocks: 1098255 aes-128-cbc's in 0.11s
> Doing aes-128-cbc for 3s on 8192 size blocks: 152631 aes-128-cbc's in 0.05s
> OpenSSL 0.9.8n 24 Mar 2010
> built on: date not available
> options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) blowfish(idx)
> compiler: cc
> available timing options: USE_TOD HZ=128 [sysconf value]
> timing function used: getrusage
> The 'numbers' are in 1000s of bytes per second processed.
> type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> aes-128-cbc     105979.48k   404781.84k  2632455.13k  9955323.90k 27619906.16k
> 0(achinetboot)#
>
> But there is a LOT of variation between runs for some reason. I added
> the different runs to http://www.tancsa.com/fpu.html

Mike, thank you again.

Would your conclusion be that the patch seems to increase the throughput
of aesni(4) ? I think that on small-sized blocks, when using aesni(4),
the dominating factor is the copyin/copyout of the data to/from the
kernel address space. It would still be interesting to compare the full
output of openssl speed on aesni(4) with and without the patch I posted.
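The table in that openssl speed run can be related back to the "Doing aes-128-cbc ..." lines: each cell is roughly the operation count times the block size, divided by the reported time, in thousands of bytes per second (the unit the output itself states). Since the output says the timing function is getrusage, the divisor is presumably CPU time charged to the openssl process rather than wall-clock time; reading that as one reason for the huge, noisy figures is our assumption, not something measured in the thread. The helper below is a hypothetical name for that arithmetic.

```c
#include <assert.h>

/*
 * One cell of the "openssl speed" table: operations completed times the
 * block size, divided by the measured time, in units of 1000 bytes/s.
 */
static double
speed_kbytes_per_sec(double ops, double block_size, double seconds)
{
	assert(seconds > 0.0);
	return (ops * block_size / seconds / 1000.0);
}
```

For the 16-byte row this gives about 106136.8k against the printed 105979.48k; the small gap most likely comes from the time being rounded to 0.39s in the "Doing ..." line.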
Call for testers: FPU changes
Hello,
this is a call for testers of the merge of fpu_kern_enter/leave(9) to
RELENG_8. The changes are required to fix some issues with the VIA padlock
engine, and to actually merge aesni(4) to RELENG_8.

I ask testers to look for possible FPU context handling regressions.
Reports from users of VIA padlock hardware are also needed. Any user who
has suspend/resume magically working on the 8 branch, please test that
the patch does not make things worse.

Please note that the pre-release freeze will start in 2 weeks, so I need
to get testing results relatively quickly to be in time for 8.2.

Patch is at http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch

Thanks in advance.
Re: 8.1-STABLE: problem with unmounting ZFS snapshots
On Sat, Nov 13, 2010 at 01:09:55PM +0200, Andriy Gapon wrote:
> on 13/11/2010 13:06 Martin Matuska said the following:
>> No, this is not good for us. Solaris does not allow mounting of
>> snapshots on any vnode, like we do. Solaris has them only in
>> .zfs/snapshots. This allows us to have read-only mounts without even
>> mounting the parent zfs. Before v15 we have been happy with that code
>> and had no issues :-)
>>
>> I have a very simple testcase where just fixing the VFS_RELE breaks our
>> forced unmount.
>>
>> Let's say we use the correct VFS_RELE in zfs_vfsops.c:
>> VFS_RELE(vfsp->mnt_vnodecovered->v_vfsp);
>>
>> Now let's say you have a mounted filesystem (e.g. md) under /mnt:
>> /dev/md5 on /mnt (ufs, local)
>>
>> # mkdir /mnt/test
>> # mount -t zfs t...@t2 /mnt/test
>> # umount -f /mnt
>>
>> Now you will hang because of the second VFS_HOLD.
>
> Hang here would be bad, I agree.
> But I think that the umount shouldn't succeed either, in this case.

A normal unmount indeed shall not succeed in this case, because mount adds
a reference to the covered vnode. But a forced unmount should be allowed
to proceed. After the unmount, you can use the fsid to unmount the lower
mount point.

>> So I stick to my opinion that this extra protection is more a problem
>> than a solution in our case and it should be commented out.
>
> --
> Andriy Gapon
> ___
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: AESNI
On Fri, Nov 12, 2010 at 09:58:53AM -0500, Daryl Richards wrote:
> I'm wondering what the status is of AES-NI in 8-STABLE? I can find
> references to it being put into 9 back in July, with the note that it
> would be MFC'ed within a month, but as far as I've been able to find,
> nothing after that.

As a public service, since I already got several private mails, I will
state the current situation:

The AESNI merge depends on the merge of r208833 and a lot of follow-ups
to r208833. The merge of r208833 needs r208453, which was committed to
stable/8 only recently, since it required an approval from r...@. This
delay, together with my recent lack of time, makes me skeptical about
the chances of having aesni(4) in 8.2.

Please note that aesni(4) is not needed to use AESNI in usermode.

> Did I miss something? Does it just need testing? How can I go about
> doing that? I'd like to help!
>
> Thanks,
> --
> Daryl Richards
> Isle Technical Services Inc.
Re: NFS deadlock (unkillable nfsd and no mounts work)
On Fri, Nov 05, 2010 at 10:27:09AM -0700, Josh Carroll wrote:
> I'm having a problem with nfsd hanging and not serving mount points,
> during which time it can not be killed. This problem started happening
> sometime after November 2nd, since kernel from 11/2 sources does not
> exhibit this problem.
>
> Please try the attached patch, rick
>
> Thanks! I had to manually patch for some reason, but I can confirm that
> nfsd is now well-behaved with your patch applied. I tested a couple of
> different mounts and played two separate files on the Popcorn Hour (one
> lower bitrate, the other higher bitrate) and both played without a
> hiccup. While those were playing I also was able to automount my home
> directory on the macbook and move around my home directory. So it looks
> like this patch did the trick. Thanks Rick, really appreciate the fast
> response. Is there a reason why this doesn't seem to be getting reported
> a lot? What is particular in my setup that broke it?
>
> ps: Starting about Monday I won't be able to do commits for about 3
> weeks so, if this patch works, could someone else please commit it,
> thanks, rick
>
> If someone can commit this, I'd really appreciate it. I will report back
> if I notice any problems, but I imagine this would probably get fixed in
> HEAD first, then MFC'd anyway, right? Unless this is already fixed in
> HEAD. Anyway, thanks again Rick! I appreciate it.
>
> Regards,
> Josh
>
> As far as I can tell, there have been no adverse effects or regressions
> with the kernel built with this patch (I had t

I agree that the fix is the right fix for a real issue. It should only
affect the filesystems that do support VFS_VGET(). In other words, it is
relevant for e.g. UFS exports, but not for ZFS, which is Andrey's case.
The change is committed as r214851 with the shortest MFC timeout possible.

There is a further issue with the use of VOP_ISLOCKED(). Andrey, can you
try this untested change in your setup ?

Thanks and sorry.
diff --git a/sys/nfsserver/nfs_serv.c b/sys/nfsserver/nfs_serv.c
index 2b9131f..668b02c 100644
--- a/sys/nfsserver/nfs_serv.c
+++ b/sys/nfsserver/nfs_serv.c
@@ -3037,6 +3037,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 	struct vattr va, at, *vap = &va;
 	struct nfs_fattr *fp;
 	int len, nlen, rem, xfer, tsiz, i, error = 0, error1, getret = 1;
+	int vp_locked;
 	int siz, cnt, fullsiz, eofflag, rdonly, dirlen, ncookies;
 	u_quad_t off, toff, verf;
 	u_long *cookies = NULL, *cookiep; /* needs to be int64_t or off_t */
@@ -3067,10 +3068,12 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 	fullsiz = siz;
 	error = nfsrv_fhtovp(fhp, 1, &vp, &vfslocked, nfsd, slp,
 	    nam, &rdonly, TRUE);
+	vp_locked = 1;
 	if (!error && vp->v_type != VDIR) {
 		error = ENOTDIR;
 		vput(vp);
 		vp = NULL;
+		vp_locked = 0;
 	}
 	if (error) {
 		nfsm_reply(NFSX_UNSIGNED);
@@ -3090,6 +3093,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 		error = nfsrv_access(vp, VEXEC, cred, rdonly, 0);
 	if (error) {
 		vput(vp);
+		vp_locked = 0;
 		vp = NULL;
 		nfsm_reply(NFSX_V3POSTOPATTR);
 		nfsm_srvpostop_attr(getret, &at);
@@ -3097,6 +3101,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 		goto nfsmout;
 	}
 	VOP_UNLOCK(vp, 0);
+	vp_locked = 0;
 	rbuf = malloc(siz, M_TEMP, M_WAITOK);
 again:
 	iv.iov_base = rbuf;
@@ -3110,6 +3115,7 @@ again:
 	io.uio_td = NULL;
 	eofflag = 0;
 	vn_lock(vp, LK_SHARED | LK_RETRY);
+	vp_locked = 1;
 	if (cookies) {
 		free((caddr_t)cookies, M_TEMP);
 		cookies = NULL;
@@ -3118,6 +3124,7 @@ again:
 	off = (u_quad_t)io.uio_offset;
 	getret = VOP_GETATTR(vp, &at, cred);
 	VOP_UNLOCK(vp, 0);
+	vp_locked = 0;
 	if (!cookies && !error)
 		error = NFSERR_PERM;
 	if (!error)
@@ -3238,8 +3245,10 @@ again:
 			} else {
 				cn.cn_flags &= ~ISDOTDOT;
 			}
-			if (!VOP_ISLOCKED(vp))
+			if (!vp_locked) {
 				vn_lock(vp, LK_SHARED | LK_RETRY);
+				vp_locked = 1;
+			}
 			if ((vp->v_vflag & VV_ROOT) != 0 &&
 			    (cn.cn_flags & ISDOTDOT) != 0) {
 				vref(vp);
@@ -3342,7 +3351,7 @@ invalid:
 		cookiep++;
 		ncookies--;
 	}
-	if (!usevget && VOP_ISLOCKED(vp))
+	if (!usevget && vp_locked)
 		vput(vp);
 	else
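The nfs_serv.c change stops asking VOP_ISLOCKED(vp) whether the vnode is locked and instead keeps its own vp_locked flag next to every vn_lock(), VOP_UNLOCK() and vput(). One plausible reading, ours rather than stated in the thread, is that a shared vnode lock does not record its owners, so "locked" as reported by the lock itself may mean another thread holds it. The toy model below, with made-up names and no relation to the real lockmgr code, shows that gap.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy shared lock: it counts holders but does not record who they are. */
struct toy_lock {
	int shared_holders;
};

/* Models asking the lock itself whether it is held (cf. VOP_ISLOCKED()). */
static bool
toy_islocked(const struct toy_lock *lk)
{
	return (lk->shared_holders > 0);
}

/* Take a shared hold and set the caller's own vp_locked-style flag. */
static void
toy_lock_shared(struct toy_lock *lk, bool *held)
{
	assert(!*held);
	lk->shared_holders++;
	*held = true;
}

/* Drop the caller's shared hold and clear its flag. */
static void
toy_unlock(struct toy_lock *lk, bool *held)
{
	assert(*held && lk->shared_holders > 0);
	lk->shared_holders--;
	*held = false;
}
```

If another thread holds the lock, toy_islocked() still answers true even though this caller holds nothing, while the caller's own flag stays accurate; that is the kind of mistake a local flag avoids.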
Re: 8.1-STABLE: zfs and sendfile: problem still exists
On Sat, Oct 30, 2010 at 05:43:54PM +0300, Andriy Gapon wrote:
> on 30/10/2010 14:25 Artemiev Igor said the following:
>> On Sat, Oct 30, 2010 at 01:33:00PM +0300, Andriy Gapon wrote:
>>> on 30/10/2010 13:12 Artemiev Igor said the following:
>>>> On Sat, Oct 30, 2010 at 12:52:54PM +0300, Andriy Gapon wrote:
>>>>> Heh, next try.
>>>>
>>>> Got a panic, vm_page_unwire: invalid wire count: 0
>>>
>>> Oh, thank you for testing - forgot another piece (VM_ALLOC_WIRE for
>>> vm_page_alloc):
>>
>> Yep, it works. But VM_ALLOC_WIRE does not exist in RELENG_8, therefore
>> I slightly modified your patch:
>
> I apologize for my haste, it should have been VM_ALLOC_WIRED.
> Here is a corrected patch:
> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> ===
> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c (revision 214318)
> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c (working copy)
> @@ -67,6 +67,7 @@
>  #include <sys/sf_buf.h>
>  #include <sys/sched.h>
>  #include <sys/acl.h>
> +#include <vm/vm_pageout.h>
>
>  /*
>   * Programming rules.
> @@ -464,7 +465,7 @@
>  			uiomove_fromphys(&m, off, bytes, uio);
>  			VM_OBJECT_LOCK(obj);
>  			vm_page_wakeup(m);
> -		} else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
> +		} else if (uio->uio_segflg == UIO_NOCOPY) {
>  			/*
>  			 * The code below is here to make sendfile(2) work
>  			 * correctly with ZFS. As pointed out by ups@
> @@ -474,9 +475,23 @@
>  			 */
>  			KASSERT(off == 0,
>  			    ("unexpected offset in mappedread for sendfile"));
> -			if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
> +			if (m != NULL && vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
>  				goto again;
> -			vm_page_busy(m);
> +			if (m == NULL) {
> +				m = vm_page_alloc(obj, OFF_TO_IDX(start),
> +				    VM_ALLOC_NOBUSY | VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
> +				if (m == NULL) {
> +					VM_OBJECT_UNLOCK(obj);
> +					VM_WAIT;
> +					VM_OBJECT_LOCK(obj);
> +					goto again;
> +				}
> +			} else {
> +				vm_page_lock_queues();
> +				vm_page_wire(m);
> +				vm_page_unlock_queues();
> +			}
> +			vm_page_io_start(m);

Why wire the page if it is busied ?
Re: 8.1-STABLE: zfs and sendfile: problem still exists
On Fri, Oct 29, 2010 at 06:31:21PM +0400, Alexander Zagrebin wrote: I've tried nginx with sendfile disabled (nginx.conf contains "sendfile off;"):
$ dd if=/dev/random of=test bs=1m count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 5.892504 secs (17795083 bytes/sec)
$ fetch -o /dev/null http://localhost/test
/dev/null 100% of 100 MB 41 MBps
$ fetch -o /dev/null http://localhost/test
/dev/null 100% of 100 MB 44 MBps
$ fetch -o /dev/null http://localhost/test
/dev/null 100% of 100 MB 44 MBps
I am really surprised by such bad sendfile performance. Will you be able to profile the issue further? Yes. I will also try to think of some measurements. The transfer rate is too low for the _first_ attempt only. Further attempts show a reasonable transfer rate. For example, nginx with "sendfile on;":
$ dd if=/dev/random of=test bs=1m count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 5.855305 secs (17908136 bytes/sec)
$ fetch -o /dev/null http://localhost/test
/dev/null 3% of 100 MB 118 kBps 13m50s^C
fetch: transfer interrupted
$ fetch -o /dev/null http://localhost/test
/dev/null 100% of 100 MB 39 MBps
If there has been no access to the file for some time, everything repeats: the first attempt has a very low transfer rate, and further attempts have no problems. Can you reproduce the problem on your system? Could it be the priming of the vm object page contents? Due to double-buffering, and the (possibly false) optimization to only perform double-buffering when the vm object already has some data cached, reads can prime the vm object page list before the file is mmapped or sendfile-ed.
Re: 8.1-STABLE: zfs and sendfile: problem still exists
On Fri, Oct 29, 2010 at 06:05:26PM +0300, Andriy Gapon wrote: on 29/10/2010 17:53 Kostik Belousov said the following: Could it be the priming of the vm object page contents? Sorry, I am not familiar with this term. Do you mean prepopulation of the vm object with valid pages? Due to double-buffering, and (possibly false) optimization to only What optimization? On zfs vnode read, the page from the corresponding vm object is only populated with the vnode data if the page already exists in the object. Not doing the optimization would mean allocating the page unconditionally on read if it is not already present, and copying the data from the ARC to the page. perform double-buffering when vm object already has some data cached, reads can prime vm object page list before file is mmapped or sendfile-ed. No double-buffering is done to optimize anything. Double-buffering is a consequence of having both the page cache and the ARC. The special double-buffering code just handles that fact, e.g. making sure that VOP_READ reads data from the page cache instead of the ARC if the data in them may differ (i.e. the page cache has more recent data). So, if I understood the term 'priming' correctly, no priming should ever occur. The priming is done on the first call to VOP_READ() with the right offset after the page is allocated.
Re: 8.1-STABLE: zfs and sendfile: problem still exists
On Fri, Oct 29, 2010 at 06:22:54PM +0300, Andriy Gapon wrote: on 29/10/2010 18:17 Kostik Belousov said the following: On Fri, Oct 29, 2010 at 06:05:26PM +0300, Andriy Gapon wrote: on 29/10/2010 17:53 Kostik Belousov said the following: Could it be the priming of the vm object page contents? Sorry, I am not familiar with this term. Do you mean prepopulation of the vm object with valid pages? Due to double-buffering, and (possibly false) optimization to only What optimization? On zfs vnode read, the page from the corresponding vm object is only populated with the vnode data if the page already exists in the object. Do you mean a specific type of read? For normal reads it's the other way around - if the page already exists and is valid, then we read from the page, not from the ARC. Let me repeat it once more: zfs does not properly cache the vnode data content in the page cache ("cache" is used here in a weaker sense, not meaning the FreeBSD 'cached' memory, but a cache in the more common sense). Not doing the optimization I mentioned would mean always allocating the pages and making them (partially) valid on each read call. Not doing the optimization would mean allocating the page unconditionally on read if it is not already present, and copying the data from the ARC to the page. perform double-buffering when vm object already has some data cached, reads can prime vm object page list before file is mmapped or sendfile-ed. No double-buffering is done to optimize anything. Double-buffering is a consequence of having both the page cache and the ARC. The special double-buffering code just handles that fact, e.g. making sure that VOP_READ reads data from the page cache instead of the ARC if the data in them may differ (i.e. the page cache has more recent data). So, if I understood the term 'priming' correctly, no priming should ever occur. The priming is done on the first call to VOP_READ() with the right offset after the page is allocated. Again, what is priming?
Filling the cache with the appropriate content.
Re: kpanic on install 32GB of RAM [SEC=UNCLASSIFIED]
On Thu, Oct 21, 2010 at 09:50:03AM -0700, Sean Bruno wrote: On Thu, 2010-10-21 at 05:48 -0700, Andriy Gapon wrote: on 20/10/2010 21:28 Sean Bruno said the following: I guess I could replace the kernel on the CD and have them reburn it? That should work. BTW, here I described yet another way of building custom recovery/installation CDs that I use: http://wiki.freebsd.org/AvgLiveCD Before I get started on this, it looks like something else is going on. Here is a panic + trace on the latest 9-current snapshot. Hammer time indeed. Suggestions are welcome! http://people.freebsd.org/~sbruno/9-current-panic.png http://people.freebsd.org/~sbruno/9-current-trace-panic.png It looks like the msgbufp variable has an absurd value. Can you arrange to get the output of a verbose boot, especially the SMAP lines? Also, you could add printfs near amd64/amd64/machdep.c:1517
/* Map the message buffer. */
msgbufp = (struct msgbuf *)PHYS_TO_DMAP(phys_avail[pa_indx]);
to show the values of all participants, i.e. msgbufp, pa_indx and phys_avail[pa_indx].
Re: Panic with chromium and 8.1-STABLE (Thu Sep 16 09:52:17 BRT 2010)
On Sun, Sep 19, 2010 at 07:28:13PM -0300, Mario Sergio Fujikawa Ferreira wrote: Hi, I've just begun trying the Chrome web browser from http://chromium.hybridsource.org/ but it triggered 2 panics on my 8.1-STABLE system.
$ uname -a
FreeBSD exxodus.fedaykin.here 8.1-STABLE FreeBSD 8.1-STABLE #26: Thu Sep 16 09:52:17 BRT 2010 li...@exxodus:/usr/obj/usr/src/sys/LIOUX amd64
The panic information is:
panic: vm_page_unwire: invalid wire count: 0
cpuid = 0
KDB: enter: panic
0xff006ecce000: tag ufs, type VREG usecount 1, writecount 1, refcount 4 mountedhere 0 flags () v_object 0xff0151489870 ref 0 pages 8 lock type ufs: EXCL by thread 0xff00200947c0 (pid 25025) ino 119526591, on dev ufs/fsusr
0xff011107f938: tag ufs, type VREG usecount 0, writecount 0, refcount 4 mountedhere 0 flags (VV_NOSYNC|VI_DOINGINACT) v_object 0xff0151f7f870 ref 0 pages 1284 lock type ufs: EXCL by thread 0xff01882cc7c0 (pid 26689) ino 263, on dev md0
I've made 2 ddb textdumps available at:
http://people.freebsd.org/~lioux/panic/2010091900/textdump.tar.0
http://people.freebsd.org/~lioux/panic/2010091900/textdump.tar.1
I was able to use Chrome prior to this latest kernel update. Now I can reproduce a kernel panic even by browsing www.google.com. Please let me know if I can provide any further information. Does it panic if you remove the ZERO_COPY_SOCKETS option from the kernel config?
Re: Panic with chromium and 8.1-STABLE (Thu Sep 16 09:52:17 BRT 2010)
On Wed, Sep 22, 2010 at 03:58:12PM -0400, jhell wrote: On 09/22/2010 09:28, Kostik Belousov wrote: On Sun, Sep 19, 2010 at 07:28:13PM -0300, Mario Sergio Fujikawa Ferreira wrote: Hi, I've just begun trying the Chrome web browser from http://chromium.hybridsource.org/ but it triggered 2 panics on my 8.1-STABLE system.
$ uname -a
FreeBSD exxodus.fedaykin.here 8.1-STABLE FreeBSD 8.1-STABLE #26: Thu Sep 16 09:52:17 BRT 2010 li...@exxodus:/usr/obj/usr/src/sys/LIOUX amd64
The panic information is:
panic: vm_page_unwire: invalid wire count: 0
cpuid = 0
KDB: enter: panic
0xff006ecce000: tag ufs, type VREG usecount 1, writecount 1, refcount 4 mountedhere 0 flags () v_object 0xff0151489870 ref 0 pages 8 lock type ufs: EXCL by thread 0xff00200947c0 (pid 25025) ino 119526591, on dev ufs/fsusr
0xff011107f938: tag ufs, type VREG usecount 0, writecount 0, refcount 4 mountedhere 0 flags (VV_NOSYNC|VI_DOINGINACT) v_object 0xff0151f7f870 ref 0 pages 1284 lock type ufs: EXCL by thread 0xff01882cc7c0 (pid 26689) ino 263, on dev md0
I've made 2 ddb textdumps available at:
http://people.freebsd.org/~lioux/panic/2010091900/textdump.tar.0
http://people.freebsd.org/~lioux/panic/2010091900/textdump.tar.1
I was able to use Chrome prior to this latest kernel update. Now I can reproduce a kernel panic even by browsing www.google.com. Please let me know if I can provide any further information. Does it panic if you remove the ZERO_COPY_SOCKETS option from the kernel config? This is triggered as well on a system without ZERO_COPY_SOCKETS, just to clear that bit up. I do not know what prompted you to decide that the issue is the same. There is nothing in common between the report by lioux and your backtraces except the word "panic". You could have better luck showing your traces on fs@ or asking the zfs porters directly.
Re: SuperMicro i7 (UP) - very slow performance
On Sat, Sep 18, 2010 at 08:32:32AM -0500, Bryce Edwards wrote: I have a Supermicro with the C7X58 motherboard and an i7 930 CPU, and it is nowhere near the performance it should be. A buildworld just took 22.5 hours! I use a 5046A-XB with an i7-930 as my home workstation, running the latest RELENG_8, and I do not have the problem you noted. My BIOS is v1.1, and USB legacy support is enabled. I did notice one issue with the hardware: the built-in FireWire controller generated too high an interrupt rate, so I usually do not load firewire.ko unless needed.
br...@tahiti[~]uname -a
FreeBSD tahiti.bryce.net 8.1-STABLE FreeBSD 8.1-STABLE #0: Tue Sep 7 22:45:38 CDT 2010 r...@tahiti.bryce.net:/usr/obj/usr/src/sys/GENERIC amd64
I have disabled Legacy USB Support in the BIOS and that helped, but I'm not finding any other settings that get things where they need to be. I have tested the two system drives independently (currently a ZFS mirror), so it is not likely to be an HDD issue. Here are the verbose dmesg boot details: http://www.bryce.net/files/dmesg.boot And the IPMI ASL, in case that is of any value: http://www.bryce.net/files/tahiti.asl Currently I'm not running powerd; performance is not better with it running.
r...@tahiti[/usr/src]#cat /boot/loader.conf
ahci_load=YES
coretemp_load=YES
zfs_load=YES
vfs.root.mountfrom=zfs:system
#vfs.zfs.prefetch_disable=1
kern.maxfiles=16384
# async i/o
aio_load=YES
# VirtualBox
#vboxdrv_load=YES
# SMB
#ichsmb_load=YES
#smb_load=YES
# Power Saving
#kern.hz=100
# Disable APIC subsystem - no longer needed when disabling lapic below
#hint.apic.0.disabled=1
# Disable local APIC (LAPIC) timer - for C3 state
#hint.apic.0.clock=0
# Avoid 128 interrupts/sec per core, at cost of scheduling precision
#hint.atrtc.0.clock=0
# Disable throttle control (and rely on EIST)
hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
Thanks in advance for your time!
::Bryce::
Re: strange problem with FreeBSD 7.3 64bit
On Fri, Sep 10, 2010 at 10:45:08AM +0200, freebsd wrote: Hi list, we upgraded some 20 boxes from 7.1 and 7.2 to 7.3-RELEASE-p2 (all amd64) and are now experiencing some weird behaviour on 6 of them with rsnapshot: after a few days to several weeks (it seems to be completely random), rsnapshot reports that it can't start because its lockfile and process are still present. On such boxes either a zombie rm or find process (presumably launched by rsnapshot) can be found. If the backup was done to a separate partition (physical disks or RAIDs), any access (ls, stat, fsck, etc.) to the partition would kill the current SSH session, creating a new zombie of the process one had just started. Unmounting the affected partition would render the server completely unresponsive and required a hardware reset. When trying to restart, the machines wouldn't even shut down completely but hung somewhere after syncing buffers; only a hardware reset worked. After the reboot, those partitions were unmounted and fscked, after which the backups would work again until the error happened again. The hardware of the affected and unaffected systems is: HP ProLiant DL380 G4, HP ProLiant DL380 G5, HP ProLiant DL360 G5. There is no visible pattern distinguishing affected from unaffected boxes. Also, those machines were upgraded in exactly the same way and run identical kernels (more or less GENERIC, with QUOTA activated). We upgraded the most critical boxes, which showed that behaviour daily, to 8.0-RELEASE, and the behaviour has not reappeared in nearly 3 months. We installed a debug kernel on an affected box, but the machine wouldn't panic when the error occurred. When we tried to unmount the affected partition, it just went completely unresponsive, as mentioned above.
Before trying to unmount, procstat -ak showed some processes with VOP_LOCK1_APV in their stacks:
55396 100135 find - mi_switch sleepq_switch sleepq_wait _sleep acquire _lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget cache_lookup vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_lstat lstat syscall
70923 100146 rsync - mi_switch sleepq_switch sleepq_wait _sleep acquire _lockmgr ffs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ffs_vgetf ufs_lookup_ vfs_cache_lookup OP_LOOKUP_APV lookup namei kern_lstat
Since this hardware worked before 7.3 and, as we assume, would work again with 8.*, we would be grateful for any hints about what could be causing all this. It sounds like a deadlock, but the cause cannot be identified without further diagnostics. It might be the driver (ciss, I assume), but it may also be the quota code, or even something else. Please follow http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html to obtain the required information.
Re: strange problem with FreeBSD 7.3 64bit
On Fri, Sep 10, 2010 at 12:04:50PM +0200, freebsd wrote: On 10.09.2010 11:21, Kostik Belousov wrote: On Fri, Sep 10, 2010 at 10:45:08AM +0200, freebsd wrote: It sounds like a deadlock, but the cause cannot be identified without further diagnostics. It might be the driver (ciss, I assume), but it may also be the quota code, or even something else. Please follow http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html to obtain the required information. Thanks for the quick answer. We've added the additional options to debug deadlocks. We'll have the required information within 1-2 weeks, since the test box isn't that fast at reproducing the error. QUOTA most likely isn't the culprit, since 2 of the 6 affected boxes were running GENERIC without any modifications. Ah. Then ciss(4) is the main suspect, but I cannot help with it.
Re: csup in repomirror mode dumps core @ stable/8
On Thu, Sep 02, 2010 at 03:59:07AM +0400, Dmitry Morozovsky wrote: Dear colleagues, some 2 days ago my repo mirror (stable/8...@amd64) started dumping core while copying the repository:
...
SetAttrs CVSROOT-src/Emptydir
Edit CVSROOT-src/access,v
Segmentation fault (core dumped)
Deleting files from sup/cvsroot-all/ did not help. Unfortunately, the usual quick `make -DDEBUG_FLAGS=-g' in /usr/src/usr.bin/csup does not work, and I have not dug into this deeply yet, so the traces are without parameters: I think it should be `make DEBUG_FLAGS=-g', without the -D.
Re: STABLE kernel panic: privileged instruction fault
On Mon, Aug 16, 2010 at 07:15:16PM +0400, Alexey Tarasov wrote: Hello. I have a couple of Supermicro servers which get a similar kernel panic with all FreeBSD versions I have tried since 6.4. Now I want to investigate the problem. The servers panic under a similar workload: a file server with a lot of files and connections. The web server software is nginx. The file system is UFS+GJOURNAL. Outgoing traffic on each server is ~10 MB/s. I think it is not a software problem, because when I installed Linux with the same configuration there were no kernel panics. Here is a short overview of the hardware:
CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (2992.51-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0xf65 Family = f Model = 6 Stepping = 5
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0xe59d<SSE3,DTES64,MON,DS_CPL,EST,TM2,CNXT-ID,CX16,xTPR,PDCM>
AMD Features=0x20100800<SYSCALL,NX,LM>
AMD Features2=0x1<LAHF>
TSC: P-state invariant
real memory = 2147483648 (2048 MB)
avail memory = 2054619136 (1959 MB)
DMESG: http://lexasoft.ru/m/dmesg.txt
CORE: http://lexasoft.ru/m/core.txt
Fatal trap 1: privileged instruction fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer = 0x20:0xff8040d2cc83
stack pointer = 0x28:0xff8040d2ca80
frame pointer = 0x28:0xff0060c0b740
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 9388 (nginx)
trap number = 1
panic: privileged instruction fault
cpuid = 1
Uptime: 17d15h48m49s
Physical memory: 2032 MB
Dumping 1485 MB: 1470 1454 1438 1422 1406 1390 1374 1358 1342 1326 1310 1294 1278 1262 1246 1230 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070 1054 1038 1022 1006 990 974 958 942 926 910 894 878 862 846 830 814 798 782 766 750 734 718 702 686 670 654 638 622 606 590 574 558 542 526 510 494 478 462 446 430 414 398 382 366 350 334 318 302
286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14
(kgdb) #0 doadump () at pcpu.h:223
#1 0x80590c59 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416
#2 0x8059108c in panic (fmt=0x80951fc4 "%s") at /usr/src/sys/kern/kern_shutdown.c:579
#3 0x80878fd8 in trap_fatal (frame=0xff0060c0b740, eva=Variable eva is not available. ) at /usr/src/sys/amd64/amd64/trap.c:857
#4 0x808799ea in trap (frame=0xff8040d2c9d0) at /usr/src/sys/amd64/amd64/trap.c:644
#5 0x8085f983 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
#6 0xff8040d2cc83 in ?? ()
#7 0xff8040d2cb50 in ?? ()
#8 0xff8040d2caf0 in ?? ()
#9 0xff8040d2cbf0 in ?? ()
#10 0xff0060c0b740 in ?? ()
#11 0x80b83c60 in sysent ()
#12 0xff8040d2cc80 in ?? ()
#13 0xff8040d2cae0 in ?? ()
#14 0x8059c431 in bintime (bt=0x80ad3140) at /usr/src/sys/kern/kern_tc.c:200
Previous frame inner to this frame (corrupt stack?)
(kgdb)
The backtrace makes absolutely no sense. I would not trust kgdb anyway. Compile DDB in and get a backtrace on the console at the panic. Also, disassemble the kernel at the fault address. I am very curious which instruction causes this. This is stock GENERIC booted on bare metal, right?
Re: STABLE kernel panic: privileged instruction fault
On Mon, Aug 16, 2010 at 11:21:15PM +0400, Alexey Tarasov wrote: Hello Kostik! On Aug 16, 2010, at 10:48 PM, Kostik Belousov wrote: The backtrace makes absolutely no sense. I would not trust kgdb anyway. Compile DDB in and get a backtrace on the console at the panic. Also, disassemble the kernel at the fault address. I am very curious which instruction causes this. This is stock GENERIC booted on bare metal, right? Yes, stock GENERIC. Please check this out: Dump of assembler code from 0xff0060c0b700 to 0xff0060c0b780: It would be nice if you kept all the requested data in one place, so that we do not need to search old mails for the context. According to your previous mail, the fault happened at the address
instruction pointer = 0x20:0xff8040d2cc83
You disassembled the stack instead. Please just do
disass 0xff8040d2cc83,0xff8040d2cca0
in kgdb. But I also want to see the backtrace and disassembly output from ddb.
Re: STABLE kernel panic: privileged instruction fault
On Mon, Aug 16, 2010 at 11:35:36PM +0400, Alexey Tarasov wrote: On Aug 16, 2010, at 11:31 PM, Kostik Belousov wrote: On Mon, Aug 16, 2010 at 11:21:15PM +0400, Alexey Tarasov wrote: Hello Kostik! On Aug 16, 2010, at 10:48 PM, Kostik Belousov wrote: The backtrace makes absolutely no sense. I would not trust kgdb anyway. Compile DDB in and get a backtrace on the console at the panic. Also, disassemble the kernel at the fault address. I am very curious which instruction causes this. This is stock GENERIC booted on bare metal, right? Yes, stock GENERIC. Please check this out: Dump of assembler code from 0xff0060c0b700 to 0xff0060c0b780: It would be nice if you kept all the requested data in one place, so that we do not need to search old mails for the context. According to your previous mail, the fault happened at the address instruction pointer = 0x20:0xff8040d2cc83 You disassembled the stack instead. Please just do disass 0xff8040d2cc83,0xff8040d2cca0 in kgdb. But I also want to see the backtrace and disassembly output from ddb.
(kgdb) disass 0xff8040d2cc83,0xff8040d2cca0
No function contains specified address.
Err, it seems that old gdb accepts only spaces. Please try
disass 0xff8040d2cc83 0xff8040d2cca0
instead. I will build a kernel with DDB tomorrow, install it on some servers and wait for the panic to occur. Ok. Did you check for such things as rootkits?
Re: Kernel symbol file alternate location
On Fri, Aug 06, 2010 at 09:29:31AM +0200, Oliver Fromme wrote: Daniel O'Connor wrote: On 06/08/2010, at 2:38, Oliver Fromme wrote: I think this is the main reason / has had to grow - the actual kernel is relatively small, so even a 256MB / could hold several, but with the symbol files it is not possible. I think a very simple solution would be to install the symbol files elsewhere (probably configurable via make.conf), and install symlinks in the kernel directory. If you do this, tools using the symbol files won't have to be changed. This would probably be a fairly trivial change to the installkernel target, I guess. I don't have patches, though. Yeah, I don't think it's hard to move them, however I'm worried about what it will break :) The only thing I can see that would have to change would be kgdb, so that it tells gdb where to find the symbols. That's why I suggested placing symlinks in the kernel directory. No change to kgdb is necessary. It might even be possible to not install the symbol files at all, but keep them under /usr/obj, so the installkernel target would have to do nothing more than create symlinks. This could be controlled by a make.conf variable, like SYMLINK_SYMBOLS=YES (NO would be the existing behaviour of installing the actual symbol files in /boot/kernel). If you keep /usr/obj around, you do not need the installed symbol files at all, and INSTALL_NODEBUG?=true in make.conf is enough. You can always use kernel.debug and the modules with debugging symbols from the build directory for kgdb.