Re: 9-stable from i386 to amd64
In the last episode (Feb 10), Randy Bush said:
> is there a recipe for moving from i386 to amd64?
>
> on a very remote system, i made the migration from 7.4 to 8.2 to 9.0, all
> 32-bit.  it was done with repeated
>
>     make buildworld
>     make kernel.new [0]
>     nextboot -k kernel.new
>     reboot
>     make installworld
>     etc
>
> [0] - well, there were some mv(1)s in there :)
>
> so after it was happy with 9.0 i386, i went to move to amd64 with
>
>     make buildworld TARGET=amd64
>     make kernel TARGET=amd64 DESTDIR=kernel.new [0]
>     nextboot -k kernel.new
>     reboot
>
> it did not come back from the reboot, and required a manual reset.  i have
> no console access to the machine, not my choice.
>
> clue bat please.

You probably got bit by a mismatched /libexec/ld-elf.so.  The kernel expects
that to be the "native" version, and on a 64-bit kernel it also expects a
ld-elf32.so to be the "compat" 32-bit version.  When you rebooted onto the
64-bit kernel, it couldn't find /libexec/ld-elf32.so to run any of the
32-bit binaries on the system.  My guess is that your reboot attempt died in
/sbin/init, prompting for a path to /bin/sh.  If you compiled with a static
/bin/sh for performance, it probably died very early in /etc/rc.

I think copying ld-elf.so over to ld-elf32.so might have been all you needed
to boot, but that would end up with a 64-bit kernel running a true 32-bit
userland with all the libraries in the "wrong" place, and your "installworld"
step would replace them with their 64-bit equivalents and your install would
die halfway through, leaving you with a large mess to clean up.

The cleanest upgrade path is to prepare your 32-bit root to be bootable by
both 32- and 64-bit kernels: copy the ld-elf32.so that was built during your
buildworld over to /libexec/ld-elf32.so, and also make copies of /lib and
/usr/lib to /lib32 and /usr/lib32 respectively.  That way when you reboot to
a 64-bit kernel, your 32-bit executables will be running "correctly" out of
compat32 paths and your installworld should succeed.

When I did all this on a local system, I made judicious use of ZFS snapshots
and clones, preserving a bootable clone of my original system plus
intermediate versions all the way until I was happy with the result.  I've
never done it completely remotely, but if you do a trial run or two on a
local machine or VM, you should be able to do it confidently remotely.

--
Dan Nelson
dnel...@allantgroup.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
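A minimal sketch of the preparation step described above, not a tested
migration recipe.  The function name prepare_compat32 is made up for
illustration, the location of the freshly built 32-bit runtime linker is
passed in because it depends on your obj tree, and the ".so.1" suffix on the
installed name is an assumption about how the file is named on disk.

```sh
#!/bin/sh
# Sketch only: stage 32-bit compat files so a 64-bit kernel can still run
# the existing i386 userland.  prepare_compat32 is a hypothetical helper.

prepare_compat32() {
    root=$1      # filesystem root to prepare ("" for the live system)
    rtld32=$2    # 32-bit runtime linker from the amd64 buildworld (path is an assumption)

    # The 64-bit kernel looks for the compat runtime linker under this name.
    cp -p "$rtld32" "$root/libexec/ld-elf32.so.1" || return 1

    # Mirror the native i386 libraries into the compat32 locations.
    mkdir -p "$root/lib32" "$root/usr/lib32" || return 1
    cp -Rp "$root/lib/." "$root/lib32/" || return 1
    cp -Rp "$root/usr/lib/." "$root/usr/lib32/" || return 1
}

# Example (run against a scratch copy of the root first):
#   prepare_compat32 /mnt/clone /path/to/obj/.../ld-elf32.so.1
```

Running it against a ZFS clone or other scratch copy of the root before
touching the live system fits the trial-run advice in the message.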
Re: tmpfs nfs exports?
In the last episode (Oct 30), jb said:
> Alfred Perlstein mu.org> writes:
> > Hey folks, any reason why not to include the following patch in 9.1?  It
> > would be nice to have tmpfs be exportable.
> >
> > I'm good to commit it, I can also wait until post 9.1.
> > ...
>
> How do you identify tmpfs ? With fsid ?
>
> Since nfs server is stateless, are these exports identical ?
> export /tmp, reboot, export /tmp
>
> What about /tmp on tmpfs ?
> export /tmp, reboot, export /tmp

I wanted to do the exact same thing a few years ago.  I patched mdmfs and the
startup scripts to allow for an fsid value to be passed to mdmfs on every
reboot.  That works for the filesystem itself, but then you have to contend
with the random NFS generation number on every inode.  I decided it wasn't
worth the trouble at that point.  If you really want an exportable /tmp, just
live with the fact that you'll get ESTALE errors on all clients when you
reboot the server.

Maybe giving the root inode a constant generation number is all that's
needed, since I suppose most clients that have mounted the server don't
actually have any open filehandles.

--
Dan Nelson
dnel...@allantgroup.com
Re: recommended memory for zfs
In the last episode (May 09), Benjamin Adams said:
> Hello zfs question about memory.
> I heard zfs is very ram hungry.
> Service looking to run:
> - nginx
> - postgres
> - php-fpm
> - python
>
> I have a machine with two quad core cpus but only 4 G Memory
>
> I'm looking to buy more ram now.
> What would be the recommend amount of memory for zfs across 6 drives on
> this setup?

As much as is reasonable to purchase.  Postgres would probably appreciate the
memory more than ZFS.  You can run ZFS on memory-limited machines (I've gone
as far down as 256MB), but the critical part is running a 64-bit kernel.  ZFS
does a lot of kernel malloc/free operations, and address space fragmentation
on a 32-bit system will eventually cause a panic when ZFS can't malloc a
contiguous 128k chunk.

--
Dan Nelson
dnel...@allantgroup.com
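Since the deciding factor above is the 64-bit kernel rather than the amount
of RAM, a quick sanity check before building the pool could look like the
sketch below.  The helper name is_64bit_arch is hypothetical, and the list of
architecture strings is an assumption that is certainly not exhaustive.

```sh
#!/bin/sh
# Sketch: report whether the running kernel is a 64-bit one.
# is_64bit_arch is a made-up helper; extend the pattern list as needed.

is_64bit_arch() {
    case "$1" in
        amd64|x86_64|aarch64|sparc64|ia64|powerpc64*) return 0 ;;
        *) return 1 ;;
    esac
}

arch=$(uname -m)
if is_64bit_arch "$arch"; then
    echo "$arch: 64-bit kernel, fine for ZFS"
else
    echo "$arch: looks 32-bit, expect kernel address space trouble with ZFS"
fi
```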
Re: Network throughput: Never get more than 112MB/s over two NICs
In the last episode (Apr 12), Denny Schierz said:
> On Monday, 11.04.2011, at 21:52 +0200, Denny Schierz wrote:
> > On 11.04.2011 at 20:06, Tim Daneliuk wrote:
> > > Are you certain you are not somehow running active-passive instead of
> > > active-active ... just a thought...
> >
> > 150% sure. I used two dedicated NICs WITHOUT any loadbalancing. The sum
> > has to be more than 112MB/s.
>
> it must me the network. I tested two crossover connections and I've got
> 220MB/s :-)

Check to see whether your switch ports are oversubscribed (common for older
blade switches, or very high-density blades); sometimes there will be
rectangles enclosing groups of 6-8 ports, which means that they are
controlled by a single chip internally.  Moving each of your test machines
to a separate group may improve your performance.

--
Dan Nelson
dnel...@allantgroup.com
Re: Network throughput: Never get more than 112MB/s over two NICs
In the last episode (Apr 12), Dan Nelson said:
> In the last episode (Apr 12), Denny Schierz said:
> > On Monday, 11.04.2011, at 21:52 +0200, Denny Schierz wrote:
> > > On 11.04.2011 at 20:06, Tim Daneliuk wrote:
> > > > Are you certain you are not somehow running active-passive instead of
> > > > active-active ... just a thought...
> > >
> > > 150% sure. I used two dedicated NICs WITHOUT any loadbalancing. The sum
> > > has to be more than 112MB/s.
> >
> > it must me the network. I tested two crossover connections and I've got
> > 220MB/s :-)
>
> Check to see whether your switch ports are oversubscribed (common for older
> blade switches, or very high-density blades); sometimes there will be
> rectangles enclosing groups of 6-8 ports, which means that they are
> controlled by a single chip internally.  Moving each of your test machines
> to a separate group may improve your performance.

.. I missed a line in your original post:

> > All are connected through a Cisco Catalyst WS-X4515.

This is a supervisor module for a 4500 series chassis, but only has two SFP
ports on it.  Your servers are unlikely to be plugged into it.  They're
probably plugged into another module.  This page lists some gigabit ethernet
modules that oversubscribe their ports, and which ports belong to which
groups:

http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/hardware/module/guide/03instal.html#wpxref23495

--
Dan Nelson
dnel...@allantgroup.com
Re: Large number of SATA commits (MFCs) to RELENG_8
In the last episode (Apr 21), Doug Barton said:
> On 04/20/2011 19:43, Lystopad Olexandr wrote:
> > May be we need another one file, like src/ChangeLog ?
>
> Users who run a -stable branch are expected to read
> freebsd-stable@FreeBSD.org (note, not just subscribe), AND read the
> commit mail for their branch; just like users who run HEAD are expected
> to read freebsd-current@ and the relevant commit mail.

I use a small shell script called "update" that does a "svn update", and also
prints a line at the end that you can copy&paste into another terminal to get
the log of what was just pulled.

#! /bin/sh
stat=$(svn status --depth empty -v -u)
localrev=$(echo "$stat" | cut -c10- | awk 'NR==1 {print $2}')
latestrev=$(echo "$stat" | awk 'NR==2 {print $4}')
repo=$(svn info | sed -ne '/^URL/s/^.*: //p')
echo "$stat"
svn info | grep Revision
svn update
if [ "$localrev" != "$latestrev" ] ; then
 echo "Log:"
 echo "svn log -v -r $(($localrev+1)):$latestrev $repo"
fi

Sample output:

(root@dan) /usr/src # ./update
 M          220902   220902 jilles       .
Status against revision: 220927
Revision: 220902
U    sbin/conscontrol/conscontrol.c
U    sbin/conscontrol/conscontrol.8
 U   sbin/conscontrol
U    sys/kern/uipc_sockbuf.c
U    sys/kern/kern_exit.c
U    sys/netgraph/ng_base.c
 U   sys/contrib/pf
 U   sys/contrib/dev/acpica
 U   sys/cddl/contrib/opensolaris
 U   sys/amd64/include/xen
U    sys/sys/proc.h
 U   sys
Updated to revision 220927.
Log:
svn log -v -r 220903:220927 svn://svn.freebsd.org/base/stable/8

--
Dan Nelson
dnel...@allantgroup.com
Re: Unstable ARP responses times
In the last episode (May 13), Bartosz Woronicz said:
> Since I moved from 7.3-stable to 8.2-stable I go strange long responses
> of arp, with arping.
> I.e.
> root@Korbotron82|pts/3|13:35:35|/home/mastier # arping -i vlan92
> 79.110.194.140
> ARPING 79.110.194.140
> 60 bytes from 00:15:17:a2:ea:38 (79.110.194.140): index=0 time=1.579 msec
> 60 bytes from 00:15:17:a2:ea:38 (79.110.194.140): index=1 time=653.326 msec
> 60 bytes from 00:15:17:a2:ea:38 (79.110.194.140): index=2 time=7.153 usec

arping has a usleep(1) call in its read loop, which can cause delays like
this if there are other processes running and the scheduler decides to run
another process.  Try removing the usleep(1) on line 916 of arping.c and see
if that helps.  The best solution would be to use the kernel-provided
timestamps from the pcap header, rather than calling gettimeofday() in
userland.

If you run "tcpdump arp", you should be able to see the packet timestamps as
the kernel sees them.

--
Dan Nelson
dnel...@allantgroup.com
Re: ZFS I/O errors
In the last episode (May 30), Olaf Seibert said:
> On Mon 30 May 2011 at 03:33:49 -0700, Jeremy Chadwick wrote:
> > On Mon, May 30, 2011 at 12:10:51PM +0200, Olaf Seibert wrote:
> > I'm not sure why this didn't actually map to a filename on the system
> > however.  I've never quite understood what the hexadecimal values shown
> > represent (I have ideas but it'd be useful to know what they meant).
>
> The scrub is starting to add some filenames to the list. So far they are
> two filenames in snapshots (where current versions of the file have been
> modified since then).
>
> > Try running without compression and see if that improves things.
>
> That sounds like a good idea.
>
> My theory so far is that it ran out of memory while compressing, with
> incorrect compressed data written to the disk.

The ZFS compression code will panic if it can't allocate the buffer needed to
store the compressed data, so that's unlikely to be your problem.  The only
time I have seen an "illegal byte sequence" error was when trying to copy raw
disk images containing ZFS pools to different disks, and the destination disk
was a different size than the original.  I wasn't even able to import the
pool in that case, though.

The zfs IO code overloads the EILSEQ error code and uses it as a "checksum
error" code.  Returning that error for the same block on all disks is
definitely weird.  Could you have run a partitioning tool, or some other
program that would have done direct writes to all of your component disks?

Your scrub is also a bit worrying - 24k checksum errors definitely shouldn't
occur during normal usage.

--
Dan Nelson
dnel...@allantgroup.com
Re: stable/8 nfsd: is it normal to have one worker regardless of -n setting?
In the last episode (Aug 01), Dmitry Morozovsky said:
> just noticed that contemporary nfsd does not fork children in accordance
> to -n setting:
>
> stable/8:
>
> root@beaver:/usr/local/tb/scripts# pid nfs
>  1745  ??  Is     0:00.02 nfsd: master (nfsd)
>  1746  ??  S      0:03.29 nfsd: server (nfsd)
> root@beaver:/usr/local/tb/scripts# grep nfs_server_flags /etc/rc.conf
> nfs_server_flags="-u -t -n 4"

They are threads now:

# ps axw | grep nfsd
 1373  ??  Is     0:00.02 nfsd: master (nfsd)
 1374  ??  S      5:47.14 nfsd: server (nfsd)
# ps axwH | grep nfsd
 1373  ??  Is     0:00.02 nfsd: master (nfsd)
 1374  ??  S      1:25.79 nfsd: server (nfsd)
 1374  ??  S      1:26.65 nfsd: server (nfsd)
 1374  ??  S      1:27.67 nfsd: server (nfsd)
 1374  ??  S      1:27.04 nfsd: server (nfsd)

--
Dan Nelson
dnel...@allantgroup.com
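To compare the thread count against the -n setting without eyeballing the
listing, something along these lines should do.  It is only a sketch:
count_nfsd_threads is a made-up helper name, and it assumes the per-thread
lines look like the "nfsd: server" lines in the output above.

```sh
#!/bin/sh
# Sketch: count nfsd server threads from per-thread ps output on stdin.
# The [n] bracket keeps the awk command line itself from matching when
# the input comes from a live ps listing.

count_nfsd_threads() {
    awk '/[n]fsd: server/ { n++ } END { print n + 0 }'
}

# Example:
#   ps axwH | count_nfsd_threads
```

With nfs_server_flags="-u -t -n 4" the count would be expected to match the
four server lines shown in the ps axwH listing.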
Re: Something missing in truss
In the last episode (Dec 02), Eivind Evensen said:
> Does anybody else see this or know why?
>
> The machine here is running :
>
> > uname -a
> FreeBSD elg.hjerdalen.lokalnett 8.2-STABLE FreeBSD 8.2-STABLE #36: Wed Nov 30
> 22:03:07 CET 2011
> rumrunner@elg.hjerdalen.lokalnett:/usr/obj/usr/src/sys/RUM amd64
>
> While trying to weed out some firefox problems, I've noticed
> that truss doesn't recognise certain syscalls :
>
> getpid() = 1519 (0x5ef)
> clock_gettime(4,{48496.335142903 })= 0 (0x0)
> kevent(20,{0x23,EVFILT_READ,EV_ADD,0,0x0,0x809ec9d80},1,{0x15,EVFILT_READ,0x0,0,0x1,0x809ec9e80},64,0x0)
> = 1 (0x1)
> clock_gettime(4,{48496.335293202 })= 0 (0x0)
> read(21,"\0",1)= 1 (0x1)
> clock_gettime(4,{48496.335382599 })= 0 (0x0)
> umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
> -- UNKNOWN SYSCALL -14704864 --
> syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454
> (0x1c6)
> umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
> -- UNKNOWN SYSCALL -14704864 --
> syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454
> (0x1c6)
> umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
> -- UNKNOWN SYSCALL -14704864 --
> syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454
> (0x1c6)
> umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
> -- UNKNOWN SYSCALL -14704864 --
> syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454
> (0x1c6)
> umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
> -- UNKNOWN SYSCALL -14704864 --
> syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454
> (0x1c6)

Two problems: truss gets confused when you attach to a process that's
currently executing a syscall, and it gets even more confused when you have a
threaded process waiting in many syscalls at once.

The following patch fixes problem #1, but problem #2 involves keeping more
per-thread state and ends up touching a lot of the truss code.  See
http://www.evoy.net/FreeBSD/truss.diff for one solution (and more syscall
decodes).

Index: setup.c
===
--- setup.c	(revision 228242)
+++ setup.c	(working copy)
@@ -202,8 +202,10 @@
 	find_thread(info, lwpinfo.pl_lwpid);
 	switch(WSTOPSIG(waitval)) {
 	case SIGTRAP:
-		info->pr_why = info->curthread->in_syscall?S_SCX:S_SCE;
-		info->curthread->in_syscall = 1 - info->curthread->in_syscall;
+		if ((lwpinfo.pl_flags&(PL_FLAG_SCE|PL_FLAG_SCX)) == 0)
+			err(1,"pl_flags=%x contains neither PL_FLAG_SCE or PL_FLAG_SCX", lwpinfo.pl_flags);
+		info->pr_why = (lwpinfo.pl_flags&PL_FLAG_SCE) ? S_SCE:S_SCX;
+		info->curthread->in_syscall = (info->pr_why == S_SCE) ? 1:0;
 		break;
 	default:
 		info->pr_why = S_SIG;

--
Dan Nelson
dnel...@allantgroup.com
Re: kernel: negative sbsize for uid = 0
In the last episode (Dec 13), Doug Barton said:
> I'm running 8.2-RELEASE-p4 i386 on some web servers that are generally
> lightly-moderately loaded, but occasionally see some heavy spikes where
> load average goes way up.  When that is happening, but sometimes even when
> it's not, I get hundreds of this message spewing into the logs:
>
> kernel: negative sbsize for uid = 0
>
> I haven't found anything particularly useful by searching for that
> message, the one reference was to mbufs, but that seems not to be the
> problem.  Here is the output of 'netstat -m' during one of the load
> spikes:
[...]
> So is this message something to worry about? If so, how can I diagnose
> what's happening, and how do I fix it?

I've seen it occasionally too.  The error message is printed in
/sys/kern/kern_resource.c when the ui_sbsize resource counter goes negative.
There's probably insufficient locking somewhere in the functions that call
chgsbsize.  The increment/decrement is done atomically, but the data pointed
to by the "hiwat" argument is read then updated later without an explicit
lock, so if that value changes while the function is executing, it could
cause problems.

ui_sbsize is only used by the resource limiting code, though, so unless
you're enforcing an sbsize rlimit, it should be harmless.

--
Dan Nelson
dnel...@allantgroup.com
Re: swi4: clock taking 40% cpu?!?
In the last episode (Dec 15), Jeremy Chadwick said:
> On Thu, Dec 15, 2011 at 12:51:28PM -0800, Doug Barton wrote:
> > Web server under heavy'ish load (7 on a 2 cpu system) running
> > 8.2-RELEASE-p4 i386 I'm seeing this:
> >
> >   PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> >    12 root      -32    -     0K   112K WAIT    0 129:01 39.99% {swi4: clock}
> >
> > Any ideas why the clock should be taking so much cpu? HZ=100 if that
> > makes a difference ...
>
> Could be wrong, but I believe this correlates with IRQ 4.  What does
> vmstat -i show for a total and rate for irq4 if you run it, wait a few
> seconds, then run it again?  Does the number greatly/rapidly increase?

That would be "irq4" in that case, though.  "swi4" is just a software
interrupt thread, and "clock" is the softclock callout handler.  There are
both KTR and DTrace logging functions in kern_timeout.c, so you could use
either one to get a handle on what's eating your CPU.  Busy-looping
"procstat -k 12" for a few seconds might get you some useful stacks, as well.

> Shot in the dark here, but the only thing I can think of that might
> cause this is software being extremely aggressive with calls to things
> like gettimeofday(2) or clock_gettime(2).  Really not sure.  ntpd maybe
> (unlikely but possible)?  Sort of grasping at straws here.

--
Dan Nelson
dnel...@allantgroup.com
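One way the "busy-looping procstat -k 12" idea could be scripted is sketched
below.  The helper name sample_stacks and the sample count are made up for
illustration; the only command taken from the message is "procstat -k 12".

```sh
#!/bin/sh
# Sketch: run a command repeatedly and rank its distinct output lines by
# how often they appear.  sample_stacks is a hypothetical helper.

sample_stacks() {
    count=$1; shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"                      # e.g. procstat -k 12
        i=$((i + 1))
    done | sort | uniq -c | sort -rn
}

# Example: take 200 samples of the softclock thread's kernel stack and
# show the most frequent ones first.
#   sample_stacks 200 procstat -k 12 | head
```

The stacks that dominate the top of the list are the callouts worth looking
at first; header lines from each run get counted too and can be ignored.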
Re: Snapshot corruption.
In the last episode (Nov 22), David Gilbert said:
> >>>>> "Brian" == Brian Fundakowski Feldman <[EMAIL PROTECTED]> writes:
> Brian> Long strings of NUL bytes?  Missing data?  Spam (from the same
> Brian> file, or from other files)?
>
> Well... I don't really know db file formats.  Most of the corruption
> I found in berkley db files.  mailgraph uses rrd.  mailman uses some
> form of berkley db, too.  I don't know what the corruption "looked"
> like other than the db library would no longer accept it.

db files are very fragile when it comes to OS or process crashes.  There is
no logging, and writes are cached until the process exits or a db->sync() is
called, virtually guaranteeing corruption.  Ideally, db files should only
cache data and be rebuildable from other data, or they should db->sync()
after every write.  db 2+ databases can do logging, but I don't know how many
applications actually request it.

--
Dan Nelson
[EMAIL PROTECTED]
Re: 5-STABLE softupdates issue?
In the last episode (Nov 24), Scott Long said:
> Matthias Andree wrote:
> > out of fun and to investigate claims about alleged bgfsck resource
> > hogging (which I could not reproduce) posted to
> > news:de.comp.os.unix.bsd, I pressed the reset button on a live
> > FreeBSD 5-STABLE system.
> >
> > Upon reboot, fsck -p complained about an unexpected softupdates
> > inconsistency on the / file system and put me into single user
> > mode, the manual fsck / then asked me to agree to increasing a link
> > count from 21 to 22 (and later to fix the summary, which I consider
> > a non-issue). A subsequent fsck -p / ended with no abnormality
> > detected.
>
> No, this in theory should not happen.  YOu could have caught it right
> at the instance that it was sending a transaction out to disk, or you
> could have caught an edge case that isn't understood yet.
> Unfortunately, ATA drives also cannot be trusted to flush their
> caches when one would expect, so this leaves open a lot of possible
> causes for your problem.

If you just want to test stability in the face of system crashes (and not
power failure), you can drop to DDB and run "reboot" to simulate a panic (or
run reboot -qn as root).  That way your drive doesn't lose power.

That said, I get unexpected softupdates inconsistencies pretty regularly on
kernel panics.  I just let the system run until I can reboot and run a
fsck -p.

--
Dan Nelson
[EMAIL PROTECTED]
panic: Duplicate free of item 0xc37aa200 from zone 0xc103f9a0(Mbuf)
Got this overnight on a 5.3-STABLE 2005-01-13 kernel:

This GDB was configured as "i386-portbld-freebsd5.3"...
panic: Duplicate free of item 0xc37aa200 from zone 0xc103f9a0(Mbuf)

panic messages:
---
panic: Duplicate free of item 0xc37aa200 from zone 0xc103f9a0(Mbuf)

cpuid = 0
--- trap 0x1, eip = 0, esp = 0xe2561d7c, ebp = 0 ---
boot() called on cpu#0
Uptime: 15h14m6s
Dumping 1023 MB
Compressing
Compressed to 314 MB
Dumpsize = 330268160
Dump starting at 475069440
---
#0  doadump () at pcpu.h:159
        in pcpu.h
doadump () at pcpu.h:159
159     in pcpu.h
#0  doadump () at pcpu.h:159
#1  0xc059c1c6 in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2  0xc059bc3b in panic (fmt=0xc079a0b3 "Duplicate free of item %p from zone %p(%s)\n")
    at ../../../kern/kern_shutdown.c:566
#3  0xc06e4b96 in uma_dbg_free (zone=0xc103f9a0, slab=0xc37aafa8, item=0xc37aa200)
    at ../../../vm/uma_dbg.c:299
#4  0xc06e2ce0 in uma_zfree_arg (zone=0xc103f9a0, item=0xc37aa200, udata=0x0)
    at ../../../vm/uma_core.c:2257
#5  0xc05d760c in m_freem (mb=0x0) at uma.h:302
#6  0xc05d8d15 in m_defrag (m0=0xc37aa200, how=1) at ../../../kern/uipc_mbuf.c:1124
#7  0xc068d49a in dc_start (ifp=0xc238d000) at ../../../pci/if_dc.c:3337
#8  0xc05bf76d in taskqueue_run (queue=0xc2324900) at ../../../kern/subr_taskqueue.c:191
#9  0xc05854f2 in ithread_loop (arg=0xc227d900) at ../../../kern/kern_intr.c:547
#10 0xc05843f9 in fork_exit (callout=0xc0585310 , arg=0x0, frame=0x0)
    at ../../../kern/kern_fork.c:807
#11 0xc071a3dc in fork_trampoline () at ../../../i386/i386/exception.s:209
(gdb)

--
Dan Nelson
[EMAIL PROTECTED]
panic: Assertion besttd != NULL failed at ../../../kern/subr_sleepqueue.c:676
Got this while running a Java proxy server under libthr:

FreeBSD dan.emsphone.com 5.3-STABLE FreeBSD 5.3-STABLE #387: Thu Jan 13 14:43:03 CST 2005 [EMAIL PROTECTED]:/usr/src/sys/i386/compile/DANSMP i386

panic: Assertion besttd != NULL failed at ../../../kern/subr_sleepqueue.c:676
cpuid = 1
KDB: stack backtrace:
panic(c07822b6,c078843f,c078825f,2a4,) at panic+0x1cf
sleepq_signal(c3a9b4b0,0,,e7461ce0,c05a74eb) at sleepq_signal+0xf0
wakeup_one(c3a9b4b0,0,c07863e3,12e,be3173f0) at wakeup_one+0x20
thr_wake(c2bb84b0,e7461d14,4,16,1) at thr_wake+0xdb
syscall(3123002f,2f,4c98002f,811bfc0,83f4400) at syscall+0x137
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (443, FreeBSD ELF32, thr_wake), eip = 0x280e658f, esp = 0xbe3173ec, ebp = 0xbe317418 ---
panic: mi_switch: switch in a critical section
cpuid = 1

The last two lines repeated forever until I reset the box, so no coredump,
but the stack is almost identical to a panic I got back in November and did
get a trace on (the crashdump is long gone though).

#0  doadump () at pcpu.h:159
#1  0xc059aa86 in boot (howto=260) at ../../../kern/kern_shutdown.c:397
#2  0xc059a55b in panic (fmt=0xc077e446 "Assertion %s failed at %s:%d")
    at ../../../kern/kern_shutdown.c:553
#3  0xc05bcb5c in sleepq_remove_thread (sq=0xc3394dc0, td=0x0)
    at ../../../kern/subr_sleepqueue.c:594
#4  0xc05bd47d in sleepq_signal (wchan=0xc2c5d7d0, flags=0, pri=-1)
    at ../../../kern/subr_sleepqueue.c:675
#5  0xc05a1d80 in wakeup_one (ident=0x0) at ../../../kern/kern_synch.c:266
#6  0xc05a5dab in thr_wake (td=0xc38abaf0, uap=0x0) at ../../../kern/kern_thr.c:303
#7  0xc072a797 in syscall (frame={tf_fs = 819527727, tf_es = 1285816367,
      tf_ds = -1078001617, tf_edi = 135378944, tf_esi = 137891840,
      tf_ebp = -1173986296, tf_isp = -419791500, tf_ebx = 671700552,
      tf_edx = 138204160, tf_ecx = 0, tf_eax = 443, tf_trapno = 22,
      tf_err = 2, tf_eip = 672032143, tf_cs = 31, tf_eflags = 582,
      tf_esp = -1173986340, tf_ss = 47})
    at ../../../i386/i386/trap.c:1001
#8  0xc07168af in Xint0x80_syscall () at ../../../i386/i386/exception.s:201

--
Dan Nelson
[EMAIL PROTECTED]
panic: Duplicate free of item 0xc37aa200 from zone 0xc103f9a0(Mbuf)
I've gotten this one twice in 5 days (still have the cores if anyone needs
more info):

FreeBSD dan.emsphone.com 5.3-STABLE FreeBSD 5.3-STABLE #387: Thu Jan 13 14:43:03 CST 2005 [EMAIL PROTECTED]:/usr/src/sys/i386/compile/DANSMP i386

panic: Duplicate free of item 0xc37aa200 from zone 0xc103f9a0(Mbuf)
#0  doadump () at pcpu.h:159
#1  0xc059c1c6 in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2  0xc059bc3b in panic (fmt=0xc079a0b3 "Duplicate free of item %p from zone %p(%s)\n")
    at ../../../kern/kern_shutdown.c:566
#3  0xc06e4b96 in uma_dbg_free (zone=0xc103f9a0, slab=0xc37aafa8, item=0xc37aa200)
    at ../../../vm/uma_dbg.c:299
#4  0xc06e2ce0 in uma_zfree_arg (zone=0xc103f9a0, item=0xc37aa200, udata=0x0)
    at ../../../vm/uma_core.c:2257
#5  0xc05d760c in m_freem (mb=0x0) at uma.h:302
#6  0xc05d8d15 in m_defrag (m0=0xc37aa200, how=1) at ../../../kern/uipc_mbuf.c:1124
#7  0xc068d49a in dc_start (ifp=0xc238d000) at ../../../pci/if_dc.c:3337
#8  0xc05bf76d in taskqueue_run (queue=0xc2324900) at ../../../kern/subr_taskqueue.c:191
#9  0xc05854f2 in ithread_loop (arg=0xc227d900) at ../../../kern/kern_intr.c:547
#10 0xc05843f9 in fork_exit (callout=0xc0585310 , arg=0x0, frame=0x0)
    at ../../../kern/kern_fork.c:807
#11 0xc071a3dc in fork_trampoline () at ../../../i386/i386/exception.s:209

panic: Duplicate free of item 0xc274d000 from zone 0xc103f9a0(Mbuf)
#0  doadump () at pcpu.h:159
#1  0xc059c1c6 in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2  0xc059bc3b in panic (fmt=0xc079a0b3 "Duplicate free of item %p from zone %p(%s)\n")
    at ../../../kern/kern_shutdown.c:566
#3  0xc06e4b96 in uma_dbg_free (zone=0xc103f9a0, slab=0xc274dfa8, item=0xc274d000)
    at ../../../vm/uma_dbg.c:299
#4  0xc06e2ce0 in uma_zfree_arg (zone=0xc103f9a0, item=0xc274d000, udata=0x0)
    at ../../../vm/uma_core.c:2257
#5  0xc05d760c in m_freem (mb=0x0) at uma.h:302
#6  0xc05d8d15 in m_defrag (m0=0xc274d000, how=1) at ../../../kern/uipc_mbuf.c:1124
#7  0xc068d49a in dc_start (ifp=0xc238d000) at ../../../pci/if_dc.c:3337
#8  0xc05bf76d in taskqueue_run (queue=0xc2324900) at ../../../kern/subr_taskqueue.c:191
#9  0xc05854f2 in ithread_loop (arg=0xc227d900) at ../../../kern/kern_intr.c:547
#10 0xc05843f9 in fork_exit (callout=0xc0585310 , arg=0x0, frame=0x0)
    at ../../../kern/kern_fork.c:807
#11 0xc071a3dc in fork_trampoline () at ../../../i386/i386/exception.s:209

--
Dan Nelson
[EMAIL PROTECTED]
Re: Very large directory
In the last episode (Jan 19), Phillip Salzman said:
> I have a pair of servers that act as SMTP/AV gateways.  It seems that
> even though we've told the AV software not to store messages, it is
> anyway.
>
> They've been running for a little while now - and recently we've
> noticed a lot of disk space disappearing.  Shortly after that, a
> simple du into our /var/spool returned a not so nice error:
>
> du: fts_read: Cannot allocate memory
>
> No matter what command I run on that directory, I just don't seem to
> have enough available resources to show the files let alone delete
> them (echo *, ls, find, rm -rf, etc.)

Try raising your datasize rlimit value; also see the thread "Directories
with 2 million files" at
http://lists.freebsd.org/pipermail/freebsd-current/2004-April/026170.html
for some other ideas.  "find . | xargs rm" sounds promising.

--
Dan Nelson
[EMAIL PROTECTED]
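A sketch combining the two suggestions above.  The helper name purge_files is
made up, and the NUL-delimited -print0 / -0 pairing is my variation on the
plain "find . | xargs rm" so that odd filenames can't break the pipeline; it
assumes your find and xargs support those options.

```sh
#!/bin/sh
# Sketch: remove every regular file under a directory without expanding
# the whole file list on a command line.  purge_files is hypothetical.

purge_files() {
    (
        # Best effort: raise the soft datasize limit to the hard limit,
        # in a subshell so the calling shell's limits stay untouched.
        ulimit -S -d "$(ulimit -H -d)" 2>/dev/null

        # xargs batches the names, so rm never sees an oversized argv.
        find "$1" -type f -print0 | xargs -0 rm -f
    )
}

# Example (double-check the path first, this deletes without asking):
#   purge_files /var/spool/some-av-dir
```

Directories are left in place, so the spool layout survives the cleanup.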
Re: what is vrlock, and why is it causing me problems?
In the last episode (Feb 01), Marc G. Fournier said:
>
> /proc/10/status:syncer 10 0 0 0 -1,-1 noflags 1103511247,149 0,0 44097,435728
> vrlock 0 0 0,0 -
> /proc/39506/status:postgres 39506 39505 39500 39500 -1,-1 noflags
> 1106713055,403942 3002,40214 4118,737982 vrlock 1001 1001 1001,1001,1001
> maildb.hub.org
> /proc/51927/status:postgres 51927 41068 41068 41068 -1,-1 noflags
> 1107288042,547570 0,0 1,303423 vrlock 1001 1001 1001,1001,1001
> pgsql72.hub.org
> /proc/52582/status:postgres 52582 52581 52579 52579 -1,-1 noflags
> 1104953811,860872 16534,324197 22783,184956 vrlock 1001 1001 1001,1001,1001
> pgsql74.hub.org
> /proc/53309/status:umount 53309 82960 53309 82960 5,2 ctty 1107288298,562659
> 0,0 0,546634 vrlock 0 0 0,0,0,2,3,4,5,20,31 -
> /proc/54039/status:umount 54039 53941 54039 53941 5,4 ctty 1107288402,928683
> 0,0 0,526544 vrlock 0 0 0,0,0,2,3,4,5,20,31 -
> /proc/9/status:bufdaemon 9 0 0 0 -1,-1 noflags 1103511247,130 0,0 432,924063
> vrlock 0 0 0,0 -

vrlock is an internal vinum lock (see /sys/dev/vinum/vinumlock.c).

> Load on the server is neglible, so it isn't like I'm dealing with
> 'server lag'
>
> FreeBSD 4.10-STABLE #7: Thu Oct  7 20:17:02 ADT 2004

More like deadlock, I think.  There were two commits in RELENG_4 after your
build time, but it's a performance fix and probably won't affect your lock
problem.

--
Dan Nelson
[EMAIL PROTECTED]
Re: [HEADS UP] perl symlinks in /usr/bin will be gone
In the last episode (Feb 03), Christian Weisgerber said:
> Chuck Swiger <[EMAIL PROTECTED]> wrote:
>
> > Well-behaved 3rd party scripts ought to start Perl via:
> > #! /usr/bin/env perl
>
> Why should the authors of those scripts break them for systems which
> have /bin/env?

Are there any systems that have a /bin/env (and that do not also have a
/bin -> /usr/bin symlink)?

--
Dan Nelson
[EMAIL PROTECTED]
Re: Adjusting time on a secured FreeBSD machine.
In the last episode (Feb 04), Stan said: > I ran into this same problem. After trying various things, I finally > gave up and did it the easy way. If you don't mind rebooting, the > easiest thing to do is set the clock in the BIOS as accurately as > possible, then let ntpd fine tune it from there. Setting your BIOS clock shouldn't be necessary since the bootup scripts will do an ntp sync before raising the securelevel anyway. Make sure you have ntpdate_enable="YES" in rc.conf. -- Dan Nelson [EMAIL PROTECTED]
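For anyone following along, a minimal sketch of what the rc.conf side of that advice might look like. Only ntpdate_enable comes from the message above; the ntpdate_flags and ntpd_enable lines and the server name are assumptions for illustration, so compare against /etc/defaults/rc.conf on your release before copying.

```sh
# /etc/rc.conf fragment (sketch; the server name is a placeholder)
ntpdate_enable="YES"              # step the clock once during boot
ntpdate_flags="-b pool.ntp.org"   # assumed: -b steps instead of slewing; host is hypothetical
ntpd_enable="YES"                 # assumed: let ntpd keep the clock disciplined afterwards
```

Since rc.conf is plain sh syntax, a quick "sh -n /etc/rc.conf" should catch quoting mistakes before the next reboot.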
Re: swapfile being eaten by unknown process
In the last episode (Feb 14), Kris Kennaway said: > On Tue, Feb 15, 2005 at 01:30:42AM +, John wrote: > > Is there a way of seeing *what* program/process is eating swap. > > There are loads of ways of seeing that it is being eaten, but so > > far haven't found a way of knowing what eats, so can't fix the > > problem. Can anyone enlighten me? > > Use ps or top, and look for the process with the huge size. This is > not foolproof, because a process can allocate memory without using it > (e.g. rpc.statd), but it's a place to start. If you see a process > that is both large, and paging to/from disk, that's a better > indication. To see which processes are paging: run top, hit 'm' to switch modes, and hit 'o' then 'fault' to sort the processes by how many page faults they are doing. This isn't completely foolproof either, since reads from mmap()ed files count as faults as well. -- Dan Nelson [EMAIL PROTECTED]
Re: buildworld fails on: ===> bin/domainname
In the last episode (Feb 21), bsdnooby said: > When I run 'make buildworld' I get a series of errors like this: > > rm -f .depend GPATH GRTAGS GSYMS GTAGS > ===> bin/domainname > "Makefile", line 3: Need an operator > ... > "Makefile", line 33 Need an operator > make: fatal errors encountered -- cannot continue > *** Error code 1 > Stop in /usr/src/bin > *** Error code 1 > Stop in /usr/src > *** Error code 1 > Stop in /usr/src The contents of bin/domainname/Makefile should read:

    # $FreeBSD: src/bin/domainname/Makefile,v 1.7 2001/12/04 01:57:40 obrien Exp $
    PROG= domainname
    .include <bsd.prog.mk>

Check to make sure your copy didn't get corrupted somehow. -- Dan Nelson [EMAIL PROTECTED]
Re: FreeBSD 5-STABLE, MSI KT880 , fxp and SCB timeouts
In the last episode (Feb 23), Jon Noack said: > On 02/23/05 20:06, Mario Sergio Fujikawa Ferreira wrote: > > My last motherboard burned down to ashes so I got myself a brand > >(after 2 weeks) new MSI KT880. I am getting some weird results. > > > > 1) fxp intel etherxpress 10/100 network cards report SCB timeout > >as well as achieving ridiculously low transfer rates of 600 > >Bytes/second. Well, I got 10 KBytes/sec once but that does not count > >since a side box gets more than 50KB/s ;-) on the same hub. Oh, I've > >already switched hub ports, rj45 cables and fxp cards. > > Duplex mismatch? You say "hub" and not "switch", so you might need > to force the card to half-duplex. Oddly enough, the fxp(4) man page > doesn't include half-duplex as a media option. Surely it supports > it... Autodetection on ethernet detects both speed and duplex, and full-duplex and half-duplex are either/or, so if you force a speed and don't force full-duplex, you get half-duplex by default. -- Dan Nelson [EMAIL PROTECTED]
Re: vmstat/iostat/top all fail to report CPU usage
In the last episode (Feb 28), Jeff Behl said: > as reported in bug: bin/60385 > > this is still occurring in almost all of our systems, even those at > stable, and is pretty major issue. any known progress on this? we're > running ibm e325 servers. > > FreeBSD www3 5.3-STABLE FreeBSD 5.3-STABLE #1: Tue Feb 15 10:09:17 PST > 2005[EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP amd64 Here's the workaround I use on a machine here that loses its stat clock after a week or so of uptime. Put this in /etc/crontab:

    # Flaky clock. Check it every 5 minutes.
    */5 * * * * root /usr/local/bin/fixrtc

.. and this in /usr/local/bin/fixrtc:

    #! /bin/sh
    # get the interrupt rate for the stat clock over one second
    getticks() {
        ( vmstat -i ; sleep 1 ; vmstat -i ) |
            awk '/rtc/ { if (sum) sum+=$3; else sum-=$3 } END { print sum }'
    }
    ticks=$(getticks)
    # It should be firing at 128 hz. If not, kick it
    if [ $ticks -lt 64 ] ; then
        echo "Stat clock has died. Attempting to reset."
        echo
        /etc/rc.d/ntpd stop
        echo
        /usr/sbin/ntpdate -b pool.ntp.org
        echo
        /etc/rc.d/ntpd start
        echo
        echo "RTC interrupt rate is now $(getticks)"
    fi

-- Dan Nelson [EMAIL PROTECTED]
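The awk one-liner in that script is terse, so here is a rough standalone sketch of the same idea: take two "vmstat -i" snapshots and subtract the totals. The function name interrupt_delta and the canned sample are made up for illustration, and it assumes the total is the second-to-last column, which may not hold for every vmstat version.

```sh
#!/bin/sh
# interrupt_delta PATTERN   (hypothetical helper, not part of the original script)
# Reads two concatenated "vmstat -i" style snapshots on stdin and prints how far
# the total column (assumed to be the second-to-last field) advanced for the
# lines matching PATTERN.
interrupt_delta() {
    awk -v pat="$1" '
        $0 ~ pat {
            total = $(NF - 1)
            if (seen++) last = total; else first = total
        }
        END { print last - first }
    '
}

# Canned sample standing in for: ( vmstat -i ; sleep 1 ; vmstat -i )
sample='irq8: rtc 1000000 128
irq0: clk 780000 100
irq8: rtc 1000128 128
irq0: clk 780100 100'

printf '%s\n' "$sample" | interrupt_delta rtc    # prints 128
```

On a live system the canned sample would be replaced by the real pipeline; a result well under 128 after a one-second sleep is the condition the cron job reacts to.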
Re: Abort Trap for cron-jobs in 5.3
In the last episode (Mar 15), Niklas Saers said: > On Tue, 15 Mar 2005, Niklas Saers wrote: > > I've got four servers that all have the same problem: when jobs get > > started from Cron, they die after some time with an "Abort trap". > > Jobs that are dying are: > > > > /usr/libexec/atrun >> /var/log/cron 2>&1 /usr/bin/nice -10 > > /usr/local/bin/zsh /root/bin/sendBarkMail.sh > /dev/null 2>&1 > > > > I also get this on virtually every shell-script that uses tar, leaving my > > filesystem littered with bsdtar.core files. > > > > Running these jobs from the command prompt works fine. Any suggestions on > > what may be causing them to die from cron? sendBarkMail.sh simply moves > > mails from one folder to another periodically > > Note to self: ask the question. ;-) > > What I'm wondering about is: what could be causing the Abort Trap's? > > World and kernel are a recent RELENG_5_3 compiled like described in > src/UPDATING. What's the stack trace from one of those cores? Also, try not redirecting stdout and stderr to /dev/null; you are probably discarding a valuable error message. -- Dan Nelson [EMAIL PROTECTED]
Re: NFS failure for bonnie++
In the last episode (Mar 16), David Gilbert said: > I have two machines running 5.3-PRERELEASE (cvsup'd yesterday). > They're dual opterons running amd64 code. One of them has 1.0T of > disk mounted with gmirror, gconcat and ggate... and it exports this > via nfs. > > The other is an nfs client. > > When I run bonnie++ -n 200 -s 4000 -u dgilbert on the server, it runs > fine. When I run the same command on the client, it dies trying to > delete the files. bonnie segfaults? -- Dan Nelson [EMAIL PROTECTED]
Re: kern/78664: truss does not work on 5-STABLE(5.4-PRERELEASE)
HASHI Hiroaki wrote: > truss command does not work with below message. > > "truss: PIOCBIS: Inappropriate ioctl for device" I've narrowed it down to something committed between 02-24 and 02-27, but can't continue the binary search until tonight. It would be really nice if this was fixed before 5.4 gets released :) -- Dan Nelson [EMAIL PROTECTED]
Re: malloc() debugging flags broken on RELENG_5
In the last episode (Mar 21), Bartosz Fabianowski said: > Some commit in the last few weeks has broken the malloc() debug flags > on RELENG_5. According to the man page, a call to free() or realloc() > with a modified pointer should cause a warning. Setting the "A" flag > in either /etc/malloc.conf or MALLOC_OPTIONS should turn this into an > error. However, what happens is that this *always* causes an error. > And even setting the corresponding "a" flag does not turn it into a > warning. You're not running as root, are you? The A flag is always set for root or setuid processes as a security measure. There haven't been any changes to the malloc code in 5.x since 5.3. > This is very unfortunate as some poorly written programs (KDE's > Kopete messenger in my case) seem to rely on the fact that free() and > realloc() with modified pointers are OK. File a bug report; a program must pass the same pointer to free() that it received from malloc(). -- Dan Nelson [EMAIL PROTECTED]
Re: malloc() debugging flags broken on RELENG_5
In the last episode (Mar 21), Bartosz Fabianowski said: > >You're not running as root, are you? The A flag is always set for > >root or setuid processes as a security measure. > > No, I am running as a normal user. > > >There hasn't been any changes to the malloc code in 5.x since 5.3. > > I realize there shouldn't have been any changes and I also cannot > find everything in the CVS logs. But when I run Kopete, I get the > following: > > kopete in free(): error: modified (chunk-) pointer > ^ > According to the man page, this word should read "warning" instead of > "error" and the application should not be aborted. The actual test in the malloc code reads: if (malloc_abort || issetugid() || getuid() == 0 || getgid() == 0) wrterror(p) , so it may also trigger if your primary groupid is 0 (wheel). Just being a member of the wheel group won't trigger it. > >File a bugreport; a program must pass the same pointer to free() that > > it received from malloc(). > > Obviously, there is a bug in Kopete. But it runs for other people with > earlier versions of RELENG_5. I am currently downgrading to 1st March to > see whether that fixes the issue for me. It might also be caused by some dependent package, and not strictly kopete's fault. Depends on what is being freed. -- Dan Nelson [EMAIL PROTECTED]
Re: Time zone change confuses cron
In the last episode (Mar 29), Ladislav Bodnar said: > I've just changed the system time zone from local time to UTC by > copying /usr/share/zoneinfo/Etc/UTC to /etc/localtime. To my dismay, > I found that crontab (both /etc/crontab and user-level crontab) > completely ignores the change and continues executing scripts > according to the old time. If you haven't rebooted yet, restart cron. A process reads timezone settings only once, during startup. You're not supposed to pull the rug out from under its feet by switching /etc/localtime :) -- Dan Nelson [EMAIL PROTECTED]
Re: syscons options and memory use
In the last episode (Mar 31), Ronald Klop said: > The syscons manual page says: > "The following options will remove some features from the syscons > driver and save kernel memory. > [...] > SC_NO_SYSMOUSE > This option removes mouse support in the syscons driver. > The mouse daemon moused(8) will fail if this option is > defined. This option implies the SC_NO_CUTPASTE option > too. > " > > How much memory does this save (or how can I discover that)? Is it worth > it on a 96MB PentiumII laptop? I would guess that the memory savings is probably on the order of kilobytes. Useful if you're trying to prevent excessive swapping on an 8MB system. Not worth disabling on your system. -- Dan Nelson [EMAIL PROTECTED]
Re: syscons options and memory use
In the last episode (Mar 31), Ronald Klop said: > On Thu, 31 Mar 2005 01:04:10 -0600, Dan Nelson <[EMAIL PROTECTED]> > wrote: > >In the last episode (Mar 31), Ronald Klop said: > >>The syscons manual page says: > >>"The following options will remove some features from the syscons > >> driver and save kernel memory. > >> [...] > >> SC_NO_SYSMOUSE > >>This option removes mouse support in the syscons driver. > >>The mouse daemon moused(8) will fail if this option is > >>defined. This option implies the SC_NO_CUTPASTE option > >>too. > >>" > >> > >>How much memory does this save (or how can I discover that)? Is it worth > >>it on a 96MB PentiumII laptop? > > > >I would guess that the memory savings is probably on the order of > >kilobytes. Useful if you're trying to prevent excessive swapping on an > >8MB system. Not worth disabling on your system. > > How can I see the size of my kernel? > I know vmstat -m and netstat -m, but from that info I don't see if I > reduced the memory footprint after disabling an option or device. For the kernel size itself, just "ls -l /boot/kernel/kernel" :) A more interesting number might be the output of "sysctl hw.usermem", which I believe is the amount of memory available to user processes. -- Dan Nelson [EMAIL PROTECTED]
Re: nfsiod tasks started in error
In the last episode (Apr 07), [EMAIL PROTECTED] said: > During sysinstall answered no to the server and client nfs questions > and after installed completed and system rebooted I see task > nfsiod1,2,3,4 running in output of ps ax command. This was not the > case in any of the 4.x releases. This can be looked upon as a > security leak. This may be a error in the new boot up process. This > was first reported 1/16/2004 in 5.2 RC2 as Problem Report kern/61438 > and again in 5.3 as Problem Report kern/79539 Both of those PRs should be closed as not-a-bug, I think. nfsiod threads simply allow multiple concurrent NFS requests. In 4.*, with no nfsiod processes running, you can still use NFS (just more slowly than with them). In 5.*, they are auto-created as kernel threads during bootup. > I tried to run /usr/local/etc/rc.d/killnfs.sh script to kill these > unwanted tasks but that does not work. They aren't tasks, but kernel threads. Just like pagedaemon, swapper, g_event, irq*, swi*, and a couple dozen other threads created by the kernel. > Any suggestions on how I can kill these bogus nfs tasks as part of > boot up or what to change in the boot up process so these tasks don't > get started in the first place? Doing a manual recompile of the > kernel to remove the nfs statements is not a viable solution. Why not? If you want to disable NFS, that's the only way. -- Dan Nelson [EMAIL PROTECTED]
Re: nfsiod tasks started in error
In the last episode (Apr 07), [EMAIL PROTECTED] said: > You did not answer my question. how can I kill these bogus nfs tasks > as part of the boot up or what to change in the boot up process so > these tasks don't get started in the first place? What is a work > around with out compiling the kernel? The answer is to remove "options NFS" from your kernel and recompile. If you enable the nfs client, you automatically get "nfsiod" threads created for you, just like if you have acpi compiled into your kernel, you get "acpi_task*" threads, or if you have a keyboard plugged in, you get an "irq1: atkbd0" thread. Neither of those existed in 4.* and you're not complaining about them. What is it about those four nfsiod threads that upsets you so much? -- Dan Nelson [EMAIL PROTECTED]
Re: nfsiod tasks started in error
In the last episode (Apr 07), [EMAIL PROTECTED] said: [ my exact message; this is even worse than top-posting. ] > Plan and simple. It's a security hole. If no nfs is selected in > sysinstall then there should not be any nfs stuff started at all. Then the fix is to remove the nfs client flag from sysinstall, since it's built into the kernel and cannot be disabled without rebuilding. -- Dan Nelson [EMAIL PROTECTED]
Re: kernel killing processes when out of swap
In the last episode (Apr 12), Nick Barnes said: > This is the well-known problem with my fantasy world in which the OS > doesn't overcommit any resources. All those programs are broken, but > it's too costly to fix them. If overcommit had been resisted more > effectively in the first place, those programs would have been > written properly. Another issue is things like shared libraries; without overcommit you need to reserve the file size * the number of processes mapping it, since you can't guarantee they won't touch every COW page handed to them. I think you can design a shlib scheme where you can map the libs RO; not sure if you would take a performance hit or if there are other considerations. There's a similar problem when large processes want to fork+exec something; for a fraction of a second you need to reserve 2x the process's space until the exec frees it. vfork solves that problem, at the expense of blocking the parent until the child's process is loaded. -- Dan Nelson [EMAIL PROTECTED]
Re: Newbie Question About System Update
In the last episode (Apr 19), Bill Moran said: > Chuck Swiger <[EMAIL PROTECTED]> wrote: > > Bill Moran wrote: > > > The system can not replace programs that are in use, > > This is generally not the case. Unix lets you continue to access a > > file after it has been deleted, so long as the process hangs on to > > a file descriptor. This lets you replace programs in use, without > > running into the same problems that platforms like Windows have. > > What you say?: > > bash-2.05b$ su > Password: > bolivia# cp /usr/sbin/cron /home/wmoran/. > bolivia# cp /home/wmoran/cron /usr/sbin/. > cp: /usr/sbin/./cron: Text file busy > bolivia# > > Notice that /usr/sbin/cron is in use (because my system is running > normally) I can copy _from_ that file, but I can not overwrite it. What you can do, however, is: create the new file under a temporary name, delete the original, and rename the temp file to the original's name, which is what /usr/bin/install does. I've done many installworlds on running systems without problems. -- Dan Nelson [EMAIL PROTECTED]
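A small sketch of that temp-file-and-rename approach, in case it helps to see it spelled out. The function name replace_in_use is invented for this example, and it is a simplification: it lets mv rename the temp file straight over the target instead of deleting the original first, and it does not set mode or ownership the way install(1) does.

```sh
#!/bin/sh
# replace_in_use SRC DST   (hypothetical helper, not what install(1) literally runs)
# Put SRC's contents at DST without writing into DST's existing inode:
# copy to a temp name in the same directory, then rename it over DST.
# A process still executing the old file keeps its old inode until it exits.
replace_in_use() {
    src=$1
    dst=$2
    tmp=$(mktemp "${dst}.XXXXXX") || return 1
    if cp "$src" "$tmp" && mv -f "$tmp" "$dst"; then
        return 0
    fi
    rm -f "$tmp"    # clean up the temp file if either step failed
    return 1
}

# Usage (paths taken from the quoted session above):
#   replace_in_use /home/wmoran/cron /usr/sbin/cron
```

The temp file is created next to the target on purpose, so the final mv is a rename within one filesystem rather than a copy.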
Re: load > 1, no process using >10% CPU...?
In the last episode (Apr 19), Damian Gerow said: > Thus spake Damian Gerow ([EMAIL PROTECTED]) [19/04/05 21:21]: > : I'm a little fuzzy as to /how/ load is calculated, but why would my > : system think that it's doing all kinds of work when ps, top, and > : systat can't really tell me /what/ it's doing? > > It turned out to be a runaway xmms process. But I still find it > strange that it didn't show anything obvious in top. If xmms is threaded, you probably got bit by the "libpthread doesn't do process CPU accounting" bug. Most threaded processes will just show up as 0 %CPU in top, no matter what they're doing. The rusage stats are handled correctly, though, so look for processes whose TIME value is increasing at one (or more if you're SMP) seconds per second. -- Dan Nelson [EMAIL PROTECTED]
Re: gstat and scripting
In the last episode (Apr 21), Ronald Klop said: > The tool gstat can produce very nice stats. Can I get these stats > from the system periodicly for use in my one scripts/graphs? Is there > a sysctl like kern.ad0.reads? Or some other way of retreiving this > info from the kernel. Looking at the gstat output, the numbers must > be in there. You can use net-snmp and mrtg to graph disk stats. I suggest applying the patch at http://sourceforge.net/tracker/index.php?func=detail&aid=1085243&group_id=12694&atid=312694 so you get 64-bit counters (32-bit counters roll over too fast to be useful):

    $ snmptable -v2c localhost diskiotable
    SNMP table: enterprises.ucdavis.ucdExperimental.ucdDiskIOMIB.diskIOTable

    diskIOIndex diskIODevice diskIONRead diskIONWritten diskIOReads diskIOWrites diskIONReadX diskIONWrittenX diskIOLA1 diskIOLA5 diskIOLA15
    1 da0 3682573440 4134971392 7734458 71468595 68107082880828768692224 1794 398138
    2 cd0 24 0 30 24 0 0 0 0
    3 cd1 911237260 0 139320 911237260 0 0 0 0
    4 pass0 1622 24 401 1622 24 0 0 0
    5 pass1 0 0 00 0 0 0 0 0
    6 pass2 57676 2710198624330242848 57676 2710198624 0 0 0

If you graph diskIONReadX and diskIONWrittenX over time, you'll get a nice graph of throughput. -- Dan Nelson [EMAIL PROTECTED]
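To put a number on why 32-bit counters roll over too fast, here is a back-of-the-envelope sketch. Both helper names are made up for this example, the 50 MB/s figure is only an illustrative load, and it assumes a shell with 64-bit arithmetic (true of /bin/sh on most 64-bit systems).

```sh
#!/bin/sh
# counter_rate PREV CUR SECONDS   (hypothetical helper)
# Average units per second between two samples of an ever-increasing counter.
counter_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# wrap_seconds BYTES_PER_SEC   (hypothetical helper)
# How long a 32-bit byte counter lasts at a steady rate before it rolls over.
wrap_seconds() {
    echo $(( 4294967296 / $1 ))
}

counter_rate 1000000000 1300000000 300   # prints 1000000 (bytes/sec over a 5-minute poll)
wrap_seconds 50000000                    # prints 85: wraps well inside one 5-minute poll
```

With a five-minute polling interval, a counter that wraps every minute and a half cannot be turned back into a rate, which is the case for graphing the 64-bit diskIONReadX and diskIONWrittenX columns instead.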
panic: mutex tty owned at ../../../kern/kern_event.c:1487
Got the following on a 2004-04-21 -stable kernel: panic messages: --- panic: mutex tty owned at ../../../kern/kern_event.c:1487 cpuid = 0 KDB: stack backtrace: kdb_backtrace(c07c93fb,0,c07aed2c,e97b799c,c3d05780) at kdb_backtrace+0x2e panic(c07aed2c,c07a955e,c07acb39,5cf,c383c800) at panic+0x139 _mtx_assert(c383c910,2,c07acb39,5cf,c383c800) at _mtx_assert+0xf3 knote(c383c880,0,0) at knote+0x3d ttwakeup(c383c800,c383c800,c4cc3e8c,1,c07aeb7a) at ttwakeup+0x89 ttyinput(d,c383c800,0,0,4d) at ttyinput+0x8d4 ttypend(c383c800,c383c800,e97b7a80,c05e9b80,c383c800) at ttypend+0x6c ttnread(c383c800,c2803374,c327d700,e97b7aa4,c0596b3c) at ttnread+0x1b filt_ttyread(c2803374,0,c07acb39,5dd,c383c800) at filt_ttyread+0x20 knote(c383c880,0,0) at knote+0xcc ttwakeup(c383c800,c07b377e,32e,32d,c05aa530) at ttwakeup+0x89 ttioctl(c383c800,802c7415,c39864c0,3,c3d05780) at ttioctl+0xc6b ttyioctl(c2bce300,802c7415,c39864c0,3,c3d05780) at ttyioctl+0x65 ptyioctl(c2bce300,802c7415,c39864c0,3,c3d05780) at ptyioctl+0x2a8 spec_ioctl(e97b7c00,e97b7cac,c06233e4,e97b7c00,0) at spec_ioctl+0x17c spec_vnoperate(e97b7c00,0,c07b6fe6,30d,c080d640) at spec_vnoperate+0x18 vn_ioctl(c3968198,802c7415,c39864c0,c3a00a00,c3d05780) at vn_ioctl+0x204 ioctl(c3d05780,e97b7d14,c,431,3) at ioctl+0x448 syscall(2f,2f,2f,1,1) at syscall+0x2a0 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x2828533f, esp = 0xbfaed66c, ebp = 0xbfaed6c8 --- boot() called on cpu#0 Uptime: 2d1h39m7s Dumping 1023 MB Compressing 64>22 128>46 192>71 256>95 320>119 384>143 448>167 512>191 576>215 640>233 704>249 768>264 832>273 896>296 960>320 Compressed to 346 MB Dumpsize = 363098112 Dump starting at 442239488 64 128 192 256 320 384 448 512 576 640 704 768 832 896 960 --- #0 doadump () at pcpu.h:159 159 pcpu.h: No such file or directory. 
in pcpu.h doadump () at pcpu.h:159 159 in pcpu.h #0 doadump () at pcpu.h:159 #1 0xc05b44aa in boot (howto=260) at ../../../kern/kern_shutdown.c:410 #2 0xc05b4884 in panic (fmt=0xc07aed2c "mutex %s owned at %s:%d") at ../../../kern/kern_shutdown.c:566 #3 0xc05aac23 in _mtx_assert (m=0xc383c910, what=0, file=0xc07acb39 "../../../kern/kern_event.c", line=1487) at ../../../kern/kern_mutex.c:753 #4 0xc0596aad in knote (list=0xc383c880, hint=0, islocked=0) at ../../../kern/kern_event.c:1487 #5 0xc05eb699 in ttwakeup (tp=0xc383c800) at ../../../kern/tty.c:2374 #6 0xc05e83b4 in ttyinput (c=13, tp=0xc383c800) at ../../../kern/tty.c:601 #7 0xc05ea23c in ttypend (tp=0xc383c800) at ../../../kern/tty.c:1658 #8 0xc05e9c5b in ttnread (tp=0xc383c800) at ../../../kern/tty.c:1352 #9 0xc05e9b80 in filt_ttyread (kn=0xc2803374, hint=0) at ../../../kern/tty.c:1313 #10 0xc0596b3c in knote (list=0xc383c880, hint=0, islocked=0) at ../../../kern/kern_event.c:1504 #11 0xc05eb699 in ttwakeup (tp=0xc383c800) at ../../../kern/tty.c:2374 #12 0xc05e937b in ttioctl (tp=0xc383c800, cmd=2150396949, data=0xc39864c0, flag=3) at ../../../kern/tty.c:1064 #13 0xc05ec6f5 in ttyioctl (dev=0x0, cmd=2150396949, data=0xc39864c0 "\006\t", flag=3, td=0x0) at ../../../kern/tty.c:2917 #14 0xc05ef0f8 in ptyioctl (dev=0xc2bce300, cmd=2150396949, data=0xc39864c0 "\006\t", flag=0, td=0x0) at ../../../kern/tty_pty.c:623 #15 0xc056da0c in spec_ioctl (ap=0xe97b7c00) at ../../../fs/specfs/spec_vnops.c:357 #16 0xc056d038 in spec_vnoperate (ap=0x0) at ../../../fs/specfs/spec_vnops.c:118 #17 0xc06233e4 in vn_ioctl (fp=0xc3968198, com=2150396949, data=0xc39864c0, active_cred=0xc3a00a00, td=0xc3d05780) at vnode_if.h:503 #18 0xc05dbae8 in ioctl (td=0xc3d05780, uap=0xe97b7d14) at file.h:257 #19 0xc07535e0 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 1, tf_esi = 1, tf_ebp = -1079060792, tf_isp = -377782924, tf_ebx = 674260420, tf_edx = 140677888, tf_ecx = -2144570347, tf_eax = 54, tf_trapno = 12, tf_err = 2, 
tf_eip = 673731391, tf_cs = 31, tf_eflags = 582, tf_esp = -1079060884, tf_ss = 47}) at ../../../i386/i386/trap.c:1001 #20 0xc073e38f in Xint0x80_syscall () at ../../../i386/i386/exception.s:201 #21 0x002f in ?? () [garbage] #48 0xc3585900 in ?? () #49 0xc05c6f90 in sched_switch (td=0x1, newtd=0x283065c4, flags=---Can't read userspace from dump, or kernel process--- ) at ../../../kern/sched_4bsd.c:881 gdbcom:2: Error in sourced command file: Previous frame inner to this frame (corrupt stack?) (kgdb) -- Dan Nelson [EMAIL PROTECTED]
Re: panic: mutex tty owned at ../../../kern/kern_event.c:1487
In the last episode (Apr 23), John-Mark Gurney said: > Dan Nelson wrote this message on Sat, Apr 23, 2005 at 12:28 -0500: > > Got the following on a 2004-04-21 -stable kernel: > > > > panic messages: > > --- > > panic: mutex tty owned at ../../../kern/kern_event.c:1487 > > cpuid = 0 > > I can whip up a patch if you want to try it (and can easily reproduce)... I tried repeating the panic (hitting ^C in an app partway through its startup routine), but it's not cooperating, unfortunately. -- Dan Nelson [EMAIL PROTECTED]
Re: nss_ldap / top startup
In the last episode (Apr 25), Oliver Brandmueller said: > I have some servers running running on 5.4-STABLE as of Apr 5th. I > use nss_ldap for a userbase of currently about 24000 accounts (will > be growing to approx 6 in the next weeks). I don't use pam_ldap > currently, because users only need to login by IMAP, POP, SMTP and > FTP, for all of these services daemons are used which natively auth > against the LDAP server. > > The more accounts there are in the LDAP directory, the longer the > startup of "top" takes. With the current userbase top takes about 3-4 > seconds to start (on a mostly idle Dual Xeon 2.8GHz with fast disks > and local slapd). > > The startup time is not any different, sometimes I feel (did not try > to measure) it's even longer, if I use "top -u" to not map uids. The > running processes are only from a few uids, all the LDAP users > usually don't have processes running under thier IDs. You can benchmark top by running "time top -d1", which will print one page then immediately exit. > Any ideas, why this is happening? Will I need 10 seconds, when there > are 6 accounts in LDAP? :-) Try editing /usr/src/usr.bin/top/Makefile, add -DRANDOM_PW, and rebuild. That should probably be the default on FreeBSD anyway. -- Dan Nelson [EMAIL PROTECTED]
Re: Trek Technology Thumbdrive 16MB
In the last episode (May 04), Gerrit Kühn said: > has anyone used the subject successfully with -stable? umass(4) > mentions the 8MB version, so I thought 16MB should actually work, > too. However, when I plug the device in, I just get > > ugen0: Trek Technology ThumbDrive, rev 1.10/1.00, addr 2 > > and no umass device. > > usbdevs shows > > port 3 addr 2: full speed, power 40 mA, config 1, ThumbDrive(0x), Trek > Technology(0x0a16), rev 1.00, device ugen0 umass only attaches to devices it recognizes. There are entries for both the thumbdrive and thumbdrive_8mb in usbdevs, but only the 8mb version is in umass.c. Try copying the entry and changing USB_PRODUCT_TREK_THUMBDRIVE_8MB to USB_PRODUCT_TREK_THUMBDRIVE, and see if that works. -- Dan Nelson [EMAIL PROTECTED]
Re: pthreads and nagios issue
In the last episode (May 06), Christophe Yayon said: > i am upgrading our nagios 1.2 (on freebsd 5.3-release) to nagios 2.0 > (currently last cvs after 2.0b3) on Freebsd-5.4RC3 and i saw a very > strange thing. > > After few hours, nagios main process (nagios -d ...) use lot of cpu > time and when i do a truss on the pid, i have a "kse_release" loop > message. Truss hasn't been updated to handle kse or thr threads yet, so don't rely on that output. ktrace should still work, as will using gcore and gdb to get stack traces. -- Dan Nelson [EMAIL PROTECTED]
Re: Outdated lib*_p.a files
In the last episode (May 09), Jason C. Wells said: > I run a homegrown script after upgrades to find outdated binaries. I have > a bunch of files name /usr/lib/lib*_p.a that predate my recent upgrade to > 5.4-RELEASE. What are these? Can they be deleted without harm? Those are versions of libraries built with profiling code. If you have NOPROFILE set in your make.conf, you should remove them from /usr/lib. -- Dan Nelson [EMAIL PROTECTED]
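If it is useful, a cautious way to see what would be removed before touching anything. The helper name list_profiled_libs is made up for this sketch; it only lists files matching the lib*_p.a pattern from the question and deletes nothing.

```sh
#!/bin/sh
# list_profiled_libs [DIR]   (hypothetical helper; DIR defaults to /usr/lib)
# Print the profiling archives (lib*_p.a) found directly in DIR, sorted.
list_profiled_libs() {
    find "${1:-/usr/lib}" -maxdepth 1 -type f -name 'lib*_p.a' | sort
}

# Usage:
#   list_profiled_libs            # review the list first
#   list_profiled_libs /usr/lib   # same thing, with the directory spelled out
```

Once the list looks right (and NOPROFILE really is set in make.conf), the same find expression can be handed to rm, but that step is deliberately left out here.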
Re: Apache + Caching DNS: conflict at bootup? (DNS runs too late)
In the last episode (May 09), Colin Percival said: > Rob wrote: > > Some time ago, there was (or still is) a similar conflict with > > hostname resolution at bootup when using ntpd. > > Yes, but not with named -- the problem was only when using a dns > cache from the ports tree, since those are started later in the boot > sequence. I always put two nameserver lines in my resolv.conf, even on machines running bind (where the first line is 127.0.0.1). That way if programs are started before bind, they can still do DNS lookups. -- Dan Nelson [EMAIL PROTECTED]
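A quick sketch of how one might check a machine for that two-nameserver setup. The helper name count_nameservers is invented here, and 192.0.2.53 in the comment is a documentation address standing in for whatever second resolver you actually use.

```sh
#!/bin/sh
# count_nameservers [FILE]   (hypothetical helper; FILE defaults to /etc/resolv.conf)
# Print how many "nameserver" lines the resolver configuration contains.
count_nameservers() {
    awk '$1 == "nameserver" { n++ } END { print n + 0 }' "${1:-/etc/resolv.conf}"
}

# The layout described above would look something like this:
#   nameserver 127.0.0.1      # local bind, tried first
#   nameserver 192.0.2.53     # placeholder for a second, external resolver
#
# Usage:
#   [ "$(count_nameservers)" -ge 2 ] || echo "only one resolver configured"
```

The resolver tries the entries in order, so the local server stays preferred and the second line only matters while the first is not answering yet.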
Re: PTHREAD_INVARIANTS in 5.x
In the last episode (May 12), Scott Long said: > Daniel Eischen wrote: > >On Wed, 11 May 2005, Jonathan Noack wrote: > >>I checked out _PTHREADS_INVARIANTS for libthr and libpthread on CURRENT. > >> As far as I can tell, all but one of the defines under > >>_PTHREADS_INVARIANTS are ASSERTs; they check for a condition and if it > >>is false result in a fatal error. These should be very visible if > >>they are being tripped. Only MUTEX_INIT_LINK actually *does* > >>something. It is defined in src/lib/libpthread/thread/thr_mutex.c > >>at lines 43-46 and in src/lib/libthr/thread/thr_mutex.c at lines > >>44-47: > > > >This is way overblown and they're other areas for much better > >optimizations than worrying about a couple of instructions. Perhaps > >if it were called _PTHREAD_ROBUST instead of _PTHREAD_INVARIANTS, > >noone would notice ;-) > > Yes, the check for the cross-linked threads libraries is still quite > useful. However, we gave a general policy of turning off most other > debugging and invariants tools for production releases. A good > example is the malloc debugging options that are on in HEAD and off > in RELENG_5. Would we be able to reach a compromise similar to that? The malloc flags can cause serious performance issues, though, since they basically force a memory fill before every malloc and after every free. On the other hand, shouldn't there be a better way of detecting cross-linked threads libraries than dying because some internal mutex isn't initialized? Maybe set __isthreaded to 1, 2, or 3 (or (int)'c_r\0', 'kse\0', 'thr\0', to allow for even more threads libs)? -- Dan Nelson [EMAIL PROTECTED]
Re: Boot loader cant identify ntfs?
In the last episode (May 18), Mike Jakubik said: > Could someone tell me why our bootloader still can not recognize a > ntfs partition, and report it as Windows instead of displaying "??" ? The next release should: RCS file: /home/ncvs/src/sys/boot/i386/boot0/boot0.S,v revision 1.14 date: 2005/02/08 20:43:04; author: des; state: Exp; lines: +2 -2 Remove type 0x4 (FAT12 <32MB) to make room for type 0x7 (NTFS). revision 1.10.2.4 (RELENG_5) date: 2005/04/21 15:42:28; author: obrien; state: Exp; lines: +2 -2 MFC: rev 1.14: remove type 0x4 (FAT12 <32MB) to make room for type 0x7 (NTFS). -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: panic in recent RELENG_5 tcp code path
In the last episode (May 20), Kris Kennaway said: > On Fri, May 20, 2005 at 05:15:36PM +0400, Gleb Smirnoff wrote: > > On Fri, May 20, 2005 at 03:10:32PM +0200, Jeremie Le Hen wrote: > > J> I'm going to recompile my kernel with INVARIANTS but I wonder in > > J> which order of magniture it will slow my kernel down. In other > > J> words, what does INVARIANTS do concretely, shall I expect a > > J> performance drop like WITNESS does ? > > > > No. The performance loss is _much_ less significant than in WITNESS > > case. You probably will not notice it. > > Actually, INVARIANTS causes about a 10% penalty on wall clock time on > 5.x and above. Which is a lot less of a hit than WITNESS is, to be sure. WITNESS is like walking in mud :) Do you know if INVARIANT_SUPPORT by itself is enough to cause the 10% slowdown? That turns on LOCK_DEBUG, which in turn disables inlining of mutex macros. -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: savecore: first and last dump headers disagree
In the last episode (May 23), Palle Girgensohn said: > We have an amd64 system that still experiences crashes after > installing 5.4, mostly during high loads. (It's been unstable all the > time, really; see previous posts.) > > I've added dumpdev="/dev/amrd0s2b", and some time ago I did get coredumps, > but with latest versions of the kernel, savecore does not give me a dump, > instead it says: > > savecore: first and last dump headers disagree on /dev/amrd0s2b > savecore: unsaved dumps found but not saved "savecore -vv" should print enough of both headers to let you see what's different. > Fatal trap 12: page fault while in kernel mode > cpuid = 0: apic id = 00 > fault virtual address= 0x00 > ... > trap number = 12 > panic: page fault > cpuid = 0 > boot() called on cpu#0 > Uptime: 1d23h50m36s > Dumping 2047 MB > 16 32 > > The cursor sits at the position after "32". That's probably why your headers disagree :) If you put "options KDB_TRACE" in your kernel config file, it will print a small stack trace before trying to dump, which might be enough to track down the cause of the panic even without a dump. -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: good performing SCSI RAID5 ( was: asr ( 2015S ) support in 5.4 amd64? )
In the last episode (Jun 01), Steven Hartland said: > Thanks for that Bruce I'm quite surprised that these numbers are so > low after playing with a cheapo hightech SATA controller which with > the help of the guys on the list I was able to give out 200MB/s I > really would expect the relatively expensive SCSI controllers to do > significantly better especially as they have superior disks attached > ( 10K vs 7k2 ) and not performance which is well below ( 1/2 ) that > expected of a single disk. The faster rpms will get you more concurrent I/Os per second but won't do as much for throughput. My asr 3200S cards got repurposed before I could try them with 5.x, but with the 370F firmware I'm pretty sure I was able to get more than 40MB/sec reads out of them on 4.x with 4-disk RAID5 sets. Since the asr driver needs Giant, try a UP kernel and see if it goes any faster. -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: libc.so.4 & libc_r.so.4 in ices0
In the last episode (Jun 23), Kevin S. Brackett said:
> libc.so.4 => /usr/lib/libc.so.4 (0x28755000)
> libc_r.so.4 => /usr/lib/libc_r.so.4 (0x287ee000)
>
> any ideas why it's doing this, and what the fix is?

Looks fine to me:

        libc.so.5 => /lib/libc.so.5 (0x2875c000)
        libpthread.so.1 => /usr/lib/libpthread.so.1 (0x28836000)

Is this a machine recently upgraded from 4.*?  Does "ldd -a ices"
indicate that those libs are being pulled in as dependencies of another
library?  If so, rebuild that port, then rebuild ices.  Here is a script
to find all the binaries linked to superseded port libs and libs
directly linked to threads libs:

#! /bin/sh
# List every port library and binary, then run ldd -a over all of them.
(
 find -s /usr/local/lib /usr/X11R6/lib -name "lib*.so"
 find -s /usr/local/bin /usr/X11R6/bin/
) | xargs ldd -a 2>/dev/null | awk '
# An unindented line names the object whose dependencies follow.
/^[^\t]/ { cmd=$1 }
# Dependency resolved from a compat directory (superseded library).
/^\t.*\/compat\// { printf "%s\t%s\n",cmd,$3 }
# Dependency on one of the threads libraries.
/^\t(libc_r|libpthread|libthr).so/ { printf "%s\t%s\n",cmd,$3 }
'

-- 
Dan Nelson
[EMAIL PROTECTED]
Re: Upgrading from 4.x to 5.x ... possible?
In the last episode (Jul 07), Marc G. Fournier said: > Without having to rebuild from scratch, is this something that is > possible, or have the changes become so great as to make this > undesirable? That's what the Upgrade menu item in sysinstall is for :) It'll save a copy of /etc to a safe place then copy the passwd file back after the install. You'll want to rebuild all your ports, but most should still work until you do. A fresh install is always cleaner, but I've upgraded some of my servers from 2.2.8 -> 4.* -> 5.* with no problems. -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: tcp troughput weirdness
In the last episode (Jul 12), David Malone said: > > did the trick! now can someone remind me what inflight does? and > > could someone explain why increasing sendspace alone did not do the > > trick? (i had it at 64k, which got things better, but not > > sufficient). > > TCP inflight limiting is supposed to guess the bandwidth-delay > product for a TCP connection and stop the window expanding much > above this. It's a pretty neat idea for DSL links that often have > huge buffers at the far end, where inflight limiting can prevent > delays to interactive traffic. > > However, some of the guys I know that work on TCP dynamics reckon > that they can they can come up with situations where inflight > limiting will break. Unfortunately, I haven't had time to talk > this through with them. I guess you may have found one of those > situations ;-) You might want to apply the patch at the bottom of http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/75122 ; without it, new connections get a random initial bandwidth. -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Q: RT32 (Request Tracker) + jail
In the last episode (Jul 19), J. Nyhuis said: > I would like to have RT running in a jailed environment. The > challenge, it seems, will be to get sendmail running in the same > jailed environment as RT and the other components. > For those not so familiar with the components of RT, the > jail would include apache1.3+modperl, MySQL, sendmail, and RT. > That's a lot of stuff to get working in there! (but fortunately > FreeBSD jails seem straightforward and easy) ^_^ > I expect sendmail to be the real problem of the above bunch. Sendmail should do just fine, I'd think. -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: QLogic QLA210 OEM (Sun) Fibre Cards and FreeBSD 5.x
In the last episode (Jul 26), Eli K. Breen said: > Has anyone here had any success getting the Sun-branded Qlogic 2Gb fibre > adapters (QLA210) working under FreeBSD? > > Apparently these boards should work as they're compatible with the > QLA2200/2300 stack (and therefore should work with the isp driver) but > when booting I see the following: > > pcib3: at device 0.2 on pci1 > pci3: on pcib3 > pci3: at device 11.0 (no driver attached) > > I do have the ISP driver in the kernel. Adding the PCI IDs to isp_pci.c may be enough to get it working. -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Consistent file system hang with RELENG_6 of today ...
In the last episode (Jul 30), Marc G. Fournier said: > 'k, this is turning out to be alot of fun ... the only machines I > have here that I can use to talk to the portmaster are Windows boxes > ... can you recommend a client for windows that would do good logging > similar to what 'script' does under FreeBSD? :( Just about any terminal emulator will have a logging or capture option. CRT: File -> Log Session Hyperterm: Transfer -> Receive Text... Putty: Settings -> Session -> Logging -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: broken port...
In the last episode (Dec 24), FreeBSD said: > cute one though > > Updater failed: Cannot install > "/usr/ports/japanese/tk80/patches/#cvs.cvsup-1394.186" to > "/usr/pobts/japanese/tk80/patches/patch-ab": No such file or directory > > /usr/pobts :) You didn't provide much information, but it looks like you were running cvsup, right? If you run it again, does it have trouble on the same file? 'b' and 'r' are one bit apart from each other (01100010 and 01110010). Sounds like your machine flipped a bit. -- Dan Nelson [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-stable" in the body of the message
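The one-bit claim above is easy to verify with shell arithmetic. The following is a small sketch; bitdiff is a hypothetical helper name, and it relies only on POSIX printf (a leading quote in a numeric argument yields the character code) and POSIX arithmetic expansion.

```sh
#!/bin/sh
# bitdiff A B: print the codes of two ASCII characters, their XOR, and
# whether they differ in exactly one bit. Hypothetical helper for illustration.
bitdiff() {
    a=$(printf '%d' "'$1")   # character code of the first argument
    b=$(printf '%d' "'$2")   # character code of the second argument
    x=$(( a ^ b ))           # bits that differ between the two characters
    # A non-zero value with exactly one bit set satisfies x & (x - 1) == 0.
    if [ "$x" -ne 0 ] && [ $(( x & (x - 1) )) -eq 0 ]; then
        single=yes
    else
        single=no
    fi
    printf '%s=%d %s=%d xor=0x%02x single-bit=%s\n' \
        "$1" "$a" "$2" "$b" "$x" "$single"
}
```

Running "bitdiff b r" prints "b=98 r=114 xor=0x10 single-bit=yes": the two characters differ only in bit 4, which is consistent with a single flipped bit turning "ports" into "pobts".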
Re: Pseudoterminals increase: compilation error
In the last episode (Jul 19), Unga said: > On Sat, 7/19/08, Peter Jeremy wrote: > > On 2008-Jul-18 18:38:36 -0700, Unga wrote: > > >As per FAQ, > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/admin.html, > > > I tried to increase the number of ptys: "10.19.1 Build and > > > install a new kernel with the line in the configuration file: > > > device pty N where N is the number of requested pseudoterminals." > > > > That has been obsolete for a while. Do you actually have a problem > > with insufficient PTYs? > > Looks like, may not be. > > The Problem: > expect -c "spawn ls" > spawn ls > The system has no more ptys. Ask your system administrator to create > more. while executing "spawn ls" > > It now seems to be a permission problem as explained in > http://expect.nist.gov/FAQ.html#q67 . > > Still investigating. Any help will be very much appreciated. Expect's error message doesn't say anything except "something isn't working but I won't tell you what". Run truss -o truss.log -f expect -c "spawn ls" and determine which syscall is failing, with what error number, just before expect prints its "no more ptys" message. That will tell you whether it's a permissions issue, or something else. If there are no obvious errors, post a part of the log. Also, what version of expect are you running? Versions between 5.38.0_1 and 5.43.0_2 had a bug in the port Makefile that limited the number of ptys expect could see. See http://www.freebsd.org/cgi/query-pr.cgi?pr=108311 . -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
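Once truss.log exists, the failing call can be pulled out with grep. This is only a sketch: failed_syscalls and last_failure are hypothetical helper names, and it assumes the usual truss convention of marking a failed call with "ERR#<errno> '<message>'" at the end of the line.

```sh
#!/bin/sh
# Sketch only: both helpers are hypothetical, not part of truss.

# Print every syscall in a truss log that returned an error.
failed_syscalls() {
    grep "ERR#" "$1"
}

# Print only the last failed syscall in the log, which is usually the
# one closest to the point where the program gave up.
last_failure() {
    failed_syscalls "$1" | tail -n 1
}
```

For example, "last_failure truss.log" narrows the log to a single line; an ERR#13 there would point at permissions, while an ERR#2 would point at missing device nodes. Some failures earlier in the log are normal (programs probe for files that do not exist), so the lines nearest the "no more ptys" message are the ones to read.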
Re: Temperature monitoring on old desktop - Dell OptiPlex SX270?
In the last episode (Aug 03), Torfinn Ingolfsen said:
> On Sat, 02 Aug 2008 20:19:12 -0700 Jeremy Chadwick wrote:
>
> > On Sun, Aug 03, 2008 at 01:50:53AM +0200, Torfinn Ingolfsen wrote:
> > The first questions to ask are: 1) does this machine even have a
> > H/W monitoring IC on it, and 2) is it enabled/wired to thermistors
> > and fans?
>
> Yes, but so far I haven't found out anything by searching.
>
> > What processor is in it?  Not a Core2Duo.  I'm guessing since it's
> > circa 2004, probably a Pentium 3 or 4, or possibly an older AMD.
>
> Pentium 4. From dmesg:
> CPU: Intel(R) Pentium(R) 4 CPU 2.60GHz (2593.51-MHz 686-class CPU)
>   Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
>
>   Features=0xbfebfbff
>   Features2=0x4400
>   Logical CPUs per core: 2
>
> > None of those, to my knowledge, have on-die temperatures -- they all
> > rely on external H/W monitoring.
>
> Ok, so what is the 'TM' feature of this cpu then?

From what I can find on Intel's site for your CPU, TM is an emergency
switch that lowers the CPU speed to prevent overheating that could
damage the processor.  Under normal circumstances, it should never trip,
and its on/off status (not temperature) is only readable by two pins on
the CPU.  It can be disabled and enabled by software, but not monitored.

http://download.intel.com/design/Pentium4/datashts/29864312.pdf

  "The Thermal Monitor feature helps control the processor temperature
  by activating the Thermal Control Circuit (TCC) when the processor
  silicon reaches its maximum operating temperature.  The TCC reduces
  processor power consumption by modulating (starting and stopping) the
  internal processor core clocks.  The Thermal Monitor feature must be
  enabled for the processor to be operating within specifications.  The
  temperature at which Thermal Monitor activates the thermal control
  circuit is not user configurable and is not software visible."

> > I just checked http://tingox.googlepages.com/sx270 and sure enough, an
> > older P4.  coretemp(4) won't work with this.
>
> I know, I just thought that there might be something similar for the
> TM feature of Pentium 4's.
>
> > I would start by booting the machine into Windows and install
> > SpeedFan.  If that thing is able to detect and provide thermal data,
>
> Ouch. I was hoping that I wouldn't have to do that. The machine have
> no internal CD-drive, and for some reason doesn't want to boot from a
> (usb) external cd-drive either (kind of funny - it boots from flash
> drives and external hard drives. But cd-rom -no).
>
> I was hoping to solve this without windows in the picture.

-- 
Dan Nelson
[EMAIL PROTECTED]
Re: snapshots and disk usage
In the last episode (Sep 07), Stefan `Sec` Zehl said:
> Hi,
>
> I am using ufs snapshots on RELENG_7 for some time now, and am generally
> happy with it. I have noticed a strange behaviour when removing large
> amount of files, and wanted to ask if this is expected.
>
> Before starting, we check the free space on /usr:
>
> | ice:/usr>df -h .
> | Filesystem         Size    Used   Avail Capacity  Mounted on
> | /dev/ad4s2.elid    9.7G    7.6G    1.3G    64%    /usr
>
> Then delete /usr/obj and run df again:
>
> | ice:/usr>sudo rm -rf obj 2>/dev/null
> | ice:/usr>df -h .
> | Filesystem         Size    Used   Avail Capacity  Mounted on
> | /dev/ad4s2.elid    9.7G    5.7G    3.2G    64%    /usr
>
> This is unexpected. With snapshots, removing something should not
> release space.
>
> Sure enough, in the course of the next minute, the fake free space
> vanishes
>
> | ice:/usr>df -h .
> | Filesystem         Size    Used   Avail Capacity  Mounted on
> | /dev/ad4s2.elid    9.7G    5.9G    3.0G    66%    /usr
> | ice:/usr>df -h .
> | Filesystem         Size    Used   Avail Capacity  Mounted on
> | /dev/ad4s2.elid    9.7G    6.6G    2.3G    74%    /usr
> | ice:/usr>df -h .
> | Filesystem         Size    Used   Avail Capacity  Mounted on
> | /dev/ad4s2.elid    9.7G    8.6G    269M    97%    /usr
>
> and all the free space is allocated in the snapshot:
>
> | ice:~>sudo snapshot list
> | Filesystem   User   User%    Snap   Snap%  Snapshot
> | /usr          8GB   89.3%     2GB   21.5%  daily.1
> | /usr          8GB   89.3%   344MB    3.5%  daily.0
> | /usr          8GB   89.3%   344MB    3.5%  weekly.0
> | /usr          8GB   89.3%   344MB    3.5%  hourly.1
> | /usr          8GB   89.3%     7MB    0.1%  hourly.0
>
> My understanding so far was that df may underreport free space, but i
> find overreporting it a bit troublesome. -- What would happen if I tried
> to use that space before it was allocated to the snapshot?

I think you're running into the softupdates delay.  When you delete a
file on a SU-enabled filesystem, the space isn't actually freed until
sync.  But applications expect that statfs() info is updated
immediately, so the kernel pretends that the space is available.  That
doesn't really work with a snapshot, since if you delete a file that
existed in the snapshot, no space will free up.  So you see a jump in
free space as the kernel fakes the f_bfree statfs amount, then it slowly
drops to the correct value as the deletions actually sync to disk.

-- 
Dan Nelson
[EMAIL PROTECTED]
Re: Problem with dtrace
In the last episode (Sep 20), Michel Talon said: > Still testing FreeBSD-7.1-beta encountered the following (perhaps > to be expected) result with dtrace: > > dtrace -m kernel -> some output -> deadlock after a few seconds. > > Less demanding tracing worked OK. proc, profile, and syscall probes work fine for me; it seems to be just fbt probes that cause problems. Enabling any one will cause a trap 12 a few instructions inside the probed function when it gets called. -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
In the last episode (Sep 30), Andrew Snow said:
> Zaphod Beeblebrox wrote:
> > Also, there exists data within the ARC (I'm always tempted to say
> > the ARC Cache, but that is redundant) that is also then in paging
> > memory.
>
> OK, but one advantage of ZFS memory consumption is under heavy write
> loads, where much of the memory is used to store and reorder writes.
> The heavy memory consumption under reading is a shame, but ZFS has to
> cache and use more metadata than UFS, so its a price you pay for the
> extra features and benefits.
>
> What I think we need is a way to turn off read-caching except for
> metadata.  This allows ARC to only be used more efficiently.
> Currently you can turn all read-ahead on or off, with the provided
> sysctl tunables, but would be easy to implement a metadata-only
> option.  I found that access speed suffers when metadata is not
> prefetched.

That'd be handy, but at least on my system the data prefetcher isn't
really called often enough to make a difference either way (assuming the
counts are accurate).  Metadata prefetch is a big win, however.

([EMAIL PROTECTED]) /home/dan> uptime
11:00PM  up 5 days, 13:47, 21 users, load averages: 1.52, 1.68, 1.69
([EMAIL PROTECTED]) /home/dan> sysctl kstat
[..]
kstat.zfs.misc.arcstats.hits: 211130907 (95%)
kstat.zfs.misc.arcstats.misses: 9808431
kstat.zfs.misc.arcstats.demand_data_hits: 116614377 (98%)
kstat.zfs.misc.arcstats.demand_data_misses: 2477943
kstat.zfs.misc.arcstats.demand_metadata_hits: 55805261 (96%)
kstat.zfs.misc.arcstats.demand_metadata_misses: 2310006
kstat.zfs.misc.arcstats.prefetch_data_hits: 79878 (53%)
kstat.zfs.misc.arcstats.prefetch_data_misses: 71741
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 38556033 (88%)
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 4947270
kstat.zfs.misc.arcstats.mru_hits: 23702582 (95%)
kstat.zfs.misc.arcstats.mru_ghost_hits: 1274189
kstat.zfs.misc.arcstats.mfu_hits: 149722171 (98%)
kstat.zfs.misc.arcstats.mfu_ghost_hits: 2944572
[..]
kstat.zfs.misc.arcstats.p: 235221504
kstat.zfs.misc.arcstats.c: 268435456
kstat.zfs.misc.arcstats.c_min: 67108864
kstat.zfs.misc.arcstats.c_max: 268435456
kstat.zfs.misc.arcstats.size: 263926784

-- 
Dan Nelson
[EMAIL PROTECTED]
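A hit ratio can be worked out from any hits/misses pair in the counters above. The sketch below assumes the plain definition hits / (hits + misses); how the parenthesized percentages in the message were produced is not stated, so this is not claimed to reproduce them exactly. hit_pct is a hypothetical helper name.

```sh
#!/bin/sh
# hit_pct HITS MISSES: print the hit ratio as a percentage with one decimal.
# Hypothetical helper; assumes ratio = hits / (hits + misses).
hit_pct() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        if (h + m == 0) print "n/a"           # no lookups recorded yet
        else printf "%.1f\n", 100 * h / (h + m)
    }'
}
```

With the figures quoted above, "hit_pct 79878 71741" (data prefetch) gives 52.7 and "hit_pct 211130907 9808431" (overall) gives 95.6, which is in line with the observation that data prefetch contributes little while the cache as a whole hits most of the time.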
Re: Weird truss output
In the last episode (Dec 03), Vlad GALU said:
> I'm running a statically linked binary, which I've built inside a
> jail. The jail's libc & co are in sync with the host's. Truss then
> shows this:
>
> -- cut here --
> -- UNKNOWN SYSCALL 1048532 --
> -- UNKNOWN SYSCALL 1048532 --

Is this a threaded app that you attached truss to after it was started? 
The method that truss uses to catch syscall enter/exit events doesn't
indicate whether the event is an enter or an exit, so if you attach
while a syscall is active, truss handles the exit event as if it were a
syscall entry event, and never gets back in synch.  It gets worse with
threaded apps because each thread is another chance to get out of
synch.  Try this patch:

Index: i386-fbsd.c
===
RCS file: /home/ncvs/src/usr.bin/truss/i386-fbsd.c,v
retrieving revision 1.29
diff -u -p -r1.29 i386-fbsd.c
--- i386-fbsd.c 28 Jul 2007 23:15:04 - 1.29
+++ i386-fbsd.c 3 Dec 2008 15:20:09 -
@@ -149,7 +149,14 @@ i386_syscall_entry(struct trussinfo *tru
   fsc.name =
     (syscall_num < 0 || syscall_num > nsyscalls) ? NULL : syscallnames[syscall_num];
   if (!fsc.name) {
-    fprintf(trussinfo->outfile, "-- UNKNOWN SYSCALL %d --\n", syscall_num);
+    fprintf(trussinfo->outfile, "-- UNKNOWN SYSCALL %u (0x%08x) --\n", syscall_num, syscall_num);
+    if ((unsigned int)syscall_num > 0x1000) {
+      /* When attaching to a running process, we have a 50-50 chance
+         of attaching to a process waiting in a syscall, which means
+         our first trap is an exit instead of an entry and we're out
+         of synch.  Reset our flag */
+      trussinfo->curthread->in_syscall = 0;
+    }
   }
   if (fsc.name && (trussinfo->flags & FOLLOWFORKS)

-- 
Dan Nelson
[EMAIL PROTECTED]
Re: Weird truss output
In the last episode (Dec 03), Vlad GALU said:
> On Wed, Dec 3, 2008 at 5:23 PM, Dan Nelson <[EMAIL PROTECTED]> wrote:
> > In the last episode (Dec 03), Vlad GALU said:
> >> I'm running a statically linked binary, which I've built inside a
> >> jail. The jail's libc & co are in sync with the host's. Truss then
> >> shows this:
> >>
> >> -- cut here --
> >> -- UNKNOWN SYSCALL 1048532 --
> >> -- UNKNOWN SYSCALL 1048532 --
> >
> > Is this a threaded app that you attached truss to after it was
> > started?  The method that truss uses to catch syscall enter/exit
> > events doesn't indicate whether the event is an enter or an exit,
> > so if you attach while a syscall is active, truss handles the exit
> > event as if it were a syscall entry event, and never gets back in
> > synch.  It gets worse with threaded apps because each thread is
> > another chance to get out of synch.  Try this patch:
>
> You were right, this application was indeed threaded. The messages
> still occur, although at a slightly lower rate. One other thing
> that's not particularly helpful is this:
>
> -- cut here--
> read(1074283119,"\M-Ry\^A\0",7356800)= 4 (0x4)
> -- and here --
>
> I obviously don't have that many descriptors in my process. I can
> live with the malformed message, but it's a PITA not to know which fd
> the read was actually made from :(

It looks like there's some other problem where truss either drops a
syscall event, or puts some status fields into the wrong thread's
structure.  It seems to happen when two threads call blocking syscalls,
and when they return, truss gets confused as to which thread called
which syscall.  The read syscall is probably still pending, and you're
getting the arguments of the syscall that returned, printed as if it was
the read syscall.  When the read call completes, you'll probably get an
--UNKNOWN SYSCALL-- line, or another mismatched syscall output line.

An alternative is to use ktrace/kdump to trace the process; it's more
cumbersome to use, but doesn't have problems with threaded processes.

-- 
Dan Nelson
[EMAIL PROTECTED]
Re: Weird truss output
In the last episode (Dec 03), Dan Nelson said: > It looks like there's some other problem where truss either drops a > syscall event, or puts some status fields into the wrong thread's > structure. It seems to happen when two threads call blocking > syscalls, and when they return, truss gets confused as to which > thread called which syscall. The read syscall is probably still > pending, and you're getting the arguments of the syscall that > returned, printed as if it was the read syscall. When the read call > completes, you'll probably get an --UNKNOWN SYSCALL-- line, or > another mismatched syscall output line. It turns out that was the problem. There was a global structure that held syscall information. Converting it to a per-thread structure makes it work much better :) If you're adventurous, try applying the patch at http://www.evoy.net/FreeBSD/truss.diff , which fixes that problem plus a bunch of other stuff. If you're not adventurous, try the following instead, which just fixes the per-thread problem. Index: i386-fbsd.c === RCS file: /home/ncvs/src/usr.bin/truss/i386-fbsd.c,v retrieving revision 1.29 diff -u -r1.29 i386-fbsd.c --- i386-fbsd.c 28 Jul 2007 23:15:04 - 1.29 +++ i386-fbsd.c 3 Dec 2008 18:48:23 - @@ -49,6 +49,7 @@ #include #include +#include #include #include #include @@ -77,29 +78,29 @@ * 'struct syscall' describes the system call; it may be NULL, however, * if we don't know about this particular system call yet. */ -static struct freebsd_syscall { +struct freebsd_syscall { struct syscall *sc; const char *name; int number; unsigned long *args; int nargs; /* number of arguments -- *not* number of words! */ char **s_args; /* the printable arguments */ -} fsc; +}; /* Clear up and free parts of the fsc structure. 
*/ static __inline void -clear_fsc(void) { - if (fsc.args) { -free(fsc.args); +clear_fsc(struct freebsd_syscall *fsc) { + if (fsc->args) { +free(fsc->args); } - if (fsc.s_args) { + if (fsc->s_args) { int i; -for (i = 0; i < fsc.nargs; i++) - if (fsc.s_args[i]) - free(fsc.s_args[i]); -free(fsc.s_args); +for (i = 0; i < fsc->nargs; i++) + if (fsc->s_args[i]) + free(fsc->s_args[i]); +free(fsc->s_args); } - memset(&fsc, 0, sizeof(fsc)); + memset(fsc, 0, sizeof(*fsc)); } /* @@ -117,9 +118,20 @@ unsigned int parm_offset; struct syscall *sc = NULL; struct ptrace_io_desc iorequest; + struct freebsd_syscall *fsc; + cpid = trussinfo->curthread->tid; - clear_fsc(); + fsc = trussinfo->curthread->fsc; + if (fsc == NULL) + { + fsc = malloc(sizeof(*fsc)); + if (fsc == NULL) + errx(1, "cannot allocate syscall struct"); +memset(fsc, 0, sizeof(*fsc)); +trussinfo->curthread->fsc = fsc; + } else +clear_fsc(fsc); if (ptrace(PT_GETREGS, cpid, (caddr_t)&regs, 0) < 0) { @@ -145,17 +157,24 @@ break; } - fsc.number = syscall_num; - fsc.name = + fsc->number = syscall_num; + fsc->name = (syscall_num < 0 || syscall_num > nsyscalls) ? NULL : syscallnames[syscall_num]; - if (!fsc.name) { -fprintf(trussinfo->outfile, "-- UNKNOWN SYSCALL %d --\n", syscall_num); + if (!fsc->name) { +fprintf(trussinfo->outfile, "-- UNKNOWN SYSCALL %u (0x%08x) --\n", syscall_num, syscall_num); +if ((unsigned int)syscall_num > 0x1000) { + /* When attaching to a running process, we have a 50-50 chance + of attaching to a process waiting in a syscall, which means + our first trap is an exit instead of an entry and we're out + of synch. 
Reset our flag */ + trussinfo->curthread->in_syscall = 0; +} } - if (fsc.name && (trussinfo->flags & FOLLOWFORKS) - && ((!strcmp(fsc.name, "fork") -|| !strcmp(fsc.name, "rfork") -|| !strcmp(fsc.name, "vfork" + if (fsc->name && (trussinfo->flags & FOLLOWFORKS) + && ((!strcmp(fsc->name, "fork") +|| !strcmp(fsc->name, "rfork") +|| !strcmp(fsc->name, "vfork" { trussinfo->curthread->in_fork = 1; } @@ -163,30 +182,30 @@ if (nargs == 0) return; - fsc.args = malloc((1+nargs) * sizeof(unsigned long)); + fsc->args = malloc((1+nargs) * sizeof(unsigned long)); iorequest.piod_op = PIOD_READ_D; iorequest.piod_offs = (void *)parm_offset; - iorequest.piod_addr = fsc.args; + iorequest.piod_addr = fsc->args; iorequest.piod_len = (1+nargs) * sizeof(unsigned long); ptrace(PT_IO, cpid, (caddr_t)&iorequest, 0); if (iorequest.piod_len == 0) return; - if (fsc.name) -
Re: Weird truss output
In the last episode (Dec 03), Vlad GALU said: > On Wed, Dec 3, 2008 at 8:56 PM, Dan Nelson <[EMAIL PROTECTED]> wrote: > [...] > > Am I doing something wrong? I've applied the full diff, rebuilt > truss, now I get this: > -- cut here -- > [EMAIL PROTECTED] / # truss -p 52731 > SIGNAL 17 (SIGSTOP) > > -- UNKNOWN SYSCALL 1048535 -- > -- UNKNOWN SYSCALL 1048496 -- > -- UNKNOWN SYSCALL 1048559 -- > -- UNKNOWN SYSCALL 1048559 -- > -- UNKNOWN SYSCALL -8464 -- > -- UNKNOWN SYSCALL -8464 -- > -- UNKNOWN SYSCALL 527 -- > -- UNKNOWN SYSCALL 527 -- > /100084: read(1074283119,"\M-|\M^WP\^A",7356800) = 4 (0x4) > -- UNKNOWN SYSCALL 527 -- > -- UNKNOWN SYSCALL 7385248 -- > -- and here -- > > Perhaps I should mention that I block all signals from all threads, > except for one, where I do all the handling/cleanup. So you're back to your original behaviour basically? Not sure what's wrong; it all works great on my machine... Are you on a 64-bit system? I only have a Pentium-III here, so the big patch isn't guaranteed to work on anything except i386. The little patch inlined in my previous email is for i386-fbsd.c, but you should be able to make similar changes to amd64-fbsd.c (most of the diff just replaces "fsc." with "fsc->" ). -- Dan Nelson [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Why is NFSv4 so slow? (root/toor)
In the last episode (Jun 29), Rick C. Petty said:
> On Tue, Jun 29, 2010 at 10:20:57AM -0500, Adam Vande More wrote:
> > On Tue, Jun 29, 2010 at 9:58 AM, Rick Macklem wrote:
> >
> > > I suppose if the FreeBSD world feels that "root" and "toor" must both
> > > exist in the password database, then "nfsuserd" could be hacked to
> > > handle the case of translating uid 0 to "root" without calling
> > > getpwuid().  It seems ugly, but if deleting "toor" from the password
> > > database upsets people, I can do that.
> >
> > I agree with Ian on this.  I don't use toor either, but have seen people
> > use it, and sometimes it will get recommended here for various reasons
> > e.g. running a root account with a different default shell.  It
> > wouldn't bother me having to do this provided it was documented, but
> > having to do so would be a POLA violation to many users I think.
>
> To be fair, I'm not sure this is even a problem.  Rick M. only suggested
> it as a possibility.  I would think that getpwuid() would return the first
> match which has always been root.  At least that's what it does when
> scanning the passwd file; I'm not sure about NIS.  If someone can prove
> that this will cause a problem with NFSv4, we could consider hacking it.
> Otherwise I don't think we should change this behavior yet.

If there are multiple users that map to the same userid, nscd on Linux
will select one name at random and return it for getpwuid() calls.  I
haven't seen this behaviour on FreeBSD or Solaris, though.  They always
seem to return the first entry in the passwd file.

-- 
Dan Nelson
dnel...@allantgroup.com
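The "first entry wins" behaviour described above can be imitated with a sequential scan of a passwd-format file. This sketch only mimics that scan; it is not how libc or nscd is actually implemented, and first_name_for_uid is a hypothetical helper name.

```sh
#!/bin/sh
# first_name_for_uid FILE UID: print the first login name in a
# passwd-format FILE whose third field equals UID, then stop scanning.
# Hypothetical helper; it only imitates a top-to-bottom lookup.
first_name_for_uid() {
    awk -F: -v uid="$2" '$1 !~ /^#/ && $3 == uid { print $1; exit }' "$1"
}
```

On a file that lists root before toor, both with uid 0, "first_name_for_uid /etc/passwd 0" prints root; swapping the two lines would make it print toor instead, which is the ordering dependence the thread is worried about.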
Re: 8.1R ZFS almost locking up system
In the last episode (Aug 21), Tim Bishop said: > I've had a problem on a FreeBSD 8.1R system for a few weeks. It seems > that ZFS gets in to an almost unresponsive state. Last time it did it > (two weeks ago) I couldn't even log in, although the system was up, this > time I could manage a reboot but couldn't stop any applications (they > were likely hanging on I/O). Could your pool be very close to full? Zfs will throttle itself when it's almost out of disk space. I know it's "saved" me from filling up my filesystems a couple times :) > A few items from top, including zfskern: > > PID USERNAME THR PRI NICE SIZERES STATE C TIME WCPU COMMAND > 5 root4 -8- 0K60K zio->i 0 54:38 3.47% zfskern > 91775 70 1 440 53040K 31144K tx->tx 1 2:11 0.00% postgres > 39661 tdb 1 440 55776K 32968K tx->tx 0 0:39 0.00% mutt > 14828 root1 470 14636K 1572K tx->tx 1 0:03 0.00% zfs > 11188 root1 510 14636K 1572K tx->tx 0 0:03 0.00% zfs > > At some point during this process my zfs snapshots have been failing to > complete: > > root5 0.8 0.0 060 ?? DL7Aug10 54:43.83 [zfskern] > root 8265 0.0 0.0 14636 1528 ?? D10:00AM 0:03.12 zfs snapshot > -r po...@2010-08-21_10:00:01--1d > root11188 0.0 0.1 14636 1572 ?? D11:00AM 0:02.93 zfs snapshot > -r po...@2010-08-21_11:00:01--1d > root14828 0.0 0.1 14636 1572 ?? D12:00PM 0:03.04 zfs snapshot > -r po...@2010-08-21_12:00:00--1d > root17862 0.0 0.1 14636 1572 ?? D 1:00PM 0:01.96 zfs snapshot > -r po...@2010-08-21_13:00:01--1d > root20986 0.0 0.1 14636 1572 ?? D 2:00PM 0:02.07 zfs snapshot > -r po...@2010-08-21_14:00:01--1d procstat -k on some of these processes might help to pinpoint what part of the zfs code they're all waiting in. -- Dan Nelson dnel...@allantgroup.com ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 8.1R ZFS almost locking up system
In the last episode (Aug 31), Tim Bishop said: > On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote: > > In the last episode (Aug 21), Tim Bishop said: > > > A few items from top, including zfskern: > > > > > > PID USERNAME THR PRI NICE SIZERES STATE C TIME WCPU COMMAND > > > 5 root4 -8- 0K60K zio->i 0 54:38 3.47% zfskern > > > 91775 70 1 440 53040K 31144K tx->tx 1 2:11 0.00% > > > postgres > > > 39661 tdb 1 440 55776K 32968K tx->tx 0 0:39 0.00% mutt > > > 14828 root1 470 14636K 1572K tx->tx 1 0:03 0.00% zfs > > > 11188 root1 510 14636K 1572K tx->tx 0 0:03 0.00% zfs > > > > > > At some point during this process my zfs snapshots have been failing to > > > complete: > > > > > > root5 0.8 0.0 060 ?? DL7Aug10 54:43.83 [zfskern] > > > root 8265 0.0 0.0 14636 1528 ?? D10:00AM 0:03.12 zfs > > > snapshot -r po...@2010-08-21_10:00:01--1d > > > root11188 0.0 0.1 14636 1572 ?? D11:00AM 0:02.93 zfs > > > snapshot -r po...@2010-08-21_11:00:01--1d > > > root14828 0.0 0.1 14636 1572 ?? D12:00PM 0:03.04 zfs > > > snapshot -r po...@2010-08-21_12:00:00--1d > > > root17862 0.0 0.1 14636 1572 ?? D 1:00PM 0:01.96 zfs > > > snapshot -r po...@2010-08-21_13:00:01--1d > > > root20986 0.0 0.1 14636 1572 ?? D 2:00PM 0:02.07 zfs > > > snapshot -r po...@2010-08-21_14:00:01--1d > > > > procstat -k on some of these processes might help to pinpoint what part of > > the zfs code they're all waiting in. > > It happened again this Saturday (clearly something in the weekly > periodic run is triggering the issue). 
procstat -kk shows the following > for processes doing something zfs related (where zfs related means the > string 'zfs' in the procstat -kk output): > > 0 100084 kernel zfs_vn_rele_task mi_switch+0x16f > sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 > fork_trampoline+0xe > 5 100031 zfskern arc_reclaim_thre mi_switch+0x16f > sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 > fork_exit+0x118 fork_trampoline+0xe > 5 100032 zfskern l2arc_feed_threa mi_switch+0x16f > sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be > fork_exit+0x118 fork_trampoline+0xe > 5 100085 zfskern txg_thread_enter mi_switch+0x16f > sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 > fork_exit+0x118 fork_trampoline+0xe > 5 100086 zfskern txg_thread_enter mi_switch+0x16f > sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea > spa_sync+0x355 txg_sync_thread+0x195 fork_exit+0x118 fork_trampoline+0xe >17 100040 syncer -mi_switch+0x16f > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_synced+0x7c zil_commit+0x416 > zfs_sync+0xa6 sync_fsync+0x184 sync_vnode+0x16b sched_sync+0x1c9 > fork_exit+0x118 fork_trampoline+0xe > 2210 100156 syslogd -mi_switch+0x16f > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 > VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 > writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 > 3500 100177 syslogd -mi_switch+0x16f > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 > VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 > writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 > 3783 100056 syslogd -mi_switch+0x16f > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 > VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 > writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 > 4064 100165 mysqld initial thread mi_switch+0x16f > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 
dmu_tx_assign+0x16c > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc > vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 closef+0x3b kern_close+0x14d > syscall+0x1e7 Xfast_syscall+0xe1 > 4441 100224 python2.6initial thread mi_switch+0x16f > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc > null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f > vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 > 100227 python2.6initial thread mi_swit
Re: 8.1R ZFS almost locking up system
In the last episode (Sep 01), Tim Bishop said: > On Tue, Aug 31, 2010 at 10:58:29AM -0500, Dan Nelson wrote: > > In the last episode (Aug 31), Tim Bishop said: > > > It happened again this Saturday (clearly something in the weekly > > > periodic run is triggering the issue). procstat -kk shows the > > > following for processes doing something zfs related (where zfs related > > > means the string 'zfs' in the procstat -kk output): > > > > > > 0 100084 kernel zfs_vn_rele_task mi_switch+0x16f > > > sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 > > > fork_trampoline+0xe > > > 5 100031 zfskern arc_reclaim_thre mi_switch+0x16f > > > sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 > > > fork_exit+0x118 fork_trampoline+0xe > > > 5 100032 zfskern l2arc_feed_threa mi_switch+0x16f > > > sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be > > > fork_exit+0x118 fork_trampoline+0xe > > > 5 100085 zfskern txg_thread_enter mi_switch+0x16f > > > sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 > > > txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe > > > 5 100086 zfskern txg_thread_enter mi_switch+0x16f > > > sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea > > > spa_sync+0x355 txg_sync_thread+0x195 fork_exit+0x118 fork_trampoline+0xe > > >17 100040 syncer -mi_switch+0x16f > > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_synced+0x7c zil_commit+0x416 > > > zfs_sync+0xa6 sync_fsync+0x184 sync_vnode+0x16b sched_sync+0x1c9 > > > fork_exit+0x118 fork_trampoline+0xe > > > 2210 100156 syslogd -mi_switch+0x16f > > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 > > > zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 > > > dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 > > > Xfast_syscall+0xe1 > > > 3500 100177 syslogd -mi_switch+0x16f > > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 > > > zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 > > > dofilewrite+0x85 
kern_writev+0x60 writev+0x41 syscall+0x1e7 > > > Xfast_syscall+0xe1 > > > 3783 100056 syslogd -mi_switch+0x16f > > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 > > > zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 > > > dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 > > > Xfast_syscall+0xe1 > > > 4064 100165 mysqld initial thread mi_switch+0x16f > > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c > > > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc > > > vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 closef+0x3b kern_close+0x14d > > > syscall+0x1e7 Xfast_syscall+0xe1 > > > 4441 100224 python2.6initial thread mi_switch+0x16f > > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c > > > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc > > > null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f > > > vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 > > > 100227 python2.6initial thread mi_switch+0x16f > > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c > > > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc > > > null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f > > > vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 > > > 4445 100228 python2.6initial thread mi_switch+0x16f > > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c > > > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc > > > null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f > > > vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 > > > 4446 100229 python2.6initial thread mi_switch+0x16f > > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c > > > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc > > > null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d 
null_inactive+0x1f > > > vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 > > > 4447 100
Re: TTY task group scheduling
In the last episode (Nov 19), Alexander Leidinger said: > Quoting Alexander Best (from Fri, 19 Nov 2010 00:17:10 > +): > > 17:51 @ Genesys : Luigi Rizzo had a plugabble scheduler back in 4.* or > > thereabouts > > 17:51 @ Genesys : you could kldload new ones and switch to them on the fly > > 17:52 @ arundel : wow. that sounds cool. too bad it didn't make it > > into src tree. by now it's probably outdated and needs to be reworked quite > > a bit. > > > > > > does anybody know something about this? > > I'm aware of the I/O scheduling code (which is now available at least > in -current), but I do not remember CPU scheduling code from Luigi. > Are you sure Genesys didn't mix up something by accident? I am rarely mixed up :) A quick search didn't bring up a direct reference, but here's a mention of it from Luigi: http://lists.freebsd.org/pipermail/freebsd-hackers/2004-November/008891.html -- Dan Nelson dnel...@allantgroup.com ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Panic in ZFS layer on 8.1-STABLE
In the last episode (Dec 15), Andriy Gapon said: > on 15/12/2010 10:28 Jeremie Le Hen said the following: > > Hi, > > > > [ Please Cc: me when replying, as I'm not subscribed to -sta...@. ] > > > > My filer at home runs FreeBSD. A single data RAID-1 zpool with 10~15 > > datasets, two of them using compression. Over the night, I got the > > following panic: > > Thanks for the stack trace! > But where is the promised panic message? :) > > I suspect that you ran out of kernel address space. > You'd probably have to tune your system and/or add more memory. > Please research this topic via mailing lists archives. > > > Tracing pid 0 tid 100111 td 0x86393a00 > > kdb_enter(809faa5b,809faa5b,80a12e84,cb114aec,0,...) at kdb_enter+0x3a > > panic(80a12e84,1c000,2e3e8000,80a12e7e,7d0,...) at panic+0x131 > > kmem_malloc(8169008c,1c000,2,cb114b6c,80909a99,...) at kmem_malloc+0x285 > > page_alloc(0,1c000,cb114b5f,2,2f0c800,...) at page_alloc+0x27 > > uma_large_malloc(1c000,2,0,8609b3f0,30,...) at uma_large_malloc+0x4a > > malloc(1c000,860b2120,2,cb114bb0,8601d36d,...) at malloc+0x7c > > zfs_kmem_alloc(1c000,2,cb114bf0,8601f77b,1c000,...) at > > zfs_kmem_alloc+0x20 > > zio_buf_alloc(1c000,cb114c30,86008817,92c33bd0,cb114bf0,...) at > > zio_buf_alloc+0x44 > > zio_compress_data(3,b4264000,2,0,cb114c58,...) at > > zio_compress_data+0x8b The following patch may help you. It helps me :) It converts the zio_buf_alloc() call into a zio_buf_alloc_nowait(), so that if the alloc fails, zio_compress_data() returns failure and zfs writes the block uncompressed instead of panicing. 
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c === --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c(revision 216418) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c(working copy) @@ -202,6 +202,20 @@ zio_buf_alloc(size_t size) return (kmem_alloc(size, KM_SLEEP)); } +void * +zio_buf_alloc_nowait(size_t size) +{ +#ifdef ZIO_USE_UMA + size_t c = (size - 1) >> SPA_MINBLOCKSHIFT; + + ASSERT(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT); + + return (kmem_cache_alloc(zio_buf_cache[c], KM_NOSLEEP)); +#else + return (kmem_alloc(size, KM_NOSLEEP)); +#endif +} + /* * Use zio_data_buf_alloc to allocate data. The data will not appear in a * crashdump if the kernel panics. This exists so that we will limit the amount Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c === --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c (revision 216418) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c (working copy) @@ -32,6 +32,12 @@ #include #include +int panics_avoided_by_not_compressing = 0; +SYSCTL_DECL(_vfs_zfs); +SYSCTL_INT(_vfs_zfs, OID_AUTO, compression_panics_avoided, CTLFLAG_RD, + &panics_avoided_by_not_compressing, 0, +"kmem_map panics avoided by skipping compression when memory is low"); + /* * Compression vectors. 
*/ @@ -109,7 +115,17 @@ zio_compress_data(int cpfunc, void *src, uint64_t destbufsize = P2ALIGN(srcsize - (srcsize >> 3), SPA_MINBLOCKSIZE); if (destbufsize == 0) return (0); + +#if 1 + dest = zio_buf_alloc_nowait(destbufsize); + if (dest == 0) + { + panics_avoided_by_not_compressing++; + return (0); + } +#else dest = zio_buf_alloc(destbufsize); +#endif ciosize = ci->ci_compress(src, dest, (size_t)srcsize, (size_t)destbufsize, ci->ci_level); Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h === --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h(revision 216418) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h(working copy) @@ -398,6 +398,7 @@ extern zio_t *zio_unique_parent(zio_t *cio); extern void zio_add_child(zio_t *pio, zio_t *cio); extern void *zio_buf_alloc(size_t size); +extern void *zio_buf_alloc_nowait(size_t size); extern void zio_buf_free(void *buf, size_t size); extern void *zio_data_buf_alloc(size_t size); extern void zio_data_buf_free(void *buf, size_t size); -- Dan Nelson dnel...@allantgroup.com ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: link aggregation - bundling 2 lagg interfaces together
In the last episode (Feb 04), Damien Fleuriot said:
> I have a firewall with 2x Intel pro dual port cards.
>
> On Intel A , port 1 goes to switch 1, port 2 goes to switch 2
> On Intel B , port 1 goes to switch 1, port 2 goes to switch 2
>
> I have created the following 2 lagg devices using LACP:
>
> lagg0 = A1 + B1
> lagg1 = A2 + B2
>
> This works fine.
>
> Now, what I had in mind was creating a lagg2 device using lagg0 and
> lagg1 with failover.
>
> That would provide redundancy in case of a switch failure.
>
> ifconfig won't let me though:
>
> # ifconfig lagg2 laggproto failover laggport lagg0 laggport lagg1
> ifconfig: SIOCSLAGGPORT: Invalid argument
>
> I suppose it's not possible to aggregate lagg interfaces ?

Apparently not:

http://fxr.watson.org/fxr/source/net/if_lagg.c#L516

It looks like there is preliminary code under #ifdef LAGG_PORT_STACKING, but
it claims to be untested.

-- 
Dan Nelson
dnel...@allantgroup.com
Re: 3TB disc and block alignment
In the last episode (Feb 17), Daniel Kalchev said:
> >>> da0: Fixed Direct Access SCSI-5 device
> >>> da0: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)
> >
> > Thanks -- is it also possible to have something like
> >
> > da0: 2861588MB (732566646 4096 byte sectors: 255H 63S/T 364801C)
>
> According to Hitachi, this is an 512b drive.

Correct. This isn't a 4k drive. Datasheet:

http://www.hgst.com/internal-drives/enterprise/ultrastar/ultrastar-7k3000

Sector size (variable, Bytes/sector)    512

-- 
Dan Nelson
dnel...@allantgroup.com
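For reference, the two da0 lines quoted above describe the same capacity, just counted in different sector sizes. A small sketch of the arithmetic follows; describe_disk is a made-up helper name, and it assumes a shell with 64-bit integer arithmetic.

```shell
# Hypothetical helper: convert a sector count into MB (2^20 bytes) and
# into the equivalent number of 4096-byte sectors.
# usage: describe_disk SECTORS SECTOR_SIZE
describe_disk() {
    bytes=$(( $1 * $2 ))
    echo "$(( bytes / 1048576 ))MB, $(( bytes / 4096 )) sectors of 4096 bytes"
}

describe_disk 5860533168 512
# prints: 2861588MB, 732566646 sectors of 4096 bytes
```

The figures match the probe message: 5860533168 sectors of 512 bytes and 732566646 sectors of 4096 bytes are the same number of bytes.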
Re: (MORE INFO) Ext firewire drive not mounted after update
In the last episode (Nov 13), Robert said:
> On Fri, 13 Nov 2009 13:01:47 -0800
> Robert wrote:
>
> In the time honored FreeBSD tradition, I am replying to my own email.
>
> I booted with a 8.0RC2 livefs CD and the external disk shows up as
> /dev/da0, /das1, /das1d. I then connected the external drive via USB and
> rebooted to the 8.0 Prerelease system. The drive shows up and is able to
> mount.
>
> It appears that some thing is amiss with the latest version. I will
> download the latest livefs iso and see if that works.

I think I remember seeing a posting within the last few days saying that the
"sbp" device wasn't going to be compiled into the 8.0-release kernel due to
it causing hangs on boot. If you run "kldload sbp" as root after the system
has booted you should see your disk devices appear.

I can't find the list post mentioning it, but here's the svn commit log:

r199112 | kensmith | 2009-11-09 15:39:42 -0600 (Mon, 09 Nov 2009) | 11 lines
Changed paths:
   M /stable/8/sys/amd64/conf/GENERIC
   M /stable/8/sys/i386/conf/GENERIC
   M /stable/8/sys/ia64/conf/GENERIC
   M /stable/8/sys/powerpc/conf/GENERIC
   M /stable/8/sys/sparc64/conf/GENERIC

Comment out the sbp(4) entry for GENERIC config files that contain it.
There are known issues with this driver that are beyond what can be fixed
for 8.0-RELEASE and the bugs can cause boot failure on some systems. It's
not clear if it impacts all systems and there is interest in getting the
problem fixed so for now just comment it out instead of remove it.

Commit straight to stable/8, this is an 8.0-RELEASE issue. Head was left
alone so work on it can continue there.

Reviewed by: Primary misc. architecture maintainers (marcel, marius)

-- 
Dan Nelson
dnel...@allantgroup.com
Re: (MORE INFO) Ext firewire drive not mounted after update
In the last episode (Nov 13), Robert said:
> On Fri, 13 Nov 2009 17:15:39 -0600 Dan Nelson wrote:
> > In the last episode (Nov 13), Robert said:
> > > On Fri, 13 Nov 2009 13:01:47 -0800
> > > Robert wrote:
> > > It appears that some thing is amiss with the latest version. I will
> > > download the latest livefs iso and see if that works.
> >
> > I think I remember seeing a posting within the last few days saying that
> > the "sbp" device wasn't going to be compiled into the 8.0-release kernel
> > due to it causing hangs on boot. If you run "kldload sbp" as root after
> > the system has booted you should see your disk devices appear.
>
> Thanks for responding. I checked and the "sbp" device is in fact commented
> out. I do remember a thread a month or two back about some folks having
> trouble with firewire drives. I never experienced any of that trouble on
> this system.
>
> I can continue to operate my drive on USB but I may need firewire in the
> near future. I have a friend who is a photographer and I archive her
> photos for her. She sends me an external drive or two and I burn her
> projects onto DVD. I am not sure if her drives have an USB connector.

Note that you can still run "kldload sbp" after bootup to see firewire
disks. You can also try adding "device sbp" back to your kernel config and
see if it works for you. The hangs apparently only happen on certain
motherboards.

-- 
Dan Nelson
dnel...@allantgroup.com
Re: 8.0-RC USB/FS problem
In the last episode (Nov 24), Jeremy Chadwick said:
> On Tue, Nov 24, 2009 at 06:13:21PM +0100, Hans Petter Selasky wrote:
> > On Tuesday 24 November 2009 17:58:47 Guojun Jin wrote:
> > > Sorry for the typo -- it is public not pub in the middle. The others
> > > should be all public.
> > >
> > > http:/www.daemonfun.com/archives/public/USB/crash1-reset.bz2
> > >
> >
> > %fetch http:/www.daemonfun.com/archives/public/USB/crash1-reset.bz2
> > fetch: http:/www.daemonfun.com/archives/public/USB/crash1-reset.bz2: No
> > address record
>
> The above issue is unrelated to the USB/FS problem. It looks like
> fetch(1) has a parser bug. Note the text portion between the URI and URL
> is colon-slash not colon-slash-slash like it should be.

That's a typo in the URL, not a bug in fetch :)

-- 
Dan Nelson
dnel...@allantgroup.com
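A quick way to catch this kind of typo before handing a URL to a download tool is to check for the "scheme://" prefix. The helper below is a hypothetical sketch, not part of fetch(1), and it only knows about three schemes.

```shell
# Hypothetical helper: report whether a URL starts with "scheme://".
check_url() {
    case $1 in
        http://*|https://*|ftp://*) echo "ok" ;;
        *) echo "malformed: expected scheme://host/path" ;;
    esac
}

check_url 'http:/www.daemonfun.com/archives/public/USB/crash1-reset.bz2'
check_url 'http://www.daemonfun.com/archives/public/USB/crash1-reset.bz2'
```

The first call reports the URL as malformed because only one slash follows the colon; the second reports "ok".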
Re: how to get the UFSID of a mounted filesystem ?
In the last episode (Nov 30), Pete French said:
> I observe that when I mount a UFS filesystem using the device name then
> the entry vanishes from /dev/ufsid, and glabel list no longer shows the
> device. Which begs the question, how do I find out the ufsid of a mounted
> filesystem (e.g. '/' so that I can change its fstab entry for the next
> reboot?)
>
> Am slightly embarrassed to have to ask for help! Am sure this was easy and
> in dmesg last time I did this...

Easiest way is to run dumpfs on the device you currently have mounted. The
fsid will be on the 2nd line of the output:

(r...@studio) /root># dumpfs /dev/da2s1a | head -2
magic 19540119 (UFS2) time Sun Nov 29 18:19:39 2009
superblock location 65536 id [ 49b21fba 667e8575 ]

Next easiest is to run "mount -v" as root, which will give you the fsid, but
byte-swapped so you have to mess with it to get a value that matches what
glabel expects:

/dev/ufsid/49b21fba667e8575 on /tmp/z (ufs, local, soft-updates, fsid ba1fb24975857e66)

-- 
Dan Nelson
dnel...@allantgroup.com
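Judging from the example above, the "messing with it" amounts to reversing the byte order inside each of the two 32-bit words. A minimal sketch, assuming the fsid is always 16 hex digits; fsid_to_ufsid is a made-up helper name.

```shell
# Hypothetical helper: turn the fsid printed by "mount -v" into the name
# used under /dev/ufsid by byte-swapping each 8-hex-digit half.
fsid_to_ufsid() {
    printf '%s\n' "$1" |
        sed 's/^\(..\)\(..\)\(..\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\4\3\2\1\8\7\6\5/'
}

fsid_to_ufsid ba1fb24975857e66
# prints: 49b21fba667e8575
```

The result matches both the /dev/ufsid name and the two words dumpfs prints in the "id [ ... ]" field.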
Re: do I want ch0 or pass1?
In the last episode (Jan 21), Dan Langille said:
> Please CC me on replies.
>
> I'm running into issues with hard-coding some devices (see recent post
> titled 'device.hints isn't setting what I want').
>
> Associated with this issue is confusion over whether I want to use ch0
> or pass1. I have these devices:
>
> at scbus1 target 0 lun 0 (ch0,pass1)
> at scbus1 target 5 lun 0 (sa1,pass2)
>
> My understanding: chio(1) will work with ch0, whereas mtx(1) will work with
> pass1. Is this correct? More information/elaboration will help I'm sure.
>
> Why do I ask? I can get the tape changer and tape drive hardwired to ch0
> and sa1 respectively. I cannot [yet] do the same with pass1.

You can try wiring them down the same way you wire down regular devices, but
if they're created sequentially in probe order, that won't work. Ideally,
mtx should use cam_open_spec_device() which, when given a device name, will
automatically open the matching pass device.

-- 
Dan Nelson
dnel...@allantgroup.com
Re: numeric sort(1) is broken on -STABLE
In the last episode (Feb 10), Ulrich Spörlein said: > On Wed, 10.02.2010 at 13:49:05 +0300, Ruslan Ermilov wrote: > > On Wed, Feb 10, 2010 at 09:58:14AM +0100, Ulrich Spörlein wrote: > > > not sure if this is a pilot error, but it seems to me that gnu sort -n > > > is broken on at least -STABLE (couldn't test -CURRENT yet). > > > > > > It somehow does not manifest when using a simple list and sorting on a > > > specific column, but it always happens to me when using it in > > > combination with find(1). > > > > > > % truncate -s10m a; truncate -s5m b; truncate -s800k c > > > % find a b c -ls|sort -nk7,7 > > > 8 64 -rw-r--r--1 uqs wheel > > > 10485760 Feb 10 09:13 a > > > 10 64 -rw-r--r--1 uqs wheel > > > 5242880 Feb 10 09:13 b > > > 12 64 -rw-r--r--1 uqs wheel > > > 819200 Feb 10 09:13 c > > > > I bet you're using some non-C locale for LC_NUMERIC. What does "locale" > > output tell you? > > Yes and no. LC_NUMERIC is still at C, LC_CTYPE is set to UTF-8, but as > there are no non-ASCII symbols in that output it shouldn't matter, right? > For me, 819200 is smaller than 10485760 in pretty much all locales. Why > the hell is a numeric gnusort locale dependant? Why is -g working anyway? Try adding a 'b' to your sort flags. I bet the leading spaces in front of your numbers are being treated as part of the sort key. Maybe de_DE.UTF-8 and C have different ideas of what is whitespace? -- Dan Nelson dnel...@allantgroup.com ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ugen kernel module?
In the last episode (Mar 10):
> In FreeBSD7 there was ugen.ko kernel module and I can use apcupsd with USB
> devices, but in FreeBSD there is no such module, how can I use APC power
> supply with usb interface (I mean usage of the apcupsd port)?

It's built into the usb subsystem now. All USB devices (including USB hubs
and devices controlled by other drivers) now have a ugen device. Try
running "usbconfig list" to show them. I bet your UPS has just moved to a
different ugen number.

-- 
Dan Nelson
dnel...@allantgroup.com
Re: Make ZFS auto-destroy snapshots when the out of space?
In the last episode (May 29), Kirk Strauser said:
> I found some nice scripts to regularly snapshot all the filesystems in my
> ZFS pool at
> http://www.neces.com/blog/technology/integrating-freebsd-zfs-and-periodic-snapshots-and-scrubs
> . One thing bothers me, though: I have to intentionally set how many
> months' worth of snapshots I want to keep. Too many and I run out of
> room. Too few and I lose some of the benefits of easy recovery of
> deleted data. My computer is better at bookkeeping than I am, so why not
> let it?
>
> I'd propose standardizing on an attribute like
> org.freebsd:allowautodestroy. Modify ZFS's disk full behavior to scan for
> snapshots with that attribute set and destroy the oldest one, and continue
> until there's enough free space to complete a write request or until out
> of "expendable" snapshots to destroy (at which time the normal disk full
> handler would run). Also run a daily periodic script to ensure that the
> free space stays below a configurable threshold each day so that ZFS isn't
> constantly butting up against completely full drives.

If the kernel does the snapshot deleting itself, why not add a pool-level
property that sets the amount of free space at which the deletion starts?
That way you don't need the cleanup script. Alternatively, make the
org.freebsd:allowautodestroy property hold the trigger freespace amount.
That way you can have monthly/daily/hourly snapshots but set it so the
hourly ones disappear first, then the dailies (by setting the destroy
trigger slightly higher for the ones you want to expire first).

-- 
Dan Nelson
dnel...@allantgroup.com
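To make the per-snapshot trigger idea concrete, here is a sketch of the selection rule only. Everything in it is hypothetical: no such ZFS behaviour exists, the snapshot names and percentages are invented, and the input format ("name trigger-percent" per line) is just an assumption for illustration.

```shell
# Hypothetical selection rule: a snapshot becomes expendable once the
# pool's free space (in percent) drops below the snapshot's own trigger.
# usage: expendable FREE_PERCENT < list-of-"snapshot trigger" lines
expendable() {
    awk -v free="$1" '$2 > free { print $1 }'
}

# Invented sample data: hourlies carry the highest trigger, so they go first.
printf '%s\n' 'pool@hourly 15' 'pool@daily 10' 'pool@monthly 5' | expendable 12
# prints: pool@hourly
```

With free space at 12%, only the hourly snapshot qualifies; at 8% the daily one would be listed as well, which gives the staged expiry described above.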
Re: questionable feature- rcvar woes
In the last episode (Nov 28), Andrei Kolu said:
> Something is wrong with rcvar or I am just blatant.
>
> For example:
>
> 1) Enable powerd in rc.conf
> # echo 'enable_powerd="YES"' >> /etc/rc.conf
> 2) Launch powerd
> # /etc/rc.d/powerd start
> Starting powerd.
> 3) And stopping it.
> # /etc/rc.d/powerd stop
> Stopping powerd.
>
> Everything looks fine, but when I disable powerd in rc.conf then problem
> arise.
>
> 1) Disable powerd in rc.conf- comment it out.
> # enable_powerd="YES"
> 2) Stop powerd
> # /etc/rc.d/powerd stop
> ...silence- nothing in logs either.
>
> What? Not even a warning message and powerd is actually running- why
> I have to reboot to disable it? I know that I can stop it by enabling
> it in rc.conf but what the point? Same problem when I want to start
> some service without appropriate line in rc.conf. I'd prefer to see
> some kind of warning about misconfigured rc.conf or at least
> information about what's going on in reality.

Try "/etc/rc.d/powerd forcestop". What happens during startup and shutdown
is that all rc.d scripts are run with "start" or "stop" arguments, and only
the ones that have been enabled do anything.

-- 
Dan Nelson
[EMAIL PROTECTED]
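The behaviour described above can be pictured with a toy model. This is a simplified sketch, not the real rc.subr code: run_rc and its two arguments are invented for illustration, and the only thing it models is that a "force" prefix skips the enabled check while a plain command silently does nothing for a disabled service.

```shell
# Toy model of an rc.d script's dispatch logic (hypothetical, simplified).
# usage: run_rc ENABLED COMMAND   e.g. run_rc NO forcestop
run_rc() {
    enabled=$1
    cmd=$2
    case $cmd in
        force*)
            cmd=${cmd#force}        # "forcestop" -> "stop", no enabled check
            ;;
        *)
            # Disabled services ignore plain start/stop without a message.
            [ "$enabled" = "YES" ] || return 0
            ;;
    esac
    echo "doing $cmd"
}

run_rc YES stop        # prints "doing stop"
run_rc NO stop         # prints nothing
run_rc NO forcestop    # prints "doing stop"
```

This is why the same "stop" argument can be handed to every script at shutdown: scripts for services that were never enabled simply return.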
Re: mount -p and NFS options
In the last episode (Feb 04), Mike Andrews said:
> Is there anything like "mount -p" that will print the current NFS
> options in use? TCP vs UDP, v2 vs v3, read/write sizes etc. It
> doesn't have to be in fstab format; I just need to be able to see
> what the flags are for an active mount.
>
> This would be useful in tracking down an irritating NFS problem I've
> been experiencing with diskless systems in every 6.x release and
> 7.0-RC1, namely libc.so.6 appears to be truncated or corrupt to the
> client at somewhat random times... I think it may be related to
> mount options, hence the question.

Theoretically, any filesystem that uses nmount(2) should have its options
recorded in an easy-to-extract format, since one of the arguments to nmount
is an array of options. I patched my kernel and /sbin/mount binary to do
this (borrowing the f_charspare field in struct statfs), and it mostly
works. The stuff below in <> brackets are from the options array. You can
see that cd9660 was mounted with the option "ssector=0":

([EMAIL PROTECTED]) /root># mount
local/root on / (zfs, NFS exported, local, )
devfs on /dev (devfs, local, <>)
/dev/ufs/boot on /.boot (ufs, local, soft-updates, )
procfs on /proc (procfs, local, )
/dev/md0 on /tmp (ufs, NFS exported, local, <>)
/dev/cd0 on /cdrom (cd9660, NFS exported, local, read-only, )

Unfortunately, mount_nfs simply calls nmount with a single "nfs_args" option
whose value is the same binary "struct nfs_args" it used to call mount(2)
with :( The fix would be to make nfs_vfsops.c and mount_nfs.c use the
options array instead of a custom struct, but nfs_vfsops.c:nfs_decode_args
scares me off every time I look at it.
-- Dan Nelson [EMAIL PROTECTED] Index: sys/kern/vfs_mount.c === RCS file: /home/ncvs/src/sys/kern/vfs_mount.c,v retrieving revision 1.265.2.2 diff -u -p -r1.265.2.2 vfs_mount.c --- sys/kern/vfs_mount.c 17 Jan 2008 04:24:53 - 1.265.2.2 +++ sys/kern/vfs_mount.c 18 Jan 2008 23:13:48 - @@ -1020,6 +1020,40 @@ vfs_domount( if (mp->mnt_opt != NULL) vfs_freeopts(mp->mnt_opt); mp->mnt_opt = mp->mnt_optnew; + + /* Collapse the mount options into a readable string */ + mp->mnt_stat.f_charspare[0]=0; + if (mp->mnt_opt) { + struct vfsopt *opt; + struct sbuf *sb; + + sb = sbuf_new(NULL, mp->mnt_stat.f_charspare, +sizeof(mp->mnt_stat.f_charspare), +SBUF_FIXEDLEN); + TAILQ_FOREACH(opt, mp->mnt_opt, link) { +/* + * Skip options that are temporary, stored + * elsewhere in struct statfs, or are structs + */ +if (strcmp(opt->name,"errmsg") == 0 || +strcmp(opt->name,"from") == 0 || +strcmp(opt->name,"fspath") == 0 || +strcmp(opt->name,"fstype") == 0 || +strcmp(opt->name,"nfs_args") == 0 || +strcmp(opt->name,"update") == 0 ) + continue; +if (sbuf_len(sb)) + sbuf_cat(sb, ","); +sbuf_cat(sb, opt->name); +if (opt->len) { + sbuf_cat(sb, "="); + sbuf_cat(sb, opt->value); +} + } + sbuf_finish(sb); + sbuf_delete(sb); + } + (void)VFS_STATFS(mp, &mp->mnt_stat, td); } /* Index: sbin/mount/mount.c === RCS file: /home/ncvs/src/sbin/mount/mount.c,v retrieving revision 1.96 diff -u -p -r1.96 mount.c --- sbin/mount/mount.c 25 Jun 2007 05:06:54 - 1.96 +++ sbin/mount/mount.c 2 Oct 2007 21:20:18 - @@ -596,6 +596,7 @@ prmount(struct statfs *sfp) (void)printf(", %s", o->o_name); flags &= ~o->o_opt; } + printf(", <%s>",sfp->f_charspare); /* * Inform when file system is mounted by an unprivileged user * or privileged non-root user. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
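The string-building loop in the vfs_mount.c hunk above can be mimicked in userland to see what it produces. The sketch below is a model of that logic only, not kernel code: collapse_opts is a made-up name, and the "name value" per-line input format and the sample options are assumptions for illustration.

```shell
# Model of the patch's loop: skip bookkeeping options, then join the rest
# as name or name=value, separated by commas.
# stdin: one option per line, "name" or "name value"
collapse_opts() {
    awk '
        $1 ~ /^(errmsg|from|fspath|fstype|nfs_args|update)$/ { next }
        { out = out (out == "" ? "" : ",") $1 ($2 != "" ? "=" $2 : "") }
        END { print out }
    '
}

# Invented option list resembling a cd9660 mount.
printf '%s\n' 'fstype cd9660' 'fspath /cdrom' 'from /dev/cd0' 'ssector 0' 'ro' |
    collapse_opts
# prints: ssector=0,ro
```

An NFS mount of that era would collapse to an empty string under this rule, since its only option is the binary "nfs_args" blob that the loop skips.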
Re: Analysis of disk file block with ZFS checksum error
In the last episode (Feb 08), Joe Peterson said:
> Mark Day wrote:
> > Based on the subset of data you posted, the bad data looks like
> > ASCII text. The bad data from offset a to a000f is:
> >
> > ${138AFE{@
> > @$$}1
> >
> > The bad data from offset af6c1 to af6c8 is:
> >
> > 392A9}@
> >
> > I don't recognize the content beyond that, but I'd guess that
> > somehow the contents of some other file managed to overwrite that
> > portion of the bad file. As for how that happened, I don't know.
> > But if someone recognizes where the bad content came from, that
> > might be a clue.
>
> Good eye! Yes, it indeed does appear to be ASCII. I *thought*
> something in the repetition when I originally did an od -a looked
> interesting.
>
> I dumped the whole bad section as a string, and here's (partly) what I get:
>
> @$${138B8B{@
> <(21470=Thu Jan 24 23:20:58 2008)>
> [117:^80(^91^21470)]
> @$$}138B8B}@ ...
> @$${138C18{@
> <(21472=1201242069)>[-2:^80(^82^85)(^83^1B5)(^84=b)(^85=1)(^86=0)(^87=0)
> (^88=0)(^89^2146C)(^8A=)(^8B=40)(^8C=2e)(^8D^84)(^8E=0)(^90^21472)
> (^91^21460)]
> @$$}138C18}@
>
> and more of the same. Note the date string. There are several like
> that. Anyone recognize this text format?

It's a Mork database from the Mozilla project:

http://developer.mozilla.org/en/docs/Mork_Structure#Rows

-- Dan Nelson [EMAIL PROTECTED]
Re: Question about file system checks
In the last episode (Mar 28), Ivan Voras said:
> Danny Pansters wrote:
>> Generally I can say that with freebsd even if you pull the plug and
>> then let it reboot and do the automatical background fsck you'll
>> likely loose only that one file you might have been editing while
>> (or just before) you unplugged the box.
>
> Stress testing I've done suggests otherwise :) I've literally
> repeatedly pulled the plug of a server in a controlled environment,
> and with a network logging of (a high load of) file system
> operations. My results show that UFS+SU and ZFS on FreeBSD loose *the
> most* files (and in case of UFS+SU especially directories), than any
> of: jfs, xfs, reiser3 (on Linux 2.6.22) and NTFS (on Windows 2003
> Server). ext3 is somewhat similar to UFS+SU, though about 30% better
> at not loosing files.

Note that you can tweak the SU caching time by adjusting the sysctls
kern.{meta,dir,file}delay. Take them down to 10 seconds instead of 30
and you'll lose fewer files (at the cost of more disk I/O, of course).

> Some other notes from this proceeding:
>
> 1. UFS+gjournal looses the least, but it's also the slowest.
> 2. UFS+SU had no truncated files or files of unexpected length (apparently
> it just looses the file that would end up in this state)
> 3. XFS and JFS end up with a *huge* number of files that are truncated or
> of unexpected length (40%-50%!)
> 4. In no case has any of the above file systems gone completely corrupted
> or lost any of the files/directories not being updated.
> 5. ZFS on FreeBSD was the fastest, in the sense of creating the most files
> during this benchmark (though speed was not the target for this benchmark
> so this is a low-quality observation), closely followed by JFS and XFS.

ZFS's transaction commit interval is only 5 seconds (see txg_time in
uts/common/fs/zfs/txg.c); how many more files/second did it create vs
the others to be able to lose the most files in that window? :)

> 6. ZFS crashed the kernel at least once.
-- Dan Nelson [EMAIL PROTECTED]
Re: NFS and /etc/exports
In the last episode (Apr 14), Alfred Perlstein said:
> * Robert Blayzor <[EMAIL PROTECTED]> [080414 06:07] wrote:
> > On Apr 14, 2008, at 7:02 AM, Nawfal bin Mohmad Rouyan wrote:
> > >I'm using TCP and the entry in /etc/fstab on all clients is as below:
> > >
> > >build:/usr/ports /usr/ports nfs tcp,intr,nfsv3,-w=32768,-r=32768,rw,noauto 0 0
> > >build:/usr/src /usr/src nfs tcp,intr,nfsv3,-w=32768,-r=32768,rw,noauto 0 0
> > >build:/usr/obj /usr/obj nfs tcp,intr,nfsv3,-w=32768,-r=32768,rw,noauto 0 0
> >
> > Are -r and -w really needed/useful for TCP mounts?
>
> yes.

This is interesting: according to mountnfs() in nfs_vfsops.c, those are
already the kernel defaults:

	if ((argp->flags & NFSMNT_NFSV3) && argp->sotype == SOCK_STREAM) {
		nmp->nm_wsize = nmp->nm_rsize = NFS_MAXDATA;
	} else {
		nmp->nm_wsize = NFS_WSIZE;
		nmp->nm_rsize = NFS_RSIZE;
	}

$ grep -i nfs_maxdata /sys/nfs/*
/sys/nfs/nfsproto.h:#define NFS_MAXDATA 32768

But it looks like /sbin/mount_nfs always overrides them to NFS_WSIZE
and NFS_RSIZE (both 8K) in its nfsdefargs struct.

-- Dan Nelson [EMAIL PROTECTED]
Re: zeroed fields in ps output
In the last episode (Apr 23), WaW said:
> I have noticed something strange with some processes running in my
> system. Look at ps output below: it says that nfsd, smbd and zsh are
> running for ~13992 days, that means their ELAPSED field == 0 in unix
> time. Moreover, RSS field also == 0. This happens in 1-2 days after
> system is booted up. Is this a bug or a feature?
>
> System is 7.0-RELEASE/amd64. And if it makes sense - nfs and samba do
> export zfs filesystems.
>
> USER  PID %CPU %MEM   VSZ RSS        ELAPSED STARTED STAT COMMAND
> root  675  0.0  0.0  1616   0 13992-16:28:05 -       IWs  /sbin/devd
> root  784  0.0  0.0  3572   0 13992-16:28:05 -       IWs  nfsd:
> root  786  0.0  0.0  3572   0 13992-16:28:05 -       IW   nfsd:
> root  787  0.0  0.0  3572   0 13992-16:28:05 -       IW   nfsd:
> root  788  0.0  0.0  3572   0 13992-16:28:05 -       IW   nfsd:
> root  789  0.0  0.0  3572   0 13992-16:28:05 -       IW   nfsd:
> root  846  0.0  0.0     3   0 13992-16:28:05 -       IW   /usr/local/sbin/smbd -D -s /usr/local/etc/smb.conf
> waw  1021  0.0  0.0 17372   0 13992-16:28:05 -       IWs  /bin/zsh
> waw  1026  0.0  0.0 16220   0 13992-16:28:05 -       IWs  /bin/zsh
> root 1030  0.0  0.0 19400   0 13992-16:28:05 -       IW   su -

Processes with a W in the second column of STAT have been completely
swapped out. That definitely explains why RSS=0, and may explain why
etime is unavailable. ps should probably print a "-" there (like it
does for STARTED) instead of an obviously wrong value.

-- Dan Nelson [EMAIL PROTECTED]
Re: auto_nlist failed on cp_time at location 1
In the last episode (Apr 23), Tim Stoddard said:
> I just upgraded from FreeBSD 6.2 ->
> 6.3 (using source tree). I then recompiled my net-snmp port binaries (using
> portupgrade). I am now get error message in my logs every five secs.
> I am sure my libkvm is in sync with my kernel. I do not know what else
> to look at.

You got bit by

  revision 1.178.2.5
  date: 2008/04/09 19:47:20; author: peter; state: Exp; lines: +68 -5
  MFC: record per-cpu stats for %user/%nice/%system/%idle

, which removed the kernel variable that net-snmp uses to track CPU
usage. Try this patch (put it in /usr/ports/net-mgmt/net-snmp/files
and rebuild net-snmp). I've sent it to the net-snmp port maintainer so
hopefully it will be committed soon.

-- Dan Nelson [EMAIL PROTECTED]

--- agent/mibgroup/hardware/cpu/cpu_nlist.c 2007-01-19 10:53:44.0 -0600
+++ agent/mibgroup/hardware/cpu/cpu_nlist.c 2008-04-22 00:13:48.330686919 -0500
@@ -1,5 +1,5 @@
 /*
- * nlist() interface
+ * sysctl() interface
  * e.g. FreeBSD
  */
 #include
@@ -12,24 +12,9 @@
 #include
 #include
-#ifdef HAVE_SYS_DKSTAT_H
-#include
-#endif
 #ifdef HAVE_SYS_SYSCTL_H
 #include
 #endif
-#ifdef HAVE_SYS_VMMETER_H
-#include
-#endif
-#ifdef HAVE_VM_VM_PARAM_H
-#include
-#endif
-#ifdef HAVE_VM_VM_EXTERN_H
-#include
-#endif
-
-#define CPU_SYMBOL "cp_time"
-#define MEM_SYMBOL "cnt"
 void _cpu_copy_stats( netsnmp_cpu_info *cpu );
@@ -67,11 +52,12 @@
  */
 int netsnmp_cpu_arch_load( netsnmp_cache *cache, void *magic ) {
     long cpu_stats[CPUSTATES];
-    struct vmmeter mem_stats;
+    int size, tempval;
+
     netsnmp_cpu_info *cpu = netsnmp_cpu_get_byIdx( -1, 0 );
-    auto_nlist( CPU_SYMBOL, (char *) cpu_stats, sizeof(cpu_stats));
-    auto_nlist( MEM_SYMBOL, (char *)&mem_stats, sizeof(mem_stats));
+    size = sizeof(cpu_stats);
+    sysctlbyname("kern.cp_time", &cpu_stats, &size, NULL, 0);
     cpu->user_ticks = (unsigned long)cpu_stats[CP_USER];
     cpu->nice_ticks = (unsigned long)cpu_stats[CP_NICE];
@@ -85,15 +71,19 @@
      * Interrupt/Context Switch statistics
      * XXX - Do these really belong here ?
      */
-#if defined(openbsd2) || defined(darwin)
-    cpu->swapIn = (unsigned long)mem_stats.v_swpin;
-    cpu->swapOut = (unsigned long)mem_stats.v_swpout;
-#else
-    cpu->swapIn = (unsigned long)mem_stats.v_swappgsin+mem_stats.v_vnodepgsin;
-    cpu->swapOut = (unsigned long)mem_stats.v_swappgsout+mem_stats.v_vnodepgsout;
-#endif
-    cpu->nInterrupts = (unsigned long)mem_stats.v_intr;
-    cpu->nCtxSwitches = (unsigned long)mem_stats.v_swtch;
+    size = sizeof(int);
+#define GET_VM_STATS(cat, name, netsnmpname) \
+    do { \
+        sysctlbyname("vm.stats." #cat "." #name, &tempval, &size, NULL, 0); \
+        cpu->netsnmpname = (unsigned long) tempval; \
+    } while(0)
+
+    GET_VM_STATS(vm, v_swappgsin, swapIn);
+    GET_VM_STATS(vm, v_swappgsout, swapOut);
+    GET_VM_STATS(vm, v_vnodepgsin, pageIn);
+    GET_VM_STATS(vm, v_vnodepgsout, pageOut);
+    GET_VM_STATS(sys, v_intr, nInterrupts);
+    GET_VM_STATS(sys, v_swtch, nCtxSwitches);
 #ifdef PER_CPU_INFO
     for ( i = 0; i < n; i++ ) {
Re: auto_nlist failed on cp_time at location 1
In the last episode (Apr 24), Tim Stoddard said:
> I applied your patch by hand and recompiled/reinstalled net-snmp,
> however I am still seeing the same error just on a different memory
> address now.
>
> Apr 24 10:16:41 shaggy snmpd[73273]: kvm_read(*, 1, 0xbf7fe830, 20) = -1: kvm_read: Bad address
> Apr 24 10:16:41 shaggy snmpd[73273]: auto_nlist failed on cp_time at location 1
> Apr 24 10:16:46 shaggy snmpd[73273]: kvm_read(*, 1, 0xbf7fe830, 20) = -1: kvm_read: Bad address
> Apr 24 10:16:46 shaggy snmpd[73273]: auto_nlist failed on cp_time at location 1
> Apr 24 10:16:51 shaggy snmpd[73273]: kvm_read(*, 1, 0xbf7fe830, 20) = -1: kvm_read: Bad address
> Apr 24 10:16:51 shaggy snmpd[73273]: auto_nlist failed on cp_time at location 1

Hm. It looks like net-snmp has two different pieces of code that both
do the same thing (read CPU and vmstat info). I wonder which OIDs
trigger them on your system? On my system, walking
enterprises.ucdavis.systemStats uses the cpu_nlist.c code. Here's a
patch for the other file (vmstat_freebsd2.c); it's not even compiled on
my 7-stable system, so I can't verify that it's correct. I'm not sure
why my first patch didn't apply; I attached it straight out of my
net-snmp/files/ directory.

-- Dan Nelson [EMAIL PROTECTED]

--- agent/mibgroup/ucd-snmp/vmstat_freebsd2.c 2008-04-24 10:25:59.834152091 -0500
+++ agent/mibgroup/ucd-snmp/vmstat_freebsd2.c 2008-04-24 10:25:59.834152091 -0500
@@ -189,13 +189,15 @@
      * Update structures (only if time has passed)
      */
     if (time_new != time_old) {
+        int size;
         time_diff = time_new - time_old;
         time_old = time_new;
         /*
          * CPU usage
          */
-        auto_nlist(CPTIME_SYMBOL, (char *) cpu_new, sizeof(cpu_new));
+        size = sizeof(cpu_new);
+        sysctlbyname("kern.cp_time", &cpu_new, &size, NULL, 0);
         cpu_total = 0;
Re: Poor write performance with LSI 320-2 on 6.1-STABLE
In the last episode (Sep 29), Albert Chin said:
> On Thu, Sep 28, 2006 at 05:15:05PM -0500, Albert Chin wrote:
> > I have an Intel S875PWP1 motherboard with a Pentium4 [EMAIL PROTECTED] PCI
> > bus is 33Mhz, 32-bit. I recently purchased an LSI 320-2/128MB on eBay
> > (though the card really looks like a PERC4/DS) and just ran some
> > bonnie++ tests on a RAID 1 array between two U320 drives for the first
> > channel and on a RAID 0 array between one U320 drive for the second
> > channel. The 320-2 has the latest LSI firmware, 1L47.
>
> I reran some of the tests with the same 320-2 but on an Intel
> SE7520BD2 with 32-bit and 64-bit (100Mhz) slots:
> #1. RAID 1, two U320 drives, channel 1, 32-bit, 33Mhz slot
> Version 1.93c       --Sequential Output--         --Sequential Input- --Random-
> Concurrency 1       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP  K/sec %CP  /sec %CP
> maetel.il.thew 300M   186  99 16707   5 16063   6   654  99 537320  93  4129  50
> Latency             45215us     199ms   89764us   34740us    1215us    1808ms
> Version 1.93c       --Sequential Create--         Random Create
> maetel.il.thewritte -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  7441  23 +++++ +++ +++++ +++  5799  18 +++++ +++ +++++ +++
> Latency               479ms     122us    2508us     606ms   13549us     101us
>
> #2. RAID 1, two U320 drives, channel 1, 64-bit, 100Mhz slot
> Version 1.93c       --Sequential Output--         --Sequential Input- --Random-
> Concurrency 1       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP  K/sec %CP  /sec %CP
> maetel.il.thew 300M   186  99 18006   6 15964   5   634  99 571275  99  4450  57
> Latency             44992us     139ms     130ms   35143us    1238us     120ms
> Version 1.93c       --Sequential Create--         Random Create
> maetel.il.thewritte -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  7581  24 +++++ +++ +++++ +++  5750  18 +++++ +++ +++++ +++
> Latency               511ms     255us    2615us     622ms   12691us      53us
>
> Odd that I don't get x2 the performance when the bus bandwidth doubles
> in speed.

Not really odd, since you're nowhere near even the 32-bit bus's max:
(32 bits * 33 MHz) / 8 bits per byte = 132 MB/sec, and in write-through
mode you're spending most of your time waiting for the disks to sync.
With a larger filesize you might see a difference in the sequential
input test; judging by your insane sequential read and random seek
values, your 300M test file looks like it's completely cached in RAM.
A size 2x your RAM capacity is recommended.

-- Dan Nelson [EMAIL PROTECTED]
Re: problem with old /usr/src/contrib/amd
In the last episode (Oct 03), Nicolas Martin said:
> i was wondering if an update of /usr/src/contrib/amd is planned ? I
> encounter a problem using amd with nolock options, and it seems that
> this problem was fixed on recent version of am-utils.

If anything, it would be updated in -current, not stable. Until a
newer version is imported, you can use the sysutils/am-utils port.

-- Dan Nelson [EMAIL PROTECTED]
Re: Is jemalloc going to make its way into RELENG_6?
In the last episode (Oct 05), Vlad GALU said:
> Judging from my tests (allocating numerous small objects, then
> freeing the memory) it looks like the bottleneck is in free(). I've
> built a different libc library with the malloc.c and tree.h taken
> from HEAD and it now behaves nicely. I haven't seen any bad side
> effects on this machine (it's the lappie I do most of my work on, I
> run KDE, seamonkey, mplayer, openoffice, the like) since I switched
> to the new libc. Another nice solution would be to ship the modified
> libc in base so the people who really need jemalloc can relink to it
> via libmap.conf.

You can compile just the -current version of malloc.c as a shared
object, then inject it into specific binaries:

$ gcc -O -Wall -I/usr/src/lib/libc/include -shared -o /lib/jemalloc.so jemalloc.c
$ MALLOC_OPTIONS=P date
date in malloc(): warning: unknown char in MALLOC_OPTIONS
Thu Oct 5 11:44:36 CDT 2006
$ LD_PRELOAD=/lib/jemalloc.so MALLOC_OPTIONS=P date |& head
Thu Oct 5 11:44:49 CDT 2006
___ Begin malloc statistics ___
Number of CPUs: 2
Number of arenas: 11
Chunk size: 524288 (2^19)
Quantum size: 16 (2^4)
Max small size: 512
Pointer size: 4
Assertions enabled
Allocated: 4096, space used: 1048576

I've tried this with seamonkey and mysqld, so this method seems to work
fine on complex apps.

-- Dan Nelson [EMAIL PROTECTED]
Re: Pleading for commit
In the last episode (Oct 24), Doug Barton said:
> Duane Whitty wrote:
> >Patching it myself after every cvs update is not such a big deal; It
> >is forgetting to patch it after every update which is a big deal.
>
> Write a little script for yourself that calls cvsup then runs patch
> so you won't forget. :)

Or cvsup the CVS repository (instead of using checkout mode), check out
your working tree from there, and run "cvs update" to update your
sources, which will preserve local changes.

-- Dan Nelson [EMAIL PROTECTED]
Re: adaptec utilities on amd64?
In the last episode (Nov 17), Vivek Khera said:
> On Nov 15, 2006, at 7:34 PM, Bruce Burden wrote:
> > I have a 2230SLP that I will be installing early next week on
> > my AMD64 implementation. I am hoping that the aaccli program in
> > ports will work.
>
> If it has the newer firmware, it will not work with aaccli. If you
> got the card after they switched to the "R" revision, you have the
> newer firmware.
>
> Some time long ago, someone posted a very short C program that probes
> the LSI controller and spits out this kind of output:
>
> [EMAIL PROTECTED] amrstat
> Drive 0:  34.18 GB, RAID1 optimal
> Drive 1: 102.54 GB, RAID1 optimal
>
> This is the kind of output I'd love to get from my adaptec
> controllers, too. This can be trivially scripted and hooked into a
> monitoring system like nagios.
>
> The aaccli tool is a curses based app (despite the "cli" in the name)
> and scripting it is damn near impossible. It doesn't even read
> commands from stdin!

It's non-interactive if you pass it a commandline, though. I have a
Big Brother script that does this (amongst other things):

# Gather Data
CONTROLLERS=$($AACCLI controller list | awk '/PERC/ { print $1 }')
OUT_AAC="Controller list: $CONTROLLERS "
CMD_AAC="task list /all : controller details : container list /full : disk list /full : disk show smart /full : enclosure list /full : enclosure show status"
for c in $CONTROLLERS ; do
  OUT_AAC=$OUT_AAC$($AACCLI open /readonly $c : $CMD_AAC)
done

It then processes the contents of $OUT_AAC to determine if the array's
happy or not.

-- Dan Nelson [EMAIL PROTECTED]
Re: malloc(0) returns 0x800 on FreeBSD 6.2 ?
In the last episode (Dec 11), Luigi Rizzo said:
> i was debugging a program on FreeBSD 6, and much to my surprise, i
> noticed that malloc(0) returns 0x800, as shown by this program:
>
> > more a.c
> #include
> int main(int argc, char *argv[])
> {
> char *p = malloc(0);
> printf(" malloc 0 returns %p\n", p);
> }
> > cc -o a a.c
> > ./a
> malloc 0 returns 0x800
>
> if you look at the source this is indeed clear - internally the 0x800
> is ZEROSIZEPTR and is set when a zero length is passed to malloc()
> unless you have malloc_sysv set.

Right, it passed you a pointer to which you may write 0 bytes; exactly
what the program asked for :)

The FreeBSD 6.x behaviour is slightly against POSIX rules that state
all successful malloc calls must return unique pointers, so the 7.x
malloc silently rounds zero-size mallocs to 1. Ideally malloc would
return unique pointers to blocks of memory set to PROT_NONE via
mprotect() (you could fit 8192 of these pointers in an 8k page), to
prevent applications from using that byte of memory.

-- Dan Nelson [EMAIL PROTECTED]
Re: malloc(0) returns 0x800 on FreeBSD 6.2 ?
In the last episode (Dec 11), Dan Nelson said:
> In the last episode (Dec 11), Luigi Rizzo said:
> > i was debugging a program on FreeBSD 6, and much to my surprise, i
> > noticed that malloc(0) returns 0x800, as shown by this program:
> >
> > > more a.c
> > #include
> > int main(int argc, char *argv[])
> > {
> > char *p = malloc(0);
> > printf(" malloc 0 returns %p\n", p);
> > }
> > > cc -o a a.c
> > > ./a
> > malloc 0 returns 0x800
> >
> > if you look at the source this is indeed clear - internally the 0x800
> > is ZEROSIZEPTR and is set when a zero length is passed to malloc()
> > unless you have malloc_sysv set.
>
> Right, it passed you a pointer to which you may write 0 bytes to;
> exactly what the program asked for :)
>
> The FreeBSD 6.x behaviour is slightly against POSIX rules that state
> all successful malloc calls must return unique pointers, so the 7.x
> malloc silently rounds zero-size mallocs to 1. Ideally malloc would
> return unique pointers to blocks of memory set to PROT_NONE via
> mprotect() (you could fit 8192 of these pointers in an 8k page), to
> prevent applications from using that byte of memory.

Also note that the 0x800 behaviour was added to malloc.c rev 1.60 back
in 2001, which means that all of the 5.x and 6.x releases did this.

-- Dan Nelson [EMAIL PROTECTED]
Re: Is syslog() reentrant? Was: OpenBSD's spamd.
In the last episode (Dec 19), Christopher Hilton said:
> Christopher Hilton wrote:
> >Has anyone gotten a newer version of OpenBSD's spamd than the one in
> >ports going? I'm cvsupping my ports tree now but since I didn't see
> >an update on the cvs server I'm assuming 3.7 is the latest version.
> >
> >Between OpenBSD 3.7 and 3.8 spamd gained the ability to tarpit or
> >stutter at all connections for a configurable period of time. I
> >understand that stuttering for the first few seconds of the SMTP
> >dialog causes many spammers to go away before even generating a
> >greylisting tuple. It's something I'd like to try and see for myself
> >and it will be fairly easy since my primary MX is behind an OpenBSD
> >firewall. However, my secondary MX is a FreeBSD box with no such
> >protection and I fear that the spammers will just take advantage of
> >the fact that my secondary MX has weaker protections than my
> >primary.
> >
>
> A casual attempt to compile a fresher copy of the software shows that
> spamd is using the OpenBSD's reentrant syslog functions (syslog_r,
> openlog_r, etc) Is FreeBSD's syslog already reentrant?

It is, as of FreeBSD 5.4. In previous versions only openlog() and
syslog("%m") with an invalid errno were non-reentrant.

http://www.freebsd.org/cgi/query-pr.cgi?pr=72394

-- Dan Nelson [EMAIL PROTECTED]