from:"Dan Nelson"

Re: 9-stable from i386 to amd64

2012-02-10 Thread Dan Nelson

In the last episode (Feb 10), Randy Bush said:
> is there a recipe for moving from i386 to amd64?
> 
> on a very remote system, i made the migration from 7.4 to 8.2 to 9.0, all
> 32-bit.  it was done with repeated
> 
> make buildworld
> make kernel.new [0]
> nextboot -k kernel.new
> reboot
> make installworld
> etc
> 
> [0] - well, there were some mv(1)s in there :)
> 
> so after it was happy with 9.0 i386, i went to move to amd64 with
> 
> make buildworld TARGET=amd64
> make kernel TARGET=amd64 DESTDIR=kernel.new [0]
> nextboot -k kernel.new
> reboot
> 
> it did not come back from the reboot, and required a manual reset.  i have
> no console access to the machine, not my choice.
> 
> clue bat please.

You probably got bit by a mismatched /libexec/ld-elf.so.1.  The kernel expects
that to be the "native" version, and on a 64-bit kernel it also expects an
ld-elf32.so.1 to be the "compat" 32-bit version.  When you rebooted onto the
64-bit kernel, it couldn't find /libexec/ld-elf32.so.1 to run any of the
32-bit binaries on the system.  My guess is that your reboot attempt died in
/sbin/init, prompting for a path to /bin/sh (init is statically linked, so it
starts, but the dynamically linked /bin/sh doesn't).  If you compiled with a
static /bin/sh for performance, it probably died very early in /etc/rc.

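For illustration, the runtime linker a binary asks for is recorded in its PT_INTERP program header. The minimal C sketch below reads that path out of a 32-bit ELF image held in memory; it assumes the image uses the host's byte order (true for i386 binaries on amd64), and the function name elf32_interp is made up for this example. An i386 FreeBSD binary should name /libexec/ld-elf.so.1 here; my understanding is that the amd64 kernel's 32-bit compatibility layer then substitutes /libexec/ld-elf32.so.1, which is why that file has to exist before the 64-bit kernel boots.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the interpreter path named by the PT_INTERP program header of a
 * 32-bit ELF image held in memory, or NULL if the image is not ELF32, is
 * truncated, or has no PT_INTERP (a statically linked binary).
 * Assumes the image uses the host's byte order.
 */
const char *elf32_interp(const unsigned char *buf, size_t len)
{
    Elf32_Ehdr eh;

    assert(buf != NULL);
    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, buf, sizeof eh);   /* copy out: buf may be unaligned */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32 ||
        eh.e_phentsize < sizeof(Elf32_Phdr))
        return NULL;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        size_t off = (size_t)eh.e_phoff + i * eh.e_phentsize;
        Elf32_Phdr ph;

        if (off > len || len - off < sizeof ph)
            return NULL;           /* program header table runs off the end */
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;
        if (ph.p_filesz == 0 || ph.p_offset > len ||
            len - ph.p_offset < ph.p_filesz)
            return NULL;           /* interpreter string lies outside the image */

        const char *path = (const char *)buf + ph.p_offset;
        /* The path must be NUL-terminated inside its segment. */
        return memchr(path, '\0', ph.p_filesz) != NULL ? path : NULL;
    }
    return NULL;                   /* no PT_INTERP: statically linked */
}
```

On a real file you would read the whole binary into a buffer first; `readelf -l` from binutils reports the same field as "Requesting program interpreter".
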
I think copying ld-elf.so.1 over to ld-elf32.so.1 might have been all you
needed to boot, but that would leave you with a 64-bit kernel running a true
32-bit userland with all the libraries in the "wrong" place.  Your
"installworld" step would then replace them with their 64-bit equivalents,
and the install would die halfway through, leaving you with a large mess to
clean up.

The cleanest upgrade path is to prepare your 32-bit root to be bootable by
both 32- and 64-bit kernels: copy the ld-elf32.so.1 that was built during your
buildworld over to /libexec/ld-elf32.so.1, and also copy /lib and /usr/lib to
/lib32 and /usr/lib32 respectively.  That way, when you reboot to a 64-bit
kernel, your 32-bit executables will be running "correctly" out of the
compat32 paths and your installworld should succeed.

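Before a remote nextboot, a quick check that the compat32 pieces are really in place is cheap insurance. The sketch below is a hypothetical helper (compat32_missing is not a FreeBSD tool); its path list is /libexec/ld-elf32.so.1, /lib32 and /usr/lib32, following the advice in this message, and the root argument lets you check an alternate root such as a mounted ZFS clone.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Compat32 pieces to verify; adjust if your layout differs. */
static const char *const compat32_paths[] = {
    "/libexec/ld-elf32.so.1",
    "/lib32",
    "/usr/lib32",
};

/*
 * Count the compat32 paths missing under `root` ("" means the running
 * system), printing each one to stderr.  A non-zero result means: don't
 * nextboot the 64-bit kernel yet.
 */
int compat32_missing(const char *root)
{
    int missing = 0;

    assert(root != NULL);
    for (size_t i = 0; i < sizeof compat32_paths / sizeof compat32_paths[0]; i++) {
        char path[1024];
        struct stat sb;
        int n = snprintf(path, sizeof path, "%s%s", root, compat32_paths[i]);

        if (n < 0 || (size_t)n >= sizeof path || stat(path, &sb) != 0) {
            fprintf(stderr, "missing: %s%s\n", root, compat32_paths[i]);
            missing++;
        }
    }
    return missing;
}
```

Calling compat32_missing("") on the live system and refusing to run nextboot unless it returns 0 is the intended use.
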
When I did all this on a local system, I made judicious use of ZFS snapshots
and clones, preserving a bootable clone of my original system plus
intermediate versions all the way until I was happy with the result.  I've
never done it completely remotely, but if you do a trial run or two on a
local machine or VM, you should be able to do it remotely with confidence.

-- 
Dan Nelson
dnel...@allantgroup.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: tmpfs nfs exports?

2012-10-30 Thread Dan Nelson

In the last episode (Oct 30), jb said:
> Alfred Perlstein  mu.org> writes:
> > Hey folks, any reason why not to include the following patch in 9.1? It
> > would be nice to have tmpfs be exportable.
> > 
> > I'm good to commit it, I can also wait until post 9.1.
> > ...
> 
> How do you identify tmpfs ? With fsid ?
> 
> Since nfs server is stateless, are these exports identical ?
> export /tmp, reboot, export /tmp
> 
> What about /tmp on tmpfs ?
> export /tmp, reboot, export /tmp

I wanted to do the exact same thing a few years ago.  I patched mdmfs and
the startup scripts to allow for an fsid value to be passed to mdmfs on
every reboot.  That works for the filesystem itself, but then you have to
contend with the random NFS generation number on every inode.  I decided it
wasn't worth the trouble at that point. 

If you really want an exportable /tmp, just live with the fact that you'll
get ESTALE errors on all clients when you reboot the server.  Maybe giving
the root inode a constant generation number is all that's needed, since I
suppose most clients that have mounted the server don't actually have any
open filehandles.
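
A toy model may make the ESTALE behaviour clearer: the server honours a file handle only if the filesystem id, inode number and generation number it carries all still match. Everything below (struct and function names, field layout) is hypothetical and much simpler than FreeBSD's real fhandle_t. Pinning the fsid across reboots fixes the first comparison, but a freshly created tmpfs hands out new generation numbers, so old handles still fail the last one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified file handle: which filesystem, which inode,
 * and which incarnation of that inode number. */
struct toy_fh {
    uint64_t fsid;
    uint64_t fileid;
    uint32_t generation;
};

/* The server's current view of one inode. */
struct toy_inode {
    uint64_t fileid;
    uint32_t generation;
};

/* A handle is honoured only if all three fields still match; any
 * mismatch is what the client sees as ESTALE. */
bool toy_fh_valid(const struct toy_fh *fh, uint64_t server_fsid,
                  const struct toy_inode *ino)
{
    assert(fh != NULL && ino != NULL);
    return fh->fsid == server_fsid &&
           fh->fileid == ino->fileid &&
           fh->generation == ino->generation;
}
```

In this model, giving the root inode a constant generation (and a pinned fsid) would keep a client's root handle valid across a server reboot, which matches the guess above.
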

-- 
Dan Nelson
dnel...@allantgroup.com

Re: recommended memory for zfs

2013-05-09 Thread Dan Nelson

In the last episode (May 09), Benjamin Adams said:
> Hello zfs question about memory.
> I heard zfs is very ram hungry.
> Service looking to run:
> - nginx
> - postgres
> - php-fpm
> - python
> 
> I have a machine with two quad core cpus but only 4 G Memory
> 
> I'm looking to buy more ram now.
> What would be the recommend amount of memory for zfs across 6 drives on 
> this setup?

As much as is reasonable to purchase.  Postgres would probably appreciate
the memory more than ZFS.  You can run ZFS on memory-limited machines (I've
gone as far down as 256MB), but the critical part is running a 64-bit
kernel.  ZFS does a lot of kernel malloc/free operations, and address space
fragmentation on a 32-bit system will eventually cause a panic when ZFS
can't malloc a contiguous 128k chunk.
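
To see how a 128 KB allocation can fail while plenty of memory is free, here is a toy first-fit search over a page map (4 KB pages, so 128 KB is 32 contiguous pages). It is not how the FreeBSD kernel allocator works; it only shows that total free space and the largest contiguous free run are different numbers, and a small 32-bit kernel address space runs out of the latter first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy address space: free_pages[i] is true if page i is free.
 * Returns the index of the first run of `want` free pages, or -1 if no
 * such run exists, regardless of how many pages are free in total.
 */
long find_free_run(const bool *free_pages, size_t npages, size_t want)
{
    size_t run = 0;

    assert(free_pages != NULL && want > 0);
    for (size_t i = 0; i < npages; i++) {
        run = free_pages[i] ? run + 1 : 0;
        if (run == want)
            return (long)(i + 1 - want);
    }
    return -1;
}
```

With every other page free, half of the map is available, yet find_free_run() still returns -1 for a 32-page request.
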

-- 
Dan Nelson
dnel...@allantgroup.com

Re: Network throughput: Never get more than 112MB/s over two NICs

2011-04-12 Thread Dan Nelson

In the last episode (Apr 12), Denny Schierz said:
> On Monday, 11.04.2011, 21:52 +0200, Denny Schierz wrote:
> > On 11.04.2011 at 20:06, Tim Daneliuk wrote:
> > > Are you certain you are not somehow running active-passive instead of
> > > active-active ...  just a thought...
> > 
> > 150% sure. I used two dedicated NICs WITHOUT any loadbalancing. The sum
> > has to be more than 112MB/s.
> 
> it must me the network. I tested two crossover connections and I've got
> 220MB/s :-)

Check to see whether your switch ports are oversubscribed (common for older
blade switches, or very high-density blades); sometimes there will be
rectangles enclosing groups of 6-8 ports, which means that they are
controlled by a single chip internally.  Moving each of your test machines
to a separate group may improve your performance.
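
A quick sanity check on the numbers: the function below estimates the best-case TCP payload rate over one gigabit Ethernet link with a 1500-byte MTU, counting Ethernet header, FCS, preamble, inter-frame gap and 20-byte IPv4 and TCP headers (no TCP options, so real figures run slightly lower). It comes out to about 118.7 million bytes per second, roughly 113 MiB/s, so a combined 112MB/s across two NICs looks like a single gigabit path somewhere in between, while 220MB/s over crossover cables is close to two links' worth.

```c
#include <assert.h>

/*
 * Best-case TCP payload rate, in bytes per second, over one Ethernet link.
 * Wire cost per frame: MTU + 14 (header) + 4 (FCS) + 8 (preamble/SFD)
 * + 12 (inter-frame gap).  Payload per frame: MTU - 20 (IPv4) - 20 (TCP).
 */
double tcp_goodput(double link_bits_per_sec, int mtu)
{
    const int wire_overhead = 14 + 4 + 8 + 12;
    const int ip_tcp_headers = 20 + 20;

    assert(mtu > ip_tcp_headers);
    double frames_per_sec = link_bits_per_sec / 8.0 / (mtu + wire_overhead);
    return frames_per_sec * (mtu - ip_tcp_headers);
}
```

tcp_goodput(1e9, 1500) is 125,000,000 / 1538 * 1460 bytes per second.
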

-- 
Dan Nelson
dnel...@allantgroup.com

Re: Network throughput: Never get more than 112MB/s over two NICs

2011-04-12 Thread Dan Nelson

In the last episode (Apr 12), Dan Nelson said:
> In the last episode (Apr 12), Denny Schierz said:
> > On Monday, 11.04.2011, 21:52 +0200, Denny Schierz wrote:
> > > On 11.04.2011 at 20:06, Tim Daneliuk wrote:
> > > > Are you certain you are not somehow running active-passive instead of
> > > > active-active ...  just a thought...
> > > 
> > > 150% sure. I used two dedicated NICs WITHOUT any loadbalancing. The sum
> > > has to be more than 112MB/s.
> > 
> > it must me the network. I tested two crossover connections and I've got
> > 220MB/s :-)
> 
> Check to see whether your switch ports are oversubscribed (common for older
> blade switches, or very high-density blades); sometimes there will be
> rectangles enclosing groups of 6-8 ports, which means that they are
> controlled by a single chip internally.  Moving each of your test machines
> to a separate group may improve your performance.

... I missed a line in your original post:

> > All are connected through a Cisco Catalyst WS-X4515.

This is a supervisor module for a 4500 series chassis, but it only has two
SFP ports on it.  Your servers are unlikely to be plugged into it; they're
probably plugged into another module.  This page lists some gigabit ethernet
modules that oversubscribe their ports, and which ports belong to which
groups:

http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/hardware/module/guide/03instal.html#wpxref23495

-- 
Dan Nelson
dnel...@allantgroup.com

Re: Large number of SATA commits (MFCs) to RELENG_8

2011-04-21 Thread Dan Nelson

In the last episode (Apr 21), Doug Barton said:
> On 04/20/2011 19:43, Lystopad Olexandr wrote:
> > May be we need another one file, like src/ChangeLog ?
> 
> Users who run a -stable branch are expected to read 
> freebsd-stable@FreeBSD.org (note, not just subscribe), AND read the 
> commit mail for their branch; just like users who run HEAD are expected 
> to read freebsd-current@ and the relevant commit mail.

I use a small shell script called "update" that runs "svn update" and also
prints a line at the end that you can copy and paste into another terminal
to get the log of what was just pulled.

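#! /bin/sh
# Before updating, ask the repository (-u) for the status of the top-level
# directory only (--depth empty), with revision columns (-v).
stat=$(svn status --depth empty -v -u)
# Line 1: drop the status columns, then take the second revision column
# (last-changed revision of ".").
localrev=$(echo "$stat" | cut -c10- | awk 'NR==1 {print $2}')
# Line 2 reads "Status against revision: N"; N is the newest revision.
latestrev=$(echo "$stat" | awk 'NR==2 {print $4}')
repo=$(svn info | sed -ne '/^URL/s/^.*: //p')
echo "$stat"
svn info | grep Revision
svn update
# If anything new was pulled, print the svn log command that covers it.
if [ "$localrev" != "$latestrev" ] ; then
  echo "Log:"
  echo "svn log -v -r $(($localrev+1)):$latestrev $repo"
fi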

Sample output:

(root@dan) /usr/src # ./update
 M  220902   220902 jilles   .
Status against revision: 220927
Revision: 220902
U    sbin/conscontrol/conscontrol.c
U    sbin/conscontrol/conscontrol.8
 U   sbin/conscontrol
U    sys/kern/uipc_sockbuf.c
U    sys/kern/kern_exit.c
U    sys/netgraph/ng_base.c
 U   sys/contrib/pf
 U   sys/contrib/dev/acpica
 U   sys/cddl/contrib/opensolaris
 U   sys/amd64/include/xen
U    sys/sys/proc.h
 U   sys
Updated to revision 220927.
Log:
svn log -v -r 220903:220927 svn://svn.freebsd.org/base/stable/8

-- 
Dan Nelson
dnel...@allantgroup.com

Re: Unstable ARP responses times

2011-05-13 Thread Dan Nelson

In the last episode (May 13), Bartosz Woronicz said:
> Since I moved from 7.3-stable to 8.2-stable I go strange long responses 
> of arp, with arping.
> I.e.
> root@Korbotron82|pts/3|13:35:35|/home/mastier # arping -i vlan92 79.110.194.140
> ARPING 79.110.194.140
> 60 bytes from 00:15:17:a2:ea:38 (79.110.194.140): index=0 time=1.579 msec
> 60 bytes from 00:15:17:a2:ea:38 (79.110.194.140): index=1 time=653.326 msec
> 60 bytes from 00:15:17:a2:ea:38 (79.110.194.140): index=2 time=7.153 usec

arping has a usleep(1) call in its read loop, which can cause delays like
this if there are other processes running and the scheduler decides to run
another process.  Try removing the usleep(1) on line 916 of arping.c and see
if that helps.  The best solution would be to use the kernel-provided
timestamps from the pcap header, rather than calling gettimeofday() in
userland.  If you run "tcpdump arp", you should be able to see the packet
timestamps as the kernel sees them.
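
As a minimal sketch of the timestamp idea: libpcap hands each captured packet to the application along with a struct pcap_pkthdr whose ts member is a struct timeval set at capture time, so subtracting two of those gives a round-trip time that isn't inflated by the process being descheduled. The helper name is hypothetical, and it assumes both the request and the reply were captured (BPF normally sees outgoing packets too).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Microseconds from `sent` to `rcvd`.  Intended inputs are the ts fields
 * of the pcap headers of the captured request and reply, which are stamped
 * when the packets are captured, not when this process gets to run.
 */
long long tv_elapsed_usec(const struct timeval *sent, const struct timeval *rcvd)
{
    assert(sent != NULL && rcvd != NULL);
    return (long long)(rcvd->tv_sec - sent->tv_sec) * 1000000LL +
           (long long)(rcvd->tv_usec - sent->tv_usec);
}
```

For example, a request stamped at 10.999500 s and a reply stamped at 11.001079 s are 1579 microseconds apart, the 1.579 msec of the first arping reply above.
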

-- 
Dan Nelson
dnel...@allantgroup.com

Re: ZFS I/O errors

2011-05-30 Thread Dan Nelson

In the last episode (May 30), Olaf Seibert said:
> On Mon 30 May 2011 at 03:33:49 -0700, Jeremy Chadwick wrote:
> > On Mon, May 30, 2011 at 12:10:51PM +0200, Olaf Seibert wrote:
> > I'm not sure why this didn't actually map to a filename on the system
> > however.  I've never quite understood what the hexadecimal values shown
> > represent (I have ideas but it'd be useful to know what they meant).
> 
> The scrub is starting to add some filenames to the list. So far they are
> two filenames in snapshots (where current versions of the file have been
> modified since then).
> 
> > Try running without compression and see if that improves things.
> 
> That sounds like a good idea.
> 
> My theory so far is that it ran out of memory while compressing, with
> incorrect compressed data written to the disk.

The ZFS compression code will panic if it can't allocate the buffer needed
to store the compressed data, so that's unlikely to be your problem.  The
only time I have seen an "illegal byte sequence" error was when trying to
copy raw disk images containing ZFS pools to different disks, and the
destination disk was a different size than the original.  I wasn't even able
to import the pool in that case, though.  

The ZFS I/O code overloads the EILSEQ error code and uses it as a "checksum
error" code.  Returning that error for the same block on all disks is
definitely weird.  Could you have run a partitioning tool, or some other
program that would have done direct writes to all of your component disks?

Your scrub is also a bit worrying - 24k checksum errors definitely shouldn't
occur during normal usage.

-- 
Dan Nelson
dnel...@allantgroup.com

Re: stable/8 nfsd: is it normal to have one worker regardless of -n setting?

2011-07-31 Thread Dan Nelson

In the last episode (Aug 01), Dmitry Morozovsky said:
> just noticed that contemporary nfsd does not fork children in accordance
> to -n setting:
> 
> stable/8:
> 
> root@beaver:/usr/local/tb/scripts# pid nfs
>  1745  ??  Is 0:00.02 nfsd: master (nfsd)
>  1746  ??  S  0:03.29 nfsd: server (nfsd)
> root@beaver:/usr/local/tb/scripts# grep nfs_server_flags /etc/rc.conf
> nfs_server_flags="-u -t -n 4"

They are threads now:

# ps axw | grep nfsd
 1373  ??  Is 0:00.02 nfsd: master (nfsd)
 1374  ??  S  5:47.14 nfsd: server (nfsd)
# ps axwH | grep nfsd
 1373  ??  Is 0:00.02 nfsd: master (nfsd)
 1374  ??  S  1:25.79 nfsd: server (nfsd)
 1374  ??  S  1:26.65 nfsd: server (nfsd)
 1374  ??  S  1:27.67 nfsd: server (nfsd)
 1374  ??  S  1:27.04 nfsd: server (nfsd)

-- 
Dan Nelson
dnel...@allantgroup.com

Re: Something missing in truss

2011-12-03 Thread Dan Nelson

In the last episode (Dec 02), Eivind Evensen said:
> Does anybody else see this or know why?
> 
> The machine here is running :
> 
> > uname -a
> FreeBSD elg.hjerdalen.lokalnett 8.2-STABLE FreeBSD 8.2-STABLE #36: Wed Nov 30 22:03:07 CET 2011 rumrunner@elg.hjerdalen.lokalnett:/usr/obj/usr/src/sys/RUM  amd64
> 
> While trying to weed out some firefox problems, I've noticed
> that truss doesn't recognise certain syscalls :
> 
> getpid()   = 1519 (0x5ef)
> clock_gettime(4,{48496.335142903 })= 0 (0x0)
> kevent(20,{0x23,EVFILT_READ,EV_ADD,0,0x0,0x809ec9d80},1,{0x15,EVFILT_READ,0x0,0,0x1,0x809ec9e80},64,0x0) = 1 (0x1)
> clock_gettime(4,{48496.335293202 })= 0 (0x0)
> read(21,"\0",1)= 1 (0x1)
> clock_gettime(4,{48496.335382599 })= 0 (0x0)
> umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
> -- UNKNOWN SYSCALL -14704864 --
> syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 
> (0x1c6)
> umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
> -- UNKNOWN SYSCALL -14704864 --
> syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 
> (0x1c6)
> umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
> -- UNKNOWN SYSCALL -14704864 --
> syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 
> (0x1c6)
> umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
> -- UNKNOWN SYSCALL -14704864 --
> syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 
> (0x1c6)
> umask(0x80a52ee20,0x8,0x0,0x80a52ee00,0x7f1f9eb0,0x80a52ee00) = 116 (0x74)
> -- UNKNOWN SYSCALL -14704864 --
> syscall(0x7f1f9ec0,0x0,0x18745,0x7f1f9eb0,0x1,0x7f1f9e90) = 454 
> (0x1c6)

Two problems: truss gets confused when you attach to a process that's
currently executing a syscall, and it gets even more confused when you have
a threaded process waiting in many syscalls at once.

The following patch fixes problem #1, but problem #2 involves keeping more
per-thread state and ends up touching a lot of the truss code.  See
http://www.evoy.net/FreeBSD/truss.diff for one solution (and more syscall
decodes).

Index: setup.c
===================================================================
--- setup.c	(revision 228242)
+++ setup.c	(working copy)
@@ -202,8 +202,10 @@
 		find_thread(info, lwpinfo.pl_lwpid);
 		switch(WSTOPSIG(waitval)) {
 		case SIGTRAP:
-			info->pr_why = info->curthread->in_syscall?S_SCX:S_SCE;
-			info->curthread->in_syscall = 1 - info->curthread->in_syscall;
+			if ((lwpinfo.pl_flags&(PL_FLAG_SCE|PL_FLAG_SCX)) == 0)
+				errx(1,"pl_flags=%x contains neither PL_FLAG_SCE nor PL_FLAG_SCX", lwpinfo.pl_flags);
+			info->pr_why = (lwpinfo.pl_flags&PL_FLAG_SCE) ? S_SCE:S_SCX;
+			info->curthread->in_syscall = (info->pr_why == S_SCE) ? 1:0;
 			break;
 		default:
 			info->pr_why = S_SIG;

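The difference between the old and new logic in the patch can be modelled in a few lines. This is a standalone sketch, not truss code: TOY_FLAG_SCE and TOY_FLAG_SCX stand in for PL_FLAG_SCE and PL_FLAG_SCX, and their values are made up. Toggling a per-thread in_syscall bit assumes the first stop is always a syscall entry, which is false when truss attaches to a process already blocked in a syscall; asking for the flags on every stop cannot get out of phase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for PL_FLAG_SCE / PL_FLAG_SCX; values are hypothetical. */
enum { TOY_FLAG_SCE = 0x1, TOY_FLAG_SCX = 0x2 };

enum stop_kind { STOP_ENTRY, STOP_EXIT, STOP_UNKNOWN };

/* Old logic: assume stops alternate, and that the first one is an entry. */
enum stop_kind classify_toggle(bool *in_syscall)
{
    assert(in_syscall != NULL);
    enum stop_kind k = *in_syscall ? STOP_EXIT : STOP_ENTRY;
    *in_syscall = !*in_syscall;
    return k;
}

/* Patched logic: trust the flags the kernel reports for this stop. */
enum stop_kind classify_flags(int pl_flags)
{
    if (pl_flags & TOY_FLAG_SCE)
        return STOP_ENTRY;
    if (pl_flags & TOY_FLAG_SCX)
        return STOP_EXIT;
    return STOP_UNKNOWN;   /* the patch treats this case as a fatal error */
}
```

If the first stop after attaching is really a syscall exit, classify_toggle() reports it as an entry and stays one step out of phase from then on.
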

-- 
Dan Nelson
dnel...@allantgroup.com

Re: kernel: negative sbsize for uid = 0

2011-12-14 Thread Dan Nelson

In the last episode (Dec 13), Doug Barton said:
> I'm running 8.2-RELEASE-p4 i386 on some web servers that are generally
> lightly-moderately loaded, but occasionally see some heavy spikes where
> load average goes way up.  When that is happening, but sometimes even when
> it's not, I get hundreds of this message spewing into the logs:
> 
> kernel: negative sbsize for uid = 0
> 
> I haven't found anything particularly useful by searching for that
> message, the one reference was to mbufs, but that seems not to be the
> problem.  Here is the output of 'netstat -m' during one of the load
> spikes:
[...]
> So is this message something to worry about? If so, how can I diagnose
> what's happening, and how do I fix it?

I've seen it occasionally too.  The error message is printed in
/sys/kern/kern_resource.c when the ui_sbsize resource counter goes negative.
There's probably insufficient locking somewhere in the functions that call
chgsbsize().  The increment/decrement is done atomically, but the value
pointed to by the "hiwat" argument is read and then updated later without an
explicit lock, so if that value changes while the function is executing, the
counter can drift.  ui_sbsize is only used by the resource limiting code,
though, so unless you're enforcing an sbsize rlimit, it should be harmless.

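The suspected race can be shown with a deterministic, single-threaded simulation. The names and the split into sb_begin()/sb_commit() are hypothetical; they mimic a caller that reads *hiwat, computes the difference, adjusts the per-uid counter (atomically, as the real code does) and only later stores the new *hiwat. If two callers both read the same old *hiwat before either stores, the same decrement is applied twice and the counter goes negative.

```c
#include <assert.h>
#include <stddef.h>

/* One pending update: delta computed from the *hiwat value read earlier. */
struct sb_update {
    long diff;   /* to - (value of *hiwat when it was read) */
    long to;     /* new high-water mark */
};

/* Phase 1: read the current high-water mark and compute the delta. */
struct sb_update sb_begin(const long *hiwat, long to)
{
    struct sb_update u = { to - *hiwat, to };
    return u;
}

/* Phase 2: apply the delta to the shared counter, then store the new mark.
 * Even if the counter update is atomic, nothing stops another caller from
 * running its phase 1 between our phase 1 and phase 2. */
void sb_commit(long *counter, long *hiwat, struct sb_update u)
{
    assert(counter != NULL && hiwat != NULL);
    *counter += u.diff;
    *hiwat = u.to;
}
```

Run serially, two shrinks of the same buffer to 0 leave the counter at 0; interleaved as above, it ends at -1000 for a buffer that started at 1000.
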
-- 
Dan Nelson
dnel...@allantgroup.com

Re: swi4: clock taking 40% cpu?!?

2011-12-15 Thread Dan Nelson

In the last episode (Dec 15), Jeremy Chadwick said:
> On Thu, Dec 15, 2011 at 12:51:28PM -0800, Doug Barton wrote:
> > Web server under heavy'ish load (7 on a 2 cpu system) running
> > 8.2-RELEASE-p4 i386 I'm seeing this:
> > 
> > PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> > 12  root     -32    -     0K   112K WAIT    0 129:01 39.99% {swi4: clock}
> > 
> > Any ideas why the clock should be taking so much cpu? HZ=100 if that
> > makes a difference ...
> 
> Could be wrong, but I believe this correlates with IRQ 4.  What does
> vmstat -i show for a total and rate for irq4 if you run it, wait a few
> seconds, then run it again?  Does the number greatly/rapidly increase?

That would be "irq4" in that case, though.  "swi4" is just a software
interrupt thread, and "clock" is the softclock callout handler.  There are
both KTR and DTrace logging functions in kern_timeout.c, so you could use
either one to get a handle on what's eating your CPU.  Busy-looping
"procstat -k 12" for a few seconds might get you some useful stacks, as
well.
 
> Shot in the dark here, but the only thing I can think of that might
> cause this is software being extremely aggressive with calls to things
> like gettimeofday(2) or clock_gettime(2).  Really not sure.  ntpd maybe
> (unlikely but possible)?  Sort of grasping at straws here.

-- 
Dan Nelson
dnel...@allantgroup.com

Re: Snapshot corruption.

2004-11-22 Thread Dan Nelson

In the last episode (Nov 22), David Gilbert said:
> >>>>> "Brian" == Brian Fundakowski Feldman <[EMAIL PROTECTED]> writes:
> Brian> Long strings of NUL bytes?  Missing data?  Spam (from the same
> Brian> file, or from other files)?
> 
> Well... I don't really know db file formats.  Most of the corruption
> I found in berkley db files.  mailgraph uses rrd.  mailman uses some
> form of berkley db, too.  I don't know what the corruption "looked"
> like other than the db library would no longer accept it.

db files are very fragile when it comes to OS or process crashes.  There
is no logging, and writes are cached until the process exits or calls
db->sync(), so a crash before then virtually guarantees corruption.
Ideally, db files should only cache data that can be rebuilt from other
sources, or the application should call db->sync() after every write.
Berkeley DB 2.x and later can do logging, but I don't know how many
applications actually request it.

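The same write-then-sync discipline, shown with plain POSIX stdio rather than the Berkeley DB API (append_durable is a hypothetical helper): each record is pushed out of the stdio buffer with fflush() and then out of the kernel's cache with fsync() before the call returns. As with db->sync() after every write, this trades throughput for durability, and it only helps if the drive honours cache flushes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Append one record and push it to stable storage before returning, so a
 * crash cannot lose records written by earlier calls (assuming the drive
 * honours cache flushes).  Returns 0 on success, -1 on error.
 */
int append_durable(FILE *fp, const char *record)
{
    size_t len;

    assert(fp != NULL && record != NULL);
    len = strlen(record);
    if (fwrite(record, 1, len, fp) != len)
        return -1;
    if (fflush(fp) != 0)            /* stdio buffer -> kernel */
        return -1;
    if (fsync(fileno(fp)) != 0)     /* kernel cache -> disk */
        return -1;
    return 0;
}
```
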
-- 
Dan Nelson
[EMAIL PROTECTED]

Re: 5-STABLE softupdates issue?

2004-11-24 Thread Dan Nelson

In the last episode (Nov 24), Scott Long said:
> Matthias Andree wrote:
> > out of fun and to investigate claims about alleged bgfsck resource
> > hogging (which I could not reproduce) posted to
> > news:de.comp.os.unix.bsd, I pressed the reset button on a live
> > FreeBSD 5-STABLE system.
> >
> > Upon reboot, fsck -p complained about an unexpected softupdates
> > inconsistency on the / file system and put me into single user
> > mode, the manual fsck / then asked me to agree to increasing a link
> > count from 21 to 22 (and later to fix the summary, which I consider
> > a non-issue). A subsequent fsck -p / ended with no abnormality
> > detected.
> 
> No, this in theory should not happen.  YOu could have caught it right
> at the instance that it was sending a transaction out to disk, or you
> could have caught an edge case that isn't understood yet. 
> Unfortunately, ATA drives also cannot be trusted to flush their
> caches when one would expect, so this leaves open a lot of possible
> causes for your problem.

If you just want to test stability in the face of system crashes (and
not power failure), you can drop to DDB and run "reboot" to simulate a
panic (or run reboot -qn as root).  That way your drive doesn't lose
power.

That said, I get unexpected softupdates inconsistencies pretty
regularly on kernel panics.  I just let the system run until I can
reboot and run a fsck -p.

-- 
Dan Nelson
[EMAIL PROTECTED]

panic: Duplicate free of item 0xc37aa200 from zone 0xc103f9a0(Mbuf)

2005-01-14 Thread Dan Nelson


Got this overnight on a 5.3-STABLE 2005-01-13 kernel:

This GDB was configured as "i386-portbld-freebsd5.3"...
panic: Duplicate free of item 0xc37aa200 from zone 0xc103f9a0(Mbuf)

panic messages:
---
panic: Duplicate free of item 0xc37aa200 from zone 0xc103f9a0(Mbuf)

cpuid = 0
--- trap 0x1, eip = 0, esp = 0xe2561d7c, ebp = 0 ---
boot() called on cpu#0
Uptime: 15h14m6s
Dumping 1023 MB
Compressing
Compressed to 314 MB
Dumpsize = 330268160
Dump starting at 475069440
---
#0  doadump () at pcpu.h:159
in pcpu.h
doadump () at pcpu.h:159
159 in pcpu.h
#0  doadump () at pcpu.h:159
#1  0xc059c1c6 in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2  0xc059bc3b in panic (fmt=0xc079a0b3 "Duplicate free of item %p from zone %p(%s)\n") at ../../../kern/kern_shutdown.c:566
#3  0xc06e4b96 in uma_dbg_free (zone=0xc103f9a0, slab=0xc37aafa8, item=0xc37aa200) at ../../../vm/uma_dbg.c:299
#4  0xc06e2ce0 in uma_zfree_arg (zone=0xc103f9a0, item=0xc37aa200, udata=0x0) at ../../../vm/uma_core.c:2257
#5  0xc05d760c in m_freem (mb=0x0) at uma.h:302
#6  0xc05d8d15 in m_defrag (m0=0xc37aa200, how=1) at ../../../kern/uipc_mbuf.c:1124
#7  0xc068d49a in dc_start (ifp=0xc238d000) at ../../../pci/if_dc.c:3337
#8  0xc05bf76d in taskqueue_run (queue=0xc2324900) at ../../../kern/subr_taskqueue.c:191
#9  0xc05854f2 in ithread_loop (arg=0xc227d900) at ../../../kern/kern_intr.c:547
#10 0xc05843f9 in fork_exit (callout=0xc0585310 , arg=0x0, frame=0x0) at ../../../kern/kern_fork.c:807
#11 0xc071a3dc in fork_trampoline () at ../../../i386/i386/exception.s:209
(gdb)

-- 
Dan Nelson
[EMAIL PROTECTED]

panic: Assertion besttd != NULL failed at ../../../kern/subr_sleepqueue.c:676

2005-01-18 Thread Dan Nelson


Got this while running a Java proxy server under libthr:

FreeBSD dan.emsphone.com 5.3-STABLE FreeBSD 5.3-STABLE #387: Thu Jan 13 14:43:03 CST 2005 [EMAIL PROTECTED]:/usr/src/sys/i386/compile/DANSMP  i386

panic: Assertion besttd != NULL failed at ../../../kern/subr_sleepqueue.c:676
cpuid = 1
KDB: stack backtrace:
panic(c07822b6,c078843f,c078825f,2a4,) at panic+0x1cf
sleepq_signal(c3a9b4b0,0,,e7461ce0,c05a74eb) at sleepq_signal+0xf0
wakeup_one(c3a9b4b0,0,c07863e3,12e,be3173f0) at wakeup_one+0x20
thr_wake(c2bb84b0,e7461d14,4,16,1) at thr_wake+0xdb
syscall(3123002f,2f,4c98002f,811bfc0,83f4400) at syscall+0x137
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (443, FreeBSD ELF32, thr_wake), eip = 0x280e658f, esp = 0xbe3173ec, ebp = 0xbe317418 ---
panic: mi_switch: switch in a critical section
cpuid = 1

The last two lines repeated forever until I reset the box, so no
coredump, but the stack is almost identical to a panic I got back in
November and did get a trace on (the crashdump is long gone though).

#0  doadump () at pcpu.h:159
#1  0xc059aa86 in boot (howto=260) at ../../../kern/kern_shutdown.c:397
#2  0xc059a55b in panic (fmt=0xc077e446 "Assertion %s failed at %s:%d") at ../../../kern/kern_shutdown.c:553
#3  0xc05bcb5c in sleepq_remove_thread (sq=0xc3394dc0, td=0x0) at ../../../kern/subr_sleepqueue.c:594
#4  0xc05bd47d in sleepq_signal (wchan=0xc2c5d7d0, flags=0, pri=-1) at ../../../kern/subr_sleepqueue.c:675
#5  0xc05a1d80 in wakeup_one (ident=0x0) at ../../../kern/kern_synch.c:266
#6  0xc05a5dab in thr_wake (td=0xc38abaf0, uap=0x0) at ../../../kern/kern_thr.c:303
#7  0xc072a797 in syscall (frame={tf_fs = 819527727, tf_es = 1285816367, tf_ds = -1078001617, tf_edi = 135378944, tf_esi = 137891840, tf_ebp = -1173986296, tf_isp = -419791500, tf_ebx = 671700552, tf_edx = 138204160, tf_ecx = 0, tf_eax = 443, tf_trapno = 22, tf_err = 2, tf_eip = 672032143, tf_cs = 31, tf_eflags = 582, tf_esp = -1173986340, tf_ss = 47}) at ../../../i386/i386/trap.c:1001
#8  0xc07168af in Xint0x80_syscall () at ../../../i386/i386/exception.s:201

-- 
Dan Nelson
[EMAIL PROTECTED]

panic: Duplicate free of item 0xc37aa200 from zone 0xc103f9a0(Mbuf)

2005-01-19 Thread Dan Nelson


I've gotten this one twice in 5 days (still have the cores if anyone
needs more info):

FreeBSD dan.emsphone.com 5.3-STABLE FreeBSD 5.3-STABLE #387: Thu Jan 13 14:43:03 CST 2005 [EMAIL PROTECTED]:/usr/src/sys/i386/compile/DANSMP  i386

panic: Duplicate free of item 0xc37aa200 from zone 0xc103f9a0(Mbuf)
#0  doadump () at pcpu.h:159
#1  0xc059c1c6 in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2  0xc059bc3b in panic (fmt=0xc079a0b3 "Duplicate free of item %p from zone %p(%s)\n") at ../../../kern/kern_shutdown.c:566
#3  0xc06e4b96 in uma_dbg_free (zone=0xc103f9a0, slab=0xc37aafa8, item=0xc37aa200) at ../../../vm/uma_dbg.c:299
#4  0xc06e2ce0 in uma_zfree_arg (zone=0xc103f9a0, item=0xc37aa200, udata=0x0) at ../../../vm/uma_core.c:2257
#5  0xc05d760c in m_freem (mb=0x0) at uma.h:302
#6  0xc05d8d15 in m_defrag (m0=0xc37aa200, how=1) at ../../../kern/uipc_mbuf.c:1124
#7  0xc068d49a in dc_start (ifp=0xc238d000) at ../../../pci/if_dc.c:3337
#8  0xc05bf76d in taskqueue_run (queue=0xc2324900) at ../../../kern/subr_taskqueue.c:191
#9  0xc05854f2 in ithread_loop (arg=0xc227d900) at ../../../kern/kern_intr.c:547
#10 0xc05843f9 in fork_exit (callout=0xc0585310 , arg=0x0, frame=0x0) at ../../../kern/kern_fork.c:807
#11 0xc071a3dc in fork_trampoline () at ../../../i386/i386/exception.s:209

panic: Duplicate free of item 0xc274d000 from zone 0xc103f9a0(Mbuf)
#0  doadump () at pcpu.h:159
#1  0xc059c1c6 in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2  0xc059bc3b in panic (fmt=0xc079a0b3 "Duplicate free of item %p from zone %p(%s)\n") at ../../../kern/kern_shutdown.c:566
#3  0xc06e4b96 in uma_dbg_free (zone=0xc103f9a0, slab=0xc274dfa8, item=0xc274d000) at ../../../vm/uma_dbg.c:299
#4  0xc06e2ce0 in uma_zfree_arg (zone=0xc103f9a0, item=0xc274d000, udata=0x0) at ../../../vm/uma_core.c:2257
#5  0xc05d760c in m_freem (mb=0x0) at uma.h:302
#6  0xc05d8d15 in m_defrag (m0=0xc274d000, how=1) at ../../../kern/uipc_mbuf.c:1124
#7  0xc068d49a in dc_start (ifp=0xc238d000) at ../../../pci/if_dc.c:3337
#8  0xc05bf76d in taskqueue_run (queue=0xc2324900) at ../../../kern/subr_taskqueue.c:191
#9  0xc05854f2 in ithread_loop (arg=0xc227d900) at ../../../kern/kern_intr.c:547
#10 0xc05843f9 in fork_exit (callout=0xc0585310 , arg=0x0, frame=0x0) at ../../../kern/kern_fork.c:807
#11 0xc071a3dc in fork_trampoline () at ../../../i386/i386/exception.s:209


-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Very large directory

2005-01-19 Thread Dan Nelson

In the last episode (Jan 19), Phillip Salzman said:
> I have a pair of servers that act as SMTP/AV gateways.  It seems that
> even though we've told the AV software not to store messages, it is
> anyway.
> 
> They've been running for a little while now - and recently we've
> noticed a lot of disk space disappearing.  Shortly after that, a
> simple du into our /var/spool returned a not so nice error:
> 
>   du: fts_read: Cannot allocate memory
> 
> No matter what command I run on that directory, I just don't seem to
> have enough available resources to show the files let alone delete
> them (echo *, ls, find, rm -rf, etc.)

Try raising your datasize rlimit value; also see the thread
"Directories with 2 million files" at
http://lists.freebsd.org/pipermail/freebsd-current/2004-April/026170.html
for some other ideas.  "find . | xargs rm" sounds promising.
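
If the usual tools still run out of memory, the underlying idea is simple enough to do by hand: read the directory one entry at a time with readdir() and unlink each entry as you go, so memory use stays constant no matter how many files there are. This is a minimal C sketch (unlink_all is a hypothetical name). It leaves subdirectories alone and repeats passes because POSIX does not specify whether readdir() reflects entries added or removed during a scan; stop the AV software first, or it may keep creating new files behind you.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Remove every non-directory entry in `dir`, one readdir() entry at a
 * time.  Passes repeat until one removes nothing.  Returns the number of
 * entries removed, or -1 if `dir` cannot be opened at all.
 */
long unlink_all(const char *dir)
{
    long total = 0, removed;

    assert(dir != NULL);
    do {
        DIR *d = opendir(dir);
        struct dirent *de;

        if (d == NULL)
            return total > 0 ? total : -1;
        removed = 0;
        while ((de = readdir(d)) != NULL) {
            char path[1024];
            struct stat sb;
            int n;

            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;
            n = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
            if (n < 0 || (size_t)n >= sizeof path)
                continue;                   /* too long for this sketch */
            if (lstat(path, &sb) == 0 && S_ISDIR(sb.st_mode))
                continue;                   /* leave subdirectories alone */
            if (unlink(path) == 0)
                removed++;
        }
        closedir(d);
        total += removed;
    } while (removed > 0);
    return total;
}
```
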

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: what is vrlock, and why is it causing me problems?

2005-02-01 Thread Dan Nelson

In the last episode (Feb 01), Marc G. Fournier said:
> 
> /proc/10/status:syncer 10 0 0 0 -1,-1 noflags 1103511247,149 0,0 44097,435728 vrlock 0 0 0,0 -
> /proc/39506/status:postgres 39506 39505 39500 39500 -1,-1 noflags 1106713055,403942 3002,40214 4118,737982 vrlock 1001 1001 1001,1001,1001  maildb.hub.org
> /proc/51927/status:postgres 51927 41068 41068 41068 -1,-1 noflags 1107288042,547570 0,0 1,303423 vrlock 1001 1001 1001,1001,1001  pgsql72.hub.org
> /proc/52582/status:postgres 52582 52581 52579 52579 -1,-1 noflags 1104953811,860872 16534,324197 22783,184956 vrlock 1001 1001 1001,1001,1001  pgsql74.hub.org
> /proc/53309/status:umount 53309 82960 53309 82960 5,2 ctty  1107288298,562659 0,0 0,546634 vrlock 0 0 0,0,0,2,3,4,5,20,31 -
> /proc/54039/status:umount 54039 53941 54039 53941 5,4 ctty  1107288402,928683 0,0 0,526544 vrlock 0 0 0,0,0,2,3,4,5,20,31 -
> /proc/9/status:bufdaemon 9 0 0 0 -1,-1 noflags 1103511247,130 0,0 432,924063 vrlock 0 0 0,0 -

vrlock is an internal vinum lock (see /sys/dev/vinum/vinumlock.c).

> Load on the server is negligible, so it isn't like I'm dealing with
> 'server lag'
>
> FreeBSD 4.10-STABLE #7: Thu Oct  7 20:17:02 ADT 2004

More like a deadlock, I think.  There were two commits to RELENG_4 after
your build time, but they're a performance fix and probably won't affect
your lock problem.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: [HEADS UP] perl symlinks in /usr/bin will be gone

2005-02-03 Thread Dan Nelson

In the last episode (Feb 03), Christian Weisgerber said:
> Chuck Swiger <[EMAIL PROTECTED]> wrote:
> 
> > Well-behaved 3rd party scripts ought to start Perl via:
> > #! /usr/bin/env perl
> 
> Why should the authors of those scripts break them for systems which
> have /bin/env?

Are there any systems that have a /bin/env (and that do not also have a
/bin -> /usr/bin symlink)?

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Adjusting time on a secured FreeBSD machine.

2005-02-04 Thread Dan Nelson

In the last episode (Feb 04), Stan said:
> I ran into this same problem.  After trying various things, I finally
> gave up and did it the easy way.  If you don't mind rebooting, the
> easiest thing to do is set the clock in the BIOS as accurately as
> possible, then let ntpd fine tune it from there.

Setting your BIOS clock shouldn't be necessary since the bootup scripts
will do an ntp sync before raising the securelevel anyway.  Make sure
you have ntpdate_enable="YES" in rc.conf.
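A sketch of the relevant /etc/rc.conf lines. It assumes a 5.x-era rc.d where the server to step from is passed in ntpdate_flags. Later releases also have a separate ntpdate_hosts knob, so check /etc/defaults/rc.conf on your system. pool.ntp.org is only an example server.

```shell
# /etc/rc.conf (sketch): step the clock once at boot with ntpdate(8),
# before the securelevel is raised, then keep it disciplined with ntpd(8).
ntpdate_enable="YES"
ntpdate_flags="-b pool.ntp.org"   # example server; substitute your own
ntpd_enable="YES"
```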

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: swapfile being eaten by unknown process

2005-02-14 Thread Dan Nelson

In the last episode (Feb 14), Kris Kennaway said:
> On Tue, Feb 15, 2005 at 01:30:42AM +, John wrote:
> > Is there a way of seeing *what* program/process is eating swap.
> > There are loads of ways of seeing that it is being eaten, but so
> > far haven't found a way of knowing what eats, so can't fix the
> > problem. Can anyone enlighten me?
> 
> Use ps or top, and look for the process with the huge size.  This is
> not foolproof, because a process can allocate memory without using it
> (e.g. rpc.statd), but it's a place to start.  If you see a process
> that is both large, and paging to/from disk, that's a better
> indication.

To see which processes are paging: run top, hit 'm' to switch modes,
and hit 'o' then 'fault' to sort the processes by how many page faults
they are doing.  This isn't completely foolproof either, since reads
from mmap()ed files count as faults as well.
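A non-interactive version of the same idea, as a sketch. top_faulters is a hypothetical name. It assumes the input is "pid faults command" lines, for example from ps -A -o pid= -o majflt= -o comm=, where majflt is ps(1)'s "total page faults" keyword. Check ps(1) on your release for the exact keyword names.

```shell
# Sketch: print the N processes with the most page faults, highest first.
# Input: lines of "pid faults command" on stdin.
top_faulters() {
    n=${1:-10}
    # numeric reverse sort on the second field; -n also skips the
    # leading blanks that ps uses to pad its columns
    sort -k2,2nr | head -n "$n"
}
```

Usage: ps -A -o pid= -o majflt= -o comm= | top_faulters 10. The same mmap() caveat applies to these counts.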


-- 
Dan Nelson
[EMAIL PROTECTED]

Re: buildworld fails on: ===> bin/domainname

2005-02-21 Thread Dan Nelson

In the last episode (Feb 21), bsdnooby said:
> When I run 'make buildworld' I get a series of errors like this:
> 
> rm -f .depend GPATH GRTAGS GSYMS GTAGS
> ===> bin/domainname
> "Makefile", line 3: Need an operator
> ...
> "Makefile", line 33 Need an operator
> make: fatal errors  encountered -- cannot continue
> *** Error code 1
> Stop in /usr/src/bin
> *** Error code 1
> Stop in /usr/src
> *** Error code 1
> Stop in /usr/src

The contents of bin/domainname/Makefile should read:

 # $FreeBSD: src/bin/domainname/Makefile,v 1.7 2001/12/04 01:57:40 obrien Exp $

 PROG=  domainname

 .include <bsd.prog.mk>

Check to make sure your copy didn't get corrupted somehow.
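One quick way to look for damage, as a sketch. It assumes two common kinds of corruption that make(1) cannot parse: NUL bytes (for example from a crash during a sources update) and DOS carriage returns (for example from copying the tree through another OS). It is not a diagnosis of this particular failure. check_text_file is a hypothetical name. Comparing against a fresh copy of the file from CVS is the more thorough check.

```shell
# Sketch: report NUL bytes or carriage returns in FILE.
# Returns 0 if neither is found, 1 otherwise.
check_text_file() {
    f=$1
    rc=0
    # if stripping NULs changes the file, it contained NULs
    if ! tr -d '\000' < "$f" | cmp -s - "$f"; then
        echo "$f: contains NUL bytes"
        rc=1
    fi
    if grep -q "$(printf '\r')" "$f"; then
        echo "$f: contains carriage returns (DOS line endings)"
        rc=1
    fi
    return "$rc"
}
```

Usage: check_text_file /usr/src/bin/domainname/Makefile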

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: FreeBSD 5-STABLE, MSI KT880 , fxp and SCB timeouts

2005-02-23 Thread Dan Nelson

In the last episode (Feb 23), Jon Noack said:
> On 02/23/05 20:06, Mario Sergio Fujikawa Ferreira wrote:
> >  My last motherboard burned down to ashes so I got myself a brand
> >(after 2 weeks) new MSI KT880. I am getting some weird results.
> >
> >  1) fxp intel etherxpress 10/100 network cards report SCB timeout
> >as well as achieving ridiculously low transfer rates of 600
> >Bytes/second. Well, I got 10 KBytes/sec once but that does not count
> >since a side box gets more than 50KB/s ;-) on the same hub. Oh, I've
> >already switched hub ports, rj45 cables and fxp cards.
> 
> Duplex mismatch?  You say "hub" and not "switch", so you might need
> to force the card to half-duplex.  Oddly enough, the fxp(4) man page
> doesn't include half-duplex as a media option.  Surely it supports
> it...

Autodetection on ethernet detects both speed and duplex, and
full-duplex and half-duplex are either/or, so if you force a speed and
don't force full-duplex, you get half-duplex by default.
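For example, in /etc/rc.conf. This is a sketch: the address is a documentation placeholder, and 100baseTX assumes a 100 Mbit hub (a 10 Mbit hub would use 10baseT/UTP). The interactive equivalent is ifconfig fxp0 media 100baseTX.

```shell
# /etc/rc.conf (sketch): fxp0 on a hub. Forcing only the speed leaves the
# card at half duplex, which is what a hub needs.
ifconfig_fxp0="inet 192.0.2.10 netmask 255.255.255.0 media 100baseTX"
# On a switch port also forced to 100/full, full duplex must be requested:
#ifconfig_fxp0="inet 192.0.2.10 netmask 255.255.255.0 media 100baseTX mediaopt full-duplex"
```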

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: vmstat/iostat/top all fail to report CPU usage

2005-02-28 Thread Dan Nelson

In the last episode (Feb 28), Jeff Behl said:
> as reported in bug: bin/60385
> 
> this is still occurring in almost all of our systems, even those at 
> stable, and is pretty major issue.  any known progress on this?  we're 
> running ibm e325 servers. 
> 
> FreeBSD www3 5.3-STABLE FreeBSD 5.3-STABLE #1: Tue Feb 15 10:09:17 PST 
> 2005[EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP  amd64

Here's the workaround I use on a machine here that loses its stat clock
after a week or so of uptime.  Put this in /etc/crontab:

#  Flaky clock.  Check it every 5 minutes.
*/5     *       *       *       *       root    /usr/local/bin/fixrtc

.. and this in /usr/local/bin/fixrtc:

#! /bin/sh

# get the interrupt rate for the stat clock over one second
# (difference between the rtc totals of two vmstat -i samples)
getticks() {
  ( vmstat -i ; sleep 1 ; vmstat -i ) |
  awk '/rtc/ { n++; if (n == 1) first = $3; else last = $3 } END { print last - first }'
}

ticks=$(getticks)

# It should be firing at 128 Hz.  If not, kick it
if [ "$ticks" -lt 64 ] ; then
  echo "Stat clock has died.  Attempting to reset."
  echo
  /etc/rc.d/ntpd stop
  echo
  /usr/sbin/ntpdate -b pool.ntp.org
  echo
  /etc/rc.d/ntpd start
  echo
  echo "RTC interrupt rate is now $(getticks)"
fi


-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Abort Trap for cron-jobs in 5.3

2005-03-15 Thread Dan Nelson

In the last episode (Mar 15), Niklas Saers said:
> On Tue, 15 Mar 2005, Niklas Saers wrote:
> > I've got four servers that all have the same problem: when jobs get
> > started from Cron, they die after some time with an "Abort trap".
> > Jobs that are dying are:
> >
> > /usr/libexec/atrun >> /var/log/cron 2>&1 /usr/bin/nice -10 
> > /usr/local/bin/zsh /root/bin/sendBarkMail.sh > /dev/null 2>&1
> >
> > I also get this on virtually every shell-script that uses tar, leaving my
> > filesystem littered with bsdtar.core files.
> >
> > Running these jobs from the command prompt works fine. Any suggestions on
> > what may be causing them to die from cron? sendBarkMail.sh simply moves
> > mails from one folder to another periodically
>
> Note to self: ask the question. ;-)
> 
> What I'm wondering about is: what could be causing the Abort Trap's?
> 
> World and kernel are a recent RELENG_5_3 compiled like described in
> src/UPDATING.

What's the stack trace from one of those cores?

Also, try not redirecting stdout and stderr to /dev/null; you are
probably discarding a valuable error message.
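One way to keep that output, as a sketch. run_logged is a hypothetical wrapper name, and the log path in the usage line is a placeholder. In sh, an exit status above 128 means the command was killed by signal (status - 128), so an abort trap (SIGABRT, signal 6) shows up in the log as 134.

```shell
# Sketch of a cron wrapper: run a command, append its stdout, stderr and
# exit status to LOG instead of discarding them, and pass the status on.
run_logged() {
    log=$1; shift
    {
        echo "=== $(date) start: $*"
        "$@"
        rc=$?
        echo "=== $(date) exit status: $rc"
    } >> "$log" 2>&1
    return "$rc"
}
```

Usage, from a small script called by cron: run_logged /var/log/sendBarkMail.log /usr/local/bin/zsh /root/bin/sendBarkMail.sh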

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: NFS failure for bonnie++

2005-03-16 Thread Dan Nelson

In the last episode (Mar 16), David Gilbert said:
> I have two machines running 5.3-PRERELEASE (cvsup'd yesterday).
> They're dual opterons running amd64 code.  One of them has 1.0T of
> disk mounted with gmirror, gconcat and ggate... and it exports this
> via nfs.
> 
> The other is an nfs client.
> 
> When I run bonnie++ -n 200 -s 4000 -u dgilbert on the server, it runs
> fine.  When I run the same command on the client, it dies trying to
> delete the files.

bonnie segfaults?

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: kern/78664: truss does not work on 5-STABLE(5.4-PRERELEASE)

2005-03-17 Thread Dan Nelson

HASHI Hiroaki wrote:
> truss command does not work with below message.
>
> "truss: PIOCBIS: Inappropriate ioctl for device"

I've narrowed it down to something committed between 02-24 and 02-27,
but can't continue the binary search until tonight.  It would be really
nice if this was fixed before 5.4 gets released :)

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: malloc() debugging flags broken on RELENG_5

2005-03-21 Thread Dan Nelson

In the last episode (Mar 21), Bartosz Fabianowski said:
> Some commit in the last few weeks has broken the malloc() debug flags
> on RELENG_5. According to the man page, a call to free() or realloc()
> with a modified pointer should cause a warning. Setting the "A" flag
> in either /etc/malloc.conf or MALLOC_OPTIONS should turn this into an
> error. However, what happens is that this *always* causes an error.
> And even setting the corresponding "a" flag does not turn it into a
> warning.

You're not running as root, are you?  The A flag is always set for root
or setuid processes as a security measure.  There haven't been any
changes to the malloc code in 5.x since 5.3.

> This is very unfortunate as some poorly written programs (KDE's
> Kopete messenger in my case) seem to rely on the fact that free() and
> realloc() with modified pointers are OK.

File a bugreport; a program must pass the same pointer to free() that
it received from malloc().

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: malloc() debugging flags broken on RELENG_5

2005-03-21 Thread Dan Nelson

In the last episode (Mar 21), Bartosz Fabianowski said:
> >You're not running as root, are you?  The A flag is always set for
> >root or setuid processes as a security measure.
> 
> No, I am running as a normal user.
> 
> >There hasn't been any changes to the malloc code in 5.x since 5.3.
> 
> I realize there shouldn't have been any changes and I also cannot
> find everything in the CVS logs. But when I run Kopete, I get the
> following:
> 
> kopete in free(): error: modified (chunk-) pointer
>   ^
> According to the man page, this word should read "warning" instead of
> "error" and the application should not be aborted.

The actual test in the malloc code reads:

    if (malloc_abort || issetugid() || getuid() == 0 || getgid() == 0)
        wrterror(p);

, so it may also trigger if your primary groupid is 0 (wheel).  Just
being a member of the wheel group won't trigger it.
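A quick shell check for the getuid()/getgid() half of that test, as a sketch. malloc_abort_forced is a hypothetical name. id -ru and id -rg print the real uid and the real primary gid, which is what getuid() and getgid() return. The issetugid() case only applies to setuid/setgid programs and can't be checked this way.

```shell
# Sketch: succeed if the current user's real uid is 0 or real primary
# gid is 0, i.e. if malloc would treat a modified pointer as fatal
# regardless of the A/a flags.
malloc_abort_forced() {
    [ "$(id -ru)" -eq 0 ] || [ "$(id -rg)" -eq 0 ]
}
```

Usage: malloc_abort_forced && echo "primary uid/gid is 0; the a flag will be ignored"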
 
> >File a bugreport; a program must pass the same pointer to free() that
> > it received from malloc().
> 
> Obviously, there is a bug in Kopete. But it runs for other people with 
> earlier versions of RELENG_5. I am currently downgrading to 1st March to 
> see whether that fixes the issue for me.

It might also be caused by some dependent package, and not strictly
kopete's fault.  It depends on what is being freed.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Time zone change confuses cron

2005-03-28 Thread Dan Nelson

In the last episode (Mar 29), Ladislav Bodnar said:
> I've just changed the system time zone from local time to UTC by
> copying /usr/share/zoneinfo/Etc/UTC to /etc/localtime. To my dismay,
> I found that crontab (both /etc/crontab and user-level crontab)
> completely ignores the change and continues executing scripts
> according to the old time.

If you haven't rebooted yet, restart cron.  A process reads timezone
settings only once, during startup.  You're not supposed to pull the
rug out from under its feet by switching /etc/localtime :)

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: syscons options and memory use

2005-03-30 Thread Dan Nelson

In the last episode (Mar 31), Ronald Klop said:
> The syscons manual page says:
> "The following options will remove some features from the syscons
>  driver and save kernel memory.
>  [...]
>  SC_NO_SYSMOUSE
> This option removes mouse support in the syscons driver.
> The mouse daemon moused(8) will fail if this option is
> defined. This option implies the SC_NO_CUTPASTE option
> too.
> "
> 
> How much memory does this save (or how can I discover that)? Is it worth  
> it on a 96MB PentiumII laptop?

I would guess that the memory savings are probably on the order of
kilobytes.  That is useful if you're trying to prevent excessive swapping
on an 8MB system, but not worth disabling on your system.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: syscons options and memory use

2005-03-31 Thread Dan Nelson

In the last episode (Mar 31), Ronald Klop said:
> On Thu, 31 Mar 2005 01:04:10 -0600, Dan Nelson <[EMAIL PROTECTED]>  
> wrote:
> >In the last episode (Mar 31), Ronald Klop said:
> >>The syscons manual page says:
> >>"The following options will remove some features from the syscons
> >> driver and save kernel memory.
> >> [...]
> >> SC_NO_SYSMOUSE
> >>This option removes mouse support in the syscons driver.
> >>The mouse daemon moused(8) will fail if this option is
> >>defined. This option implies the SC_NO_CUTPASTE option
> >>too.
> >>"
> >>
> >>How much memory does this save (or how can I discover that)? Is it worth
> >>it on a 96MB PentiumII laptop?
> >
> >I would guess that the memory savings is probably on the order of
> >kilobytes.  Useful if you're trying to prevent excessive swapping on an
> >8MB system.  Not worth disabling on your system.
> 
> How can I see the size of my kernel?
> I know vmstat -m and netstat -m, but from that info I don't see if I  
> reduced the memory footprint after disabling an option or device.

For the kernel size itself, just "ls -l /boot/kernel/kernel" :)  A more
interesting number might be the output of "sysctl hw.usermem", which I
believe is the amount of memory available to user processes.
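To compare two builds, something like the following sketch works. size_delta is a hypothetical name. File size is only a rough proxy for memory use, since the running kernel also has bss and boot-time allocations. Comparing hw.usermem before and after the change is the other half.

```shell
# Sketch: print the size difference in bytes between two files
# (positive means NEW is larger).
size_delta() {
    # $1 = old kernel, $2 = new kernel
    echo $(( $(wc -c < "$2") - $(wc -c < "$1") ))
}
```

Usage, after an installkernel has moved the previous kernel aside: size_delta /boot/kernel.old/kernel /boot/kernel/kernel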

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: nfsiod tasks started in error

2005-04-07 Thread Dan Nelson

In the last episode (Apr 07), [EMAIL PROTECTED] said:
> During sysinstall answered no to the server and client nfs questions
> and after installed completed and system rebooted I see task
> nfsiod1,2,3,4 running in output of ps ax command.  This was not the
> case in any of the 4.x releases. This can be looked upon as a
> security leak. This may be a error in the new boot up process. This
> was first reported 1/16/2004 in 5.2 RC2 as Problem Report kern/61438
> and again in 5.3 as Problem Report kern/79539

Both of those PRs should be closed as not-a-bug, I think.  nfsiod
threads simply allow multiple concurrent NFS requests.  In 4.*, with no
nfsiod processes running, you can still use NFS (just more slowly than
with them).  In 5.*, they are auto-created as kernel threads during
bootup.
 
> I tried to run /usr/local/etc/rc.d/killnfs.sh script to kill these
> unwanted tasks but that does not work.

They aren't tasks, but kernel threads.  Just like pagedaemon, swapper,
g_event, irq*, swi*, and a couple dozen other threads created by the
kernel.

> Any suggestions on how I can kill these bogus nfs tasks as part of
> boot up or what to change in the boot up process so these tasks don't
> get started in the first place? Doing a manual recompile of the
> kernel to remove the nfs statements is not a viable solution.

Why not?  If you want to disable NFS, that's the only way.
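A sketch of that. It assumes the config lives in /usr/src/sys/i386/conf and that every NFS-related option line starts with "options NFS". The exact option names differ between releases, so inspect the result. strip_nfs_options and the NONFS config name are hypothetical.

```shell
# Sketch: write a copy of kernel config IN to OUT with every
# "options NFS..." line commented out. This also catches NFS_ROOT,
# which depends on the NFS client.
strip_nfs_options() {
    sed -E 's/^([[:space:]]*options[[:space:]]+NFS)/#\1/' "$1" > "$2"
}
```

Usage: cd /usr/src/sys/i386/conf && strip_nfs_options GENERIC NONFS, change the ident line in NONFS, then run make buildkernel KERNCONF=NONFS && make installkernel KERNCONF=NONFS from /usr/src.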

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: nfsiod tasks started in error

2005-04-07 Thread Dan Nelson

In the last episode (Apr 07), [EMAIL PROTECTED] said:
> You did not answer my question. how can I kill these bogus nfs tasks
> as part of the boot up or what to change in the boot up process so
> these tasks don't get started in the first place?  What is a work
> around with out compiling the kernel?

The answer is to remove "options NFS" from your kernel and recompile. 

If you enable the nfs client, you automatically get "nfsiod" threads
created for you, just like if you have acpi compiled into your kernel,
you get "acpi_task*" threads, or if you have a keyboard plugged in, you
get an "irq1: atkbd0" thread.  Neither of those existed in 4.* and
you're not complaining about them.  What is it about those four nfsiod
threads that upsets you so much?

-- 
Dan Nelson
[EMAIL PROTECTED]
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: nfsiod tasks started in error

2005-04-07 Thread Dan Nelson

In the last episode (Apr 07), [EMAIL PROTECTED] said:
[ my exact message; this is even worse than top-posting. ]
> Plain and simple. It's a security hole.  If no nfs is selected in
> sysinstall then there should not be any nfs stuff started at all.

Then the fix is to remove the nfs client flag from sysinstall, since
it's built into the kernel and cannot be disabled without rebuilding. 

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: kernel killing processes when out of swap

2005-04-12 Thread Dan Nelson

In the last episode (Apr 12), Nick Barnes said:
> This is the well-known problem with my fantasy world in which the OS
> doesn't overcommit any resources.  All those programs are broken, but
> it's too costly to fix them.  If overcommit had been resisted more
> effectively in the first place, those programs would have been
> written properly.

Another issue is things like shared libraries; without overcommit you
need to reserve the file size * the number of processes mapping it,
since you can't guarantee they won't touch every COW page handed to
them.  I think you can design a shlib scheme where you can map the libs
RO; not sure if you would take a performance hit or if there are other
considerations.  There's a similar problem when large processes want to
fork+exec something; for a fraction of a second you need to reserve 2x
the process's space until the exec frees it.  vfork solves that
problem, at the expense of blocking the parent until the child calls
exec (or exits).
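The reservation arithmetic above, written out with purely illustrative numbers. The example assumes a 1 MB library mapped by 50 processes, and a 2 GB process that forks.

```latex
\begin{aligned}
R_{\text{shlib}} &= S_{\text{lib}} \times N_{\text{proc}}
  = 1\,\text{MB} \times 50 = 50\,\text{MB} \\
R_{\text{fork}} &= 2 \times S_{\text{parent}}
  = 2 \times 2\,\text{GB} = 4\,\text{GB} \quad \text{(until the child calls exec)}
\end{aligned}
```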

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Newbie Question About System Update

2005-04-19 Thread Dan Nelson

In the last episode (Apr 19), Bill Moran said:
> Chuck Swiger <[EMAIL PROTECTED]> wrote:
> > Bill Moran wrote:
> > > The system can not replace programs that are in use,
> > This is generally not the case.  Unix lets you continue to access a
> > file after it has been deleted, so long as the process hangs on to
> > a file descriptor.  This lets you replace programs in use, without
> > running into the same problems that platforms like Windows have.
> 
> What you say?:
> 
> bash-2.05b$ su
> Password:
> bolivia# cp /usr/sbin/cron /home/wmoran/.
> bolivia# cp /home/wmoran/cron /usr/sbin/.
> cp: /usr/sbin/./cron: Text file busy
> bolivia# 
> 
> Notice that /usr/sbin/cron is in use (because my system is running
> normally)  I can copy _from_ that file, but I can not overwrite it.

What you can do, however, is: create the new file under a temporary
name, delete the original, and rename the temp file to the original's
name, which is what /usr/bin/install does.  I've done many
installworlds on running systems without problems.
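A sketch of that sequence. safe_replace is a hypothetical name. The new copy is written under a temporary name in the same directory and then renamed over the original. rename(2) replaces the target in one step, so the separate delete is folded into the mv. A process still running the old binary keeps its (now unlinked) copy, so this avoids "Text file busy".

```shell
# Sketch: replace DEST with a copy of SRC without writing into DEST itself.
# The temp file sits next to DEST so that mv is a same-filesystem rename.
safe_replace() {
    src=$1
    dest=$2
    tmp="$dest.new.$$"
    cp -p "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    mv -f "$tmp" "$dest"
}
```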

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: load > 1, no process using >10% CPU...?

2005-04-19 Thread Dan Nelson

In the last episode (Apr 19), Damian Gerow said:
> Thus spake Damian Gerow ([EMAIL PROTECTED]) [19/04/05 21:21]:
> : I'm a little fuzzy as to /how/ load is calculated, but why would my
> : system think that it's doing all kinds of work when ps, top, and
> : systat can't really tell me /what/ it's doing?
> 
> It turned out to be a runaway xmms process.  But I still find it
> strange that it didn't show anything obvious in top.

If xmms is threaded, you probably got bit by the "libpthread doesn't do
process CPU accounting" bug.  Most threaded processes will just show up
as 0 %CPU in top, no matter what they're doing.  The rusage stats are
handled correctly, though, so look for processes whose TIME value is
increasing at one (or more if you're SMP) seconds per second.
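A sketch for measuring that directly from the TIME column. time_to_seconds and cpu_used are hypothetical names. It assumes ps prints TIME as min:sec.hundredths (FreeBSD) or hh:mm:ss. A "days-" prefix is not handled, and the result is only accurate to a second or so.

```shell
# Sketch: convert a ps(1) TIME value ("1:02.50" or "00:01:02") to whole
# seconds; leading blanks from ps's column padding are tolerated.
time_to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%d\n", s }'
}

# Sketch: CPU seconds PID used during INTERVAL wall-clock seconds (default 5).
# A single busy thread gives roughly INTERVAL; more means several CPUs.
cpu_used() {
    pid=$1
    interval=${2:-5}
    t1=$(time_to_seconds "$(ps -o time= -p "$pid")")
    sleep "$interval"
    t2=$(time_to_seconds "$(ps -o time= -p "$pid")")
    echo $((t2 - t1))
}
```

Usage: cpu_used "$(pgrep xmms)" 5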

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: gstat and scripting

2005-04-20 Thread Dan Nelson

In the last episode (Apr 21), Ronald Klop said:
> The tool gstat can produce very nice stats. Can I get these stats
> from the system periodically for use in my own scripts/graphs? Is there
> a sysctl like kern.ad0.reads? Or some other way of retrieving this
> info from the kernel. Looking at the gstat output, the numbers must
> be in there.

You can use net-snmp and mrtg to graph disk stats.
I suggest applying the patch at
http://sourceforge.net/tracker/index.php?func=detail&aid=1085243&group_id=12694&atid=312694
so you get 64-bit counters (32-bit counters roll over too fast to be
useful):

$ snmptable -v2c localhost diskiotable
SNMP table: enterprises.ucdavis.ucdExperimental.ucdDiskIOMIB.diskIOTable

 diskIOIndex diskIODevice diskIONRead diskIONWritten diskIOReads diskIOWrites 
diskIONReadX diskIONWrittenX diskIOLA1 diskIOLA5 diskIOLA15
   1  da0  3682573440 4134971392 7734458 71468595  
68107082880828768692224  1794   398138
   2  cd0  24  0   30   
24   0 0 0  0
   3  cd1   911237260  0   139320   
 911237260   0 0 0  0
   4pass01622 24  401   
  1622  24 0 0  0
   5pass1   0  0   00   
 0   0 0 0  0
   6pass2   57676 2710198624330242848   
 57676  2710198624 0 0  0

If you graph diskIONReadX and diskIONWrittenX over time, you'll get a
nice graph of throughput.
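Turning two counter samples into a rate, as a sketch. counter_rate is a hypothetical name. It also shows why the 64-bit counters matter: a 32-bit byte counter wraps every 4 GiB, which at 50 MB/s is about every 86 seconds. With a wrap modulus given, a single wrap between samples is corrected. Samples could come from something like snmpget -v2c -Oqv localhost UCD-DISKIO-MIB::diskIONReadX.1, but check the OID names against your net-snmp version.

```shell
# Sketch: bytes per second between two samples PREV and CUR of a byte
# counter taken INTERVAL seconds apart. WRAP is the counter modulus
# (4294967296 for a 32-bit counter); 0 means "assume it never wraps".
counter_rate() {
    prev=$1
    cur=$2
    interval=$3
    wrap=${4:-0}
    delta=$((cur - prev))
    if [ "$delta" -lt 0 ] && [ "$wrap" -gt 0 ]; then
        delta=$((delta + wrap))   # counter wrapped once between samples
    fi
    echo $((delta / interval))
}
```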

-- 
Dan Nelson
[EMAIL PROTECTED]

panic: mutex tty owned at ../../../kern/kern_event.c:1487

2005-04-23 Thread Dan Nelson


Got the following on a 2004-04-21 -stable kernel:

panic messages:
---
panic: mutex tty owned at ../../../kern/kern_event.c:1487
cpuid = 0
KDB: stack backtrace:
kdb_backtrace(c07c93fb,0,c07aed2c,e97b799c,c3d05780) at kdb_backtrace+0x2e
panic(c07aed2c,c07a955e,c07acb39,5cf,c383c800) at panic+0x139
_mtx_assert(c383c910,2,c07acb39,5cf,c383c800) at _mtx_assert+0xf3
knote(c383c880,0,0) at knote+0x3d
ttwakeup(c383c800,c383c800,c4cc3e8c,1,c07aeb7a) at ttwakeup+0x89
ttyinput(d,c383c800,0,0,4d) at ttyinput+0x8d4
ttypend(c383c800,c383c800,e97b7a80,c05e9b80,c383c800) at ttypend+0x6c
ttnread(c383c800,c2803374,c327d700,e97b7aa4,c0596b3c) at ttnread+0x1b
filt_ttyread(c2803374,0,c07acb39,5dd,c383c800) at filt_ttyread+0x20
knote(c383c880,0,0) at knote+0xcc
ttwakeup(c383c800,c07b377e,32e,32d,c05aa530) at ttwakeup+0x89
ttioctl(c383c800,802c7415,c39864c0,3,c3d05780) at ttioctl+0xc6b
ttyioctl(c2bce300,802c7415,c39864c0,3,c3d05780) at ttyioctl+0x65
ptyioctl(c2bce300,802c7415,c39864c0,3,c3d05780) at ptyioctl+0x2a8
spec_ioctl(e97b7c00,e97b7cac,c06233e4,e97b7c00,0) at spec_ioctl+0x17c
spec_vnoperate(e97b7c00,0,c07b6fe6,30d,c080d640) at spec_vnoperate+0x18
vn_ioctl(c3968198,802c7415,c39864c0,c3a00a00,c3d05780) at vn_ioctl+0x204
ioctl(c3d05780,e97b7d14,c,431,3) at ioctl+0x448
syscall(2f,2f,2f,1,1) at syscall+0x2a0
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x2828533f, esp = 0xbfaed66c, ebp 
= 0xbfaed6c8 ---
boot() called on cpu#0
Uptime: 2d1h39m7s
Dumping 1023 MB
Compressing
 64>22 128>46 192>71 256>95 320>119 384>143 448>167 512>191 576>215 640>233 
704>249 768>264 832>273 896>296 960>320
Compressed to 346 MB
Dumpsize = 363098112
Dump starting at 442239488
 64 128 192 256 320 384 448 512 576 640 704 768 832 896 960
---
#0  doadump () at pcpu.h:159
159 pcpu.h: No such file or directory.
in pcpu.h
doadump () at pcpu.h:159
159 in pcpu.h
#0  doadump () at pcpu.h:159
#1  0xc05b44aa in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2  0xc05b4884 in panic (fmt=0xc07aed2c "mutex %s owned at %s:%d") at 
../../../kern/kern_shutdown.c:566
#3  0xc05aac23 in _mtx_assert (m=0xc383c910, what=0, file=0xc07acb39 
"../../../kern/kern_event.c", line=1487) at ../../../kern/kern_mutex.c:753
#4  0xc0596aad in knote (list=0xc383c880, hint=0, islocked=0) at 
../../../kern/kern_event.c:1487
#5  0xc05eb699 in ttwakeup (tp=0xc383c800) at ../../../kern/tty.c:2374
#6  0xc05e83b4 in ttyinput (c=13, tp=0xc383c800) at ../../../kern/tty.c:601
#7  0xc05ea23c in ttypend (tp=0xc383c800) at ../../../kern/tty.c:1658
#8  0xc05e9c5b in ttnread (tp=0xc383c800) at ../../../kern/tty.c:1352
#9  0xc05e9b80 in filt_ttyread (kn=0xc2803374, hint=0) at 
../../../kern/tty.c:1313
#10 0xc0596b3c in knote (list=0xc383c880, hint=0, islocked=0) at 
../../../kern/kern_event.c:1504
#11 0xc05eb699 in ttwakeup (tp=0xc383c800) at ../../../kern/tty.c:2374
#12 0xc05e937b in ttioctl (tp=0xc383c800, cmd=2150396949, data=0xc39864c0, 
flag=3) at ../../../kern/tty.c:1064
#13 0xc05ec6f5 in ttyioctl (dev=0x0, cmd=2150396949, data=0xc39864c0 "\006\t", 
flag=3, td=0x0) at ../../../kern/tty.c:2917
#14 0xc05ef0f8 in ptyioctl (dev=0xc2bce300, cmd=2150396949, data=0xc39864c0 
"\006\t", flag=0, td=0x0) at ../../../kern/tty_pty.c:623
#15 0xc056da0c in spec_ioctl (ap=0xe97b7c00) at 
../../../fs/specfs/spec_vnops.c:357
#16 0xc056d038 in spec_vnoperate (ap=0x0) at ../../../fs/specfs/spec_vnops.c:118
#17 0xc06233e4 in vn_ioctl (fp=0xc3968198, com=2150396949, data=0xc39864c0, 
active_cred=0xc3a00a00, td=0xc3d05780) at vnode_if.h:503
#18 0xc05dbae8 in ioctl (td=0xc3d05780, uap=0xe97b7d14) at file.h:257
#19 0xc07535e0 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 
1, tf_esi = 1, tf_ebp = -1079060792, tf_isp = -377782924, tf_ebx = 674260420, 
tf_edx = 140677888, tf_ecx = -2144570347, tf_eax = 54, tf_trapno = 12, tf_err = 
2, tf_eip = 673731391, tf_cs = 31, tf_eflags = 582, tf_esp = -1079060884, tf_ss 
= 47}) at ../../../i386/i386/trap.c:1001
#20 0xc073e38f in Xint0x80_syscall () at ../../../i386/i386/exception.s:201
#21 0x002f in ?? ()
[garbage]
#48 0xc3585900 in ?? ()
#49 0xc05c6f90 in sched_switch (td=0x1, newtd=0x283065c4, flags=---Can't read 
userspace from dump, or kernel process---

) at ../../../kern/sched_4bsd.c:881
gdbcom:2: Error in sourced command file:
Previous frame inner to this frame (corrupt stack?)
(kgdb) 

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: panic: mutex tty owned at ../../../kern/kern_event.c:1487

2005-04-23 Thread Dan Nelson

In the last episode (Apr 23), John-Mark Gurney said:
> Dan Nelson wrote this message on Sat, Apr 23, 2005 at 12:28 -0500:
> > Got the following on a 2004-04-21 -stable kernel:
> > 
> > panic messages:
> > ---
> > panic: mutex tty owned at ../../../kern/kern_event.c:1487
> > cpuid = 0
> 
> I can whip up a patch if you want to try it (and can easily reproduce)...

I tried repeating the panic (hitting ^C in an app partway through its
startup routine), but it's not cooperating, unfortunately.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: nss_ldap / top startup

2005-04-25 Thread Dan Nelson

In the last episode (Apr 25), Oliver Brandmueller said:
> I have some servers running on 5.4-STABLE as of Apr 5th. I
> use nss_ldap for a userbase of currently about 24000 accounts (will
> be growing to approx 6 in the next weeks). I don't use pam_ldap
> currently, because users only need to login by IMAP, POP, SMTP and
> FTP, for all of these services daemons are used which natively auth
> against the LDAP server.
> 
> The more accounts there are in the LDAP directory, the longer the
> startup of "top" takes. With the current userbase top takes about 3-4
> seconds to start (on a mostly idle Dual Xeon 2.8GHz with fast disks
> and local slapd).
> 
> The startup time is not any different, sometimes I feel (did not try
> to measure) it's even longer, if I use "top -u" to not map uids. The
> running processes are only from a few uids, all the LDAP users
> usually don't have processes running under their IDs.

You can benchmark top by running "time top -d1", which will print one
page then immediately exit.
 
> Any ideas, why this is happening? Will I need 10 seconds, when there
> are 6 accounts in LDAP? :-)

Try editing /usr/src/usr.bin/top/Makefile, add -DRANDOM_PW, and
rebuild.  That should probably be the default on FreeBSD anyway.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Trek Technology Thumbdrive 16MB

2005-05-04 Thread Dan Nelson

In the last episode (May 04), Gerrit Khn said:
> has anyone used the subject successfully with -stable? umass(4)
> mentions the 8MB version, so I thought 16MB should actually work,
> too. However, when I plug the device in, I just get
> 
> ugen0: Trek Technology ThumbDrive, rev 1.10/1.00, addr 2
> 
> and no umass device.
> 
> usbdevs shows
> 
> port 3 addr 2: full speed, power 40 mA, config 1, ThumbDrive(0x), Trek 
> Technology(0x0a16), rev 1.00, device ugen0

umass only attaches to devices it recognizes.  There are entries for
both the thumbdrive and thumbdrive_8mb in usbdevs, but only the 8mb
version is in umass.c.  Try copying the entry and changing
USB_PRODUCT_TREK_THUMBDRIVE_8MB to USB_PRODUCT_TREK_THUMBDRIVE, and see
if that works.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: pthreads and nagios issue

2005-05-06 Thread Dan Nelson

In the last episode (May 06), Christophe Yayon said:
> i am upgrading our nagios 1.2 (on freebsd 5.3-release) to nagios 2.0
> (currently last cvs after 2.0b3) on Freebsd-5.4RC3 and i saw a very
> strange thing.
> 
> After few hours, nagios main process (nagios -d ...) use lot of cpu
> time and when i do a truss on the pid, i have a "kse_release" loop
> message.

Truss hasn't been updated to handle kse or thr threads yet, so don't
rely on that output.  ktrace should still work, as will using gcore
and gdb to get stack traces.
 
-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Outdated lib*_p.a files

2005-05-09 Thread Dan Nelson

In the last episode (May 09), Jason C. Wells said:
> I run a homegrown script after upgrades to find outdated binaries.  I have 
> a bunch of files name /usr/lib/lib*_p.a that predate my recent upgrade to 
> 5.4-RELEASE.  What are these?  Can they be deleted without harm?

Those are versions of libraries built with profiling code.  If you have
NOPROFILE set in your make.conf, you should remove them from /usr/lib.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Apache + Caching DNS: conflict at bootup? (DNS runs too late)

2005-05-09 Thread Dan Nelson

In the last episode (May 09), Colin Percival said:
> Rob wrote:
> > Some time ago, there was (or still is) a similar conflict with
> > hostname resolution at bootup when using ntpd.
> 
> Yes, but not with named -- the problem was only when using a dns
> cache from the ports tree, since those are started later in the boot
> sequence.

I always put two nameserver lines in my resolv.conf, even on machines
running bind (where the first line is 127.0.0.1).  That way if programs
are started before bind, they can still do DNS lookups.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: PTHREAD_INVARIANTS in 5.x

2005-05-12 Thread Dan Nelson

In the last episode (May 12), Scott Long said:
> Daniel Eischen wrote:
> >On Wed, 11 May 2005, Jonathan Noack wrote:
> >>I checked out _PTHREADS_INVARIANTS for libthr and libpthread on CURRENT.
> >> As far as I can tell, all but one of the defines under
> >>_PTHREADS_INVARIANTS are ASSERTs; they check for a condition and if it
> >>is false result in a fatal error.  These should be very visible if
> >>they are being tripped.  Only MUTEX_INIT_LINK actually *does*
> >>something.  It is defined in src/lib/libpthread/thread/thr_mutex.c
> >>at lines 43-46 and in src/lib/libthr/thread/thr_mutex.c at lines
> >>44-47:
> >
> >This is way overblown and there are other areas for much better
> >optimizations than worrying about a couple of instructions.  Perhaps
> >if it were called _PTHREAD_ROBUST instead of _PTHREAD_INVARIANTS,
> >no one would notice ;-)
> 
> Yes, the check for the cross-linked threads libraries is still quite
> useful.  However, we have a general policy of turning off most other
> debugging and invariants tools for production releases.  A good
> example is the malloc debugging options that are on in HEAD and off
> in RELENG_5. Would we be able to reach a compromise similar to that?

The malloc flags can cause serious performance issues, though, since
they basically force a memory fill before every malloc and after every
free.  On the other hand, shouldn't there be a better way of detecting
cross-linked threads libraries than dying because some internal mutex
isn't initialized?  Maybe set __isthreaded to 1, 2, or 3 (or
(int)'c_r\0', 'kse\0', 'thr\0', to allow for even more threads libs)?
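A note on the `(int)'c_r\0'` idea: multi-character constants have implementation-defined values in C, so a tag built with explicit shifts is more predictable across compilers. The sketch below assumes a 32-bit tag with the first character in the high byte; the macro, the enum names, `thread_lib_name()` and `tags_conflict()` are hypothetical and not part of FreeBSD's libc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four characters into a 32-bit tag, first character in the high
 * byte. Unlike a multi-character constant such as 'thr\0', the value is
 * the same on every compiler. */
#define THREAD_TAG(a, b, c, d)                     \
    (((uint32_t)(unsigned char)(a) << 24) |        \
     ((uint32_t)(unsigned char)(b) << 16) |        \
     ((uint32_t)(unsigned char)(c) << 8)  |        \
      (uint32_t)(unsigned char)(d))

/* Hypothetical tags for the three threads libraries discussed above. */
enum {
    TAG_NONE = 0,
    TAG_C_R  = THREAD_TAG('c', '_', 'r', '\0'),
    TAG_KSE  = THREAD_TAG('k', 's', 'e', '\0'),
    TAG_THR  = THREAD_TAG('t', 'h', 'r', '\0')
};

/* Name of the library a tag refers to, or NULL if unrecognized. */
const char *thread_lib_name(uint32_t tag)
{
    switch (tag) {
    case TAG_NONE: return "none";
    case TAG_C_R:  return "libc_r";
    case TAG_KSE:  return "libpthread";
    case TAG_THR:  return "libthr";
    default:       return NULL;
    }
}

/* Two different libraries both claiming the process (both tags non-zero
 * and unequal) would indicate cross-linked threads libraries. */
int tags_conflict(uint32_t registered, uint32_t incoming)
{
    return registered != TAG_NONE && incoming != TAG_NONE &&
           registered != incoming;
}
```

A library that finds a different non-zero tag already set could then report the conflict by name instead of failing on an uninitialized mutex.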

-- 
Dan Nelson
[EMAIL PROTECTED]
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: Boot loader cant identify ntfs?

2005-05-18 Thread Dan Nelson

In the last episode (May 18), Mike Jakubik said:
> Could someone tell me why our bootloader still can not recognize a
> ntfs partition, and report it as Windows instead of displaying "??" ?

The next release should:

RCS file: /home/ncvs/src/sys/boot/i386/boot0/boot0.S,v

revision 1.14
date: 2005/02/08 20:43:04;  author: des;  state: Exp;  lines: +2 -2
Remove type 0x4 (FAT12 <32MB) to make room for type 0x7 (NTFS).

revision 1.10.2.4 (RELENG_5)
date: 2005/04/21 15:42:28;  author: obrien;  state: Exp;  lines: +2 -2
MFC: rev 1.14: remove type 0x4 (FAT12 <32MB) to make room for type 0x7 (NTFS).
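Because boot0 has to fit in the 512-byte MBR, the list of partition types it can name is tiny, which is why the commit above drops 0x4 to make room for 0x7. A minimal sketch of such a lookup, assuming a two-entry table; the labels are illustrative rather than boot0's actual strings, and unknown types fall back to the "??" shown in the menu.

```c
#include <assert.h>
#include <stddef.h>

struct ptype_name {
    unsigned char type;   /* MBR partition type byte */
    const char *label;    /* text shown in the boot menu */
};

/* Illustrative table: 0xa5 is the FreeBSD slice type, 0x07 the NTFS type
 * added by the commit quoted above. The labels are hypothetical. */
static const struct ptype_name ptypes[] = {
    { 0xa5, "FreeBSD" },
    { 0x07, "Windows" },
};

/* Return the menu label for a partition type, or "??" if unknown. */
const char *ptype_label(unsigned char type)
{
    for (size_t i = 0; i < sizeof(ptypes) / sizeof(ptypes[0]); i++)
        if (ptypes[i].type == type)
            return ptypes[i].label;
    return "??";
}
```

Any type byte missing from the table, such as the dropped 0x4, is shown as "??".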

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: panic in recent RELENG_5 tcp code path

2005-05-20 Thread Dan Nelson

In the last episode (May 20), Kris Kennaway said:
> On Fri, May 20, 2005 at 05:15:36PM +0400, Gleb Smirnoff wrote:
> > On Fri, May 20, 2005 at 03:10:32PM +0200, Jeremie Le Hen wrote:
> > J> I'm going to recompile my kernel with INVARIANTS but I wonder in
> > J> which order of magnitude it will slow my kernel down.  In other 
> > J> words, what does INVARIANTS do concretely, shall I expect a 
> > J> performance drop like WITNESS does ?
> > 
> > No. The performance loss is _much_ less significant than in WITNESS
> > case. You probably will not notice it.
> 
> Actually, INVARIANTS causes about a 10% penalty on wall clock time on
> 5.x and above.

Which is a lot less of a hit than WITNESS is, to be sure.  WITNESS is
like walking in mud :)  Do you know if INVARIANT_SUPPORT by itself is
enough to cause the 10% slowdown?  That turns on LOCK_DEBUG, which in
turn disables inlining of mutex macros.
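Why LOCK_DEBUG costs time: with it enabled, the lock macros call an out-of-line function (with extra checks) instead of expanding a short fast path at each call site. A toy sketch of that pattern follows; `TOY_LOCK_DEBUG`, `struct toy_mtx` and the functions are hypothetical, this is not FreeBSD's sys/mutex.h, and unlike a real kernel mutex it uses no atomic operations.

```c
#include <assert.h>

/* Hypothetical lock type, for illustration only (not atomic). */
struct toy_mtx {
    volatile int owned;
    unsigned long debug_acquires;   /* bookkeeping only the debug path does */
};

/* Out-of-line version: a function call plus checks and bookkeeping on
 * every acquire. */
void toy_mtx_lock_debug(struct toy_mtx *m)
{
    assert(!m->owned);              /* a cheap "invariant" check */
    m->owned = 1;
    m->debug_acquires++;
}

#ifdef TOY_LOCK_DEBUG
/* Debug builds: every lock goes through the function above. */
#define TOY_MTX_LOCK(m)   toy_mtx_lock_debug(m)
#else
/* Production builds: the fast path is expanded inline at each call site. */
#define TOY_MTX_LOCK(m)   ((m)->owned = 1)
#endif
#define TOY_MTX_UNLOCK(m) ((m)->owned = 0)
```

Compiling with `-DTOY_LOCK_DEBUG` routes every `TOY_MTX_LOCK()` through the function; multiplied over every lock operation, that is the kind of overhead the question is about.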

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: savecore: first and last dump headers disagree

2005-05-23 Thread Dan Nelson

In the last episode (May 23), Palle Girgensohn said:
> We have an amd64 system that still experiences crashes after
> installing 5.4, mostly during high loads. (It's been unstable all the
> time, really; see previous posts.)
> 
> I've added dumpdev="/dev/amrd0s2b", and some time ago I did get coredumps, 
> but with latest versions of the kernel, savecore does not give me a dump, 
> instead it says:
> 
> savecore: first and last dump headers disagree on /dev/amrd0s2b
> savecore: unsaved dumps found but not saved

"savecore -vv" should print enough of both headers to let you see
what's different.

> Fatal trap 12: page fault while in kernel mode
> cpuid = 0: apic id = 00
> fault virtual address= 0x00
> ...
> trap number  = 12
> panic: page fault
> cpuid = 0
> boot() called on cpu#0
> Uptime: 1d23h50m36s
> Dumping 2047 MB
> 16 32
> 
> The cursor sits at the position after "32".

That's probably why your headers disagree :)  If you put "options
KDB_TRACE" in your kernel config file, it will print a small stack
trace before trying to dump, which might be enough to track down the
cause of the panic even without a dump.
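A simplified model of the "first and last dump headers disagree" check. As I understand the dump format, the kernel writes a header before the dump data and another copy after it, so a dump that hangs partway (as in the "16 32" output above) never gets its trailing copy. The struct and function names below are hypothetical; the real `struct kerneldumpheader` has different fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL simplified dump header (no padding: 8 + 8 + 8 bytes). */
struct toy_dump_header {
    char     magic[8];
    uint64_t dump_length;
    uint64_t timestamp;
};

/* Fill in a header for a dump of the given size. */
void toy_header_init(struct toy_dump_header *h, uint64_t len, uint64_t when)
{
    memset(h, 0, sizeof(*h));
    memcpy(h->magic, "TOYDUMP", 8);   /* 7 characters plus the NUL */
    h->dump_length = len;
    h->timestamp = when;
}

/* The dump is complete only if the header written before the data matches
 * the copy written after it. If the machine hangs mid-dump, the trailing
 * copy is never written and still holds stale or zeroed data. */
int toy_dump_complete(const struct toy_dump_header *first,
                      const struct toy_dump_header *last)
{
    assert(first != NULL && last != NULL);
    return memcmp(first, last, sizeof(*first)) == 0;
}
```

"savecore -vv" shows the real header fields, which is how the mismatch can be confirmed.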


-- 
Dan Nelson
[EMAIL PROTECTED]

Re: good performing SCSI RAID5 ( was: asr ( 2015S ) support in 5.4 amd64? )

2005-06-01 Thread Dan Nelson

In the last episode (Jun 01), Steven Hartland said:
> Thanks for that Bruce I'm quite surprised that these numbers are so
> low after playing with a cheapo hightech SATA controller which with
> the help of the guys on the list I was able to give out 200MB/s I
> really would expect the relatively expensive SCSI controllers to do
> significantly better especially as they have superior disks attached
> ( 10K vs 7k2 ) and not performance which is well below ( 1/2 ) that
> expected of a single disk.

The faster rpms will get you more concurrent I/Os per second but won't
do as much for throughput.  My asr 3200S cards got repurposed before I
could try them with 5.x, but with the 370F firmware I'm pretty sure I
was able to get more than 40MB/sec reads out of them on 4.x with 4-disk
RAID5 sets.  Since the asr driver needs Giant, try a UP kernel and see
if it goes any faster.
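The rpm point can be made concrete with a little arithmetic: on average a random I/O waits half a revolution, so spindle speed mainly shortens per-I/O latency. A minimal sketch; the 5 ms average seek in the usage note is an assumed figure, not a measurement of these drives.

```c
#include <assert.h>

/* Average rotational latency in milliseconds: on average the head waits
 * half a revolution for the target sector. */
double avg_rotational_latency_ms(double rpm)
{
    assert(rpm > 0.0);
    double ms_per_rev = 60000.0 / rpm;
    return ms_per_rev / 2.0;
}

/* Rough ceiling on random I/Os per second if each I/O costs one average
 * seek plus the average rotational latency (transfer time ignored). */
double random_iops_estimate(double rpm, double avg_seek_ms)
{
    assert(avg_seek_ms >= 0.0);
    return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms(rpm));
}
```

For example, `avg_rotational_latency_ms(10000)` is 3.0 ms versus about 4.17 ms at 7200 rpm, and with the assumed 5 ms seek `random_iops_estimate()` gives roughly 125 versus 109 I/Os per second. Sequential throughput is not modelled here.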
 
-- 
Dan Nelson
[EMAIL PROTECTED]

Re: libc.so.4 & libc_r.so.4 in ices0

2005-06-23 Thread Dan Nelson

In the last episode (Jun 23), Kevin S. Brackett said:
>   libc.so.4 => /usr/lib/libc.so.4 (0x28755000)
>   libc_r.so.4 => /usr/lib/libc_r.so.4 (0x287ee000)
> 
> any ideas why it's doing this, and what the fix is?

It looks fine on my system:

libc.so.5 => /lib/libc.so.5 (0x2875c000)
libpthread.so.1 => /usr/lib/libpthread.so.1 (0x28836000)

Is this a machine recently upgraded from 4.*?  Does "ldd -a ices"
indicate that those libs are being pulled in as dependencies of another
library?  If so, rebuild that port, then rebuild ices.

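Here is a script to find all the binaries linked to superseded port
libs, and libs linked directly to threads libs:

#! /bin/sh
# List the shared libraries and executables installed by ports and X11,
# run "ldd -a" on each, and print "object<TAB>library" for every
# dependency that lives in a compat directory (where superseded port
# libraries are kept) or that is itself a threads library.
( find -s /usr/local/lib /usr/X11R6/lib -name "lib*.so"
  find -s /usr/local/bin /usr/X11R6/bin/
) |
xargs ldd -a 2>/dev/null |
awk '
  # Unindented lines name the object being examined ("/path/to/file:").
  /^[^\t]/ { cmd=$1 }
  # Tab-indented lines are dependencies: "libfoo.so.N => /path (addr)".
  /^\t.*\/compat\// { printf "%s\t%s\n",cmd,$3 }
  /^\t(libc_r|libpthread|libthr)\.so/ { printf "%s\t%s\n",cmd,$3 }
'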


-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Upgrading from 4.x to 5.x ... possible?

2005-07-07 Thread Dan Nelson

In the last episode (Jul 07), Marc G. Fournier said:
> Without having to rebuild from scratch, is this something that is
> possible, or have the changes become so great as to make this
> undesirable?

That's what the Upgrade menu item in sysinstall is for :)  It'll save a
copy of /etc to a safe place then copy the passwd file back after the
install.  You'll want to rebuild all your ports, but most should still
work until you do.  A fresh install is always cleaner, but I've
upgraded some of my servers from 2.2.8 -> 4.* -> 5.* with no problems.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: tcp throughput weirdness

2005-07-12 Thread Dan Nelson

In the last episode (Jul 12), David Malone said:
> > did the trick! now can someone remind me what inflight does? and
> > could someone explain why increasing sendspace alone did not do the
> > trick? (i had it at 64k, which got things better, but not
> > sufficient).
> 
> TCP inflight limiting is supposed to guess the bandwidth-delay
> product for a TCP connection and stop the window expanding much
> above this. It's a pretty neat idea for DSL links that often have
> huge buffers at the far end, where inflight limiting can prevent
> delays to interactive traffic.
> 
> However, some of the guys I know that work on TCP dynamics reckon
> that they can come up with situations where inflight
> limiting will break. Unfortunately, I haven't had time to talk
> this through with them. I guess you may have found one of those
> situations ;-)

You might want to apply the patch at the bottom of
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/75122 ; without it, new
connections get a random initial bandwidth.
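For reference, the bandwidth-delay product mentioned in the quoted explanation is simply link bandwidth times round-trip time. The sketch below also shows the converse: a fixed window caps throughput at window / RTT, which is one way a 64k sendspace can still be too small. The 10 Mbit/s and 100 ms figures in the usage note are illustrative assumptions, not numbers from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes: how much data must be in flight to
 * keep a path of the given bandwidth and RTT busy. (Integer division
 * truncates for bandwidths that are not a multiple of 8.) */
uint64_t bdp_bytes(uint64_t bandwidth_bits_per_sec, uint64_t rtt_ms)
{
    return bandwidth_bits_per_sec / 8 * rtt_ms / 1000;
}

/* Maximum throughput in bits/s when at most window_bytes can be
 * unacknowledged per round trip. */
uint64_t window_limited_bps(uint64_t window_bytes, uint64_t rtt_ms)
{
    assert(rtt_ms > 0);
    return window_bytes * 8 * 1000 / rtt_ms;
}
```

With those figures `bdp_bytes(10000000, 100)` is 125000 bytes, while a 65536-byte window over a 100 ms path tops out at `window_limited_bps(65536, 100)` = 5242880 bit/s.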

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Q: RT32 (Request Tracker) + jail

2005-07-19 Thread Dan Nelson

In the last episode (Jul 19), J. Nyhuis said:
>   I would like to have RT running in a jailed environment.  The
> challenge, it seems, will be to get sendmail running in the same
> jailed environment as RT and the other components.
>   For those not so familiar with the components of RT, the 
> jail would include apache1.3+modperl, MySQL, sendmail, and RT. 
> That's a lot of stuff to get working in there!  (but fortunately
> FreeBSD jails seem straightforward and easy) ^_^
>   I expect sendmail to be the real problem of the above bunch.

Sendmail should do just fine, I'd think.  

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: QLogic QLA210 OEM (Sun) Fibre Cards and FreeBSD 5.x

2005-07-26 Thread Dan Nelson

In the last episode (Jul 26), Eli K. Breen said:
> Has anyone here had any success getting the Sun-branded Qlogic 2Gb fibre 
> adapters (QLA210) working under FreeBSD?
> 
> Apparently these boards should work as they're compatible with the 
> QLA2200/2300 stack (and therefore should work with the isp driver) but 
> when booting I see the following:
> 
> pcib3:  at device 0.2 on pci1
> pci3:  on pcib3
> pci3:  at device 11.0 (no driver attached)
> 
> I do have the ISP driver in the kernel.

Adding the PCI IDs to isp_pci.c may be enough to get it working.
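FreeBSD PCI drivers commonly identify cards by a 32-bit value with the device ID in the upper 16 bits and the vendor ID in the lower 16. This is what `pci_get_devid()` returns and what `pciconf -l` prints as `chip=`. A minimal sketch of a probe table in that style follows. The table and `probe_accepts()` are hypothetical rather than isp_pci.c's code, and the 0x2200/0x2300 device IDs are assumed to be the QLogic ISP2200/ISP2300 parts. The Sun card's own IDs are deliberately not filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Device ID in the upper 16 bits, vendor ID in the lower 16. */
#define PCI_DEVID(vendor, device) \
    (((uint32_t)(device) << 16) | (uint32_t)(vendor))

#define PCI_VENDOR_QLOGIC 0x1077

/* IDs the (simplified) probe routine accepts. The Sun-branded card's
 * vendor/device pair is not given in this thread, so it is not listed. */
static const uint32_t accepted_ids[] = {
    PCI_DEVID(PCI_VENDOR_QLOGIC, 0x2200),
    PCI_DEVID(PCI_VENDOR_QLOGIC, 0x2300),
};

/* Return 1 if the combined ID is in the table, else 0. */
int probe_accepts(uint32_t devid)
{
    for (size_t i = 0; i < sizeof(accepted_ids) / sizeof(accepted_ids[0]); i++)
        if (accepted_ids[i] == devid)
            return 1;
    return 0;
}

/* Split a combined ID back into its halves. */
uint16_t devid_vendor(uint32_t devid) { return (uint16_t)(devid & 0xffff); }
uint16_t devid_device(uint32_t devid) { return (uint16_t)(devid >> 16); }
```

Running `pciconf -lv` on the affected machine should show the card's actual `chip=` value, which is what would need to be added.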

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Consistent file system hang with RELENG_6 of today ...

2005-07-29 Thread Dan Nelson

In the last episode (Jul 30), Marc G. Fournier said:
> 'k, this is turning out to be a lot of fun ... the only machines I
> have here that I can use to talk to the portmaster are Windows boxes
> ... can you recommend a client for windows that would do good logging
> similar to what 'script' does under FreeBSD?  :(

Just about any terminal emulator will have a logging or capture option. 

CRT: File -> Log Session
Hyperterm:  Transfer -> Receive Text...
Putty: Settings -> Session -> Logging

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: broken port...

1999-12-24 Thread Dan Nelson


In the last episode (Dec 24), FreeBSD said:
> cute one though
> 
> Updater failed: Cannot install
> "/usr/ports/japanese/tk80/patches/#cvs.cvsup-1394.186" to
> "/usr/pobts/japanese/tk80/patches/patch-ab": No such file or directory
> 
> /usr/pobts :)

You didn't provide much information, but it looks like you were running
cvsup, right?  If you run it again, does it have trouble on the same
file?  'b' and 'r' are one bit apart from each other (01100010 and
01110010).  Sounds like your machine flipped a bit.
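The one-bit claim is easy to check: XOR the two characters and count the set bits. A minimal sketch:

```c
#include <assert.h>

/* Number of bit positions in which two bytes differ (Hamming distance). */
int bits_differing(unsigned char a, unsigned char b)
{
    unsigned char x = (unsigned char)(a ^ b);
    int count = 0;

    while (x != 0) {
        count += x & 1u;   /* add the lowest bit */
        x >>= 1;
    }
    return count;
}
```

`'b' ^ 'r'` is 0x10, a single bit, so `bits_differing('b', 'r')` returns 1.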

-- 
Dan Nelson
[EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message

Re: Pseudoterminals increase: compilation error

2008-07-19 Thread Dan Nelson

In the last episode (Jul 19), Unga said:
> On Sat, 7/19/08, Peter Jeremy  wrote:
> > On 2008-Jul-18 18:38:36 -0700, Unga wrote:
> > >As per FAQ,
> > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/admin.html,
> > > I tried to increase the number of ptys: "10.19.1 Build and
> > > install a new kernel with the line in the configuration file:
> > > device pty N where N is the number of requested pseudoterminals."
> > 
> > That has been obsolete for a while.  Do you actually have a problem
> > with insufficient PTYs?
> 
> Looks like, may not be.
> 
> The Problem: 
> expect -c "spawn ls"
> spawn ls
> The system has no more ptys.  Ask your system administrator to create
>  more. while executing "spawn ls"
> 
> It now seems to be a permission problem as explained in
> http://expect.nist.gov/FAQ.html#q67 .
> 
> Still investigating. Any help will be very much appreciated.

Expect's error message doesn't say anything except "something isn't
working but I won't tell you what".  Run

truss -o truss.log -f expect -c "spawn ls" 

and determine which syscall is failing, with what error number, just
before expect prints its "no more ptys" message.  That will tell you
whether it's a permissions issue, or something else.  If there are no
obvious errors, post a part of the log.

Also, what version of expect are you running?  Versions between
5.38.0_1 and 5.43.0_2 had a bug in the port Makefile that limited the
number of ptys expect could see.  See
http://www.freebsd.org/cgi/query-pr.cgi?pr=108311 .
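If the truss output is hard to read, a tiny test program can narrow down which step of pty allocation fails and with which errno. This sketch uses the POSIX `posix_openpt()`/`grantpt()`/`unlockpt()`/`ptsname()` sequence. That expect allocates ptys the same way is an assumption (it may open pty devices directly), so treat a success here as a hint, not proof.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Save errno from the failing step, close the master if it is open,
 * and record which step failed. Never returns 0. */
static int fail_step(const char **step, const char *name, int fd)
{
    int err = (errno != 0) ? errno : EIO;

    if (fd >= 0)
        close(fd);          /* may change errno, which was saved above */
    *step = name;
    return err;
}

/* Allocate one pseudo-terminal and release it again. Returns 0 and sets
 * *step to "ok" on success; otherwise returns the errno of the first
 * failing call and sets *step to that call's name. */
int try_open_pty(const char **step)
{
    int fd;

    assert(step != NULL);
    errno = 0;
    fd = posix_openpt(O_RDWR | O_NOCTTY);
    if (fd < 0)
        return fail_step(step, "posix_openpt", -1);
    if (grantpt(fd) != 0)
        return fail_step(step, "grantpt", fd);
    if (unlockpt(fd) != 0)
        return fail_step(step, "unlockpt", fd);
    if (ptsname(fd) == NULL)
        return fail_step(step, "ptsname", fd);
    close(fd);
    *step = "ok";
    return 0;
}
```

Print the result with something like `printf("%s: %s\n", step, strerror(err))` (which needs <stdio.h> and <string.h>). EACCES would point to the permission problem suspected above, while ENOENT or EAGAIN would suggest missing or exhausted pty devices.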

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Temperature monitoring on old desktop - Dell OptiPlex SX270?

2008-08-03 Thread Dan Nelson

In the last episode (Aug 03), Torfinn Ingolfsen said:
> On Sat, 02 Aug 2008 20:19:12 -0700 Jeremy Chadwick wrote:
> 
> > On Sun, Aug 03, 2008 at 01:50:53AM +0200, Torfinn Ingolfsen wrote:
> > The first questions to ask are: 1) does this machine even have a
> > H/W monitoring IC on it, and 2) is it enabled/wired to thermistors
> > and fans?
> 
> Yes, but so far I haven't found out anything by searching.
> 
> > What processor is in it?  Not a Core2Duo.  I'm guessing since it's
> > circa 2004, probably a Pentium 3 or 4, or possibly an older AMD.
> 
> Pentium 4. From dmesg:
> CPU: Intel(R) Pentium(R) 4 CPU 2.60GHz (2593.51-MHz 686-class CPU)
>   Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
>   
> Features=0xbfebfbff
>   Features2=0x4400
>   Logical CPUs per core: 2
> 
> > None of those, to my knowledge, have on-die temperatures -- they all
> > rely on external H/W monitoring.
> 
> Ok, so what is the 'TM' feature of this cpu then?

From what I can find on Intel's site for your CPU, TM is an emergency
switch that lowers the CPU speed to prevent overheating that could
damage the processor.  Under normal circumstances, it should never
trip, and its on/off status (not temperature) is only readable by two
pins on the CPU.  It can be disabled and enabled by software, but not
monitored.

http://download.intel.com/design/Pentium4/datashts/29864312.pdf

"The Thermal Monitor feature helps control the processor temperature by
activating the Thermal Control Circuit (TCC) when the processor silicon
reaches its maximum operating temperature. The TCC reduces processor
power consumption by modulating (starting and stopping) the internal
processor core clocks. The Thermal Monitor feature must be enabled for
the processor to be operating within specifications. The temperature at
which Thermal Monitor activates the thermal control circuit is not user
configurable and is not software visible."
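The "TM" entry in the dmesg feature list only says the CPU implements the mechanism; it carries no temperature reading. In CPUID leaf 1 it is bit 29 of EDX, the register dmesg prints as `Features=`. A minimal sketch decoding it from the value shown above:

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1, EDX bit 29: Thermal Monitor (TM) supported. */
#define CPUID_EDX_TM (1u << 29)

/* Return 1 if the "Features=" value printed by dmesg advertises TM. */
int cpu_has_tm(uint32_t features_edx)
{
    return (features_edx & CPUID_EDX_TM) != 0;
}
```

`cpu_has_tm(0xbfebfbff)` returns 1 for this Pentium 4, consistent with the TM flag in its dmesg line.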

> > I just checked http://tingox.googlepages.com/sx270 and sure enough, an
> > older P4.  coretemp(4) won't work with this.
> 
> I know, I just thought that there might be something similar for the
> TM feature of Pentium 4's.
> 
> > I would start by booting the machine into Windows and install
> > SpeedFan.  If that thing is able to detect and provide thermal data,
> 
> Ouch. I was hoping that I wouldn't have to do that. The machine have
> no internal CD-drive, and for some reason doesn't want to boot from a
> (usb) external cd-drive either (kind of funny - it boots from flash
> drives and external hard drives. But cd-rom -no).
> 
> I was hoping to solve this without windows in the picture.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: snapshots and disk usage

2008-09-07 Thread Dan Nelson

In the last episode (Sep 07), Stefan `Sec` Zehl said:
> Hi,
> 
> I am using ufs snapshots on RELENG_7 for some time now, and am generally
> happy with it. I have noticed a strange behaviour when removing large
> amount of files, and wanted to ask if this is expected.
> 
> Before starting, we check the free space on /usr:
> 
> | ice:/usr>df -h .
> | Filesystem SizeUsed   Avail Capacity  Mounted on
> | /dev/ad4s2.elid9.7G7.6G1.3G64%/usr
> 
> Then delete /usr/obj and run df again:
> 
> | ice:/usr>sudo rm -rf obj 2>/dev/null
> | ice:/usr>df -h .
> | Filesystem SizeUsed   Avail Capacity  Mounted on
> | /dev/ad4s2.elid9.7G5.7G3.2G64%/usr
> 
> This is unexpected. With snapshots, removing something should not
> release space.
> 
> Sure enough, in the course of the next minute, the fake free space
> vanishes
> 
> | ice:/usr>df -h .
> | Filesystem SizeUsed   Avail Capacity  Mounted on
> | /dev/ad4s2.elid9.7G5.9G3.0G66%/usr
> | ice:/usr>df -h .
> | Filesystem SizeUsed   Avail Capacity  Mounted on
> | /dev/ad4s2.elid9.7G6.6G2.3G74%/usr
> | ice:/usr>df -h .
> | Filesystem SizeUsed   Avail Capacity  Mounted on
> | /dev/ad4s2.elid9.7G8.6G269M97%/usr
> 
> and all the free space is allocated in the snapshot:
> 
> | ice:~>sudo snapshot list
> | Filesystem  User   User% Snap   Snap%  Snapshot
> | /usr 8GB   89.3%  2GB   21.5%  daily.1
> | /usr 8GB   89.3%344MB3.5%  daily.0
> | /usr 8GB   89.3%344MB3.5%  weekly.0
> | /usr 8GB   89.3%344MB3.5%  hourly.1
> | /usr 8GB   89.3%  7MB0.1%  hourly.0
> 
> My understanding so far was that df may underreport free space, but i
> find overreporting it a bit troublesome. -- What would happen if I tried
> to use that space before it was allocated to the snapshot?

I think you're running into the softupdates delay.  When you delete a
file on a SU-enabled filesystem, the space isn't actually freed until
sync.  But applications expect that statfs() info is updated
immediately, so the kernel pretends that the space is available.  That
doesn't really work with a snapshot, since if you delete a file that
existed in the snapshot, no space will free up.  So you see a jump in
free space as the kernel fakes the f_bfree statfs amount, then it slowly
drops to the correct value as the deletions actually sync to disk.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Problem with dtrace

2008-09-20 Thread Dan Nelson

In the last episode (Sep 20), Michel Talon said:
> Still testing FreeBSD-7.1-beta encountered the following (perhaps
> to be expected) result with dtrace:
> 
> dtrace -m kernel  -> some output -> deadlock after a few seconds.
> 
> Less demanding tracing worked OK.

proc, profile, and syscall probes work fine for me; it seems to be just
fbt probes that cause problems.  Enabling any one will cause a trap 12
a few instructions inside the probed function when it gets called.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

2008-09-29 Thread Dan Nelson

In the last episode (Sep 30), Andrew Snow said:
> Zaphod Beeblebrox wrote:
> > Also, there exists data within the ARC (I'm always tempted to say
> > the ARC Cache, but that is redundant) that is also then in paging
> > memory.
> 
> OK, but one advantage of ZFS memory consumption is under heavy write
> loads, where much of the memory is used to store and reorder writes.
> The heavy memory consumption under reading is a shame, but ZFS has to
> cache and use more metadata than UFS, so it's a price you pay for the
> extra features and benefits.
> 
> What I think we need is a way to turn off read-caching except for
> metadata.  This allows ARC to only be used more efficiently. 
> Currently you can turn all read-ahead on or off, with the provided
> sysctl tunables, but would be easy to implement a metadata-only
> option.  I found that access speed suffers when metadata is not
> prefetched.

That'd be handy, but at least on my system the data prefetcher isn't
really called often enough to make a difference either way (assuming
the counts are accurate).  Metadata prefetch is a big win, however.

([EMAIL PROTECTED]) /home/dan> uptime
11:00PM  up 5 days, 13:47, 21 users, load averages: 1.52, 1.68, 1.69
([EMAIL PROTECTED]) /home/dan> sysctl kstat
[..]
kstat.zfs.misc.arcstats.hits: 211130907 (95%)
kstat.zfs.misc.arcstats.misses: 9808431
kstat.zfs.misc.arcstats.demand_data_hits: 116614377 (98%)
kstat.zfs.misc.arcstats.demand_data_misses: 2477943
kstat.zfs.misc.arcstats.demand_metadata_hits: 55805261 (96%)
kstat.zfs.misc.arcstats.demand_metadata_misses: 2310006
kstat.zfs.misc.arcstats.prefetch_data_hits: 79878 (53%)
kstat.zfs.misc.arcstats.prefetch_data_misses: 71741
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 38556033 (88%)
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 4947270
kstat.zfs.misc.arcstats.mru_hits: 23702582 (95%)
kstat.zfs.misc.arcstats.mru_ghost_hits: 1274189
kstat.zfs.misc.arcstats.mfu_hits: 149722171 (98%)
kstat.zfs.misc.arcstats.mfu_ghost_hits: 2944572
[..]
kstat.zfs.misc.arcstats.p: 235221504
kstat.zfs.misc.arcstats.c: 268435456
kstat.zfs.misc.arcstats.c_min: 67108864
kstat.zfs.misc.arcstats.c_max: 268435456
kstat.zfs.misc.arcstats.size: 263926784
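The percentages annotated in this output track hits / (hits + misses) to within about a point, though exactly how they were computed is not stated. A minimal sketch of that ratio:

```c
#include <assert.h>
#include <stdint.h>

/* Hit percentage: hits / (hits + misses) * 100. */
double hit_percent(uint64_t hits, uint64_t misses)
{
    uint64_t total = hits + misses;

    assert(total > 0);
    return 100.0 * (double)hits / (double)total;
}
```

For instance, `hit_percent(211130907, 9808431)` is about 95.6, and `hit_percent(79878, 71741)` is about 52.7 for data prefetch. Only about 150,000 data prefetch lookups occurred, against roughly 119 million demand data lookups, which shows why data prefetch makes little difference on this system.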

-- 
Dan Nelson
[EMAIL PROTECTED]
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: Weird truss output

2008-12-03 Thread Dan Nelson

In the last episode (Dec 03), Vlad GALU said:
> I'm running a statically linked binary, which I've built inside a
> jail. The jail's libc & co are in sync with the host's. Truss then
> shows this:
> 
> -- cut here --
> -- UNKNOWN SYSCALL 1048532 --
> -- UNKNOWN SYSCALL 1048532 --

Is this a threaded app that you attached truss to after it was started?
The method that truss uses to catch syscall enter/exit events doesn't
indicate whether the event is an enter or an exit, so if you attach
while a syscall is active, truss handles the exit event as if it were a
syscall entry event, and never gets back in synch.  It gets worse with
threaded apps because each thread is another chance to get out of
synch.  Try this patch:

Index: i386-fbsd.c
===
RCS file: /home/ncvs/src/usr.bin/truss/i386-fbsd.c,v
retrieving revision 1.29
diff -u -p -r1.29 i386-fbsd.c
--- i386-fbsd.c 28 Jul 2007 23:15:04 -  1.29
+++ i386-fbsd.c 3 Dec 2008 15:20:09 -
@@ -149,7 +149,14 @@ i386_syscall_entry(struct trussinfo *tru
   fsc.name =
 (syscall_num < 0 || syscall_num > nsyscalls) ? NULL : syscallnames[syscall_num];
   if (!fsc.name) {
-fprintf(trussinfo->outfile, "-- UNKNOWN SYSCALL %d --\n", syscall_num);
+fprintf(trussinfo->outfile, "-- UNKNOWN SYSCALL %u (0x%08x) --\n", syscall_num, syscall_num);
+if ((unsigned int)syscall_num > 0x1000) {
+  /* When attaching to a running process, we have a 50-50 chance
+ of attaching to a process waiting in a syscall, which means
+ our first trap is an exit instead of an entry and we're out
+ of synch. Reset our flag */
+  trussinfo->curthread->in_syscall = 0;
+}
   }
 
   if (fsc.name && (trussinfo->flags & FOLLOWFORKS)


-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Weird truss output

2008-12-03 Thread Dan Nelson

In the last episode (Dec 03), Vlad GALU said:
> On Wed, Dec 3, 2008 at 5:23 PM, Dan Nelson <[EMAIL PROTECTED]> wrote:
> > In the last episode (Dec 03), Vlad GALU said:
> >> I'm running a statically linked binary, which I've built inside a
> >> jail. The jail's libc & co are in sync with the host's. Truss then
> >> shows this:
> >>
> >> -- cut here --
> >> -- UNKNOWN SYSCALL 1048532 --
> >> -- UNKNOWN SYSCALL 1048532 --
> >
> > Is this a threaded app that you attached truss to after it was
> > started? The method that truss uses to catch syscall enter/exit
> > events doesn't indicate whether the event is an enter or an exit,
> > so if you attach while a syscall is active, truss handles the exit
> > event as if it were a syscall entry event, and never gets back in
> > synch.  It gets worse with threaded apps because each thread is
> > another chance to get out of synch.  Try this patch:
> 
> You were right, this application was indeed threaded. The messages
> still occur, although at a slightly lower rate. One other thing
> that's not particularly helpful is this:
> 
> -- cut here--
>  read(1074283119,"\M-Ry\^A\0",7356800)= 4 (0x4)
> -- and here --
> 
> I obviously don't have that many descriptors in my process. I can
> live with the malformed message, but it's a PITA not to know which fd
> the read was actually made from :(

It looks like there's some other problem where truss either drops a
syscall event, or puts some status fields into the wrong thread's
structure.  It seems to happen when two threads call blocking syscalls,
and when they return, truss gets confused as to which thread called
which syscall.  The read syscall is probably still pending, and you're
getting the arguments of the syscall that returned, printed as if it
was the read syscall.  When the read call completes, you'll probably
get an --UNKNOWN SYSCALL-- line, or another mismatched syscall output
line.

An alternative is to use ktrace/kdump to trace the process; it's more
cumbersome to use, but doesn't have problems with threaded processes.
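The failure described above can be modelled in a few lines: one shared record of the pending syscall is overwritten when two threads block at once. Keeping that record per thread avoids it. In this hypothetical sketch, the names, the fixed-size table and the syscall numbers are illustrative, not truss's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 8

/* One record per traced thread: the syscall it entered and has not yet
 * returned from. tid 0 marks an unused slot. */
struct pending_call {
    long tid;
    int  number;
};

static struct pending_call pending[MAX_THREADS];

/* Find the slot for tid, or claim a free one. NULL if the table is full. */
static struct pending_call *slot_for(long tid)
{
    struct pending_call *free_slot = NULL;

    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (pending[i].tid == tid)
            return &pending[i];
        if (pending[i].tid == 0 && free_slot == NULL)
            free_slot = &pending[i];
    }
    if (free_slot != NULL)
        free_slot->tid = tid;
    return free_slot;
}

/* Entry event for thread tid. Returns 0, or -1 if no slot is available. */
int record_entry(long tid, int number)
{
    struct pending_call *pc;

    assert(tid != 0);
    pc = slot_for(tid);
    if (pc == NULL)
        return -1;
    pc->number = number;
    return 0;
}

/* Exit event for thread tid: return the syscall number that same thread
 * entered and release its slot, or -1 if no entry was recorded. */
int match_exit(long tid)
{
    assert(tid != 0);
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (pending[i].tid == tid) {
            int number = pending[i].number;
            pending[i].tid = 0;
            return number;
        }
    }
    return -1;
}
```

With a single shared slot, thread 101's exit would have been reported using thread 102's syscall. That is the kind of mismatched `read(...)` line quoted above.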

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Weird truss output

2008-12-03 Thread Dan Nelson

In the last episode (Dec 03), Dan Nelson said:
> It looks like there's some other problem where truss either drops a
> syscall event, or puts some status fields into the wrong thread's
> structure.  It seems to happen when two threads call blocking
> syscalls, and when they return, truss gets confused as to which
> thread called which syscall.  The read syscall is probably still
> pending, and you're getting the arguments of the syscall that
> returned, printed as if it was the read syscall.  When the read call
> completes, you'll probably get an --UNKNOWN SYSCALL-- line, or
> another mismatched syscall output line.

It turns out that was the problem.  There was a global structure that
held syscall information.  Converting it to a per-thread structure
makes it work much better :)  If you're adventurous, try applying the
patch at http://www.evoy.net/FreeBSD/truss.diff , which fixes that
problem plus a bunch of other stuff.  If you're not adventurous, try
the following instead, which just fixes the per-thread problem.

Index: i386-fbsd.c
===
RCS file: /home/ncvs/src/usr.bin/truss/i386-fbsd.c,v
retrieving revision 1.29
diff -u -r1.29 i386-fbsd.c
--- i386-fbsd.c 28 Jul 2007 23:15:04 -  1.29
+++ i386-fbsd.c 3 Dec 2008 18:48:23 -
@@ -49,6 +49,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -77,29 +78,29 @@
  * 'struct syscall' describes the system call; it may be NULL, however,
  * if we don't know about this particular system call yet.
  */
-static struct freebsd_syscall {
+struct freebsd_syscall {
struct syscall *sc;
const char *name;
int number;
unsigned long *args;
int nargs;  /* number of arguments -- *not* number of words! */
char **s_args;  /* the printable arguments */
-} fsc;
+};
 
 /* Clear up and free parts of the fsc structure. */
 static __inline void
-clear_fsc(void) {
-  if (fsc.args) {
-free(fsc.args);
+clear_fsc(struct freebsd_syscall *fsc) {
+  if (fsc->args) {
+free(fsc->args);
   }
-  if (fsc.s_args) {
+  if (fsc->s_args) {
 int i;
-for (i = 0; i < fsc.nargs; i++)
-  if (fsc.s_args[i])
-   free(fsc.s_args[i]);
-free(fsc.s_args);
+for (i = 0; i < fsc->nargs; i++)
+  if (fsc->s_args[i])
+   free(fsc->s_args[i]);
+free(fsc->s_args);
   }
-  memset(&fsc, 0, sizeof(fsc));
+  memset(fsc, 0, sizeof(*fsc));
 }
 
 /*
@@ -117,9 +118,20 @@
   unsigned int parm_offset;
   struct syscall *sc = NULL;
   struct ptrace_io_desc iorequest;
+  struct freebsd_syscall *fsc;
+
   cpid = trussinfo->curthread->tid;
 
-  clear_fsc();
+  fsc = trussinfo->curthread->fsc;
+  if (fsc == NULL)
+  {
+   fsc = malloc(sizeof(*fsc));
+   if (fsc == NULL)
+  errx(1, "cannot allocate syscall struct");
+memset(fsc, 0, sizeof(*fsc));
+trussinfo->curthread->fsc = fsc;
+  } else
+clear_fsc(fsc);
   
  if (ptrace(PT_GETREGS, cpid, (caddr_t)&regs, 0) < 0)
   {
@@ -145,17 +157,24 @@
 break;
   }
 
-  fsc.number = syscall_num;
-  fsc.name =
+  fsc->number = syscall_num;
+  fsc->name =
 (syscall_num < 0 || syscall_num > nsyscalls) ? NULL : syscallnames[syscall_num];
-  if (!fsc.name) {
-fprintf(trussinfo->outfile, "-- UNKNOWN SYSCALL %d --\n", syscall_num);
+  if (!fsc->name) {
+fprintf(trussinfo->outfile, "-- UNKNOWN SYSCALL %u (0x%08x) --\n", syscall_num, syscall_num);
+if ((unsigned int)syscall_num > 0x1000) {
+  /* When attaching to a running process, we have a 50-50 chance
+ of attaching to a process waiting in a syscall, which means
+ our first trap is an exit instead of an entry and we're out
+ of synch. Reset our flag */
+  trussinfo->curthread->in_syscall = 0;
+}
   }
 
-  if (fsc.name && (trussinfo->flags & FOLLOWFORKS)
-   && ((!strcmp(fsc.name, "fork")
-|| !strcmp(fsc.name, "rfork")
-|| !strcmp(fsc.name, "vfork"))))
+  if (fsc->name && (trussinfo->flags & FOLLOWFORKS)
+   && ((!strcmp(fsc->name, "fork")
+|| !strcmp(fsc->name, "rfork")
+|| !strcmp(fsc->name, "vfork"))))
   {
 trussinfo->curthread->in_fork = 1;
   }
@@ -163,30 +182,30 @@
   if (nargs == 0)
 return;
 
-  fsc.args = malloc((1+nargs) * sizeof(unsigned long));
+  fsc->args = malloc((1+nargs) * sizeof(unsigned long));
   iorequest.piod_op = PIOD_READ_D;
   iorequest.piod_offs = (void *)parm_offset;
-  iorequest.piod_addr = fsc.args;
+  iorequest.piod_addr = fsc->args;
   iorequest.piod_len = (1+nargs) * sizeof(unsigned long);
   ptrace(PT_IO, cpid, (caddr_t)&iorequest, 0);
   if (iorequest.piod_len == 0)
 return;
 
-  if (fsc.name)
-

Re: Weird truss output

2008-12-03 Thread Dan Nelson

In the last episode (Dec 03), Vlad GALU said:
> On Wed, Dec 3, 2008 at 8:56 PM, Dan Nelson <[EMAIL PROTECTED]> wrote:
> [...]
> 
>   Am I doing something wrong? I've applied the full diff, rebuilt
> truss, now I get this:
> -- cut here --
> [EMAIL PROTECTED] / # truss -p 52731
> SIGNAL 17 (SIGSTOP)
> 
> -- UNKNOWN SYSCALL 1048535 --
> -- UNKNOWN SYSCALL 1048496 --
> -- UNKNOWN SYSCALL 1048559 --
> -- UNKNOWN SYSCALL 1048559 --
> -- UNKNOWN SYSCALL -8464 --
> -- UNKNOWN SYSCALL -8464 --
> -- UNKNOWN SYSCALL 527 --
> -- UNKNOWN SYSCALL 527 --
> /100084: read(1074283119,"\M-|\M^WP\^A",7356800) = 4 (0x4)
> -- UNKNOWN SYSCALL 527 --
> -- UNKNOWN SYSCALL 7385248 --
> -- and here --
> 
>  Perhaps I should mention that I block all signals from all  threads,
> except for one, where I do all the handling/cleanup.

So you're back to your original behaviour basically?  Not sure what's
wrong; it all works great on my machine...  Are you on a 64-bit system? 
I only have a Pentium-III here, so the big patch isn't guaranteed to
work on anything except i386.  The little patch inlined in my previous
email is for i386-fbsd.c, but you should be able to make similar
changes to amd64-fbsd.c (most of the diff just replaces "fsc." with
"fsc->" ).

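The idea behind the small patch (one syscall record per traced thread instead of a single global fsc) can be sketched in userland. The names below (struct syscall_rec, struct thread_state, syscall_enter(), syscall_exit()) are hypothetical simplifications, not truss's actual code; only the lazy per-thread allocation mirrors the trussinfo->curthread->fsc handling in the diff.

```c
#include <assert.h>
#include <err.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for truss's struct freebsd_syscall. */
struct syscall_rec {
	const char *name;
	int number;
};

/* Hypothetical per-thread state, playing the role of trussinfo->curthread. */
struct thread_state {
	int tid;
	struct syscall_rec *fsc;	/* allocated on this thread's first syscall */
};

/* Syscall entry: record the call in this thread's own struct. */
static void
syscall_enter(struct thread_state *t, const char *name, int number)
{
	if (t->fsc == NULL) {
		t->fsc = malloc(sizeof(*t->fsc));
		if (t->fsc == NULL)
			errx(1, "cannot allocate syscall struct");
	}
	memset(t->fsc, 0, sizeof(*t->fsc));	/* like clear_fsc() */
	t->fsc->name = name;
	t->fsc->number = number;
}

/* Syscall exit: return the name recorded at entry by the same thread. */
static const char *
syscall_exit(const struct thread_state *t)
{
	assert(t->fsc != NULL);	/* an exit with no recorded entry */
	return (t->fsc->name);
}
```

With one global record, thread B entering write() while thread A is still blocked in read() would overwrite A's entry, so A's eventual return would be printed under B's syscall. With a record per thread, each exit finds the entry made by the same thread.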
-- 
Dan Nelson
[EMAIL PROTECTED]
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: Why is NFSv4 so slow? (root/toor)

2010-06-29 Thread Dan Nelson

In the last episode (Jun 29), Rick C. Petty said:
> On Tue, Jun 29, 2010 at 10:20:57AM -0500, Adam Vande More wrote:
> > On Tue, Jun 29, 2010 at 9:58 AM, Rick Macklem  wrote:
> > 
> > > I suppose if the FreeBSD world feels that "root" and "toor" must both
> > > exist in the password database, then "nfsuserd" could be hacked to
> > > handle the case of translating uid 0 to "root" without calling
> > > getpwuid().  It seems ugly, but if deleting "toor" from the password
> > > database upsets people, I can do that.
> > 
> > I agree with Ian on this.  I don't use toor either, but have seen people
> > use it, and sometimes it will get recommended here for various reasons
> > e.g.  running a root account with a different default shell.  It
> > wouldn't bother me having to do this provided it was documented, but
> > having to do so would be a POLA violation to many users I think.
> 
> To be fair, I'm not sure this is even a problem.  Rick M. only suggested
> it as a possibility.  I would think that getpwuid() would return the first
> match which has always been root.  At least that's what it does when
> scanning the passwd file; I'm not sure about NIS.  If someone can prove
> that this will cause a problem with NFSv4, we could consider hacking it.
> Otherwise I don't think we should change this behavior yet.

If there are multiple users that map to the same userid, nscd on Linux will
select one name at random and return it for getpwuid() calls.  I haven't
seen this behaviour on FreeBSD or Solaris, though.  They always seem to
return the first entry in the passwd file.
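The first-match behaviour can be illustrated with a linear scan over passwd(5)-format text. first_name_for_uid() is a hypothetical helper for this sketch, not libc's getpwuid(), which may read a database or consult NIS rather than scanning the text file; it only shows why a lookup that stops at the first matching uid returns "root" when root is listed before toor.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy into `out` the login name from the FIRST
 * line of a passwd(5)-format buffer whose uid (3rd colon-separated
 * field) equals `uid`.  Returns 0 on a match, -1 if no line matches.
 */
static int
first_name_for_uid(const char *passwd, long uid, char *out, size_t outlen)
{
	const char *line = passwd;

	assert(out != NULL && outlen > 0);
	while (*line != '\0') {
		const char *eol = strchr(line, '\n');
		size_t len = eol != NULL ? (size_t)(eol - line) : strlen(line);
		const char *c1 = memchr(line, ':', len);
		const char *c2 = NULL;

		/* Skip comments; find the colon that ends the password field. */
		if (c1 != NULL && line[0] != '#')
			c2 = memchr(c1 + 1, ':', len - (size_t)(c1 + 1 - line));
		if (c2 != NULL && isdigit((unsigned char)c2[1])) {
			char *end;
			long u = strtol(c2 + 1, &end, 10);
			size_t namelen = (size_t)(c1 - line);

			if (*end == ':' && u == uid && namelen < outlen) {
				memcpy(out, line, namelen);
				out[namelen] = '\0';
				return (0);	/* stop at the first match */
			}
		}
		if (eol == NULL)
			break;
		line = eol + 1;
	}
	return (-1);
}
```

A resolver that instead picks among all matching entries (as described for nscd on Linux above) could return either name for uid 0.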

-- 
Dan Nelson
dnel...@allantgroup.com

Re: 8.1R ZFS almost locking up system

2010-08-21 Thread Dan Nelson

In the last episode (Aug 21), Tim Bishop said:
> I've had a problem on a FreeBSD 8.1R system for a few weeks. It seems
> that ZFS gets into an almost unresponsive state. Last time it did it
> (two weeks ago) I couldn't even log in, although the system was up, this
> time I could manage a reboot but couldn't stop any applications (they
> were likely hanging on I/O).

Could your pool be very close to full?  Zfs will throttle itself when it's
almost out of disk space.  I know it's "saved" me from filling up my
filesystems a couple times :)

> A few items from top, including zfskern:
> 
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>     5 root        4  -8    -     0K    60K zio->i  0  54:38  3.47% zfskern
> 91775 70          1  44    0 53040K 31144K tx->tx  1   2:11  0.00% postgres
> 39661 tdb         1  44    0 55776K 32968K tx->tx  0   0:39  0.00% mutt
> 14828 root        1  47    0 14636K  1572K tx->tx  1   0:03  0.00% zfs
> 11188 root        1  51    0 14636K  1572K tx->tx  0   0:03  0.00% zfs
> 
> At some point during this process my zfs snapshots have been failing to
> complete:
> 
> root     5  0.8  0.0     0    60  ??  DL    7Aug10  54:43.83 [zfskern]
> root  8265  0.0  0.0 14636  1528  ??  D    10:00AM   0:03.12 zfs snapshot -r po...@2010-08-21_10:00:01--1d
> root 11188  0.0  0.1 14636  1572  ??  D    11:00AM   0:02.93 zfs snapshot -r po...@2010-08-21_11:00:01--1d
> root 14828  0.0  0.1 14636  1572  ??  D    12:00PM   0:03.04 zfs snapshot -r po...@2010-08-21_12:00:00--1d
> root 17862  0.0  0.1 14636  1572  ??  D     1:00PM   0:01.96 zfs snapshot -r po...@2010-08-21_13:00:01--1d
> root 20986  0.0  0.1 14636  1572  ??  D     2:00PM   0:02.07 zfs snapshot -r po...@2010-08-21_14:00:01--1d

procstat -k on some of these processes might help to pinpoint what part of
the zfs code they're all waiting in.

-- 
Dan Nelson
dnel...@allantgroup.com

Re: 8.1R ZFS almost locking up system

2010-08-31 Thread Dan Nelson

In the last episode (Aug 31), Tim Bishop said:
> On Sat, Aug 21, 2010 at 05:24:29PM -0500, Dan Nelson wrote:
> > In the last episode (Aug 21), Tim Bishop said:
> > > A few items from top, including zfskern:
> > > 
> > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> > >     5 root        4  -8    -     0K    60K zio->i  0  54:38  3.47% zfskern
> > > 91775 70          1  44    0 53040K 31144K tx->tx  1   2:11  0.00% postgres
> > > 39661 tdb         1  44    0 55776K 32968K tx->tx  0   0:39  0.00% mutt
> > > 14828 root        1  47    0 14636K  1572K tx->tx  1   0:03  0.00% zfs
> > > 11188 root        1  51    0 14636K  1572K tx->tx  0   0:03  0.00% zfs
> > > 
> > > At some point during this process my zfs snapshots have been failing to
> > > complete:
> > > 
> > > root     5  0.8  0.0     0    60  ??  DL    7Aug10  54:43.83 [zfskern]
> > > root  8265  0.0  0.0 14636  1528  ??  D    10:00AM   0:03.12 zfs snapshot -r po...@2010-08-21_10:00:01--1d
> > > root 11188  0.0  0.1 14636  1572  ??  D    11:00AM   0:02.93 zfs snapshot -r po...@2010-08-21_11:00:01--1d
> > > root 14828  0.0  0.1 14636  1572  ??  D    12:00PM   0:03.04 zfs snapshot -r po...@2010-08-21_12:00:00--1d
> > > root 17862  0.0  0.1 14636  1572  ??  D     1:00PM   0:01.96 zfs snapshot -r po...@2010-08-21_13:00:01--1d
> > > root 20986  0.0  0.1 14636  1572  ??  D     2:00PM   0:02.07 zfs snapshot -r po...@2010-08-21_14:00:01--1d
> > 
> > procstat -k on some of these processes might help to pinpoint what part of
> > the zfs code they're all waiting in.
> 
> It happened again this Saturday (clearly something in the weekly
> periodic run is triggering the issue). procstat -kk shows the following
> for processes doing something zfs related (where zfs related means the
> string 'zfs' in the procstat -kk output):
> 
> 0 100084 kernel   zfs_vn_rele_task mi_switch+0x16f 
> sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 
> fork_trampoline+0xe 
> 5 100031 zfskern  arc_reclaim_thre mi_switch+0x16f 
> sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 
> fork_exit+0x118 fork_trampoline+0xe 
> 5 100032 zfskern  l2arc_feed_threa mi_switch+0x16f 
> sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be 
> fork_exit+0x118 fork_trampoline+0xe 
> 5 100085 zfskern  txg_thread_enter mi_switch+0x16f 
> sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 
> fork_exit+0x118 fork_trampoline+0xe 
> 5 100086 zfskern  txg_thread_enter mi_switch+0x16f 
> sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea 
> spa_sync+0x355 txg_sync_thread+0x195 fork_exit+0x118 fork_trampoline+0xe 
> 17 100040 syncer   - mi_switch+0x16f
> sleepq_wait+0x42 _cv_wait+0x111 txg_wait_synced+0x7c zil_commit+0x416 
> zfs_sync+0xa6 sync_fsync+0x184 sync_vnode+0x16b sched_sync+0x1c9 
> fork_exit+0x118 fork_trampoline+0xe 
>  2210 100156 syslogd  - mi_switch+0x16f
> sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 
> VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 
> writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 
>  3500 100177 syslogd  - mi_switch+0x16f
> sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 
> VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 
> writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 
>  3783 100056 syslogd  - mi_switch+0x16f
> sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 zfs_freebsd_write+0x378 
> VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x85 kern_writev+0x60 
> writev+0x41 syscall+0x1e7 Xfast_syscall+0xe1 
>  4064 100165 mysqld   initial thread   mi_switch+0x16f 
> sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c 
> zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc 
> vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 closef+0x3b kern_close+0x14d 
> syscall+0x1e7 Xfast_syscall+0xe1 
>  4441 100224 python2.6 initial thread   mi_switch+0x16f
> sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c 
> zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc 
> null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f 
> vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
>   100227 python2.6 initial thread   mi_swit

Re: 8.1R ZFS almost locking up system

2010-09-02 Thread Dan Nelson

In the last episode (Sep 01), Tim Bishop said:
> On Tue, Aug 31, 2010 at 10:58:29AM -0500, Dan Nelson wrote:
> > In the last episode (Aug 31), Tim Bishop said:
> > > It happened again this Saturday (clearly something in the weekly
> > > periodic run is triggering the issue).  procstat -kk shows the
> > > following for processes doing something zfs related (where zfs related
> > > means the string 'zfs' in the procstat -kk output):
> > > 
> > > 0 100084 kernel   zfs_vn_rele_task mi_switch+0x16f 
> > > sleepq_wait+0x42 _sleep+0x31c taskqueue_thread_loop+0xb7 fork_exit+0x118 
> > > fork_trampoline+0xe 
> > > 5 100031 zfskern  arc_reclaim_thre mi_switch+0x16f 
> > > sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 
> > > fork_exit+0x118 fork_trampoline+0xe 
> > > 5 100032 zfskern  l2arc_feed_threa mi_switch+0x16f 
> > > sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be 
> > > fork_exit+0x118 fork_trampoline+0xe 
> > > 5 100085 zfskern  txg_thread_enter mi_switch+0x16f 
> > > sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 
> > > txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe 
> > > 5 100086 zfskern  txg_thread_enter mi_switch+0x16f 
> > > sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dsl_pool_sync+0xea 
> > > spa_sync+0x355 txg_sync_thread+0x195 fork_exit+0x118 fork_trampoline+0xe 
> > > 17 100040 syncer   - mi_switch+0x16f
> > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_synced+0x7c zil_commit+0x416 
> > > zfs_sync+0xa6 sync_fsync+0x184 sync_vnode+0x16b sched_sync+0x1c9 
> > > fork_exit+0x118 fork_trampoline+0xe 
> > >  2210 100156 syslogd  - mi_switch+0x16f
> > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 
> > > zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 
> > > dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 
> > > Xfast_syscall+0xe1 
> > >  3500 100177 syslogd  - mi_switch+0x16f
> > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 
> > > zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 
> > > dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 
> > > Xfast_syscall+0xe1 
> > >  3783 100056 syslogd  - mi_switch+0x16f
> > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 
> > > zfs_freebsd_write+0x378 VOP_WRITE_APV+0xb2 vn_write+0x2d7 
> > > dofilewrite+0x85 kern_writev+0x60 writev+0x41 syscall+0x1e7 
> > > Xfast_syscall+0xe1 
> > >  4064 100165 mysqld   initial thread   mi_switch+0x16f 
> > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c 
> > > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc 
> > > vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 closef+0x3b kern_close+0x14d 
> > > syscall+0x1e7 Xfast_syscall+0xe1 
> > >  4441 100224 python2.6 initial thread   mi_switch+0x16f
> > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c 
> > > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc 
> > > null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f 
> > > vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
> > >   100227 python2.6 initial thread   mi_switch+0x16f
> > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c 
> > > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc 
> > > null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f 
> > > vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
> > >  4445 100228 python2.6 initial thread   mi_switch+0x16f
> > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c 
> > > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc 
> > > null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f 
> > > vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
> > >  4446 100229 python2.6 initial thread   mi_switch+0x16f
> > > sleepq_wait+0x42 _cv_wait+0x111 txg_wait_open+0x85 dmu_tx_assign+0x16c 
> > > zfs_inactive+0xd9 zfs_freebsd_inactive+0x1a vinactive+0x6a vputx+0x1cc 
> > > null_reclaim+0xbc vgonel+0x12e vrecycle+0x7d null_inactive+0x1f 
> > > vinactive+0x6a vputx+0x1cc vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23 
> > >  4447 100

Re: TTY task group scheduling

2010-11-19 Thread Dan Nelson

In the last episode (Nov 19), Alexander Leidinger said:
> Quoting Alexander Best  (from Fri, 19 Nov 2010 00:17:10 
> +):
> > 17:51 @  Genesys : Luigi Rizzo had a plugabble scheduler back in 4.* or 
> > thereabouts
> > 17:51 @  Genesys : you could kldload new ones and switch to them on the fly
> > 17:52 @  arundel : wow. that sounds cool. too bad it didn't make it  
> > into src tree. by now it's probably outdated and needs to be reworked quite 
> > a bit.
> > 
> >
> > does anybody know something about this?
> 
> I'm aware of the I/O scheduling code (which is now available at least  
> in -current), but I do not remember CPU scheduling code from Luigi.  
> Are you sure Genesys didn't mix up something by accident?

I am rarely mixed up :)  A quick search didn't bring up a direct reference,
but here's a mention of it from Luigi:

http://lists.freebsd.org/pipermail/freebsd-hackers/2004-November/008891.html

-- 
Dan Nelson
dnel...@allantgroup.com

Re: Panic in ZFS layer on 8.1-STABLE

2010-12-15 Thread Dan Nelson

In the last episode (Dec 15), Andriy Gapon said:
> on 15/12/2010 10:28 Jeremie Le Hen said the following:
> > Hi,
> > 
> > [ Please Cc: me when replying, as I'm not subscribed to -sta...@. ]
> > 
> > My filer at home runs FreeBSD.  A single data RAID-1 zpool with 10~15
> > datasets, two of them using compression.  Over the night, I got the
> > following panic:
> 
> Thanks for the stack trace!
> But where is the promised panic message? :)
> 
> I suspect that you ran out of kernel address space.
> You'd probably have to tune your system and/or add more memory.
> Please research this topic via mailing lists archives.
> 
> > Tracing pid 0 tid 100111 td 0x86393a00
> > kdb_enter(809faa5b,809faa5b,80a12e84,cb114aec,0,...) at kdb_enter+0x3a
> > panic(80a12e84,1c000,2e3e8000,80a12e7e,7d0,...) at panic+0x131
> > kmem_malloc(8169008c,1c000,2,cb114b6c,80909a99,...) at kmem_malloc+0x285
> > page_alloc(0,1c000,cb114b5f,2,2f0c800,...) at page_alloc+0x27
> > uma_large_malloc(1c000,2,0,8609b3f0,30,...) at uma_large_malloc+0x4a
> > malloc(1c000,860b2120,2,cb114bb0,8601d36d,...) at malloc+0x7c
> > zfs_kmem_alloc(1c000,2,cb114bf0,8601f77b,1c000,...) at
> > zfs_kmem_alloc+0x20
> > zio_buf_alloc(1c000,cb114c30,86008817,92c33bd0,cb114bf0,...) at
> > zio_buf_alloc+0x44
> > zio_compress_data(3,b4264000,2,0,cb114c58,...) at
> > zio_compress_data+0x8b

The following patch may help you.  It helps me :)  It converts the
zio_buf_alloc() call into a zio_buf_alloc_nowait(), so that if the alloc
fails, zio_compress_data() returns failure and zfs writes the block
uncompressed instead of panicking.

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
===
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c	(revision 216418)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c	(working copy)
@@ -202,6 +202,20 @@ zio_buf_alloc(size_t size)
return (kmem_alloc(size, KM_SLEEP));
 }
 
+void *
+zio_buf_alloc_nowait(size_t size)
+{
+#ifdef ZIO_USE_UMA
+   size_t c = (size - 1) >> SPA_MINBLOCKSHIFT;
+
+   ASSERT(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT);
+
+   return (kmem_cache_alloc(zio_buf_cache[c], KM_NOSLEEP));
+#else
+   return (kmem_alloc(size, KM_NOSLEEP));
+#endif
+}
+
 /*
  * Use zio_data_buf_alloc to allocate data.  The data will not appear in a
  * crashdump if the kernel panics.  This exists so that we will limit the amount
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c
===
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c	(revision 216418)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c	(working copy)
@@ -32,6 +32,12 @@
 #include 
 #include 
 
+int panics_avoided_by_not_compressing = 0;
+SYSCTL_DECL(_vfs_zfs);
+SYSCTL_INT(_vfs_zfs, OID_AUTO, compression_panics_avoided, CTLFLAG_RD, 
+   &panics_avoided_by_not_compressing, 0, "kmem_map panics avoided by skipping compression when memory is low");
+
 /*
  * Compression vectors.
  */
@@ -109,7 +115,17 @@ zio_compress_data(int cpfunc, void *src, uint64_t
destbufsize = P2ALIGN(srcsize - (srcsize >> 3), SPA_MINBLOCKSIZE);
if (destbufsize == 0)
return (0);
+
+#if 1
+   dest = zio_buf_alloc_nowait(destbufsize);
+   if (dest == 0)
+   {
+   panics_avoided_by_not_compressing++;
+   return (0);
+   }
+#else
dest = zio_buf_alloc(destbufsize);
+#endif
 
ciosize = ci->ci_compress(src, dest, (size_t)srcsize,
(size_t)destbufsize, ci->ci_level);
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
===
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h	(revision 216418)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h	(working copy)
@@ -398,6 +398,7 @@ extern zio_t *zio_unique_parent(zio_t *cio);
 extern void zio_add_child(zio_t *pio, zio_t *cio);
 
 extern void *zio_buf_alloc(size_t size);
+extern void *zio_buf_alloc_nowait(size_t size);
 extern void zio_buf_free(void *buf, size_t size);
 extern void *zio_data_buf_alloc(size_t size);
 extern void zio_data_buf_free(void *buf, size_t size);

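The fallback in the patch (try a non-sleeping allocation and, if it fails, report "did not compress" so the block is written uncompressed) can be illustrated in userland. This is a sketch under stated assumptions: try_alloc stands in for zio_buf_alloc_nowait(), toy_compress() is a made-up zero-run encoder standing in for ci->ci_compress, and the P2ALIGN rounding to SPA_MINBLOCKSIZE is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Plays the role of the patch's panics_avoided_by_not_compressing. */
static int compress_skipped = 0;

/* Allocator hook; like a KM_NOSLEEP allocation, it may return NULL. */
typedef void *(*alloc_fn)(size_t);

/*
 * Made-up compressor standing in for ci->ci_compress: encodes each run of
 * zero bytes (up to 255) as the pair (0, length) and copies other bytes.
 * Returns the output size, or 0 if the output would not fit in dstlen.
 */
static size_t
toy_compress(const unsigned char *src, size_t srclen,
    unsigned char *dst, size_t dstlen)
{
	size_t i = 0, o = 0;

	while (i < srclen) {
		if (src[i] == 0) {
			size_t run = 1;

			while (i + run < srclen && src[i + run] == 0 && run < 255)
				run++;
			if (o + 2 > dstlen)
				return (0);
			dst[o++] = 0;
			dst[o++] = (unsigned char)run;
			i += run;
		} else {
			if (o + 1 > dstlen)
				return (0);
			dst[o++] = src[i++];
		}
	}
	return (o);
}

/*
 * Returns the compressed size with the buffer in *out, or 0 meaning
 * "write this block uncompressed".  An allocation failure takes the same
 * 0 path instead of sleeping or panicking.
 */
static size_t
compress_or_skip(const unsigned char *src, size_t srclen,
    alloc_fn try_alloc, unsigned char **out)
{
	/* Demand at least 1/8 savings, as srcsize - (srcsize >> 3) does. */
	size_t destlen = srclen - (srclen >> 3);
	unsigned char *dest;
	size_t clen;

	assert(src != NULL && out != NULL);
	*out = NULL;
	if (destlen == 0)
		return (0);
	dest = try_alloc(destlen);
	if (dest == NULL) {
		compress_skipped++;	/* low memory: skip compression */
		return (0);
	}
	clen = toy_compress(src, srclen, dest, destlen);
	if (clen == 0) {
		free(dest);		/* did not compress well enough */
		return (0);
	}
	*out = dest;
	return (clen);
}

/* Example allocators: one backed by malloc(), one that always fails. */
static void *ok_alloc(size_t n) { return (malloc(n)); }
static void *fail_alloc(size_t n) { (void)n; return (NULL); }
```

Returning 0 reuses the existing "not worth compressing" path, so callers need no new error handling; the counter corresponds to the read-only sysctl the patch adds.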


-- 
Dan Nelson
dnel...@allantgroup.com

Re: link aggregation - bundling 2 lagg interfaces together

2011-02-04 Thread Dan Nelson

In the last episode (Feb 04), Damien Fleuriot said:
> I have a firewall with 2x Intel pro dual port cards.
> 
> On Intel A , port 1 goes to switch 1, port 2 goes to switch 2
> On Intel B , port 1 goes to switch 1, port 2 goes to switch 2
> 
> I have created the following 2 lagg devices using LACP:
> 
> lagg0 = A1 + B1
> lagg1 = A2 + B2
> 
> This works fine.
> 
> Now, what I had in mind was creating a lagg2 device using lagg0 and
> lagg1 with failover.
> 
> That would provide redundancy in case of a switch failure.
> 
> ifconfig won't let me though:
> 
> # ifconfig lagg2 laggproto failover laggport lagg0 laggport lagg1
> ifconfig: SIOCSLAGGPORT: Invalid argument
> 
> I suppose it's not possible to aggregate lagg interfaces ?

Apparently not: http://fxr.watson.org/fxr/source/net/if_lagg.c#L516

It looks like there is preliminary code under #ifdef LAGG_PORT_STACKING, but
it claims to be untested.

-- 
Dan Nelson
dnel...@allantgroup.com

Re: 3TB disc and block alignment

2011-02-17 Thread Dan Nelson

In the last episode (Feb 17), Daniel Kalchev said:
> >>> da0:  Fixed Direct Access SCSI-5 device
> >>> da0: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)
> >
> > Thanks -- is it also possible to have something like
> >
> > da0: 2861588MB (732566646 4096 byte sectors: 255H 63S/T 364801C)
>
> According to Hitachi, this is an 512b drive.

Correct.  This isn't a 4k drive.  Datasheet:

http://www.hgst.com/internal-drives/enterprise/ultrastar/ultrastar-7k3000

Sector size (variable, Bytes/sector)    512
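As a sanity check, the quoted dmesg figures are consistent with 512-byte sectors: 5860533168 x 512 bytes is 2861588 "MB" when MB means 2^20 bytes, and the same capacity in 4096-byte sectors would be 5860533168 / 8 = 732566646, the number in the hypothetical line above. The helper names below are made up for illustration; is_4k_aligned() encodes the usual rule that a start offset given in 512-byte sectors is 4K-aligned when it is a multiple of 8, which matters on 4K-sector drives rather than on a 512-byte drive like this one.

```c
#include <assert.h>
#include <stdint.h>

/* Capacity in 2^20-byte units, truncated; reproduces the quoted "MB" figure. */
static uint64_t
capacity_mb(uint64_t sectors, uint64_t sector_size)
{
	return (sectors * sector_size / (1024 * 1024));
}

/* Re-express a sector count in a different sector size (must divide evenly). */
static uint64_t
convert_sectors(uint64_t sectors, uint64_t from_size, uint64_t to_size)
{
	assert((sectors * from_size) % to_size == 0);
	return (sectors * from_size / to_size);
}

/* A start offset in 512-byte sectors is 4K-aligned if it is a multiple of 8. */
static int
is_4k_aligned(uint64_t start_sector_512)
{
	return (start_sector_512 % 8 == 0);
}
```

For example, the traditional MBR start at sector 63 is not 4K-aligned, while a start at sector 2048 is.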

-- 
Dan Nelson
dnel...@allantgroup.com

Re: (MORE INFO) Ext firewire drive not mounted after update

2009-11-13 Thread Dan Nelson

In the last episode (Nov 13), Robert said:
> On Fri, 13 Nov 2009 13:01:47 -0800
> Robert  wrote:
> 
> In the time honored FreeBSD tradition, I am replying to my own email. 
> 
> I booted with a 8.0RC2 livefs CD and the external disk shows up as
> /dev/da0, /das1, /das1d.  I then connected the external drive via USB and
> rebooted to the 8.0 Prerelease system.  The drive shows up and is able to
> mount.
> 
> It appears that some thing is amiss with the latest version. I will
> download the latest livefs iso and see if that works.

I think I remember seeing a posting within the last few days saying that the
"sbp" device wasn't going to be compiled into the 8.0-release kernel due to
it causing hangs on boot.  If you run "kldload sbp" as root after the system
has booted you should see your disk devices appear.

I can't find the list post mentioning it, but here's the svn commit log:

r199112 | kensmith | 2009-11-09 15:39:42 -0600 (Mon, 09 Nov 2009) | 11 lines

Changed paths:
   M /stable/8/sys/amd64/conf/GENERIC
   M /stable/8/sys/i386/conf/GENERIC
   M /stable/8/sys/ia64/conf/GENERIC
   M /stable/8/sys/powerpc/conf/GENERIC
   M /stable/8/sys/sparc64/conf/GENERIC

Comment out the sbp(4) entry for GENERIC config files that contain it. 
There are known issues with this driver that are beyond what can be fixed
for 8.0-RELEASE and the bugs can cause boot failure on some systems.  It's
not clear if it impacts all systems and there is interest in getting the
problem fixed so for now just comment it out instead of remove it.

Commit straight to stable/8, this is an 8.0-RELEASE issue.  Head was left
alone so work on it can continue there.

Reviewed by:    Primary misc. architecture maintainers (marcel, marius)

-- 
Dan Nelson
dnel...@allantgroup.com

Re: (MORE INFO) Ext firewire drive not mounted after update

2009-11-15 Thread Dan Nelson

In the last episode (Nov 13), Robert said:
> On Fri, 13 Nov 2009 17:15:39 -0600 Dan Nelson  wrote:
> > In the last episode (Nov 13), Robert said:
> > > On Fri, 13 Nov 2009 13:01:47 -0800
> > > Robert  wrote:
> > > It appears that some thing is amiss with the latest version. I will
> > > download the latest livefs iso and see if that works.
> > 
> > I think I remember seeing a posting within the last few days saying that
> > the "sbp" device wasn't going to be compiled into the 8.0-release kernel
> > due to it causing hangs on boot.  If you run "kldload sbp" as root after
> > the system has booted you should see your disk devices appear.
>
> Thanks for responding. I checked and the "sbp" device is in fact commented
> out.  I do remember a thread a month or two back about some folks having
> trouble with firewire drives.  I never experienced any of that trouble on
> this system.
> 
> I can continue to operate my drive on USB but I may need firewire in the
> near future.  I have a friend who is a photographer and I archive her
> photos for her.  She sends me an external drive or two and I burn her
> projects onto DVD.  I am not sure if her drives have a USB connector.

Note that you can still run "kldload sbp" after bootup to see FireWire
disks.  You can also try adding "device sbp" back to your kernel config and
see if it works for you.  The hangs apparently only happen on certain
motherboards.

-- 
Dan Nelson
dnel...@allantgroup.com

Re: 8.0-RC USB/FS problem

2009-11-24 Thread Dan Nelson

In the last episode (Nov 24), Jeremy Chadwick said:
> On Tue, Nov 24, 2009 at 06:13:21PM +0100, Hans Petter Selasky wrote:
> > On Tuesday 24 November 2009 17:58:47 Guojun Jin wrote:
> > > Sorry for the typo -- it is public not pub in the middle. The others should
> > > be all public.
> > >
> > > http:/www.daemonfun.com/archives/public/USB/crash1-reset.bz2
> > >
> > 
> > %fetch http:/www.daemonfun.com/archives/public/USB/crash1-reset.bz2
> > fetch: http:/www.daemonfun.com/archives/public/USB/crash1-reset.bz2: No address record
> 
> The above issue is unrelated to the USB/FS problem.  It looks like
> fetch(1) has a parser bug.  Note the text portion between the URI and URL
> is colon-slash not colon-slash-slash like it should be.

That's a typo in the URL, not a bug in fetch :)

-- 
Dan Nelson
dnel...@allantgroup.com

Re: how to get the UFSID of a mounted filesystem ?

2009-11-29 Thread Dan Nelson

In the last episode (Nov 30), Pete French said:
> I observe that when I mount a UFS filesystem using the device name then
> the entry vanishes from /dev/ufsid, and glabel list no longer shows the
> device.  Which begs the question, how do I find out the ufsid of a mounted
> filesystem (e.g.  '/' so that I can change its fstab entry for the next
> reboot?)
> 
> Am slightly embarrassed to have to ask for help! Am sure this was easy and
> in dmesg last time I did this...

Easiest way is to run dumpfs on the device you currently have mounted.  The
fsid will be on the 2nd line of the output:

(r...@studio) /root># dumpfs /dev/da2s1a | head -2
magic   19540119 (UFS2) time    Sun Nov 29 18:19:39 2009
superblock location 65536   id  [ 49b21fba 667e8575 ]

Next easiest is to run "mount -v" as root, which will give you the fsid, but
byte-swapped so you have to mess with it to get a value that matches what
glabel expects:

/dev/ufsid/49b21fba667e8575 on /tmp/z (ufs, local, soft-updates, fsid ba1fb24975857e66)
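In this example the two outputs differ only by a per-word byte swap: dumpfs prints id [ 49b21fba 667e8575 ], and mount -v prints ba1fb24975857e66, which is each 32-bit half with its bytes reversed (presumably an artifact of printing the words on a little-endian machine). A minimal C sketch of the conversion follows; mount_fsid_to_ufsid() is a hypothetical helper that only reproduces the relationship observed here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reverse the byte order of a 32-bit word (0x11223344 -> 0x44332211). */
static uint32_t
swap32(uint32_t x)
{
	return ((x >> 24) | ((x >> 8) & 0x0000ff00u) |
	    ((x << 8) & 0x00ff0000u) | (x << 24));
}

/*
 * Hypothetical helper: turn the 16-hex-digit fsid printed by "mount -v"
 * into the name used under /dev/ufsid/ by byte-swapping each 32-bit
 * half.  `out` must hold at least 17 bytes.  Returns 0 on success or -1
 * if the input is not exactly 16 hex digits.
 */
static int
mount_fsid_to_ufsid(const char *fsid, char *out, size_t outlen)
{
	unsigned int hi, lo;

	assert(out != NULL && outlen >= 17);
	if (strlen(fsid) != 16 ||
	    strspn(fsid, "0123456789abcdefABCDEF") != 16)
		return (-1);
	if (sscanf(fsid, "%8x%8x", &hi, &lo) != 2)
		return (-1);
	snprintf(out, outlen, "%08x%08x",
	    (unsigned int)swap32(hi), (unsigned int)swap32(lo));
	return (0);
}
```

Fed the fsid from the mount -v line above, it yields 49b21fba667e8575, matching the /dev/ufsid/ name and the dumpfs id.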

-- 
Dan Nelson
dnel...@allantgroup.com

Re: do I want ch0 or pass1?

2010-01-22 Thread Dan Nelson

In the last episode (Jan 21), Dan Langille said:
> Please CC me on replies.
> 
> I'm running into issues with hard-coding some devices (see recent post
> titled 'device.hints isn't setting what I want').
> 
> Associated with this issue is confusion over whether I want to use ch0
> or pass1.  I have these devices:
> 
> at scbus1 target 0 lun 0 (ch0,pass1)
> at scbus1 target 5 lun 0 (sa1,pass2)
> 
> My understanding: chio(1) will work with ch0, whereas mtx(1) will work with
> pass1.  Is this correct?  More information/elaboration will help I'm sure.
> 
> Why do I ask? I can get the tape changer and tape drive hardwired to ch0
> and sa1 respectively.  I cannot [yet] do the same with pass1.

You can try wiring them down the same way you wire down regular devices, but
if they're created sequentially in probe order, that won't work.

Ideally, mtx should use cam_open_spec_device() which, when given a device
name, will automatically open the matching pass device.

-- 
Dan Nelson
dnel...@allantgroup.com

Re: numeric sort(1) is broken on -STABLE

2010-02-10 Thread Dan Nelson

In the last episode (Feb 10), Ulrich Spörlein said:
> On Wed, 10.02.2010 at 13:49:05 +0300, Ruslan Ermilov wrote:
> > On Wed, Feb 10, 2010 at 09:58:14AM +0100, Ulrich Spörlein wrote:
> > > not sure if this is a pilot error, but it seems to me that gnu sort -n
> > > is broken on at least -STABLE (couldn't test -CURRENT yet).
> > > 
> > > It somehow does not manifest when using a simple list and sorting on a
> > > specific column, but it always happens to me when using it in
> > > combination with find(1).
> > > 
> > > % truncate -s10m a; truncate -s5m b; truncate -s800k c
> > > % find a b c -ls|sort -nk7,7
> > >  8   64 -rw-r--r--   1 uqs  wheel  10485760 Feb 10 09:13 a
> > > 10   64 -rw-r--r--   1 uqs  wheel   5242880 Feb 10 09:13 b
> > > 12   64 -rw-r--r--   1 uqs  wheel    819200 Feb 10 09:13 c
> > 
> > I bet you're using some non-C locale for LC_NUMERIC.  What does "locale"
> > output tell you?
> 
> Yes and no. LC_NUMERIC is still at C, LC_CTYPE is set to UTF-8, but as
> there are no non-ASCII symbols in that output it shouldn't matter, right? 
> For me, 819200 is smaller than 10485760 in pretty much all locales.  Why
> the hell is a numeric gnusort locale dependant?  Why is -g working anyway?

Try adding a 'b' to your sort flags.  I bet the leading spaces in front of
your numbers are being treated as part of the sort key.  Maybe de_DE.UTF-8
and C have different ideas of what is whitespace?
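The -b flag changes where a key starts. By default a sort(1) field includes the blanks in front of it, so column-aligned output like find -ls carries its padding into the key. The -b flag strips that padding. The sketch below models this in C under some simplifications:

- isblank() runs in the C locale.
- Keys run to the end of the line instead of stopping at field 7.
- field_start(), cmp_lexical_field() and cmp_numeric_field() are hypothetical helpers, not GNU sort's code.

The model does not check whether de_DE.UTF-8 actually changes GNU sort's blank handling.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Start of 1-based field n under sort(1)'s default rule: a field is a
 * run of non-blanks plus the blanks in front of it.  With skip_blanks
 * (the -b flag) those leading blanks are skipped.  Returns NULL if the
 * line has fewer than n fields.
 */
const char *
field_start(const char *line, int n, int skip_blanks)
{
	const char *p = line;

	for (int i = 1; i < n; i++) {
		while (isblank((unsigned char)*p))	/* separator blanks */
			p++;
		while (*p != '\0' && !isblank((unsigned char)*p))
			p++;				/* field body */
		if (*p == '\0')
			return (NULL);
	}
	if (skip_blanks)
		while (isblank((unsigned char)*p))
			p++;
	return (p);
}

/* Lexical comparison from the key start to end of line. */
int
cmp_lexical_field(const char *a, const char *b, int n, int skip_blanks)
{
	const char *ka = field_start(a, n, skip_blanks);
	const char *kb = field_start(b, n, skip_blanks);

	return (strcmp(ka ? ka : "", kb ? kb : ""));
}

/* Numeric comparison of field n; strtoll() skips white space itself. */
int
cmp_numeric_field(const char *a, const char *b, int n)
{
	const char *ka = field_start(a, n, 1);
	const char *kb = field_start(b, n, 1);
	long long va = ka ? strtoll(ka, NULL, 10) : 0;
	long long vb = kb ? strtoll(kb, NULL, 10) : 0;

	return ((va > vb) - (va < vb));
}
```

Consider two find -ls style lines whose seventh field is padded to different widths. The padded keys compare in one order and the blank-stripped keys in the other. The numeric comparison puts 819200 before 10485760, as expected.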

-- 
Dan Nelson
dnel...@allantgroup.com

Re: ugen kernel module?

2010-03-09 Thread Dan Nelson

In the last episode (Mar 10):
> In FreeBSD7 there was ugen.ko kernel module and I can use apcupsd with USB
> devices, but in FreeBSD there is no such module, how can I use APC power
> supply with usb interface (I mean usage of the apcupsd port)?

It's built into the usb subsystem now.  All USB devices (including USB hubs
and devices controlled by other drivers) now have a ugen device.  Try
running "usbconfig list" to show them.  I bet your UPS has just moved to a
different ugen number.

-- 
Dan Nelson
dnel...@allantgroup.com

Re: Make ZFS auto-destroy snapshots when the out of space?

2010-05-29 Thread Dan Nelson

In the last episode (May 29), Kirk Strauser said:
> I found some nice scripts to regularly snapshot all the filesystems in my
> ZFS pool at
> http://www.neces.com/blog/technology/integrating-freebsd-zfs-and-periodic-snapshots-and-scrubs
> . One thing bothers me, though: I have to intentionally set how many 
> months' worth of snapshots I want to keep. Too many and I run out of 
> room. Too few and I lose some of the benefits of easy recovery of 
> deleted data. My computer is better at bookkeeping than I am, so why not 
> let it?
> 
> I'd propose standardizing on an attribute like
> org.freebsd:allowautodestroy.  Modify ZFS's disk full behavior to scan for
> snapshots with that attribute set and destroy the oldest one, and continue
> until there's enough free space to complete a write requests or until out
> of "expendable" snapshots to destroy (at which time the normal disk full
> handler would run).  Also run a daily periodic script to ensure that the
> free space stays below a configurable threshold each day so that ZFS isn't
> constantly butting up against completely full drives.

If the kernel does the snapshot deleting itself, why not add a pool-level
property that sets the amount of free space at which the deletion starts? 
That way you don't need the cleanup script.  Alternatively, make the
org.freebsd:allowautodestroy property hold the trigger freespace amount. 
That way you can have monthly/daily/hourly snapshots but set it so the
hourly ones disappear first, then the dailies (by setting the destroy
trigger slightly higher for the ones you want to expire first).
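The per-snapshot trigger idea can be sketched as a selection loop. Destroy the oldest snapshot whose trigger is above the current free space, credit the space it frees, and repeat until no trigger fires. This is a hypothetical model, not ZFS code. struct snap, its fields, and expire_snapshots() are illustrative. So is the assumption that each snapshot frees a known number of bytes; in reality that depends on what neighbouring snapshots share.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot record (sizes in any consistent unit). */
struct snap {
	const char *name;
	int64_t     created;	/* creation time, seconds since epoch */
	uint64_t    trigger;	/* destroy once free space drops below this */
	uint64_t    freed;	/* space released if destroyed (assumed known) */
	int         destroyed;
};

/*
 * Destroy snapshots until free space is at or above every remaining
 * trigger.  Among snapshots whose trigger has fired, the oldest goes
 * first.  Returns how many were destroyed and updates *freep.
 */
size_t
expire_snapshots(struct snap *s, size_t n, uint64_t *freep)
{
	size_t count = 0;

	for (;;) {
		struct snap *victim = NULL;

		for (size_t i = 0; i < n; i++) {
			if (s[i].destroyed || *freep >= s[i].trigger)
				continue;
			if (victim == NULL || s[i].created < victim->created)
				victim = &s[i];
		}
		if (victim == NULL)
			return (count);
		victim->destroyed = 1;	/* stands in for "zfs destroy" */
		*freep += victim->freed;
		count++;
	}
}
```

Suppose hourly snapshots have a trigger of 20 and daily ones a trigger of 10. With 15 units free, only the hourlies are eligible. They are removed oldest first, and the older daily snapshot survives.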

-- 
Dan Nelson
dnel...@allantgroup.com

Re: questionable feature- rcvar woes

2007-11-28 Thread Dan Nelson

In the last episode (Nov 28), Andrei Kolu said:
> Something is wrong with rcvar or I am just blatant.
> 
> For example:
> 
> 1) Enable powerd in rc.conf
> # echo 'enable_powerd="YES"' >> /etc/rc.conf
> 2) Launch powerd
> # /etc/rc.d/powerd start
> Starting powerd.
> 3) And stopping it.
> # /etc/rc.d/powerd stop
> Stopping powerd.
> 
> Everything looks fine, but when I disable powerd in rc.conf then problem 
> arise.
> 
> 1) Disable powerd in rc.conf- comment it out.
> # enable_powerd="YES"
> 2) Stop powerd
> # /etc/rc.d/powerd stop
> ...silence- nothing in logs either.
> 
> What? Not even a warning message and powerd is actually running- why
> I have to reboot to disable it? I know that I can stop it by enabling
> it in rc.conf but what the point? Same problem when I want to start
> some service without appropriate line in rc.conf. I'd prefer to see
> somekind of warning about misconfigured rc.conf or at least
> information about what's going on in reality.

Try "/etc/rc.d/powerd forcestop".  What happens during startup and
shutdown is that all rc.d scripts are run with "start" or "stop"
arguments, and only the ones that have been enabled do anything.
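The startup/shutdown behaviour described above can be modelled in a few lines. Plain start and stop are gated on the service's rc.conf knob and quietly do nothing when it is off. A "force" prefix skips that check, as rc.subr(8) documents for forcestart and forcestop. This is a simplified C model, not rc.subr itself. checkyesno() mirrors the shell function's accepted "on" spellings (YES, TRUE, ON, 1, case-insensitive) but omits its warnings. rc_command_runs() is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Model of rc.subr's checkyesno: only these spellings count as "on". */
bool
checkyesno(const char *value)
{
	static const char *yes[] = { "YES", "TRUE", "ON", "1" };

	if (value == NULL)		/* knob not set at all */
		return (false);
	for (size_t i = 0; i < sizeof(yes) / sizeof(yes[0]); i++)
		if (strcasecmp(value, yes[i]) == 0)
			return (true);
	return (false);
}

/*
 * Would "/etc/rc.d/<service> <arg>" actually act?  Plain commands
 * need the rcvar enabled; a "force" prefix bypasses that check.
 */
bool
rc_command_runs(const char *arg, const char *rcvar_value)
{
	if (strncmp(arg, "force", 5) == 0)
		return (true);
	return (checkyesno(rcvar_value));
}
```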

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: mount -p and NFS options

2008-02-05 Thread Dan Nelson

In the last episode (Feb 04), Mike Andrews said:
> Is there anything like "mount -p" that will print the current NFS
> options in use?  TCP vs UDP, v2 vs v3, read/write sizes etc.  It
> doesn't have to be in fstab format; I just need to be able to see
> what the flags are for an active mount.
> 
> This would be useful in tracking down an irritating NFS problem I've
> been experiencing with diskless systems in every 6.x release and
> 7.0-RC1, namely libc.so.6 appears to be truncated or corrupt to the
> client at somewhat random times...  I think it may be related to
> mount options, hence the question.

Theoretically, any filesystem that uses nmount(2) should have its
options recorded in an easy-to-extract format, since one of the
arguments to nmount is an array of options.  I patched my kernel and
/sbin/mount binary to do this (borrowing the f_charspare field in
struct statfs), and it mostly works.  The values below in <> brackets
come from the options array.  You can see that cd9660 was mounted with
the option "ssector=0":

 ([EMAIL PROTECTED]) /root># mount
 local/root on / (zfs, NFS exported, local, )
 devfs on /dev (devfs, local, <>)
 /dev/ufs/boot on /.boot (ufs, local, soft-updates, )
 procfs on /proc (procfs, local, )
 /dev/md0 on /tmp (ufs, NFS exported, local, <>)
 /dev/cd0 on /cdrom (cd9660, NFS exported, local, read-only, )

Unfortunately, mount_nfs simply calls nmount with a single "nfs_args"
option whose value is the same binary "struct nfs_args" it used to call
mount(2) with :(  The fix would be to make nfs_vfsops.c and mount_nfs.c
use the options array instead of a custom struct, but
nfs_vfsops.c:nfs_decode_args scares me off every time I look at it.

-- 
Dan Nelson
[EMAIL PROTECTED]
Index: sys/kern/vfs_mount.c
===
RCS file: /home/ncvs/src/sys/kern/vfs_mount.c,v
retrieving revision 1.265.2.2
diff -u -p -r1.265.2.2 vfs_mount.c
--- sys/kern/vfs_mount.c	17 Jan 2008 04:24:53 -	1.265.2.2
+++ sys/kern/vfs_mount.c	18 Jan 2008 23:13:48 -
@@ -1020,6 +1020,40 @@ vfs_domount(
 		if (mp->mnt_opt != NULL)
 			vfs_freeopts(mp->mnt_opt);
 		mp->mnt_opt = mp->mnt_optnew;
+	
+		/* Collapse the mount options into a readable string */
+		mp->mnt_stat.f_charspare[0]=0;
+		if (mp->mnt_opt) {
+			struct vfsopt *opt;
+			struct sbuf *sb;
+
+			sb = sbuf_new(NULL, mp->mnt_stat.f_charspare,
+			    sizeof(mp->mnt_stat.f_charspare),
+			    SBUF_FIXEDLEN);
+			TAILQ_FOREACH(opt, mp->mnt_opt, link) {
+				/*
+				 * Skip options that are temporary, stored
+				 * elsewhere in struct statfs, or are structs
+				 */
+				if (strcmp(opt->name,"errmsg") == 0 ||
+				    strcmp(opt->name,"from") == 0 ||
+				    strcmp(opt->name,"fspath") == 0 ||
+				    strcmp(opt->name,"fstype") == 0 ||
+				    strcmp(opt->name,"nfs_args") == 0 ||
+				    strcmp(opt->name,"update") == 0 )
+					continue;
+				if (sbuf_len(sb))
+					sbuf_cat(sb, ",");
+				sbuf_cat(sb, opt->name);
+				if (opt->len) {
+					sbuf_cat(sb, "=");
+					sbuf_cat(sb, opt->value);
+				}
+			}
+			sbuf_finish(sb);
+			sbuf_delete(sb);
+		}
+
 		(void)VFS_STATFS(mp, &mp->mnt_stat, td);
 	}
 	/*
Index: sbin/mount/mount.c
===
RCS file: /home/ncvs/src/sbin/mount/mount.c,v
retrieving revision 1.96
diff -u -p -r1.96 mount.c
--- sbin/mount/mount.c	25 Jun 2007 05:06:54 -	1.96
+++ sbin/mount/mount.c	2 Oct 2007 21:20:18 -
@@ -596,6 +596,7 @@ prmount(struct statfs *sfp)
 			(void)printf(", %s", o->o_name);
 			flags &= ~o->o_opt;
 		}
+	printf(", <%s>",sfp->f_charspare);
 	/*
 	 * Inform when file system is mounted by an unprivileged user
 	 * or privileged non-root user.
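The core of the vfs_mount.c change is turning an option list into a "name=value,name" string that fits a fixed-size field. Below is a userland sketch of the same idea. It uses a plain array and snprintf() in place of the kernel's TAILQ and sbuf(9). struct opt, skip_opt() and join_opts() are illustrative names, not kernel types. Like the fixed-length sbuf, the output is simply truncated when the buffer fills.

```c
#include <stdio.h>
#include <string.h>

struct opt {
	const char *name;
	const char *value;	/* NULL for a flag option with no value */
};

/* Options that are temporary, kept elsewhere in statfs, or binary. */
static int
skip_opt(const char *name)
{
	static const char *skip[] = {
		"errmsg", "from", "fspath", "fstype", "nfs_args", "update"
	};

	for (size_t i = 0; i < sizeof(skip) / sizeof(skip[0]); i++)
		if (strcmp(name, skip[i]) == 0)
			return (1);
	return (0);
}

/* Join opts into buf as "a,b=c,...", truncating if buf is too small. */
void
join_opts(const struct opt *opts, size_t n, char *buf, size_t buflen)
{
	size_t len = 0;

	if (buflen == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < n && len < buflen - 1; i++) {
		if (skip_opt(opts[i].name))
			continue;
		int w = snprintf(buf + len, buflen - len, "%s%s%s%s",
		    len ? "," : "", opts[i].name,
		    opts[i].value ? "=" : "",
		    opts[i].value ? opts[i].value : "");
		if (w < 0)
			break;
		len += (size_t)w;
		if (len >= buflen)	/* snprintf truncated the output */
			len = buflen - 1;
	}
}
```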

Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Dan Nelson

In the last episode (Feb 08), Joe Peterson said:
> Mark Day wrote:
> > Based on the subset of data you posted, the bad data looks like
> > ASCII text. The bad data from offset a to a000f is:
> >
> > ${138AFE{@
> > @$$}1
> >
> > The bad data from offset af6c1 to af6c8 is:
> >
> > 392A9}@
> >
> > I don't recognize the content beyond that, but I'd guess that
> > somehow the contents of some other file managed to overwrite that
> > portion of the bad file.  As for how that happened, I don't know. 
> > But if someone recognizes where the bad content came from, that
> > might be a clue.
> 
> Good eye!  Yes, it indeed does appear to be ASCII.  I *thought*
> something in the repetition when I originally did an od -a looked
> interesting.
> 
> I dumped the whole bad section as a string, and here's (partly) what I get:
> 
> @$${138B8B{@
> <(21470=Thu Jan 24 23:20:58 2008)>
> [117:^80(^91^21470)]
> @$$}138B8B}@
...
> @$${138C18{@
> <(21472=1201242069)>[-2:^80(^82^85)(^83^1B5)(^84=b)(^85=1)(^86=0)(^87=0)
> (^88=0)(^89^2146C)(^8A=)(^8B=40)(^8C=2e)(^8D^84)(^8E=0)(^90^21472)
> (^91^21460)]
> @$$}138C18}@
> 
> and more of the same.  Note the date string.  There are several like
> that.  Anyone recognize this text format?

It's a Mork database from the Mozilla project:

http://developer.mozilla.org/en/docs/Mork_Structure#Rows

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Question about file system checks

2008-03-28 Thread Dan Nelson

In the last episode (Mar 28), Ivan Voras said:
> Danny Pansters wrote:
>> Generally I can say that with freebsd even if you pull the plug and
>> then let it reboot and do the automatical background fsck you'll
>> likely lose only that one file you might have been editing while
>> (or just before) you unplugged the box.
> 
> Stress testing I've done suggests otherwise :) I've literally
> repeatedly pulled the plug of a server in a controlled environment,
> and with a network logging of (a high load of) file system
> operations. My results show that UFS+SU and ZFS on FreeBSD lose *the
> most* files (and in case of UFS+SU especially directories), than any
> of: jfs, xfs, reiser3 (on Linux 2.6.22) and NTFS (on Windows 2003
> Server). ext3 is somewhat similar to UFS+SU, though about 30% better
> at not losing files.

Note that you can tweak the SU caching time by adjusting the sysctls
kern.{meta,dir,file}delay.  Take them down from roughly 30 seconds to 10
and you'll lose fewer files (at the cost of more disk I/O, of course).
 
> Some other notes from this proceeding:
> 
> 1. UFS+gjournal loses the least, but it's also the slowest.
> 2. UFS+SU had no truncated files or files of unexpected length (apparently 
> it just loses the file that would end up in this state)
> 3. XFS and JFS end up with a *huge* number of files that are truncated or 
> of unexpected length (40%-50%!)
> 4. In no case has any of the above file systems gone completely corrupted 
> or lost any of the files/directories not being updated.
> 5. ZFS on FreeBSD was the fastest, in the sense of creating the most files 
> during this benchmark (though speed was not the target for this benchmark 
> so this is a low-quality observation), closely followed by JFS and XFS.

ZFS's transaction commit interval is only 5 seconds (see txg_time in
uts/common/fs/zfs/txg.c); how many more files/second did it create vs
the others to be able to lose the most files in that window? :)
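The point about commit windows is simple proportionality. If a filesystem creates r files per second and anything newer than its flush window w is at risk, roughly r × w files can vanish on power loss. Here is a back-of-the-envelope sketch. The 30-second figure is the approximate soft-updates delay mentioned above. The rate is a made-up input, not a measurement from this thread.

```c
/* Files exposed to loss: creation rate (files/s) times window (s). */
double
files_at_risk(double create_rate, double window_sec)
{
	return (create_rate * window_sec);
}

/*
 * How many times faster filesystem A must create files than
 * filesystem B to lose the same number, given their windows.
 */
double
rate_ratio_for_equal_loss(double window_a, double window_b)
{
	return (window_b / window_a);
}
```

Compare a 5-second ZFS window with a roughly 30-second soft-updates window. ZFS would need to create about six times as many files per second to lose the same number, which is the question being asked.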

> 6. ZFS crashed the kernel at least once.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: NFS and /etc/exports

2008-04-14 Thread Dan Nelson

In the last episode (Apr 14), Alfred Perlstein said:
> * Robert Blayzor <[EMAIL PROTECTED]> [080414 06:07] wrote:
> > On Apr 14, 2008, at 7:02 AM, Nawfal bin Mohmad Rouyan wrote:
> > >I'm using TCP and the entry in /etc/fstab on all clients is as below:
> > >
> > >build:/usr/ports  /usr/ports  nfs  tcp,intr,nfsv3,-w=32768,-r=32768,rw,noauto  0   0
> > >build:/usr/src    /usr/src    nfs  tcp,intr,nfsv3,-w=32768,-r=32768,rw,noauto  0   0
> > >build:/usr/obj    /usr/obj    nfs  tcp,intr,nfsv3,-w=32768,-r=32768,rw,noauto  0   0
> > 
> > Are -r and -w really needed/useful for TCP mounts?
> 
> yes.

This is interesting: according to mountnfs() in nfs_vfsops.c, those are
already the kernel defaults:

	if ((argp->flags & NFSMNT_NFSV3) && argp->sotype == SOCK_STREAM) {
		nmp->nm_wsize = nmp->nm_rsize = NFS_MAXDATA;
	} else {
		nmp->nm_wsize = NFS_WSIZE;
		nmp->nm_rsize = NFS_RSIZE;
	}

$ grep NFS_MAXDATA /sys/nfs/*
/sys/nfs/nfsproto.h:#define NFS_MAXDATA 32768

But it looks like /sbin/mount_nfs always overrides them to NFS_WSIZE
and NFS_RSIZE (both 8K) in its nfsdefargs struct.
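Restated as two small functions with hypothetical names, the logic above reads as follows. The constants are the values quoted in this message: NFS_MAXDATA is 32768 and NFS_WSIZE is 8K. Because mount_nfs always supplies a size, the kernel's 32K branch for NFSv3 over TCP would never take effect unless -w/-r are given explicitly. That would explain why they still matter. Only the write size is shown; the read size is symmetric.

```c
#include <stdbool.h>

#define NFS_MAXDATA 32768	/* nfsproto.h value quoted above */
#define NFS_WSIZE   8192	/* 8K default write size */

/* Kernel-side default from mountnfs(): big transfers only for v3/TCP. */
int
kernel_default_wsize(bool nfsv3, bool tcp)
{
	return ((nfsv3 && tcp) ? NFS_MAXDATA : NFS_WSIZE);
}

/*
 * Effective size when mount_nfs always fills in its own value: the
 * user's -w if given (nonzero), otherwise mount_nfs's 8K default, so
 * the kernel's 32K default above is never consulted.
 */
int
effective_wsize(int user_w)
{
	return (user_w > 0 ? user_w : NFS_WSIZE);
}
```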

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: zeroed fields in ps output

2008-04-23 Thread Dan Nelson

In the last episode (Apr 23), WaW said:
> I have noticed something strange with some processes running in my
> system. Look at ps output below: it says that nfsd, smbd and zsh are
> running for ~13992 days, that means their ELAPSED field == 0 in unix
> time. Moreover, RSS field also == 0. This happens in 1-2 days after
> system is booted up. Is this a bug or a feature?
> 
> System is 7.0-RELEASE/amd64. And if it makes sense - nfs and samba do
> export zfs filesystems.
> 
> USER  PID %CPU %MEM   VSZ   RSS  ELAPSED STARTED STAT COMMAND
> root  675  0.0  0.0  1616 0 13992-16:28:05 -   IWs  /sbin/devd
> root  784  0.0  0.0  3572 0 13992-16:28:05 -   IWs  nfsd:
> root  786  0.0  0.0  3572 0 13992-16:28:05 -   IW   nfsd:
> root  787  0.0  0.0  3572 0 13992-16:28:05 -   IW   nfsd:
> root  788  0.0  0.0  3572 0 13992-16:28:05 -   IW   nfsd:
> root  789  0.0  0.0  3572 0 13992-16:28:05 -   IW   nfsd:
> root  846  0.0  0.0 3 0 13992-16:28:05 -   IW 
> /usr/local/sbin/smbd -D -s /usr/local/etc/smb.conf
> waw  1021  0.0  0.0 17372 0 13992-16:28:05 -   IWs  /bin/zsh
> waw  1026  0.0  0.0 16220 0 13992-16:28:05 -   IWs  /bin/zsh
> root 1030  0.0  0.0 19400 0 13992-16:28:05 -   IW   su -

Processes with a W in the second column of STAT have been completely
swapped out; that definitely explains why RSS=0, and may explain why
etime is unavailable.  ps should probably print a "-" there (like it
does for STARTED) instead of an obviously wrong value.
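The odd ELAPSED value is what you get when a start time of 0 is subtracted from "now". 13992 days after the Unix epoch is 23 April 2008, the date of the report, and the leftover 16:28:05 is the time of day. Below is a sketch of ps-style [days-]HH:MM:SS formatting that prints "-" when the start time is unknown, as suggested. format_etime() is a hypothetical helper, not ps(1)'s code.

```c
#include <stdio.h>
#include <time.h>

/*
 * Format now - start as "[days-]HH:MM:SS".  A start of 0 is treated
 * as "unknown" and printed as "-" instead of a multi-decade value.
 */
void
format_etime(time_t now, time_t start, char *buf, size_t len)
{
	long long secs, days;

	if (start == 0 || now < start) {
		snprintf(buf, len, "-");
		return;
	}
	secs = (long long)(now - start);
	days = secs / 86400;
	secs %= 86400;
	if (days > 0)
		snprintf(buf, len, "%lld-%02lld:%02lld:%02lld", days,
		    secs / 3600, (secs / 60) % 60, secs % 60);
	else
		snprintf(buf, len, "%02lld:%02lld:%02lld",
		    secs / 3600, (secs / 60) % 60, secs % 60);
}
```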

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: auto_nlist failed on cp_time at location 1

2008-04-23 Thread Dan Nelson

In the last episode (Apr 23), Tim Stoddard said:
> I just upgraded from FreeBSD 6.2 ->
> 6.3 (using source tree).  I then recompiled my net-snmp port binaries (using
> portupgrade).  I am now get error message in my logs every five secs. 
> I am sure my libkvm is in sync with my kernel.  I do not know what else
> to look at.  

You got bit by 

 revision 1.178.2.5
 date: 2008/04/09 19:47:20;  author: peter;  state: Exp;  lines: +68 -5
 MFC: record per-cpu stats for %user/%nice/%system/%idle

, which removed the kernel variable that net-snmp uses to track CPU
usage. Try this patch (put it in /usr/ports/net-mgmt/net-snmp/files and
rebuild net-snmp).  I've sent it to the net-snmp port maintainer so
hopefully it will be committed soon.

-- 
Dan Nelson
[EMAIL PROTECTED]
--- agent/mibgroup/hardware/cpu/cpu_nlist.c 2007-01-19 10:53:44.0 -0600
+++ agent/mibgroup/hardware/cpu/cpu_nlist.c 2008-04-22 00:13:48.330686919 -0500
@@ -1,5 +1,5 @@
 /*
- *   nlist() interface
+ *   sysctl() interface
  * e.g. FreeBSD
  */
 #include 
@@ -12,24 +12,9 @@
 #include 
 #include 
 
-#ifdef HAVE_SYS_DKSTAT_H
-#include <sys/dkstat.h>
-#endif
 #ifdef HAVE_SYS_SYSCTL_H
 #include <sys/sysctl.h>
 #endif
-#ifdef HAVE_SYS_VMMETER_H
-#include <sys/vmmeter.h>
-#endif
-#ifdef HAVE_VM_VM_PARAM_H
-#include <vm/vm_param.h>
-#endif
-#ifdef HAVE_VM_VM_EXTERN_H
-#include <vm/vm_extern.h>
-#endif
-
-#define CPU_SYMBOL  "cp_time"
-#define MEM_SYMBOL  "cnt"
 
 void _cpu_copy_stats( netsnmp_cpu_info *cpu );
 
@@ -67,11 +52,12 @@
  */
 int netsnmp_cpu_arch_load( netsnmp_cache *cache, void *magic ) {
 long   cpu_stats[CPUSTATES];
-struct vmmeter mem_stats;
+size_t size;  /* sysctlbyname() takes a size_t * for the length */
+int tempval;
 netsnmp_cpu_info *cpu = netsnmp_cpu_get_byIdx( -1, 0 );
 
-auto_nlist( CPU_SYMBOL, (char *) cpu_stats, sizeof(cpu_stats));
-auto_nlist( MEM_SYMBOL, (char *)&mem_stats, sizeof(mem_stats));
+size = sizeof(cpu_stats);
+sysctlbyname("kern.cp_time", &cpu_stats, &size, NULL, 0);
 
 cpu->user_ticks = (unsigned long)cpu_stats[CP_USER];
 cpu->nice_ticks = (unsigned long)cpu_stats[CP_NICE];
@@ -85,15 +71,19 @@
  * Interrupt/Context Switch statistics
  *   XXX - Do these really belong here ?
  */
-#if defined(openbsd2) || defined(darwin)
-cpu->swapIn  = (unsigned long)mem_stats.v_swpin;
-cpu->swapOut = (unsigned long)mem_stats.v_swpout;
-#else
-cpu->swapIn  = (unsigned long)mem_stats.v_swappgsin+mem_stats.v_vnodepgsin;
-cpu->swapOut = (unsigned long)mem_stats.v_swappgsout+mem_stats.v_vnodepgsout;
-#endif
-cpu->nInterrupts  = (unsigned long)mem_stats.v_intr;
-cpu->nCtxSwitches = (unsigned long)mem_stats.v_swtch;
+size = sizeof(int);
+#define GET_VM_STATS(cat, name, netsnmpname) \
+do { \
+sysctlbyname("vm.stats." #cat "." #name, &tempval, &size, NULL, 0); \
+cpu->netsnmpname = (unsigned long) tempval; \
+} while(0)
+
+GET_VM_STATS(vm,  v_swappgsin,   swapIn);
+GET_VM_STATS(vm,  v_swappgsout,  swapOut);
+GET_VM_STATS(vm,  v_vnodepgsin,  pageIn);
+GET_VM_STATS(vm,  v_vnodepgsout, pageOut);
+GET_VM_STATS(sys, v_intr,nInterrupts);
+GET_VM_STATS(sys, v_swtch,   nCtxSwitches);
 
 #ifdef PER_CPU_INFO
 for ( i = 0; i < n; i++ ) {
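The GET_VM_STATS macro in the patch builds each sysctl name at compile time. It uses the preprocessor's # operator together with adjacent string-literal concatenation. The sketch below isolates that trick as a lookup table so it compiles on any platform, since sysctlbyname() itself is BSD-specific. VM_STATS_NAME, vm_stat_map and field_for_oid() are hypothetical names. The six mappings are copied from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Same construction as GET_VM_STATS: #cat and #name become strings and
 * the adjacent literals are pasted into one, e.g. "vm.stats.sys.v_intr". */
#define VM_STATS_NAME(cat, name) "vm.stats." #cat "." #name

struct vm_stat_map {
	const char *oid;	/* sysctl name */
	const char *field;	/* netsnmp_cpu_info member the patch fills */
};

static const struct vm_stat_map vm_stat_map[] = {
	{ VM_STATS_NAME(vm,  v_swappgsin),   "swapIn" },
	{ VM_STATS_NAME(vm,  v_swappgsout),  "swapOut" },
	{ VM_STATS_NAME(vm,  v_vnodepgsin),  "pageIn" },
	{ VM_STATS_NAME(vm,  v_vnodepgsout), "pageOut" },
	{ VM_STATS_NAME(sys, v_intr),        "nInterrupts" },
	{ VM_STATS_NAME(sys, v_swtch),       "nCtxSwitches" },
};

/* Return the netsnmp field fed by a given sysctl name, or NULL. */
const char *
field_for_oid(const char *oid)
{
	for (size_t i = 0; i < sizeof(vm_stat_map) / sizeof(vm_stat_map[0]); i++)
		if (strcmp(vm_stat_map[i].oid, oid) == 0)
			return (vm_stat_map[i].field);
	return (NULL);
}
```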

Re: auto_nlist failed on cp_time at location 1

2008-04-24 Thread Dan Nelson

In the last episode (Apr 24), Tim Stoddard said:
> I applied your patch by hand and recompiled/reinstalled net-snmp,
> however I am still seeing the same error just on a different memory
> address now.
> 
> Apr 24 10:16:41 shaggy snmpd[73273]: kvm_read(*, 1, 0xbf7fe830, 20) = -1: kvm_read: Bad address
> Apr 24 10:16:41 shaggy snmpd[73273]: auto_nlist failed on cp_time at location 1
> Apr 24 10:16:46 shaggy snmpd[73273]: kvm_read(*, 1, 0xbf7fe830, 20) = -1: kvm_read: Bad address
> Apr 24 10:16:46 shaggy snmpd[73273]: auto_nlist failed on cp_time at location 1
> Apr 24 10:16:51 shaggy snmpd[73273]: kvm_read(*, 1, 0xbf7fe830, 20) = -1: kvm_read: Bad address
> Apr 24 10:16:51 shaggy snmpd[73273]: auto_nlist failed on cp_time at location 1

Hm.  It looks like net-snmp has two different pieces of code that both
do the same thing (read CPU and vmstat info).  I wonder which OIDs
trigger them on your system?  On my system, walking
enterprises.ucdavis.systemStats uses the cpu_nlist.c code.  Here's a
patch for the other file (vmstat_freebsd2.c); it's not even compiled on
my 7-stable system, so I can't verify that it's correct.  I'm not sure
why my first patch didn't apply; I attached it straight out of my
net-snmp/files/ directory.
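Whichever file does the reading, kern.cp_time returns cumulative tick counters. A usage figure therefore comes from the difference between two samples, which is what the cpu_new/cpu_total bookkeeping in vmstat_freebsd2.c is for. Below is a portable sketch of that arithmetic. The CP_* indices follow FreeBSD's <sys/resource.h> order (user, nice, sys, intr, idle) but are defined locally here. cpu_percent() is a hypothetical helper.

```c
/* FreeBSD's cp_time slot order, redefined so this compiles anywhere. */
enum { CP_USER, CP_NICE, CP_SYS, CP_INTR, CP_IDLE, CPUSTATES };

/*
 * Percentage of ticks spent in each state between two cumulative
 * kern.cp_time samples.  Assumes the counters did not wrap between
 * samples; if no ticks elapsed, every result is 0.
 */
void
cpu_percent(const long old_t[CPUSTATES], const long new_t[CPUSTATES],
    double pct[CPUSTATES])
{
	long delta[CPUSTATES], total = 0;

	for (int i = 0; i < CPUSTATES; i++) {
		delta[i] = new_t[i] - old_t[i];
		total += delta[i];
	}
	for (int i = 0; i < CPUSTATES; i++)
		pct[i] = total > 0 ?
		    100.0 * (double)delta[i] / (double)total : 0.0;
}
```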

-- 
Dan Nelson
[EMAIL PROTECTED]
--- agent/mibgroup/ucd-snmp/vmstat_freebsd2.c   2008-04-24 10:25:59.834152091 -0500
+++ agent/mibgroup/ucd-snmp/vmstat_freebsd2.c   2008-04-24 10:25:59.834152091 -0500
@@ -189,13 +189,15 @@
  * Update structures (only if time has passed) 
  */
 if (time_new != time_old) {
+size_t size;  /* sysctlbyname() takes a size_t * for the length */
 time_diff = time_new - time_old;
 time_old = time_new;
 
 /*
  * CPU usage 
  */
-auto_nlist(CPTIME_SYMBOL, (char *) cpu_new, sizeof(cpu_new));
+size = sizeof(cpu_new);
+sysctlbyname("kern.cp_time", &cpu_new, &size, NULL, 0);
 
 cpu_total = 0;
 

Re: Poor write performance with LSI 320-2 on 6.1-STABLE

2006-09-29 Thread Dan Nelson

In the last episode (Sep 29), Albert Chin said:
> On Thu, Sep 28, 2006 at 05:15:05PM -0500, Albert Chin wrote:
> > I have an Intel S875PWP1 motherboard with a Pentium4 [EMAIL PROTECTED] PCI
> > bus is 33Mhz, 32-bit. I recently purchased an LSI 320-2/128MB on eBay
> > (though the card really looks like a PERC4/DS) and just ran some
> > bonnie++ tests on a RAID 1 array between two U320 drives for the first
> > channel and on a RAID 0 array between one U320 drive for the second
> > channel. The 320-2 has the latest LSI firmware, 1L47.
> 
> I reran some of the tests with the same 320-2 but on an Intel
> SE7520BD2 with 32-bit and 64-bit (100Mhz) slots:
>   #1. RAID 1, two U320 drives, channel 1, 32-bit, 33Mhz slot
> Version 1.93c   --Sequential Output-- --Sequential Input- --Random-
> Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine   Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> maetel.il.thew 300M   186  99 16707   5 16063   6   654  99 537320  93  4129  50
> Latency 45215us 199ms   89764us   34740us 1215us 1808ms
> Version 1.93c   --Sequential Create-- Random Create
> maetel.il.thewritte -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>   files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>  16  7441  23 + +++ + +++  5799  18 + +++ + +++
> Latency   479ms 122us 2508us 606ms   13549us 101us
> 
>   #2. RAID 1, two U320 drives, channel 1, 64-bit, 100Mhz slot
> Version 1.93c   --Sequential Output-- --Sequential Input- --Random-
> Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine   Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> maetel.il.thew 300M   186  99 18006   6 15964   5   634  99 571275  99  4450  57
> Latency 44992us 139ms 130ms   35143us 1238us 120ms
> Version 1.93c   --Sequential Create-- Random Create
> maetel.il.thewritte -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>   files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>  16  7581  24 + +++ + +++  5750  18 + +++ + +++
> Latency   511ms 255us 2615us 622ms   12691us  53us
> 
> Odd that I don't get x2 the performance when the bus bandwidth doubles
> in speed.

Not really odd, since you're nowhere near even the 32-bit bus's max:
(32 bits * 33 MHz) / 8 bits per byte = 132 MB/sec, and in write-through
mode you're spending most of your time waiting for the disks to sync.
With a larger filesize you might see a difference in the sequential
input test; judging by your insane sequential read and random seek
values, your 300M test file looks like it's completely cached in RAM.
A test file at least twice the size of your RAM is recommended.
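The bus arithmetic is worth spelling out, because it also shows why the read figures must come from cache. A 32-bit, 33 MHz PCI slot peaks at 132 MB/s, yet test #1 reports 537320 K/sec of sequential block input. Below is a sketch of that check. peak_mb_per_sec() and exceeds_bus() are hypothetical helpers. The sketch assumes 1 MB = 10^6 bytes and treats bonnie++'s K as 1024 bytes.

```c
#include <stdbool.h>

/* Theoretical peak of a parallel bus: width (bits) * clock (MHz) / 8. */
double
peak_mb_per_sec(int width_bits, double clock_mhz)
{
	return (width_bits * clock_mhz / 8.0);
}

/* True if a measured rate in KiB/s is more than the bus could carry. */
bool
exceeds_bus(double kib_per_sec, int width_bits, double clock_mhz)
{
	double mb_per_sec = kib_per_sec * 1024.0 / 1e6;

	return (mb_per_sec > peak_mb_per_sec(width_bits, clock_mhz));
}
```

Roughly 550 MB/s of "disk" reads through a 132 MB/s slot cannot be coming off the controller. The block writes, at about 17 MB/s, are far below either bus limit, which is why doubling the bus bandwidth changed little.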
 
-- 
Dan Nelson
[EMAIL PROTECTED]

Re: problem with old /usr/src/contrib/amd

2006-10-03 Thread Dan Nelson

In the last episode (Oct 03), Nicolas Martin said:
> i was wondering if an update of /usr/src/contrib/amd is planned ? I
> encounter a problem using amd with nolock options, and it seems that
> this problem was fixed on recent version of am-utils.

If anything, it would be updated in -current, not stable.  Until a
newer version is imported, you can use the sysutils/am-utils port.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Is jemalloc going to make its way into RELENG_6?

2006-10-05 Thread Dan Nelson

In the last episode (Oct 05), Vlad GALU said:
> Judging from my tests (allocating numerous small objects, then
> freeing the memory) it looks like the bottleneck is in free(). I've
> built a different libc library with the malloc.c and tree.h taken
> from HEAD and it now behaves nicely. I haven't seen any bad side
> effects on this machine (it's the lappie I do most of my work on, I
> run KDE, seamonkey, mplayer, openoffice, the like) since I switched
> to the new libc. Another nice solution would be to ship the modified
> libc in base so the people who really need jemalloc can relink to it
> via libmap.conf.

You can compile just the -current version of malloc.c (saved here as
jemalloc.c) as a shared object, then inject it into specific binaries:

$ gcc -O -Wall -fPIC -I/usr/src/lib/libc/include -shared -o /lib/jemalloc.so jemalloc.c
$ MALLOC_OPTIONS=P date
date in malloc(): warning: unknown char in MALLOC_OPTIONS
Thu Oct  5 11:44:36 CDT 2006
$ LD_PRELOAD=/lib/jemalloc.so MALLOC_OPTIONS=P date |& head
Thu Oct  5 11:44:49 CDT 2006
___ Begin malloc statistics ___
Number of CPUs: 2
Number of arenas: 11
Chunk size: 524288 (2^19)
Quantum size: 16 (2^4)
Max small size: 512
Pointer size: 4
Assertions enabled
Allocated: 4096, space used: 1048576

I've tried this with seamonkey and mysqld, so this method seems to work
fine on complex apps.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Pleading for commit

2006-10-24 Thread Dan Nelson

In the last episode (Oct 24), Doug Barton said:
> Duane Whitty wrote:
> >Patching it myself after every cvs update is not such a big deal; It
> >is forgetting to patch it after every update which is a big deal.
> 
> Write a little script for yourself that calls cvsup then runs patch
> so you won't forget. :)

Or cvsup the CVS repository (instead of using checkout mode), check out
your working tree from there, and run "cvs update" to update your
sources, which will preserve local changes.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: adaptec utilities on amd64?

2006-11-17 Thread Dan Nelson

In the last episode (Nov 17), Vivek Khera said:
> On Nov 15, 2006, at 7:34 PM, Bruce Burden wrote:
> > I have a 2230SLP that I will be installing early next week on
> >   my AMD64 implementation. I am hoping that the aaccli program in
> >   ports will work.
> 
> If it has the newer firmware, it will not work with aaccli.  If you  
> got the card after they switched to the "R" revision, you have the  
> newer firmware.
> 
> Some time long ago, someone posted a very short C program that probes  
> the LSI controller and spits out this kind of output:
> 
> [EMAIL PROTECTED] amrstat
> Drive 0:34.18 GB, RAID1  optimal
> Drive 1:   102.54 GB, RAID1  optimal
> 
> This is the kind of output I'd love to get from my adaptec
> controllers, too.  This can be trivially scripted and hooked into a
> monitoring system like nagios.
> 
> The aaccli tool is a curses based app (despite the "cli" in the name)
> and scripting it is damn near impossible.  It doesn't even read
> commands from stdin!

It's non-interactive if you pass it a commandline, though.  I have a
Big Brother script that does this (amongst other things):

  # Gather Data 
  CONTROLLERS=$($AACCLI controller list | awk '/PERC/ { print $1 }')
  OUT_AAC="Controller list: $CONTROLLERS
  "
  CMD_AAC="task list /all : controller details : container list /full : disk list /full : disk show smart /full : enclosure list /full : enclosure show status"

  for c in $CONTROLLERS ; do
OUT_AAC=$OUT_AAC$($AACCLI open /readonly $c : $CMD_AAC)
  done

It then processes the contents of $OUT_AAC to determine if the array's
happy or not.


-- 
Dan Nelson
[EMAIL PROTECTED]

Re: malloc(0) returns 0x800 on FreeBSD 6.2 ?

2006-12-11 Thread Dan Nelson

In the last episode (Dec 11), Luigi Rizzo said:
> i was debugging a program on FreeBSD 6, and much to my surprise, i
> noticed that malloc(0) returns 0x800, as shown by this program:
> 
>   > more a.c
>   #include 
>   int main(int argc, char *argv[])
>   {
>   char *p = malloc(0);
>   printf(" malloc 0 returns %p\n", p);
>   }
>   > cc -o a a.c
>   > ./a
>malloc 0 returns 0x800
> 
> if you look at the source this is indeed clear - internally the 0x800
> is ZEROSIZEPTR and is set when a zero length is passed to malloc()
> unless you have malloc_sysv set.

Right, it passed you a pointer to which you may write 0 bytes;
exactly what the program asked for :)

The FreeBSD 6.x behaviour is slightly against POSIX, which requires
that a successful malloc(0) return either NULL or a unique pointer, so
the 7.x malloc silently rounds zero-size mallocs to 1.  Ideally malloc
would return unique pointers into memory set to PROT_NONE via
mprotect() (you could fit 8192 of these pointers in an 8k page), to
prevent applications from using that byte of memory.
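Portable code cannot rely on either behaviour, since malloc(0) may legitimately return NULL or a unique pointer. One common defensive pattern is to do what the 7.x allocator does and round zero-byte requests up to one byte. Every successful call then yields a distinct, freeable pointer. The sketch below uses a hypothetical xmalloc() name.

```c
#include <stdlib.h>

/*
 * Allocate n bytes, treating n == 0 as 1 so that every successful
 * call returns a distinct pointer that is safe to pass to free().
 */
void *
xmalloc(size_t n)
{
	return (malloc(n == 0 ? 1 : n));
}
```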

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: malloc(0) returns 0x800 on FreeBSD 6.2 ?

2006-12-11 Thread Dan Nelson

In the last episode (Dec 11), Dan Nelson said:
> In the last episode (Dec 11), Luigi Rizzo said:
> > i was debugging a program on FreeBSD 6, and much to my surprise, i
> > noticed that malloc(0) returns 0x800, as shown by this program:
> > 
> > > more a.c
> > #include 
> > int main(int argc, char *argv[])
> > {
> > char *p = malloc(0);
> > printf(" malloc 0 returns %p\n", p);
> > }
> > > cc -o a a.c
> > > ./a
> >  malloc 0 returns 0x800
> > 
> > if you look at the source this is indeed clear - internally the 0x800
> > is ZEROSIZEPTR and is set when a zero length is passed to malloc()
> > unless you have malloc_sysv set.
> 
> Right, it passed you a pointer to which you may write 0 bytes;
> exactly what the program asked for :)
> 
> The FreeBSD 6.x behaviour is slightly against POSIX, which requires
> that a successful malloc(0) return either NULL or a unique pointer, so
> the 7.x malloc silently rounds zero-size mallocs to 1.  Ideally malloc
> would return unique pointers into memory set to PROT_NONE via
> mprotect() (you could fit 8192 of these pointers in an 8k page), to
> prevent applications from using that byte of memory.

Also note that the 0x800 behaviour was added to malloc.c rev 1.60 back
in 2001, which means that all of the 5.x and 6.x releases did this.

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Is syslog() reentrant? Was: OpenBSD's spamd.

2006-12-19 Thread Dan Nelson

In the last episode (Dec 19), Christopher Hilton said:
> Christopher Hilton wrote:
> >Has anyone gotten a newer version of OpenBSD's spamd than the one in
> >ports going? I'm cvsupping my ports tree now but since I didn't see
> >an update on the cvs server I'm assuming 3.7 is the latest version.
> >
> >Between OpenBSD 3.7 and 3.8 spamd gained the ability to tarpit or
> >stutter at all connections for a configurable period of time. I
> >understand that stuttering for the first few seconds of the SMTP
> >dialog causes many spammers to go away before even generating a
> >greylisting tuple. It's something I'd like to try and see for myself
> >and it will be fairly easy since my primary MX is behind an OpenBSD
> >firewall. However, my secondary MX is a FreeBSD box with no such
> >protection and I fear that the spammers will just take advantage of
> >the fact that my secondary MX has weaker protections than my
> >primary.
> >
> 
> A casual attempt to compile a fresher copy of the software shows that
> spamd is using the OpenBSD's reentrant syslog functions (syslog_r,
> openlog_r, etc) Is FreeBSD's syslog already reentrant?

It is, as of FreeBSD 5.4.  In previous versions only openlog() and
syslog("%m") with an invalid errno were non-reentrant.

http://www.freebsd.org/cgi/query-pr.cgi?pr=72394

-- 
Dan Nelson
[EMAIL PROTECTED]
