Re: MFC: Distributed audit daemon committed (was: svn commit: r243752 - in head: etc etc/defaults etc/mail etc/mtree etc/rc.d share/man/man4 usr.sbin usr.sbin/auditdistd (fwd)) (fwd)
On 18 Dec 2012, at 18:38, Bryan Drewery wrote:

>> Just an FYI that the new distributed audit daemon has been MFC'd to
>> 9-STABLE.
>>
>> As noted in UPDATING, you will need to run "mergemaster -p" before using
>> installkernel or installworld targets in order to add the new
>> "auditdistd" system user. This should be part of the regular update
>> cycle anyway, but after the experience of adding auditdistd in
>> 10-CURRENT, we've discovered that many people are skipping that step in
>> the update cycle, so I figured it best to point out here.
>>
>> (Technically, only installworld requires the user, but the user-check
>> guards in the system Makefiles are enforced for both targets.)
>
> Have you seen misc/174405? Apparently installkernel is requiring the
> user as well. The documented process in UPDATING does not mention
> running mergemaster -p before [install]kernel.

Hi Bryan:

I was not aware of the PR. However, yes, that was the point I was making in
my e-mail -- that the Makefile seems to put the user check on installkernel
and not just installworld. While I did MFC the change to add the
'auditdistd' user to the requirements list, I didn't originate that change,
and agree that it's a "false positive". I hadn't originally planned to add
an UPDATING entry, or Makefile dependency, as mergemaster -p is part of our
standard upgrade procedure before installworld; however, I got a lot of
complaints :-). I did also add an explicit URL pointing at the upgrade
procedure in the handbook as part of UPDATING as a result.

It would be useful if someone would make the necessary changes to the
Makefile infrastructure to allow kernel vs. userspace install-time
dependencies on users (and groups) to be kept separate.

Robert
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [poll / rfc] kdb_stop_cpus
On 4 Jun 2011, at 09:22, Andriy Gapon wrote:

> on 03/06/2011 20:57 Robert N. M. Watson said the following:
>>
>> On 3 Jun 2011, at 16:13, Andriy Gapon wrote:
>>
>>> I wonder if anybody uses kdb_stop_cpus with non-default value. If, yes, I
>>> am very interested to learn about your usecase for it.
>>
>> The issue that prompted the sysctl was non-NMI IPIs being used to enter the
>> debugger or reboot following a core hanging with interrupts disabled. With
>> the switch to NMI IPIs in some of those circumstances, life is better -- at
>> least, on hardware that supports non-maskable IPIs. I seem to recall sparc64
>> doesn't, however?
>
> Seems to be so as Nathan has also pointed out for PPC.
> For this I also plan the following change:
>
> commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016
>
>    generic_stop_cpus: move timeout detection code from under DIAGNOSTIC
>
>    ... and also increase it a bit.
>    IMO it's better to detect and report the (rather serious) condition and
>    allow a system to proceed somehow rather than be stuck in an endless
>    loop.

Agreed on detecting and reporting. It would be good to confirm that it works
in practice, however, and also that there are no false positives. I'm not
sure what the best test scenarios are for that.

Robert
Re: [poll / rfc] kdb_stop_cpus
On 3 Jun 2011, at 16:13, Andriy Gapon wrote:

> I wonder if anybody uses kdb_stop_cpus with non-default value.
> If, yes, I am very interested to learn about your usecase for it.

The issue that prompted the sysctl was non-NMI IPIs being used to enter the
debugger or reboot following a core hanging with interrupts disabled. With
the switch to NMI IPIs in some of those circumstances, life is better -- at
least, on hardware that supports non-maskable IPIs. I seem to recall sparc64
doesn't, however? Not sure about MIPS, etc. Attilio has since significantly
improved our shutdown behaviour -- initially, the switch to NMI IPIs broke
other things (because certain IPIs then improperly preempted threads holding
spinlocks), but that pretty much all seems worked out now.

Robert

> I think that the default kdb behavior is the correct one, so it doesn't make
> sense to have a knob to turn on incorrect behavior.
> But I may be missing something obvious.
>
> The comment in the code doesn't really satisfy me:
> /*
>  * Flag indicating whether or not to IPI the other CPUs to stop them on
>  * entering the debugger. Sometimes, this will result in a deadlock as
>  * stop_cpus() waits for the other cpus to stop, so we allow it to be
>  * disabled. In order to maximize the chances of success, use a hard
>  * stop for that.
>  */
>
> The hard stop should be sufficiently mighty.
> Yes, I am aware of supposedly extremely rare situations where a deadlock could
> happen even when using hard stop. But I'd rather fix that than have this
> switch.
>
> Oh, the commit message (from 2004) explains it:
>> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we
>> attempt to IPI other cpus when entering the debugger in order to stop
>> them while in the debugger. The default remains to issue the stop;
>> however, that can result in a hang if another cpu has interrupts disabled
>> and is spinning, since the IPI won't be received and the KDB will wait
>> indefinitely. We probably need to add a timeout, but this is a useful
>> stopgap in the mean time.
>
> But that was before we started using hard stop in this context (in 2009).
>
> --
> Andriy Gapon
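The hang described in the 2004 commit message (waiting indefinitely for a CPU that never sees the IPI) and the timeout discussed in this thread can be illustrated with a small sketch. This is Python, not kernel code, and it is not the actual generic_stop_cpus implementation: the names wait_for_stop, poll_stopped, max_spins and make_poller are all hypothetical, and the "CPUs" are simulated. It only shows the shape of a bounded wait that reports which CPUs failed to stop instead of spinning forever.

```python
from typing import Callable, Set, Tuple


def wait_for_stop(poll_stopped: Callable[[], Set[int]],
                  targets: Set[int],
                  max_spins: int = 1000) -> Tuple[bool, Set[int]]:
    """Poll until every target CPU reports stopped, or give up after max_spins.

    Returns (True, set()) on success, or (False, cpus_still_running) on
    timeout, so the caller can report the condition and proceed somehow.
    """
    missing = set(targets)
    for _ in range(max_spins):
        missing = targets - poll_stopped()
        if not missing:
            return True, set()
    return False, missing


def make_poller(responsive: Set[int], after: int = 3) -> Callable[[], Set[int]]:
    """Hypothetical stand-in for reading the stopped-CPU mask.

    CPUs in `responsive` show up as stopped from the `after`-th poll onwards;
    any other CPU never acknowledges (e.g. it is spinning with interrupts off).
    """
    calls = {"n": 0}

    def poll() -> Set[int]:
        calls["n"] += 1
        return set(responsive) if calls["n"] >= after else set()

    return poll


if __name__ == "__main__":
    # All three CPUs acknowledge: the wait completes normally.
    print(wait_for_stop(make_poller({1, 2, 3}), {1, 2, 3}))
    # CPU 2 never acknowledges: the wait times out and names the culprit.
    print(wait_for_stop(make_poller({1, 3}), {1, 2, 3}, max_spins=50))
```

The false-positive concern raised in the reply maps onto the choice of max_spins here: too small a bound would report a stuck CPU that was merely slow to respond.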
Re: is dtrace usable?
On Mar 9, 2010, at 2:16 PM, Alexander Leidinger wrote:

>> From this you can see that sys.mk is included and parsed before 'Makefile',
>> so the WITH_CTF=yes is not set until after sys.mk has been parsed.
>
> I think we need to find a different solution for this. The need to specify
> WITH_CTF at the command line is very error prone. :(

You are neither the first person to have made this observation, nor the
first person to have failed to propose a solution in the form of a patch :-).

Robert
Re: net.inet.tcp.timer_race: does anyone have a non-zero value?
On Mar 7, 2010, at 12:33 PM, Mikolaj Golub wrote:

> On Sun, 7 Mar 2010 11:59:35 +0000 (GMT) Robert Watson wrote:
>
>> Please check the results of the following command:
>>
>> % sysctl net.inet.tcp.timer_race
>> net.inet.tcp.timer_race: 0
>
> Do the results for FreeBSD7 look interesting to you? Because currently we
> have mostly FreeBSD7.1 hosts in production and I observe nonzero values on 8
> hosts (about 15%). I would send more details to you privately if you are
> interested.

Yes, 7.x is also of interest, thanks!

Robert
Re: mbuf leakage with nfs/udp (was: mbuf leakage with nfs/zfs)
On Feb 28, 2010, at 2:52 PM, Daniel Braniss wrote:

> well, I have further reduced the problem, it happens with NFS/UDP writes.
> i'll try the wireshark road, but i'm very rusty with RPC, the other road is to
> check the changes, my oldest is from late october (RC2) where it's happening,
> while Gerrit tried 8-pre from November and worked, so it will be fun
> trying to nail it down :-)

Fortunately, Wireshark actually has quite a good NFS RPC decoder -- it will
tell you what operation appears, what the arguments are, interpret NFS error
codes, etc. In fact, it's an excellent way to learn about NFS...

Robert
Re: mbuf leakage with nfs/zfs?
On Feb 28, 2010, at 12:11 PM, Daniel Braniss wrote:

>> I'm pulling in Robert Watson, who has some familiarity with the UDP
>> stack/code in FreeBSD. I'm not sure he'll be a sufficient source of
>> knowledge for this specific issue since it appears (?) to be specific to
>> NFS; Rick Macklem would be a better choice, but as reported, he's MIA.
>>
>> Robert, are you aware of any changes or implementation issues which
>> might cause excessive (read: leaking) mbuf use under UDP-based NFS? Do
>> you know of a way folks could determine the source of the leak, either
>> via DDB or while the system is live?
>
> I have been running some tests in a controlled environment.
>
> server and client are both 64bit Xeon/X5550 @ 2.67GHz with 16Gb of memory
> FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 SMT threads
>
> the client is running latest 8.0 stable
> the load is created by running 'make -j32 buildworld' and sleeping 150 sec.
> in between runs, this is the straight line you will see in the graphs.
> Both the src and obj directories are NFS mounted from the server, regular UFS.
>
> when server is running 7.2-stable no leakage is seen.
> see ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbufs/{tcp,udp}-7.2.ps
> when server is running 8.0-stable
> see ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbufs/{tcp,udp}-8.0.ps
> you can see that udp is leaking!
>
> cheers,
> danny
> ps: I think the subject should be changed again, removing zfs ...

This type of problem (occurs with one client but not another) is almost
always the result of the access pattern of a particular client triggering a
specific (and perhaps single) bug in error-handling. For example, we might
not be properly freeing the received request when generating an EPERM in an
edge case. The hard bit is identifying which it is. If it's reproducible
with UDP, then usually the process is:

- Build a minimal test case to trigger the problem -- ideally with as
  little complexity as possible.
- Run netstat -m at the beginning of the test and the end of the test on
  the server to count the number of leaked mbufs
- Run wireshark throughout the test
- Walk the wireshark trace looking for some error that occurs at about the
  same or slightly lower number of times than the number of mbufs leaked
- Iterate, narrowing the test case until it's either obvious exactly
  what's going on, or you've identified a relatively constrained code path
  and can just spot the bug by reading the code

It's almost certainly one or a small number of very specific RPCs that are
triggering it -- maybe OpenBSD does an extra lookup, or stat, or something,
on a name that may not exist anymore, or does it sooner than the other
clients. Hard to say, other than to wave hands at the possibilities.

And it may well be we're looking at two bugs: Danny may see one bug, perhaps
triggered by a race condition, but it may be different from the OpenBSD
client-triggered bug (to be clear: it's definitely a FreeBSD bug, although
we might only see it when an OpenBSD client is used because perhaps OpenBSD
also has a bug or feature).

Robert
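The netstat -m bookkeeping in the steps above can be sketched roughly as follows. This is only an illustration in Python: it assumes the "N/N/N mbufs in use (current/cache/total)" line that FreeBSD's netstat -m prints, the function names are hypothetical, and the snapshot text and per-error counts are made-up numbers, not data from this thread. The error tallies would have to be collected by hand (or by whatever means you prefer) from the Wireshark trace.

```python
import re

# First line of FreeBSD `netstat -m` output, e.g.
#   "1000/500/1500 mbufs in use (current/cache/total)"
MBUF_LINE = re.compile(
    r"^\s*(\d+)/(\d+)/(\d+) mbufs in use \(current/cache/total\)", re.M)


def mbufs_in_use(netstat_m_output):
    """Return the 'current' mbuf count from one `netstat -m` snapshot."""
    match = MBUF_LINE.search(netstat_m_output)
    if match is None:
        raise ValueError("no 'mbufs in use' line found")
    return int(match.group(1))


def leaked_mbufs(before, after):
    """Difference in 'current' mbufs between two snapshots (after - before)."""
    return mbufs_in_use(after) - mbufs_in_use(before)


def candidate_errors(error_counts, leaked, slack=0.2):
    """Errors seen about as often as, or slightly less often than, the leak.

    `error_counts` maps an error name to how often it appeared in the trace;
    `slack` (an arbitrary choice here) is how far below `leaked` still counts
    as "slightly lower".
    """
    low = leaked * (1 - slack)
    return sorted(name for name, count in error_counts.items()
                  if low <= count <= leaked)


if __name__ == "__main__":
    # Illustrative snapshots only; take the real ones while the server is idle
    # so in-flight traffic does not inflate the 'current' figure.
    before = "1000/500/1500 mbufs in use (current/cache/total)\n"
    after = "1480/20/1500 mbufs in use (current/cache/total)\n"
    leaked = leaked_mbufs(before, after)
    # Hypothetical tallies of NFS errors counted in the Wireshark trace.
    tallies = {"NFS3ERR_STALE": 470, "NFS3ERR_NOENT": 12000, "NFS3ERR_ACCES": 3}
    print(leaked, candidate_errors(tallies, leaked))
```

An error that shows up far more often than the leak count (or only a handful of times) is an unlikely culprit, which is why only counts at or just under the leaked total are kept.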
Re: em interface slow down on 8.0R
On 1 Dec 2009, at 12:05, Elliot Finley wrote:

> On Mon, Nov 30, 2009 at 6:29 PM, Hiroki Sato wrote:
>
> Jack Vogel wrote
> in <2a41acea0911301119j1449be58y183f2fe1d1112...@mail.gmail.com>:
>
> jf> I will look into this Hiroki, as time goes the older hardware does not
> jf> always get test cycles like one might wish.
>
>
> Here's some more info to throw into the mix. I have several new boxes
> running 8-Stable (a few hours after release).
>
> Leaving all sysctl at default, I get around 400mbps testing with netperf or
> iperf. If I set the following on the box running 'netserver' or 'iperf -s':
>
> kern.ipc.maxsockbuf=16777216
> net.inet.tcp.recvspace=1048576
>
> then I can get around 926mbps. But then if I make those same changes on the
> box running the client side of netperf or iperf the performance drops back
> down to around 400mbps.
>
> All boxes have the same hardware. they have two 4-port Intel NICS in them.
>
> e...@pci0:5:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06
> hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82571EB Gigabit Ethernet Controller'
>     class = network
>     subclass = ethernet
>
> any pointers on further network tuning to get bidirectional link saturation
> would be much appreciated. These boxes are not in production yet, so anyone
> that would like to have access to troubleshoot, just ask.

I've CC'd Lawrence Stewart in on this thread, as he's been doing work on the
TCP stack lately and might have insight into what you might be running into.
Lawrence -- there's a bit of a back thread with configuration and problem
details in the stable@ archives.

Robert
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
On 13 Oct 2009, at 14:33, Ivan Voras wrote:

If (1) is highly variable during I/O, it's almost certainly a property of
the VM technology you're using, and there's nought to be done about it in
the guest OS.

Here's an example of a ping session with 0.1s resolution during a stall of
a few seconds in ssh:

64 bytes from 161.53.72.188: icmp_seq=1576 ttl=64 time=0.383 ms
64 bytes from 161.53.72.188: icmp_seq=1577 ttl=64 time=0.405 ms
64 bytes from 161.53.72.188: icmp_seq=1578 ttl=64 time=0.360 ms
64 bytes from 161.53.72.188: icmp_seq=2304 ttl=64 time=4.194 ms
64 bytes from 161.53.72.188: icmp_seq=2305 ttl=64 time=0.454 ms
64 bytes from 161.53.72.188: icmp_seq=2306 ttl=64 time=0.376 ms

note huge packet loss. It looks like it's VM fault or something like it.

It sounds like the VM is failing to execute the guest during certain types
of I/O. A bit of scheduler tracing in the host OS probably wouldn't go amiss
to confirm that the VM really is suspending the guest at about the same time
ICMP latency goes up. However, given the above I think you can reasonably
assume that the 4ms jump you're seeing there is due to global host OS/VM
scheduling, and not FreeBSD scheduling.

Robert
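The packet loss in the ping excerpt shows up as a jump in icmp_seq rather than as explicit error lines, so it is easy to miss in a long log. A minimal Python sketch for spotting such jumps follows; the function name seq_gaps is hypothetical, and the sample is just the six lines quoted in the message. It assumes sequence numbers only increase (no wraparound handling).

```python
import re

SEQ = re.compile(r"icmp_seq=(\d+)")


def seq_gaps(ping_output):
    """Return (last_seen, next_seen, missing) for each jump in icmp_seq."""
    seqs = [int(s) for s in SEQ.findall(ping_output)]
    return [(a, b, b - a - 1) for a, b in zip(seqs, seqs[1:]) if b - a > 1]


SAMPLE = """\
64 bytes from 161.53.72.188: icmp_seq=1576 ttl=64 time=0.383 ms
64 bytes from 161.53.72.188: icmp_seq=1577 ttl=64 time=0.405 ms
64 bytes from 161.53.72.188: icmp_seq=1578 ttl=64 time=0.360 ms
64 bytes from 161.53.72.188: icmp_seq=2304 ttl=64 time=4.194 ms
64 bytes from 161.53.72.188: icmp_seq=2305 ttl=64 time=0.454 ms
64 bytes from 161.53.72.188: icmp_seq=2306 ttl=64 time=0.376 ms
"""

if __name__ == "__main__":
    for last_seen, next_seen, missing in seq_gaps(SAMPLE):
        print(f"gap after seq {last_seen}: {missing} replies missing "
              f"before seq {next_seen}")
```

Lining up the timestamps of such gaps with scheduler traces from the host would be one way to check whether the guest was suspended during them.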
Re: smbfs panic when lost connection or unmount --force
On 10 Jul 2009, at 13:17, Oliver Pinter wrote:

I know that the bt is useful, but does ddb work with a usb keyboard? I will
send the log at night.

Unfortunately, a known issue with FreeBSD 8.0 is that the new USB stack,
while a vast improvement over the previous USB stack in countless ways, does
not support polled access from DDB. You will need to use a serial port,
firewire port, ps/2, or AT keyboard in order to get interactive DDB support.
If that's not feasible, or if it's just easier, you may be able to use the
DDB scripting facility + textdumps to run DDB commands automatically on
panic to produce useful debugging output. Take a look at the textdump(4) man
page for details. This can be combined with a traditional crashdump to
capture both DDB output and normal dump data for use with kgdb.

Robert

//sorry for bad english

ps.: attached the config

On 7/10/09, Robert Watson wrote:

On Fri, 10 Jul 2009, Oliver Pinter wrote:

It is a kernel panic, when force unmount the smbfs volume or lost the
connection with the samba server.

This is a NULL pointer dereference in the kernel. Per Attilio's e-mail, a
stack trace should help us track it down. Thanks!

Robert N M Watson
Computer Laboratory
University of Cambridge

--
The OS is:
kern.ostype: FreeBSD
kern.osrelease: 7.2-STABLE
kern.osrevision: 199506
kern.version: FreeBSD 7.2-STABLE #4: Sat Jun 27 21:44:32 CEST 2009 r...@oliverp:/usr/obj/usr/src/sys/stable
kern.osreldate: 702103
--
make.conf:
CPUTYPE?=core2
CFLAGS= -O2 -fno-strict-aliasing -pipe
MODULES_OVERRIDE=smbfs libiconv libmchain zfs opensolaris drm cd9660 cd9660_iconv
--
panic message:
Jul 10 01:58:39 oliverp syslogd: kernel boot file is /boot/kernel/kernel
Jul 10 01:58:39 oliverp kernel: kernel trap 12 with interrupts disabled
Jul 10 01:58:39 oliverp kernel:
Jul 10 01:58:39 oliverp kernel:
Jul 10 01:58:39 oliverp kernel: Fatal trap 12: page fault while in kernel mode
Jul 10 01:58:39 oliverp kernel: cpuid = 2; apic id = 02
Jul 10 01:58:39 oliverp kernel: fault virtual address = 0x30
Jul 10 01:58:39 oliverp kernel: fault code = supervisor read data, page not present
Jul 10 01:58:39 oliverp kernel: instruction pointer = 0x8:0x80327fd0
Jul 10 01:58:39 oliverp kernel: stack pointer = 0x10:0xff8078360940
Jul 10 01:58:39 oliverp kernel: frame pointer = 0x10:0xff0004c31390
Jul 10 01:58:39 oliverp kernel: code segment= base 0x0, limit 0xf, type 0x1b
Jul 10 01:58:39 oliverp kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Jul 10 01:58:39 oliverp kernel: processor eflags= resume, IOPL = 0
Jul 10 01:58:39 oliverp kernel: current process = 60406 (smbiod0)
Jul 10 01:58:39 oliverp kernel: trap number = 12
Jul 10 01:58:39 oliverp kernel: panic: page fault
Jul 10 01:58:39 oliverp kernel: cpuid = 2
Jul 10 01:58:39 oliverp kernel: Uptime: 6h51m16s
Jul 10 01:58:39 oliverp kernel: Physical memory: 4087 MB
Jul 10 01:58:39 oliverp kernel: Dumping 2448 MB:Copyright (c) 1992-2009 The FreeBSD Project.