from:"Robert N. M. Watson"

Re: MFC: Distributed audit daemon committed (was: svn commit: r243752 - in head: etc etc/defaults etc/mail etc/mtree etc/rc.d share/man/man4 usr.sbin usr.sbin/auditdistd (fwd)) (fwd)

2012-12-19 Thread Robert N. M. Watson

On 18 Dec 2012, at 18:38, Bryan Drewery wrote:

>> Just an FYI that the new distributed audit daemon has been MFC'd to
>> 9-STABLE.
>> 
>> As noted in UPDATING, you will need to run "mergemaster -p" before using
>> installkernel or installworld targets in order to add the new
>> "auditdistd" system user.  This should be part of the regular update
>> cycle anyway, but after the experience of adding auditdistd in
>> 10-CURRENT, we've discovered that many people are skipping that step in
>> the update cycle, so I figured it best to point out here.
>> 
>> (Technically, only installworld requires the user, but the user-check
>> guards in the system Makefiles are enforced for both targets.)
> 
> Have you seen misc/174405? Apparently installkernel is requiring the
> user as well. The documented process in UPDATING does not mention
> running mergemaster -p before [install]kernel.

Hi Bryan:

I was not aware of the PR. However, yes, that was the point I was making in my 
e-mail -- that the Makefile seems to put the user check on installkernel and 
not just installworld. While I did MFC the change to add the 'auditdistd' user 
to the requirements list, I didn't originate that change, and agree that it's a 
"false positive". I hadn't originally planned to add an UPDATING entry, or 
Makefile dependency, as mergemaster -p is part of our standard upgrade 
procedure before installworld; however, I got a lot of complaints :-). I did 
also add an explicit URL pointing at the upgrade procedure in the handbook as 
part of UPDATING as a result. It would be useful if someone made the 
necessary changes to the Makefile infrastructure to keep kernel and userspace 
install-time dependencies on users (and groups) separate.

Robert
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: [poll / rfc] kdb_stop_cpus

2011-06-04 Thread Robert N. M. Watson


On 4 Jun 2011, at 09:22, Andriy Gapon wrote:

> on 03/06/2011 20:57 Robert N. M. Watson said the following:
>> 
>> On 3 Jun 2011, at 16:13, Andriy Gapon wrote:
>> 
>>> I wonder if anybody uses kdb_stop_cpus with non-default value. If, yes, I
>>> am very interested to learn about your usecase for it.
>> 
>> The issue that prompted the sysctl was non-NMI IPIs being used to enter the
>> debugger or reboot following a core hanging with interrupts disabled. With
>> the switch to NMI IPIs in some of those circumstances, life is better -- at
>> least, on hardware that supports non-maskable IPIs. I seem to recall sparc64
>> doesn't, however?
> 
> Seems to be so as Nathan has also pointed out for PPC.
> For this I also plan the following change:
> 
> commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016
> 
> generic_stop_cpus: move timeout detection code from under DIAGNOSTIC
> 
> ... and also increase it a bit.
> IMO it's better to detect and report the (rather serious) condition and
> allow a system to proceed somehow rather than be stuck in an endless
> loop.

Agreed on detecting and reporting. It would be good to confirm that it works in 
practice, however, and also that there are no false positives. I'm not sure 
what the best test scenarios are for that.
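To illustrate the idea in the commit message quoted above, here is a minimal userland sketch. It is not the actual generic_stop_cpus() code. It shows a bounded spin-wait that gives up after a fixed number of polls and returns the CPUs that never acknowledged the stop request, so the caller can report them instead of looping forever. The function names, the max_spins parameter, and the simulated CPU model are all hypothetical. A kernel implementation would poll the kernel's own stopped-CPU set rather than a Python callable.

```python
from typing import Callable, Set


def wait_for_cpus_to_stop(
    targets: Set[int],
    stopped: Callable[[], Set[int]],
    max_spins: int = 1_000_000,
) -> Set[int]:
    """Poll until every CPU in `targets` reports stopped, or give up.

    Returns the CPUs that never acknowledged the stop request. An empty
    set means success; a non-empty set is the serious condition that
    should be reported instead of spinning forever.
    """
    for _ in range(max_spins):
        if targets <= stopped():
            return set()
    # Timed out: report whichever CPUs are still missing.
    return targets - stopped()


def simulated_cpus(responsive: Set[int], delay: int) -> Callable[[], Set[int]]:
    """Hypothetical model: CPUs in `responsive` acknowledge after `delay`
    polls; any other CPU (e.g. one spinning with interrupts disabled)
    never does."""
    polls = 0

    def stopped() -> Set[int]:
        nonlocal polls
        polls += 1
        return set(responsive) if polls > delay else set()

    return stopped


if __name__ == "__main__":
    missing = wait_for_cpus_to_stop({1, 2, 3}, simulated_cpus({1, 2}, delay=10), max_spins=1000)
    if missing:
        print(f"stop_cpus: timeout waiting for CPUs {sorted(missing)}; proceeding anyway")
```

The choice of limit is where the false-positive concern above comes in. If the limit is too small, CPUs that were merely slow to respond get reported as hung.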

Robert


Re: [poll / rfc] kdb_stop_cpus

2011-06-03 Thread Robert N. M. Watson


On 3 Jun 2011, at 16:13, Andriy Gapon wrote:

> I wonder if anybody uses kdb_stop_cpus with non-default value.
> If, yes, I am very interested to learn about your usecase for it.

The issue that prompted the sysctl was non-NMI IPIs being used to enter the 
debugger or reboot following a core hanging with interrupts disabled. With the 
switch to NMI IPIs in some of those circumstances, life is better -- at least, 
on hardware that supports non-maskable IPIs. I seem to recall sparc64 doesn't, 
however? Not sure about MIPS, etc. Attilio has since significantly improved our 
shutdown behaviour -- initially, the switch to NMI IPIs broke other things 
(because certain IPIs then improperly preempted threads holding spinlocks), but 
that all seems to be pretty much worked out now.

Robert

> 
> I think that the default kdb behavior is the correct one, so it doesn't make
> sense to have a knob to turn on incorrect behavior.
> But I may be missing something obvious.
> 
> The comment in the code doesn't really satisfy me:
> /*
> * Flag indicating whether or not to IPI the other CPUs to stop them on
> * entering the debugger.  Sometimes, this will result in a deadlock as
> * stop_cpus() waits for the other cpus to stop, so we allow it to be
> * disabled.  In order to maximize the chances of success, use a hard
> * stop for that.
> */
> 
> The hard stop should be sufficiently mighty.
> Yes, I am aware of supposedly extremely rare situations where a deadlock could
> happen even when using hard stop.  But I'd rather fix that than have this
> switch.
> 
> Oh, the commit message (from 2004) explains it:
>> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we
>> attempt to IPI other cpus when entering the debugger in order to stop
>> them while in the debugger.  The default remains to issue the stop;
>> however, that can result in a hang if another cpu has interrupts disabled
>> and is spinning, since the IPI won't be received and the KDB will wait
>> indefinitely.  We probably need to add a timeout, but this is a useful
>> stopgap in the mean time.
> 
> But that was before we started using hard stop in this context (in 2009).
> 
> -- 
> Andriy Gapon


Re: is dtrace usable?

2010-03-09 Thread Robert N. M. Watson

On Mar 9, 2010, at 2:16 PM, Alexander Leidinger wrote:

>> From this you can see that sys.mk is included and parsed before 'Makefile',
>> so the WITH_CTF=yes is not set until after sys.mk has been parsed.
> 
> I think we need to find a different solution for this. The need to specify 
> WITH_CTF at the command line is very error prone. :(

You are neither the first person to have made this observation, nor the first 
person to have failed to propose a solution in the form of a patch :-).

Robert

Re: net.inet.tcp.timer_race: does anyone have a non-zero value?

2010-03-07 Thread Robert N. M. Watson


On Mar 7, 2010, at 12:33 PM, Mikolaj Golub wrote:

> On Sun, 7 Mar 2010 11:59:35 + (GMT) Robert Watson wrote:
> 
>> Please check the results of the following command:
>> 
>>  % sysctl net.inet.tcp.timer_race
>>  net.inet.tcp.timer_race: 0
> 
> Are the results for FreeBSD 7 interesting to you? Currently we
> have mostly FreeBSD 7.1 hosts in production, and I observe nonzero values on 8
> hosts (about 15%). I would send more details to you privately if you are
> interested.

Yes, 7.x is also of interest, thanks!

Robert

Re: mbuf leakage with nfs/udp (was: mbuf leakage with nfs/zfs)

2010-02-28 Thread Robert N. M. Watson

On Feb 28, 2010, at 2:52 PM, Daniel Braniss wrote:

> Well, I have further reduced the problem: it happens with NFS/UDP writes.
> I'll try the Wireshark road, but I'm very rusty with RPC. The other road is to
> check the changes: my oldest is from late October (RC2), where it's happening,
> while Gerrit tried 8-pre from November and it worked, so it will be fun
> trying to nail it down :-)

Fortunately, Wireshark actually has quite a good NFS RPC decoder -- it will 
tell you what operation appears, what the arguments are, interpret NFS error 
codes, etc. In fact, it's an excellent way to learn about NFS...

Robert

Re: mbuf leakage with nfs/zfs?

2010-02-28 Thread Robert N. M. Watson

On Feb 28, 2010, at 12:11 PM, Daniel Braniss wrote:

>> I'm pulling in Robert Watson, who has some familiarity with the UDP
>> stack/code in FreeBSD.  I'm not sure he'll be a sufficient source of
>> knowledge for this specific issue since it appears (?) to be specific to
>> NFS; Rick Macklem would be a better choice, but as reported, he's MIA.
>> 
>> Robert, are you aware of any changes or implementation issues which
>> might cause excessive (read: leaking) mbuf use under UDP-based NFS?  Do
>> you know of a way folks could determine the source of the leak, either
>> via DDB or while the system is live?
> 
> I have been running some tests in a controlled environment.
> 
> Server and client are both 64-bit Xeon X5550 @ 2.67GHz with 16GB of memory
> FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 SMT threads
> 
> The client is running the latest 8.0-STABLE.
> The load is created by running 'make -j32 buildworld' and sleeping 150 sec
> in between runs; this is the straight line you will see in the graphs.
> Both the src and obj directories are NFS mounted from the server, regular UFS.
> 
> When the server is running 7.2-STABLE, no leakage is seen;
> see ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbufs/{tcp,udp}-7.2.ps
> When the server is running 8.0-STABLE,
> see ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbufs/{tcp,udp}-8.0.ps
> you can see that UDP is leaking!
> 
> cheers,
>   danny
> ps: I think the subject should be changed again, removing zfs ...

This type of problem (occurs with one client but not another) is almost always 
the result of the access pattern of a particular client triggering a specific 
(and perhaps single) bug in error-handling. For example, we might not be 
properly freeing the received request when generating an EPERM in an edge case. 
The hard bit is identifying which it is. If it's reproducible with UDP, then 
usually the process is:

- Build a minimal test case to trigger the problem -- ideally with as little 
complexity as possible.
- Run netstat -m at the beginning of the test and the end of the test on the 
server to count the number of leaked mbufs
- Run wireshark throughout the test
- Walk the Wireshark trace looking for some error that occurs about the same 
or a slightly lower number of times than the number of mbufs leaked
- Iterate, narrowing the test case until it's either obvious exactly what's 
going on, or you've identified a relatively constrained code path and can just 
spot the bug by reading the code
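The counting in the steps above can be sketched in Python. The sketch makes two assumptions. First, netstat -m is assumed to print a line of the form "current/cache/total mbufs in use (current/cache/total)". Second, per-error counts are assumed to have already been exported from the Wireshark trace into a dictionary. The function names, the 10% tolerance, and the sample numbers are hypothetical and only illustrate the comparison.

```python
import re
from typing import Dict, List

# Assumed shape of the relevant netstat -m line, e.g.
# "1026/1564/2590 mbufs in use (current/cache/total)"
MBUF_LINE = re.compile(r"^\s*(\d+)/(\d+)/(\d+) mbufs in use")


def mbufs_in_use(netstat_m_output: str) -> int:
    """Return the 'current' mbuf count from netstat -m output."""
    for line in netstat_m_output.splitlines():
        match = MBUF_LINE.match(line)
        if match:
            return int(match.group(1))
    raise ValueError("no 'mbufs in use' line found")


def suspect_errors(leaked: int, error_counts: Dict[str, int],
                   tolerance: float = 0.1) -> List[str]:
    """Errors seen about as often as, or slightly less often than, the leak."""
    lower = leaked * (1.0 - tolerance)
    return sorted(name for name, count in error_counts.items()
                  if lower <= count <= leaked)


if __name__ == "__main__":
    # Hypothetical sample data, not measurements from this thread.
    before = "1026/1564/2590 mbufs in use (current/cache/total)"
    after = "4926/1664/6590 mbufs in use (current/cache/total)"
    leaked = mbufs_in_use(after) - mbufs_in_use(before)
    errors = {"NFS3ERR_NOENT": 3850, "NFS3ERR_ACCES": 12, "NFS3ERR_STALE": 4100}
    print(leaked, suspect_errors(leaked, errors))
```

With the made-up numbers above, only the error whose count falls just under the leak is flagged. Errors that occur more often than the leak are excluded, because each leaked mbuf should correspond to at most one triggering event.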

It's almost certainly one or a small number of very specific RPCs that are 
triggering it -- maybe OpenBSD does an extra lookup, or stat, or something, on 
a name that may not exist anymore, or does it sooner than the other clients. 
Hard to say, other than to wave hands at the possibilities.

And it may well be we're looking at two bugs: Danny may see one bug, perhaps 
triggered by a race condition, but it may be different from the OpenBSD 
client-triggered bug (to be clear: it's definitely a FreeBSD bug, although we 
might only see it when an OpenBSD client is used because perhaps OpenBSD also 
has a bug or feature).

Robert

Re: em interface slow down on 8.0R

2009-12-02 Thread Robert N. M. Watson


On 1 Dec 2009, at 12:05, Elliot Finley wrote:

> On Mon, Nov 30, 2009 at 6:29 PM, Hiroki Sato  wrote:
> Jack Vogel  wrote
>  in <2a41acea0911301119j1449be58y183f2fe1d1112...@mail.gmail.com>:
> 
jf> I will look into this Hiroki, as time goes the older hardware does not always
jf> get test cycles like one might wish.
> 
> 
> Here's some more info to throw into the mix.  I have several new boxes 
> running 8-Stable (a few hours after release).
> 
> Leaving all sysctls at their defaults, I get around 400 Mbps testing with netperf or
> iperf.  If I set the following on the box running 'netserver' or 'iperf -s':
> 
> kern.ipc.maxsockbuf=16777216
> net.inet.tcp.recvspace=1048576
> 
> then I can get around 926 Mbps.  But then if I make those same changes on the
> box running the client side of netperf or iperf, the performance drops back
> down to around 400 Mbps.
> 
> All boxes have the same hardware.  They have two 4-port Intel NICs in them.
> 
> e...@pci0:5:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
> vendor = 'Intel Corporation'
> device = '82571EB Gigabit Ethernet Controller'
> class  = network
> subclass   = ethernet
> 
> Any pointers on further network tuning to get bidirectional link saturation 
> would be much appreciated.  These boxes are not in production yet, so anyone 
> that would like to have access to troubleshoot, just ask.

I've CC'd Lawrence Stewart in on this thread, as he's been doing work on the 
TCP stack lately and might have insight into what you might be running into. 
Lawrence -- there's a bit of a back thread with configuration and problem 
details in the stable@ archives.

Robert

Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)

2009-10-13 Thread Robert N. M. Watson



On 13 Oct 2009, at 14:33, Ivan Voras wrote:

>> If (1) is highly variable during I/O, it's almost certainly a property of
>> the VM technology you're using, and there's nought to be done about it in
>> the guest OS.
> 
> Here's an example of a ping session with 0.1 s resolution during a stall of a
> few seconds in ssh:
> 
> 64 bytes from 161.53.72.188: icmp_seq=1576 ttl=64 time=0.383 ms
> 64 bytes from 161.53.72.188: icmp_seq=1577 ttl=64 time=0.405 ms
> 64 bytes from 161.53.72.188: icmp_seq=1578 ttl=64 time=0.360 ms
> 
> 64 bytes from 161.53.72.188: icmp_seq=2304 ttl=64 time=4.194 ms
> 64 bytes from 161.53.72.188: icmp_seq=2305 ttl=64 time=0.454 ms
> 64 bytes from 161.53.72.188: icmp_seq=2306 ttl=64 time=0.376 ms
> 
> Note the huge packet loss. It looks like it's a VM fault or something like it.


It sounds like the VM is failing to execute the guest during certain
types of I/O. A bit of scheduler tracing in the host OS probably
wouldn't go amiss to confirm that the VM really is suspending the
guest at about the same time ICMP latency goes up. However, given the
above, I think you can reasonably assume that the 4ms jump you're
seeing there is due to global host OS/VM scheduling, and not FreeBSD
scheduling.
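Before correlating the ICMP side with a host scheduler trace, it can help to put numbers on it. One way is to parse the ping log for gaps in icmp_seq and for replies above a latency threshold. Below is a minimal sketch, assuming reply lines in the usual "64 bytes from ...: icmp_seq=N ttl=T time=X ms" form. The function names and the 1 ms threshold are hypothetical. A gap only means that no reply was logged for those sequence numbers. That may be real loss, or simply lines left out of the log.

```python
import re
from typing import List, Tuple

# Matches reply lines such as
# "64 bytes from 161.53.72.188: icmp_seq=1576 ttl=64 time=0.383 ms"
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")

Reply = Tuple[int, float]  # (icmp_seq, round-trip time in ms)


def parse_replies(log: str) -> List[Reply]:
    return [(int(seq), float(rtt)) for seq, rtt in REPLY.findall(log)]


def seq_gaps(replies: List[Reply]) -> List[Tuple[int, int, int]]:
    """(last seq before the gap, first seq after it, number missing)."""
    gaps = []
    for (prev, _), (cur, _) in zip(replies, replies[1:]):
        if cur - prev > 1:
            gaps.append((prev, cur, cur - prev - 1))
    return gaps


def latency_spikes(replies: List[Reply], threshold_ms: float) -> List[Reply]:
    return [(seq, rtt) for seq, rtt in replies if rtt > threshold_ms]


if __name__ == "__main__":
    log = """\
64 bytes from 161.53.72.188: icmp_seq=1576 ttl=64 time=0.383 ms
64 bytes from 161.53.72.188: icmp_seq=1577 ttl=64 time=0.405 ms
64 bytes from 161.53.72.188: icmp_seq=1578 ttl=64 time=0.360 ms
64 bytes from 161.53.72.188: icmp_seq=2304 ttl=64 time=4.194 ms
64 bytes from 161.53.72.188: icmp_seq=2305 ttl=64 time=0.454 ms
64 bytes from 161.53.72.188: icmp_seq=2306 ttl=64 time=0.376 ms
"""
    replies = parse_replies(log)
    print(seq_gaps(replies))             # [(1578, 2304, 725)]
    print(latency_spikes(replies, 1.0))  # [(2304, 4.194)]
```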


Robert


Re: smbfs panic when lost connection or unmount --force

2009-07-12 Thread Robert N. M. Watson



On 10 Jul 2009, at 13:17, Oliver Pinter wrote:


> I know that the bt is useful, but does ddb work with a USB keyboard?
> I will send the log tonight.


Unfortunately, a known issue with FreeBSD 8.0 is that the new USB
stack, while a vast improvement over the previous USB stack in
countless ways, does not support polled access from DDB. You will need
to use a serial port, FireWire port, PS/2, or AT keyboard in order to
get interactive DDB support.


If that's not feasible, or if it's just easier, you may be able to use  
the DDB scripting facility + textdumps to run DDB commands  
automatically on panic to produce useful debugging output. Take a look  
at the textdump(4) man page for details. This can be combined with a  
traditional crashdump to capture both DDB output and normal dump data  
for use with kgdb.


Robert



//sorry for bad english

ps.: attached the config

On 7/10/09, Robert Watson  wrote:


On Fri, 10 Jul 2009, Oliver Pinter wrote:

It is a kernel panic when force-unmounting the smbfs volume or losing the
connection with the Samba server.


This is a NULL pointer dereference in the kernel.  Per Attilio's e-mail, a
stack trace should help us track it down.  Thanks!

Robert N M Watson
Computer Laboratory
University of Cambridge



--
The OS is:


kern.ostype: FreeBSD
kern.osrelease: 7.2-STABLE
kern.osrevision: 199506
kern.version: FreeBSD 7.2-STABLE #4: Sat Jun 27 21:44:32 CEST 2009
  r...@oliverp:/usr/obj/usr/src/sys/stable
kern.osreldate: 702103

--
make.conf:


CPUTYPE?=core2
CFLAGS= -O2 -fno-strict-aliasing -pipe
MODULES_OVERRIDE=smbfs libiconv libmchain zfs opensolaris drm cd9660 cd9660_iconv

--
panic message:

Jul 10 01:58:39 oliverp syslogd: kernel boot file is /boot/kernel/kernel
Jul 10 01:58:39 oliverp kernel: kernel trap 12 with interrupts disabled
Jul 10 01:58:39 oliverp kernel:
Jul 10 01:58:39 oliverp kernel:
Jul 10 01:58:39 oliverp kernel: Fatal trap 12: page fault while in kernel mode
Jul 10 01:58:39 oliverp kernel: cpuid = 2; apic id = 02
Jul 10 01:58:39 oliverp kernel: fault virtual address   = 0x30
Jul 10 01:58:39 oliverp kernel: fault code  = supervisor read data, page not present
Jul 10 01:58:39 oliverp kernel: instruction pointer = 0x8:0x80327fd0
Jul 10 01:58:39 oliverp kernel: stack pointer   = 0x10:0xff8078360940
Jul 10 01:58:39 oliverp kernel: frame pointer   = 0x10:0xff0004c31390
Jul 10 01:58:39 oliverp kernel: code segment= base 0x0, limit 0xf, type 0x1b
Jul 10 01:58:39 oliverp kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Jul 10 01:58:39 oliverp kernel: processor eflags= resume, IOPL = 0
Jul 10 01:58:39 oliverp kernel: current process = 60406 (smbiod0)
Jul 10 01:58:39 oliverp kernel: trap number = 12
Jul 10 01:58:39 oliverp kernel: panic: page fault
Jul 10 01:58:39 oliverp kernel: cpuid = 2
Jul 10 01:58:39 oliverp kernel: Uptime: 6h51m16s
Jul 10 01:58:39 oliverp kernel: Physical memory: 4087 MB
Jul 10 01:58:39 oliverp kernel: Dumping 2448 MB:Copyright (c) 1992-2009 The FreeBSD Project.