Re: Request for review testing of VFS locking patch
On Fri, 20 Sep 2002, Boris Popov wrote: On Thu, 19 Sep 2002, Jeff Roberson wrote: Well, I haven't tested it with smbfs, but I may point out that the patch for nwfs contains two vref()s instead of vgetref(). Ah, thanks very much. (Un?)luckily it was in debug code, so it would not have been noticed for a while. I have updated the patch to reflect this change as well as three bugs in the NFS locking that phk and I found. It's at the same place: http://www.chesapeake.net/~jroberson/vfssmp.diff Thanks, Jeff To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Request for review testing of VFS locking patch
I have a patch available at http://www.chesapeake.net/~jroberson/vfssmp.diff that locks the majority of the vnode fields. The namecache locking has been omitted from this patch. The locking has been specified in vnode.h, and all interlock, syncer, and vn lock usage has been verified. Any places that are still unlocked should be marked with mp_fixme's. This patch touches every filesystem. I have tested with several, but I would appreciate more extensive testing, especially if you use one of the lesser-used filesystems (i.e., non-UFS). Please test with WITNESS and DEBUG_VFS_LOCKS enabled. If you find that it drops into the debugger, please get a backtrace and then do the following: 'w vfs_badlock_panic 0', 'w vfs_badlock_print 0', 'w vfs_badlock_mutex 0'. Currently I know that sendfile() and the UFS snapshot code fail assertions. There are many diffs that just switch from explicit mtx ops to using the new VI_*LOCK macros. I did this only in places where I actually reviewed the code. The remaining direct v_interlock accesses serve as indicators of behavior that needs to be further verified. I also have not verified usage of the mntvnode mtx or the freelist mutex, etc. There may be races there. I did, however, fix up the broken vflush() mntvnode race. Once this has been committed I will be free to lock the rest of the vnode and then move on to other filesystem-related data structures. My goal is to have the high-level VFS and at least some filesystems SMP safe for 5.0. Any feedback is welcome. Thanks, Jeff
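The switch from explicit mtx operations to the VI_*LOCK macros can be pictured with a small userland sketch. Everything in it is an assumption for illustration: the simplified struct mtx that only records whether it is held, the trimmed struct vnode, and the VI_ASSERT_LOCKED() helper are not taken from the patch. What it tries to show is why funnelling interlock access through one set of macros helps: a locking rule can then be checked in one place.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel mutex: only tracks whether it is held. */
struct mtx {
	bool held;
};

static void
mtx_lock(struct mtx *m)
{
	assert(!m->held);	/* no recursion in this simplified model */
	m->held = true;
}

static void
mtx_unlock(struct mtx *m)
{
	assert(m->held);
	m->held = false;
}

/* Hypothetical, heavily trimmed vnode: only the fields this sketch needs. */
struct vnode {
	struct mtx v_interlock;
	int v_usecount;		/* protected by v_interlock in this sketch */
};

/* Interlock accessors in the style of VI_LOCK()/VI_UNLOCK(). */
#define VI_MTX(vp)		(&(vp)->v_interlock)
#define VI_LOCK(vp)		mtx_lock(VI_MTX(vp))
#define VI_UNLOCK(vp)		mtx_unlock(VI_MTX(vp))
#define VI_ASSERT_LOCKED(vp)	assert(VI_MTX(vp)->held)

/* A field update that documents (and checks) its locking requirement. */
static void
vusecount_bump(struct vnode *vp)
{
	VI_ASSERT_LOCKED(vp);
	vp->v_usecount++;
}
```

Raw mtx_lock(&vp->v_interlock) calls that remain outside the macros then stand out in a grep, which fits the idea of leaving direct v_interlock accesses as markers for code still to be reviewed.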
Re: alpha tinderbox failure
Is someone going to address this? If not, I will. Jeff
Re: [bde@zeta.org.au: Re: Page faults from bento cluster (Re: Problems reading vmcores)]
As near as I can tell the panic is happening in VOP_GETATTR(). It looks to me like it would be possible for the vnode to be recycled between the time when it passes the vp->v_mount test at the top of the loop and the time when vn_lock() succeeds. Shouldn't we bump the vnode reference count by calling vref() at the top of the loop and add the appropriate calls to vrele()? Rev. 1.395 made some changes that I didn't like much here. The VOP_GETATTR() is now done unconditionally. This pessimizes vflush() and enlarges any race windows. I think WRITECLOSE is only used for mount -u from rw to ro, so the pessimization exercises code that was rarely used before. Rev. 1.394 called VOP_GETATTR() with the interlock held. This was wrong but probably reduced race windows. The window seems to have been opened before rev. 1.394 by releasing mntvnode_slock before acquiring the interlock. RELENG_4 doesn't release mntvnode_slock at that point (it holds both locks across the VOP_GETATTR()). Bruce I have patches that fix the locking behavior in vflush() in my current VFS SMP patch. It's not quite complete, but it has most of struct vnode locked down. The patch even moves the getattr back into the conditional path. This may fix the behavior here. Again, this is more than vflush, but I didn't want to separate that out and test it before going to bed. If this fixes the problem I can commit the relevant part of this patch soon. http://www.chesapeake.net/~jroberson/VFSsmp.patch Cheers, Jeff
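The vref()-at-the-top-of-the-loop suggestion in the question can be modeled in userland. This is a single-threaded sketch under stated assumptions: the vnode has only a use count and a mount pointer, try_recycle() is a hypothetical stand-in for another thread reclaiming the vnode, and only unreferenced vnodes may be recycled. Real vref()/vrele() also deal with the interlock and vnode deactivation, which are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed structures: just enough for the race window. */
struct mount {
	int mnt_dummy;
};

struct vnode {
	int v_usecount;		/* references held on this vnode */
	struct mount *v_mount;	/* set to NULL when the vnode is recycled */
};

static void
vref(struct vnode *vp)
{
	vp->v_usecount++;
}

static void
vrele(struct vnode *vp)
{
	assert(vp->v_usecount > 0);
	vp->v_usecount--;
}

/* Stand-in for a concurrent reclaim: only unreferenced vnodes go. */
static int
try_recycle(struct vnode *vp)
{
	if (vp->v_usecount != 0)
		return (0);
	vp->v_mount = NULL;
	return (1);
}

/*
 * Walk the vnodes belonging to mp.  `window` runs where another thread
 * could act between the v_mount test and vn_lock() succeeding.  Holding
 * a reference across the window keeps the vnode from being recycled, so
 * the later work still sees a vnode on mp.  Returns how many vnodes
 * were still valid after the window.
 */
static int
walk_vnodes(struct vnode **vps, size_t n, struct mount *mp,
    int (*window)(struct vnode *))
{
	int valid = 0;

	for (size_t i = 0; i < n; i++) {
		struct vnode *vp = vps[i];

		if (vp->v_mount != mp)
			continue;
		vref(vp);		/* pin before the window */
		(void)window(vp);	/* a recycle attempt fails while pinned */
		if (vp->v_mount == mp)
			valid++;	/* e.g. where VOP_GETATTR() would run */
		vrele(vp);
	}
	return (valid);
}
```

Without the vref()/vrele() pair, try_recycle() would succeed inside the window and the v_mount check after it would fail, which is the recycled-vnode situation the question describes.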
Re: cvs commit: src/sys/tools vnode_if.awk
On Sat, 6 Jul 2002, Jeff Roberson wrote: jeff 2002/07/06 23:39:37 PDT Modified files: sys/tools vnode_if.awk Log: - Use 'options DEBUG_VFS_LOCKS' instead of the DEBUG_ALL_VFS_LOCKS environment variable to enable the lock verification code. Revision Changes Path 1.33 +7 -5 src/sys/tools/vnode_if.awk This was previously disabled because our locking was so bad that we could not boot with this option enabled. I can now boot, compile a kernel, and reboot without catching any locking asserts. This means that we are safe at our current level of debugging, but we are certainly not out of the woods wrt VFS locking yet. If you have a crash test box I would appreciate it if you would enable this kernel option. If it catches any errors you will be dropped into the debugger, where you can get a backtrace (type: tr) and mail it to me and current@ to avoid dups. To disable the panic and print once you've hit a bug, type the following in ddb: 'w vfs_badlock_print 0' and 'w vfs_badlock_panic 0'. You will not see any more errors. Thanks! Jeff
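The two ddb knobs can be pictured with a userland model of the check that DEBUG_VFS_LOCKS adds on VOP entry. This is a sketch, not the generated vnode_if code: assert_vop_locked(), enter_debugger() and the v_locked field are hypothetical, the default values of 1 are assumptions, and "dropping into the debugger" is modeled as abort(). The message format follows the console output quoted in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical userland model of the ddb-tunable knobs.  Defaults of 1
 * are an assumption; writing 0 (as with 'w vfs_badlock_print 0')
 * silences the corresponding action.
 */
int vfs_badlock_print = 1;
int vfs_badlock_panic = 1;

static int badlock_count;	/* violations seen, for checking the model */

struct vnode {
	int v_locked;		/* nonzero while the vnode lock is held */
};

/* Stand-in for dropping into ddb. */
static void
enter_debugger(const char *why)
{
	fprintf(stderr, "Debugger(%s)\n", why);
	abort();
}

/* Check run on entry to a VOP that requires a locked vnode. */
static void
assert_vop_locked(const struct vnode *vp, const char *vop)
{
	assert(vp != NULL);
	if (vp->v_locked)
		return;
	badlock_count++;
	if (vfs_badlock_print)
		fprintf(stderr, "%s: %p is not locked but should be\n",
		    vop, (const void *)vp);
	if (vfs_badlock_panic)
		enter_debugger("Lock violation.");
}
```

Writing 0 to both variables, as with the ddb commands above, turns each later violation into a silent counter increment in this model.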
Re: cvs commit: src/sys/tools vnode_if.awk
On Sun, 7 Jul 2002, Don Lewis wrote: On 7 Jul, Jeff Roberson wrote: On Sat, 6 Jul 2002, Jeff Roberson wrote: - Use 'options DEBUG_VFS_LOCKS' instead of the DEBUG_ALL_VFS_LOCKS environment variable to enable the lock verification code. If you have a crash test box I would appreciate it if you would enable this kernel option. If it catches any errors you will be dropped into the debugger, where you can get a backtrace (type: tr) and mail it to me and current@ to avoid dups. It wasn't able to successfully boot with this enabled. I'm hand transcribing this, so apologies for any typos:
[fsck finishes]
Doing initial network setup: host.conf hostname.
VOP_READ: 0xc6737800 is not locked but should be
Debugger(Lock violation. )
Debugger(c0420fe4) at Debugger+0x45
vn_rdwr(0,c6737800,c6425000,55ac,0,0,1,8,c22c7200,df241aec,c22cc0c0) at vn_rdwr+0x18d
linker_hints_lookup(c04750a0,c,c62df000,5,0) at linker_hints_lookup+0x2d9
linker_search_module(c62df000,5,0,0,c0415120) at linker_search_module+0x43
linker_load_module(0,c62df000,0,0,df241cdc) at linker_load_module+0x72
kldload(c22cc0c0,df241d14,1,0,296) at kldload+0xc3
syscall(...)
Oh, I don't use kernel modules on my main dev box. Thanks! I'll have this resolved tomorrow. More below.
If I disable the panic and continue the boot process, I see the following in dmesg:
da0 at ahc0 bus 0 target 0 lun 0
da0: SEAGATE ST336706LW 010A Fixed Direct Access SCSI-3 device
da0: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 35003MB (71687370 512 byte sectors: 255H 63S/T 4462C)
/usr/src/sys/vm/uma_core.c:1332: could sleep with kernel linker locked from /usr/src/sys/kern/kern_linker.c:1798
VOP_READ: 0xc6737800 is not locked but should be
VOP_GETVOBJECT: 0xc6737800 is not locked but should be
VOP_GETVOBJECT: 0xc6737800 is not locked but should be
VOP_BMAP: 0xc6737800 is not locked but should be
VOP_GETVOBJECT: 0xc6737800 is not locked but should be
VOP_GETVOBJECT: 0xc6737800 is not locked but should be
VOP_READ: 0xc6737800 is not locked but should be
VOP_READ: 0xc6737800 is not locked but should be
VOP_READ: 0xc6737800 is not locked but should be
These are all also from the linker. I just verified by loading a module. Thanks, Jeff
Re: cvs commit: src/sys/tools vnode_if.awk
On Sun, 7 Jul 2002, Don Lewis wrote: It wasn't able to successfully boot with this enabled. I'm hand transcribing this, so apologies for any typos: [snip]
Debugger(c0420fe4) at Debugger+0x45
vn_rdwr(0,c6737800,c6425000,55ac,0,0,1,8,c22c7200,df241aec,c22cc0c0) at vn_rdwr+0x18d
linker_hints_lookup(c04750a0,c,c62df000,5,0) at linker_hints_lookup+0x2d9
linker_search_module(c62df000,5,0,0,c0415120) at linker_search_module+0x43
linker_load_module(0,c62df000,0,0,df241cdc) at linker_load_module+0x72
kldload(c22cc0c0,df241d14,1,0,296) at kldload+0xc3
syscall(...)
Revision 1.91 of kern_linker.c fixes this problem. Can you try again? Thanks! Jeff
UMA statistics feedback.
To those of you who provided me with UMA statistics: thank you! The information was enlightening. The current bucket sizes aren't as bad as I had originally anticipated. I think I need to rework the mechanism by which the statistics are collected to get more interesting results, but for now what we have does nicely. Thanks, Jeff
Re: FW: UMA question..
Jeff, (current included because it may be an interesting answer) As you know I'm using UMA to allocate threads and cache them. The 'constructor' methods allow me to allocate threads that have been pre-set up with thread stacks and other special items. When they are being cached they still have their stacks etc. attached to them. These are only split off when UMA decides to stop caching an item and actually return its memory to the system. In this regard the UMA allocator is not a memory allocator but a 'complex object allocator'... Very cool. Now my question: I want to allocate proc structures the same way. In other words, I want a cached proc structure to already have a thread attached to it and a stack attached to the thread. Is it legal for the init function which is called by UMA to in turn call UMA to allocate a sub-element? So if I do uma_zalloc(proc args), that in turn should do a uma_zalloc(thread args). Would this work? Is it legal? No locks are held when doing init, ctor, or fini. The zone and possibly per-CPU queue lock is held while doing the dtor, though. So it is safe as long as you don't cause a recursive allocation in the same zone. In short, what you want to do is perfectly reasonable. I need to allocate extra threads independently of processes, but I could work it so that freed process structures always had a single thread left on them, which would save on allocations. In the future I need to do the same for KSEs and KSEGRPs, so having UMA cache pre-constructed complex items made up of groups of separately UMA-allocated objects would be a great saving. The question is: will it work? Can I call UMA from within a UMA constructor?
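The arrangement discussed above, where a proc's init hook pulls a thread from another zone, can be modeled with a toy caching zone. This is not UMA: the zone struct and zone_alloc()/zone_free()/zone_drain() are hypothetical, there is no locking, and only init/fini (not ctor/dtor) are modeled. It shows the two properties from the reply: cached items keep what init attached, and a nested allocation is fine because it goes to a different zone.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy caching zone with init/fini hooks.  NOT the UMA API: the struct
 * and zone_alloc()/zone_free()/zone_drain() are hypothetical.  init runs
 * when an item is first built; fini runs only when the memory is given
 * back, so cached items keep whatever init attached to them.
 */
#define ZONE_CACHE_MAX	8

struct zone {
	size_t size;
	int (*init)(void *item);
	void (*fini)(void *item);
	void *cache[ZONE_CACHE_MAX];
	int ncached;
	int inits;		/* times init ran, for the example */
};

static void *
zone_alloc(struct zone *z)
{
	void *item;

	if (z->ncached > 0)
		return (z->cache[--z->ncached]);	/* already initialized */
	if ((item = malloc(z->size)) == NULL)
		return (NULL);
	if (z->init != NULL && z->init(item) != 0) {
		free(item);
		return (NULL);
	}
	z->inits++;
	return (item);
}

/* Return an item's memory: undo init first. */
static void
zone_release(struct zone *z, void *item)
{
	if (z->fini != NULL)
		z->fini(item);
	free(item);
}

static void
zone_free(struct zone *z, void *item)
{
	if (z->ncached < ZONE_CACHE_MAX)
		z->cache[z->ncached++] = item;	/* keep it constructed */
	else
		zone_release(z, item);
}

static void
zone_drain(struct zone *z)
{
	while (z->ncached > 0)
		zone_release(z, z->cache[--z->ncached]);
}

/* Hypothetical objects: a cached proc always carries one thread. */
struct thread { int td_stack_ready; };
struct proc { struct thread *p_thread; };

static struct zone thread_zone, proc_zone;

static int
thread_init(void *item)
{
	((struct thread *)item)->td_stack_ready = 1;	/* "attach a stack" */
	return (0);
}

/* Nested allocation from a different zone, never from proc_zone. */
static int
proc_init(void *item)
{
	struct proc *p = item;

	p->p_thread = zone_alloc(&thread_zone);
	return (p->p_thread == NULL ? -1 : 0);
}

static void
proc_fini(void *item)
{
	zone_free(&thread_zone, ((struct proc *)item)->p_thread);
}
```

The zones can be set up with compound literals, e.g. proc_zone = (struct zone){ .size = sizeof(struct proc), .init = proc_init, .fini = proc_fini };. A zone_alloc(&proc_zone) after a zone_free() hands back the same proc with the same thread still attached, and proc_init() runs only once; the thread goes back to its own zone only when the proc zone is drained.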
Re: vm related kernel panics
I got 2 panics from -current sources of today. The back traces are: panic 1: vm_page_insert vm_page_alloc vm_page_grab pmap_new_proc vm_forkproc fork1 fork syscall syscal panic 2: panic mtx_init fork1 fork syscal syscall I would provide more information except I seem to have some problems reading my vmcore.0 files with either gdb or gdb52. I'm not sure what's wrong with gdb, but those two panics were my fault. They were fixed by rev 1.80 of vm_kern.c Jeff
Re: Typo in uma_core.c causing panics after uma_zdestroy()
On Wed, 5 Jun 2002, Ian Dowse wrote: The logic for testing UMA_ZFLAG_INTERNAL in zone_dtor() is reversed. I was able to reliably reproduce crashes with:
mdconfig -a -t malloc -s 10m
mdconfig -d -u 0
mdconfig -a -t malloc -s 10m
mdconfig -d -u 0
Ian
Index: uma_core.c
===================================================================
RCS file: /FreeBSD/FreeBSD-CVS/src/sys/vm/uma_core.c,v
retrieving revision 1.26
diff -u -r1.26 uma_core.c
--- uma_core.c    3 Jun 2002 22:59:19 -0000    1.26
+++ uma_core.c    5 Jun 2002 01:17:27 -0000
@@ -1132,7 +1132,7 @@
                 printf("Zone %s was not empty. Lost %d pages of memory.\n",
                     zone->uz_name, zone->uz_pages);
-        if ((zone->uz_flags & UMA_ZFLAG_INTERNAL) != 0)
+        if ((zone->uz_flags & UMA_ZFLAG_INTERNAL) == 0)
                 for (cpu = 0; cpu < maxcpu; cpu++)
                         CPU_LOCK_FINI(zone, cpu);
Looks great to me. Commit it? Thanks, Jeff
Please enable 'options MALLOC_PROFILE'!
This will keep statistics on the efficiency of our current malloc bucket sizes. After some time of general usage please do a 'sysctl kern.mprof > file' and mail the file to me. Please include the following information:
Primary Usage: workstation/server/web server/etc.
Architecture: x86/alpha/sparc64/powerpc/ia64
Hostname: 'hostname'
Physical mem: 'sysctl hw.physmem'
This will allow me to select new malloc bucket sizes that will be more memory efficient for a wide range of loads. The more folks who contribute, the better our memory footprint will be. Thanks, Jeff
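What "efficiency of bucket sizes" means can be made concrete by counting the bytes lost to rounding each request up to its bucket. The sketch below is illustrative only: it does not read the kern.mprof output format, and the bucket tables in the usage note are made-up examples rather than the kernel's actual malloc buckets.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the smallest bucket that can hold a request of `size` bytes.
 * Assumes buckets[] is sorted ascending.  Returns 0 if the request is
 * larger than every bucket (a "large" allocation, ignored here).
 */
static size_t
bucket_for(const size_t *buckets, size_t nbuckets, size_t size)
{
	for (size_t i = 0; i < nbuckets; i++)
		if (buckets[i] >= size)
			return (buckets[i]);
	return (0);
}

/*
 * Bytes wasted by rounding each request up to its bucket: the kind of
 * per-size statistic that shows whether a bucket table fits a workload.
 * Oversized requests are skipped.
 */
static size_t
wasted_bytes(const size_t *buckets, size_t nbuckets,
    const size_t *requests, size_t nrequests)
{
	size_t waste = 0;

	for (size_t i = 0; i < nrequests; i++) {
		size_t b = bucket_for(buckets, nbuckets, requests[i]);

		if (b != 0)
			waste += b - requests[i];
	}
	return (waste);
}
```

For three 40-byte requests and one 100-byte request, power-of-two buckets {16, 32, 64, 128, 256} waste 3 × 24 + 28 = 100 bytes; adding a 48-byte bucket cuts that to 3 × 8 + 28 = 52 bytes.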
Re: LOOKUP_SHARED is default now
On 9 Apr 2002, Dag-Erling Smorgrav wrote: Considering that neither LOOKUP_SHARED nor LOOKUP_EXCLUSIVE is documented anywhere, could you enlighten us as to what, exactly, they do? Right, sorry. There was some minimal discussion about this on arch@ quite a while ago. Basically, it allows namei to return leaves locked with shared locks instead of exclusive locks when a flag is set. This not only reduces contention but also reduces the number of exclusive locks that are floating around in the system. Jeff
LOOKUP_SHARED is default now
This patch has seriously reduced file system deadlocks for several people. It also makes concurrent file system access much faster in certain cases. Since I have only heard good reports and no bad reports I'm going to enable it by default. If you do experience some file system deadlocks please let me know. You may revert to the previous behavior with 'options LOOKUP_EXCLUSIVE'. I will take this away after a month or so if there are no problems. Thanks, Jeff
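The difference between shared and exclusive leaf locks, and the LOOKUP_EXCLUSIVE escape hatch, can be sketched with a counter-based reader/writer lock. The flag names and values, the vlock struct, and the LOOKUP_DEFAULT_FLAGS macro are hypothetical; only the option name LOOKUP_EXCLUSIVE comes from the announcement. Blocking and lock upgrades are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lookup flags; names and values are for illustration only. */
#define LOCKLEAF	0x1	/* return the leaf vnode locked */
#define LOCKSHARED	0x2	/* ...with a shared lock instead of exclusive */

/*
 * Default policy in this sketch: shared leaf locks unless the kernel
 * was built with 'options LOOKUP_EXCLUSIVE'.
 */
#ifdef LOOKUP_EXCLUSIVE
#define LOOKUP_DEFAULT_FLAGS	(LOCKLEAF)
#else
#define LOOKUP_DEFAULT_FLAGS	(LOCKLEAF | LOCKSHARED)
#endif

/* A reader/writer lock reduced to counters (no blocking modeled). */
struct vlock {
	int shared;		/* number of shared holders */
	bool exclusive;		/* true while held exclusively */
};

/* Try to take the lock in the mode implied by flags; false on conflict. */
static bool
leaf_trylock(struct vlock *lk, int flags)
{
	if (flags & LOCKSHARED) {
		if (lk->exclusive)
			return (false);
		lk->shared++;
		return (true);
	}
	if (lk->exclusive || lk->shared > 0)
		return (false);
	lk->exclusive = true;
	return (true);
}

static void
leaf_unlock(struct vlock *lk)
{
	if (lk->exclusive)
		lk->exclusive = false;
	else if (lk->shared > 0)
		lk->shared--;
}
```

Two lookups that ask for shared leaf locks can hold the same leaf at once, while an exclusive request fails until both are released; that is the reduced contention described above.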
Peculiar(?) slowdown with -CURRENT as of 21 March
I saw some similar weirdness in my test machines last night, where a dual-processor DS20 (Alpha 21264 500x2) beat out a PII Xeon 450x4. Normally the quad Xeon beats the DS20. The quad Xeon was using -j16 but was about 74% idle. The DS20 had used -j8. I didn't get a chance to run top to see how it was doing during the world, since I didn't notice the weirdness until last night after the DS20 had finished but the quad Xeon was still chugging along. Are you both running with WITNESS and INVARIANTS? UMA is slightly slower with these options on than the original malloc and vm_zone code. I'm not sure why it would be even worse for SMP machines, though. So maybe it isn't UMA at all, but it's worth looking into. Thanks, Jeff
Re: Call for UMA (allocator) testers.
I have received a few reports of panics when loading modules. If you're going to run it you may want to statically compile in pseudofs/procfs, etc. Thanks, Jeff On Sun, 10 Mar 2002, Glenn Gombert wrote: I have the UMA patch installed on two systems here, a 500MHz K7 system and a dual PIII SMP box, both of which have WITNESS and INVARIANTS configured in the kernel. I will run them for the next few days, and report anything that looks unusual in operation :) GG. Glenn Gombert [EMAIL PROTECTED]
Re: Call for UMA (allocator) testers.
There were problems with loading modules, but I haven't seen any panics. The loading problems were fixed yesterday in revisions 1.77 and 1.78 of kern_linker.c. I suspect people who may have had panics need to update to the latest version of kern_linker.c. -- Steve Good news for me. Thanks, I haven't caught up on my commit mail yet. I'll make sure this fixes the panic for me as soon as I get home. For those of you that saw the panic, can you update this file and try again? Thanks, Jeff
Re: HEADS UP: Be nice to -CURRENT ( 1 week Feature Slush )
Should I postpone my allocator commit then? Thanks, Jeff
Call for UMA (allocator) testers.
I have an updated copy of my kernel memory allocator available for general testing. If you aren't familiar with this allocator you may want to look at the arch@ archives under "Slab allocator". This patch has been tested on SMP alpha and single-processor x86. It depends on recent -current changes, so fresh sources are expected. This patch is missing the necessary vmstat changes, so you will want to use 'sysctl vm.zone' to view your memory usage. I'd like people to test with WITNESS and INVARIANTS, although with these options on it is somewhat slower than the original kernel. With these disabled it is on par. If you have an SMP machine you will get witness warnings if you run low on memory. There is no real problem except that witness doesn't understand that the condition is safe. If you do test this patch, please send me an email so I know how many people are using this. If you get a lock order violation other than "acquiring duplicate lock of same type" please let me know. If you get a panic, please give me a stack trace (tr in ddb) and the output of 'call uma_print_stats' in the debugger if that is possible. This has been debugged and tested over several months so it is quite stable for me. Hopefully it will be stable for you too. :-) The patch and new files are available at: http://www.chesapeake.net/~jroberson/uma.tar Untar into src/sys and apply the patch. After you rerun config you should be ready to compile. Thanks, Jeff
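For readers who have not followed the arch@ discussion, the core slab idea can be shown with a single toy slab: one fixed buffer carved into equal items, with the free list threaded through the free items themselves. This is a sketch of the general technique only, not UMA's code; SLAB_BYTES and the rounding rule are assumptions, and per-CPU caches, multiple slabs per zone, and returning memory to the VM are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLAB_BYTES	4096	/* assumed slab size for the sketch */

struct slab {
	_Alignas(max_align_t) unsigned char mem[SLAB_BYTES];
	void *freelist;		/* first free item, or NULL */
	size_t itemsize;
	size_t nfree;
};

/* The free-list link lives in the first bytes of each free item. */
static void *
item_next(void *item)
{
	void *next;

	memcpy(&next, item, sizeof(next));
	return (next);
}

static void
item_set_next(void *item, void *next)
{
	memcpy(item, &next, sizeof(next));
}

static void
slab_init(struct slab *s, size_t itemsize)
{
	const size_t align = _Alignof(max_align_t);

	/* Each item must hold a link and keep the next item aligned. */
	if (itemsize < sizeof(void *))
		itemsize = sizeof(void *);
	itemsize = (itemsize + align - 1) / align * align;

	s->itemsize = itemsize;
	s->freelist = NULL;
	s->nfree = 0;
	/* Push from the top down so items are handed out in address order. */
	for (size_t off = SLAB_BYTES / itemsize * itemsize; off >= itemsize;
	    off -= itemsize) {
		void *item = s->mem + off - itemsize;

		item_set_next(item, s->freelist);
		s->freelist = item;
		s->nfree++;
	}
}

static void *
slab_alloc(struct slab *s)
{
	void *item = s->freelist;

	if (item != NULL) {
		s->freelist = item_next(item);
		s->nfree--;
	}
	return (item);
}

static void
slab_free(struct slab *s, void *item)
{
	item_set_next(item, s->freelist);
	s->freelist = item;
	s->nfree++;
}
```

Allocation and free are both a pointer swap at the head of the list, and freed items are reused last-in, first-out.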
RE: Broken mmap in current?
I think I spoke too soon... I saw thousands of calls to mmap and assumed it was the thousands of reads/writes that I was doing. It's actually for the thousands (8192) of pages that I'm mapping in. Oddly enough, though, there are only 3272 calls to my mmap routine each time I run the program. I will investigate further. I did find a bug in mlock() and munlock(). I tried mlock()ing after I mmapped, which I later realized was bogus since the pages are always resident as they exist on the bus. Anyway, the kernel faults in vm_page_unwire when I munlock. I will investigate further and post a PR, though. Thanks for your help! Jeff -----Original Message----- From: Bruce Evans [mailto:[EMAIL PROTECTED]] Sent: Thursday, January 11, 2001 8:52 PM To: Jeff Roberson Cc: '[EMAIL PROTECTED]' Subject: Re: Broken mmap in current? On Thu, 11 Jan 2001, Jeff Roberson wrote: I have written a character device driver for a proprietary PCI device that has a large amount of mappable memory. The character device supports mmap(), which I use to export the memory into a user process. I have no problems accessing the memory on this device, but I notice that my mmap routine is called for every access! Is this a problem with current, or a problem with my mmap? Maybe both. The device mmap routine is called mainly by the mmap syscall for every page to be mmapped. It is also called by dev_pager_getpages() for some pagefaults, but I think this rarely happens. I use bus_alloc_resource and then rman_get_start to get the physical address in my attach, and then the mmap just returns atop(physical address). I assumed this is correct since I have verified with a logic analyzer that I am indeed writing to the memory on the device. This is correct. I looked at some examples. Many drivers get this wrong by using i386_btop(), alpha_btop(), etc.
(AFAIK, atop() is for addresses, which are what we are converting here; btop() is for (byte) offsets; and the machine-dependent prefixes are a vestige of page clustering code that mostly went away 7 years ago.) Also, I noticed that the device's mmap interface does not provide any way to limit the size of the block being mapped. Can I specify the length of the region? The length is implicitly PAGE_SIZE. The device mmap function is called for each page to be mapped. It must verify that the memory from offset to (offset + PAGE_SIZE - 1) belongs to the device and can be accessed with the given protection, and do any device-specific things necessary to enable this memory. This scheme can't support bank-switched device memory very well, if at all. pcvvt_mmap() in the pcvt driver is the simplest example of this. agp_mmap() is a more up-to-date example with the same old bug that the vga drivers used to have (off-by-1 (page) error checking the offset). Bruce
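The per-page contract Bruce describes can be sketched as follows. The function name, the softc layout, and the 4 KiB page size are assumptions, and the sketch only mirrors the shape of a device mmap routine as described in this thread (return a page number, or -1 to refuse); protection checks are omitted. The bounds test accepts an offset only if the whole page from offset to offset + PAGE_SIZE - 1 fits inside the device window. A check on the starting offset alone would accept a partial page at the end of the window, which is the kind of off-by-one-page error mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages; defined locally for the sketch. */
#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define atop(x)		((unsigned long)(x) >> PAGE_SHIFT)	/* address to page */

/* Hypothetical per-device state filled in at attach time. */
struct mydev_softc {
	uint64_t mem_phys;	/* e.g. from rman_get_start() */
	uint64_t mem_size;	/* bytes of mappable device memory */
};

/*
 * Called once per page.  Accept the page only if every byte from offset
 * to offset + PAGE_SIZE - 1 lies inside the device window, then return
 * the physical page number; return -1 to refuse the mapping.  The test
 * is written to avoid overflow in offset + PAGE_SIZE.
 */
static long
mydev_mmap_page(const struct mydev_softc *sc, uint64_t offset)
{
	if (offset >= sc->mem_size || sc->mem_size - offset < PAGE_SIZE)
		return (-1);
	return ((long)atop(sc->mem_phys + offset));
}
```

Because the routine converts a physical address, it uses atop() rather than a byte-offset conversion, following the distinction Bruce draws above.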
Broken mmap in current?
I have written a character device driver for a proprietary PCI device that has a large amount of mappable memory. The character device supports mmap(), which I use to export the memory into a user process. I have no problems accessing the memory on this device, but I notice that my mmap routine is called for every access! Is this a problem with current, or a problem with my mmap? I use bus_alloc_resource and then rman_get_start to get the physical address in my attach, and then the mmap just returns atop(physical address). I assumed this is correct since I have verified with a logic analyzer that I am indeed writing to the memory on the device. Also, I noticed that the device's mmap interface does not provide any way to limit the size of the block being mapped. Can I specify the length of the region? Thanks, Jeff
Bug Fix for SYSV semaphores.
I noticed that SysV semaphores initialize the otime member of the semid_ds structure to 0, but they never update it afterwards. This field is supposed to be the last operation time, i.e., the last time a semctl was done. In UNIX Network Programming, Stevens suggests using this variable to detect races between multiple processes creating/accessing a SysV semaphore. Anyway, I looked through the code and came up with the following trivial patch. Could someone review it and perhaps commit it? This patch was made against current, but I noticed the bug is there in 4.1.1 and most likely everything before that. Thanks, Jeff (Pardon the revision numbers; they are from my own repository.)
*** sysv_sem.c    2000/09/15 11:11:48    1.1.1.1
--- sysv_sem.c    2000/12/12 23:44:28
***************
*** 543,548 ****
--- 543,550 ----
                  return(EINVAL);
          }
  
+         semaptr->sem_otime = time_second;
+ 
          if (eval == 0)
                  p->p_retval[0] = rval;
          return(eval);
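A userland model of the behavior the patch is after, assuming POSIX semantics, where sem_otime holds the time of the last successful semop() and stays 0 until then. The semset struct and both functions are hypothetical. The second function is the check Stevens suggests: a process that opens an existing set treats it as initialized only once sem_otime is nonzero, i.e. once the creator has performed its first operation.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical, trimmed semaphore-set state.  sem_otime is 0 until the
 * first successful operation, matching a freshly created set.
 */
struct semset {
	int semval;
	time_t sem_otime;
};

/*
 * Simplified single-semaphore operation: apply delta if the value stays
 * non-negative (no blocking in this sketch) and stamp sem_otime only on
 * success.  Returns 0 on success, -1 if the operation would block.
 */
static int
semset_op(struct semset *s, int delta, time_t now)
{
	if (s->semval + delta < 0)
		return (-1);
	s->semval += delta;
	s->sem_otime = now;
	return (0);
}

/* Stevens-style test: has the creator finished initializing the set? */
static int
semset_initialized(const struct semset *s)
{
	return (s->sem_otime != 0);
}
```

Without the sem_otime update that the patch adds, semset_initialized() would stay false forever, which is why the race-detection technique does not work on an unpatched kernel.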
PXE build?
Does anyone know of any current issues with PXE? I've searched the mailing lists and I don't see any mention of a problem similar to mine. I'm running FreeBSD-CURRENT from 2000-09-15 on a server. The client has an Intel 21143-based Ethernet card that claims it has PXE 2.0 (Build 74) support. I've set up bootp/tftp on the server, which the client successfully uses to pull down the 'pxeboot' file. After the client retrieves pxeboot it just hangs. There is no further output from the machine. Does anyone know which particular build of PXE 2.0 works with pxeboot? Or is this even a problem with my firmware? Thanks, Jeff