Re: msleep() on recursively locked mutexes
On 4/26/07, Hans Petter Selasky [EMAIL PROTECTED] wrote:
> Hi,
>
> In the new USB stack I have defined the following:

Could you perhaps describe some of the codepaths in the USB stack that require this behavior?

--
Bosko Milekic [EMAIL PROTECTED]
http://www.crowdedweb.com/
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: More user developers friendly memguard.
I like this very much. Please commit, and feel free to continue improving Memguard for your (and everyone else's) benefit.

On 12/27/05, Pawel Jakub Dawidek [EMAIL PROTECTED] wrote:
> Here is the patch:
>
> http://people.freebsd.org/~pjd/patches/kern_malloc.c.3.patch
>
> It allows configuring the memory type to debug without recompiling the kernel. It also allows debugging kernel modules with memguard.
>
> The rules:
> 1. If the memory type is compiled into the kernel, vm.memguard_desc should be configured in /boot/loader.conf.
> 2. If the memory type is in a kernel module, the vm.memguard_desc sysctl should be configured before loading the module.
>
> --
> Pawel Jakub Dawidek http://www.wheel.pl
> [EMAIL PROTECTED] http://www.FreeBSD.org
> FreeBSD committer Am I Evil? Yes, I Am!

--
Bosko Milekic [EMAIL PROTECTED]
To see all the content I generate on the web, check out my Peoplefeeds profile at http://peoplefeeds.com/bosko/profile
Re: generic network protocols parser ?
On Fri, Mar 04, 2005 at 11:07:34AM -0500, Aziz KEZZOU wrote:
> Hi all,
>
> I am wondering if anyone knows about a generic parser which takes a packet (mbuf) of a certain protocol (e.g. RSVP) as input and generates some data structure representing the packet?
>
> I've been searching for a while and found that ethereal and tcpdump, for example, use specific data structures and functions to dissect each protocol's packets. Is this the only approach possible? My supervisor suggested using a TLV (Type/Length/Value) approach instead. Any opinions about that?
>
> If no such parser exists, is there any practical reason why?
>
> Thanks,
> Aziz

You can only go so far with generic parsing. Eventually you will want some protocol-specific value to be extracted, and your parser will have to know what the packet looks like.

What are you trying to do, exactly?

--
Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED]
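[Editor's sketch] The TLV idea from the question can be illustrated with a small userland walker. This is only a sketch under assumptions: the wire layout (1-byte type, 2-byte big-endian length, then the value) and the names tlv_next() and tlv_count() are made up for illustration and do not come from RSVP or any real stack. It also shows the limit described in the reply: a generic walker can find element boundaries, but interpreting each value still needs protocol knowledge.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded TLV element; `value` points into the caller's buffer. */
struct tlv {
    uint8_t        type;
    uint16_t       length;   /* length of the value in bytes */
    const uint8_t *value;
};

/*
 * Decode the TLV element at the start of buf (len bytes available).
 * Hypothetical wire layout: 1-byte type, 2-byte big-endian length, value.
 * Returns the number of bytes consumed, or 0 if the buffer is truncated.
 */
size_t
tlv_next(const uint8_t *buf, size_t len, struct tlv *out)
{
    if (len < 3)
        return 0;
    out->type = buf[0];
    out->length = (uint16_t)((buf[1] << 8) | buf[2]);
    if ((size_t)out->length > len - 3)
        return 0;
    out->value = buf + 3;
    return 3 + (size_t)out->length;
}

/* Walk a buffer and count well-formed elements; returns -1 on truncation. */
int
tlv_count(const uint8_t *buf, size_t len)
{
    struct tlv t;
    int n = 0;

    while (len > 0) {
        size_t used = tlv_next(buf, len, &t);

        if (used == 0)
            return -1;
        buf += used;
        len -= used;
        n++;
    }
    return n;
}
```

A caller would loop with tlv_next() and switch on t.type; everything inside that switch is the protocol-specific part that no generic parser can supply.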
Re: MBUF statistics
On Tue, 15 Feb 2005 16:54:52 +0100, Max Laier [EMAIL PROTECTED] wrote:
> On Tuesday 15 February 2005 12:38, Borja Marcos wrote:
> > Hello,
> >
> > Looking at the mbuf statistics available in FreeBSD 4 and FreeBSD 5 I can see that the statistics available in FreeBSD 5 are, surprisingly, much less comprehensive. Is there any other place where I can find out how many mbuf requests have been done, how many of them have waited, how many have failed, etc?
>
> I use $ vmstat -z | grep Mbuf. The netstat -m output is broken, because fixing this would impose an additional atomic operation on each alloc/free, which is a real performance killer.

Yeah, unfortunately statistics are too hard to do completely correctly right now (too hard on performance, that is). To make things worse, the more involved UMA zone design used for Mbuf and Cluster allocations means that some of the UMA zone statistics you get with vmstat -z are not entirely accurate. The UMA zone statistics code needs to be changed to accommodate this structure, but in addition to that a way to make statistics gathering cheaper would be nice, because currently doing a 'vmstat -z' is really terrible for performance.

Which reminds me... those of you doing benchmarks, please don't run 'vmstat -z' while you're doing them; it might skew/pessimize your results.

--
Bosko Milekic - If I were a number, I'd be irrational.
Contact Info: http://bmilekic.unixdaemons.com/contact.txt
Re: Playing with mbuf in userland
Paolo Pisati wrote:
> Hi,
>
> I'm developing a little app that manipulates mbufs. Right now I'm still working on it as a userland app, but I would like to test it with some real mbufs straight from the stack. Do you know how I can get some of these structs in an easy way? I mean, is it possible to copy some of these structs from the stack to userland? Or should I fake it in userland?

One way to do this would be to instrument a for-superuser-only socket option that would copy out all of the data, including the metadata and mbuf headers, to userland, while taking care to modify references within the mbufs to userland locations {*}. To do this, in turn, you would need to obtain the userland target addresses of all mbufs and clusters you're copying out beforehand, and overwrite all mbufs' m_next, m_nextpkt, m_data, and in some cases, m_ext.ext_buf references before doing the copyout in-kernel. This can be a pretty involved copy and would require careful implementation.

{*} The data in the socket buffer is kept as an mbuf chain, so this is possible.

Another option to look into would be to implement a sysctl(8)-exported handler that iterates over the mbuf chain and prints out the mbuf chains in something like XML, which your userland application can then more or less easily parse, and reproduce the chain (fake it up) in userland. This solution is rather attractive because you can do all sorts of things with the intermediate-parsing-language right from the kernel, as well as from userland (at the parsing stages). To see an example of a sysctl(8) handler, refer to src/sys/vm/uma_core.c (the bottom) in FreeBSD 5.x.

In any case, I would be very interested in seeing what you come up with, as this could be a very useful diagnostic tool.

-Bosko

--
Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED]
For the wicked / Carry us away / Captivity require from us a song / How can we sing king alpha's song in a strange land?
--Bob Marley
Re: Playing with mbuf in userland
I wrote:
> Another option to look into would be to implement a sysctl(8)-exported handler that iterates over the mbuf chain and prints out the mbuf chains in something like XML, which your userland application can then more or less easily parse, and reproduce the chain (fake it up) in userland. This solution is rather attractive because you can do all sorts of things with the intermediate-parsing-language right from the kernel, as well as from userland (at the parsing stages). To see an example of a sysctl(8) handler, refer to src/sys/vm/uma_core.c (the bottom) in FreeBSD 5.x.

I should also add: you should have various sysctl OIDs that call this handler, passing, say, the mbuf chain as an argument (a reference to the top mbuf). This way certain OIDs can send out a snapshot of an mbuf chain at a particular point in the stack, and others can send snapshots from the socket buffer and driver entry-points (you can get some perception of how the chain changes as it makes its way up-and-down the layers).

-Bosko
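[Editor's sketch] The "print the chain in something like XML" idea can be tried out in userland before touching the kernel. The sketch below is not kernel code and not the sysctl handler itself: struct fake_mbuf, emit() and chain_to_xml() are hypothetical names, and the structure keeps only three of the fields a real mbuf has. It shows the serialization half that a handler could perform; the userland tool would then parse this text to rebuild the chain.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Userland stand-in for an mbuf; only the fields the dump needs. */
struct fake_mbuf {
    struct fake_mbuf    *m_next;   /* next buffer in this chain */
    const unsigned char *m_data;   /* start of this buffer's data */
    size_t               m_len;    /* bytes of data in this buffer */
};

/* Append formatted text at *off; returns -1 if it does not fit. */
static int
emit(char *out, size_t outsz, size_t *off, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(out + *off, outsz - *off, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= outsz - *off)
        return -1;
    *off += (size_t)n;
    return 0;
}

/*
 * Describe the chain starting at m as XML-like text in out.
 * Returns the number of mbufs described, or -1 if out was too small.
 */
int
chain_to_xml(const struct fake_mbuf *m, char *out, size_t outsz)
{
    size_t off = 0, i;
    int count = 0;

    if (emit(out, outsz, &off, "<chain>") != 0)
        return -1;
    for (; m != NULL; m = m->m_next, count++) {
        if (emit(out, outsz, &off, "<mbuf len=\"%zu\">", m->m_len) != 0)
            return -1;
        for (i = 0; i < m->m_len; i++) {
            /* Data bytes are written as two hex digits each. */
            if (emit(out, outsz, &off, "%02x", (unsigned)m->m_data[i]) != 0)
                return -1;
        }
        if (emit(out, outsz, &off, "</mbuf>") != 0)
            return -1;
    }
    if (emit(out, outsz, &off, "</chain>") != 0)
        return -1;
    return count;
}
```

Hex-encoding the data keeps the output printable; a real handler would probably also want to export flags and packet-header fields, which are left out here.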
Re: [HEADS-UP] mbuma is in the tree
> Bosko,
>
> [deletia]
>
> are you going to convert mbuf tag allocator to UMA? Now tags are allocated with malloc(). AFAIK, tags are used heavily in pf, and forthcoming ALTQ. Moving to UMA should affect their performance positively.

First off, malloc() *is* UMA. With mbuma in the tree, I don't believe we have any remaining custom-allocators in the tree.

As for what to do with m_tags, it is still unclear to me. Personally, I'm conflicted about their use. On one hand, they offer a clean way to attach metadata to packets, but on the other hand they are quite expensive. If you read the paper on mbuma, you'll notice that I point out that it would be worth investigating whether, in scenarios where an m_tag is ALWAYS required per packet (e.g., MAC), providing a secondary zone with pre-allocated m_tags for packet headers might be worth it. Prior to this work, however, I suggest we investigate the possibility of using smaller mini-mbufs whenever clusters are used so that space wastage is reduced.

-Bosko
[HEADS-UP] mbuma is in the tree
(Hello Chris Haalboom? :-))

Hello,

In order to avoid having to type everything again, I'll refer to the commit log. PLEASE READ IT IN FULL:

  Bring in mbuma to replace mballoc.

  mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein.

  Extensions to UMA worth noting:
  - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt.

  mbuma things worth noting:
  - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs.
  - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer how stats will work within the modified framework.

  From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime.

  Additional things worth noting/known issues (READ):
  - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma.
  - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma.
  - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK.
  - I've seen at least one weird socketbuffer empty-but-mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps.

  This change removes more code than it adds.

  A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004:
  http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
  Please read the paper for Future Work and implementation details, as well as credits.

  Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ...
  Reviewed by: Lots of people (for different parts)

SHOULD YOU HAVE ANY ISSUES:
- Turn on INVARIANTS
- Turn on WITNESS
- Send stack trace and if possible capture crash dump
- Might require further information from you, please provide reachable Email address.
- When you Email me, please include MBUMA in the Subject line.

Cheers,
Bosko
Network buffer allocations: mbuma, PLEASE TEST
Hi,

If you're running -CURRENT, please test this:

http://people.freebsd.org/~bmilekic/code/mbuma2.diff

It is several extensions to UMA and mbuf cluster allocation built on top of it.

Once you apply the patch from src/, you need to rebuild and reinstall src/usr.bin/netstat, src/usr.bin/systat, and then a new kernel. When you're configuring your new kernel, you should remove the NMBCLUSTERS compile-time option; it's no longer needed. Clusters will still be capped off according to maxusers (which is auto-tuned itself). Alternately, if you want a theoretically unlimited number of clusters, you can tune the boot-time kern.ipc.nmbclusters tunable to zero.

Unless final issues arise I'm going to commit this tomorrow morning; it's been tested already quite a bit, and performance considered. A paper is available and was presented at BSDCan 2004; in case you missed it:

http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf

It has been looked at for quite some time now. Additional code cleanups will need to occur following commit, maybe. Future work is also possible; see the paper if you're interested in taking some of it on.

Oh, and keep me in the CC; I have no idea if I'm subscribed to these lists anymore. You should also follow up to this thread on -net and not on -hackers (trim -hackers from CC in the future).

Thanks and happy hacking!

Regards,
--
Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: panic in uma_zdestroy
I screwed up... fix coming shortly. Sorry!

On Fri, Aug 01, 2003 at 07:00:19PM +0200, Harti Brandt wrote:
> Hi,
>
> with a kernel from yesterday I get a panic on an SMP system when I destroy a zone immediately after creating it. I have a driver (with the probe routine set to return ENXIO) and the following module event function:
>
> /*
>  * Module loaded/unloaded
>  */
> int
> en_modevent(module_t mod __unused, int event, void *arg __unused)
> {
>
>         switch (event) {
>
>           case MOD_LOAD:
>                 en_vcc_zone = uma_zcreate("EN vccs", sizeof(struct en_vcc),
>                     NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
>                 if (en_vcc_zone == NULL)
>                         return (ENOMEM);
>                 break;
>
>           case MOD_UNLOAD:
>                 uma_zdestroy(en_vcc_zone);
>                 break;
>         }
>         return (0);
> }
>
> When I load the module and unload it I get a panic with the following trace:
>
> db> trace
> uma_zfree_internal(c083a200,0,0,0,c627b3c4) at uma_zfree_internal+0xb0
> cache_drain(c627b300,1,c030547c,245,c0369740) at cache_drain+0xe3
> zone_drain_common(c627b300,1,c030547c,461,0) at zone_drain_common+0x62
> zone_dtor(c627b300,f4,0,dad4fc40,c01b0255) at zone_dtor+0x55
> uma_zfree_internal(c0369660,c627b300,0,0,dad4fc60) at uma_zfree_internal+0x35
> uma_zdestroy(c627b300,dad4fc84,c01adce0,c6302c40,1) at uma_zdestroy+0x2a
> en_modevent(c6302c40,1,0,c5ea2000,c632c700) at en_modevent+0x4b
> driver_module_handler(c6302c40,1,c658a804,dad4fcc0,c0183f61) at driver_module_handler+0x120
> module_unload(c6302c40,c02f00d9,1f1,0,0) at module_unload+0x1e
> linker_file_unload(c632c700,0,c02f00d9,31b,c632f250) at linker_file_unload+0x81
> kldunload(c6046ab0,dad4fd10,c0309978,3ee,1) at kldunload+0x9b
> syscall(2f,2f,2f,bfbffd03,bfbffc1c) at syscall+0x2b3
> Xint0x80_syscall() at Xint0x80_syscall+0x1d
> --- syscall (305, FreeBSD ELF32, kldunload), eip = 0x80485b3, esp = 0xbfbff76c, ebp = 0xbfbffbcc ---
> db>
>
> The uma_zfree_internal call is the first one in cache_drain (the one that frees uc_allocbucket). The second argument to uma_zfree_internal in the trace above seems rather strange to me.
>
> What is the problem here?
>
> harti
> --
> harti brandt,
> http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
> [EMAIL PROTECTED], [EMAIL PROTECTED]

--
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
TECHNOkRATIS Consulting Services * http://www.technokratis.com/
Re: panic in uma_zdestroy
On Fri, Aug 01, 2003 at 01:32:05PM +, Bosko Milekic wrote:
> I screwed up... fix coming shortly. Sorry!
>
> On Fri, Aug 01, 2003 at 07:00:19PM +0200, Harti Brandt wrote:
> > Hi,
> >
> > with a kernel from yesterday I get a panic on an SMP system when I destroy a zone immediately after creating it. I have a driver (with the probe routine set to return ENXIO) and the following module event function:
> ...

Again, I apologize. I just committed something which should fix this:

  bmilekic    2003/08/01 10:42:27 PDT

    FreeBSD src repository

    Modified files:
      sys/vm               uma_core.c
    Log:
    Only free the pcpu cache buckets if they are non-NULL.

    Crashed this person's machine: harti
    Pointy-hat to: me

    Revision  Changes    Path
    1.70      +6 -4      src/sys/vm/uma_core.c

Let me know if you're still having problems.

--
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
TECHNOkRATIS Consulting Services * http://www.technokratis.com/
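[Editor's sketch] The commit log line "Only free the pcpu cache buckets if they are non-NULL" can be pictured with a userland model. This is not the code from uma_core.c: struct pcpu_cache, bucket_free() and cache_drain_sketch() are hypothetical stand-ins, and the assumption that the free path touches the bucket before releasing it is made only to show why a zone that was never used needs the guard.

```c
#include <stddef.h>
#include <stdlib.h>

#define NCPU 4   /* hypothetical CPU count for the sketch */

struct bucket {
    int nitems;
};

struct pcpu_cache {
    struct bucket *allocbucket;   /* NULL until this CPU first allocates */
    struct bucket *freebucket;    /* NULL until this CPU first frees */
};

/*
 * Stand-in for the internal free path: it dereferences the bucket, so
 * calling it with NULL would crash (unlike plain free(NULL)).
 */
static void
bucket_free(struct bucket *b)
{
    b->nitems = 0;
    free(b);
}

/* Release every bucket that actually exists; returns how many were freed. */
int
cache_drain_sketch(struct pcpu_cache cache[], int ncpu)
{
    int cpu, freed = 0;

    for (cpu = 0; cpu < ncpu; cpu++) {
        if (cache[cpu].allocbucket != NULL) {
            bucket_free(cache[cpu].allocbucket);
            cache[cpu].allocbucket = NULL;
            freed++;
        }
        if (cache[cpu].freebucket != NULL) {
            bucket_free(cache[cpu].freebucket);
            cache[cpu].freebucket = NULL;
            freed++;
        }
    }
    return freed;
}
```

A zone created and destroyed without any allocation in between, as in the reported module load/unload, corresponds to draining an all-NULL array here, which returns 0 without touching anything.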
Re: complicated downgrade
On Tue, Jul 22, 2003 at 09:01:06AM +0300, Valentin Nechayev wrote:
> Mon, Jul 21, 2003 at 23:40:05, des wrote about Re: complicated downgrade:
> > > I need to downgrade a remote FreeBSD system from 5.1-release to 4.8-release remotely without any local help (except possible hitting Reset).
> > Maybe if you tell us why you need to do this we can figure out a way for you to avoid doing it?
>
> System periodically hangs up. Average uptime is ~6 hours. No crash info is available. No serial console is available. Different invariants didn't help, AFAIK (this testing was done by another admin, so I'm not 100% sure). 4.8 in any case is considered more stable, so switching can exclude some software problems or software-caused triggerings of hardware problems.

This sounds like the same symptoms as the latest USB problem... when/if you track -current or even run one of the 5.x releases, it's key to realize that this is very active code that you're running; it's not the same thing as running 4.x, for example. The code in 5.x is constantly and actively changing, whereas the code in 4.x only receives comparatively well-regulated merges from 5.x, for the most part. Therefore, one of the things to always try is to update to the latest -current, rebuild, and see if you can reproduce. Chances are, your problem may have been fixed and, if not, at least we can be confident that it's reproducible on your hardware with the latest sources.

> Right now the question isn't so important because it was decided to move to another box (including a more friendly environment), so my question is more theoretical than practical. But there is an opportunity to play with configs, so I'll try again to play with invariants, witnesses, etc.
>
> Thanks to all for help.
>
> -netch-

Cheers,
--
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
TECHNOkRATIS Consulting Services * http://www.technokratis.com/
Re: SMP problem with uma_zalloc
On Fri, Jul 18, 2003 at 07:05:58PM +0200, Harti Brandt wrote:
> Hi all,
>
> it seems there is a problem with the zone allocator in SMP systems. I have a zone that has an upper limit on items that resolves to an upper limit of pages of 1. It turns out that allocations from this zone get stuck from time to time. It seems to me that the following happens:
>
> - on the first call to uma_zalloc a page is allocated and all the free items are put into the cache of the CPU. uz_free of the zone is 0 and uz_cachefree holds all the free items.
>
> - when the next call to uma_zalloc occurs on the same CPU, everything is fine. uma_zalloc just gets the next item from the cache.
>
> - when the call happens on another CPU, the code finds uz_free to be 0 and checks the page limit (uma_core.c:1492). It finds the limit already reached and puts the process to sleep (uma_zalloc was called with M_WAITOK).
>
> - the process may sleep there forever (depending on circumstances).
>
> If M_WAITOK is not set, the code will falsely return NULL while there are still free items (albeit in the cache of another CPU).
>
> I wonder whether this is intended behaviour. If yes, this should definitely be documented. uma_zone_set_max() seems to be documented only in the header file, and it does not mention that free items may not actually be allocatable because they happen to sit in another CPU's cache.
>
> If it is not intended (I would prefer this), I wonder how one can get the items out of another CPU's cache. I'm not too familiar with this code. I suppose this should be done somewhere around uma_core.c:1485?
>
> Regards,
> harti
> --
> harti brandt,
> http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
> [EMAIL PROTECTED], [EMAIL PROTECTED]

If the per-cpu caches are relatively small (which they ought to be, especially when you've hit a maximum number of allocations from the zone), then this is actually not that bad of a behavior. I spoke to Jeff about this and it seemed to me that he was leaning toward keeping the behavior this way and, in fact, also perhaps _not_ even doing an internal free to the zone when UMA_ZFLAG_FULL is in effect but we still have space in the pcpu cache. While I'm not sure if going that far is a good idea, I _don't_ really think that the current behavior is a bad idea. As mentioned, when you have a zone that is mostly starved, all future frees will go back to the zone and not the per-cpu caches, but if you have some free items in another per-cpu cache, you're not likely to hit a starvation situation unless something is horribly wrong. And having the free code actually drain the per-cpu caches in a zone-full situation may lead to bad behavior under heavy load.

Think about what happens under heavy load... your zone is starved and if you then flush all the pcpu caches and the load is still heavy, you're likely to have other threads try to allocate anyway, so they'll end up having to dip into the zone anyway; therefore, there doesn't seem to be much of a reason to push the cached objects back into the zone (if they're going to leave it again soon anyway).

--
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
TECHNOkRATIS Consulting Services * http://www.technokratis.com/
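[Editor's sketch] The sequence Harti describes (one page allowed, every free item pulled into the first CPU's cache, a second CPU then finding nothing) can be replayed with a toy model. This is not UMA code: struct zone_model, zalloc_nowait(), the two-CPU setup and the four items per page are all assumptions chosen to keep the example small, and only the non-sleeping (M_NOWAIT-like) case is modelled.

```c
#define NCPU            2   /* hypothetical CPU count */
#define ITEMS_PER_PAGE  4   /* hypothetical number of items per page */

struct zone_model {
    int pages;              /* pages allocated so far */
    int max_pages;          /* page limit, as derived from the item limit */
    int zone_free;          /* free items held by the zone itself */
    int cache_free[NCPU];   /* free items sitting in each CPU's cache */
};

/* Try a non-sleeping allocation on `cpu`; returns 1 on success, 0 on failure. */
int
zalloc_nowait(struct zone_model *z, int cpu)
{
    /* Fast path: take an item from this CPU's own cache. */
    if (z->cache_free[cpu] > 0) {
        z->cache_free[cpu]--;
        return 1;
    }
    /* Zone is empty: grow it unless the page limit has been reached. */
    if (z->zone_free == 0) {
        if (z->pages >= z->max_pages)
            return 0;   /* fails even if another CPU caches free items */
        z->pages++;
        z->zone_free = ITEMS_PER_PAGE;
    }
    /* Move everything the zone holds into this CPU's cache, then allocate. */
    z->cache_free[cpu] = z->zone_free;
    z->zone_free = 0;
    z->cache_free[cpu]--;
    return 1;
}
```

With max_pages set to 1, an allocation on CPU 0 succeeds and leaves three items in its cache, while the next allocation on CPU 1 fails although those three items are still free. That is the behaviour being debated above; the reply argues it is tolerable as long as the per-CPU caches stay small.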
Re: running 5.1-RELEASE with no procfs mounted (lockups?)
On Tue, Jul 15, 2003 at 10:43:19PM -0700, Josh Brooks wrote:
[...]
> One of the systems, the one I am doing all the work on, is an SMP system, and it keeps locking up on me - the lockups are always the same - things are going fine, and suddenly a process fails to complete - maybe it is pwd, maybe I type :q! in vi and it just sticks there - either way, randomly, processes just begin to lock up ... if I log in on another session, I can see the PID, but I cannot kill it - I can kill -9 (PID) 100 times and it will still exist. Eventually the entire system will lock up, although you can always ping the system.

When this happens and you start another session to kill the original process, can you perhaps run 'ps pid -l' and get the MWCHAN column? The process could be stuck blocking somewhere in the kernel, which is why your signal is not being delivered. Anyway, this is just one possibility. See if all the processes you describe as 'frozen' have the same MWCHAN and, if so, what is it?

--
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
TECHNOkRATIS Consulting Services * http://www.technokratis.com/
Re: NSS Modules
On Wed, Jul 09, 2003 at 06:27:21PM -0400, Ben Goodwin wrote:
> Hi guys ...
>
> I thought I'd give you a heads-up that I'm porting libnss-mysql to the NSS API that FreeBSD 5.1 has adopted in case anyone has input, suggestions, wants to test, etc.. I'm also curious about including it eventually .. via ports or something perhaps? Is anyone else developing NSS modules for FreeBSD? I believe I've figured out the API .. I've got a rudimentary test working, so ...
>
> Actually, I do have one question .. As I support more operating systems, I've wondered about how to autoconf the different APIs .. right now if I see nss.h I know it's one OS, and if I see nsswitch.h I know it's the other (Linux vs. Solaris) .. but that doesn't hold true with FreeBSD added to the mix. Any recommendations on what I could do to create an API define that holds the current O/S in a clean and reliable fashion?
>
> Thanks!

You should be able to do:

  #if defined(__FreeBSD__)

to test if you're on FreeBSD. This is built as part of the freebsd-spec in gcc so it will be defined at least if you're using our system (stock) compiler.

Ideally, though, the API would be the same. :-)

> -=| Ben
> http://libnss-mysql.sourceforge.net

--
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
TECHNOkRATIS Consulting Services * http://www.technokratis.com/
Re: NSS Modules
On Thu, Jul 10, 2003 at 05:12:03PM -0400, Ben Goodwin wrote:
> I'd like to support Sun's cc, however .. so I'm betting that isn't defined (I will check) ... I figured that would be available under gcc but assumed it wasn't portable enough ...

You can still test for whether or not it is defined.

  #if defined(__FreeBSD__)
  /* Do freebsd-specific stuff */
  #else
  /* Other systems? */
  #endif

--
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
TECHNOkRATIS Consulting Services * http://www.technokratis.com/
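[Editor's sketch] The #if defined() test above can be extended into the single "API define" asked about earlier in the thread. The following is only a sketch: NSS_API_FLAVOR and nss_api_flavor() are hypothetical names, and the non-FreeBSD macros (__sun, __linux__) are assumptions about what those systems' compilers predefine, so they should be checked against each compiler's documentation. Testing an undefined macro with defined() is always legal, which is why the same source can be fed to any compiler.

```c
/* Pick one flavor string at compile time from predefined platform macros. */
#if defined(__FreeBSD__)
#define NSS_API_FLAVOR "freebsd"
#elif defined(__sun)
#define NSS_API_FLAVOR "solaris"
#elif defined(__linux__)
#define NSS_API_FLAVOR "linux"
#else
#define NSS_API_FLAVOR "unknown"
#endif

/* Report which NSS API flavor this build was compiled for. */
const char *
nss_api_flavor(void)
{
    return NSS_API_FLAVOR;
}
```

The rest of the module can then branch on one project-owned define instead of repeating platform macros or header-existence checks throughout the code.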
Re: Kernel Support for System Call Performance Monitoring
On Mon, Jun 16, 2003 at 01:20:03PM -0400, Yaoping Ruan wrote:
> We have been working on improving Web server performance on FreeBSD, and think you may be interested in the results and techniques we used. Specifically, we focus on the SpecWeb99 benchmark and the Flash Web Server, and have roughly quadrupled its performance. We did this by adding support for a very low-cost kernel performance monitoring system, which allowed us to find and fix a number of bad interactions between the server and the OS. We additionally augmented one of the system calls, sendfile, to be more useful for this kind of server. We think that our observations may be useful for other servers, and may present opportunities for performance improvement in FreeBSD. A paper describing our system can be found at http://www.cs.princeton.edu/~yruan/DeBox and we can provide the patches we made if anyone's interested. We welcome any comments and feedback that you have.

First off, thank you for choosing FreeBSD for your research. The more effort is put into doing this sort of research, the better it is for both the academic community and the industry.

I've read your paper and have a few brief notes:

- On DeBox implementation. I understand that the DeBox implementation is primarily a tool used for tracking down potential application bottlenecks and so the relative importance of the crudeness of the implementation is not so high. However, I'm looking at this from the perspective of introducing DeBox as a permanent option in FreeBSD, and two immediate problems are:

  1) The user-visible DeBoxInfo structure has the magic number 5 PerSleepInfo structs and the magic number 200 CallTrace structs. It seems that it would be somewhat less crude to turn the struct arrays in DeBoxInfo into pointers, in which case you have several options. You could provide a library to link applications compiled for DeBox use with that would take care of allocating the space in which to store maxSleeps and maxTrace-worth of memory and hooking the data into resultBuf or providing the addresses as separate arguments to the DeBoxControl() system call. As for the kernel, you could take a similar approach and dynamically pre-allocate the PerSleepInfo and CallTrace structures, based on the requirements given by the DeBoxControl system call.

  2) The problem of modifying entry-exit paths in function calls. Admittedly, this is hard, but crudely modifying a select number of functions to Do The Right Thing for what concerns call tracing is hard to justify from a general perspective. I don't mean to spread FUD here; the change you made is totally OK from a measurement perspective and serves great for the paper, it's just tougher to integrate this stuff into the mainline code.

- On the Case Study. I was most interested in the sendfile modifications you talk about and would be interested in seeing patches. I know that some of the modifications you mention have already been done in 5.x. Notably, if you have not already, you'll want to glance at:

  http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/uipc_syscalls.c?rev=1.144&content-type=text/x-cvsweb-markup

  (regarding your mapping caching in sf_bufs) and this [gigantic] thread:

  http://www.freebsd.org/cgi/getmsg.cgi?fetch=12432+15802+/usr/local/www/db/text/2003/freebsd-arch/20030601.freebsd-arch

  (subject: sendfile(2) SF_NOPUSH flag proposal on freebsd-arch@, at least). You may want to contact Igor Sysoev or other concerned parties in that thread to show them that you actually have performance results resulting from such a change.

Finally, I'd like to sort of make a longshot proposal; more of an "if you have the time" follow-up to your work that someone could be able to perform, and that would certainly be interesting to see: how all this works out when forward-ported to FreeBSD 5.x.

> Sincerely
> - Yaoping
> [EMAIL PROTECTED]

Regards,
--
Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
TECHNOkRATIS Consulting Services * http://www.technokratis.com/
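[Editor's sketch] The suggestion in point 1) above, replacing the fixed arrays of 5 PerSleepInfo and 200 CallTrace entries with pointers sized at run time, might look roughly like this on the library side. The names DeBoxInfo, PerSleepInfo, CallTrace, maxSleeps and maxTrace come from the message; everything else is assumed: the struct members shown are placeholders (the real fields are defined by the DeBox authors), and debox_info_alloc() and debox_info_free() are hypothetical helper names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Placeholder element types; the real field lists are not known here. */
struct PerSleepInfo { int placeholder; };
struct CallTrace    { int placeholder; };

/* Variant of the result structure with pointers instead of fixed arrays. */
struct DeBoxInfo {
    int                  maxSleeps;   /* capacity of sleeps[] */
    int                  maxTrace;    /* capacity of traces[] */
    struct PerSleepInfo *sleeps;
    struct CallTrace    *traces;
};

/* Release a structure obtained from debox_info_alloc(); NULL is accepted. */
void
debox_info_free(struct DeBoxInfo *info)
{
    if (info == NULL)
        return;
    free(info->sleeps);
    free(info->traces);
    free(info);
}

/* Allocate room for the requested number of entries; NULL on failure. */
struct DeBoxInfo *
debox_info_alloc(int maxSleeps, int maxTrace)
{
    struct DeBoxInfo *info;

    if (maxSleeps <= 0 || maxTrace <= 0)
        return NULL;
    info = calloc(1, sizeof(*info));
    if (info == NULL)
        return NULL;
    info->sleeps = calloc((size_t)maxSleeps, sizeof(*info->sleeps));
    info->traces = calloc((size_t)maxTrace, sizeof(*info->traces));
    if (info->sleeps == NULL || info->traces == NULL) {
        debox_info_free(info);
        return NULL;
    }
    info->maxSleeps = maxSleeps;
    info->maxTrace = maxTrace;
    return info;
}
```

An application that wants the old limits would call debox_info_alloc(5, 200); how the resulting addresses would be handed to the DeBoxControl() system call is left open here, as it is in the message.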
Re: HEADS UP! Major commits in the tree coming soon
For the benefit of the majority: This post was FAKE. Now please return to your regularly scheduled discussion and kindly ignore all future posts to this thread.

-Bosko

On Thu, May 29, 2003 at 01:50:33PM -0400, Kenneth Culver wrote:
> > The HEAD code freeze was extended by three days to allow for some final pending work to be committed and prepare 5.1 to be a good release. The code freeze will likely end sometime tomorrow, May 30. We ask that large scale changes still be deferred until after 5.1 is actually released so that any problems can be dealt with. The release engineering team will send out emails explicitly stating when HEAD has thawed and when large changes like new compilers and dynamic-linked worlds can go in.
> >
> > The most important changes I'm going to commit today:
> >
> > - Remove gcc and replace it with a new TenDRA snapshot.
>
> I'm just wondering... but is there a reason why gcc is being replaced? Is there a page or a previous list mail that explains the reasons? URL? Thanks.
>
> > - Remove GNU tar.
> > - Fix httpd.ko to make it work on buggy AMD processors.
> > - Drop support for 386 and 486 cpus.
> > - Remove ext2 support (GPL encumbered).
> > - Add perl 5.8 *and* python 2.2 to base.
> > - Remove Sendmail and replace it with Postfix.
> >
> > If anyone has any reason why these should not be committed, I'll give a 5 hours grace time. Send replies to the list. Thank you.
> >
> > Thorsten and the rest of the release engineering team.
>
> Thanks
>
> Ken
> ___
> [EMAIL PROTECTED] mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to [EMAIL PROTECTED]

--
Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: bootp_subr.c
On Fri, Feb 21, 2003 at 03:06:51PM +, omestre wrote: Hello, I'm working in FreeBSD diskless machines projects... and i have wrote a patch to bootp_subr.c code ( luigi code). I have posted a PR too. (kern/46174). luigi did not reply... no one did. I have more contact with Linux. Is this the FreeBSD world? Thanks! [EMAIL PROTECTED] SDF Public Access UNIX System - http://sdf.lonestar.org Hi, One of the problems could be that I don't see a patch in the PR. You included the source file itself but there is no indication as to what exactly was changed. diff -u bootp_subr.c.old bootp_subr.c would be good enough. Or if you have the sources checked out with cvs, cvs diff -u bootp_subr.c I think this would generate a quicker response. As for the idea itself, it sounds reasonable. Thanks, -- Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: Fast interrupts
On Mon, Aug 26, 2002 at 09:41:43AM -0700, Maksim Yevmenkin wrote: John Baldwin wrote: On 26-Aug-2002 M. Warner Losh wrote: can you call wakeup(9) from a fast interrupt handler? [ ...] The only reason I ask is because sio seems to go out of its way to schedule a soft interrupt to deal with waking up processes, which then calls wakeup... Since wakeup only needs a spin lock, it is probably ok. You just can't call anything that would sleep (in any interrupt handler) or block on a non-spin mutex. what is the general locking technique for interrupt handlers? there must be some sort of locking, right? You are allowed to use mutex locks (both spin and MTX_DEF), but in fast interrupt handlers you may only use the former (spin mutexes). -- Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
Re: Fast interrupts
On Mon, Aug 26, 2002 at 10:14:32AM -0700, Maksim Yevmenkin wrote: Bosko Milekic wrote: On Mon, Aug 26, 2002 at 09:41:43AM -0700, Maksim Yevmenkin wrote: John Baldwin wrote: On 26-Aug-2002 M. Warner Losh wrote: can you call wakeup(9) from a fast interrupt handler? [ ...] The only reason I ask is because sio seems to go out of its way to schedule a soft interrupt to deal with waking up processes, which then calls wakeup... Since wakeup only needs a spin lock, it is probably ok. You just can't call anything that would sleep (in any interrupt handler) or block on a non-spin ^^ my understanding is that John was talking about any interrupt handler. Not just fast interrupt handlers. Yeah, you can't call anything that would _sleep_ (e.g., msleep()). You could still grab a MTX_DEF mutex in a non-fast interrupt handler and possibly block waiting to get it. -- Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
Re: Memory corruption in CURRENT
We have seen weird problems regarding the pmap PG_G related stuff (well sort of, it has to do with PSE and PG_G) on ppro and pII chips (apparently, this is not the case with at least Xeons) but what happened, for the record, was this: We would enable PSE and switch the pde corresponding to the first 4M to the new entry describing a 4M page, instead of the one describing the location of the ptes covering those 4M. Then, what we would do is walk all the ptes, including those old stale and useless ones that previously described those first 4M and set the PG_G bit there (Note: we've already set PG_G on our 4M page). Normally, we don't really need to touch the old ptes but we did it just because it was more convenient (i.e. a few lines less code). Oddly enough, on the ppro and pII what would happen is that we would page fault on that page where we kept the old ptes covering those first 4M, and only on that page! The other ptes - the ones that actually mattered - were all fine. The ptes are mapped above the 4M so I don't see how changing the pde for those first 4M would have done anything. To fix the problem, we (actually Peter) committed code that basically just jumps beyond that first page of stale ptes when setting the PG_G bit for the 4K pages, and since then, the problem seems to have gone away. Although we are not sure, this seems like a silicon bug. Since then, Peter had some work planned to load the kernel above the first 4M to see if that fixed the problems. I'm wondering if this problem on the PIVs could be related. Please let us know if the removal of those two options really makes 5-10 buildworlds in a row work out for you. Regards, Bosko On Thu, Aug 22, 2002 at 01:34:11PM +0200, Mark Santcroos wrote: On Thu, Aug 22, 2002 at 04:23:46AM -0700, Terry Lambert wrote: Ugh! Wait until it seems to work for a statistically significant sample size, and for more than one person before calling it happy! 
Also, I'm not sure looking at the code whether or not the PG_G is truly significant, or just perturbs the workaround. The problem I've referred to in my hunch here is actually related solely to the PSE, but with the recent code reorganization in locore.s, etc., it could have become more significant. I was just giving a slight report, not yelling hallelujah yet ;-) It's doing the 2nd buildworld now. Do you also want me to try to split up the disabling of the two options? Mark -- Mark Santcroos, RIPE Network Coordination Centre, New Projects Group/TTM, http://www.ripe.net/home/mark/ To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message -- Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
Re: m_freem() in tcp_respond()
in panic (fmt=0xc03edd84 from debugger) at /usr/src/sys/kern/kern_shutdown.c:595 #3 0xc014cbb9 in db_panic (addr=-1071517796, have_addr=0, count=-1, modif=0xdc319b3c ) at /usr/src/sys/ddb/db_command.c:435 #4 0xc014cb59 in db_command (last_cmdp=0xc0463918, cmd_table=0xc0463758, aux_cmd_tablep=0xc04c0cb8) at /usr/src/sys/ddb/db_command.c:333 #5 0xc014cc1e in db_command_loop () at /usr/src/sys/ddb/db_command.c:457 #6 0xc014ed5b in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_trap.c:71 #7 0xc03b84ce in kdb_trap (type=12, code=0, regs=0xdc319c90) at /usr/src/sys/i386/i386/db_interface.c:158 #8 0xc03c8e14 in trap_fatal (frame=0xdc319c90, eva=0) at /usr/src/sys/i386/i386/trap.c:969 #9 0xc03c8aed in trap_pfault (frame=0xdc319c90, usermode=0, eva=0) at /usr/src/sys/i386/i386/trap.c:867 #10 0xc03c8667 in trap (frame={tf_fs = 16, tf_es = -600768496, tf_ds = 16, tf_edi = -1048332032, tf_esi = 6422528, tf_ebp = -600728360, tf_isp = -600728388, tf_ebx = 0, tf_edx = 6756410, tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1071517796, tf_cs = 8, tf_eflags = 66199, tf_esp = -1048331972, tf_ss = -1048331972}) at /usr/src/sys/i386/i386/trap.c:466 #11 0xc021ef9c in m_freem (m=0x0) at /usr/src/sys/kern/uipc_mbuf.c:706 ---Type return to continue, or q return to quit--- #12 0xc0273a0f in tcp_respond (tp=0x0, ipgen=0xc183b93c, th=0xc183b950, m=0xc183b900, ack=2100704027, seq=0, flags=20) at /usr/src/sys/netinet/tcp_subr.c:396 #13 0xc0271eff in tcp_input (m=0xc183b900, off0=20, proto=6) at /usr/src/sys/netinet/tcp_input.c:2204 #14 0xc026b874 in ip_input (m=0xc183b900) at /usr/src/sys/netinet/ip_input.c:821 #15 0xc026b8d3 in ipintr () at /usr/src/sys/netinet/ip_input.c:842 #16 0xc03ba809 in swi_net_next () #17 0xc0224929 in connect (p=0xd86e1f20, uap=0xdc319f80) at /usr/src/sys/kern/uipc_syscalls.c:396 #18 0xc03c90f5 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 22273, tf_esi = 3, tf_ebp = -1077938064, tf_isp = -600727596, tf_ebx = 671650276, tf_edx = 
-1077938288, tf_ecx = 13, tf_eax = 98, tf_trapno = 12, tf_err = 2, tf_eip = 672133692, tf_cs = 31, tf_eflags = 659, tf_esp = -1077938252, tf_ss = 47}) at /usr/src/sys/i386/i386/trap.c:1175 #19 0xc03b93a5 in Xint0x80_syscall () #20 0x2806fcbd in ?? () #21 0x8048d88 in ?? () #22 0x8048add in ?? () (kgdb) frame 12 #12 0xc0273a0f in tcp_respond (tp=0x0, ipgen=0xc183b93c, th=0xc183b950, m=0xc183b900, ack=2100704027, seq=0, flags=20) at /usr/src/sys/netinet/tcp_subr.c:396 396 m_freem(m->m_next); (kgdb) print m $1 = (struct mbuf *) 0xc183b900 (kgdb) print m->m_hdr.mh_next $2 = (struct mbuf *) 0x0 (kgdb) frame 11 #11 0xc021ef9c in m_freem (m=0x0) at /usr/src/sys/kern/uipc_mbuf.c:706 706 if (mcl_pool_now mcl_pool_max m->m_next == NULL (kgdb) To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-net in the body of the message -- Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
Re: ARM Port: Help with UMA subsystem needed
On Sat, Aug 03, 2002 at 11:07:11AM -0400, Stephane E. Potvin wrote: On Thu, Aug 01, 2002 at 08:05:12PM -0400, Stephane E. Potvin wrote: I've been busy trying to bring the port back in sync with current. Now, each time I start my NetWinder, I get the following panic which I don't seem able to track the source. I would greatly appreciate if anybody knowledgeable with the UMA subsystem could give me a hint on what could be causing this. I just found out that reverting this commit fixes the problem. Any ideas about why other arches don't encounter the problem? jeff 2002/06/19 13:49:44 PDT Modified files: sys/vm uma.h uma_core.c Log: - Remove bogus use of kmem_alloc that was inherited from the old zone allocator. This looks like the problem, or at least that which uncovers the problem. The pmap code is calling the zone allocator as well and what happens is that you recurse on the kmem_map lockmgr lock because you allocate recursively from kmem_map. Previously, we could also allocate from kernel_map if the kernel_map lockmgr lock wasn't held, so if we had a recursive call we would get around this problem. I think this whole thing is flaky in general (if this was the way to get around recursion, we should fix it). JHB and/or JeffR: why is the kmem_map lockmgr lock not recursive? Regards, -- Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
Re: ARM Port: Help with UMA subsystem needed
On Sat, Aug 03, 2002 at 03:51:20PM -0400, Jeff Roberson wrote: These locks cannot be made recursive safely. In this case you would just recurse forever and never satisfy the allocation. All pmap modules do something like the following:

    static void *
    pmap_allocf(uma_zone_t zone, int bytes, u_int8_t *flags, int wait)
    {
            *flags = UMA_SLAB_PRIV;
            return (void *)kmem_alloc(kernel_map, bytes);
    }

    pvzone = uma_zcreate("PV ENTRY", sizeof (struct pv_entry), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_VM);
    uma_zone_set_allocf(pvzone, pmap_allocf);
    uma_prealloc(pvzone, initial_pvs);

Assuming ARM is following the same example, perhaps it needs to pre-allocate more pvs. Although I somehow doubt it's doing the right thing here because the panic seems to happen early on during boot, according to the trace first provided. Is arm using a separate allocf? Jeff -- Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED]
Re: tunings for many httpds...
On Wed, Jun 26, 2002 at 11:24:47AM -0700, Matthew Dillon wrote: :[commenting live from ottawa] Pictures! We want pictures! It's pretty cool that the Linux camp has decided to do the Summit stuff too (I'm assuming that this is a relatively new phenomenon). What's even cooler is that they picked Ottawa. I think I may be in Ottawa this weekend (Canada Day is then), so if it's still going on then, or if something else is planned, I would love to attend - if you folks don't mind a BSD developer hanging around. :-) -Matt Matthew Dillon [EMAIL PROTECTED] Regards, -- Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: m_cat() does not update m_pkthdr.len
On Mon, Jun 24, 2002 at 09:15:09PM -0500, Mike Silbersack wrote: On Sun, 23 Jun 2002, Yahel Zamir wrote: Hi, During development of networking code in FreeBSD kernel, we noticed that m_cat(p1, p2) does NOT do some necessary things: p1->m_pkthdr.len += p2->m_pkthdr.len; p2->m_flags &= ~M_PKTHDR; Thanks, Yahel. Please notify Luigi or Bosko. See the -net archives as well, I believe that this has been discussed in the last week or two. Mike "Silby" Silbersack There is not much about m_cat() that says that p1 has to be a packet header and that also says that p2 is a packet header or that, if it is, its packet headerness will be removed. It is up to the surrounding code to make sure that it properly deals with p1 and p2 before concatenating p2 to p1 (or after). To place the added requirement that p1 be a packet header type mbuf would be placing an additional requirement on callers to m_cat(). -- Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED]
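For illustration only, here is a minimal userland sketch of the caller-side fixup described above. Everything in it is an assumption made for the example: struct toy_mbuf, TOY_PKTHDR, toy_cat() and toy_cat_pkt() are hypothetical stand-ins, not the kernel's struct mbuf, M_PKTHDR or m_cat(), and the toy concatenation only links the chains (it does not copy data the way the real routine may).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PKTHDR 0x1          /* hypothetical stand-in for M_PKTHDR */

struct toy_mbuf {
    struct toy_mbuf *next;      /* next buffer in the chain */
    int len;                    /* bytes held by this buffer */
    int flags;                  /* TOY_PKTHDR if pkt_len is meaningful */
    int pkt_len;                /* total packet length, head buffer only */
};

/* Link p2 onto the end of p1's chain; no header bookkeeping at all. */
static void toy_cat(struct toy_mbuf *p1, struct toy_mbuf *p2)
{
    while (p1->next != NULL)
        p1 = p1->next;
    p1->next = p2;
}

/*
 * Caller-side wrapper for the case where both chains start with a
 * packet header: fold p2's length into p1 and demote p2 to a plain
 * buffer before linking.
 */
static void toy_cat_pkt(struct toy_mbuf *p1, struct toy_mbuf *p2)
{
    assert((p1->flags & TOY_PKTHDR) && (p2->flags & TOY_PKTHDR));
    p1->pkt_len += p2->pkt_len;
    p2->flags &= ~TOY_PKTHDR;
    toy_cat(p1, p2);
}
```

Keeping the bookkeeping in a separate wrapper leaves the plain concatenation usable for chains that carry no packet header, which is the point made in the reply.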
Re: Some small projects for mutt(1)
On Thu, Jun 20, 2002 at 03:27:24PM -0500, Brandon D. Valentine wrote: On Thu, 20 Jun 2002, Bosko Milekic wrote: On Thu, Jun 20, 2002 at 01:10:39PM -0700, Matthew Hunt wrote: This shouldn't be hard to glue together without modifying mutt itself. Make a little program, foo, that takes the message on stdin, passes it through formail -x subject, massages it into a procmail rule, and appends it to some procmail rule file. The massage step should include escaping characters that have special meanings in procmail regexps, and adding something like (Re: *)? at the beginning of the subject when appropriate. Shouldn't be more than a screenful of Perl. Interesting. How would you have a key bound sequence in mutt set off the script on the message, though? For instance, if I do a ctrl+B, how would you ensure that the Right Thing happens, without modifying mutt code? Check out mutt2procmailrc written by my good friend timball: http://www.ghettohack.net/timball/ Hey, this is awesome stuff! Thanks! How come we don't have a port? It rocks. Brandon D. Valentine -- http://www.geekpunk.net [EMAIL PROTECTED] ++[++-][++-].[+-][+-]+.+++..++ +.+[++-]++.+++..+++.--..+. Regards, -- Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: The problem with FreeBSD
Bill H., is that you? On Tue, Jun 18, 2002 at 08:39:57AM +, Bill Flamerola wrote: Okay, this is not really intended as a flame, but kinda necessary, given the current situation in the FreeBSD camp. [...useless stuff...] -- Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: ICU_LEN with IO APIC
On Fri, May 31, 2002 at 12:12:00PM +0300, Aaro J Koskinen wrote: Hello, Is there any particular reason why the number of interrupts is limited to 32 on APIC systems? Is it just a conservative guess on the number of interrupts anyone might want to need...? I'm not sure, but perhaps this is historical (and now also required again): if we use a word to mask out interrupts, then after 32 we run out of bits. Who needs more than 32 interrupts anyway?! :-) A. -- Aaro Koskinen E-mail: [EMAIL PROTECTED] I'm the ocean, I'm the giant undertow. http://www.iki.fi/aaro Regards, -- Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED]
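To make the "we run out of bits" remark concrete, here is a small sketch assuming the interrupt mask is kept in a single 32-bit word with one bit per interrupt source. TOY_NIRQ and the toy_* helpers are hypothetical names for this example, not the kernel's ICU_LEN or its masking code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NIRQ 32             /* one bit per source in a 32-bit word */

/* Return the mask with interrupt 'irq' blocked. */
static uint32_t toy_mask_irq(uint32_t mask, unsigned irq)
{
    assert(irq < TOY_NIRQ);     /* irq 32 and above has no bit to use */
    return mask | (UINT32_C(1) << irq);
}

/* Return the mask with interrupt 'irq' allowed again. */
static uint32_t toy_unmask_irq(uint32_t mask, unsigned irq)
{
    assert(irq < TOY_NIRQ);
    return mask & ~(UINT32_C(1) << irq);
}

/* Non-zero if interrupt 'irq' is currently blocked. */
static int toy_irq_masked(uint32_t mask, unsigned irq)
{
    assert(irq < TOY_NIRQ);
    return (mask & (UINT32_C(1) << irq)) != 0;
}
```

Supporting more than 32 sources with this layout would mean widening the word or switching to an array of words, which touches every place that manipulates the mask.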
Re: splimp() during panic?
Archie Cobbs wrote: Hi, I'm trying to debug a mbuf corruption bug in the kernel. I've added an mbuf sanity check routine which calls panic() if anything is amiss with the mbuf free list, etc. This function runs at splimp() and if/when it calls panic() the cpl is still at splimp(). My question is: does this guarantee that the mbuf free lists, etc. will not be modified between the time panic() is called and the time a core file is generated? For example, if an incoming packet causes a networking interrupt after panic() has been called but before the core file is written, will that interrupt be blocked when it calls splimp()? splimp() ensures that no driver handlers will be executed. Further, dumpsys() is called from panic() at splhigh() which would also mean that none of those potentially troublesome handlers will run. I've been working under this assumption but it seems to not be valid, because I seem to be seeing panics for situations that are not true in the core file. Are you seeing invalid stuff from DDB but valid stuff from the core file? Because if so, that's REALLY WEIRD. If you're just seeing two different but invalid things, then perhaps something is happening when Debugger() runs (is it possible that the cpl() is changed after or before a breakpoint()?). If this is not a valid assumption, is there an easy way to 'freeze' the mbuf free lists long enough to generate the core file when an inconsistency is found (other than adding the obvious hack)? To make doubly-sure, what you can do is just keep a variable 'foo' which you initialize to 0. Before any mbuf free list manipulations, place an 'if (foo == 0)' check. Atomically set foo to 1 before the panic. See if the inconsistency changes. If you're seeing garbage in both cases, but the garbage is inconsistent, perhaps there's a memory problem or the dump isn't working properly (I've never heard of anything like this before). 
Thanks, -Archie __ Archie Cobbs * Packet Design * http://www.packetdesign.com Regards, -- Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
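The 'foo' suggestion above could look roughly like the following userland sketch. It uses C11 atomics in place of whatever atomic primitive the kernel would provide, and toy_frozen, toy_free_count, toy_freelist_take() and toy_sanity_failed() are hypothetical names invented for this example. The check and the list update are not one atomic step, so this is only a debugging aid for narrowing down who touches the list after the inconsistency is detected, not a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int toy_frozen;       /* the 'foo' flag; starts at 0 */
static int toy_free_count = 8;      /* stand-in for the mbuf free list */

/* Take one buffer from the free list unless the list has been frozen. */
static bool toy_freelist_take(void)
{
    if (atomic_load(&toy_frozen) != 0)
        return false;               /* frozen: leave the list untouched */
    if (toy_free_count == 0)
        return false;               /* nothing left to hand out */
    toy_free_count--;
    return true;
}

/* Called by the sanity check when it finds an inconsistency. */
static void toy_sanity_failed(void)
{
    atomic_store(&toy_frozen, 1);   /* freeze first ... */
    /* ... then the real code would call panic() here. */
}
```

If the free list still differs between the debugger and the core file with the flag set, the modification is coming from a path that does not go through the guarded code.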
Re: deltas for sys/kern/vfs_syscalls.c and sys/kern/vfs_subr.c
On Fri, May 03, 2002 at 10:45:32PM +0100, Hiten Pandya wrote: Hi all, I am submitting a patch which removes the register keyword from sys/kern/vfs_syscalls.c. The reason I am doing this is very simple. The 'register' keyword has no effect, as compilers do enough optimizations on their own. Also, I have seen commits made before which do the same thing which I am doing now. I have talked about this patch with jmallett, and various other developers. This patch is located at: http://storm.uk.FreeBSD.org/~hiten/diffs/vfs_syscalls.diff.1 Looks good. The second issue, is what I am not very sure about, but I had a little discussion about this with rwatson. The vfs_subr.c module contains a large #if 0'ed section, which basically contains some sysctls. I think it has been forgotten for removal, so I am submitting a delta which can be used to remove that #if 0'ed section. Note, I am not very sure about this, that is why I am posting this to -hackers. The patch is located at: http://storm.uk.FreeBSD.org/~hiten/diffs/vfs_subr.c.diff.1 I don't think that removing the code is a problem. The real person to ask would be dillon, since he was the one who placed the #if 0 around the block. Thanks. If anyone finds them interesting, please commit them to the CVS repository. P.S. Please do not hesitate to contact me for more information reg. these deltas. -- Hiten Pandya http://storm.uk.FreeBSD.org/~hiten Finger [EMAIL PROTECTED] for PGP public key -- 4FB9 C4A9 4925 CF97 9BF3 ADDA 861D 5DBD E4E3 03C3 -- Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: I want to help
On Wed, Apr 03, 2002 at 07:14:59PM +0200, digital wrote: Hi I'm from Serbia and I love most computers and programming.I am very familiar with FreeBSD operating system and I know to programme in language C, and if it is necessarirly I can learn C++.I would be very happy if I could help in some way in development of FreeBSD operating system (maybe Socket programming).I am student of Astrophysics and in free time I am learning FreeBSD so I came to idea that I can active involve in FreeBSD development. I have computer: CPU celeron2 633 SL3VS, 256 RAM,VGA TNT2 AGP M64 32MB,main board ABIT133 RAID, hard disk IDE 20.5GB Quantum Fireball Plus(LM20A011), CD-ROM IDE 40X Teac CD-540E. My email is : [EMAIL PROTECTED] If you think I can help let me know, How about you start by helping out the bsd.org.yu crew with documentation translation? The rest will/may come with time. A lot of regards and wish for best work, Dragoslav Zaric -- Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: Patch to remove MFREE() macro entirely
On Sat, Feb 02, 2002 at 11:54:17PM -0800, David Greenman wrote: Oh what a tangled web we weave. This should be really easy for people to take a quick look at to see if I made any mistakes. I'm basically untangling the (small) mess that people made of the code while trying to use the MFREE() macro over the last N years. If nobody sees any problems it will go into -current next week some time and then be MFC'd to stable. Looks good to me. I'm definitely very much in favor of killing MFREE(). Absolutely! Especially in light of the fact that in -CURRENT nowadays, MFREE() has no benefits and pretty much ALL the mbuf macros are deprecated (they just wrap calls to the appropriate functions). They were really big for macros and actually used to make things slower by busting the cache. -DG David Greenman Co-founder, The FreeBSD Project - http://www.freebsd.org President, TeraSolutions, Inc. - http://www.terasolutions.com President, Download Technologies, Inc. - http://www.downloadtech.com Pave the road of life with opportunities. -- Bosko Milekic [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: mbuf chains
On Fri, Jan 18, 2002 at 04:46:18PM -0800, Skye Poier wrote: What are the rules around mbuf chain construction? (I've read man mbuf, doesnt go into much detail) In particular, I'm assuming: - all mbufs must be same type - the head mbuf must have M_PKTHDR set - the head mbuf.m_pkthdr.len must be the len of the entire chain Anything to add? Take a look at the mchain interface for a nice way to deal with mbufs in certain cases: src/sys/kern/subr_mchain.c The `rules' you state are good advice but are not _technically_ obligatory in the most general case. In other words, it is technically up to the implementor to decide on how to chain mbufs and what their meaning is. My confusion is around splitting/concatenating - When splitting an mbuf chain, the two resultant chains must be as above (heads have M_PKTHDR and mbuf.m_pkthdr.len set) right? When concatenating chains, what do you do with the M_PKTHDR that is now in the middle of the chain? m_cat doesn't seem very sophisticated in this regard. And of course update head mbuf.m_pkthdr.len Again, it all depends on what you're doing. Typically a packet consists of a chain with a head mbuf that is M_PKTHDR and contains the additional information. You don't normally do what you wrote, but again, it depends on the implementation, ultimately. Thanks Skye -- Bosko Milekic [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
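As a sketch of the "typical packet" convention discussed above (head buffer carries the packet header, header length equals the sum of the chain, all buffers of one type), here is a small userland checker. The struct and the names toy_mbuf, TOY_PKTHDR and toy_chain_is_packet() are hypothetical and only model the fields needed for the check; as the reply says, the real kernel does not enforce these rules in the general case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PKTHDR 0x1          /* hypothetical stand-in for M_PKTHDR */

struct toy_mbuf {
    struct toy_mbuf *next;      /* next buffer in the chain */
    int type;                   /* buffer type tag */
    int len;                    /* bytes held by this buffer */
    int flags;                  /* TOY_PKTHDR on a packet header buffer */
    int pkt_len;                /* total packet length, head only */
};

/* True if the chain follows the usual packet conventions. */
static bool toy_chain_is_packet(const struct toy_mbuf *head)
{
    int total = 0;

    if (head == NULL || !(head->flags & TOY_PKTHDR))
        return false;           /* head must carry the packet header */
    for (const struct toy_mbuf *m = head; m != NULL; m = m->next) {
        if (m->type != head->type)
            return false;       /* mixed types in one chain */
        total += m->len;
    }
    return total == head->pkt_len;  /* header length must match the sum */
}
```

A check like this is the sort of thing one might run after splitting or concatenating chains in code that does rely on the convention.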
Re: huge MFREE() macro?
On Tue, Jan 01, 2002 at 10:16:25PM -0800, Matthew Dillon wrote: I noticed a bunch of routines use MFREE() instead of m_free() (which just calls MFREE()). MFREE() is a huge macro.

       text    data     bss     dec     hex filename
    1986399  252380  145840 2384619  2462eb kernel
       text    data     bss     dec     hex filename
    1983343  252380  145840 2381563  2456fb kernel

We save about 3K. Any problems with this? Maybe also MFC to -stable to save some bytes?

In -CURRENT, MFREE() just wraps a call to m_free().

(The #if 0's wouldn't be in a commit, I'd actually delete the code) Also, if you do a search for XXX, I think there was an MFREE in there that should have been an m_freem(). Could someone check that?

See below.

The patch is against -stable. -Matt Matthew Dillon [EMAIL PROTECTED] [...]

    Index: i386/isa/if_lnc.c
    ===
    RCS file: /home/ncvs/src/sys/i386/isa/Attic/if_lnc.c,v
    retrieving revision 1.68.2.4
    diff -u -r1.68.2.4 if_lnc.c
    --- i386/isa/if_lnc.c 8 Jan 2001 15:37:59 - 1.68.2.4
    +++ i386/isa/if_lnc.c 2 Jan 2002 06:12:24 -
    @@ -839,9 +839,13 @@
      sc->mbuf_count++;
      start->buff.mbuf = 0;
      } else {
    +#if 0
      struct mbuf *junk;
      MFREE(start->buff.mbuf, junk);
    - start->buff.mbuf = 0;
    +#endif
    + /* XXX shouldn't this be m_freem ?? */
    + m_free(start->buff.mbuf);
    + start->buff.mbuf = NULL;

I guess it depends on whether start->buff.mbuf is always a single mbuf or if it ever becomes a chain. If it becomes a chain then it should certainly be m_freem(). How about placing a loop there to traverse forward and count the number of mbufs before m_next == NULL? And, if it is ever more than 1, then change that to an m_freem(). Anyone using lnc?

      }
      }
      sc->pending_transmits--;
    @@ -1702,8 +1706,12 @@
      m->m_len -= chunk;
      m->m_data += chunk;
      if (m->m_len <= 0) {
    +#if 0
      MFREE(m, head->m_next);
      m = head->m_next;
    +#endif
    + m = m_free(m);
    + head->m_next = m;
      }
      }
      }

-- Bosko Milekic [EMAIL PROTECTED]
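The "count the mbufs before m_next == NULL" check suggested above, and the difference between freeing one buffer and freeing a whole chain, can be sketched in userland as follows. The toy_* names are hypothetical and the buffers are plain malloc() allocations; the only thing borrowed from the patch is the idiom that freeing a single buffer returns its successor, as in "m = m_free(m)".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_mbuf {
    struct toy_mbuf *next;      /* next buffer in the chain */
};

/* Build a chain of n buffers (helper for the example only). */
static struct toy_mbuf *toy_chain_alloc(int n)
{
    struct toy_mbuf *head = NULL;

    while (n-- > 0) {
        struct toy_mbuf *m = malloc(sizeof(*m));
        if (m == NULL)
            abort();
        m->next = head;
        head = m;
    }
    return head;
}

/* Count buffers in a chain: the diagnostic loop suggested in the reply. */
static int toy_chain_len(const struct toy_mbuf *m)
{
    int n = 0;

    for (; m != NULL; m = m->next)
        n++;
    return n;
}

/* Free a single buffer and return its successor. */
static struct toy_mbuf *toy_free(struct toy_mbuf *m)
{
    struct toy_mbuf *n = m->next;

    free(m);
    return n;
}

/* Free the whole chain; returns how many buffers were released. */
static int toy_freem(struct toy_mbuf *m)
{
    int freed = 0;

    while (m != NULL) {
        m = toy_free(m);
        freed++;
    }
    return freed;
}
```

If the count is ever greater than one at the XXX site, calling the single-buffer free there would leak the rest of the chain, which is the concern raised in the reply.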
Re: Patch #3 (TCP / Linux / Performance)
On Sun, Dec 02, 2001 at 11:18:42AM -0800, Matthew Dillon wrote: [...] :Does the FreeBSD tcp stack do zero copy (page flip the data to :userspace)? In the localhost case, it seems like there are two copies :to/from userspace there. : :-- :Richard Sharpe, [EMAIL PROTECTED], LPIC-1 There are zero-copy patches floating around but I haven't looked at them to determine how messy they might be. http://people.freebsd.org/~ken/zero_copy/ The main issues with the patch are, afaics: 1. It's fairly large and difficult to maintain, especially in light of the large amount of SMPng-related changes. 2. The receive code only works with certain network cards. [This is expected]. The performance may also vary based on the behavior of the application. -Matt Matthew Dillon [EMAIL PROTECTED] -- Bosko Milekic [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: Open Source Load Balancer
On Mon, Aug 20, 2001 at 01:05:28PM -0400, Bill Kish wrote: Hi All, Sorry if this posting is somewhat off topic, but I think the answer to my question will be found among the subscribers to this list. I'm trying to convince the powers that be here at Coyote Point Systems that we should release the source for our Equalizer load balancer software to the Open Source community. I think I can pull this off, but I'd like to see what sort of interest there might be in this project and perhaps begin recruiting a project team. Any thoughts on the best forum to begin this process? This is mostly kernel IP stack code which currently runs on FreeBSD. Most probably [EMAIL PROTECTED] - assuming of course that this is some sort of `network load balancer.' Thanks in advance, -=BK -- --- Bill Kish Ph: 650.969.6000 Chief Engineer, 3350 Scott Blvd, Bldg 20 Coyote Point Systems Inc. Santa Clara California 95054 Email: [EMAIL PROTECTED] http://www.coyotepoint.com/ --- For support call: 1-888-891-8150 Email: [EMAIL PROTECTED] --- -- Bosko Milekic [EMAIL PROTECTED]
Re: Allocate a page at interrupt time
On Mon, Aug 06, 2001 at 11:27:56PM -0700, Terry Lambert wrote: I keep wondering about the sagacity of running interrupts in threads... it still seems like an incredibly bad idea to me. I guess my major problem with this is that by running in threads, it's made it nearly impossible to avoid receiver livelock situations, using any of the classical techniques (e.g. Mogul's work, etc.). References to published works? It also has the unfortunate property of locking us into virtual wire mode, when in fact Microsoft demonstrated that wiring down interrupts to particular CPUs was good practice, in terms of assuring best performance. Specifically, running in virtual Can you point us at any concrete information that shows this? Specifically, without being Microsoft biased (as is most data published by Microsoft)? -- i.e. preferably third-party performance testing that attributes wiring down of interrupts to particular CPUs as _the_ performance advantage. wire mode means that all your CPUs get hit with the interrupt, whereas running with the interrupt bound to a particular CPU reduces the overall overhead. Even what we have today, with Obviously. the big giant lock and redirecting interrupts to the CPU in the kernel is better than that... -- Terry -- Bosko Milekic [EMAIL PROTECTED]
Re: Allocate a page at interrupt time
On Tue, Aug 07, 2001 at 12:19:01PM -0700, Matt Dillon wrote: Cache line invalidation does not require an IPI. TLB shootdowns require IPIs. TLB shootdowns are unrelated to interrupt threads, they only occur when shared mmu mappings change. Cache line invalidation can waste cpu cycles -- when cache mastership changes occur between cpus due to threads being switched between cpus. I consider this a serious problem in -current. I don't think it's fair to consider this a serious problem seeing as how, as far as I'm aware, we've intended to eventually introduce code that will favor keeping threads running on one CPU on that same CPU as long as it is reasonable to do so (which should be most of the time). I think after briefly discussing with Alfred on IRC that Alfred has some CPU affinity patches on the way, but I'm not sure if they address thread scheduling with the above intent in mind or if they merely introduce an _interface_ to bind a thread to a single CPU. -Matt -- Bosko Milekic [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: cluster size
On Fri, Jul 27, 2001 at 08:23:37PM -0400, Zhihui Zhang wrote: I thought doing a memory free is always safe in an interrupt context. Now it seems doing an allocation of memory is safe too. Does MCLGET() call vm_page_alloc() or malloc() eventually? If so, it might block. It never calls malloc(). Sometimes, although rarely, it may end up in kmem_malloc() which calls vm_page_alloc(), but vm_page_alloc() should not block as in this case it will be called with the VM_ALLOC_INTERRUPT flag. -Zhihui -- Bosko Milekic [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: cluster size
On Sat, Jul 28, 2001 at 01:44:25PM -0700, Terry Lambert wrote: Zhihui Zhang wrote: I thought doing a memory free is always safe in an interrupt context. Now it seems doing an allocation of memory is safe too. Does MCLGET() call vm_page_alloc() or malloc() eventually? If so, it might block. The mbuf allocator uses the zone allocator. No, it doesn't. When it needs to dip into the VM code (which is rare), it uses kmem_malloc() and so vm_page_alloc()-ates via the kmem_object. Since kmem_map is scaled accordingly to accommodate the mbuf maps (mbuf_map and clust_map), which are submaps of the former, this works out similarly to the zone allocator. Note that the mbuf allocations were NEVER done with the zone allocator. The cool thing about managing mbufs via a map is that it *does* allow for us to unwire associated pages in case we decide to actually free back to the map. Previously, this was never implemented but with the new allocator, the framework is present to allow for freeing of pages to be implemented. If implemented properly, this could allow for the system to re-adapt even if the character of the load changes with time, without affecting allocation performance. The reason this works at interrupt is that the page table entries for the memory are already in place in the kernel, but the actual allocations have not taken place. When you are running with less than a full complement of RAM (e.g. 4G on a 32 bit Intel machine), this will permit you to do the allocations of physical RAM later, and have a KVA (kernel virtual address) space that exceeds the amount of physical memory. In practice, this means that your system is not specifically tuned for particular loading, until the memory is committed (when that happens, say, by using all possible mbufs, then you are unable to recover the memory to the system memory pool: it has become type stable). 
This lets you have a mostly general system that then commits resources based on the character of its load, yet which does not permit the character of the load to change over time. See previous paragraph. When you have all the memory you can address in physical space, then the problem changes somewhat, and you basically do not overcommit resources. The upshot of having the page descriptors preallocated, however, is that you can allocate in interrupt context, and the zone headers are statically allocated at compile time, instead of being malloc'ed later in the kernel boot cycle. You should look at the ziniti and zalloci code: the zone allocator code. The mbuf issue has recently been a bit obfuscated by the -current commit of a replacement allocator, which is mbuf specific. I think this new allocator has some unforgivable drawbacks; you yould be better off looking at the 4.3 kernel source code to get an idea of why interrupt allocations work. Again, the actual allocation code has LITTLE changed even in the new allocator. I simply don't understand where you get the idea that mbufs were ever allocated with the zone allocator but I suspect that if you went ahead and read the new code, you'd realize that for what concerns actually memory allocation, very little has changed vis-a-vis the older allocator. So, in general: 1)Only some allocators can be used at interrupt time 2)If they can, they must precommit kernel address space to the task 3)Once memory is allocated from one of these pools, it is never returned to the system for reuse This (3) only applies to the zone allocator. With maps, you *can* free back to the map and unwire the wired pages (freeing physical memory). 4)The general malloc() code _can not_ be used at interrupt time in FreeBSD (but SVR4's allocator can). Huh? 
Do you realize that in much much earlier versions of FreeBSD (not long after the import from 4.4BSD, or whatever it was uipc_mbuf.c was initially imported from) all _MBUFS_ were allocated directly with malloc()? Obviously, mbufs are allocatable at interrupt time (and always were, afaic remember). All that you have to make sure to do, when allocating at interrupt time is to allocate with the M_NOWAIT flag. -- Terry -- Bosko Milekic [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
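The M_NOWAIT rule in the reply above comes down to a contract: an allocation made at interrupt time must fail immediately instead of sleeping, and the caller must cope with the failure. A minimal userland sketch of that contract follows. It assumes a hypothetical fixed pool; `fixed_pool`, `pool_init()`, `pool_get_nowait()` and `pool_put()` are illustrative names and are not FreeBSD kernel interfaces.

```c
#include <stddef.h>

#define POOL_SLOTS 4      /* illustrative capacity, chosen for the example */
#define SLOT_SIZE  256    /* illustrative slot size */

/* Hypothetical preallocated pool: all storage is committed up front. */
struct fixed_pool {
    unsigned char storage[POOL_SLOTS][SLOT_SIZE];
    void *free_list[POOL_SLOTS];
    size_t nfree;
};

/* Put every slot on the free list. */
void pool_init(struct fixed_pool *p)
{
    for (size_t i = 0; i < POOL_SLOTS; i++)
        p->free_list[i] = p->storage[i];
    p->nfree = POOL_SLOTS;
}

/* Never blocks: returns NULL at once when the pool is exhausted. */
void *pool_get_nowait(struct fixed_pool *p)
{
    if (p->nfree == 0)
        return NULL;
    return p->free_list[--p->nfree];
}

/* Return a slot to the pool; extra puts beyond capacity are ignored. */
void pool_put(struct fixed_pool *p, void *slot)
{
    if (slot != NULL && p->nfree < POOL_SLOTS)
        p->free_list[p->nfree++] = slot;
}
```

A caller would test the result for NULL and drop or defer the work, much as a driver checks its mbuf pointer before going on to request a cluster.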
Re: cluster size
On Thu, Jul 26, 2001 at 10:18:09AM -0700, Terry Lambert wrote:
> The real reason behind all this is to make the input and output routines symmetric, since mbuf's can be allocated at interrupt, and clusters can't (or couldn't, last time I looked at 4.3).

They can. Whether they are or not I'm not sure.

> -- Terry

-- Bosko Milekic [EMAIL PROTECTED]
Re: cluster size
On Thu, Jul 26, 2001 at 10:51:40AM -0700, Terry Lambert wrote:
> Alfred Perlstein wrote: On Thu, Jul 26, 2001 at 10:18:09AM -0700, Terry Lambert wrote: The real reason behind all this is to make the input and output routines symmetric, since mbuf's can be allocated at interrupt, and clusters can't (or couldn't, last time I looked at 4.3). They can. Whether they are or not I'm not sure. Er, wouldn't that be the only way for cards to refill their DMA receive buffers?
>
> Look at the Tigon II and FXP drivers. The allocations in the macros turn into m_get, not m_clusterget.

From if_fxp.c (fxp_add_rfabuf(), sometimes called from fxp_intr()):

    MGETHDR(...);           /* get mbuf */
    if (m != NULL) {
            MCLGET(...);    /* get cluster */
            ...
    }

> -- Terry

-- Bosko Milekic [EMAIL PROTECTED]
Re: cluster size
On Wed, Jul 25, 2001 at 01:51:51PM -0400, Zhihui Zhang wrote:
> On Tue, 24 Jul 2001, Terry Lambert wrote:
> > Zhihui Zhang wrote:
> > > Hi, in freebsd can we change the cluster size from 2048 bytes. If yes how can we do that? do we have to configure in some file?
> > > You must be asking why the mbuf cluster size is chosen as 2048, right? It is probably a tradeoff between memory efficiency and speed.
> >
> > Ask yourselves: What is the minimum cluster size I would have to have to be able to contain the maximum MTU worth of data, yet remain an even multiple of sizeof(mbuf) -- 256 bytes?
>
> A dumb question: why even not odd multiple?
>
> -Zhihui

It actually has to do with the fact that 2K is the only size equal to or greater than the maximum MTU worth of data that can be multiplied to a page size without any leftover (in other words, page size modulo 2K is zero).

-- Bosko Milekic [EMAIL PROTECTED]
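The arithmetic behind that answer can be checked with a few lines of C. This is only a sketch: it assumes a 4096-byte page and a 1500-byte Ethernet MTU (neither number is stated in the message itself), and `smallest_even_fit()` is a hypothetical helper name. It looks for the smallest size that holds the payload and divides the page with no remainder; a whole page also divides evenly, but it is not the smallest such size.

```c
#include <stddef.h>

/*
 * Return the smallest size that is >= min_payload and divides page_size
 * with no remainder, or 0 if no such size exists.
 */
size_t smallest_even_fit(size_t page_size, size_t min_payload)
{
    /* Walk the number of pieces per page downward, so sizes go upward. */
    for (size_t per_page = page_size; per_page >= 1; per_page--) {
        if (page_size % per_page != 0)
            continue;               /* this split would leave a remainder */
        size_t size = page_size / per_page;
        if (size >= min_payload)
            return size;
    }
    return 0;
}
```

With the assumed values, `smallest_even_fit(4096, 1500)` yields 2048, which matches the 2K cluster size discussed in the thread.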
Re: cluster size
On Wed, Jul 25, 2001 at 02:17:38PM -0400, Zhihui Zhang wrote:
> I see. It has something to do with the power-of-two allocator we are using inside the kernel.

No, it has nothing to do with the power-of-two allocation strategy used in some cases inside the kernel. 2K is just the most convenient size for a cluster as it fits the maximum MTU size while at the same time fitting nicely into a page, reducing allocation complexity.

> -Zhihui

-- Bosko Milekic [EMAIL PROTECTED]
Re: kernel malloc
On Mon, Jul 23, 2001 at 12:37:55PM +0100, vishwanath pargaonkar wrote:
> Hi, thx for ur reply. i wanted to know in side kernel is there any limit to the malloc that a user can do. what you told in ur previous mail is that at a time user can malloc 4k. but

No. You _can_ malloc over 4k and I never said that you could not. All I said was that if you do malloc() a buffer larger than PAGE_SIZE that the buffer will likely not be contiguous in physical memory. What that means is that your buffer may span across two non-contiguous physical pages. Usually you won't care unless you're DMAing into the buffer, or relying on the physical pages to be contiguous.

> suppose i am doing 2k memory allocations. how many such mallocs i can do?

In the kernel, you can do as many as you want. That is, until you run out of physical memory or until you exhaust the kmem_map virtual address space, whichever comes first.

> is there any configuration we can do depending on our RAM size? please reply. thx vishwanath

Regards,
-- Bosko Milekic [EMAIL PROTECTED]
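The point about buffers larger than PAGE_SIZE can be made concrete by counting how many pages a virtual range touches: once the count exceeds one, nothing guarantees that the backing physical pages are adjacent. A minimal sketch, assuming a 4096-byte page in the examples; `pages_spanned()` is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Number of pages touched by the byte range [addr, addr + len).
 * Returns 0 for an empty range or a zero page size.
 */
size_t pages_spanned(uintptr_t addr, size_t len, size_t page_size)
{
    if (len == 0 || page_size == 0)
        return 0;
    uintptr_t first = addr / page_size;             /* page of first byte */
    uintptr_t last = (addr + len - 1) / page_size;  /* page of last byte */
    return (size_t)(last - first + 1);
}
```

A page-aligned 5k buffer gives `pages_spanned(0, 5120, 4096) == 2`, so a DMA engine that expects one contiguous physical region could not safely be pointed at it without further guarantees.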
Re: cluster size
On Mon, Jul 23, 2001 at 11:01:27AM -0500, Dan Nelson wrote:
> In the last episode (Jul 23), vishwanath pargaonkar said:
> > in freebsd can we change the cluster size from 2048 bytes. If yes how can we do that? do we have to configure in some file?
>
> Actually, the block size is 8192 bytes by default, with fragment size of 1024 bytes. You pick the sizes when you run newfs with the -b and -f options.

I think he was referring to the mbuf cluster size being 2K. In any case, I think the question is way too ambiguous to be answered properly.

> -- Dan Nelson [EMAIL PROTECTED]

-- Bosko Milekic [EMAIL PROTECTED]
Re: kernel malloc
On Fri, Jul 20, 2001 at 10:17:20AM +0100, vishwanath pargaonkar wrote:
> Hi, can any one please help me with this. i want allocate a memory in the kernel -a buffer of size 2k to 5k. can i do it using malloc with second parameter as M_TEMP and third as M_WAITOK. can anybody tell me what M_TEMP means. what is maximum malloc i can do with M_TEMP? will the OS allow me to malloc 4k buffer in side kernel?? shd i give M_WAITOK or M_DONTWAIT???

M_TEMP is merely there for statistics gathering. If you're writing a subsystem and plan to malloc() a lot of things for the subsystem you may want to create your own malloc type (see malloc(9)). On another note, remember that if you allocate a 5k buffer with malloc() on x86, where the page size is 4k, you're not guaranteed to have a physically contiguous backing.

> please tell me. thanx in advance.

Regards,
-- Bosko Milekic [EMAIL PROTECTED]
Re: Network performance roadmap.
On Fri, Jul 13, 2001 at 04:37:46PM -0500, Mike Silbersack wrote:
> Jiangyi Liu has been working on mbuf limiting code for the past week or so. What he has is pretty complete, I expect to get most of it committed once Bosko gets back.

Well, I'm back. I'm now going to bed but my INBOX awaits.

> Mike Silby Silbersack

-- Bosko Milekic [EMAIL PROTECTED]
Re: Article: Network performance by OS
> you to characterize the load you expect, which I am sure results in some non-defaults for a number of tuning parameters. Similarly, it has opportunity to notice the network hardware installed: if you install a GigaBit Ethernet card, it's probably a good bet that you will be running heavy network services off the machine. If you install SCSI disks, it's a pretty good bet you will be serving static content, either as a file server, or as an FTP or web server. Tuning for mail services is different; the hardware doesn't really tell you that's the use to which you will put the box.
>
> On the other hand, some of the tuning was front-loaded by the architecture of the software being better suited to heavy-weight threads implementations. Contrary to their design claims, they are effectively running in a bunch of different processes. Linux would potentially beat NT on this mix, simply because NT has more things running in the background to cause context switches to the non-shared address spaces of other tasks. Put the same test to a 4 processor box with 4 NIC cards, and I have no doubt that an identically configured NT box will beat the Linux box hands down.
>
> A common thread in these complaints that the results were somehow FreeBSD's fault, rather than the fault of tuning and architecture of the application being run, is, frankly, ridiculous.

I completely agree. :-)))

> -- Terry

Cheers,
-- Bosko Milekic [EMAIL PROTECTED]
Re: Article: Network performance by OS
On Sat, Jun 16, 2001 at 02:14:14PM -0700, Matt Dillon wrote:
> It's certainly true that a greater degree of dynamic tuning could be done, but all this benchmark proves (in regards to the TCP results) is that FreeBSD puts its foot down earlier than other OS's in regards to how much it is willing to dedicate to the network. In a real life situation where you may be running a multi-user load or a large database, the very last thing you want to do is shift every last bit of your resources away from the users or the database and to the network when an 'unexpected load' comes in (unexpected meaning something that is a factor of 100 or 1000x what the machine normally handles). The truth of the matter is that no amount of dynamic tuning can handle every situation... at some point you have to manually tune the box. FreeBSD does exactly the right thing on an untuned box by capping the network resources. If the authors want to run the machine into the ground with a benchmark, they have to tune the machine properly to handle the load because FreeBSD anyway is more interested in keeping the integrity of the machine as a whole together than it is tuning itself to match some idiot who thinks he is god's own gift to humanity running a benchmark.

This is the best written paragraph on the issue in this entire thread. This is exactly my philosophy toward the whole thing. And I can tell you from previous dealings with companies that use FreeBSD as their main platform that this is one of the main reasons why.

> -Matt

Regards,
-- Bosko Milekic [EMAIL PROTECTED]
Re: questions regarding the MGET function.
Shankar Agarwal wrote:
> Hi, Can you please tell me when did the MGET function change its implementation from using MALLOC to using pool_get to allocate a mbuf. I

Never. We don't use pool_get(). That's a NetBSD-ism. :-) The mbuf subsystem uses its own allocator and stats are kept in mbstat which is exported via sysctl. Things like netstat(1) can fetch this information for you.

> am having a trouble finding out how does the memstats keep track of the mbufs allocated through pool_get. Thanks Regards Shankar
Re: Re: Re: postfix: No buffer space available
Since nobody else has asked this, I think I will: What network device are you using and with what driver? Please show the output of `ifconfig -a' when you notice this problem. Finally, try `ifconfig the_interface down' followed by `ifconfig the_interface up' when you notice this, and see if it temporarily fixes the problem.

Thanks to Matthew Dodd and NetBSD, I think we may have a solution to the ep wedging problems (which has similar symptoms, by the way) sometime soon (i.e. when I get around to it this weekend, after first mid-term, if no one beats me to it). In the meantime, it would be nice to know if there are other devices exhibiting this behavior. (All this assuming, of course, that what you're describing is not the result of a kernel resource shortage, such as mbuf starvation, etc.)

Regards, Bosko.

Renaud Waldura wrote:
> > But neither parameter takes effect.
>
> They may be read-only if you're running with securelevel > 0. Otherwise they "take effect" just fine.
>
> > Anybody got any other ideas how scale FreeBSD up to postfix's needs?
>
> Yes, recompile your kernel with "maxusers 128" or more. This tweaks a bunch of stuff, notably mbufs. E.g. with 128 "users" I've got:
>
> 226/1920/10240 mbufs in use (current/peak/max):
>         159 mbufs allocated to data
>         67 mbufs allocated to packet headers
> 130/1438/2560 mbuf clusters in use (current/peak/max)
> 3116 Kbytes allocated to network (9% in use)
> 0 requests for memory denied
> 0 requests for memory delayed
> 0 calls to protocol drain routines
>
> --Renaud
>
> - Original Message -
> From: "Len Conrad" [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Sent: Tuesday, February 20, 2001 1:36 PM
> Subject: Fwd: Re: Re: postfix: No buffer space available
>
> > Here's what has happened with the advice earlier: tried to add the following via sysctl.conf
> >
> > kern.ipc.maxsockets = 5000
> > kern.ipc.maxsockbuf = 524288
> >
> > But neither parameter takes effect. are these read-only values?? and:
> >
> > # netstat -m
> > 445/720/4096 mbufs in use (current/peak/max):
> >         172 mbufs allocated to data
> >         273 mbufs allocated to packet headers
> > 154/252/1024 mbuf clusters in use (current/peak/max)
> > 684 Kbytes allocated to network (61% in use)
> > 0 requests for memory denied
> > 0 requests for memory delayed
> > 0 calls to protocol drain routines
> >
> > Anybody got any other ideas how scale FreeBSD up to postfix's needs?
> >
> > tia, Len
> > http://BIND8NT.MEIway.com : Binary for ISC BIND 8.2.3 for NT4 W2K
> > http://IMGate.MEIway.com : Build free, hi-perf, anti-spam mail gateways
Re: Problems attaching an interrupt handler
This:

    bus_teardown_intr(dev, sc->irq, sc->ih) != 0 );

looks pretty odd. See your ir_detach().

Alex wrote:
> Hi, I started experimenting with kernel hacking to write an infrared device driver. Therefore I read Alexander Langer's article on DaemonNews and started modifying the led.c example code. Unfortunately I can't get my interrupt handler working. Could anyone please have a short look on my code. On loading the module the first time everything stays stable and vmstat -i shows 1 INT on my device. After unloading the module and reloading it the kernel crashes on the next incoming interrupt. Any ideas? Alex
Re: One thing linux does better than FreeBSD...
Hey! These images are hip! :-)

Cheers, Bosko.

Matthew N. Dodd wrote:
> On Tue, 16 Jan 2001, Poul-Henning Kamp wrote:
> > Isn't there *anybody* here who has a SO/family member/neighbor in the graphic/design business ?
>
> Yes. http://www.svaha.net/daemon/index.html
>
> --
> | Matthew N. Dodd | '78 Datsun 280Z | '75 Volvo 164E | FreeBSD/NetBSD |
> | [EMAIL PROTECTED] | 2 x '84 Volvo 245DL| ix86,sparc,pmax |
> | http://www.jurai.net/~winter | This Space For Rent | ISO8802.5 4ever |
Re: FIN_WAIT_2 / TIME_WAIT Confusion
Hi Michael,

What version of FreeBSD are you running? If it's not too much trouble, can you please provide the code you're using to simulate the problem? Are the TIME_WAIT state connections eventually timing out/disappearing?

Michael wrote:
> If this is not proper place to ask this, let me know and I'll go elsewhere as it is a TCP question. . . but I specifically use (and prefer) FreeBSD.
>
> I wrote a simple little I/O multiplexing thing that can act as a client or server as a personal project in network programming. Everything seems fine, except that when I use the client to make multiple connections to a web server. Even though I don't primarily use it for this, the following behavior has me curious. I will run my client about 2 or three times, each time it makes 5 connections, pulling back the main page. Then the weird behavior starts:
>
> 1. I will get all data back from all connections except for one, perhaps two, which then sit in a FIN_WAIT_2 or sometimes TIME_WAIT state.
> 2. When I run netstat -a, it indicates that there is data in the read queue for these clients, but select() always returns 0 ready file descriptors.
>
> That's what puzzles me. There is data there to be gotten, but I am not getting it. When I look at the data that comes back in tcpflow, it doesn't look like the whole document has made it back either. A couple of runs might work perfectly, then once or twice will be weird. And it seems to multiplex more connections more reliably than fewer (the weird behavior seems inversely proportional to the number of connections---to a point of course. The client runs reliably more times with 50 connections than with 5).
>
> Three notes:
>
> 1. It seems to happen more if I access machines on my LAN than over the Internet.
> 2. I do make sure and shutdown the write side of the socket after I send the HTTP request so as to avoid keeping the web server in FIN_WAIT_2.
> 3. I am sure about having the maxfd + 1 in select() correct, so that's not the problem.
>
> Does anyone have any ideas as to what's going on?
Re: FreeBSD vs Linux, Solaris, and NT
Dennis wrote:
> : Still, I personally believe, that "core" or general "freebsd community"
> : should explicitly state, that support for binary drivers and support for
> : easier inclusion of binary driver or just third party driver is eagerly
> : encouraged. And as much as possible, easy inclusion of binary drivers
> : should be kept in mind whether making changes to /usr/src/Makefile or
> : kernel interfaces or even discussions on the freebsd lists.
>
> Core has stated in the past a strong desire for developers not to break kernel interfaces within minor releases.
>
> 4.1 broke that "policy" rather badly. Perhaps it's time to get rid of the mbuf macros, as any change to that structure breaks binary compatibility in the worst way possible.
>
> DB

The "problem" was not with the macros themselves, but with the fact that your outdated binary was compiled with old definitions of some structures which were later changed (mbstat structure). The changes that happened there were relatively minor. I'm sure you would know all this had you debugged the problem yourself, but it turns out that all you provided in terms of "support" was whining and directing blame at the FreeBSD team.

I disagree with not merging in fixes to -STABLE that help maintain code in general, for the entire project. In this case, the change helped userland code such as netstat(1) deal with mbtypes. This wasn't a "big interface change" by any means. Plus, it was discussed on -net and since -net directly concerns you and your driver, perhaps you should read it every once in a while.

Had we not merged this change to -STABLE, I'm sure we would have had just as many, if not more requests: "MFC MFC, you guys are ignoring -STABLE!" as we have now with you complaining about the change being made. A wise man once said something along the lines: "you can never win with tire-kickers," and now I see how he was right.

Regards, Bosko.
Re: Why not another style thread? (was Re: cvs commit: src/lib/libc/gen getgrent.c)
On Sun, 17 Dec 2000, Chris Costello wrote:
> On Sunday, December 17, 2000, Jacques A. Vidrine wrote:
> > What do folks think about
> >
> > 1)    if (data) free(data);
> >
> > versus
> >
> > 2)    free(data);
> >
> > versus
> >
> > 3)    #define xfree(x) if ((x) != NULL) free(x);
> >       xfree(data);
>
> 2. The C standard dictates that free() does nothing when it gets a NULL argument. The other two are just extra clutter.

Agreed. However, in the kernel, all free()s should be made as in (1), in my opinion. (2) is dangerous, and (3) would just obfuscate the code. (I know this does not apply to the commit, but should be noted)

> --
> +---+-+
> | Chris Costello| This system will self-destruct in five minutes. |
> | [EMAIL PROTECTED] | |
> +---+-+

Later,
Bosko Milekic [EMAIL PROTECTED]
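For the userland half of that exchange, option (2) can be exercised directly. The sketch below applies only to the standard C library free(); it says nothing about the kernel free() that the reply cautions about. `release()` is a hypothetical helper name.

```c
#include <stdlib.h>

/*
 * Free the buffer that *bufp points to and clear the pointer.
 * No NULL check is needed before free(): the C standard defines
 * free(NULL) as a no-op.
 */
void release(char **bufp)
{
    free(*bufp);
    *bufp = NULL;   /* makes a repeated release() harmless */
}
```

Calling `release()` on a pointer that is already NULL, or twice in a row on the same variable, is therefore well defined in userland code.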
Re: crash on 4.2-stable (sendto() system call)
Hello,

Can you please also get the instruction at which the page fault occurred? You can try "where" from gdb or you can get the instruction pointer from the original page fault message and then you can probably "disassemble fr_makefrip" and get us the contents around the instruction generating the fault.

On Thu, 23 Nov 2000, FengYue wrote:
> Hi, got a crash on 4.2-stable. the machine was running 4.1.1-stable and had no problem at all. 10 hours after upgrade to 4.2-stable I got a vmcore. Here is the trace and could someone take a look, it looks like it was the sendto() call triggered the crash but I don't know how to reproduce it. Thanks
>
> ---
> initial pcb at 24c320
> panicstr: page fault
> panic messages:
> ---
> dmesg: kvm_read:
> ---
> #0 0xc013336e in dumpsys ()
> (kgdb) bt
> #0 0xc013336e in dumpsys ()
> #1 0xc013318f in boot ()
> #2 0xc013350c in poweroff_wait ()
> #3 0xc0200461 in trap_fatal ()
> #4 0xc0200139 in trap_pfault ()
> #5 0xc01ffd1f in trap ()
> #6 0xc01882dd in fr_makefrip ()
> #7 0xc018e20c in fr_checkicmpmatchingstate ()
> #8 0xc018e44d in fr_checkstate ()
> #9 0xc0188ecc in fr_check ()
> #10 0xc017d124 in ip_output ()
> #11 0xc017b416 in icmp_send ()
> #12 0xc017b397 in icmp_reflect ()
> #13 0xc017acbd in icmp_error ()
> #14 0xc0185be4 in udp_input ()
> #15 0xc017bdcb in ip_input ()
> #16 0xc017be2b in ipintr ()
> #17 0xc01f69d5 in swi_net_next ()
> #18 0xc0153881 in sendit ()
> #19 0xc0153975 in sendto ()
> #20 0xc020070d in syscall2 ()
> #21 0xc01f5575 in Xint0x80_syscall ()
> Cannot access memory at address 0xbfbffc8c.

Regards,
Bosko Milekic [EMAIL PROTECTED]
Re: Log analysis program running under apache reboots server!
Likely, you're getting a panic() and since you likely don't have debugging options, the machine eventually reboots itself. Notice that this is all "likely" and that since we don't have a crash dump, stack trace, or similar debugging information, there's not much that can be done except guessing. I would suggest that you try to reproduce the problem on a local machine and get some debugging info.

On Mon, 13 Nov 2000, Nicole wrote:
> Silent reboot :( I hate to respond to my own message.. But the server is remote.. But there is nothing in the logs afterwards.. and nothing appears on the screen when it occurs. Nicole
>
> [...]
>
> apacheuser:\
>         :manpath=/usr/share/man /usr/X11R6/man /usr/local/man:\
>         :cputime=4h:\
>         :datasize=64M:\
>         :stacksize=4M:\
>         :filesize=infinity:\
>         :memoryuse=64M:\
>         :priority=0:\
>         :datasize-cur=32M:\
>         :stacksize-cur=32M:\
>         :coredumpsize-cur=0:\
>         :maxmemorysize-cur=64M:\
>         :memorylocked=32M:\
>         :maxproc=128:\
>         :openfiles=256:\
>         :tc=standard:
>
> ##
> ## standard - standard user defaults
> ##
> standard:\
>         :copyright=/etc/COPYRIGHT:\
>         :welcome=/etc/motd:\
>         :setenv=MAIL=/var/mail/$,BLOCKSIZE=K:\
>         :path=~/bin /bin /usr/bin /usr/local/bin:\
>         :manpath=/usr/share/man /usr/local/man:\
>         :nologin=/var/run/nologin:\
>         :cputime=1h30m:\
>         :datasize=8M:\
>         :stacksize=2M:\
>         :memorylocked=4M:\
>         :memoryuse=8M:\
>         :filesize=8M:\
>         :coredumpsize=8M:\
>         :openfiles=24:\
>         :maxproc=32:\
>         :priority=0:\
>         :requirehome:\
>         :passwordtime=90d:\
>         :umask=002:\
>         :ignoretime@:\
>         :tc=default:
>
> default:\
>         :cputime=infinity:\
>         :datasize-cur=22M:\
>         :stacksize-cur=8M:\
>         :memorylocked-cur=10M:\
>         :memoryuse-cur=30M:\
>         :filesize=infinity:\
>         :coredumpsize=infinity:\
>         :maxproc-cur=64:\
>         :openfiles-cur=64:\
>         :priority=0:\
>         :requirehome@:\
>         :umask=022:\

For starters, I don't see "sbsize" in there, although it doesn't sound like something that should be causing a panic() anymore anyway. Please provide more debugging info.

Thanks,
Bosko Milekic [EMAIL PROTECTED]
Re: zero copy TCP
On Mon, 13 Nov 2000, Jin Guojun wrote:
> Both, but I may do either way, depending on which way is easier. If we can directly DMA from a disk drive to a NIC, that will be great. If the current implementation requires preloaded buffer, that works. So, where can I look for the patch? Thanks, -Jin

Please see sendfile(2).

Regards,
Bosko Milekic [EMAIL PROTECTED]
Re: post-install of kernal sources, maxusers max?
On Wed, 8 Nov 2000, Mike Silbersack wrote:
> I think you can up the mbuf related settings while the system is running. Give it a try. The two sysctls you'll want to fiddle with are:
>
> kern.ipc.nmbclusters
> kern.ipc.nmbufs

Nope. These are read-only but can be tuned from loader.

> You can determine which is needed more through a quick netstat -m.
>
> Mike "Silby" Silbersack

Cheers,
Bosko Milekic [EMAIL PROTECTED]
Re: When do you want to see panics?
No, not in the general case, they are not normal! So feel free to provide the info. :-)

On Thu, 5 Oct 2000, Michael Lucas wrote:
> Not sure if this is on-topic, but what the heck: I've started playing a little more freely with my laptop. One result is comparatively frequent panics when doing things I know damn well are almost certain to fail, say, while playing with the Linuxulator or in mount_union. Are these panics debugger dumps something people want to see, or is the general attitude "then don't *do* that!" ? If you folks want 'em, I'll send them.
>
> (I suppose the generalized form of this question is, "Are panics normal when the sysadmin is behaving like a damned fool?" ;)
>
> Thanks, Michael
>
> -- Michael Lucas [EMAIL PROTECTED] http://www.blackhelicopters.org/~mwlucas/
> Big Scary Daemons: http://www.oreillynet.com/pub/q/Big_Scary_Daemons

Regards,
Bosko Milekic [EMAIL PROTECTED]
Re: mbuf re-write(s), v 0.1
On Thu, 6 Jul 2000, Bosko Milekic wrote:
> I've recently had the chance to get some profiling done. I used metrics obtained from gprof, as well as the (basic block length) * (number of executions) metric generated by kernbb. The latter reveals an approximate 30% increase in the new code, but does not necessarily imply that time of execution is increased by that amount. gprof makes a fair estimate on execution time, and reveals that the new code is, worst case scenario, 30% slower, and best case scenario, negligibly slower.
>
> Of course, I'm leaving out some details here, because I've decided to change things a little, in order to further improve (and significantly, at that) the performance of the new code. Note however that the 30% overall APPROXIMATE increase is not something I would consider significant, especially since the allocator/free routines don't hold much %time, and are not the bottleneck in any of the call graphs. I did decide to make drastic changes, however, in order to keep with the 0-tolerance policy, even if it involves somewhat getting rid of a cleaner interface and adopting a "kernel process." See below.

You can disregard the above data. I actually found something detrimental (seriously) to performance. During MFREE, the code would free the page in question if at the time the number of mbufs on the free list exceeds (even by a little) min_on_avail. This is fine. The problem was in MGET/MGETHDR, where the code would explicitly allocate when how==M_WAIT and the number of mbufs on the free list < min_on_avail (this was a feeble attempt at making M_NOWAIT allocations even faster).

The potential problem is not so obvious: numerous M_WAIT allocs will ALWAYS allocate a page from the map while there are < min_on_avail mbufs on the free lists. And MFREE would almost ALWAYS have to free back to the map as at this point, the number of mbufs on the free lists fairly quickly reaches min_on_avail. So what would happen is a page would be allocated, freed, allocated, freed, etc. m_get + m_free would be an endless cycle of m_mbmapalloc and m_mbmapfree, which increases overhead significantly.

After fixing MGET/MGETHDR, I'm getting more promising results. I'll get some hard data and post it later tonight, hopefully. Oh, and I'm still open to the kernel process idea. I'll need one such beast anyway, because it will help minimize page fragmentation for the allocator, on request.

-- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/
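The allocate/free ping-pong described above can be reproduced with a toy model. This is a sketch under loud assumptions: it is not the patch under discussion, it ignores which page each mbuf belongs to, and the numbers (16 mbufs per page, from a 4096-byte page and 256-byte mbufs, with `min_on_avail` set to 16 and 15 mbufs initially free) are chosen only for illustration. All names except `min_on_avail` are hypothetical.

```c
/* Toy state for one free list; not the real allocator's structures. */
struct mbuf_pool_model {
    int nfree;          /* mbufs currently on the free list */
    int min_on_avail;   /* threshold named in the message */
    int per_page;       /* mbufs carved from one page (assumed 16) */
    long page_allocs;   /* pages taken from the map */
    long page_frees;    /* pages given back to the map */
};

/* Get one mbuf. With eager != 0, top up whenever below the threshold. */
void model_get(struct mbuf_pool_model *p, int eager)
{
    if (p->nfree == 0 || (eager && p->nfree < p->min_on_avail)) {
        p->nfree += p->per_page;
        p->page_allocs++;
    }
    p->nfree--;
}

/* Free one mbuf; give a page back once above the threshold. */
void model_free(struct mbuf_pool_model *p)
{
    p->nfree++;
    if (p->nfree > p->min_on_avail && p->nfree >= p->per_page) {
        p->nfree -= p->per_page;
        p->page_frees++;
    }
}

/* Run get/free pairs and report total page allocations plus frees. */
long model_page_traffic(int eager, int cycles)
{
    struct mbuf_pool_model p = { 15, 16, 16, 0, 0 };
    for (int i = 0; i < cycles; i++) {
        model_get(&p, eager);
        model_free(&p);
    }
    return p.page_allocs + p.page_frees;
}
```

In this model the eager policy touches the map twice per get/free pair, while the lazy policy (allocate a page only when the free list is empty) never touches it for the same workload.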
Re: mbuf re-write(s), v 0.1
On Sun, 2 Jul 2000, David Greenman wrote: Yes, malloc is slow for other reasons, but it is especially slow when VM pages are freed back to the general pool. Of course it is possible to introduce hysteresis in the algorithm such that it doesn't free the pages as often, but this (and all the tunables that you proposed) has the negative effect of making the allocator more complex. We've tried very hard not to do this in the current mbuf allocator, making it nearly as efficient as you can get. * Have you looked at the code I proposed? http://24.201.62.9/code/mbuf/ (I did some simplification recently, but it's not done yet, so you may want to look at it). * Again, I did NOT use malloc()/free() to allocate mbufs. Effectively, I do something similar to NetBSD's "pool" interface, only much SIMPLER. * I only proposed ONE additional tunable, and that's the one I mentionned previously. It has the effect of maintaining speed for those who would prefer to have it done in a similar way to before. * I agree with this: - the present allocator is simple - the present allocator is efficient So is the new one, but since it introduces a new useful feature, which has the effect of freeing physical memory when it isn't needed and when the administrator agrees to do so, it's "simple" and "efficient" in its own class. By the way, I'm very open to comments and optimisation suggestions, so if it's not as efficient as possible right now, then I'd love to hear suggestions pertaining to that, but that would maintain the new functionality. I guess I just don't see the problem on any of the servers that I manage (ftp.cdrom.com and ftp.freesoftware.com, for example). There are peaks in usage, but they tend to reach the peaks often enough that freeing the pages for short term memory gain is just a waste of CPU cycles. Memory is so cheap these days that throwing memory at the problem seems to be a very reasonable solution, especially when the system clearly needs it during the peaks. 
-DG David Greenman Co-founder, The FreeBSD Project - http://www.freebsd.org Manufacturer of high-performance Internet servers - http://www.terasolutions.com Pave the road of life with opportunities. I'm getting the unfortunate impression that evolution is being frowned upon here. Are there other people out there who frown upon the proposal to this extent? (i.e. "don't change it if it works") I'd like to hear some important voices on this issue so that I can decide whether to just drop this entire thing and forget about it. (in other words, what do committers and/or core have to say about this?) Aside from this, I've gotten several other "pro" opinions on this; some people have even sent suggestions. So I know that I am not the only one (not by far, in fact) to see an opportunity to benefit from this. Either way, I know *I* will be using this code in time to come, so I suppose the question is: Would you consider committing this code or should I stop posting any changes I make in the future altogether? -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: mbuf re-write(s), v 0.1
dr struct in mbufs (pointer) that points to the mbuf's corresponding page descriptor structure, so that pointer is acquired and the free mbuf chain is extracted from the structure to which the freed mbuf is attached (as it always was). I guess the only real addition in CPU cycles here is the following: a simple check was added that just checks if the entry is on the "empty" list and if it is, moves it over to the "free list." If that's not the case, then there is a possibility that the freed mbuf completes a page and the page can be freed, so if that's the case and min_on_avail allows it, then the page is freed back to the map (notice that this behavior is tunable - again - with min_on_avail). I'm not trying to 'frown upon evolution', unless the particular form of evolution is to make the software worse than it was. I *can* be convinced that your proposed changes are a good thing and I'm asking you to step up to the plate and prove it. That sounds fair. -DG David Greenman Co-founder, The FreeBSD Project - http://www.freebsd.org Manufacturer of high-performance Internet servers - http://www.terasolutions.com Pave the road of life with opportunities. -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/
Re: mbuf re-write(s), v 0.1
On Mon, 3 Jul 2000, Poul-Henning Kamp wrote: Considering the prominence of DoS attacks and similar, I think it makes a lot of sense to be able to free the memory again, and if the hysteresis you have built in means that there is no measurable performance impact I think you will face no objections. That was one of the reasons of writing. Oh, and there's something I forgot to mention previously. The code I presently have frees memory dedicated to mbufs, so obviously, it's significant, but it's even more significant in the case of mbuf clusters, as they are larger. I still haven't finished writing the cluster stuff though but expect it to be similar in concept and design. Is it possible to auto-tune min_on_avail somehow ? What if instead you made it free only when more than 50% of the memory allocated from the map was unused ? min_on_avail is presently a sysctl but I do expect to have it optionally autotuned - read below. Could that freeing be done by a timeout routine which runs every N seconds ? Ah! Finally, you've read my mind! The design has been made with the idea of the possibility of a "kernel process" running [optionally] periodically which will take care of such issues. * reducing fragmentation by moving page descriptor structure nodes with almost complete free lists to the bottom of the "free" doubly-linked list * possible auto-tuning of min_on_avail; I will be expanding mbstat to include allocator statistics, so that the number of times the VM allocation routine and the VM free routine have been called can be recorded and used for such purposes. * drain routine to free pages back to VM system In other words, the free page back to mb_map routine takes as an argument a node on the free list, so the "timeout" daemon can be made to walk the free list and pick out full available pages from the list and return the space to the map, on the condition that min_on_avail is respected. 
The issue with doing this, however, is that it will have to splimp() while walking the lists, so the question is whether it's really much of an advantage (as opposed to freeing from MFREE if necessary). On the other hand, what I think would be more of an advantage is having MFREE only call m_mbmapfree() [the new free routine] if (how) == M_WAIT. If (how) == M_NOWAIT, then the mbuf will just be attached to its corresponding page descriptor's free chain. I try to take advantage of (how) being M_WAIT as much as possible. For instance, during allocation, even if the free list is not empty but (how) is M_WAIT, the system will still fetch a new page and allocate from it if the number of free mbufs is less than min_on_avail. This is to minimize calls to m_mbmapalloc() when allocations are to be done with M_NOWAIT (i.e. from interrupts). -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD coreteam member | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/
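The (how)-dependent policy described in this message can be written down as two small predicates. This is only a sketch of the decision logic as described in the text, not code from the patch: the names HOW_WAIT, HOW_NOWAIT, should_fetch_page() and may_release_page() are hypothetical stand-ins, and the exact comparison against min_on_avail is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's M_WAIT / M_NOWAIT. */
enum how { HOW_NOWAIT, HOW_WAIT };

/*
 * Allocation side: go to the map for a fresh page when nothing is cached,
 * or when the caller may sleep and the cache is below min_on_avail.
 */
static bool
should_fetch_page(enum how how, int free_count, int min_on_avail)
{
	if (free_count == 0)
		return true;		/* nothing cached: the map is the only source */
	return how == HOW_WAIT && free_count < min_on_avail;
}

/*
 * Free side: only a caller that may sleep returns a completely free page
 * to the map, and only if enough free objects remain cached afterwards
 * (assumed reading of "min_on_avail is respected").
 */
static bool
may_release_page(enum how how, bool page_complete, int free_after,
    int min_on_avail)
{
	return how == HOW_WAIT && page_complete && free_after >= min_on_avail;
}
```

The point of the split is that interrupt-time (M_NOWAIT) callers never pay for a trip to the VM system unless the cache is completely empty.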
Re: mbuf re-write(s), v 0.1
On Thu, 29 Jun 2000, David Greenman wrote: We used to do this in FreeBSD, but found that it was a bad idea for performance reasons. Freeing and reallocating memory from the high-level VM system is quite expensive and the trend in NICs these days is towards needing the code to be even faster, not slower. Further, if the 'peak' is reached often, then you're probably not really gaining much by freeing the memory back to the common pool. -DG What was previously done at some point was use the kernel malloc() to allocate mbufs. As you know, this is a general purpose allocator that has to first determine what algorithm to use and then store the object correctly according to its size. This allocator is faster than that one. This allocator knows that it only has to deal with mbufs and knows that all of these mbufs are of the same size. I am not proposing to return to malloc(), I am proposing the new allocator. Also, the "peak" in this case is not reached often, obviously. It is designed with just that idea in mind. But, if the administrator feels that it is, I have provided the following mechanism: { jehovah:/home/bmilekic } sysctl -A | grep min_on_avail kern.ipc.min_on_avail: 0 With this sysctl, the administrator can set a "minimum required" count for mbufs. In other words, it is possible to easily tell the system to keep as many mbufs as you'd like cached on the free lists. David Greenman Co-founder, The FreeBSD Project - http://www.freebsd.org Manufacturer of high-performance Internet servers - http://www.terasolutions.com Pave the road of life with opportunities. -Bosko -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Re[2]: mbuf re-write(s), v 0.1
On Thu, 29 Jun 2000, Joe McGuckin wrote: What about a slab allocator (e.g. http://www.cnds.jhu.edu/~jesus/418/SlabAllocator.pdf) Joe

What's your motivation behind this recommendation? What you're essentially suggesting is a replacement for our kernel malloc(). This will not make mbuf allocations faster by any means. The mbuf allocator that I presented in code a week or so ago does something very simple, as there is no point in making same-sized object allocations complicated, really, especially when they are small objects; in other words, I did NOT suggest going back to using malloc() for mbufs. I wrote a new, simpler and faster customized allocator which does essentially this:

* Check free list, which is a doubly-linked list of "mb_map page descriptor structures" (new structure, mbpl_pg_descr). This structure contains very basic and essential information, such as the address of the VM page, the number of mbufs that are "in use" on that page, etc. If there is a node present on that general free list, then there is a free mbuf, and allocation from the map is not necessary. Grab free mbuf. If node now no longer contains any free mbufs, detach it from this list and attach it to the empty list.
* If nothing on free list, allocate from map, also allocate memory for mbpl_pg_descr node for the obtained page and break page down into n objects, attaching it to the free list. Future allocations can allocate from that page until we run out.

Freeing is equally simple:

* Compute index into global array of pointers to mbpl_pg_descr structures based on the address of the mbuf. Locate node and determine which list it is on.
* Place mbuf back on that mbpl_pg_descr's free list and if the node was previously on the empty list, move it to the free list, as there is now at least one free mbuf available on it.
* If the freed mbuf completes the page, the page can be freed back to the map, but ONLY free it back if min_on_avail is met (sysctl).
So you see, it is possible in this way to control the free list, and have many objects cached on the free list, essentially going back to the behavior we presently have, if that's what the sysadmin wants, with only the little overhead of having to deal with the linked lists (which isn't much, as they are both doubly-linked, so insertion/removal is fast). That's it, roughly. I hope this clears up some things for those of you who didn't look at the actual code. Regards, Bosko. -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
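For readers who did not look at the actual diffs, the allocate/free steps listed in this message can be sketched in userland C roughly as follows. This is an illustration written for this archive, not the code from the patch: a static array stands in for mb_map, struct pg_descr stands in for mbpl_pg_descr, the names obj_alloc(), obj_free() and pages_mapped() are hypothetical, the 256-byte object size is assumed, and "min_on_avail is met" is read as "at least min_on_avail free objects remain after the page is released".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define OBJ_BYTES     256	/* assumed object (mbuf) size */
#define OBJS_PER_PAGE 16	/* 4096-byte page / 256-byte object */
#define MAP_PAGES     8		/* size of the toy "map" */

union slot { union slot *next; char bytes[OBJ_BYTES]; };

struct pg_descr {			/* stand-in for mbpl_pg_descr */
	struct pg_descr *prev, *next;	/* doubly-linked list linkage */
	struct pg_descr **head;		/* list this node currently sits on */
	union slot *free_chain;		/* free objects on this page */
	int in_use;			/* objects handed out from this page */
};

static union slot arena[MAP_PAGES][OBJS_PER_PAGE];	/* plays the role of mb_map */
static struct pg_descr *pg_tbl[MAP_PAGES];		/* page index -> descriptor */
static struct pg_descr *free_list, *empty_list;
static int total_free;		/* free objects across all mapped pages */
static int min_on_avail;	/* tunable floor on cached free objects */

static void list_remove(struct pg_descr *d)
{
	if (d->prev) d->prev->next = d->next; else *d->head = d->next;
	if (d->next) d->next->prev = d->prev;
	d->prev = d->next = NULL;
}

static void list_insert(struct pg_descr **head, struct pg_descr *d)
{
	d->prev = NULL;
	d->next = *head;
	if (*head) (*head)->prev = d;
	*head = d;
	d->head = head;
}

/* "Allocate from map": claim an unused page and carve it into objects. */
static struct pg_descr *page_from_map(void)
{
	for (int i = 0; i < MAP_PAGES; i++) {
		if (pg_tbl[i] != NULL) continue;
		struct pg_descr *d = malloc(sizeof(*d));
		if (d == NULL) return NULL;
		d->free_chain = NULL;
		d->in_use = 0;
		for (int k = OBJS_PER_PAGE - 1; k >= 0; k--) {
			arena[i][k].next = d->free_chain;
			d->free_chain = &arena[i][k];
		}
		pg_tbl[i] = d;
		list_insert(&free_list, d);
		total_free += OBJS_PER_PAGE;
		return d;
	}
	return NULL;			/* map exhausted */
}

static void *obj_alloc(void)
{
	struct pg_descr *d = free_list;
	if (d == NULL && (d = page_from_map()) == NULL) return NULL;
	union slot *s = d->free_chain;
	d->free_chain = s->next;
	d->in_use++;
	total_free--;
	if (d->free_chain == NULL) {	/* page exhausted: park it on the empty list */
		list_remove(d);
		list_insert(&empty_list, d);
	}
	return s;
}

static void obj_free(void *p)
{
	/* Index into the descriptor table from the object's address. */
	size_t idx = (size_t)((char *)p - (char *)arena) / sizeof(arena[0]);
	struct pg_descr *d = pg_tbl[idx];
	union slot *s = p;
	assert(d != NULL && d->in_use > 0);
	s->next = d->free_chain;
	d->free_chain = s;
	d->in_use--;
	total_free++;
	if (d->head == &empty_list) {	/* page has a free object again */
		list_remove(d);
		list_insert(&free_list, d);
	} else if (d->in_use == 0 && total_free - OBJS_PER_PAGE >= min_on_avail) {
		list_remove(d);		/* page complete: give it back to the map */
		pg_tbl[idx] = NULL;
		total_free -= OBJS_PER_PAGE;
		free(d);
	}
}

static int pages_mapped(void)
{
	int n = 0;
	for (int i = 0; i < MAP_PAGES; i++) n += (pg_tbl[i] != NULL);
	return n;
}
```

With min_on_avail left at 0, a single obj_alloc() followed by obj_free() maps and then unmaps one page; raising min_on_avail keeps that page cached, which is the "old behavior" the message refers to.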
Re: mbuf re-write(s): v 0.2: request-for-comments
On Wed, 28 Jun 2000, Dennis wrote: YES! This is wonderful news. I started coding device drivers on Digital UNIX and have long missed this feature. I can't count the number of times I've gotten 90% of the way through doing something with ext mbufs and thought to myself "oh hell, now what am I going to do for an m_ext.ext_ref() function?" On a less enthusiastic note, the amount of whitespace changes make it very difficult to eyeball your diff. Could you re-roll your diffs with -b (to ignore your whitespace changes). It's not really "wonderful" to those that have already implemented something using the old method. What version is this "patch" likely to find its way into the mainstream code (or will it), as it's likely to break our drivers. Dennis You can cast the void * argument to basically anything you like, so there is little chance that it will break your drivers to the extent you appear to be suggesting. All it would really do is reduce code bloat and make things less scattered. Actually, network device drivers were one of the motivations for this part of the patch: Bill Paul implements jumbo bufs in if_sk, for example, and has to literally "hide" the address of the softc structure inside the buffer so that he can use it inside his ext_{free, ref} calls. All that this would do is clean things up for him. As this patch is rather big, and does more than just this (i.e. it also completely changes the way mbufs are allocated and freed and thus allows pages allocated from mb_map to be freed back to the map, therefore freeing physical pages as a consequence, etc. -- and there are more changes to come), I think that it's safe to say that there is still a little while before this goes through. Regards, Bosko. -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/
Re: mbuf re-write(s): v 0.2: request-for-comments
On Wed, 28 Jun 2000, Andrew Gallatin wrote: YES! This is wonderful news. I started coding device drivers on Digital UNIX and have long missed this feature. I can't count the number of times I've gotten 90% of the way through doing something with ext mbufs and thought to myself "oh hell, now what am I going to do for an m_ext.ext_ref() function?" I can imagine. As I've previously mentioned, I'm thinking of adopting NetBSD's reference idea, as it seems very handy here. What it basically assumes is that if you're going to increase the reference count to an object, you know one of the mbufs also referencing that object (since what you're doing is probably "copying" the data without having to actually perform a memory-to-memory copy -- the reason we have reference counts in the first place). So, if you know the mbuf referencing the same object, you will pass it to the macro and it will "increase a reference count" for it itself. What actually occurs is that the m_ext structure holds a forward/backward pointer (in the style of doubly-linked list) and is linked to all the other mbufs referencing the same object. This would isolate the referencing of external objects to the mbuf subsystem, such that callers don't have to worry about it at all, and can essentially get rid of the ext_ref() routine altogether. On a less enthusiastic note, the amount of whitespace changes make it very difficult to eyeball your diff. Could you re-roll your diffs with -b (to ignore your whitespace changes). Yeah, I made some "appearance/consistency/cleanliness" changes in /sys/sys/mbuf.h in order to maintain consistency and ensure easy readability of the final product. However, for readability purposes, I posted the no-whitespace-changes diff to the same place: http://www.technokratis.com/code/mbuf/ I should have done this immediately; thanks for the advice! Hope this helps. 
:-) -- Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin Duke University Email: [EMAIL PROTECTED] Department of Computer Science Phone: (919) 660-6590 Cheers, Bosko. -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/
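The linked-reference idea discussed in this message can be illustrated with a small sketch. It assumes a circular doubly-linked ring (an mbuf that is the only holder points at itself); the struct and function names below (struct mb, mb_ext_init(), mb_ext_share(), mb_ext_release()) are hypothetical and are not taken from NetBSD or from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy mbuf: only the fields needed to show the reference ring. */
struct mb {
	struct mb *ext_next, *ext_prev;	/* other mbufs sharing ext_buf */
	char *ext_buf;			/* shared external storage */
};

/* First holder of the storage: a ring of one. */
static void mb_ext_init(struct mb *m, char *buf)
{
	m->ext_buf = buf;
	m->ext_next = m->ext_prev = m;
}

/* "Take a reference": link 'to' into the ring right after 'from'. */
static void mb_ext_share(struct mb *from, struct mb *to)
{
	to->ext_buf = from->ext_buf;
	to->ext_next = from->ext_next;
	to->ext_prev = from;
	from->ext_next->ext_prev = to;
	from->ext_next = to;
}

/*
 * Drop a reference: unlink 'm' from the ring. Returns 1 if 'm' was the
 * last holder, in which case the caller is expected to free the storage.
 */
static int mb_ext_release(struct mb *m)
{
	int last = (m->ext_next == m);

	m->ext_prev->ext_next = m->ext_next;
	m->ext_next->ext_prev = m->ext_prev;
	m->ext_next = m->ext_prev = m;
	m->ext_buf = NULL;
	return last;
}
```

No counter is stored anywhere: the storage is unreferenced exactly when the releasing mbuf finds itself alone on the ring, which is why callers would no longer need to supply an ext_ref() routine.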
Re: mbuf re-write(s): v 0.2: request-for-comments
On Wed, 28 Jun 2000, Dave Baukus wrote: All this talk of mbuf prompts me to point a small bug in M_PREPEND that was introduced somewhere between 3.3 and 4.0; maybe its also in 5.x. [...] If m_prepend() fails then No longer an issue in 5.0-CURRENT, and I'm looking at version 1.50 of mbuf.h Although you pointing it out did lead me to looking at m_prepend() itself, and noticing some bad style issues, like casting on NULLs (ick!) which I'll fix in the patch along with adding the new reference stuff. Thanks! -- Dave Baukus [EMAIL PROTECTED] Chiaro Networks ltd. Richardson, Texas, USA. -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: mbuf re-write(s): v 0.2: request-for-comments
On Wed, 28 Jun 2000, Kenneth D. Merry wrote: FWIW, I'm in favor of a pointer argument as well. The way I implemented it was actually with a third argument, instead of changing the int to void. i.e.: [...] I don't feel too strongly about it either way -- I suppose it's about the same amount of work to port older code. (I just put an ifdef in the sendfile code, which doesn't use the third argument in my tree.) The u_int is really unnecessary. If the caller needs more important information, he can pass anything he likes, including a data structure, or even a pointer to the mbuf. So this information can be extracted in either case. Ken -- Kenneth Merry [EMAIL PROTECTED] -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
mbuf re-write(s): v 0.2: request-for-comments
Hello, I just added a MEXTADD() routine to the [now getting bigger] mbuf re-write patch, as well as fixed and changed a few little things here and there (once again). Thus, so-called "version 2" of the diff is again available: http://www.technokratis.com/code/mbuf/ This code includes all that was discussed in the previous Email, as well as a better/actually working external storage facility for clusters. Previously, it was very difficult to allocate external storage, attach it to the mbuf, _and_ as well maintain a reference counter for it, primarily due to the arguments that were taken by ext_free() and ext_buf(). These have been changed to have a new void * pointer passed in as the second argument (following the base address of the storage buffer). Also has been included a void * multi-purpose ext_args pointer in the m_ext struct, so the caller has much more flexibility now. In fact, the caller can now attach a "management" or "reference" structure to the m_ext struct via the ext_args pointer, and have it passed to his ext_free and ext_buf routines. Naturally, for dynamically sized malloc() external buffers, the caller can also allocate along with it space for its reference counter and attach to the mbuf via the ext_args pointer. It will be incremented/decremented properly as ext_args can be passed as the second argument to the two functions. When ext_free, ext_buf, and ext_args are all NULL, but M_EXT is set, then the external storage corresponds to an mcluster. These changes will surely help out/make cleaner some code, like some of Bill Paul's device drivers (if_sk, if_ti, if_wb). For other purposes, such as sf_bufs, for example, it's not _as_ significant, mainly since sf_bufs are allocated from their own map such that the system can easily produce a unique index for a reference counter array just by looking at the offset base_addr_of_sf_buf - base_addr_of_map, like we do for mclusters. 
However, obviously, we don't want a new map for every new type of external storage we want to attach to an mbuf. :-) (Yes, this means easy attaching of dynamically sized buffers) What I still have left to do before I look into finding/bugging/annoying a committer (sigh) to reviewing/committing all of this: * Re-write the mcluster allocations/deallocations in the same style as the new mbuf allocator/deallocator. ... If someone has a more suitable proposition, please let me know. I love to hear suggestions. * I'm thinking of adopting NetBSD's "cute" and "clean" reference count system; they maintain their mbufs linked through the m_ext when they reference the same storage object. This will remove all fear from external callers/code having to deal with references in the first place, and will isolate it all to the mbuf code. Once this is done, I can also add a NetBSD-like MEXTMALLOC() macro, in addition to the just-added MEXTADD() macro. This would automate dynamic malloc()ing of external storage objects, and make it quite a bit cleaner/easier for the caller. * Patch up userland to deal with all of these changes. * Get some profiling / optimisation done. Since my initial post, I have received quite a few hits/requests for the posted code, and have even received a few comments/suggestions. These have been most helpful. I invite many more,... please! Regards, Bosko. -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
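To make the ext_args idea concrete, here is a userland sketch of a malloc()ed external buffer that carries its own reference counter, reached through a void * argument passed to the caller's routines. The field names loosely follow the thread (ext_free, ext_ref, ext_args), but the struct layout, the function signatures, and the helper names (struct counted_buf, ext_attach_counted(), live_bufs) are assumptions made for illustration; the real signatures are in the posted diffs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed shape of the external-storage hooks; not the patch's m_ext. */
struct ext_sketch {
	char *ext_buf;					/* start of the storage */
	void (*ext_free)(char *buf, void *args);	/* drop one reference */
	void (*ext_ref)(char *buf, void *args);		/* take one reference */
	void *ext_args;					/* caller's private data */
};

/* Buffer and its reference counter allocated together. */
struct counted_buf {
	unsigned refs;
	char data[];
};

static int live_bufs;	/* buffers currently allocated (for observation) */

static void cb_ref(char *buf, void *args)
{
	struct counted_buf *cb = args;

	(void)buf;
	cb->refs++;
}

static void cb_free(char *buf, void *args)
{
	struct counted_buf *cb = args;

	(void)buf;
	if (--cb->refs == 0) {		/* last reference: release the storage */
		free(cb);
		live_bufs--;
	}
}

/* Allocate 'size' bytes of storage and attach it with one reference. */
static int ext_attach_counted(struct ext_sketch *e, size_t size)
{
	struct counted_buf *cb = malloc(sizeof(*cb) + size);

	if (cb == NULL)
		return -1;
	cb->refs = 1;
	live_bufs++;
	e->ext_buf = cb->data;
	e->ext_free = cb_free;
	e->ext_ref = cb_ref;
	e->ext_args = cb;
	return 0;
}
```

Because the counter is found through ext_args rather than computed from an address offset into a dedicated map, no per-type map is needed for dynamically sized buffers.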
mbuf re-write(s), v 0.1
In an attempt to eliminate or significantly reduce the hogging of physical memory by unused mbufs, I have begun re-writing some of the mbuf subsystem. I've re-written the allocator and designed an actual free routine, and have also considerably re-written the MGET, MGETHDR, and MFREE macros. I still have some work to do with this, notably optimisation, but I have not been able to do any profiling whatsoever as profiling, I repeat, seems presently broken on -CURRENT. This is particularly useful for machines which see "peak" mbuf usage periods, where many mbufs are allocated, only to be freed a little while later, but which will unfortunately remain on the free list, holding on to physical memory (for a graphical example, see the THIRD graph at http://www.technokratis.com/stats/mbuf.html). Previously, we used to use the kernel malloc() to do mbuf allocations, coupled with the free() routine to do the freeing. However, the new allocator does not have to worry about choosing the right algorithm, and notably, variable sized objects. Of course, I still have some performance tuning to do, but need the profiling to work for that. Of course, there is a min_on_avail variable added to the code, which is yet to be made sysctl-tunable, and which represents the minimum number of mbufs that must reside on the free lists, so that the system will not explicitly free pages on every occasion it gets. The reason I named this "v 0.1" has to do with the work that is left to be done here. I've, for the moment, removed the m_reclaim() and wait code for mbufs, but this will all have to be re-placed appropriately (not much voodoo involved here). However, I've moved the mclusters to their own map, mcl_map, which is the correct thing to do here, in order to avoid having to worry about fragmentation in the allocation routines (we want the most efficiency possible). I'll go ahead and change the mcluster stuff soon, too, and hopefully fix up some of the mclrefcnt usage for clusters. 
I'll discuss more of this in time to come, and post the URL here. Also, I'm planning to write an optional "mbuf daemon" that can periodically walk the mbuf system's AVAIL_LST and EMPTY_LST, and re-organize the order of elements on, particularly, the AVAIL_LST, in order to minimize fragmentation during allocations, and augment % utilization for the allocator(s). It should also optionally do some other neat tasks, but I haven't exactly decided on which ones, although I'd like to avoid having it raise to splimp() for too long. Unlike what some of you may be thinking right now, this is not theoretical work, I have some diffs right here: http://www.technokratis.com/code/mbuf/ (you'll have to excuse my big tabs) The diffs provided for now are context diffs, and they do several things, among which (not to go too much into details): 1* Implement new mbuf allocator, implement free routine, re-write mbuf allocation and free macros. Add necessary lists / structures for the new system. 2* Change to OID_AUTO for all sysctls in uipc_mbuf.c 3* Make /sys/sys/mbuf.h look nicer, more consistent comments, etc. 4* Have mbuf clusters remain the same for now, but move them over to mcl_map 5* Remove (temporarily) mbuf wait/reclaim stuff. The diffs are in working condition on -CURRENT (as of a couple of days ago, at least), and I'm running them with no apparent problems as we speak. % utilization is great, for now, and I hope that the daemon-to-come will bring it up even higher. I can also tune it with the min_on_avail variable. Of course, from the above 5 points, you'll quickly note that I still have to go around and rebuild userland stuff, but that will wait until the end of all mbuf system modifications. Comments welcome. Special thanks to Mike Silbersack for already discussing such issues with me. 
Regards, Bosko -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/
ether_output() : WIERD PROBLEM
Hello, I've been doing some mbuf-related work on my -CURRENT machine lately. Particularly, I've re-written the allocator and free routines, amongst other things. However, I've encountered a peculiar problem that surfaces in ether_output(). What happens is that one of my daemons, for example, natd, or httpd, etc., performs a system call, which eventually results in a call to ether_output (following tcp_output, ip_output, etc.). At the bottom of ether_output(), after an IF_ENQUEUE, and an splx(s), there is the following check: if (m->m_flags & M_MCAST) ifp->if_omcasts++; The if () part results in a testb $0x2, 0x13(%ebx) IF I REMEMBER correctly. For some weird reason, when the mbuf in question is at a location: 0xstuffF00 (256 bytes into a page, the second mbuf on a page), there is a page fault. And it's _always_ when the mbuf is at such an address. Where the weirdness begins is when I actually examine the contents of the mbuf... I can actually see them, no page fault, no nothing. In fact, if I `continue' from the debugger, things continue to work fine... until the next 0xstuffF00 mbuf goes through ether_output() and reaches that check. If I move the check of the m_flags to just above the splx(s), but after the IF_ENQUEUE, then the page fault still occurs in the same way, except that I even get a page fault when trying to examine the contents of the mbuf. In other words, I can't even `continue' in this case. If I move the m_flags check before the IF_ENQUEUE, this doesn't happen at all! Furthermore, if I revert my mbuf changes, I don't catch this problem. Anyone got any hints/clues? Regards, Bosko. -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/
Re: ether_output() : WIERD PROBLEM
Wow, a reply to myself. I feel kind of lame. :-) Anyway, this is just an update, with more info. I've checked the status of my new system's lists, once the fault occurs, and I can _guarantee_ that the management lists I wrote the code for are actually not corrupt when this happens. I've looked at the dump of the memory at where they are stored from DDB, and everything looks in order. So my assumption is that this has just uncovered a bug in ether_output(). Although I can't confirm it 100%. I do have another bit of valuable information, though. If I move the m_flags check to after the ENQUEUE, but prior to the call to the interface "start" routine (see the end of ether_output()), then things are fine. The problem only occurs if the check is moved to _after_ the if_start call. On Tue, 13 Jun 2000, Bosko Milekic wrote: Hello, I've been doing some mbuf-related work on my -CURRENT machine lately. Particularly, I've re-written the allocator and free routines, amongst other things. However, I've encountered a peculiar problem that surfaces in ether_output(). What happens is that one of my daemons, for example, natd, or httpd, etc., performs a system call, which eventually results in a call to ether_output (following tcp_output, ip_output, etc.). At the bottom of ether_output(), after an IF_ENQUEUE, and an splx(s), there is the following check: if (m->m_flags & M_MCAST) ifp->if_omcasts++; The if () part results in a testb $0x2, 0x13(%ebx) IF I REMEMBER correctly. For some weird reason, when the mbuf in question is at a location: 0xstuffF00 (256 bytes into a page, the second mbuf on a page), there is a page fault. And it's _always_ when the mbuf is at such an address. Where the weirdness begins is when I actually examine the contents of the mbuf... I can actually see them, no page fault, no nothing. In fact, if I `continue' from the debugger, things continue to work fine... until the next 0xstuffF00 mbuf goes through ether_output() and reaches that check. 
If I move the check of the m_flags to just above the splx(s), but after the IF_ENQUEUE, then the page fault still occurs in the same way, except that I even get a page fault when trying to examine the contents of the mbuf. In other words, I can't even `continue' in this case. If I move the m_flags check before the IF_ENQUEUE, this doesn't happen at all! Furthermore, if I revert my mbuf changes, I don't catch this problem. Anyone got any hints/clues? Regards, Bosko. -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Mbuf waiting mfc to 3
[re-directed to --hackers, as more appropriate there, also [EMAIL PROTECTED] is in the CC, make _SURE_ to remove him from there before you post ANY replies!!!] Mike, your patch looks fine. However, I found a bug in /sys/netkey code. (and it's related to the wait stuff, although I don't believe it directly concerns your patch so, as far as I'm concerned, your stuff is ready to go in.) However, I believe this code is only for 4.x and -CURRENT people. keysock.c's key_sendup() does a silly thing with the mbuf allocation. Attached is a patch that fixes it. This applies to v 1.2 of the file. Here is the Id: /* KAME @(#)$Id: keysock.c,v 1.2 1999/08/16 19:30:36 shin Exp $ */ (Jlemon, can you commit this?) Oh yeah, and please also commit pr=18471 as it's been sitting there for a while. Thanks in advance, Bosko. On Fri, 9 Jun 2000, Mike Silbersack wrote: Well, it's been nearly a month since I posted the mbuf waiting MFC for 3.4 to -net, although I haven't heard any complaints about it messing up systems, there have been a few complaints on bugtraq of mbuf exhaustion attacks which would be much less serious with it. :) In any case, the patch is still available at http://www.silby.com/patches/mbuf-wait-mfc-2.patch for review. I'm fairly confident in its reliability, but I'd prefer a few more people to test it if they have the time. If there are no negative complaints, I'd like to get it committed before the end of next week to ensure that we don't miss getting it into 3.5. There are no changes between this patch and the last one I posted other than a single version line I had messed up in the previous one, so if you're currently testing that one, there's no need to download this one. Please post your experiences with it in any case, though. The small memory leak I alluded to in my previous posting of the patch has been found and committed separately (as it affected 3, 4, and 5.) So, please CVSUP before testing this patch to ensure you're seeing its true colors. 
Thanks, Mike "Silby" Silbersack -- Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237 [EMAIL PROTECTED] * http://www.technokratis.com/

--- keysock.old.c	Sat Jun 10 03:09:05 2000
+++ keysock.c	Sat Jun 10 03:13:43 2000
@@ -419,18 +419,25 @@
 	while (tlen > 0) {
 		if (tlen == len) {
 			MGETHDR(n, M_DONTWAIT, MT_DATA);
+			if (n == NULL) {
+				if (m) m_freem(m);
+				return ENOBUFS;
+			}
 			n->m_len = MHLEN;
 		} else {
 			MGET(n, M_DONTWAIT, MT_DATA);
+			if (n == NULL) {
+				if (m) m_freem(m);
+				return ENOBUFS;
+			}
 			n->m_len = MLEN;
 		}
-		if (!n)
-			return ENOBUFS;
+
 		if (tlen > MCLBYTES) {	/*XXX better threshold? */
 			MCLGET(n, M_DONTWAIT);
 			if ((n->m_flags & M_EXT) == 0) {
 				m_free(n);
-				m_freem(m);
+				if (m) m_freem(m);
 				return ENOBUFS;
 			}
 			n->m_len = MCLBYTES;
Re: kerneld for FreeBSD
On Wed, 7 Jun 2000, void wrote: Doesn't Solaris auto-unload unused drivers when memory gets tight? -- Ben 220 go.ahead.make.my.day ESMTP Postfix An Operating System should only do that when the administrator is so stupid that he/she actually loads "unused" drivers. -- Bosko Milekic [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: fatal trap 12: page fault while in kernel mode
On Fri, 26 May 2000, Greg Skouby wrote: Hello, I posted a message to -questions yesterday about a machine that had the /dev directory somewhat corrupt. I could ls -la /dev/wd0* but when I was in the /dev director when I did an ls it was not showing any of the files. Now, today the machine was rebooting over and over again, freezing with this message: fatal trap 12: page fault while in kernel mode fault virtual address = 0xc33a3c6d fault code = supervisor read, page not present Instruction Pointer = 0x8:0xc022798F You have to post more information. For example, what is at the location pointed at by the instruction pointer? Get a stack trace, if possible (from the debugger), and any other relevant info., most of which is explained in the Handbook. -- Bosko Milekic * pages.infinit.net/bmilekic/index.html * www.technokratis.com [EMAIL PROTECTED] * [EMAIL PROTECTED] * [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Linux Module problems
On my -CURRENT machine, FreeBSD jehovah 5.0-CURRENT FreeBSD 5.0-CURRENT #0: Sat May 13 15:11:13 EDT 2000 root@jehovah:/usr/src/sys/compile/JEHOVAH i386 (obviously a little out-dated), I have recently noticed unusual problems with the linux module which, by the way, is of the same date. The first problem I discovered first came up while building the StarOffice5 port. After checking the dependency for linux's libc5, it _spontaneously_ reboots. No panic(), hence no debugger. I've never seen this sort of behavior before and have no idea what could have caused it. However, I noticed a related incident, which I can reproduce. What I did was, for kicks, kldunload linux, and then make install the staroffice5 port, and this time, I got a page fault and panic() from within malloc, which was trying to move something located at an address on an unmapped page to a register. I can reproduce this easily at the moment, with the following: #!/bin/sh while true; do kldload linux; kldunload linux; done A quick kldunload linux followed by a quick kldload linux does it on the first iteration. What's more odd is that now, after panic()ing the machine a couple of times with the above, I can reproduce the spontaneous reboot easily too, by just starting up linux Netscape! At the moment, I cvsup-ed new sources, and am rebuilding world and a fresh new kernel, at which point I'll try to reproduce this again. I remember seeing this in earlier -CURRENT, too, just never got around to playing with it. Anyone? -- Bosko Milekic * pages.infinit.net/bmilekic/index.html * www.technokratis.com [EMAIL PROTECTED] * [EMAIL PROTECTED] * [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Linux Module problems
On Fri, 26 May 2000, Alain Thivillon wrote:

I had the same problem with all statically linked Linux binaries, including rpm. I guess that the loader does not recognize them as Linux, launches them as FreeBSD static binaries, and one of the syscalls is mapped to halt() (for example, if I don't launch rpm as root, I get "Segmentation violation" instead of a reboot).

I just re-cvsuped and rebuilt everything, and I am still having the same problem. In fact, I've noticed something else: after the reboot, the _time_ (not the date, though) is modified, generally by +4 hours. I have no idea why this would be happening.

As explained in /usr/src/UPDATING, you have to rebrand them:

brandelf -t Linux static-binary

The first candidate (and I think this explains your problem) is of course /compat/linux/sbin/ldconfig.

Am giving it a shot.

-- Alain Thivillon -+- [EMAIL PROTECTED] -+- Hervé Schauer Consultants

-- Bosko Milekic * pages.infinit.net/bmilekic/index.html * www.technokratis.com [EMAIL PROTECTED] * [EMAIL PROTECTED] * [EMAIL PROTECTED]
Re: Linux Module problems
As explained in /usr/src/UPDATING, you have to rebrand them:

brandelf -t Linux static-binary

The first candidate (and I think this explains your problem) is of course /compat/linux/sbin/ldconfig.

Am giving it a shot.

This worked. Thanks!

-- Bosko Milekic * pages.infinit.net/bmilekic/index.html * www.technokratis.com [EMAIL PROTECTED] * [EMAIL PROTECTED] * [EMAIL PROTECTED]
socket leak(s)...
I'm afraid my earlier message was incorrect. This is _still_ an issue... (the always_keepalive is unrelated) ... apologies; and tomorrow will be back to poking around day.

-- Bosko Milekic * pages.infinit.net/bmilekic/index.html * www.technokratis.com [EMAIL PROTECTED] * [EMAIL PROTECTED] * [EMAIL PROTECTED]
Re: leaking sockets (closure)
Hi Mike, On Wed, 17 May 2000, Mike Silbersack wrote: Heh, that's sorta neat, I guess. It'll be interesting to find out if the leak is due to the mbuf waiting in some way, or a totally unrelated bug we're tickling. I'd almost guess the latter. I finally peeked at the tcp_timer stuff and quickly realized: `grep keepalive /etc/defaults/rc.conf' or, equivalently, `sysctl -A | grep keepalive' should quickly make things clear... :-) Notice the explicit initialization of always_keepalive to zero in tcp_timer.c, which is what at first glance tripped me off. (I have re-simulated the exhaustion and all seems fine). -Bosko -- Bosko Milekic * pages.infinit.net/bmilekic/index.html * www.technokratis.com [EMAIL PROTECTED] * [EMAIL PROTECTED] * [EMAIL PROTECTED] "Give a man a fish and he will eat for a day. Teach him how to fish, and he will sit in a boat and drink beer all day." To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: What do people think of maybe using the sourceforge software?
On Tue, 16 May 2000, Nick Hibma wrote: I guess that most people leading a project could do with a bit of feature creep, features being shoved under their noses. Even if at first you think that source control solves all our problems, it still could be a way to develop new tools and get them running and tried out before committing them to the tree. Second, the projects page we have now, with all due respect to the people that try to keep it reasonably organised, is a mess due to the lack of updates. people only maintain their project pages perhaps, but certainly not the links that lead to them. Being able to work with more people on the same project on an equal bases would be a good idea IMHO. Nick Although I have no control over what goes on behind the curtains, I must say the following: My feeling is that a lot of the doc people are working really hard to make this sort of stuff happen. I know, for instance, that Jeroen (Asmodai) has great ideas in place for centralization of project listings, and TODO lists, etc. The only thing left is to bind these ideas together and make things like this happen. One of the big issues, I feel, is the duplication of efforts and I, as a "guy who develops from the sidelines" can tell you right now: a centralized information-base such as the one [I believe] these people are working on is key to what I choose to poke at next. Please remember that a lot of people who contribute to the project are not necessarily committers and do not read -commiters mail. The centralization of documentation and various other data will make collaboration possible and, best of all, it'll make it fun (which is what open source is about for many of us). With the centralization of information will come direction. Cheers, Bosko. -- Bosko Milekic * pages.infinit.net/bmilekic/index.html * www.technokratis.com [EMAIL PROTECTED] * [EMAIL PROTECTED] * [EMAIL PROTECTED] "Give a man a fish and he will eat for a day. 
Teach him how to fish, and he will sit in a boat and drink beer all day." To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Regarding PR 5877: sb_cc issues.
As I haven't seen [other] feedback yet to this PR posted by Bill Fenner, I'm curious as to what people's opinions are. (I should have cross-posted to -hackers when I first replied to it, as I believe that I've seen this mentioned previously, but without getting many replies).

-- Bosko Milekic * pages.infinit.net/bmilekic/index.html * www.technokratis.com [EMAIL PROTECTED] * [EMAIL PROTECTED] * [EMAIL PROTECTED] "Give a man a fish and he will eat for a day. Teach him how to fish, and he will sit in a boat and drink beer all day."
Comments above kmem_malloc() (vm/vm_kern.c)
Is the following comment above kmem_malloc()'s definition in /sys/vm/vm_kern.c still valid? (I hope and suspect not):

 * Note that this still only works in a uni-processor environment and
 * when called at splhigh().

The only places, as far as I've seen, that call kmem_malloc() are the kernel's malloc() and the mbuf allocation routines. Neither of these seems to do it at splhigh(), either.

--Bosko Milekic [EMAIL PROTECTED]
Re: Where is pci_intr_establish() _thread_sys_read()?
On Mon, 6 Mar 2000, Chris Costello wrote:

On Monday, March 06, 2000, Zhihui Zhang wrote:

Can anyone tell me where is the code for pci_intr_establish() and _thread_sys_read()? I could not find them under /usr/src.

I can tell you offhand that _thread_sys_anything is the _real_ syscall for `anything'. This is because a lot of syscalls are reimplemented within libc_r for reasons that are kind of obvious (directly calling the read syscall from one thread would block all the other threads in a process). So _thread_sys_open() == open(2), _thread_sys_read() == read(2), etc. I don't know about pci_intr_establish.

-- |Chris Costello [EMAIL PROTECTED] |Today's assembler command : EXOP Execute Operator

pci_intr_establish is not part of FreeBSD's interface(s), as far as I know. It probably belongs to either NetBSD or OpenBSD, since the drivers that use this routine to set up an interrupt use it under #if defined(__OpenBSD__) or __NetBSD__ blocks. See our bus interface code (e.g. bus_if.[ch]).

--Bosko

Bosko Milekic * [EMAIL PROTECTED] * http://pages.infinit.net/bmilekic/ Montreal, Quebec, Canada. * Technokratis: http://www.technokratis.com/
rtfree panic() (fwd)
Hmmm. Judging from the last CVS log entry for route.c (see r1.59), this problem can manifest itself in -current as well. I'm cross-posting on the initial send, but please, when replying, redirect to the [single, truly] appropriate list.

It *appears* that rtfree() is puking because the rnh pointer is somehow (?) NULL. The rt_tables tree for the given address family either doesn't hold what's being looked for, or there's a problem with the rt_key macro. In any case, I'm not that comfortable with this part of the code yet, so if some route.c guru with radix tree know-how could take a look at this, I think that Shrihari (along with the others who have experienced this?) would appreciate it. Note that if more information is needed, request it at Shrihari's address below.

| Bosko Milekic | Coffee vector: 1.0i+1.0j+1.0k |
| Email: [EMAIL PROTECTED] | Sleep vector: -1.0i-1.0j-1.0k |
| WWW: http://pages.infinit.net/bmilekic/ | Resulting life: 0i+0j+0k (DNE)|

-- Forwarded message --
Date: Mon, 7 Feb 2000 16:53:28 -0500
From: Shrihari Pandit [EMAIL PROTECTED]
To: Bosko Milekic [EMAIL PROTECTED]
Subject: rtfree panic()

Hey there. I was hoping you might be able to give us a hand with this problem: We have a couple of machines that run FreeBSD 3.4-STABLE and they are panicking randomly in rtfree(). These systems contain over 70,000 routes in the routing table.

IdlePTD 2600960
initial pcb at 210e34
panicstr: rtfree
panic messages:
---
panic: rtfree
syncing disks... done
dumping to dev 20001, offset 0
dump 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
---
#0 boot (howto=256) at ../../kern/kern_shutdown.c:285
285 in ../../kern/kern_shutdown.c
(kgdb) bt
#0 boot (howto=256) at ../../kern/kern_shutdown.c:285
#1 0xc012b8e4 in at_shutdown (function=0xc01f20a2 <__set_sysctl__debug_sym_sysctl___debug_if_tun_debug+934>, arg=0x3, queue=-1071640608) at ../../kern/kern_shutdown.c:446
#2 0xc016650f in rtfree (rt=0xc2413000) at ../../net/route.c:206
#3 0xc01668f7 in rtrequest (req=2, dst=0xc243de00, gateway=0xc243de10, netmask=0xc2415670, flags=3, ret_nrt=0x0) at ../../net/route.c:509
#4 0xc016be81 in in_ifadownkill (rn=0xc243e800, xap=0xc0201038) at ../../netinet/in_rmx.c:390
#5 0xc0165d68 in rn_walktree (h=0xc23f4200, f=0xc016be4c <in_ifadownkill>, w=0xc0201038) at ../../net/radix.c:959
#6 0xc016bec8 in in_ifadown (ifa=0xc23fb500) at ../../netinet/in_rmx.c:410
#7 0xc016ff7f in rip_ctlinput (cmd=0, sa=0xc23fb548, vip=0x0) at ../../netinet/raw_ip.c:396
#8 0xc014148d in pfctlinput (cmd=0, sa=0xc23fb548) at ../../kern/uipc_domain.c:265
#9 0xc015e343 in if_unroute (ifp=0xc023ebc4, flag=1, fam=0) at ../../net/if.c:414
#10 0xc015e3cf in if_down (ifp=0xc023ebc4) at ../../net/if.c:449
#11 0xc01e7308 in etp_linkdown ()
#12 0xc01e97ab in cisco_keepalive ()
#13 0xc01e9ba8 in cisco_notify ()
#14 0xc01ecbad in etp_notify ()
---Type <return> to continue, or q <return> to quit---
#15 0xc01e91e4 in hdlc_rcvhandler ()
#16 0xc01cf35e in l3_rcvhandler ()
#17 0xc01c857d in lind_event ()
#18 0xc01e7445 in hdlc_timeout ()
#19 0xc0130112 in softclock () at ../../kern/kern_timeout.c:132
---
Re: Acceptable MBUF levels?
On Fri, 28 Jan 2000, Doug White wrote:

That would be correct, at least looking at the appropriate code in /sys/kern/uipc_mbuf.c. The read-only sysctls kern.ipc.nmbclusters and kern.ipc.nmbufs hold the max mbuf clusters and the max mbufs, respectively. kern.ipc.nmbufs is bound to an nmbufs value in there, but I can't figure out what value it's initialized to.

`nmbufs' is actually NMBCLUSTERS * 4, unless a value is fetched from the environment (see `loader'). A similar initialization is done for `nmbclusters,' only nmbclusters defaults to NMBCLUSTERS unless something else is provided through the getenv() call (see `TUNABLE_INT_DECL').

Increasing maxusers has the side effect of increasing NMBCLUSTERS according to this formula (from /sys/conf/param.c):

#ifndef NMBCLUSTERS
#define NMBCLUSTERS (512 + MAXUSERS * 16)
#endif

You only have to override NMBCLUSTERS by hand if you want a truly gigantic (i.e. 10,000) number of nmbclusters. Just be VERY CAREFUL doing so since you can *reduce* the number, and that's not good! From personal experience, 512 maxusers and 16384 nmbclusters is more than enough for just about anything -- just make sure you can handle a 17MB kernel. :-)

Yes, that's exactly right. Good thing you pointed it out too. :-) However, increasing MAXUSERS also ends up increasing other global parameters in the kernel, so you could end up with a rather large kernel when all you really want to do is increase NMBCLUSTERS, and nothing else. But yeah, your point is very valid.

Cheers, Bosko.

| Bosko Milekic | Coffee vector: 1.0i+1.0j+1.0k |
| Email: [EMAIL PROTECTED] | Sleep vector: -1.0i-1.0j-1.0k |
| WWW: http://pages.infinit.net/bmilekic/ | Resulting life: 0i+0j+0k (DNE)|
Re: Acceptable MBUF levels?
On Wed, 26 Jan 2000, Doug White wrote: When people refer to mbufs, they refer to mbuf clusters, of which there's a fixed number. The kernel will allocate more mbufs as necessary. Uhm, actually, mbufs are also allocated from mb_map. Thus, they are also capped. (Unless I'm missing something big again... :-) ) The usual rule of thumb is that the peak should never exceed 75% of the max mbufs in the system to allow for sufficient overhead in extreme situations. In this case you're at 80%, so you should probably recompile your kernel and bump maxusers. Actually, for mbufs and mbuf clusters, you should increase NMBCLUSTERS, which will serve as an indication of allocate-able clusters as well as, ultimately, mbufs. -- Bosko Milekic Email: [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: PR kern/14034: gettimeofday() returns negative value?
On Wed, 19 Jan 2000, Sabrina Minshall wrote:

What's going on here? Successive calls to gettimeofday yield negative elapsed time? Any fixes?

[ code snipped ]

Well, the PR considers a different problem. What your code does is call gettimeofday() once, record the value, and then a little later call it again and calculate a delta between the later and the earlier result. Notice that the issue mentioned in the PR has been concluded to be faulty hardware.

Now, I assure you, this is a problem with your code snippet. I tried this code on a DEC box running:

OSF1 oracle.dsuper.net V4.0 1091 alpha

And got the exact same results. The problem is the tv1 = tv2 structure equality. Since the byte order is different, you get your usec from tv1 ending up in tv2's usec field.

Regards, Bosko.

-- Bosko Milekic Email: [EMAIL PROTECTED]
Re: splimp for PCI
On Mon, 20 Dec 1999, Alex wrote:

!This message was sent from Geocrawler.com by "Alex" [EMAIL PROTECTED]
!Be sure to reply to that address.
!
!Hello,
!I'm a little confused about using splimp/splx in a driver
!that supports a PCI board. The IRQ is shared for PCI.
!Can using splimp cause some problem?
!Thanks a lot
!Alex
!
!Geocrawler.com - The Knowledge Archive

Here's my `first glance' shot at an answer: [ ;-) ]

Nope, using splimp() will not cause problems in terms of shared IRQs. However, it may cause problems if you end up blocking interrupt handlers of priorities that you don't want to block. Shared IRQs are dealt with thanks to a linked list of handlers for each shared IRQ, which all end up being called when at least one of the devices "registered" for that IRQ asserts an interrupt request. Each device, however, is "registered" with its own 'mask,' and this mask should correspond to one of several given values, where _handlers_ executing at that priority level will be blocked as per the present priority level.

If you haven't done so already, my suggestion is to take a look at spl(9) [e.g. `man 9 spl'], and, if still interested, to take a look at *some* of i386/isa/intr_machdep.c as well as most of sys/i386/i386/nexus.c -- which brings up the question: Is anybody _currently_ working on cleaning this stuff up, and completely getting rid of the remains of the "old" interface?

Bosko.

Bosko Milekic -- [EMAIL PROTECTED]
WWW: http://pages.infinit.net/bmilekic/
Re: ip checksum
On Tue, 23 Nov 1999, Parthasarathy M. Aji wrote: !Hey, ! !I am trying to recompute the checksum of an IP packet. I use !netinet/in_chksum.c to do this. The values returned are not correct. I've !reset the ip_sum field to 0 before doing the sum. Is there something !missing? ! !thanks ! ! Would you be able to provide some code to illustrate the situation? There are several things that may go wrong. What exactly are you trying to do here? (You may be using the wrong procedure) and what are you getting for return values? --Bosko -- Bosko Milekic [EMAIL PROTECTED] "I want now to tell you, gentlemen, whether you care to hear it or not, why I could not even become an insect." --F. Dostoyevski To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: PCI DMA lockups in 3.2 (3.3 maybe?)
On Mon, 22 Nov 1999, Dennis wrote:

!It's a late 3.2-STABLE, so it's not that old. Surely someone knows if
!something in this area was fixed or not?
!
!Since it's a DMA lockup, how would you suggest that the information about
!what instruction was executing be obtained?
!
!The nightmare of instability of 3.x continues whilst the braintrust flogs
!away at 4.x. It's really a damn shame. And why is 3.x so much slower than
!2.2.8? Will 4.0 be slower yet?
!
!DB
!

Can you quantify how much "slower" the 3.x code is? What's "slower" about it? A lot of people are willing to help, but without concrete information there is little they can do. In the meantime, did you happen to get a chance to reproduce the problem in 3.3-STABLE? It appears from your description of the problem that it's somewhat tougher to debug, and knowing whether 3.3 remedies the problem can be of some help.

-- Bosko Milekic [EMAIL PROTECTED]
mbuf wait code (revisited) -- review?
Hi,

Attached are some diffs that provide a couple of wait routines for the out-of-mbuf and/or out-of-mbuf-cluster case(s). The attached diffs are for -STABLE and I would be grateful if somebody could review them/give feedback. I have diffs for -CURRENT but am not posting them because I haven't had too much of a chance to test the code -- whereas these below have been tested for a while now on several -STABLE machines.

Since the problematic situation has been described numerous times before both on the list and in several PRs, I am not going to go over it again. Instead, a [fairly] accurate description of the situation in question can be found at:

http://www.freebsd.org/cgi/query-pr.cgi?pr=14042

[Note that the patches posted in the original PR should not be considered.]

There are several other open PRs which refer to a similar problem. Furthermore, I've also spotted at least one other PR which addresses a potentially related issue:

http://www.freebsd.org/cgi/query-pr.cgi?pr=9883

The above PR mentions MGET turning the provided mbuf pointer into a NULL pointer even if the call was made with M_WAIT. I don't see how this can be the case, especially since presently the code is set to panic() in m_reclaim when out of mbufs and calling with M_WAIT. In any case, with the code below, MGET will potentially be capable of setting that NULL pointer, which is something that really can't be avoided even if the call is made with M_WAIT.

The whole idea behind the provided diffs is to add a 'sleep' time before actually deciding to explicitly "fail" -- this sleep time can be modified dynamically: the diffs add a sysctl, kern.ipc.mbuf_wait, to tune the sleep time in the tsleep().

Anyway, I would really appreciate feedback and/or suggestions. Furthermore, if anybody's interested in testing it, I can post the -CURRENT version of the diffs (which are only slightly different).
Finally, if this looks good, the next step would be to search and dig through all the code that uses the MGET, MGETHDR, MCLGET, MCLALLOC macros and m_get, m_gethdr, m_clalloc functions in order to make sure that all of that code checks whether the returned pointer is NULL (most of the problematic code resides in sys/nfs, from what I've seen).

-- Bosko Milekic [EMAIL PROTECTED]
"I counted the steps in my walks and calculated the cubic contents of soup plates, coffee cups, and pieces of food -- otherwise my meal was unenjoyable. All repeated acts or operations I performed had to be divisible by three and if I missed I felt impelled to do it all over again, even if it took hours." -- Nikola Tesla, 1919.

(Note: If the diffs below generate problems, please let me know and I'll post this stuff somewhere on the WWW).

--snip snip--
diff -ruN sys.old/conf/param.c sys/conf/param.c
--- sys.old/conf/param.c	Sun Oct 31 23:34:16 1999
+++ sys/conf/param.c	Mon Nov 1 20:07:46 1999
@@ -82,6 +82,7 @@
 int	maxfiles = MAXFILES;	/* system wide open files limit */
 int	maxfilesperproc = MAXFILES;	/* per-process open files limit */
 int	ncallout = 16 + NPROC + MAXFILES;	/* maximum # of timer events */
+int	mbuf_wait = 32;	/* mbuf sleep time */
 
 /* maximum # of mbuf clusters */
 #ifndef NMBCLUSTERS
diff -ruN sys.old/kern/uipc_mbuf.c sys/kern/uipc_mbuf.c
--- sys.old/kern/uipc_mbuf.c	Wed Sep 8 20:45:50 1999
+++ sys/kern/uipc_mbuf.c	Fri Nov 5 21:44:51 1999
@@ -47,6 +47,10 @@
 #include <vm/vm_kern.h>
 #include <vm/vm_extern.h>
 
+#ifdef INVARIANTS
+#include <machine/cpu.h>
+#endif
+
 static void mbinit __P((void *));
 SYSINIT(mbuf, SI_SUB_MBUF, SI_ORDER_FIRST, mbinit, NULL)
 
@@ -60,6 +64,8 @@
 int	max_hdr;
 int	max_datalen;
 
+static u_int m_mballoc_wid = 0, m_clalloc_wid = 0;
+
 SYSCTL_INT(_kern_ipc, KIPC_MAX_LINKHDR, max_linkhdr, CTLFLAG_RW,
 	&max_linkhdr, 0, "");
 SYSCTL_INT(_kern_ipc, KIPC_MAX_PROTOHDR, max_protohdr, CTLFLAG_RW,
@@ -67,13 +73,14 @@
 SYSCTL_INT(_kern_ipc, KIPC_MAX_HDR, max_hdr, CTLFLAG_RW, &max_hdr, 0, "");
 SYSCTL_INT(_kern_ipc, KIPC_MAX_DATALEN, max_datalen, CTLFLAG_RW,
 	&max_datalen, 0, "");
+SYSCTL_INT(_kern_ipc, OID_AUTO, mbuf_wait, CTLFLAG_RW,
+	&mbuf_wait, 0, "");
 SYSCTL_STRUCT(_kern_ipc, KIPC_MBSTAT, mbstat, CTLFLAG_RW, &mbstat, mbstat, "");
 
 static void	m_reclaim __P((void));
 
 /* "number of clusters of pages" */
 #define NCL_INIT 1
-
 #define NMB_INIT 16
 
 /* ARGSUSED*/
@@ -125,6 +132,9 @@
 	 * any more (nothing is ever freed back to the map) (XXX which
 	 * is dumb). (however you are not dead as m_reclaim might
 	 * still be able to free a substantial amount of space).
+	 * XXX Furthermore, we can also work with "recycled" mbufs (when
+	 * we're calling with M_WAIT
Re: mbuf shortage situations (followup)
!I think that what needs to be done is to split the problem in two. First,
!allow the mbuf routines to return a failure even with M_WAIT. If M_WAIT
!is used, it simply means 'try harder, sleeping a bit if necessary'. This
!requires ensuring that all the networking code deal with the failure
!case - a time consuming but straightforward task. If a failure occurs,
!one simply drops the packet, not the connection or anything else drastic.
!just the packet.

Yes, this is mainly the part I've been working on recently. The sleeping and what not (as I'm sure you've seen from the patches if you looked at them) has already been completed. Adding a counter that will expire and return a pre-defined error is trivial, in this case. The only real issue here (if we can call it that) is getting _all_ the networking code to recognize this. Anyone want to help? :-)

!
!The second problem that needs to be addressed is resource exhaustion.
!For example, allocating thousands of connections and socket-opting their
!buffers as large as possible, or programs such as syslog accepting new
!connections ad-infinitum. This is a harder problem to fix properly,
!but a lot of the various issues such as those with syslog can be dealt
!with in userland rather then the kernel.
!
! -Matt
!

I agree. The issue here is somewhat related (if I understand your explanation correctly) to [local] processes attempting to grab a lot of socket buffer space. I was a little less concerned with this issue since, as I previously mentioned, Brian Feldman is working on limiting socket buffer space. Nonetheless, if we do not consider limiting, here's what I believe will need to be done:

As explained above, when we run out of mbufs and/or mbuf clusters (and some are needed), if we are M_WAIT (when processes socket opt their buffers as large as possible, the call is usually with M_WAIT), we will end up tsleep()ing for certain periods of time, until our counter expires and we return our pre-defined error (as mentioned above).
When we do return this error, however, the caller (for instance, we can consider sosend() the caller -- which, if I remember correctly, is one of the callers of MGET() when we setsockopt a large buffer and subsequently write() to this socket) will also have to know how to properly deal with this error (e.g.: kill the process?). Killing the process may seem somewhat sadistic to some ( :-) ), but remember that if we do get to the point where 'normal' local processes eat up so much buffer space that we run out, we should probably be increasing NMBCLUSTERS and/or maxusers anyway.

As for script weenies, I hope that Brian (and whoever else may be working on it) gets that sockbuf limiting code done, because, to be quite honest, I don't think that script kids having to compromise more than one account just so they can DoS a box will be much of an issue (if worst comes to worst, we can limit per gid, as opposed to per uid). With exhaustion attacks such as these, we're better off just limiting.

Regards, Bosko Milekic.
Re: mbuf shortage situations (followup)
On Mon, 13 Sep 1999, Garrett Wollman wrote:

!On Sun, 12 Sep 1999 23:19:13 -0400 (EDT), Bosko Milekic [EMAIL PROTECTED] said:
!
! This message is in MIME format. The first part should be readable text,
! while the remaining parts are likely unreadable without MIME-aware tools.
! Send mail to [EMAIL PROTECTED] for more info.
!
!It would be preferable if text were sent as text, since MIME-encoded
!patches require more effort to read.
!

I definitely agree. This is obviously my mistake, and I was somewhat in a rush, very lagged (modem, eurgh), using pine, and made several [dumb] typos in the 'Attachment' field.

! I'm also aware of the possibility of some people not liking the
! fact that we tsleep() forever (e.g. tsleep(x,x,x,0)).
!
!
!I don't have any problem with sleeping forever -- but I am concerned
!about the possibility of deadlock, especially when client-NFS is
!involved. If the problem just moves around and has harder-to-recover
!symptoms, the change isn't helping.

Well, the main purpose of the code is to basically sleep until something is freed after we've already exhausted the mb_map arena (as I'm sure you've seen if you were able to grab the attachments). This is really a-la-limite stuff. In other words, if 'normal' local programs are having trouble because of mb_map exhaustion, then maxusers and/or nmbclusters would have to be augmented.

!
!The 4.3BSD code had two different behaviors:
!
! - For clusters, if M_WAIT was specified and there was no space
!left in mb_map, it panicked. However, m_clalloc was never called with
!M_WAIT, so that panic was effectively dead code.

Hmmm. If m_clalloc was never called with M_WAIT, then all the code calling m_clalloc definitely checked its return value. It probably had specific ways to deal with m_clalloc returning failures, too?

!
! - For mbufs, if M_WAIT was specified and there were no mbufs
!available, it would sleep at PZERO - 1 (which was interruptible).
!
!In 4.3, the code was able to deal with cluster allocation failing. We
!have a somewhat different situation now, because many network
!interface devices have less-flexible DMA mechanisms which don't allow
!packet reception into non-contiguous buffers, so we need to have at
!least a certain number of clusters available for this purpose.

Exactly. This is the next challenge. As for things being interruptible, as I mentioned in a reply to Matt Dillon just a few seconds ago, getting the tsleep to occasionally expire is trivial. As you say above, it's dealing with the failure that is the issue.

!
!-GAWollman
!
!--
!Garrett A. Wollman | O Siem / We are all family / O Siem / We're all the same
![EMAIL PROTECTED] | O Siem / The fires of freedom
!Opinions not those of| Dance in the burning flame
!MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick
!

Cheers, Bosko Milekic.