Re: Replacement for grep(1) (part 2)
:
:It results sometimes in out of swap, too.
:
:    Inetd is rate-limited by default nowadays, so this really doesn't apply.
:
:It really does apply. Inetd limits incoming connections per minute, not per
:second. It is possible to use minute limit in a few seconds and cause a high
:load. Sendmail is worse than inetd; it cannot limit incoming rate on
:
:Netch

You can specify a maximum fork limit for inetd on a per-service basis. You are a year or two too late on these things.

A great many improvements have been made to programs like sendmail and inetd explicitly to deal with overload situations. Web servers too. These were fairly simple changes as well. For sendmail it was as simple as making MaxDaemonChildren apply to queue runs - I submitted that one to Eric Allman two years ago and it's been a part of sendmail since then. For inetd it is the -c, -C, and -R options (which can be specified on a per-service basis as well). Dima and I added the -R option back in 1997 specifically to help with DOS attacks.

Sendmail is not an issue when properly configured.

-Matt
Matthew Dillon [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
Brian F. Feldman wrote:

> There are other ways. For example, even if a user account is resource limited, root processes (such as sendmail, popper, identd, and so forth) are not. Attacks against these servers generally result in very high loads and sometimes make it difficult to login to fix the problem, but do not result in running out of swap.

> It results sometimes in out of swap, too.

> Inetd is rate-limited by default nowadays, so this really doesn't apply.

It really does apply. Inetd limits incoming connections per minute, not per second. It is possible to use up the minute limit in a few seconds and cause a high load.

Sendmail is worse than inetd; it cannot limit the incoming rate on an established connection. Butenko's (bute...@stalker.com) DoS attack on sendmail is to send thousands of letters to a local user over a fast network connection (i.e., Ethernet) through one established TCP connection. The only barrier is the load average (LA) test before the '250 XXX message accepted for delivery' reply and the fork-and-deliver-or-queue-and-exit decision, but the attacker can send too many letters in a few seconds, leaving hundreds of delivery processes in /usr/libexec/mail.local waiting on the locked mailbox. LA describes the system state over the last minute and is thus about as useful as the average patient temperature per hospital over the last year. ;(

I have seen a variant of this attack on my own mail hosts, when a host with 6000 letters in its mail queue (a mail2news server) sent all its mail to the smarthost (a uucp spool server). After ~500 letters, sendmail on the smarthost closed port 25 on RefuseLA; it was saved from running out of swap only because domain resolving took some time.

The only mechanism against this type of attack I can imagine is to sm_sleep(1) at MAIL FROM: in the SMTP server code, or before '250 Message accepted for delivery'. For inetd, we must limit connections per second, not per minute.

-- Netch
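To make the per-minute versus per-second point concrete, here is a minimal sketch of a limiter that enforces both budgets at once. It is not inetd's code: the struct, the function names and both knobs are hypothetical, and the caller is assumed to pass in the current time in seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical limiter, not inetd's implementation: it enforces both a
 * per-minute budget and a per-second cap on accepted connections. */
struct rate_limiter {
    int per_second_max;   /* assumed knob: connections allowed per second */
    int per_minute_max;   /* assumed knob: connections allowed per minute */
    time_t second_start;  /* second that the per-second counter refers to */
    time_t minute_start;  /* start of the current one-minute window */
    int second_count;
    int minute_count;
};

static void limiter_init(struct rate_limiter *rl, int per_sec, int per_min,
                         time_t now)
{
    assert(per_sec > 0 && per_min > 0);
    rl->per_second_max = per_sec;
    rl->per_minute_max = per_min;
    rl->second_start = now;
    rl->minute_start = now;
    rl->second_count = 0;
    rl->minute_count = 0;
}

/* Returns true if a connection arriving at `now` may be accepted. */
static bool limiter_allow(struct rate_limiter *rl, time_t now)
{
    if (now - rl->minute_start >= 60) {   /* new minute window */
        rl->minute_start = now;
        rl->minute_count = 0;
    }
    if (now != rl->second_start) {        /* new second */
        rl->second_start = now;
        rl->second_count = 0;
    }
    if (rl->second_count >= rl->per_second_max ||
        rl->minute_count >= rl->per_minute_max)
        return false;
    rl->second_count++;
    rl->minute_count++;
    return true;
}
```

With only the per-minute counter, a burst can consume the whole budget inside the first second; the extra per-second cap spreads the same budget out, which is the behaviour Netch is asking for.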
Re: Replacement for grep(1) (part 2)
Matthew Dillon wrote:

> Give me a shell and I can crash any machine.

Oh. ;|

> A good example of this is sendmail. Before the MaxDaemonChildren and MaxArticleSize options, it was possible for sendmail to overcommit a machine. In this case the overcommit that can occur is with I/O, not swap. As a general performance rule, you have to set MaxDaemonChildren and MaxArticleSize to prevent the overcommit from occurring. This is a function of sendmail, not a function of the kernel.

Sigh. ((c) you) Sendmail can overcommit a machine even with the right set of MaxDaemonChildren, MaxArticleSize, QueueLA and RefuseLA options - I have seen such situations. MaxDaemonChildren limits only the number of main processes for incoming connections (plus queue run processes). For each connection, after MAIL FROM: and until the message is accepted, the server process for the incoming connection forks a child which accepts the recipient list and the letter body. After accepting the message, that child can fork a delivery process. A queue run process with the O ForkEachJob=true option, which is the default, can create a delivery process for each queue job (in my practice, a queue of more than 1000 jobs is an ordinary event). All these forks depend on only one test - get the current LA and compare it with QueueLA - and that test fails to help when the high load appeared less than one minute ago.

To prevent the overcommit (the details overlap with my parallel message), the minimal (and possibly insufficient) setup is:

1) Patch: insert sm_sleep(1) into the server subprocess code before the "accepted" reply, to limit the incoming mail rate;
2) Decrease QueueLA for the listening daemon to a sub-minimal value (e.g. 2);
3) Increase QueueLA for the queue running daemon to a high value (e.g. 50) and set O ForkEachJob=false for it.

But most of these tunings are indirect. A direct tuning, invented experimentally on my mail servers, is a specially hacked pstat program that returns 1 if either swap or file descriptors are more than 2/3 used, and 0 otherwise; on getting 1, sendmail stops delivering. Unfortunately, this check is unportable. (P.S. Don't tell me to change MTA; that is an entirely different question.)

> Another good example is a web server. A web server must have specific limitations on the number of simultaneous connections it is allowed to handle at once and on the number of CGI's or other auxiliary programs that are allowed to be running at any given time. The overcommit issue here has nothing to do with swap and everything to do with performance. Specifically, these limitations exist to avoid cascade failures.

As in the sendmail case, you propose doing calculations (which are difficult and non-trivial for newbies) to estimate the necessary resources. Another way, which is IMHO more acceptable, is to provide not hard barriers (SIGKILL on overcommit) but soft barriers (i.e., stop allocating memory to non-wheel users when memory begins to run out). An extra 64M of memory or a disk for swap is commonly much cheaper than the lost profit from a critical service crash.

> In the same manner any truly critical system server must handle the resource management itself to deal with all sorts of problem situations, including memory. You do not need to build any of this control into the kernel.

No, we need it. Not every server can be patched for such tests (due to loss of sources or some other reason), and not every admin can make the necessary patches. The kernel must help with it.

-- Netch
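The hacked pstat check described in this message boils down to a single threshold test. The sketch below is not Netch's program: it only shows the 2/3 comparison, with hypothetical function names and with the usage figures passed in by the caller, since how to obtain them is exactly the unportable part.

```c
#include <assert.h>
#include <stdbool.h>

/* True when used/total is strictly more than 2/3.
 * Integer arithmetic avoids floating point; inputs are assumed small
 * enough that multiplying by 3 does not overflow. */
static bool over_two_thirds(unsigned long long used, unsigned long long total)
{
    assert(used <= total);
    return 3 * used > 2 * total;
}

/* Mirrors the exit status described in the message: 1 if either swap or
 * file descriptors are more than 2/3 used, 0 otherwise. */
static int resource_alarm(unsigned long long swap_used,
                          unsigned long long swap_total,
                          unsigned long long fds_used,
                          unsigned long long fds_total)
{
    return (over_two_thirds(swap_used, swap_total) ||
            over_two_thirds(fds_used, fds_total)) ? 1 : 0;
}
```

A wrapper around the delivery agent could call something like `resource_alarm()` and defer delivery while it returns 1; the figures themselves would have to come from a system-specific source.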
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
[cc: list trimmed] On Thu, 15 Jul 1999 [EMAIL PROTECTED] wrote: In that scenario, the 512MB of swap I assigned to this machine would be dangerously low. With 13GB disks available for a couple of hundred bucks, my machines aren't going to run out of swap space any time soon, even if I commit to disk. All I want for Christmas is a knob to disable overcommit. --lyndon CVSup the source repository and start writing. Sander There is no love, no good, no happiness and no future - all these are just illusions. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
Matthew Dillon wrote: Something is weird here. If the solaris people are using a SWAPSIZE + REALMEM VM model, they have to allow the allocated + reserved space go +REALMEM bytes over available swap space. If not they are using only a SWAPSIZE VM model. I did not check if the model was a SWAPSIZE+REALMEM or a SWAPSIZE model. Anyway, I think you are assuming that the "swap -s" command shows as total memory just the swap space... Maybe, maybe not. I don't know. But the space against which I reached the ceiling *was* the one reported in the "swap -s" command. Wait - does Solaris normally use swap files or swap partitions? Or is it that weird /tmp filesystem stuff? If it normally uses swap files and allows holes then that explains everything. I'd say partitions. While perusing man pages, I caught briefly the comment that a swap partition could overwrite a normal partition, in a man page about a special command to create swap partitions. Anything you'd like me to check in particular? If you have any source code you'd like me to run, just send it to [EMAIL PROTECTED], though I can only run them at the earliest on monday. Well, at least my monday is your sunday night... :-) -- Daniel C. Sobral(8-DCS) [EMAIL PROTECTED] [EMAIL PROTECTED] "Would you like to go out with me?" "I'd love to." "Oh, well, n... err... would you?... ahh... huh... what do I do next?" To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
[EMAIL PROTECTED] (Chris G. Demetriou) writes:

> Matthew Dillon [EMAIL PROTECTED] writes:
> > The text size of a program is irrelevant, because swap is never allocated for it. The data and BSS are only relevant when they
>
> No, you can mprotect read-only vnode mappings to writable.

Most things wouldn't be hurt badly if this changed, though, I suspect that this already varies between operating systems.

> > are modified. The only thing swap is ever used for is the dynamic allocation of memory. There are three ways to do it: sbrk(), mmap(... MAP_ANON), or mmap(... MAP_PRIVATE).
>
> yup, almost: not all MAP_PRIVATE mappings need backing store, only MAP_PRIVATE and writeable mappings. (MAP_PRIVATE does _not_ guarantee that you won't see modifications made via other MAP_SHARED mappings.)

...but in *this* case, you certainly shouldn't allow mprotect to fail (with what, ENOMEM?). It's certainly counterintuitive to me that mprotect could fail due to a resource shortage.

> Actually, only now have you brought that up. And, that's very system dependent. On NetBSD/i386 the default is 2MB, and, it's worth noting that you only need to reserve as much as the current stack limit allows (after that, you're going to get a signal anyway, and if more

So what setrlimit accepts depends on how much memory is available? Ok, programs changing their stack limit are rare, but this would still be another API change.
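The mprotect question is easier to follow with the call sequence written out. The sketch below is an illustration only: it uses a private anonymous mapping rather than the vnode mapping discussed in the thread so that it needs no file, the function name is made up, and MAP_ANON is assumed to be available (it is aliased to MAP_ANONYMOUS where needed).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Maps `len` bytes private and read-only, upgrades the mapping to
 * writable, touches it, and unmaps it again.
 * Returns 0 on success, -1 if mmap failed, -2 if mprotect failed. */
static int map_then_make_writable(size_t len)
{
    assert(len > 0);

    /* Read-only private mapping: no backing store is needed yet. */
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* The upgrade to writable is where a kernel that refuses to
     * overcommit would have to reserve backing store, so this is the
     * call that could fail for lack of resources. */
    if (mprotect(p, len, PROT_READ | PROT_WRITE) != 0) {
        munmap(p, len);
        return -2;
    }

    p[0] = 1;          /* first write actually consumes a page */
    p[len - 1] = 1;
    munmap(p, len);
    return 0;
}
```

On an overcommitting kernel the mprotect call essentially never fails for this reason and the shortage only shows up at the writes, which is the trade-off the two posters are arguing about.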
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
"Daniel C. Sobral" wrote: It would be nice to have a way to indicate that, a la SIGDANGER. Ok, everybody is avoiding this, so I'll comment. Yes, this would be interesting, and a good implementation will very probably be committed. *BUT*, this is not as useful as it seems. Since the correct solution is buy more memory/increase swap (correct solution for our target markets, anyway), there is little incentive to implement it. So, I think people who can answer the above is thinking like "Well, it is useful, but it's not useful enough for me to spend my time on it, and I'm sure as hell don't want to write mini-papers on why it's not that useful". For those who wish to develop code for safety related systems that is not good enough. They have to prove that all code can handle the degradation of resources gracefully. Such code relies on guaranteed memory allocations or in the very least warnings of memory shortage and prioritized allocations. So the least important sub-systems die first. --Sean To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
: :For those who wish to develop code for safety related systems that is :not good enough. They have to prove that all code can handle the :degradation :of resources gracefully. Such code relies on guaranteed memory :allocations :or in the very least warnings of memory shortage and prioritized :allocations. :So the least important sub-systems die first. : :--Sean I'm sorry, but when you write code for a safety related system you do not dynamically allocate memory at all. It's all essentially static. There is no issue with the memory resource. Besides, none of the BSD's are certified for any of that stuff that I know of. What's next: A space shot? These what-if scenarios are getting ridiculous. -Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Fri, 16 Jul 1999, Matthew Dillon wrote:

> I'm sorry, but when you write code for a safety related system you do not dynamically allocate memory at all. It's all essentially static. There is no issue with the memory resource. Besides, none of the BSD's are certified for any of that stuff that I know of.
>
> What's next: A space shot? These what-if scenarios are getting ridiculous.

Well, NetBSD is slated to be used in the 'Space Acceleration Measurement System II', measuring the microgravity environment on the International Space Station using a distributed system based on several NetBSD/i386 boxes.

Sometimes your 'what-if' scenarios are others' standard operating procedures.

David/absolute

What _is_, what _should be_, and what _could be_ are all distinct.
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Fri, 16 Jul 1999, Matthew Dillon wrote:

> : Well, NetBSD is slated to be used in the 'Space Acceleration
> : Measurement System II', measuring the microgravity environment on
> : the International Space Station using a distributed system based
> : on several NetBSD/i386 boxes.
> :
> : Sometimes your 'what-if' scenarios are others' standard operating
> : procedures.
> :
> : David/absolute
> :
> : What _is_, what _should be_, and what _could be_ are all distinct.
>
> Ummm... this doesn't sound like a critical system to me. It sounds like an experiment.

It's probably an awfully expensive experiment (putting things into space is not cheap). From a financial viewpoint that may be considered critical.

Cheers,
Al

--
Alan Horn - Sysadmin - Dreamworks (+1 818 695 6256) - [EMAIL PROTECTED]

I am Connor MacLeod of the Clan MacLeod. I was born in 1518 in the village of Glenfinnan on the shores of Loch Sheil, and I am immortal.
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
I'm sorry, but when you write code for a safety related system you do not dynamically allocate memory at all. It's all essentially static. There is no issue with the memory resource. Besides, none of the BSD's are certified for any of that stuff that I know of. Sometimes it's not feasible to statically allocate memory. You dynamically allocate all the memory you need at program initialization (and no, we don't want to manage a pool of memory ourselves - that's what the OS is for). Note that languages such as Ada raise exceptions when memory allocation fails. The underlying run-time relies on malloc returning null in order to raise an exception. Normally, programs written in Ada take great care to gracefully handle these exceptions. All the C programs that we've ever written also take great care in handling NULL returns from malloc. I have no problem with overcommit, but I can see the need that some folks have for turning it off. If you don't want to write the code to allow this, that's fine - you don't want/need it, so why should you? But if other folks see a need for it, let _them_ write the hooks for it :-) Dan Eischen [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
:    I'm sorry, but when you write code for a safety related system you
:    do not dynamically allocate memory at all. It's all essentially static.
:    There is no issue with the memory resource. Besides, none of the BSD's are
:    certified for any of that stuff that I know of.
:
:Sometimes it's not feasible to statically allocate memory. You
:dynamically allocate all the memory you need at program initialization
:(and no, we don't want to manage a pool of memory ourselves - that's
:what the OS is for).
:...
:Note that languages such as Ada raise exceptions when memory allocation
:fails. The underlying run-time relies on malloc returning null in
:order to raise an exception. Normally, programs written in Ada

Simply set a resource limit. You are making the classic mistake of assuming that a fail-safe in the O.S. must be integrated all the way down into the user level when, in fact, it is simply a matter of setting a resource limit.

When you are running an embedded system and have full control over the software being run, setting resource limits will do what you want. By doing so you are effectively managing the software modules on a module-by-module basis and not allowing one module to indirectly affect another. This is what you want to do in an embedded system: You do not want to create a situation where a failure in one module cascades into others.

-Matt
Matthew Dillon [EMAIL PROTECTED]

:take great care to gracefully handle these exceptions. All the C
:programs that we've ever written also take great care in handling
:NULL returns from malloc.
:
:I have no problem with overcommit, but I can see the need that
:some folks have for turning it off. If you don't want to write
:the code to allow this, that's fine - you don't want/need it,
:so why should you? But if other folks see a need for it, let
:_them_ write the hooks for it :-)
:
:Dan Eischen
:[EMAIL PROTECTED]
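A small sketch of what "simply set a resource limit" can look like from inside a process. The choice of RLIMIT_AS is an assumption made for illustration (the thread does not say which limit Matt has in mind, and not every system enforces this one), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Lowers the soft address-space limit to `limit_bytes` (clamped to the
 * hard limit), then tries to allocate `request_bytes`.
 * Returns 1 if malloc succeeded, 0 if it returned NULL,
 * -1 if the limit could not be read or set. */
static int malloc_under_limit(rlim_t limit_bytes, size_t request_bytes)
{
    struct rlimit rl;

    assert(request_bytes > 0);
    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    if (limit_bytes > rl.rlim_max)
        limit_bytes = rl.rlim_max;   /* cannot exceed the hard limit */
    rl.rlim_cur = limit_bytes;       /* soft limit only */
    if (setrlimit(RLIMIT_AS, &rl) != 0)
        return -1;

    void *p = malloc(request_bytes);
    if (p == NULL)
        return 0;                    /* the limit made malloc fail cleanly */
    *(volatile char *)p = 1;         /* keep the allocation from being optimized away */
    free(p);
    return 1;
}
```

On a system that enforces the limit, a request larger than the cap comes back as NULL, which is the failure mode the Ada run-time and the careful C programs mentioned above already handle. A request that fits under the cap can still succeed, and can still be hit by overcommit later.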
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Fri, 16 Jul 1999, Daniel C. Sobral wrote:

> Technical follow-up: Contrary to what I previously said, a number of tests reveal that Solaris, indeed, does not overcommit. All non-read only segments,

Neither does HP/UX 10.x. (Haven't got an 11 box handy to check.) The memory allocation process is something like this:

1) Reserve is allocated from a swap area. Preference is given to swap devices, even if a swap file system has a higher priority.

2) If there is no space on a swap device, swap is allocated from a swap filesystem, if one is configured. If there is nothing to be allocated in a swap filesystem, the kernel attempts to grow the swap file on a filesystem by swchunk (a tunable, default 2MB, I think). (Swap on filesystems starts at zero or swchunk, and is grown as needed up to the limit spec'd at swapon(1M) time.)

3) If this fails, either because there is no space on the file system, or the swapfile has reached its limit, memory (actual core) is allocated. The system tunable swapmem_on determines whether memory is used for swap reserve or not. Default is to use it.

4) If there isn't swap to reserve, the request fails, even if none of the reserved swap is used.

The swapinfo(1M) man page makes this quite clear:

+Requests for more paging space will fail when they cannot be satisfied by reserving device, file system, or memory paging, even if some of the reserved paging space is not yet in use. Thus it is possible for requests for more paging space to be denied when some, or even all, of the paging areas show zero usage - space in those areas is completely reserved.

The upside of this is that if you do run out of swap, the kernel doesn't kill random processes. The downside is, I have seen 4GB boxes, with plenty of swap, run out with less than a gig of memory actually in use. Oh, and if you swap to a filesystem, you can fill it up, without actually using any of the space. I don't know which behavior is more bogus.
David Scheidt
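The reservation order David describes can be reduced to a toy accounting model. This is not HP-UX code: the struct, its fields and the functions are hypothetical, units are arbitrary pages, and letting one request spill across tiers is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of reservation-based (non-overcommitting) swap accounting. */
struct swap_pool {
    unsigned long device_free;  /* unreserved space on swap devices */
    unsigned long fs_free;      /* unreserved space in file system swap */
    unsigned long mem_free;     /* unreserved pseudo-swap backed by RAM */
    unsigned long in_use;       /* pages actually written; never consulted */
};

/* Takes up to `want` pages from one tier and returns how many it got. */
static unsigned long take(unsigned long *tier, unsigned long want)
{
    unsigned long got = want < *tier ? want : *tier;
    *tier -= got;
    return got;
}

/* Reserves `pages`, drawing on devices first, then file system swap,
 * then memory. Fails without side effects if the total unreserved
 * space is too small, regardless of how little is actually in use. */
static bool swap_reserve(struct swap_pool *p, unsigned long pages)
{
    assert(p != NULL);
    if (pages > p->device_free + p->fs_free + p->mem_free)
        return false;
    pages -= take(&p->device_free, pages);
    pages -= take(&p->fs_free, pages);
    pages -= take(&p->mem_free, pages);
    return pages == 0;
}
```

Note that `in_use` plays no part in the decision: once everything is reserved, further requests are refused even though nothing has been paged out, which is the "4GB box out of swap with under a gig in use" effect from the message.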
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Thu, Jul 15, 1999 at 09:57:31PM -0700, Matthew Dillon wrote: Something is weird here. If the solaris people are using a SWAPSIZE + REALMEM VM model, they have to allow the allocated + reserved space go +REALMEM bytes over available swap space. If not they are using only a SWAPSIZE VM model. Wait - does Solaris normally use swap files or swap partitions? Or is it that weird /tmp filesystem stuff? If it normally uses swap files and allows holes then that explains everything. No, swap is slice based in Solaris. tmpfs is just a filesystem (much like MFS) which uses swap as backing store. I will admit to never quite understanding the relationship of how much swap tmpfs is willing to steal though... Maybe I should go and read the answerbook (http://docs.sun.com if you want a peek). -- Dom Mitchell -- Palmer Harvey McLane -- Unix Systems Administrator In Mountain View did Larry Wall Sedately launch a quiet plea: That DOS, the ancient system, shall On boxes pleasureless to all Run Perl though lack they C. -- ** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This footnote also confirms that this email message has been swept by MIMEsweeper for the presence of computer viruses. ** To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: Replacement for grep(1) (part 2)
Daniel C. Sobral wrote:

> Eh? Reasonable programs *never* run into trouble. Trouble only happens when you have unreasonable programs around, or did not configure the system correctly. And if you did not configure the system correctly, why do you think you would be able to correctly estimate the stack needed for the various programs?

That is badly put. Exhausting any of the main resources - virtual memory, disk space, process descriptors, file descriptors - is a terrible situation, but one must not cure a headache by cutting off the head. Any system can get into an uncontrolled state and eat all of some resource, and the kernel's task is to protect part of the process pool from this, not to destroy it.

I have seen two boxes run out of swap with bad results: on the first (FreeBSD 2.2.7), the system killed the cron (sic!) process; on the second (Linux), syslogd, sendmail and some others were crippled without any warning. This is thoroughly bad behavior; the kernel must be a friend, not an enemy.

Actions I consider sufficient as a first (!) step:

1) Count overflows of virtual memory, file descriptors, process descriptors and other critical resources in kernel variables (readable by sysctl). This data must be available to watchdogs; for some systems, the right thing is to reboot immediately after an overflow rather than try to keep working in a damaged state.

2) Run cron, syslogd and other important daemons (in the standard setup!) from a special init slot (as Linux and possibly other systems allow), not from startup scripts. Reason: when they die they must be restarted without admin intervention and without wrappers, which can themselves be killed when memory is low.

3) Declare thresholds for critical resources; for example, when more than 80% of virtual memory is used, prevent everybody except euid==0 or egid==0 from allocating new memory.

4) Provide a special signal (SIGXMEM?) to tell processes that memory is low and that they all have to reduce their memory use. Daemons should treat this signal like SIGHUP: exec() themselves and restart.

> Now comes the people saying don't overcommit in *this* case, and overcommit in *that* case. Irrelevant. Programs are still getting killed because memory was overcommitted (with the added disadvantage of you not having as much memory as in a full overcommit mode).

The kernel may kill processes that try to get nonexistent memory. But when it has done nothing to keep the system from running into the overflow, it is playing an unfair game.

-- Netch
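Item 3 of the list above amounts to a small admission rule. The sketch below only illustrates that rule under stated assumptions: the function name and its parameters are hypothetical, the 80% figure is the one from the message, and a real kernel would apply the check inside its VM code rather than through a helper like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Decides whether an allocation of `request` units may proceed when
 * `used` out of `total` units are already taken.
 * - Nobody may exceed the total (hard limit).
 * - Once more than 80% is used, only euid==0 or egid==0 may allocate. */
static bool soft_barrier_allows(unsigned long long used,
                                unsigned long long total,
                                unsigned long long request,
                                uid_t euid, gid_t egid)
{
    assert(used <= total);
    if (request > total - used)
        return false;                /* would exceed the hard limit */
    if (euid == 0 || egid == 0)
        return true;                 /* privileged: may use the reserve */
    return 5 * used <= 4 * total;    /* unprivileged: only while used <= 80% */
}
```

The point of the rule is that the last fifth of the resource stays available to root-owned daemons and to an administrator who needs to log in and clean up.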
Re: Replacement for grep(1) (part 2)
jul...@whistle.com (Julian Elischer) writes:

> If you wanted to fix this, you could add a patch to malloc that touched every page that it handed to the application. (and trapped sig11s)

How would you expect that to work?

Several misunderstandings seem to be common regarding this issue (most not directed at you):

- malloc almost never fails with NULL. This is not true, if resource limits are set properly, any one program using huge amounts of memory is going to hit them long before swap space is exhausted.

- The program currently trying to get the page is the one that is killed.

- Actually paging in all memory is going to protect a program from getting killed. This is going to make it *more likely* for it to be killed.

- Not overcommitting doesn't consume huge amounts of reserve space unless programs do something special. A rough sum of memory usage can be computed by summing up all of the process VSZs plus your stack limit times the number of processes. How many of you would be willing to configure that much swap space?

If you really wanted to run without overcommit, you'd only run statically linked binaries and set your stack limits to small values. This could be desirable for some (but not general-purpose) systems, an option for doing this wouldn't be entirely bogus.
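The rough sum from the last bullet is simple arithmetic, written out here as a sketch. The function name and the kilobyte units are assumptions; the VSZ figures would come from something like ps output.

```c
#include <assert.h>
#include <stddef.h>

/* Rough upper bound on the backing store a non-overcommitting system
 * would have to reserve: the sum of every process's VSZ plus the stack
 * limit once per process. All sizes are in kilobytes. */
static unsigned long long reserve_estimate_kb(const unsigned long long *vsz_kb,
                                              size_t nproc,
                                              unsigned long long stack_limit_kb)
{
    unsigned long long total = 0;

    assert(vsz_kb != NULL || nproc == 0);
    for (size_t i = 0; i < nproc; i++)
        total += vsz_kb[i];
    return total + stack_limit_kb * (unsigned long long)nproc;
}
```

For three processes of 1000, 2000 and 3000 KB with a 64 MB (65536 KB) stack limit, the estimate is 6000 + 3 * 65536 = 202608 KB, so the stack reservation dominates the total. That is why the message suggests small stack limits for anyone who really wants to run without overcommit.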
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
Matthew Dillon wrote:

> :On Tue, 13 Jul 1999 23:18:58 -0400 (EDT)
> : John Baldwin jobal...@vt.edu wrote:
> :
> : What does that have to do with overcommit? I student administrate a undergrad
> : CS lab at a university, and when student's programs misbehaved, they generate a
> : fault and are killed. The only machines that reboot on us without being
> : explicitly told to are the NT ones, and yes we run FreeBSD.
> :
> :What does it have to do with overcommit? Everything in the world!
> :
> :If you have a lot of users, all of which have buggy programs which eat
> :a lot of memory, per-user swap quotas don't necessarily save your butt.
>
> If every single one of your users is trying to crash your machine daily, maybe you should consider throwing them off the system and finding users that are less hostile.
>
> This conversation is getting silly. Do you actually believe that an operating system can magically protect itself 100% from armloads of hostile users? Give me a break. You people are crazy. If you have something worthwhile to say I'll listen, but these "the sky is falling!" arguments are idiotic.
>
> -Matt

students != hostile users

Making mistakes is part of learning.

Patrick
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
Patrick Welche wrote:

students != hostile users

We obviously have known different students... :-)

Making mistakes is part of learning.

A hostile user is one who will act in a non-friendly manner. Whether intentionally or not is irrelevant from the point of view of the administrator, as far as protecting the system goes.

--
Daniel C. Sobral (8-DCS)
d...@newsguy.com
d...@freebsd.org

"Would you like to go out with me?" "I'd love to." "Oh, well, n... err... would you?... ahh... huh... what do I do next?"
Re: Replacement for grep(1) (part 2)
Daniel C. Sobral wrote: 4.4BSD derived system cannot do this, and have to use different machine for such applications. Incorrect. We can set *limits* to the users, so they won't be able to crash down the system. No. Really, not all users are used system in the same time. And it is too cruel to set too small limits. And, average system has user limits quite more than (total_resource*2/3)/n_users (2/3 is sub-optimal modifier). But, if too many users began to use system, they can overflow the resource. Group limits can make problem softer, but not more than a little. I don't remember now English word for soft barrier, the Russian word is 'dempfer' ;) System must provide such soft barrier to prevent overflow long far from the real overflow. Imho, 20% of typical critical resource must be prevented. -- Netch To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
Daniel C. Sobral wrote:

It would be nice to have a way to indicate that, a la SIGDANGER.

Ok, everybody is avoiding this, so I'll comment. Yes, this would be interesting, and a good implementation will very probably be committed. *BUT*, this is not as useful as it seems. Since the correct solution is buy more memory/increase swap (correct solution for our target markets, anyway), there is little incentive to implement it. So, I think people who can answer the above are thinking something like "Well, it is useful, but it's not useful enough for me to spend my time on it, and I sure as hell don't want to write mini-papers on why it's not that useful."

For those who wish to develop code for safety related systems that is not good enough. They have to prove that all code can handle the degradation of resources gracefully. Such code relies on guaranteed memory allocations or at the very least warnings of memory shortage and prioritized allocations, so that the least important sub-systems die first.

--Sean
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
:
:For those who wish to develop code for safety related systems that is
:not good enough. They have to prove that all code can handle the
:degradation of resources gracefully. Such code relies on guaranteed
:memory allocations or in the very least warnings of memory shortage
:and prioritized allocations.
:So the least important sub-systems die first.
:
:--Sean

I'm sorry, but when you write code for a safety related system you do not dynamically allocate memory at all. It's all essentially static. There is no issue with the memory resource. Besides, none of the BSD's are certified for any of that stuff that I know of.

What's next: A space shot? These what-if scenarios are getting ridiculous.

-Matt
Matthew Dillon
dil...@backplane.com
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Fri, 16 Jul 1999, Matthew Dillon wrote:

I'm sorry, but when you write code for a safety related system you do not dynamically allocate memory at all. It's all essentially static. There is no issue with the memory resource. Besides, none of the BSD's are certified for any of that stuff that I know of. What's next: A space shot? These what-if scenarios are getting ridiculous.

Well, NetBSD is slated to be used in the 'Space Acceleration Measurement System II', measuring the microgravity environment on the International Space Station using a distributed system based on several NetBSD/i386 boxes.

Sometimes your 'what-if' scenarios are others' standard operating procedures.

David/absolute

What _is_, what _should be_, and what _could be_ are all distinct.
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
: Well, NetBSD is slated to be used in the 'Space Acceleration
: Measurement System II', measuring the microgravity environment on
: the International Space Station using a distributed system based
: on several NetBSD/i386 boxes.
:
: Sometimes your 'what-if' scenarios are others' standard operating
: procedures.
:
: David/absolute
:
: What _is_, what _should be_, and what _could be_ are all distinct.

Ummm... this doesn't sound like a critical system to me. It sounds like an experiment. None of the BSD's (nor NT, nor any other complex general purpose operating system) are certified for critical systems in space. The reason is simple: None of these operating systems can deal with memory faults caused by radiation. You might see it for internal communications or non-critical sensing, but you aren't going to see it for external communications or thruster control.

-Matt
Matthew Dillon
dil...@backplane.com
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Fri, 16 Jul 1999, Matthew Dillon wrote: : Well, NetBSD is slated to be used in the 'Space Acceleration : Measurement System II', measuring the microgravity environment on : the International Space Station using a distributed system based : on several NetBSD/i386 boxes. : : Sometimes your 'what-if' senarios are others' standard operating : procedures. : : David/absolute : : What _is_, what _should be_, and what _could be_ are all distinct. Ummm... this doesn't sound like a critical system to me. It sounds like an experiment. It's probably an awfully expensive experiment (putting things into space is not cheap)
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
I'm sorry, but when you write code for a safety related system you do not dynamically allocate memory at all. It's all essentially static. There is no issue with the memory resource. Besides, none of the BSD's are certified for any of that stuff that I know of. Sometimes it's not feasible to statically allocate memory. You dynamically allocate all the memory you need at program initialization (and no, we don't want to manage a pool of memory ourselves - that's what the OS is for). Note that languages such as Ada raise exceptions when memory allocation fails. The underlying run-time relies on malloc returning null in order to raise an exception. Normally, programs written in Ada take great care to gracefully handle these exceptions. All the C programs that we've ever written also take great care in handling NULL returns from malloc. I have no problem with overcommit, but I can see the need that some folks have for turning it off. If you don't want to write the code to allow this, that's fine - you don't want/need it, so why should you? But if other folks see a need for it, let _them_ write the hooks for it :-) Dan Eischen eisc...@vigrid.com To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
: I'm sorry, but when you write code for a safety related system you
: do not dynamically allocate memory at all. It's all essentially static.
: There is no issue with the memory resource. Besides, none of the BSD's are
: certified for any of that stuff that I know of.
:
:Sometimes it's not feasible to statically allocate memory. You
:dynamically allocate all the memory you need at program initialization
:(and no, we don't want to manage a pool of memory ourselves - that's
:what the OS is for).
:...
:Note that languages such as Ada raise exceptions when memory allocation
:fails. The underlying run-time relies on malloc returning null in
:order to raise an exception. Normally, programs written in Ada

Simply set a resource limit. You are making the classic mistake of assuming that a fail-safe in the O.S. must be integrated all the way down into the user level when, in fact, it is simply a matter of setting a resource limit.

When you are running an embedded system and have full control over the software being run, setting resource limits will do what you want. By doing so you are effectively managing the software modules on a module-by-module basis and not allowing one module to indirectly affect another. This is what you want to do in an embedded system: You do not want to create a situation where a failure in one module cascades into others.

-Matt
Matthew Dillon
dil...@backplane.com

:take great care to gracefully handle these exceptions. All the C
:programs that we've ever written also take great care in handling
:NULL returns from malloc.
:
:I have no problem with overcommit, but I can see the need that
:some folks have for turning it off. If you don't want to write
:the code to allow this, that's fine - you don't want/need it,
:so why should you? But if other folks see a need for it, let
:_them_ write the hooks for it :-)
:
:Dan Eischen
:eisc...@vigrid.com
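The "simply set a resource limit" approach can be sketched roughly as follows. This is only an illustration under assumptions that are not in the thread: it uses RLIMIT_AS as enforced on Linux (the post does not say which limit was meant, and a 1999 BSD would more likely have used RLIMIT_DATA), and the names clamp_to_hard, run_module_with_limit and GREEDY_MODULE are made up. Each "module" runs as a child with its own address-space cap, so an oversized allocation fails inside that module rather than eating swap for the others.

```python
import resource
import subprocess
import sys

MIB = 1024 * 1024


def clamp_to_hard(wanted, hard):
    """Never ask for a soft limit above the hard limit we inherited."""
    if hard == resource.RLIM_INFINITY:
        return wanted
    return min(wanted, hard)


def run_module_with_limit(argv, limit_bytes=512 * MIB):
    """Run argv as a child whose address space is capped at limit_bytes.

    RLIMIT_AS is an assumption; the thread only says "a resource limit".
    """
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    soft = clamp_to_hard(limit_bytes, hard)

    def apply_limit():
        # Runs in the child between fork() and exec().
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(argv, preexec_fn=apply_limit,
                          capture_output=True, text=True)


# A stand-in "module" that asks for 1 GiB and reports what happened.
GREEDY_MODULE = [
    sys.executable, "-c",
    "try:\n"
    "    buf = bytearray(1024 * 1024 * 1024)\n"
    "    print('allocated')\n"
    "except MemoryError:\n"
    "    print('allocation refused')\n",
]
```

On a system that enforces the limit, `run_module_with_limit(GREEDY_MODULE).stdout` should report the refusal, and nothing outside that child is affected.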
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
Can we kill this thread already? This resolves nothing. The only good to come of this is all of the nice doc-proj input Matt is providing (and providing well, I might add.) There is no point that hasn't been rehashed a dozen times over, and you (the ones who want overcommitting turned off) are not helping the S/N ratio. Brian Fundakowski Feldman _ __ ___ ___ ___ ___ gr...@freebsd.org _ __ ___ | _ ) __| \ FreeBSD: The Power to Serve!_ __ | _ \._ \ |) | http://www.FreeBSD.org/ _ |___/___/___/ To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Fri, 16 Jul 1999, Daniel C. Sobral wrote:

Technical follow-up: Contrary to what I previously said, a number of tests reveal that Solaris, indeed, does not overcommit. All non-read only segments,

Neither does HP/UX 10.x. (Haven't got an 11 box handy to check.) The memory allocation process is something like this:

1) Reserve is allocated from a swap area. Preference is given to swap devices, even if a swap file system has a higher priority.

2) If there is no space on a swap device, swap is allocated from a swap filesystem, if one is configured. If there is nothing to be allocated in a swap filesystem, the kernel attempts to grow the swap file on a filesystem by swchunk (a tunable, default 2MB, I think). (Swap on filesystems starts at zero or swchunk, and is grown as needed up to the limit spec'd at swapon(1M) time.)

3) If this fails, either because there is no space on the file system, or the swapfile has reached its limit, memory (actual core) is allocated. The system tunable swapmem_on determines whether memory is used for swap reserve or not. Default is to use it.

4) If there isn't swap to reserve, the request fails, even if none of the reserved swap is used.

The swapinfo(1M) man page makes this quite clear:

+Requests for more paging space will fail when they cannot be
+satisfied by reserving device, file system, or memory paging, even
+if some of the reserved paging space is not yet in use. Thus it is
+possible for requests for more paging space to be denied when some,
+or even all, of the paging areas show zero usage - space in those
+areas is completely reserved.

The upside of this is that if you do run out of swap, the kernel doesn't kill random processes. The downside is, I have seen 4GB boxes, with plenty of swap, run out with less than a gig of memory actually in use. Oh, and if you swap to a filesystem, you can fill it up, without actually using any of the space. I don't know which behavior is more bogus.

David Scheidt
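The reservation order described in steps 1 to 4 can be modelled in a few lines. This is a toy model under stated assumptions, not HP-UX code: the class and field names are invented, sizes are in MB, each request is satisfied from a single area, and the filesystem swap file is grown in whole swchunk steps up to its limit.

```python
from dataclasses import dataclass


@dataclass
class SwapState:
    device_free: int          # unreserved space on swap devices
    fs_reserved: int = 0      # reserved so far in filesystem swap
    fs_size: int = 0          # current size of the filesystem swap file
    fs_limit: int = 0         # limit given at swapon time
    fs_disk_free: int = 0     # free space on the underlying filesystem
    swchunk: int = 2          # growth step (the post guesses 2MB)
    mem_free: int = 0         # core usable for swap reserve
    swapmem_on: bool = True


def reserve(state, want):
    """Try to reserve `want` MB; return where it came from, or None."""
    # 1) Swap devices are preferred.
    if state.device_free >= want:
        state.device_free -= want
        return "device"

    # 2) Filesystem swap, growing the swap file by swchunk as needed.
    shortfall = want - (state.fs_size - state.fs_reserved)
    chunks = max(0, -(-shortfall // state.swchunk))   # ceiling division
    growth = chunks * state.swchunk
    if (state.fs_size + growth <= state.fs_limit
            and growth <= state.fs_disk_free):
        state.fs_size += growth
        state.fs_disk_free -= growth
        state.fs_reserved += want
        return "filesystem"

    # 3) Memory, if swapmem_on allows it.
    if state.swapmem_on and state.mem_free >= want:
        state.mem_free -= want
        return "memory"

    # 4) Nothing left to reserve: the request fails even if none of
    #    the space reserved earlier has actually been used.
    return None
```

Note that nothing in the model tracks usage at all, which is the point of step 4: a request is refused purely on the basis of what has been reserved.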
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
Hi everyone, I've been following this discussion almost from the beginning, and I have the feeling that we're not _really_ getting very far. There's good arguments for and against overcommit, depending on your point of view and your requirements. What I do see is a not-so-openly voiced consent that the way resource(sp?) shortages are handled in an overcommitting system (SIGKILL) makes some of us rather unhappy. I therefore suggest those of us who would like to see a change in this area pool their efforts and energies to work on a mechanism that handles resource shortage in a more graceful way. cheerio Michael -- [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
Kevin Schoedel wrote:

Imagine a reasonably big program, like Netscape or Emacs, of which you usually just use a subset of features. There can easily be many megabytes of code and data in them you never actually use, or you don't _usually_ use (like the people who use emacs like it was vi :). Without overcommit, you need to allocate all that memory for the code, no matter whether you end up using it or not. With overcommit, there is no such problem.

Code, static data, and not-yet-written writable data should be backed by the executable file, not by swap space, so unused code and tables should not be a problem.

TEXT should be backed by the executable, as long as the program doesn't change it to read/write. That's not the code I was referring to. Not-yet-written blah-blah-blah should be backed by:

1) The executable file if you are overcommitting.
2) RAM/Swap if you are not.

If you don't do this, you are overcommitting. Proof: let the system exhaust its memory. Change a single byte in the not-yet-written stuff. Now you need more memory than you have to comply with a regular operation (like changing the value of a global variable), which means you overcommitted.

Now come the people saying "don't overcommit in *this* case, and overcommit in *that* case". Irrelevant. Programs are still getting killed because memory was overcommitted (with the added disadvantage of you not having as much memory as in a full overcommit mode).

Stack is more interesting. There might be a place for a global overcommit switch. I think I'd be happier with a scheme in which the first page or first few pages of stack are committed (so that reasonable programs will never run into trouble) and remaining stack is over-/un-committed by default, along with means for unusual programs to commit (and/or test commitability of) subsequent pages.

Eh? Reasonable programs *never* run into trouble. Trouble only happens when you have unreasonable programs around, or did not configure the system correctly. And if you did not configure the system correctly, why do you think you would be able to correctly estimate the stack needed for the various programs?

--
Daniel C. Sobral (8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]

"Would you like to go out with me?" "I'd love to." "Oh, well, n... err... would you?... ahh... huh... what do I do next?"
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
At 6:29 PM -0700 7/14/99, Matthew Dillon wrote: If 1G isn't enough, spend another $30 and throw 2G of swap online. Or perhaps dedicate an entire $150 disk and throw 6+ GB of swap online. The equivalent setup using a non-overcommit model would require considerably more swap to have the same reliability. Please note that we're talking at cross-purposes here, mainly because I didn't realize this same general topic was being beaten to death in the 'replacement for grep' thread (which I have not been following). Speaking for just me myself and I, I have no problems with the current overcommit model. All I'd like to do is have a way to indicate which processes should not get booted first, if the system does indeed run out of swap and needs to boot some processes. However, other people seem much more worked up about this topic than I am, and thus what I (personally) meant as "just casual questions" seem to be taken as "demands that something be done, RIGHT NOW". I now realize that some people are arguing that malloc should return an error if the system runs out of space, but that's not what I am thinking about. So, I think I'll bow out of this discussion for now, and maybe try to discuss my "casual questions" sometime in a different context... --- Garance Alistair Drosehn = [EMAIL PROTECTED] Senior Systems Programmer or [EMAIL PROTECTED] Rensselaer Polytechnic Institute To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Thu, 15 Jul 1999, Daniel C. Sobral wrote:

Uh... like any modern unix, Solaris overcommits.

On Thu, 15 Jul 1999 08:46:36 -0700 (PDT), "Eduardo E. Horvath" [EMAIL PROTECTED] said:

Where do you guys get this misinformation?
:
Note the `19464k reserved'; that space has been reserved but not yet allocated.

Both Dillon and Sobral mistakenly claimed that "Solaris overcommits"; this fact seems to be somewhat suggestive.

And also, the following are the allocated memory and reserved memory in my environment. (This table also includes Eduardo's example.)

SunOS   allocated   reserved     total   total/allocated
-----   ---------   --------   -------   ---------------
4.1.4       4268k      1248k     5516k   1.2924
4.1.2       7732k      1492k     9224k   1.193
4.1.4       8848k      3080k    11928k   1.3481
4.1.4      13532k      6772k    20304k   1.5004
5.5.1      15312k      5092k    20404k   1.3325
4.1.3      16112k      6512k    22624k   1.4042
4.1.2      26356k      1620k    27976k   1.0615
4.1.4      26560k      3756k    30316k   1.1414
5.5        26076k     11348k    37424k   1.4352
4.1.4      32984k      5556k    38540k   1.1684
5.6        32448k      7072k    39520k   1.2179
4.1.4      38056k      3692k    41748k   1.097
4.1.4      49064k      7672k    56736k   1.1564
4.1.4      67012k      7800k    74812k   1.1164
4.1.4      99348k     16956k   116304k   1.1707
4.1.4     118288k     11780k   130068k   1.0996
5.6       231968k     18880k   250848k   1.0814
5.7       307240k     19464k   326704k   1.0634

(sorted by total amount of used swap)

In those examples, a non-overcommitting system requires 1.06x ... 1.50x more swap space than an overcommitting system. This table also indicates that the ratio decreases as total used swap increases. And the extra swap space required on a non-overcommitting system is approximately several tens of megabytes, i.e. the extra cost of a non-overcommitting system is less than ten dollars in my environment.

Matt Dillon claimed that a non-overcommitting system requires 8x or more swap space than an overcommitting system. That's just wrong, as shown above. (There might be cases which require 8x swap, but that is not typical as Dillon said.)

If you don't want a non-overcommitting system because you don't want to pay its cost, that's OK, but please don't force us to accept your limited view.

--
soda
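The last column of the table is just (allocated + reserved) / allocated, and the extra swap a reserving system ties up is the reserved column itself. A quick check of three rows, as a sketch: the figures are copied from the table, the function names are my own.

```python
def swap_ratio(allocated_k, reserved_k):
    """total/allocated: how much more swap a reserving system ties up."""
    return (allocated_k + reserved_k) / allocated_k


# (SunOS release, allocated KB, reserved KB), copied from the table
ROWS = [
    ("4.1.4", 4268, 1248),
    ("4.1.4", 13532, 6772),
    ("5.7", 307240, 19464),
]


def summarize(rows):
    """Return (release, total KB, ratio rounded to 4 places) per row."""
    return [(rel, alloc + res, round(swap_ratio(alloc, res), 4))
            for rel, alloc, res in rows]
```

The largest reserved figure in the table is 19464k, which is where the "several tens of megabytes" estimate comes from.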
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
:Both Dillon and Sobral mistakenly claimed that "Solaris overcommits",
:this fact seems to be somewhat suggestive.
:
:And also, the followings are allocated memory and reserved memory
:in my environment. (This table also includes Eduardo's example)
:
: SunOS   allocated   reserved     total   total/allocated
: -----   ---------   --------   -------   ---------------
: 4.1.4       4268k      1248k     5516k   1.2924
: 4.1.2       7732k      1492k     9224k   1.193
: 4.1.4       8848k      3080k    11928k   1.3481
: 4.1.4      13532k      6772k    20304k   1.5004
: 5.5.1      15312k      5092k    20404k   1.3325
: 4.1.3      16112k      6512k    22624k   1.4042
: 4.1.2      26356k      1620k    27976k   1.0615
: 4.1.4      26560k      3756k    30316k   1.1414
: 5.5        26076k     11348k    37424k   1.4352
: 4.1.4      32984k      5556k    38540k   1.1684
: 5.6        32448k      7072k    39520k   1.2179
: 4.1.4      38056k      3692k    41748k   1.097
: 4.1.4      49064k      7672k    56736k   1.1564
: 4.1.4      67012k      7800k    74812k   1.1164
: 4.1.4      99348k     16956k   116304k   1.1707
: 4.1.4     118288k     11780k   130068k   1.0996
: 5.6       231968k     18880k   250848k   1.0814
: 5.7       307240k     19464k   326704k   1.0634
:
: (sorted by total amount of used swap)
:
:In those examples, non-overcommitting system requires 1.06x ... 1.50x
:...
:soda

Umm... how are you getting the reserved numbers? Are you sure that isn't simply cached swap blocks? I.E. when something gets swapped out and then is swapped back in and dirtied, Solaris may be holding the swap block assignment rather than letting it go. FreeBSD-stable does the same thing. FreeBSD-current does not -- it lets it go in order to be able to reallocate it later as part of a contiguous swath for performance reasons.

These 'extra' swap blocks are effectively reserved but not actually allocated. They can be reassigned. The numbers above are very similar to what you would see in a redirtying-cache swap block situation on a FreeBSD-stable system.

If I add up all the unshared writeable segments on my home box - that is, all segments for which one would potentially have to reserve swap space - I get a total of around 382MB. The machine is currently eating around 100MB of ram and 5MB of swap, or around a 3.5:1 ratio in this case. A non-overcommit model would have to reserve swap space for 382MB - 100MB = 282MB versus the 5MB of swap the machine actually allocates.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
:"pstat -s" on SunOS4, and "swap -s" on SunOS5. From Solaris man page:
:
::-s        Print summary information about total swap
::          space usage and availability:
::
::          allocated   The total amount of swap space
::                      (in 1024-byte blocks)
::                      currently allocated for use as
::                      backing store.
::
::          reserved    The total amount of swap space
::                      (in 1024-bytes blocks) not
::                      currently allocated, but
::                      claimed by memory mappings for
::                      possible future use.
::
::          used        The total amount of swap space
::                      (in 1024-byte blocks) that is
::                      either allocated or reserved.
:--
:soda

Yah, that's what I thought. A solaris expert could tell us for sure but I am pretty sure those are simply cached swap blocks after-the-fact, not actual reservations on potentially swappable space.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
::-s        Print summary information about total swap
::          space usage and availability:
::
::          allocated   The total amount of swap space
::                      (in 1024-byte blocks)
::                      currently allocated for use as
::                      backing store.
::
::          reserved    The total amount of swap space
::                      (in 1024-bytes blocks) not
::                      currently allocated, but
::                      claimed by memory mappings for
::                      possible future use.
::
::          used        The total amount of swap space
::                      (in 1024-byte blocks) that is
::                      either allocated or reserved.
:--
:soda

It would be really easy to test this. Write a program that malloc's 32MB of space and touches it, then sleeps 10 seconds and forks, with both child and parent sleeping afterwards (the parent and the forked child should not touch the memory after the fork occurs). Do a pstat -s before, after the initial touch, and after the fork.

If you do not see the reserved swap space jump by 32MB after the fork, it isn't what you thought it was.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
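A rough sketch of the test program described above, under a few assumptions of mine: a private anonymous mmap stands in for malloc'd space, the snapshot command defaults to `pstat -s` (substitute `swap -s` on SunOS 5), and the function names are invented. The report callback exists only so the snapshots can be printed or collected.

```python
import mmap
import os
import subprocess
import time


def swap_summary(cmd=("pstat", "-s")):
    """Return the swap summary text, or a note if the command is missing."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True)
        return done.stdout.strip()
    except OSError as exc:
        return "(could not run %s: %s)" % (" ".join(cmd), exc)


def fork_reservation_test(size=32 * 1024 * 1024, pause=10.0, report=print):
    """Dirty `size` bytes, fork, and snapshot swap at each stage."""
    # Private anonymous memory stands in for malloc()'d space.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    report("before touch", swap_summary())

    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1               # dirty every page once
    report("after touch", swap_summary())
    time.sleep(pause)

    pid = os.fork()
    if pid == 0:
        # Child: hold the copy-on-write pages without touching them.
        time.sleep(pause)
        os._exit(0)

    report("after fork", swap_summary())
    time.sleep(pause)                 # parent does not touch the memory either
    _, status = os.waitpid(pid, 0)
    buf.close()
    return os.WEXITSTATUS(status)
```

Calling `fork_reservation_test()` with the defaults follows the 32MB and 10 second figures from the post. If the system reserves swap for copy-on-write pages, the "after fork" snapshot should show the reserved figure rising by about the buffer size.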
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Wed, 14 Jul 1999, John Nemeth wrote:

On Jul 15, 2:40am, "Daniel C. Sobral" wrote:
} Garance A Drosihn wrote:
} At 12:20 AM +0900 7/15/99, Daniel C. Sobral wrote:
} In which case the program that consumed all memory will be killed.
} The program killed is +NOT+ the one demanding memory, it's the one
} with most of it.
}
} But that isn't always the best process to have killed off...
}
} Sure it is. :-) Let's see...

This statement is absurd. Only a competent admin can decide which process can be killed. No arbitrary decision is going to be correct.

} It would be nice to have a way to indicate that, a la SIGDANGER.

How about assigning something like a class to each process, which gives the VM a hint which processes should be killed first without much thinking, and which last (or never)?

In other words, let's say class 10 means "totally disposable, kill whenever you want", and class 1 means "never try to kill me". Of course, most processes would get some default value, and the superuser could "renice" them to a more resistant class.

This way both sides of the discussion would be satisfied :-)

Andrzej Bialecki
// [EMAIL PROTECTED] WebGiro AB, Sweden (http://www.webgiro.com)
// ---
// -- FreeBSD: The Power to Serve. http://www.freebsd.org
// --- Small Embedded FreeBSD: http://www.freebsd.org/~picobsd/
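The class idea above could look something like this. It is a hypothetical policy sketch, not an existing kernel interface: the Proc record, the default class of 5, and the tie-break by memory use (borrowed from the "one with most of it" rule quoted earlier in the thread) are all assumptions.

```python
from dataclasses import dataclass

NEVER_KILL = 1      # "never try to kill me"
DISPOSABLE = 10     # "totally disposable, kill whenever you want"
DEFAULT_CLASS = 5   # assumed; the post only says "some default value"


@dataclass
class Proc:
    pid: int
    rss_kb: int
    kill_class: int = DEFAULT_CLASS


def pick_victim(procs):
    """Choose the process to kill on swap exhaustion, or None."""
    candidates = [p for p in procs if p.kill_class > NEVER_KILL]
    if not candidates:
        return None
    # Most disposable class first; within a class, the biggest consumer.
    return max(candidates, key=lambda p: (p.kill_class, p.rss_kb))
```

With this ordering a small class 10 process is sacrificed before a large default-class one, and a class 1 process is never chosen no matter how much memory it holds.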
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
:Before program start:
:total: 20000k bytes allocated + 4792k reserved = 24792k used, 191048k available
:
:After malloc, before touch:
:total: 18756k bytes allocated + 37500k reserved = 56256k used, 159580k available
:
:After malloc + touch:
:total: 52804k bytes allocated + 4852k reserved = 57656k used, 158184k available
:
:After fork:
:total: 52928k bytes allocated + 37644k reserved = 90572k used, 125264k available
:
:[there has been a little background activity, but the numbers speak for themselves]
:
:
:Daniel

Assuming the allocated field is not inclusive of real memory, what we have is swap reservation under solaris for clean pages, and allocation and assignment for dirty pages. The grand total will tell you the total VM potential for malloc'd space but does not appear to tell you how much swap is actually active - i.e. was written to and contains valid data.

It would be interesting to see if the stack segment is included in the reservation. Try setting the stack resource limit to 32m and run the same program, except without bothering to malloc() or touch anything. See if the stack segment is included in the reservation field. It would also be interesting to see how solaris deals with MAP_PRIVATE mmap's.

If this is correct, then solaris is using a VMSPACE = SWAPSPACE model. FreeBSD uses a VMSPACE = SWAPSPACE + REALMEM model.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
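The stack-limit variant of the experiment can be set up along these lines. This is a sketch with invented helper names: it raises (or lowers) the soft RLIMIT_STACK to 32MB in a child just before exec, clamped to whatever hard limit is in force, and leaves the swap snapshots to be taken by hand while the child runs.

```python
import resource
import subprocess
import sys


def clamp_to_hard(wanted, hard):
    """A soft limit may not exceed the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return wanted
    return min(wanted, hard)


def run_with_stack_limit(argv, limit=32 * 1024 * 1024):
    """Run argv with its soft stack limit set to `limit` bytes."""
    _, hard = resource.getrlimit(resource.RLIMIT_STACK)
    soft = clamp_to_hard(limit, hard)

    def set_limit():
        # Executed in the child after fork(), before exec().
        resource.setrlimit(resource.RLIMIT_STACK, (soft, hard))

    return subprocess.run(argv, preexec_fn=set_limit,
                          capture_output=True, text=True)


# A probe command that just prints the soft stack limit it inherited.
PROBE = [
    sys.executable, "-c",
    "import resource; "
    "print(resource.getrlimit(resource.RLIMIT_STACK)[0])",
]
```

Replacing PROBE with a program that only sleeps, and comparing the reserved figure before and during the run, would show whether the 32MB stack limit is counted in the reservation.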
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
If this is correct, then solaris is using a VMSPACE = SWAPSPACE model. FreeBSD uses a VMSPACE = SWAPSPACE + REALMEM model.

AFAIK it has been stated quite explicitly by the Solaris folks that Solaris 2.x uses VMSPACE = SWAPSPACE + REALMEM. This is *different* from SunOS 4.1.x.

Steinar Haug, Nethelp consulting, [EMAIL PROTECTED]
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
Here is what I get from one of BEST's mail www proxy machines. ~dillon/br adds the object sizes together. 'swap' and 'default' objects refer to unbacked VM objects - and none of the processes running fork shared unbacked objects so we don't have to worry about that. The 'swap' designation means that at least one page in the object has been assigned swap. The 'default' designation means that no pages have been assigned swap. The pages can be dirty or clean.

Typical /proc/PID/map output looks like this (taken from one of the sendmail processes). The lines I've marked are the ones being counted as unbacked/swap-backed VM. The rest are vnode-backed and not counted.

0x1000     0x4b000     66   0   r-x COW vnode
0x4b000    0x4e000      3   3   rwx COW vnode
0x4e000    0x87000     53  43   rwx COW swap     ---
0x87000    0x373000   738 738   rwx     default  ---
0x2004b000 0x2005a000   2   0   r-x COW vnode
0x2005a000 0x2005c000   2   0   rwx COW vnode
0x2005c000 0x20065000   6   2   rwx COW swap     ---
0x20068000 0x2006d000   3   0   r-x COW vnode
0x2006d000 0x2006e000   1   1   rwx COW vnode
0x2006e000 0x200cc000  70   0   r-x COW vnode
0x200cc000 0x200d       4   4   rwx COW vnode
0x200d     0x200e7000   8   6   rwx COW swap     ---
0xefbde000 0xefbfe000  14  14   rwx COW swap     ---

proxy1:/tmp# cat /proc/*/map | egrep 'swap|default' | ~dillon/br
639168K

proxy1:/tmp# pstat -s
Device      1K-blocks     Used    Avail Capacity  Type
/dev/sd0b      524288    12596   511628     2%    Interleaved

This machine has 256MB of ram of which around 200MB is in use; we will assume the entire 200MB is used by VM spaces for processes. It is an active machine with around 205 processes at the time of the test.

So. 200MB of ram + 12MB of swap = 212MB of actual storage being used out of 639MB of total swap-backable VM. About a factor of 3.2:1. Actual swap utilization is sitting at 2%.

If no overcommit were allowed, and assuming a VMSPACE = REALMEM + SWAP model, 200MB of ram would be active and 439MB worth of swap would be either allocated or reserved (though only 12MB would be actually written, that part doesn't change).

439MB of swap versus 12MB of swap.

In that scenario, the 512MB of swap I assigned to this machine would be dangerously low.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
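The arithmetic behind these figures, written out as a small sketch. The inputs are the numbers quoted in this post (639MB, 200MB, 12MB) and in the earlier home-box post (382MB, 100MB, 5MB); the function names are mine.

```python
def no_overcommit_swap_mb(backable_vm_mb, ram_in_use_mb):
    """Swap that must be allocated or reserved if VMSPACE = REALMEM + SWAP."""
    return max(0, backable_vm_mb - ram_in_use_mb)


def tied_up_factor(backable_vm_mb, ram_in_use_mb, swap_written_mb):
    """How many times more swap is tied up than is actually written."""
    needed = no_overcommit_swap_mb(backable_vm_mb, ram_in_use_mb)
    return needed / swap_written_mb


proxy_reserve = no_overcommit_swap_mb(639, 200)   # the proxy machine
home_reserve = no_overcommit_swap_mb(382, 100)    # the home box
```

On the proxy machine that leaves 512MB - 439MB = 73MB of headroom, which is the sense in which the configured swap would be "dangerously low".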
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
In that scenario, the 512MB of swap I assigned to this machine would be dangerously low.

With 13GB disks available for a couple of hundred bucks, my machines aren't going to run out of swap space any time soon, even if I commit to disk.

All I want for Christmas is a knob to disable overcommit.

--lyndon
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Thu, 15 Jul 1999 17:53:52 CST, [EMAIL PROTECTED] wrote:

All I want for Christmas is a knob to disable overcommit.

And what I'm pretty sure the majority of the readers on this list want is for those of you who really think it's necessary to do it yourselves.

What? Nobody who wants to disable the policy knows how to do it? Hmmm, I wonder whether that's significant...

Ciao,
Sheldon.
re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
All I want for Christmas is a knob to disable overcommit.

And what I'm pretty sure the majority of the readers on this list want is for those of you who really think it's necessary to do it yourselves. What? Nobody who wants to disable the policy knows how to do it? Hmmm, I wonder whether that's significant...

that's an impressively bold statement to make. by my reckoning, at least 4 people who have posted "wanting no overcommit" are more than capable of programming this for NetBSD.

.mrg.
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
: In that scenario, the 512MB of swap I assigned to this machine would be
: dangerously low.
:
:With 13GB disks available for a couple of hundred bucks, my machines aren't
:going to run out of swap space any time soon, even if I commit to disk.
:
:All I want for Christmas is a knob to disable overcommit.
:
:--lyndon

If your machines aren't going to run out of swap, then the overcommit isn't going to hurt you in a million years.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
:Technical follow-up: : :Contrary to what I previously said, a number of tests reveal that :Solaris, indeed, does not overcommit. All non-read only segments, :and all malloc()ed memory is reserved upon exec() or fork(), and the :reserved memory is not allowed to exceed the total memory. It makes :extensive use of read only DATA segments, and has a NON_RESERVE :mmap() flag. : :Though the foot firmly planted in my mouth ought to prevent me from :saying anything else, I must say that it does explain a few things :to me... : :-- :Daniel C. Sobral (8-DCS) :[EMAIL PROTECTED] Something is weird here. If the solaris people are using a SWAPSIZE + REALMEM VM model, they have to allow the allocated + reserved space go +REALMEM bytes over available swap space. If not they are using only a SWAPSIZE VM model. Wait - does Solaris normally use swap files or swap partitions? Or is it that weird /tmp filesystem stuff? If it normally uses swap files and allows holes then that explains everything. -Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
Hi everyone, I've been following this discussion almost from the beginning, and I have the feeling that we're not _really_ getting very far. There's good arguments for and against overcommit, depending on your point of view and your requirements. What I do see is a not-so-openly voiced consent that the way resource(sp?) shortages are handled in an overcommitting system (SIGKILL) makes some of us rather unhappy. I therefore suggest those of us who would like to see a change in this area pool their efforts and energies to work on a mechanism that handles resource shortage in a more graceful way. cheerio Michael -- michael.schus...@germany.sun.com To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
At 6:29 PM -0700 7/14/99, Matthew Dillon wrote: If 1G isn't enough, spend another $30 and throw 2G of swap online. Or perhaps dedicate an entire $150 disk and throw 6+ GB of swap online. The equivalent setup using a non-overcommit model would require considerably more swap to have the same reliability. Please note that we're talking at cross-purposes here, mainly because I didn't realize this same general topic was being beaten to death in the 'replacement for grep' thread (which I have not been following). Speaking for just me myself and I, I have no problems with the current overcommit model. All I'd like to do is have a way to indicate which processes should not get booted first, if the system does indeed run out of swap and needs to boot some processes. However, other people seem much more worked up about this topic than I am, and thus what I (personally) meant as just casual questions seem to be taken as demands that something be done, RIGHT NOW. I now realize that some people are arguing that malloc should return an error if the system runs out of space, but that's not what I am thinking about. So, I think I'll bow out of this discussion for now, and maybe try to discuss my casual questions sometime in a different context... --- Garance Alistair Drosehn = g...@eclipse.acs.rpi.edu Senior Systems Programmer or dro...@rpi.edu Rensselaer Polytechnic Institute To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Thu, 15 Jul 1999, Daniel C. Sobral wrote:
> Uh... like any modern unix, Solaris overcommits.

On Thu, 15 Jul 1999 08:46:36 -0700 (PDT), Eduardo E. Horvath e...@one-o.com said:
> Where do you guys get this misinformation?
>   :
> Note the `19464k reserved'; that space has been reserved but not yet allocated.

Both Dillon and Sobral mistakenly claimed that Solaris overcommits; this fact seems to be somewhat suggestive.

Also, the following are the allocated and reserved memory figures in my environment. (This table also includes Eduardo's example.)

  SunOS   allocated   reserved      total   total/allocated
  -----   ---------   --------      -----   ---------------
  4.1.4       4268k      1248k      5516k   1.2924
  4.1.2       7732k      1492k      9224k   1.193
  4.1.4       8848k      3080k     11928k   1.3481
  4.1.4      13532k      6772k     20304k   1.5004
  5.5.1      15312k      5092k     20404k   1.3325
  4.1.3      16112k      6512k     22624k   1.4042
  4.1.2      26356k      1620k     27976k   1.0615
  4.1.4      26560k      3756k     30316k   1.1414
  5.5        26076k     11348k     37424k   1.4352
  4.1.4      32984k      5556k     38540k   1.1684
  5.6        32448k      7072k     39520k   1.2179
  4.1.4      38056k      3692k     41748k   1.097
  4.1.4      49064k      7672k     56736k   1.1564
  4.1.4      67012k      7800k     74812k   1.1164
  4.1.4      99348k     16956k    116304k   1.1707
  4.1.4     118288k     11780k    130068k   1.0996
  5.6       231968k     18880k    250848k   1.0814
  5.7       307240k     19464k    326704k   1.0634

  (sorted by total amount of used swap)

In those examples, a non-overcommitting system requires 1.06x to 1.50x the swap space of an overcommitting system. The table also indicates that the ratio decreases as the total used swap increases. The extra swap space required on a non-overcommitting system is roughly a few tens of megabytes, i.e. the extra cost of a non-overcommitting system is less than ten dollars in my environment.

Matt Dillon claimed that a non-overcommitting system requires 8x or more swap space than an overcommitting system. That is simply wrong, as shown above. (There might be cases which require 8x swap, but they are not typical, contrary to what Dillon said.)

If you don't want a non-overcommitting system because you don't want to pay its cost, that's OK, but please don't force us to accept your limited view.

--
soda
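The total/allocated column in soda's table can be recomputed directly from the allocated and reserved figures. The following is a minimal sketch in Python (chosen only for brevity; nothing in the thread uses it), and the names `SAMPLES` and `swap_ratio` are made up for illustration. It only re-derives the posted numbers and says nothing about whether "reserved" means what either side of the argument thinks it means.

```python
# (SunOS release, allocated KB, reserved KB), copied from the table in the mail.
SAMPLES = [
    ("4.1.4", 4268, 1248), ("4.1.2", 7732, 1492), ("4.1.4", 8848, 3080),
    ("4.1.4", 13532, 6772), ("5.5.1", 15312, 5092), ("4.1.3", 16112, 6512),
    ("4.1.2", 26356, 1620), ("4.1.4", 26560, 3756), ("5.5", 26076, 11348),
    ("4.1.4", 32984, 5556), ("5.6", 32448, 7072), ("4.1.4", 38056, 3692),
    ("4.1.4", 49064, 7672), ("4.1.4", 67012, 7800), ("4.1.4", 99348, 16956),
    ("4.1.4", 118288, 11780), ("5.6", 231968, 18880), ("5.7", 307240, 19464),
]


def swap_ratio(allocated_kb, reserved_kb):
    """Swap used when reservations count, relative to allocated swap alone."""
    return (allocated_kb + reserved_kb) / allocated_kb


ratios = [swap_ratio(alloc, resv) for _, alloc, resv in SAMPLES]

for (release, alloc, resv), ratio in zip(SAMPLES, ratios):
    print(f"{release:6} {alloc:8}k {resv:7}k {alloc + resv:8}k  {ratio:.4f}")

print(f"lowest ratio:  {min(ratios):.2f}")
print(f"highest ratio: {max(ratios):.2f}")
```

The lowest and highest ratios are the two ends of the range quoted in the mail.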
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
:Both Dillon and Sobral mistakenly claimed that Solaris overcommits,
:this fact seems to be somewhat suggestive.
:
:And also, the followings are allocated memory and reserved memory
:in my environment. (This table also includes Eduardo's example)
:
:  SunOS   allocated   reserved      total   total/allocated
:  -----   ---------   --------      -----   ---------------
:  4.1.4       4268k      1248k      5516k   1.2924
:  4.1.2       7732k      1492k      9224k   1.193
:  4.1.4       8848k      3080k     11928k   1.3481
:  4.1.4      13532k      6772k     20304k   1.5004
:  5.5.1      15312k      5092k     20404k   1.3325
:  4.1.3      16112k      6512k     22624k   1.4042
:  4.1.2      26356k      1620k     27976k   1.0615
:  4.1.4      26560k      3756k     30316k   1.1414
:  5.5        26076k     11348k     37424k   1.4352
:  4.1.4      32984k      5556k     38540k   1.1684
:  5.6        32448k      7072k     39520k   1.2179
:  4.1.4      38056k      3692k     41748k   1.097
:  4.1.4      49064k      7672k     56736k   1.1564
:  4.1.4      67012k      7800k     74812k   1.1164
:  4.1.4      99348k     16956k    116304k   1.1707
:  4.1.4     118288k     11780k    130068k   1.0996
:  5.6       231968k     18880k    250848k   1.0814
:  5.7       307240k     19464k    326704k   1.0634
:
:  (sorted by total amount of used swap)
:
:In those examples, non-overcommiting system requires 1.06x ... 1.50x
:...
:soda

Umm... how are you getting the reserved numbers? Are you sure that isn't simply cached swap blocks? I.e. when something gets swapped out and then is swapped back in and dirtied, Solaris may be holding the swap block assignment rather than letting it go. FreeBSD-stable does the same thing. FreeBSD-current does not -- it lets it go in order to be able to reallocate it later as part of a contiguous swath for performance reasons.

These 'extra' swap blocks are effectively reserved but not actually allocated. They can be reassigned. The numbers above are very similar to what you would see in a redirtying-cache swap block situation on a FreeBSD-stable system.

If I add up all the unshared writeable segments on my home box - that is, all segments for which one would potentially have to reserve swap space - I get a total of around 382MB. The machine is currently eating around 100MB of ram and 5MB of swap, or around a 3.5:1 ratio in this case.

A non-overcommit model would have to reserve swap space for 382MB - 100MB = 282MB versus the 5MB of swap the machine actually allocates.

-Matt
Matthew Dillon dil...@backplane.com
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Thu, 15 Jul 1999 11:09:01 -0700 (PDT), Matthew Dillon dil...@apollo.backplane.com said:
> Umm... how are you getting the reserved numbers?

pstat -s on SunOS4, and swap -s on SunOS5. From Solaris man page:

:-s   Print summary information about total swap
:     space usage and availability:
:
:     allocated   The total amount of swap space
:                 (in 1024-byte blocks)
:                 currently allocated for use as
:                 backing store.
:
:     reserved    The total amount of swap space
:                 (in 1024-bytes blocks) not
:                 currently allocated, but
:                 claimed by memory mappings for
:                 possible future use.
:
:     used        The total amount of swap space
:                 (in 1024-byte blocks) that is
:                 either allocated or reserved.

--
soda
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
:pstat -s on SunOS4, and swap -s on SunOS5. From Solaris man page:
:
::-s   Print summary information about total swap
::     space usage and availability:
::
::     allocated   The total amount of swap space
::                 (in 1024-byte blocks)
::                 currently allocated for use as
::                 backing store.
::
::     reserved    The total amount of swap space
::                 (in 1024-bytes blocks) not
::                 currently allocated, but
::                 claimed by memory mappings for
::                 possible future use.
::
::     used        The total amount of swap space
::                 (in 1024-byte blocks) that is
::                 either allocated or reserved.
:--
:soda

Yah, that's what I thought. A solaris expert could tell us for sure but I am pretty sure those are simply cached swap blocks after-the-fact, not actual reservations on potentially swappable space.

-Matt
Matthew Dillon dil...@backplane.com
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
::-s   Print summary information about total swap
::     space usage and availability:
::
::     allocated   The total amount of swap space
::                 (in 1024-byte blocks)
::                 currently allocated for use as
::                 backing store.
::
::     reserved    The total amount of swap space
::                 (in 1024-bytes blocks) not
::                 currently allocated, but
::                 claimed by memory mappings for
::                 possible future use.
::
::     used        The total amount of swap space
::                 (in 1024-byte blocks) that is
::                 either allocated or reserved.
:--
:soda

It would be really easy to test this. Write a program that malloc's 32MB of space and touches it, then sleeps 10 seconds and forks, with both child and parent sleeping afterwards. (The parent and the forked child should not touch the memory after the fork occurs.)

Do a pstat -s before, after the initial touch, and after the fork. If you do not see the reserved swap space jump by 32MB after the fork, it isn't what you thought it was.

-Matt
Matthew Dillon dil...@backplane.com
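The test described above can be sketched roughly as follows. This is an illustration only, written in Python for brevity where the posters would have used C, and it uses an anonymous private mapping in place of malloc(); the function names `touch` and `run_fork_test` are invented here. It prints a marker at each stage so that `pstat -s` or `swap -s` can be run by hand in another terminal, and it adds one extra pause between allocating and touching, which the original description does not ask for.

```python
import mmap
import os
import time

PAGE = mmap.PAGESIZE


def touch(region):
    """Write one byte into every page so that each page becomes dirty."""
    for offset in range(0, len(region), PAGE):
        region[offset] = 1


def run_fork_test(size=32 * 1024 * 1024, pause=10.0):
    """Allocate, touch, fork, and idle; returns the child's wait status."""
    # Private anonymous memory: copy-on-write after fork, like malloc'd space.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    print("allocated, not yet touched", flush=True)
    time.sleep(pause)

    touch(region)
    print("touched", flush=True)
    time.sleep(pause)

    pid = os.fork()
    if pid == 0:
        # Child: do not touch the memory, just sit there and then leave.
        time.sleep(pause)
        os._exit(0)

    # Parent: likewise leaves the memory alone after the fork.
    print("forked", flush=True)
    time.sleep(pause)
    _, status = os.waitpid(pid, 0)
    region.close()
    return status
```

Calling `run_fork_test()` with the defaults gives three ten-second windows in which to sample the swap summary. Whether the reserved figure moves by 32MB at the fork is exactly the open question, so no expected numbers are claimed here.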
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Wed, 14 Jul 1999, John Nemeth wrote:

On Jul 15, 2:40am, Daniel C. Sobral wrote:
} Garance A Drosihn wrote:
} At 12:20 AM +0900 7/15/99, Daniel C. Sobral wrote:
} In which case the program that consumed all memory will be killed.
} The program killed is +NOT+ the one demanding memory, it's the one
} with most of it.
}
} But that isn't always the best process to have killed off...
}
} Sure it is. :-) Let's see...

This statement is absurd. Only a competent admin can decide which process can be killed. No arbitrary decision is going to be correct.

} It would be nice to have a way to indicate that, a la SIGDANGER.

How about assigning something like a class to each process, which gives the VM a hint about which processes should be killed first without much thinking, and which last (or never)? In other words, let's say class 10 means totally disposable, kill whenever you want, and class 1 means never try to kill me. Of course, most processes would get some default value, and the superuser could renice them to a more resistant class.

This way both sides of the discussion would be satisfied :-)

Andrzej Bialecki

// ab...@webgiro.com WebGiro AB, Sweden (http://www.webgiro.com)
// ---
// -- FreeBSD: The Power to Serve. http://www.freebsd.org
// --- Small Embedded FreeBSD: http://www.freebsd.org/~picobsd/
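To make the class idea above concrete, here is a small user-space sketch of how a victim might be chosen. Everything in it is hypothetical: the `Proc` record, the default class of 5, and the tie-break on size are assumptions made for illustration, not anything that exists in the FreeBSD kernel or that the proposal spells out.

```python
from dataclasses import dataclass

NEVER_KILL = 1      # class 1: "never try to kill me"
DISPOSABLE = 10     # class 10: "kill whenever you want"
DEFAULT_CLASS = 5   # assumed default; the mail only says "some default value"


@dataclass
class Proc:
    pid: int
    name: str
    size_kb: int
    kill_class: int = DEFAULT_CLASS


def pick_victim(procs):
    """Return the process to kill first, or None if all are protected.

    The most disposable class wins; within a class the largest process
    is chosen, mirroring the current "kill the biggest" behaviour.
    """
    candidates = [p for p in procs if p.kill_class > NEVER_KILL]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p.kill_class, p.size_kb))


procs = [
    Proc(1, "init", 400, NEVER_KILL),
    Proc(77, "database", 90000, 2),
    Proc(120, "sendmail", 3000),
    Proc(300, "netscape", 32000, DISPOSABLE),
    Proc(301, "xbiff", 800, DISPOSABLE),
]
victim = pick_victim(procs)
print(victim.name)
```

In this sketch the large database survives even though it holds the most memory, because its class makes it more resistant than the two disposable processes.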
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
:Before program start:
:total: 20000k bytes allocated + 4792k reserved = 24792k used, 191048k available
:
:After malloc, before touch:
:total: 18756k bytes allocated + 37500k reserved = 56256k used, 159580k available
:
:After malloc + touch:
:total: 52804k bytes allocated + 4852k reserved = 57656k used, 158184k available
:
:After fork:
:total: 52928k bytes allocated + 37644k reserved = 90572k used, 125264k available
:
:[there has been a little background activity, but the numbers speak for themselves]
:
:
:Daniel

Assuming the allocated field is not inclusive of real memory, what we have is swap reservation under solaris for clean pages, and allocation and assignment for dirty pages. The grand total will tell you the total VM potential for malloc'd space but does not appear to tell you how much swap is actually active - i.e. was written to and contains valid data.

It would be interesting to see if the stack segment is included in the reservation. Try setting the stack resource limit to 32m and run the same program, except without bothering to malloc() or touch anything. See if the stack segment is included in the reservation field. It would also be interesting to see how solaris deals with MAP_PRIVATE mmap's.

If this is correct, then solaris is using a VMSPACE = SWAPSPACE model. FreeBSD uses a VMSPACE = SWAPSPACE + REALMEM model.

-Matt
Matthew Dillon dil...@backplane.com
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
In article local.mail.freebsd-hackers/199907151825.laa11...@apollo.backplane.com you write:
> ::-s   Print summary information about total swap
> ::     space usage and availability:
> ::
> ::     allocated   The total amount of swap space
> ::                 (in 1024-byte blocks)
> ::                 currently allocated for use as
> ::                 backing store.
> ::
> ::     reserved    The total amount of swap space
> ::                 (in 1024-bytes blocks) not
> ::                 currently allocated, but
> ::                 claimed by memory mappings for
> ::                 possible future use.
> ::
> ::     used        The total amount of swap space
> ::                 (in 1024-byte blocks) that is
> ::                 either allocated or reserved.
> :--
> :soda
>
> It would be really easy to test this. Write a program that malloc's 32MB of space and touches it, then sleeps 10 seconds and forks, with both child and parent sleeping afterwords. ( the parent and the forked child should not touch the memory after the fork occurs ).
>
> Do a pstat -s before, after the initial touch, and after the fork. If you do not see the reserved swap space jump by 32MB after the fork, it isn't what you thought it was.

aladdin[5:32pm] prtconf
System Configuration: Sun Microsystems i86pc
Memory size: 128 Megabytes
aladdin[5:41pm] uname -a
SunOS aladdin 5.6 Generic_105182-14 i86pc i386

total: 67280k bytes allocated + 28668k reserved = 95948k used, 196460k avail
malloced 32MB...
total: 67320k bytes allocated + 61460k reserved = 128780k used, 163592k avail
touched...
total: 100084k bytes allocated + 28696k reserved = 128780k used, 163732k avail
forking...
total: 100092k bytes allocated + 61520k reserved = 161612k used, 130864k avail
touching again (parent)...
touching again (child)...
total: 132864k bytes allocated + 28748k reserved = 161612k used, 130760k avail
exiting...
exiting...
total: 67248k bytes allocated + 28700k reserved = 95948k used, 196448k avail

--
Jonathan
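The step-to-step changes in Jonathan's output are easier to see as differences. The sketch below (Python, for brevity; the helper names `parse` and `deltas` and the stage labels are made up here) parses the six `total:` lines exactly as posted and reports how the reserved and used figures move between stages.

```python
import re

LINE = re.compile(
    r"total: (\d+)k bytes allocated \+ (\d+)k reserved = (\d+)k used, (\d+)k avail"
)

# Stage label plus the summary line as posted for the SunOS 5.6 run.
SAMPLES = [
    ("start", "total: 67280k bytes allocated + 28668k reserved = 95948k used, 196460k avail"),
    ("malloced", "total: 67320k bytes allocated + 61460k reserved = 128780k used, 163592k avail"),
    ("touched", "total: 100084k bytes allocated + 28696k reserved = 128780k used, 163732k avail"),
    ("forked", "total: 100092k bytes allocated + 61520k reserved = 161612k used, 130864k avail"),
    ("touched again", "total: 132864k bytes allocated + 28748k reserved = 161612k used, 130760k avail"),
    ("exited", "total: 67248k bytes allocated + 28700k reserved = 95948k used, 196448k avail"),
]


def parse(line):
    """Return (allocated, reserved, used, avail) in KB from one summary line."""
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognised swap summary: {line!r}")
    return tuple(int(field) for field in match.groups())


def deltas(samples):
    """Map each stage to (change in reserved KB, change in used KB)."""
    parsed = [(label, parse(line)) for label, line in samples]
    result = {}
    for (_, prev), (label, cur) in zip(parsed, parsed[1:]):
        result[label] = (cur[1] - prev[1], cur[2] - prev[2])
    return result


changes = deltas(SAMPLES)
for label, (reserved, used) in changes.items():
    print(f"{label:14} reserved {reserved:+7}k   used {used:+7}k")
```

On these figures, reserved grows by roughly 32MB both at the malloc and again at the fork, and shrinks by about the same amount each time the memory is touched, while "used" only moves at the malloc, the fork, and the exit.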
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
If this is correct, then solaris is using a VMSPACE = SWAPSPACE model. FreeBSD uses a VMSPACE = SWAPSPACE + REALMEM model. AFAIK it has been stated quite explicitly by the Solaris folks that Solaris 2.x uses VMSPACE = SWAPSPACE + REALMEM. This is *different* from SunOS 4.1.x. Steinar Haug, Nethelp consulting, sth...@nethelp.no To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
Here is what I get from one of BEST's mail www proxy machines. ~dillon/br adds the object sizes together. 'swap' and 'default' objects refer to unbacked VM objects - and none of the processes running fork shared unbacked objects so we don't have to worry about that. The 'swap' designation means that at least one page in the object has been assigned swap. The default designation means that no pages have been assigned swap. The pages can be dirty or clean.

Typical /proc/PID/map output looks like this (taken from one of the sendmail processes). The lines I've marked are the ones being counted as unbacked/swap-backed VM. The rest are vnode-backed and not counted.

    0x1000     0x4b000      66   0 r-x COW vnode
    0x4b000    0x4e000       3   3 rwx COW vnode
    0x4e000    0x87000      53  43 rwx COW swap     <---
    0x87000    0x373000    738 738 rwx     default  <---
    0x2004b000 0x2005a000    2   0 r-x COW vnode
    0x2005a000 0x2005c000    2   0 rwx COW vnode
    0x2005c000 0x20065000    6   2 rwx COW swap     <---
    0x20068000 0x2006d000    3   0 r-x COW vnode
    0x2006d000 0x2006e000    1   1 rwx COW vnode
    0x2006e000 0x200cc000   70   0 r-x COW vnode
    0x200cc000 0x200d0000    4   4 rwx COW vnode
    0x200d0000 0x200e7000    8   6 rwx COW swap     <---
    0xefbde000 0xefbfe000   14  14 rwx COW swap     <---

    proxy1:/tmp# cat /proc/*/map | egrep 'swap|default' | ~dillon/br
    639168K

    proxy1:/tmp# pstat -s
    Device      1K-blocks     Used    Avail Capacity  Type
    /dev/sd0b      524288    12596   511628     2%    Interleaved

This machine has 256MB of ram of which around 200MB is in use; we will assume the entire 200MB is used by VM spaces for processes. It is an active machine with around 205 processes at the time of the test.

So. 200MB of ram + 12MB of swap = 212MB of actual storage being used out of 639MB of total swap-backable VM. About a factor of 3.2:1. Actual swap utilization is sitting at 2%.

If no overcommit were allowed, and assuming a VMSPACE = REALMEM + SWAP model, 200MB of ram would be active and 439MB worth of swap would be either allocated or reserved (though only 12MB would be actually written, that part doesn't change).

439MB of swap versus 12MB of swap. In that scenario, the 512MB of swap I assigned to this machine would be dangerously low.

-Matt
Matthew Dillon dil...@backplane.com
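The arithmetic in this message can be laid out step by step. The sketch below is only an illustration in Python using the rounded figures from the mail (639MB of swap-backable VM, 200MB of RAM in use, 12MB of swap written, a 512MB swap partition); the variable and function names are invented for the example.

```python
TOTAL_VM_MB = 639        # sum of 'swap' and 'default' objects (639168K, as rounded in the mail)
RAM_IN_USE_MB = 200      # assumed to be entirely process VM, as the mail does
SWAP_WRITTEN_MB = 12     # swap actually in use
SWAP_SIZE_KB = 524288    # /dev/sd0b
SWAP_USED_KB = 12596


def reserve_needed(total_vm_mb, ram_in_use_mb):
    """Swap that must be allocated or reserved under VMSPACE = REALMEM + SWAP."""
    return total_vm_mb - ram_in_use_mb


reserved_mb = reserve_needed(TOTAL_VM_MB, RAM_IN_USE_MB)
storage_in_use_mb = RAM_IN_USE_MB + SWAP_WRITTEN_MB
vm_to_storage = TOTAL_VM_MB / storage_in_use_mb
capacity_pct = 100 * SWAP_USED_KB / SWAP_SIZE_KB
share_of_partition = reserved_mb / (SWAP_SIZE_KB / 1024)

print(f"storage actually in use: {storage_in_use_mb}MB")
print(f"VM to storage ratio:     {vm_to_storage:.1f}:1")
print(f"swap capacity in use:    {capacity_pct:.1f}%")
print(f"reserve without overcommit: {reserved_mb}MB "
      f"({share_of_partition:.0%} of the swap partition)")
```

With these rounded inputs the ratio comes out close to 3:1, in the same range as the "about 3.2:1" quoted, and the 439MB reservation would take up most of the 512MB partition where 12MB is written today.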
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
And what I'm pretty sure the majority of the readers on this list want is for those of you who really think it's necessary to do it yourselves. What? Nobody who wants to disable the policy knows how to do it? Hmmm, I wonder whether that's significant... Sheldon, if you can't contribute something useful, then shut up. If I have to do it myself, I will. To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
Technical follow-up: Contrary to what I previously said, a number of tests reveal that Solaris, indeed, does not overcommit. All non-read only segments, and all malloc()ed memory is reserved upon exec() or fork(), and the reserved memory is not allowed to exceed the total memory. It makes extensive use of read only DATA segments, and has a NON_RESERVE mmap() flag. Though the foot firmly planted in my mouth ought to prevent me from saying anything else, I must say that it does explain a few things to me... -- Daniel C. Sobral(8-DCS) d...@newsguy.com d...@freebsd.org Would you like to go out with me? I'd love to. Oh, well, n... err... would you?... ahh... huh... what do I do next? To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: Replacement for grep(1) (part 2)
On Tue, 13 Jul 1999, Jon Ribbens wrote:

> Alfred Perlstein [EMAIL PROTECTED] wrote:
> > You're browsing with netscape and It hits about 32megs in size, you click on a multimedia object and netscape execs a helper app.
>
> vfork()
>
> > you also have to consider a program wishing to make sparse use of its address space, without overcommit it becomes impossible.
>
> So Don't Do That Then.

Overcommit can be used for many reasons. I use it to reserve a large linear address space to mmap alpha i/o spaces to which allows an efficient implementation of inx/outx in user mode:

  UID   PID  PPID CPU PRI NI      VSZ   RSS WCHAN  STAT TT        TIME COMMAND
    0 43655 43652   7   2  0 12616584 12456 select S    ??  1036:41.62 /usr/X11R6/bin/X -auth /usr/X11R6/lib/X11/xdm/authdir/A:0-w43652

The X server is using 12G of address space..

--
Doug Rabson                Mail: [EMAIL PROTECTED]
Nonlinear Systems Ltd.     Phone: +44 181 442 9037
Re: Replacement for grep(1) (part 2)
: Back on topic:
:
: Obviously you devote the most time to handling the most common
: and serious failure modes, but if someone else if willing to
: put in the work to handle nightmare cases, should you ignore or
: discard that work?

Of course not. But nobody in this thread is even close to doing any actual work and so far the two people I know who can (me and DG) aren't particularly interested. Instead they seem to want someone else to do the work based on what I consider to be entirely unsubstantiated supposition. Would you accept someone's unsupported and untested theories based almost entirely on a nightmare scenario to the exclusion of all other possible (and more likely) problems? I mean come on... read some of this stuff.

There are plenty of ways to solve these problems without making the declaration that the overcommit model is flawed beyond repair, and so far nobody has bothered to offer any counter-arguments to the resource management issues involved with actually *implementing* a non-overcommit model... every time I throw up hard numbers the only response I get is a shrug-off with no basis in fact or experience noted anywhere. In the real world, you can't shrug off those sorts of problems.

I'm the only one trying to run hard numbers on the problem. Certainly nobody else is. This is hardly something that would actually convince me of the efficacy of the model as applied to a UNIX kernel core. Instead, people are pulling out their favorite screwups and then blaming the overcommit model for all their troubles rather than looking for the more obvious answer: a misconfiguration or simply a lack of resources. Some don't even appear to *have* any trouble with the overcommit model, but argue against it anyway, basing their entire argument on the possibility that something might happen, again without bothering to calculate the probability or run any hard numbers.

The argument is shifting from embedded work to multi-user operations to *hostile* multi-user systems, with some people advocating that a non-overcommit model will magically solve all their woes in these very different scenarios, but who can't be bothered with actually finding a real-life scenario or using an experience to demonstrate their position.

It is all pretty much garbage. No wonder the NetBSD core broke up, if this is what they had to deal with 24 hours a day!

: Put more accurately - if someone wants to provide a different rope
: to permit people to write in a different defensive style, and it
: does not in any way impact your use of the system: More power to them.
:
: David/absolute

As I've said on several occasions now, there is nothing in the current *BSD design that prevents an embedded designer from implementing his or her own memory management subsystem to support the memory requirements of their programs. The current UNIX out-of-memory kill scenario only occurs as a last resort and it is very easy for an embedded system to avoid. It should be considered nothing more than a watchdog for catastrophic failure.

To implement the simplest non-overcommit system in the *BSD kernel - returning NULL on an allocation failure due to non-availability of backing store - is virtually useless because it is just as arbitrary as killing processes. It might help a handful of people out of hundreds of thousands do something but they would do a lot better with a watchdog script. It makes no sense to try to build it into the kernel.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: Replacement for grep(1) (part 2)
Date: Tue, 13 Jul 1999 14:14:52 -0700 (PDT)
From: Matthew Dillon [EMAIL PROTECTED]
Message-ID: [EMAIL PROTECTED]

| If you don't have the disk necessary for a standard overcommit model to
| work, you definitely do not have the disk necessary for a non-overcommit
| model to work.

This is based upon your somewhat strange definition of "work".

I assure you that I have run many systems which don't use overcommit, on which I quite frequently run into "out of VM" conditions, and which, I can assure you, work just fine.

When they're getting close to running out of VM, the system is approaching paging death, which is as you'd expect (they're overloaded). That is, adding more VM (more swap space) would be counterproductive.

When this stage is reached, the absolute prime requirement of "working" is met though - applications that request memory get that request refused, but absolutely no processes get ungracefully killed.

In a sense, no-one really cares what the page allocation policy is; the argument here isn't about overcommit, or the very conservative early BSD version, or any of the intermediate possibilities - all people really care about is what happens when resources are exhausted. What happens until then no-one really cares about (there are some issues of how much space you need to dedicate to paging - most people would probably prefer not to use the early BSD method, where you needed at least as much paging space as RAM, or some of your RAM would simply be left idle).

But one absolute requirement for any system that wants to consider itself to be a reliable, usable, general purpose system is that it never simply randomly kill processes of its own volition.

If you're happy for random processes to be killed on your workstation, that's fine, I'm not. I run processes which are intended to do specific work; they're not intended to simply go away just because memory is running low (there are other processes, stupid perl scripts and such, which will quite quickly die when a mem request is refused, and return resources, so the processes that matter, which can be very large, can keep on processing).

I have no doubt but that you can dream up scenarios where you pander to the laziness of programmers and allow huge VM spaces with little of them actually allocated anywhere (or ever touched); then you would indeed need monstrous amounts of paging space, most of which is never actually used for anything - personally I prefer to have the programmers think a little more about the memory footprint of their data structures. Not only does this reduce the VM footprint, it will also usually vastly improve the paging characteristics. Most applications which simply scatter data through a huge VM space simply stop being usable as soon as their RSS exceeds available physical memory - that is, if they start paging, they die (become comatose might be a better description). A little intelligent thought as to how to actually make use of the mem resources can make a huge difference.

There was an earlier comment on this thread (which no longer has the slightest thing to do with the new version of grep...) which mentioned fortran programs. People, fortran (and huge fortran programs) has been around much longer than VM has been. There are lots of techniques for fortran programmers to use to make use of restricted memory sizes; they've been managing that for decades.

kre
Re: Replacement for grep(1) (part 2)
Maybe if I call the sysctl "vm.crashmenow". No, that will just make more people actually try it. It might be doable as a compile-time option, since you wouldn't be able to run anything approaching standard on such a system anyway. I don't see much use for it myself. As I said before, there are easier ways to manage memory that are not quite as arbitrary as simply refusing a potential overcommit. Perhaps it could be an additional flag to mmap; in this way people wishing to run an overcommitted system could do so, but those writing programs which must not overcommit for certain memory allocations could ensure they did not do so. Regards, Niall To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
Noriyuki Soda wrote: Running out of swap can easily be done with normal user privileges. A non-overcommitting system can run an important application on a machine that also has normal users, because it never loses critical data, even if a user on the system makes a mistake. (The application might stop, but it never loses data.) A 4.4BSD-derived system cannot do this, and has to use a different machine for such applications. Incorrect. We can set *limits* on the users, so they won't be able to crash the system. -- Daniel C. Sobral(8-DCS) [EMAIL PROTECTED] [EMAIL PROTECTED] "Would you like to go out with me?" "I'd love to." "Oh, well, n... err... would you?... ahh... huh... what do I do next?" To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
Doug Rabson [EMAIL PROTECTED] writes: Overcommit can be used for many reasons. I use it to reserve a large linear address space to mmap alpha i/o spaces [...] Overcommit can be used for many reasons, but unless you've misdescribed what you're doing, _that's not one of them_. The mapped I/O pages need no backing store to be allocated for them by the VM system. They're backed by hardware. And if you have 'placeholder' pages (I note that you didn't say you mmap all of alpha i/o space, just reserve a large linear address space in which to mmap it), then it should be possible to map them in such a way (e.g. read-only ZFOD) in which they wouldn't count against backing store requirements, either. cgd -- Chris Demetriou - [EMAIL PROTECTED] - http://www.netbsd.org/People/Pages/cgd.html Disclaimer: Not speaking for NetBSD, just expressing my own opinion. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Thu, 15 Jul 1999, Daniel C. Sobral wrote: "Charles M. Hannum" wrote: That's also objectively false. Most such environments I've had experience with are, in fact, multi-user systems. As you've pointed out yourself, there is no combination of resource limits and whatnot that are guaranteed to prevent `crashing' a multi-user system due to overcommit. My simulation should not be axed because of a bug in someone else's program. (This is also not hypothetical. There was a bug in one version of bash that caused it to consume all the memory it could and then fall over.) In which case the program that consumed all memory will be killed. The program killed is +NOT+ the one demanding memory, it's the one with most of it. So why don't we do something else: when we're down to a certain amount of backing store, start collecting statistics. When we're out, we check the statistics and find what process has been allocating most of it. We kill that process. -- Daniel C. Sobral (8-DCS) [EMAIL PROTECTED] [EMAIL PROTECTED] "Would you like to go out with me?" "I'd love to." "Oh, well, n... err... would you?... ahh... huh... what do I do next?" Brian Fundakowski Feldman _ __ ___ ___ ___ ___ [EMAIL PROTECTED] _ __ ___ | _ ) __| \ FreeBSD: The Power to Serve!_ __ | _ \._ \ |) | http://www.FreeBSD.org/ _ |___/___/___/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
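The selection policy proposed in the message above (start collecting statistics once backing store is low, then kill whichever process has allocated the most) can be sketched roughly as follows. This is only an illustration of the idea, not code from any BSD kernel: struct proc_stat, its fields, and pick_victim() are all hypothetical names, and a real implementation would have to gather the counters inside the VM system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process record kept only while backing store is low. */
struct proc_stat {
    int    pid;
    size_t pages_since_low;   /* pages allocated since the low-water mark */
};

/*
 * Return the index of the process that allocated the most pages since
 * statistics collection began, or -1 if nobody allocated anything.
 */
int pick_victim(const struct proc_stat *procs, size_t n)
{
    int victim = -1;
    size_t worst = 0;

    assert(procs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (procs[i].pages_since_low > worst) {
            worst = procs[i].pages_since_low;
            victim = (int)i;
        }
    }
    return victim;
}
```

Note that this picks the most active recent allocator rather than the largest resident process, which is the distinction the proposal is trying to draw; it is still a heuristic and can pick a process the admin would rather keep.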
Re: Replacement for grep(1) (part 2)
"Chris G. Demetriou" wrote: ... Overcommit avoidance may not be useful for your particular uses of these UNIX-like systems. However, if you think that it's not useful to anybody who uses them (or that people who think it's useful are deluding themselves 8-), then you're sorely mistaken and have a ... very wrong-headed attitude about why people find such features useful. Have you actually tried a system which can work in either overcommit and non-overcommit modes? What it comes down to is that if you have enough memory to run in non-overcommit, you have enough memory to run in overcommit. Setting limits is complex, but it is no more complex than correctly sizing the memory in a non-overcommit system (this is demonstrable). -- Daniel C. Sobral(8-DCS) [EMAIL PROTECTED] [EMAIL PROTECTED] "Would you like to go out with me?" "I'd love to." "Oh, well, n... err... would you?... ahh... huh... what do I do next?" To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
Matthew Dillon wrote: : :Heh, really? The camera ships w/ Apache running on it. : :-- Jason R. Thorpe [EMAIL PROTECTED] They obviously have a lot of memory to play with, then. Or they are crazy. Writing a web server is fairly easy to do. I've written several, including the one that BEST runs on most of its servers. For the record, professional digital cameras go into the $100K range, so I'd be expecting it not only to run Apache, but also to come with Doom. :-) -- Daniel C. Sobral(8-DCS) [EMAIL PROTECTED] [EMAIL PROTECTED] "Would you like to go out with me?" "I'd love to." "Oh, well, n... err... would you?... ahh... huh... what do I do next?" To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
Jason Thorpe wrote: There is a lot of hidden 'potential' VM that you haven't considered. For example, if the resource limit for a process's stack is 8MB, then the process can potentially allocate 8MB of stack even though it may actually only allocate 32K of stack. When a process forks, the child ...um, so, make the code that deals with faulting in the stack a bit smarter. Uh? Like what? Like overcommitting, for instance? The beauty of overcommitting is that either you do it or you don't. :-) -- Daniel C. Sobral(8-DCS) [EMAIL PROTECTED] [EMAIL PROTECTED] "Would you like to go out with me?" "I'd love to." "Oh, well, n... err... would you?... ahh... huh... what do I do next?" To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
At 12:00 PM -0400 7/14/99, Brian F. Feldman wrote: So why don't we do something else: when we're down to a certain amount of backing store, start collecting statistics. When we're out, we check the statistics and find what process has been allocating most of it. We kill that process. Not that I'm really commenting on the above idea (although it does sound fine to me), this reminds me about an earlier thread. Is there any interest in us (BSD's) having a SIGDANGER signal like some other OS's do? That way, key processes (like sshd) could at least make it less likely that THEY are the process which is killed. --- Garance Alistair Drosehn = [EMAIL PROTECTED] Senior Systems Programmer or [EMAIL PROTECTED] Rensselaer Polytechnic Institute To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
"Brian F. Feldman" wrote: In which case the program that consumed all memory will be killed. The program killed is +NOT+ the one demanding memory, it's the one with most of it. So why don't we do something else: when we're down to a certain amount of backing store, start collecting statistics. When we're out, we check the statistics and find what process has been allocating most of it. We kill that process. Because it's not only equally arbitrary but also takes more resources to implement? -- Daniel C. Sobral(8-DCS) [EMAIL PROTECTED] [EMAIL PROTECTED] "Would you like to go out with me?" "I'd love to." "Oh, well, n... err... would you?... ahh... huh... what do I do next?" To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
If you wanted to fix this, you could add a patch to malloc that touched every page that it handed to the application. (and trapped sig11s) On Wed, 14 Jul 1999 [EMAIL PROTECTED] wrote: I mean, jeeze, the reservation for the program stack alone would eat up all your available swap space! What is a reasonable stack size? The system defaults to 8MB. Do we rewrite every program to specify its own stack size? How do we account for architectural differences? The alternative is to rewrite every program that assumes the semantics of malloc() are being followed. The problem I have as an applications writer is that I tend to believe malloc. To pick a specific example, our IMAP client takes steps to ensure it won't run out of memory in critical sections. We maintain a "rainy day" pool block of memory. If we receive a NULL from malloc, we 1) free up whatever memory we can in other parts of the client (possibly using the rainy day pool to stage data out to disk), and 2) if necessary, reduce the size of the rainy day pool. This whole design is predicated on malloc() telling the truth. If instead it gives us a bogus block of memory, then seg faults when we try to use it, the best we can do is try to shut down without losing any of the users mail (and in fact we don't even do that, since there are just too many places where this can happen in third-party libraries that we aren't willing to rewrite). Sending us a kill signal is even worse. (And extremely unfair, since we take pains to not waste memory in the first place.) Has anyone analyzed all those applications people talk about that show huge allocation footprints but don't actually use the memory? That represents the code that needs to be fixed. Breaking malloc() is not a suitable response IMO. As a data point, we routinely disable overcommit on our SGI machines and it doesn't hurt us one bit. And we aren't allocating gigabytes of swap space, either. 
--lyndon To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
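The "rainy day" pool described in the message above might look something like the sketch below. The actual client code is not shown in the thread, so every name here (rainy_day_init, rainy_day_release, careful_malloc) is an assumption, and the sketch releases the whole reserve at once where the message describes shrinking it in steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical emergency reserve, allocated while memory is still plentiful. */
static void  *rainy_day_pool = NULL;
static size_t rainy_day_size = 0;

/* Reserve the emergency block up front. Returns 0 on success, -1 on failure. */
int rainy_day_init(size_t size)
{
    assert(size > 0);
    rainy_day_pool = malloc(size);
    if (rainy_day_pool == NULL)
        return -1;
    rainy_day_size = size;
    return 0;
}

/* Hand the reserve back to the allocator. Returns the bytes released. */
size_t rainy_day_release(void)
{
    size_t released = rainy_day_size;

    free(rainy_day_pool);
    rainy_day_pool = NULL;
    rainy_day_size = 0;
    return released;
}

/* malloc wrapper: on failure, give up the reserve and retry once. */
void *careful_malloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL && rainy_day_pool != NULL) {
        rainy_day_release();
        p = malloc(size);
    }
    return p;
}
```

The whole scheme only works if malloc() actually returns NULL when memory is exhausted, which is the point the message makes: under overcommit the failure shows up later, as a fault or a kill, and the retry path never runs.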
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
You don't seem to understand that a runaway process/one designed just to take up memory will be much more active than your little IMAP servers, and be the one killed, if this scheme were used. Brian Fundakowski Feldman _ __ ___ ___ ___ ___ [EMAIL PROTECTED] _ __ ___ | _ ) __| \ FreeBSD: The Power to Serve!_ __ | _ \._ \ |) | http://www.FreeBSD.org/ _ |___/___/___/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
You don't seem to understand that a runaway process/one designed just to take up memory will be much more active than your little IMAP servers, and be the one killed, if this scheme were used. No, what I don't understand is how the current behaviour can tell that my temporary and *valid* need for a large chunk of memory does not make me a runaway process, and therefore subject to death. --lyndon To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
On 14 Jul 1999, Chris G. Demetriou wrote: Doug Rabson [EMAIL PROTECTED] writes: Overcommit can be used for many reasons. I use it to reserve a large linear address space to mmap alpha i/o spaces [...] Overcommit can be used for many reasons, but unless you've misdescribed what you're doing, _that's not one of them_. The mapped I/O pages need no backing store to be allocated for them by the VM system. They're backed by hardware. And if you have 'placeholder' pages (I note that you didn't say you mmap all of alpha i/o space, just reserve a large linear address space in which to mmap it), then it should be possible to map them in such a way (e.g. read-only ZFOD) in which they wouldn't count against backing store requirements, either. I certainly don't need or want backing store for these pages. The original reserved region is never touched without first mapping device pages onto it. -- Doug Rabson Mail: [EMAIL PROTECTED] Nonlinear Systems Ltd. Phone: +44 181 442 9037 To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
On Thu, 15 Jul 1999, Daniel C. Sobral wrote: For the record, professional digital cameras go into the $100K range, so I'd be expecting it not only to run Apache, but also to come with Doom. :-) Well you have 16MB RAM, 32MB flash memory, a network interface, other bits and NetBSD for ~ $1600. Find yourself a remote display and fire up your compiler :) http://www.brains.co.jp/mmeye/index-e.html David/absolute -=- "Just adding to the wrinkles on his deathly frown" -=- To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
"John" == John Nemeth [EMAIL PROTECTED] writes: John On one system I administrate, the largest process is typically John rpc.nisd (the NIS+ server daemon). Killing that process would be a John bad thing (TM). You're talking about killing random processes. John This is no way to run a system. It is not possible for any John arbitrary decision to always hit the correct process. That is a John decision that must be made by a competent admin. This is the John biggest argument against overcommit: there is no way to gracefully John recover from an out of memory situation, and that makes for an John unreliable system. No, I don't agree. This is the biggest argument against solving the overcommit situation with SIGKILL. I have no problem with overcommit as a concept, I have a problem with being unable to keep my possibly big processes (X, rpc.nisd, etc. depending on circumstances) from being victims. ] Train travel features AC outlets with no take-off restrictions| firewalls [ ] Michael Richardson, Sandelman Software Works, Ottawa, ON|net architect[ ] [EMAIL PROTECTED] http://www.sandelman.ottawa.on.ca/ |device driver[ ] panic("Just another NetBSD/notebook using, kernel hacking, security guy"); [ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Jul 15, 2:40am, "Daniel C. Sobral" wrote: } Garance A Drosihn wrote: } At 12:20 AM +0900 7/15/99, Daniel C. Sobral wrote: } In which case the program that consumed all memory will be killed. } The program killed is +NOT+ the one demanding memory, it's the one } with most of it. } } But that isn't always the best process to have killed off... } } Sure it is. :-) Let's see... This statement is absurd. Only a competent admin can decide which process can be killed. No arbitrary decision is going to be correct. } It would be nice to have a way to indicate that, a la SIGDANGER. } } Ok, everybody is avoiding this, so I'll comment. Yes, this would be The reason I've ignored it is that SIGDANGER is a hack on top of a very bad hack. } interesting, and a good implementation will very probably be } committed. *BUT*, this is not as useful as it seems. Since the } correct solution is buy more memory/increase swap (correct solution } for our target markets, anyway), there is little incentive to } implement it. In case you hadn't noticed, this debate is cross-posted to NetBSD. NetBSD's target market isn't the same as FreeBSD's target market. This answer is NOT the correct solution for NetBSD's target market. Heck, except for one rather vocal person, FreeBSD's target market may not consider it to be the correct solution either. I most certainly do not consider it to be correct, and I admin a lot of mission critical servers. }-- End of excerpt from "Daniel C. Sobral" To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
"Ben" == Ben Rosengart [EMAIL PROTECTED] writes: Ben On Wed, 14 Jul 1999, John Nemeth wrote: On one system I administrate, the largest process is typically rpc.nisd (the NIS+ server daemon). Killing that process would be a bad thing (TM). You're talking about killing random processes. This is no way to run a system. It is not possible for any arbitrary decision to always hit the correct process. That is a decision that must be made by a competent admin. This is the biggest argument against overcommit: there is no way to gracefully recover from an out of memory situation, and that makes for an unreliable system. Ben $DEITY on a pogo stick, how many times do we have to hear the same Ben hypothetical argument? Ben Tell me, Mr. Nemeth, has this ever happened to you? Have you ever Ben come *close*? Uh, since we don't run overcommit, the answer is specifically *NO*. We have never had lack of swap space randomly kill one of our processes. This is good, and this is the way we want to keep it. I have had it happen on other systems. (Solaris, AIX) It was very mystifying to diagnose. Sure, the systems were misconfigured for what we were trying to do, but if I wanted build a custom system for every application well... I'd be running NT. ] Train travel features AC outlets with no take-off restrictions| firewalls [ ] Michael Richardson, Sandelman Software Works, Ottawa, ON|net architect[ ] [EMAIL PROTECTED] http://www.sandelman.ottawa.on.ca/ |device driver[ ] panic("Just another NetBSD/notebook using, kernel hacking, security guy"); [ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
Date:Thu, 15 Jul 1999 00:53:17 +0900 From:"Daniel C. Sobral" [EMAIL PROTECTED] Message-ID: [EMAIL PROTECTED] | Would you care to name such systems? munnari was one (the system of the From: header, even though this mail isn't actually going anywhere near it). I will describe it a bit lower down. | And, btw, a system consuming | all memory is *not* necessarily approaching paging death. No, of course not, though I didn't say all memory, I said all VM. And while it is possible to have all VM consumed, and no paging activity at all, that would tend to indicate insufficient VM allocated (reaching an artificial barrier). | More | likely, it is just storing a lot of data in the swap which will | never be used (which is the whole point of overcommit in first | place), and, thus, never paged in. The systems I describe were not using overcommit, further, I wouldn't imagine that a system storing anything to swap would be overcommiting - as I understand the term, overcommit only relates to allocating VM resources which aren't backed by anything physical at all ("here's all this address space you can play in if you like, but you had better not actually do that, because if you do it won't work"). Either applied to one process, as that wording suggests, or aggregated over the whole system. If a process was (for some stupid reason) loading a whole bunch of data into the swap space, that would be committed VM, and you have to have the resources to cope with it. Now to munnari. It no longer runs quite like this, but munnari is an alpha, 128MB, runs digital unix (not in overcommit mode, either is possible there). At the time of which I speak it ran two principal applications of note, innd with a VM footprint about 100MB, and named, with a memory footprint (at the time) of about 90MB (as it is now, it no longer runs innd, but its named has grown to 120MB). 
It also ran a bunch of small stuff (sendmail, typically 1 or 2 instances, around 3MB each), ftpd (smaller, most often 0 or 1, sometimes 3 or 4,) and the occasional shell (a few hundreds of MB) plus init getty cron syslog and all that associated noise with mem requirements approaching 0. That's fine. Well, not really fine, innd and named would fight each other all day for who had how much of the real memory, and who was relegated to swap, of which there was enough for all this to fit, but not a lot more than that (enough for one of them to fork when it needed to, that's all - not both at once, and yes, overcommit would have allowed both at once, but that was not an aim). Then, because it was running innd, it was also running the perl script that summarises the log file, that could grow to 30MB, maybe more. And because it is running sendmail, every now and then you get the typical sendmail huge queue syndrome (at least for old sendmails, which this was), where you get a dead site, a large queue of processes, and a bunch of sendmails running the queue, spending most of their time hung on connection attempts that aren't working, and gradually growing bigger (maybe 8 or 10 processes at 15MB each). Somewhere amongst all of this swap would run out, and a good thing too, as by this time the system really would be paging itself to oblivion. Note that all this (large) VM I have described was filled with real data (except for the odd times when innd or named had just forked), none of it could be overcommitted and just ignored. Whatever policy was in place, the physical VM resources would have run out. Now let's look at what happens with the two methods. With all VM backed by real mem or swap space, processes go about allocating memory - when there is no more left, the allocations start failing. If the process is perl, it just collapses in a heap, and the log file summary doesn't get made that day. So sad... 
If it's sendmail, it issues "OS error, temporary failure" type responses, saves its queue files, and exits. A later sendmail will deliver those messages, no harm. If it's a shell, who knows (I forget what the shells do, I think most just keep trying, at least if interactive), but they consume mem at such a slow rate it doesn't matter - fork() would typically fail though, so no new processes could get started. innd would just pause, and wait till a bit later when mem might be available again (those perls and sendmails all gone away). named just the same (at least the named munnari ran). They're the two processes munnari was supposed to be running - those two don't just die. Now, with overcommit mode, we get an extra 30 seconds of life, because no doubt there are a few pages floating around that have been allocated to some process, but nothing has bothered to write into yet. An extra 30 seconds if we're lucky (except if we followed the advice given here earlier which would indicate that only 1/8 the amount of swap space would be needed, in which case these processes would never have gotten started in
Re: Replacement for grep(1) (part 2)
:Now let's look at what happens with the two methods. : :With all VM backed by real mem or swap space, processes go about allocating :memory - when there is no more left, the allocations start failing. :If the process is perl, it just collapses in a heap, and the log file :summary doesn't get made that day. So sad... If it's sendmail, it :issues "OS error, temporary failure" type responses, saves its queue files, :and exits. A later sendmail will deliver those messages, no harm. :If it's a shell, who knows (I forget what the shells do, I think most just :keep trying, at least if interactive), but they consume mem at such a slow :rate it doesn't matter - fork() would typically fail though, so no new :processes could get started. innd would just pause, and wait till a :bit later when mem might be available again (those perls and sendmails :all gone away). named just the same (at least the named munnari ran). :They're the two processes munnari was supposed to be running - those two :don't just die. Which means that if one of those two processes happens to be the one primarily responsible for running the machine out of VM, memory resources will never be released and now you can't even login! Not only that, but if you are running a news subsystem, it is actually *worse* if the news process bogs down and gets behind than it is for the news process to simply die and alert someone. When you are pushing news, you cannot afford to get behind. Also, your named is badly misconfigured if it grows to 130MB. We never allow ours to grow past 30MB. Since the machine is basically in an unworking state anyway, and since you can now no longer login, I don't quite see why you are happy that those two processes are still running. From my standpoint, the machine is badly broken and needs to be rebooted and then fixed so the problems do not reoccur, and I would be much happier if I could log into the beast to get that done than to have to hit the reset button. 
:Now, with overcommit mode, we get an extra 30 seconds of life, because :no doubt there are a few pages floating around that have been allocated :to some process, but nothing has bothered to write into yet. An extra 30 :... garbage removed ... :Sure it would get lots of VM back again, but the system would no longer :have been doing what it was supposed to be doing. Adding more swap space The machine isn't doing what it is supposed to be doing in either case once it has run out of VM. Except in the first case you think you should be happy because it didn't kill the news process, when in fact you ought to be trying to figure out why the thing ran out of VM in the first place and then fix it so it never happens again. To me, this whole scenario sounds like a badly configured machine which the sysop isn't willing to fix. I feel sorry for the poor company who hired that sysop! :would be easy, but the wrong thing to do, that would just have allowed :the system to page itself to death, thrashing into eternity - having :processes go away is the only solution to this kind of problem. Except :it needs to be the right processes, and "right" does not equal "big", :nor any other criteria the kernel could possibly figure out for itself. : :kre If you consider this a critical problem, then the only acceptable solution is to write a watchdog script that monitors swap utilization and kills the correct processes if swap starts to get low. If you wait until swap actually runs out, you've already lost because too many things are likely to break in a general purpose computing environment. Of course I suppose you could advocate that programs must be written 'properly' to handle the case... well, more power to you, but in a general computing environment you are running dozens if not hundreds of third party applications and fixing them all is a pipe dream. 
It seems to me that you are willing to blame the operating system for a situation that is really not the OS's fault, and that you are not willing to sit down and spend the 10 minutes necessary writing a simple watchdog script. I don't bother to write watchdog scripts to check for swap, because my machines DO NOT RUN OUT OF SWAP. If your machines do, then maybe you should consider writing the watchdog script. Personally, I think you would get better reliability by fixing your systems. You are blaming what is essentially a last-resort effort by the kernel for not being nice to your processes. Well Duh! It's a last-resort mechanism, it isn't supposed to be nice. Maybe you shouldn't be depending on last resort mechanisms to keep your machines running. -Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail
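The watchdog suggested in the message above (watch swap utilization and kill chosen processes before swap actually runs out) could be sketched along these lines. It is written in C rather than as a script to match the rest of the thread, and everything in it is an assumption for illustration: the function names, the percentage threshold, and the idea of an admin-supplied list of expendable pids. How the swap figures are obtained is left out because it is system specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/* True when swap usage has reached threshold_pct percent of the total. */
int swap_is_low(unsigned long used_kb, unsigned long total_kb,
                unsigned threshold_pct)
{
    assert(threshold_pct <= 100);
    if (total_kb == 0)
        return 0;   /* no swap configured: nothing to watch */
    return (unsigned long long)used_kb * 100 >=
           (unsigned long long)total_kb * threshold_pct;
}

/*
 * One pass of the watchdog: if swap is low, send SIGTERM to every pid
 * on the admin-supplied "expendable" list. Returns how many processes
 * were successfully signalled.
 */
size_t watchdog_step(unsigned long used_kb, unsigned long total_kb,
                     unsigned threshold_pct,
                     const pid_t *expendable, size_t n)
{
    size_t signalled = 0;

    assert(expendable != NULL || n == 0);
    if (!swap_is_low(used_kb, total_kb, threshold_pct))
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (kill(expendable[i], SIGTERM) == 0)
            signalled++;
    }
    return signalled;
}
```

The design point is the one made in the message: the choice of which processes are expendable comes from the admin, and the action is taken while there is still enough swap left for the rest of the system to keep working.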
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
On Tue, 13 Jul 1999 23:18:58 -0400 (EDT) John Baldwin [EMAIL PROTECTED] wrote: What does that have to do with overcommit? I student administrate an undergrad CS lab at a university, and when students' programs misbehaved, they generate a fault and are killed. The only machines that reboot on us without being explicitly told to are the NT ones, and yes we run FreeBSD. What does it have to do with overcommit? Everything in the world! If you have a lot of users, all of which have buggy programs which eat a lot of memory, per-user swap quotas don't necessarily save your butt. And maybe the individual programs didn't encounter their resource limits. ...but the sheer number of these runaway things caused the overcommit to be a problem. If malloc() or whatever had actually returned NULL at the right time (i.e. as backing store was about to become overcommitted), then these runaway processes would have stopped running away (they would have gotten a SIGSEGV and died). Anyhow, my "lame undergrads" example comes from a time when PCs weren't really powerful enough for the job (or something; anyhow, we didn't have any in the department :-). My example is from a Sequent Balance (16 ns32032 processors, 64M RAM [I think; been a while], 4.2BSD variant). -- Jason R. Thorpe [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
Also, your named is badly misconfigured if it grows to 130MB. We never allow ours to grow past 30MB.

How do you know what kind of name server configuration kre is running? Here's an example of a name server running *non-recursive*, serving 11.500 zones:

  PID USERNAME PRI NICE  SIZE   RES STATE    TIME   WCPU    CPU COMMAND
27162 root       2    0   70M   57M sleep  271:01  3.27%  3.27% named

Are you saying that such configurations should be illegal? Steinar Haug, Nethelp consulting, [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
: : Also, your named is badly misconfigured if it grows to 130MB. We never : allow ours to grow past 30MB. : :How do you know what kind of name server configuration kre is running? :Here's an example of a name server running *non-recursive*, serving :11.500 zones: : : PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND :27162 root 2 0 70M 57M sleep 271:01 3.27% 3.27% named : :Are you saying that such configurations should be illegal? : :Steinar Haug, Nethelp consulting, [EMAIL PROTECTED] I assumed that since the guy said that his named GREW, that he was running a recursive/caching named. Obviously if you are running a non-recursive named the static size will depend on the zones you are serving. Duh! It is not generally beneficial to allow a caching named to exceed 30MB or so on a system that is doing other things. If the system starts to page (which this person's system is obviously doing), then it is doubly a bad idea to allow a named to grow that large. -Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
:On Tue, 13 Jul 1999 23:18:58 -0400 (EDT) : John Baldwin [EMAIL PROTECTED] wrote: : : What does that have to do with overcommit? I student administrate an undergrad : CS lab at a university, and when students' programs misbehaved, they generate a : fault and are killed. The only machines that reboot on us without being : explicitly told to are the NT ones, and yes we run FreeBSD. : :What does it have to do with overcommit? Everything in the world! : :If you have a lot of users, all of which have buggy programs which eat :a lot of memory, per-user swap quotas don't necessarily save your butt. If every single one of your users is trying to crash your machine daily, maybe you should consider throwing them off the system and finding users that are less hostile. This conversation is getting silly. Do you actually believe that an operating system can magically protect itself 100% from armloads of hostile users? Give me a break. You people are crazy. If you have something worthwhile to say I'll listen, but these "the sky is falling!" arguments are idiotic. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Replacement for grep(1) (part 2)
On Wed, 14 Jul 1999 12:43:07 + Niall Smart [EMAIL PROTECTED] wrote:

Perhaps it could be an additional flag to mmap. That way, people wishing to run an overcommitted system could do so, while those writing programs which must not overcommit for certain memory allocations could ensure they did not.

This has already been mentioned. SVR4 has MAP_NORESERVE specifically for this purpose.

-- Jason R. Thorpe [EMAIL PROTECTED]
Re: Replacement for grep(1) (part 2)
On Thu, 15 Jul 1999 01:52:11 +0900 "Daniel C. Sobral" [EMAIL PROTECTED] wrote:

...um, so, make the code that deals with faulting in the stack a bit smarter.

Uh? Like what? Like overcommitting, for instance? The beauty of overcommitting is that either you do it or you don't. :-)

One option is to special-case overcommit the stack. Another is to set the default stack limits to something more reasonable on a system where overcommit is disabled.

-- Jason R. Thorpe [EMAIL PROTECTED]
Re: Replacement for grep(1) (part 2)
On Thu, 15 Jul 1999 01:59:12 +0900 "Daniel C. Sobral" [EMAIL PROTECTED] wrote:

That's why you make it a switch. No, really, you *can* just make it a switch.

So, enlighten me, please... how do you switch it in NetBSD?

When the code to do it is implemented (not that hard, really, and it is in the list of things to do with UVM), a sysctl will enable/disable overcommit checking. There would be like 4 or 5 places in the code where this boolean switch would have to be tested.

-- Jason R. Thorpe [EMAIL PROTECTED]
Re: Swap subsystem overhead (was Re: Replacement for grep(1) (part 2))
On Tue, Jul 13, 1999 at 05:12:30PM -0700, Matthew Dillon wrote:

Ok, I will be more specific. Under FreeBSD-STABLE *AND* FreeBSD-CURRENT, FreeBSD allocates metadata structures that scale to the amount of swap space assigned to the system. However, it is not *precisely* the amount of swap space.

snip

Under FreeBSD-stable, just look under "VM pgdata" to see how much memory is being wired to support the swap subsystem. This usage covers both the fixed and dynamic allocations.

OK, at the risk of reawakening that particular thread -- if people are a little uneasy about Matt committing to src/*, how about letting him commit to doc/* instead?

Matt -- some of these messages of yours could probably turn into great articles for DaemonNews, or the FreeBSD 'zine, if you were that way inclined. . .

N
--
[intentional self-reference] can be easily accommodated using a blessed, non-self-referential dummy head-node whose own object destructor severs the links.
    -- Tom Christiansen in [EMAIL PROTECTED]
Re: Replacement for grep(1) (part 2)
: :One option is to special-case overcommit the stack. Another is to
:set the default stack limits to something more reasonable on a system
:where overcommit is disabled.
:
:-- Jason R. Thorpe [EMAIL PROTECTED]

Try setting all the resource limits to something reasonable on general principles. It would work as well in an overcommit system as it would in a non-overcommit system.

-Matt
Matthew Dillon [EMAIL PROTECTED]
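The "set the limits to something reasonable" advice can be sketched as follows. This is only an illustration in Python using the standard resource module; an administrator would more likely use the shell's limit/ulimit builtins or login.conf. The helper name cap_soft_limit and the 64 MiB figure are assumptions made for the example, not values from the thread.

```python
import resource


def cap_soft_limit(which, wanted):
    """Lower the soft limit for `which` to at most `wanted`.

    The limit is never raised: the result is the smallest of `wanted`,
    the current soft limit, and the hard limit. Returns the (soft, hard)
    pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(which)
    # An unprivileged process cannot set a soft limit above the hard limit.
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    # Keep an existing soft limit if it is already stricter.
    if soft != resource.RLIM_INFINITY:
        wanted = min(wanted, soft)
    resource.setrlimit(which, (wanted, hard))
    return resource.getrlimit(which)


if __name__ == "__main__":
    # Hypothetical policy: no process started from here gets more than
    # 64 MiB of stack, whether or not the kernel overcommits.
    print(cap_soft_limit(resource.RLIMIT_STACK, 64 * 1024 * 1024))
```

Because only the soft limit is lowered, a process that genuinely needs more can still raise it again up to the hard limit.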
Re: Replacement for grep(1) (part 2)
[ Trimmed CC list a bit ]

:* even if you are not willing to pay that price, there _are_ people
:who are quite willing to pay that price to get the benefits that they
:see (whether it's a matter of perception or not, from their
:perspective they may as well be real) of such a scheme.

Quite true. In the embedded world we preallocate memory and shape the programs to what is available in the system. But if we run out of memory we usually panic and reboot - because the code is designed to NOT run out of memory and thus running out of memory is a catastrophic situation.

*ACK* This is unacceptable in many 'embedded' systems.

There's a whole spectrum of embedded devices, and applications that run on them. That definition works for some of them, but definitely not all.

Totally agreed. A previous poster brought up the fact that *some* embedded systems are built to deal with 'out of memory' situations, and that the 'total' amount of memory used in the system can be used by other parts of the system. For performance reasons, a particular application may choose to 'cache' data, but in a low-memory situation it can 'free' up a lot of memory.

You don't want to put hard-coded limits on the process simply because if the memory is there you want it to be able to use it, but you *certainly* don't want to go through a reboot just to get memory back.

[ And, I don't want to write my own OS to do this for me. :) ]

(However, I agree that for general purpose computing, over-commit is the way to go. But, *BSD is not just for general purpose computing, although that is its primary market.)

Nate
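The caching behaviour described in this message (use the memory while it is there, hand it back when the system runs low) might look roughly like the sketch below. ShrinkableCache and shrink_to are hypothetical names invented for the example. How the application learns that memory is low is deliberately left out, since the thread does not say.

```python
from collections import OrderedDict


class ShrinkableCache:
    """Byte-string cache with no fixed cap that can shed entries on request."""

    def __init__(self):
        self._items = OrderedDict()  # least recently used entry first
        self._bytes = 0

    @property
    def size_bytes(self):
        return self._bytes

    def put(self, key, value):
        old = self._items.pop(key, None)
        if old is not None:
            self._bytes -= len(old)
        self._items[key] = value
        self._bytes += len(value)

    def get(self, key):
        value = self._items.get(key)
        if value is not None:
            self._items.move_to_end(key)  # mark as recently used
        return value

    def shrink_to(self, target_bytes):
        """Evict least recently used entries until at most target_bytes
        remain. Returns the number of bytes released."""
        freed = 0
        while self._items and self._bytes > target_bytes:
            _, value = self._items.popitem(last=False)
            self._bytes -= len(value)
            freed += len(value)
        return freed
```

The cache grows freely in normal operation. A low-memory handler (assumed to exist elsewhere) would call shrink_to() instead of letting the system reboot.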
Re: Replacement for grep(1) (part 2)
: : Quite true. In the embedded world we preallocate memory and shape
: the programs to what is available in the system. But if we run out
: of memory we usually panic and reboot - because the code is designed
: to NOT run out of memory and thus running out of memory is a catastrophic
: situation.
:
:*ACK* This is unacceptable in many 'embedded' systems.

Don't confuse a watchdog panic with other conditions. If the embedded system software is supposed to deal with a low-memory condition and can't, the failsafe is all that's left between it and infinity.

The statement that the kernel's overcommit methodology somehow prevents one from being able to build embedded systems on top of it is just plain incorrect. The embedded system is perfectly capable of implementing its own memory management to avoid the failsafe provided by the kernel.

Most of the embedded work I've done -- mainly remote telemetry units running with flash and a megabyte or so of ram -- panic and reboot if they run out of memory. I have several dozen units in the field, each keeping track of several thousand data points on 2 minute intervals, which have never crashed. The only time we reboot them is when we need to upgrade the OS core. The last time was 4 years ago. *These* units will panic and reboot if they run out of memory because the software is designed not to. It is as simple as that.

-Matt
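One common shape for the "implement its own memory management" approach is a fixed pool claimed once at startup, so that exhaustion is something the application sees and handles itself. The sketch below is a hypothetical illustration (FixedPool is an invented name, and real firmware of this kind would normally be written in C); it is not the telemetry units' actual code.

```python
class FixedPool:
    """Fixed number of equal-sized blocks, all allocated at construction."""

    def __init__(self, block_size, block_count):
        self._block_size = block_size
        self._block_count = block_count
        # The whole arena is claimed up front; nothing is allocated later.
        self._arena = bytearray(block_size * block_count)
        self._free = list(range(block_count))

    @property
    def available(self):
        return len(self._free)

    def alloc(self):
        """Return (index, writable view of the block), or None if the pool
        is exhausted. The caller decides what exhaustion means."""
        if not self._free:
            return None
        index = self._free.pop()
        start = index * self._block_size
        view = memoryview(self._arena)[start:start + self._block_size]
        return index, view

    def free(self, index):
        """Return a block to the pool, rejecting bad or repeated frees."""
        if not 0 <= index < self._block_count:
            raise ValueError("block index out of range")
        if index in self._free:
            raise ValueError("block is already free")
        self._free.append(index)
```

When alloc() returns None the software can drop data, shed load, or deliberately reset. Whether a reset is acceptable is exactly the point being argued in this thread.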