On 09/07/2015 19:36, Steve Litt wrote:
> I know what you mean. In the past 9 months I've seen a huge uptick in ambiguity in emails, to the point where many times you don't know who said what, and it looks like the person is arguing with himself, with temporal dislocations thrown in as people top-post with words like "it" instead of exactly what they mean, or "I agree" in a thread with twelve different assertions.
Blame the tool designers. Most users read far more than they write, so tools are optimized for reading, and not much work goes into UIs for writing. Users are lazy - that's nothing new - and simply don't put in the effort to properly format what they write; a good UI should make it easier for them, or even do it in their stead. Unfortunately, such UIs are few and far between.

GMail is a prime example of this sad state of the art. The GMail web UI is optimized for "conversation reading", i.e. it displays all the mails in a thread at once. But the way GMail achieves that is by automatically quoting the *whole* conversation when you reply, and forcing you to top-post, so the UI can hide the quoted part below your answer. This is great for readers who use the GMail web interface. And it is absolutely horrible for people who don't.

My own lists only accept plain text - I consider that if you want to communicate via a mailing-list, you should be able to handle plain text; if you want HTML, go to a web forum. But obviously, not everyone agrees.
> By the way, I have no personal knowledge of how many actor sockets a listener socket can spawn off, but if I had to guess, I'd imagine 50 would be way too low a number, if for no other reason than none of my current and former ISPs would have been able to serve httpd to the masses if 50 was the limit.
If you're interested in the "how many simultaneous clients can I handle?" question, a fundamental reference page is http://www.kegel.com/c10k.html . It was essentially written between 1999 and 2003, but parts of it have been maintained since, and most of it is still pretty accurate; the underlying APIs and algorithms have not changed that much. TL;DR: if you use the proper APIs, you can serve on the order of 10,000 simultaneous clients from a single listening socket, i.e. in one event-driven process. And that was already true in 1999.

That is for heavy network servers. For services where you don't expect 10k clients, you can use the fork/exec model just fine - that's what inetd and tcpserver do, and it works pretty well. I expect you could serve several hundred clients without a problem, and in certain cases you could probably reach one or two thousand before experiencing noticeable slowdowns. (Rough sketches of both models follow below.) The first problem you'll encounter when doing that will probably be the amount of resources, especially RAM, needed to keep several hundred concurrent servers running: most servers are not designed to be especially thrifty with RAM, and if every instance uses a few megabytes of private data, you're looking at a few gigabytes of RAM just to serve 1,000 clients.

Now, the original point was: what is the maximum number of processes you can run on a system? Well, for all practical intents and purposes, the answer really is "as many as you want". As I usually put it: processes are not a scarce resource. Let me repeat for emphasis: *processes are not a scarce resource.*

I don't know what the scheduler algorithm was before Linux 2.6, but the Linux 2.6 scheduler was O(1), meaning it scheduled your processes in constant time, no matter how many you had. How awesome is that? It was later replaced (by CFS, around 2.6.23), and the current scheduler is O(log n), which is still incredibly good: unless you have billions of billions of processes, you are not going to noticeably slow down the scheduler. Fact is, you're going to fill up the process table way before having scheduler trouble.

Go ahead and make your fork bomb. You *will* notice a system slowdown, but that's because all the processes in your fork bomb are perpetually runnable: you're hogging the CPU with a potential infinity of runnable processes, and nothing else gets a timeslice. You will also see, immediately, that your shell becomes unable to fork other commands - your fork bomb has filled up the process table. But the system is still running, as best it can with all CPUs pegged at 100% and a full process table.

Historically, pid_t was 16 bits, and 32k processes won't kill your scheduler. Nowadays, pid_t is 32 bits, and although there definitely are limits that prevent you from having 2G processes, the sheer number of processes isn't one of them. On a typical machine, the constrained resources are RAM and CPU; those are the ones you'll run out of first, and a process will consume more of one or the other depending on what it does and how it is used.

A Linux process takes some kernel memory (I'm not sure exactly how much - probably 8k or 12k) and about 16k of userspace memory, plus whatever the process itself uses. Virtual memory makes it hard to tell exactly how much is used, so let's say the absolute minimal amount of real memory used by a process is 64k total - a very, very generous overestimate.
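To make the two serving models mentioned above concrete, here is a minimal sketch (mine, not something taken from the c10k page) of the event-driven approach, using Linux epoll. The port number (8000), the backlog and the buffer size are arbitrary choices, and error handling is pared down to almost nothing; it illustrates the technique, it is not production code.

/* Single-process, event-driven echo server: one listening socket,
 * many simultaneous clients. Port 8000 and the 4096-byte buffer
 * are arbitrary. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdio.h>

int main (void)
{
  int lfd = socket(AF_INET, SOCK_STREAM, 0) ;
  struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(8000),
                            .sin_addr = { .s_addr = htonl(INADDR_ANY) } } ;
  int one = 1 ;
  setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) ;
  if (bind(lfd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(lfd, 128) < 0)
  { perror("bind/listen") ; return 1 ; }

  int epfd = epoll_create1(0) ;
  struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = lfd } } ;
  epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev) ;

  for (;;)
  {
    struct epoll_event e[64] ;
    int n = epoll_wait(epfd, e, 64, -1) ;
    for (int i = 0 ; i < n ; i++)
    {
      if (e[i].data.fd == lfd)
      {
        /* new client: register it; that is all a new connection costs */
        int cfd = accept(lfd, 0, 0) ;
        struct epoll_event cev = { .events = EPOLLIN, .data = { .fd = cfd } } ;
        epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &cev) ;
      }
      else
      {
        /* data from an existing client: echo it back, drop it on EOF */
        char buf[4096] ;
        ssize_t r = read(e[i].data.fd, buf, sizeof buf) ;
        if (r <= 0) { epoll_ctl(epfd, EPOLL_CTL_DEL, e[i].data.fd, 0) ; close(e[i].data.fd) ; }
        else write(e[i].data.fd, buf, r) ;
      }
    }
  }
}

The point is that an extra client only costs a file descriptor and an epoll registration, which is why a single process can juggle thousands of them.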
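And here is the skeleton of the fork-per-connection model that inetd and tcpserver embody - again just an illustrative sketch of the general shape, not their actual code. Same arbitrary port 8000; the child here simply echoes, whereas a real super-server would execve() the actual service with the connection on its stdin/stdout.

/* Fork-per-connection model: the parent only accepts and forks,
 * each child serves exactly one client and exits. */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>

int main (void)
{
  int lfd = socket(AF_INET, SOCK_STREAM, 0) ;
  struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(8000),
                            .sin_addr = { .s_addr = htonl(INADDR_ANY) } } ;
  int one = 1 ;
  setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) ;
  if (bind(lfd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(lfd, 128) < 0)
  { perror("bind/listen") ; return 1 ; }
  signal(SIGCHLD, SIG_IGN) ;  /* let the kernel reap the children for us */

  for (;;)
  {
    int cfd = accept(lfd, 0, 0) ;
    if (cfd < 0) continue ;
    pid_t pid = fork() ;
    if (pid == 0)
    {
      /* child: serve this one client (here, just echo), then exit */
      char buf[4096] ;
      ssize_t r ;
      close(lfd) ;
      while ((r = read(cfd, buf, sizeof buf)) > 0) write(cfd, buf, r) ;
      _exit(0) ;
    }
    close(cfd) ;  /* parent: the child owns the connection from now on */
  }
}

Every accepted connection costs a fork(), which is exactly the per-client price discussed further down.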
So if all your processes are small and use little more than that 64k estimate, you can have close to a million of them on a 64 GB machine (64 GB / 64 kB = 2^20, i.e. roughly a million). More realistically, my main server generally has about 300 processes running at all times; sometimes it goes up to 400. And it still has tons of free RAM, because most of those processes are very small.

A Linux process, on a 2 GHz x86_64 machine, takes about 1-2 milliseconds to fork() and about 1 millisecond to execve(). Then it can take a lot more time resolving dynamic symbols, if it is a dynamically linked executable (one of the reasons why I prefer static linking). Those numbers depend on a lot of factors, of course - the size of the process, whether the executable is in the disk cache, and so on - but to get very rough numbers, let's say a fork+exec takes about 2.5 milliseconds on average. Well, if you're using a super-server to serve 1k clients, you're already spending more than 2 seconds just creating your 1,000 processes. This is expensive. You probably don't want to go over 1k clients if you're going to spawn a process per connection; and the heavier a process is, the more expensive it is to spawn - after the execve() and the dynamic linking come the configuration parsing, etc. No wonder Apache wants to pre-fork its server processes.

On the other hand, once a process has been created, there's no upkeep for it aside from the kernel RAM it's using. If the process sleeps all day long, it's not going to hurt anything - its userspace memory can even be swapped out and the RAM reclaimed until it wakes up. The 300-ish processes on my server are all I/O-bound: they're waiting on some I/O that basically never comes, so they're sleeping all the time. They're just there, ready to react when something comes their way, and in the meantime they don't hurt. My load average, unless I'm performing a compilation or something, is rigorously 0.00. (The http://skarnet.org/ site definitely needs more visitors. XD)

Conclusion: the number of processes on a system, or even the number of processes used to perform a given task, is a meaningless metric. Processes are a tool in a Unix programmer's toolbox, and a pretty cheap (unless you fork millions of them all the time) and good tool at that; don't be afraid to see some task fork zillions of processes. It's really all about what those processes do, how they're written, and how they interact with the system. Better to have 50 well-behaved processes using exactly the resources they need to perform their job than one big memory hog or CPU hog.

--
Laurent