Distinguish real vs. virtual CPUs?
Is there a canonical way for user-space software to determine how many real CPUs are present in a system (as opposed to HyperThreaded or otherwise virtual CPUs)?

We have an application that for performance reasons wants to run one process per CPU. However, on a HyperThreaded system /proc/cpuinfo lists two CPUs, and running two processes in this case is the wrong thing to do. (HyperThreading ends up degrading our performance, perhaps due to cache or bus contention.)

Please CC replies. Thanks,

Dan Maas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: VM Requirement Document - v0.0
> Getting the user's "interactive" programs loaded back
> in afterwards is a separate, much more difficult problem
> IMHO, but no doubt still has a reasonable solution.

Possibly stupid suggestion... Maybe the interactive/GUI programs should wake up once in a while and touch a couple of their pages? Go too far with this and you'll just get in the way of performance, but I don't think it would hurt to have processes waking up every couple of minutes and touching glibc, libqt, libgtk, etc. so they stay hot in memory... A very slow incremental "caress" of the address space could eliminate the "I-just-logged-in-this-morning-and-dammit-everything-has-been-paged-out" problem.

Regards,
Dan
Re: A signal fairy tale
> Signals are a pretty dopey API anyway - so instead of trying to patch
> them up, why not think of something better for AIO?

I have to agree, in a way... At some point we need to swallow our pride, admit that UNIX has a crappy event model, and implement something like Win32 GetMessage =)...

I've been having trouble finding situations where asynchronous signals are really the most appropriate technique, aside from delivering life-threatening things like SIGTERM, SIGKILL, and SIGSEGV. The mutation into queued, information-carrying siginfo signals just shows how badly we need a more robust event model... (What would truly kick butt is a unified interface that could deliver everything from fd events to AIO completions to semaphore/msgqueue events, etc., with explicit binding between event queues and threads.)

Regards,
Dan
Re: VM Requirement Document - v0.0
> Windows NT/2000 has flags that can be set for each CreateFile operation
> ("open" in Unix terms), for instance
>
> FILE_ATTRIBUTE_TEMPORARY
> FILE_FLAG_WRITE_THROUGH
> FILE_FLAG_NO_BUFFERING
> FILE_FLAG_RANDOM_ACCESS
> FILE_FLAG_SEQUENTIAL_SCAN

There is a BSD-originated convention for this - madvise(). If you look in the Linux VM code there is a bit of explicit code for different madvise access patterns, but I'm not sure if it's 100% supported.

Drop-behind would be really, really nice to have for my multimedia applications. I routinely deal with very large video files (several times larger than my RAM). When I sequentially read through such files a bit at a time, I do NOT want the old pages sitting there in RAM while all of my other running programs are rudely paged out... (Hrm, maybe I could hack up my own manual read-ahead/drop-behind with mmap() and memory locking...)

Regards,
Dan
Re: threading question
> Is there a user-space implementation (library?) for
> coroutines that would work from C?

Here is another one: http://oss.sgi.com/projects/state-threads/

Regards,
Dan
Re: forcibly unmap pages in driver?
Just an update to my situation... I've implemented my idea of clearing the associated PTEs when I need to free the DMA buffer, then re-filling them in nopage(). This seems to work fine; if the user process tries anything fishy, it gets a SIGBUS instead of accessing the old mapping.

I encountered two difficulties with the implementation:

1) zap_page_range(), flush_cache_range(), and flush_tlb_range() are not exported to drivers. I basically copied the guts of zap_page_range() into my driver, which seems to work OK on x86, but I know it will have trouble with architectures that require special treatment of PTE manipulation...

2) The state of mm->mmap_sem is unknown when my file_operations->release() function is called. If release() is called when the last FD closes, then mm->mmap_sem is not taken. But if release() is called from do_munmap(), then mmap_sem has already been taken. So, it is risky to mess with vmas inside of release()...

Regards,
Dan

> >> Later, the program calls the ioctl() again to set a smaller
> >> buffer size, or closes the file descriptor. At this point
> >> I'd like to shrink the buffer or free it completely. But I
> >> can't assume that the program will be nice and munmap() the
> >> region for me
>
> > Look at drivers/char/drm, for example. At mmap time they allocate a
> > vm_ops to the address space. With that you catch changes to the vma
> > structure initiated by a user mmap, munmap, etc. You could also
> > dynamically map the pages in using the nopage method (optional).
Re: forcibly unmap pages in driver?
>> Later, the program calls the ioctl() again to set a smaller
>> buffer size, or closes the file descriptor. At this point
>> I'd like to shrink the buffer or free it completely. But I
>> can't assume that the program will be nice and munmap() the
>> region for me

> Look at drivers/char/drm, for example. At mmap time they allocate a
> vm_ops to the address space. With that you catch changes to the vma
> structure initiated by a user mmap, munmap, etc. You could also
> dynamically map the pages in using the nopage method (optional).

OK, I think I have a solution... Whenever I need to re-allocate or free the DMA buffer, I could set all of the user's corresponding page table entries to deny all access. Then I'd get a page fault on the next access to the buffer, and inside nopage() I could update the user's mapping or send a SIGBUS as appropriate (hmm, just like restoring a file mapping that was thrown away)... So I just have to figure out how to find the user's page table entries that are pointing to the DMA buffer.

Regards,
Dan
Re: forcibly unmap pages in driver?
> That seems a bit perverse. How will the poor userspace program know
> not to access the pages you have yanked away from it? If you plan
> to kill it, better to do that directly. If you plan to signal it
> that the mapping is gone, it can just call munmap() itself.

Thanks Pete. I will explain the situation I am envisioning; perhaps there is a better way to handle this --

My driver uses a variable-size DMA buffer that it shares with user-space; I provide an ioctl() to choose the buffer size and allocate the buffer. Say the user program chooses a large buffer size, and mmap()s the entire buffer. Later, the program calls the ioctl() again to set a smaller buffer size, or closes the file descriptor. At this point I'd like to shrink the buffer or free it completely. But I can't assume that the program will be nice and munmap() the region for me - it might still have the large buffer mapped. What should I do here?

An easy solution would be to allocate the largest possible buffer as my driver is loaded, even if not all of it will be exposed to user-space. I don't really like this choice because the buffer needs to be pinned in memory, and the largest useful buffer size is very big (several tens of MB).

Maybe I should disallow more than one buffer allocation per open() of the device... But the memory mapping will stay around even after close(), correct? I'd hate to have to keep the buffer around until my driver module is unloaded.

> However, do_munmap() will call zap_page_range() for you and take care of
> cache and TLB flushing if you're going to do this in the kernel.

I'm not sure if I could use do_munmap() -- how will I know if the user program has called munmap() already, and then mmap()ed something else in the same place? Then I'd be killing the wrong mapping...
Regards,
Dan
forcibly unmap pages in driver?
I am writing a device driver that, like many others, exposes a shared memory region to user-space via mmap(). The region is allocated with vmalloc(), the pages are marked reserved, and the user-space mapping is implemented with remap_page_range().

In my driver, I may have to free the underlying vmalloc() region while the user-space program is still running. I need to remove the user-space mapping -- otherwise the user process would still have access to the now-freed pages. I need an inverse of remap_page_range(). Is zap_page_range() the function I am looking for? Unfortunately it's not exported to modules =(. As a quick fix, I was thinking I could just remap all of the user pages to point to a zeroed page or something...

Another question - in the mm.c sources, I see that many of the memory-mapping functions are surrounded by calls to flush_cache_range() and flush_tlb_range(). But I don't see these calls in many drivers. Is it necessary to make them when my driver maps or unmaps the shared memory region?

Regards,
Dan
Re: #define HZ 1024 -- negative effects?
> Are there any negative effects of editing include/asm/param.h to change
> HZ from 100 to 1024? Or any other number? This has been suggested as a
> way to improve the responsiveness of the GUI on a Linux system.

I have also played around with HZ=1024 and wondered how it affects interactivity. I don't quite understand why it could help - one thing I've learned looking at kernel traces (LTT) is that interactive processes very, very rarely eat up their whole timeslice (even hogs like X). So more frequent timer interrupts shouldn't have much of an effect...

If you are burning CPU doing stuff like long compiles, then the increased HZ might make the system appear more responsive because the CPU hog gets pre-empted more often. However, you could get the same result just by running the task 'nice'ly...

The only other possibility I can think of is a scheduler anomaly. A thread arose on this list recently about strange scheduling behavior of processes using local IPC - even though one process had readable data pending, the kernel would still go idle until the next timer interrupt. If this is the case, then HZ=1024 would kick the system back into action more quickly...

Of course, the appearance of better interactivity could just be a placebo effect. Double-blind trials, anyone? =)

Regards,
Dan
Re: Asynchronous IO
IIRC the problem with implementing asynchronous *disk* I/O in Linux today is that the filesystem code assumes synchronous I/O operations that block the whole process/thread. So implementing "real" async I/O (without the overhead of creating a process context for each operation) would require re-writing the filesystems as non-blocking state machines. Last I heard this was a long-term goal, but nobody's done the work yet (aside from maybe the SGI folks with XFS?). Or maybe I don't know what I'm talking about...

Bart, glad to hear you are working on an event interface, sounds cool! One feature that I really, really, *really* want to see implemented is the ability to block on a set of any "waitable kernel objects" with one syscall - not just file descriptors, but also SysV semaphores and message queues, UNIX signals and child processes, file locks, pthreads condition variables, async disk I/O completions, etc. I am dying for a clean way to accomplish this that doesn't require more than one thread... (Win32 and FreeBSD kick our butts here with MsgWaitForMultipleObjects() and kevent()...)

IMHO cleaning up this API deficiency is just as important as optimizing the extreme case of socket I/O with zillions of file descriptors...

Regards,
Dan
Re: Using IPCSysV in a device driver
> I am wondering if it is permitted to use message queues between a user
> application and a device driver module...
> Can anyone help me?

It may be theoretically possible, but an easier and much more common approach to this type of thing is for the driver to export an mmap() interface. You could synchronize using poll(), I think...

Regards,
Dan
Re: mapping physical memory
> I need to be able to obtain and pin approximately 8 MB of
> contiguous physical memory in user space. How would I go
> about doing that under Linux if it is at all possible?

The only way to allocate that much *physically* contiguous memory is by writing a driver that grabs it at boot-time (I think the "bootmem" API is used for this). This is an extreme measure and should rarely be necessary, except in special cases such as primitive PCI cards that lack support for scatter/gather DMA. You can easily implement an mmap() interface to give user-space programs access to the memory; there are plenty of examples of how to do this in various character device drivers.

(Well OK, if all you need is a one-off hack, you can use the method developed by the Utah GLX people -- tell the kernel that you have 8MB *less* RAM than is actually present using a "mem=" directive at boot, then grab that last piece of memory by mmap'ing /dev/mem -- see http://utah-glx.sourceforge.net/memory-usage.html)

Dan
Re: PROBLEM: select() on TCP socket sleeps for 1 tick even if data available
> It's not the select that waits. It's a delay in the tcp send
> path waiting for more data. Try disabling it:
>
>     int f=1;
>     setsockopt(s, SOL_TCP, TCP_NODELAY, &f, sizeof(f));

Bingo! With this fix, 2.2.18 performance becomes almost identical to 2.4.0 performance. I assume 2.4.0 disables Nagle by default on local connections...

Dan
Re: PROBLEM: select() on TCP socket sleeps for 1 tick even if data available
What kernel have you been using? I have reproduced your problem on a standard 2.2.18 kernel (elapsed time ~10 sec). However, using a 2.4.0 kernel with HZ=1000, I see a 100x improvement (elapsed time ~0.1 sec; note that increasing HZ alone should only give a 10x improvement). Perhaps the scheduler was fixed in 2.4.0?

2.2.18 very definitely has some scheduling anomalies. In your benchmark, select() or poll() takes 10ms, as can be observed with strace -T. Skipping the select() and blocking in read() gives the same behavior. This leads me to believe the scheduler is at fault, and not select(), poll(), or read().

When run without strace, 2.4.0 appears to have no problems with your benchmark. Elapsed time is 0.1 sec -- this may be the full potential of my machine (PII/450). Removing select() and blocking in read() results in a further improvement, to 0.07 sec.

Strace disturbs the behavior of 2.4.0 in strange ways. Running the benchmark under strace with 2.4.0 causes the scheduler delays to return -- ~1ms delays appear in select() or write(). This is confusing - it appears that context switches can happen inside write() as well as select(), a result I don't understand at all (the socket buffers never completely fill since you only write 1000 bytes to each one).

Other notes: poll() behaves the same as select(). Using the SCHED_FIFO class and mlockall() has no effect on this benchmark. Setting the sockets non-blocking also has no effect.

I wish I had the Linux Trace Toolkit handy; it would give a much better idea of what's going on than strace...

Dan
Re: Subtle MM bug (really 830MB barrier question)
> 08048000-08b5c000 r-xp 03:05 1130923 /tmp/newmagma/magma.exe.dyn
> 08b5c000-08cc9000 rw-p 00b13000 03:05 1130923 /tmp/newmagma/magma.exe.dyn
> 08cc9000-0bd0 rwxp 00:00 0
>
> Now, subsequent to each memory allocation, only the second number in the
> third line changes. It becomes 23a78000, then 3b7f, and finally
> 3b808000 (after the failed allocation).

OK, it's fairly obvious what's happening here. Your program is using its own allocator, which relies solely on brk() to obtain more memory. On x86 Linux, brk()-allocated memory (the heap) begins right above the executable and grows upward - the increasing number you noted above is the top of the heap, which grows with every brk().

Problem is, the heap can't keep growing forever - as you discovered, on x86 Linux the upper bound is just below 0x40000000. That boundary is where shared libraries and other memory-mapped files start to appear. Note that there is still plenty (~2GB) of address space left, in the region between the shared libraries and the top of user address space (just under 0xBFFFFFFF). How do you use that space? You need an allocation scheme based on mmap'ing /dev/zero. As others pointed out, glibc's allocator does just that.

Here's your short answer: ask the authors of your program to either 1) replace their custom allocator with regular malloc() or 2) enhance their custom allocator to use mmap. (Or, buy some 64-bit hardware =)...)

Dan
Re: Journaling: Surviving or allowing unclean shutdown?
> > Being able to shut down by hitting the power switch is a little luxury
> > for which I've been willing to invest more than a year of my life to
> > attain. Clueless newbies don't know why it should be any other way, and
> > it's essential for embedded devices.

Just some food for thought - hitting the power switch on my old Indy actually performs the equivalent of "shutdown -r now"; the system only cuts the power when it's done cleaning up (sometimes several minutes later). I suspect most workstation-class systems do similar things.

Of course this creates a confusing distinction between "pulling the plug" and "hitting the power switch." Uninformed users might even be more bewildered by the flurry of disk activity after performing the latter; heck, I wouldn't blame someone who freaks out and pulls the plug to make it stop =). Also, such a system obviously has little benefit in the event of an AC power failure.

Dan
Re: LMbench 2.4.0-test10pre-SMP vs. 2.2.18pre-SMP
> The pipe bandwidth is intimately related to pipe latency. Linux pipes
> are fairly small (only 4kB worth of data buffer), so they need good
> latency for good performance.
...
> The pipe bandwidth could be fairly easily improved by just doubling the
> buffer size (or by using VM tricks), but it's not been something that
> anybody has felt was all that important in real life.

A while ago I hacked 2.2.17 to use larger pipe buffers. On my own pure throughput benchmark (two processes ping-ponging one buffer's worth of data on a single-CPU system), buffers larger than 4KB hardly gave any advantage. 64KB buffers were marginally (10-20%) faster, but performance dropped quite considerably after that (cache effects, maybe...). After seeing these results I simply assumed that 4KB had been deliberately chosen as the optimal buffer size, rather than by luck =).

Now, Dave Miller's kiobuf pipes may change the picture somewhat...

Dan
Re: Linux's implementation of poll() not scalable?
> Shouldn't there also be a way to add non-filedescriptor based events
> into this, such as "child exited" or "signal caught" or shm things?

Waiting on pthreads condition variables, POSIX message queues, and semaphores (as well as fd's) at the same time would *rock*... Unifying all these "waitable objects" would be tremendously helpful to fully exploit the "library transparency" advantage that Linus brought up. Some libraries might want to wait on things that are not file descriptors...

Regards,
Dan
Re: about time-slice
> I have a question about the time-slice of linux, how do I know it, or how
> can I test it?

First look for the (platform-specific) definition of HZ in include/asm/param.h. This is how many timer interrupts you get per second (e.g. on i386 it's 100). Then look at include/linux/sched.h for the definition of DEF_COUNTER. This is the number of timer interrupts between mandatory schedules. By default it's HZ/10, meaning that the time-slice is 100ms (10 schedules/sec).

(Of course the interval could be longer if kernel code is hogging the CPU; the scheduler won't run until the process leaves the kernel or sleeps explicitly...)

Experts, please correct me if I'm wrong.

Regards,
Dan
Re: large memory support for x86
The memory map of a user process on x86 looks like this:

               KERNEL (always present here)
  0xC0000000 -
  0xBFFFFFFF - STACK
             - MAPPED FILES (incl. shared libs)
  0x40000000 -
             - HEAP (brk()/malloc())
             - EXECUTABLE CODE
  0x08048000 -

Try examining /proc/*/maps, and also watch your programs call brk() using strace; you'll see all this in action...

> So why does the process space start at such a high virtual
> address (why not closer to 0x00000000)? Seems we're wasting ~128 megs of
> RAM. Not a huge amount compared to 4G, but significant.

I don't know; anyone care to comment?

> Another question: how (and where in the code) do we translate virtual
> user-addresses to physical addresses?

In hardware, with the TLB and, if the TLB misses, the page tables.

> Does the MMU do it, or does it call a
> kernel handler function?

Only when an attempt is made to access an unmapped or protected page; then you get an interrupt (page fault), which the kernel code handles.

> Why is the kernel allowed to reference physical
> addresses, while user processes go through the translation step?

Not even the kernel accesses physical memory directly. It can, however, choose to map the physical memory into its own address space contiguously. Linux puts it at 0xC0000000 and up. (Question for the gurus - what happens on machines with >1GB of RAM?)

> Can kernel
> pages be swapped out / faulted in just like user process pages?

Linux does not swap kernel memory; the kernel is so small it's not worth the trouble (are there other reasons?). E.g. my Linux boxes run 1-2MB of kernel code; my NT machine is running >6MB at the moment...

Dan
Re: thread rant [semi-OT]
> All portability issues aside, if one is writing an application in
> Linux that one would be tempted to make multithreaded for
> whatever reason, what would be the better Linux way of doing
> things?

Let's go back to basics. Look inside your computer. See what's there:

1) one (or more) CPUs
2) some RAM
3) a PCI bus, containing:
4) -- a SCSI/IDE controller
5) -- a network card
6) -- a graphics card

These are all the parts of your computer that are smart enough to accomplish some amount of work on their own. The SCSI or IDE controller can read data from disk without bothering any other components. The network card can send and receive packets fairly autonomously. Each CPU in an SMP system operates nearly independently. An ideal application could have all of these devices doing useful work at the same time.

When people think of "multithreading," often they are just looking for a way to extract more concurrency from their machine. You want all these independent parts to be working on your task simultaneously. There are many different mechanisms for achieving this. Here we go...

A naively-written "server" program (e.g. a web server) might be coded like so:

* Read configuration file
  - all other work stops while data is fetched from disk
* Parse configuration file
  - all other work stops while CPU/RAM work on parsing the file
* Wait for a network connection
  - all other work stops while waiting for incoming packets
* Read request from client
  - all other work stops while waiting for incoming packets
* Process request
  - all other work stops while CPU/RAM figure out what to do
  - all other work stops while disk fetches requested file
* Write reply to client
  - all other work stops until final buffer transmitted

I've phrased the descriptions to emphasize that only one resource is being used at once - the rest of the system sits twiddling its thumbs until the one device in question finishes its task. Can we do better?
Yes, thanks to various programming techniques that allow us to keep more of the system busy. The most important bottleneck is probably the network - it makes no sense for our server to wait while a slow client takes its time acknowledging our packets. By using standard UNIX multiplexed I/O (select()/poll()), we can send buffers of data to the kernel just when space becomes available in the outgoing queue; we can also accept client requests piecemeal, as the individual packets flow in. And while we're waiting for packets from one client, we can be processing another client's request. The improved program performs better since it keeps the CPU and network busy at the same time. However, it will be more difficult to write, since we have to maintain the connection state manually, rather than implicitly on the call stack.

So now the server handles many clients at once, and it gracefully handles slow clients. Can we do even better? Yes, let's look at the next bottleneck - disk I/O. If a client asks for a file that's not in memory, the whole server will come to a halt while it read()s the data in. But the SCSI/IDE controller is smart enough to handle this alone; why not let the CPU and network take care of other clients while the disk does its work?

How do we go about doing this? Well, it's UNIX, right? We talk to disk files the same way we talk to network sockets, so let's just select()/poll() on the disk files too, and everything will be dandy... (Unfortunately we can't do that - the designers of UNIX made a huge mistake and decided against implementing non-blocking disk I/O as they had with network I/O. Big booboo. For that reason, it was impossible to do concurrent disk I/O until the POSIX Asynchronous I/O standard came along. So we go learn this whole bloated API, in the process finding out that we can no longer use select()/poll(), and must switch to POSIX RT signals - sigwaitinfo() - to control our server***)
After the dust has settled, we can now keep the CPU, network card, and the disk busy all the time -- so our server is even faster. Notice that our program has been made heavily concurrent, and I haven't even used the word "thread" yet!

Let's take it one step further. Packets and buffers are now coming in and out so quickly that the CPU is sweating just handling all the I/O. But say we have one or three more CPUs sitting there idle - how can we get them going, too? We need to run multiple request handlers at once. Conventional multithreading is *one* possible way to accomplish this; it's rather brute-force, since the threads share all their memory, sockets, etc. (and full VM sharing doesn't scale optimally, since interrupts must be sent to all the CPUs when the memory layout changes). Lots of UNIX servers run multiple *processes* - the "sub-servers" might not share anything, or they might share a file cache or request queue. If we were brave, we'd think carefully about what resources really should be shared between the sub-servers, and then implement it manually using