On Mon, Mar 15, 2010 at 5:07 PM, James Mansion <ja...@mansionfamily.plus.com> wrote:
> Marc Lehmann wrote:
>>
>> Keep in mind that the primary use for threads is to improve context
>> switching times in single processor situations - event loops are usually
>> far faster at context switches.
>>
> No, I don't think so (re: improving context switching times in single
> processor situations). Yes, a context switch in a state machine is fastest,
> followed by coroutines, and then threads. But you're limited to one core.
>>
>> Now that multi cores become increasingly more common and scalability to
>> multiple cpus and even hosts is becoming more important, threads should be
>> avoided as they are not efficiently using those configs (again, they are a
>> single-cpu thing).
>>
> You keep saying this, but that doesn't make it true.
His basic point here, which I agree with, is sound. If you're trying to scale up a meta-task (a network server) that does many interleaved tasks (talking to many clients) on one processor, event loops are going to beat threads, assuming you can make everything the event loop does nonblocking (or fast enough for blocking not to matter much). That's all talking about a single CPU core, though.

However, the thread model as typically used scales poorly across multiple CPUs compared to distinct processes, especially as one scales up from simple SMP to the ccNUMA style we're seeing with large-core-count Opteron- and Xeon-based machines these days. This is mostly because of memory access and data caching issues, not because of context switching. The threads thrash on caching memory that they're all writing to (and/or contend on locks, which is related), and some of the threads end up running on a different NUMA node than where the data is (in some cases this is very pathological, especially if you haven't had each thread allocate its own memory with a smart malloc).

>> While there are exceptions (as always), in the majority of cases you will
>> not be able to beat event loops, especially when using multiple processes,
>> as they use the given resources most efficiently.
>>
> Only if you don't block, which is frequently hard to ensure if you are
> using third-party libraries for database access (or heavy crypto, or
> calc-intensive code that is painful to step explicitly). In fact, all
> those nasty business-related functions that cause us to build systems
> in the first place.

In that case you can either (a) just use threads instead of event loops, but still run one process per core with several threads within each, or (b) use an event loop, but also spawn separate threads for the slow-running tasks (crypto, database, whatever) and queue work to those threads, feeding their completions back into the event loop so that access to them stays non-blocking.
I think some of the issue in this argument is a matter of semantics. You can make threads scale up well anyway by simply designing your multi-threaded software not to contend on pthread mutexes and not to have multiple threads writing to the same shared blocks of memory - but then you're effectively describing the behavior of processes, and you've implemented a multi-process model using threads while avoiding most of the defining features of threads. You may as well save yourself some sanity and use processes at that point, keeping any shared read-only data either in memory allocated pre-fork (copy-on-write, where the write never happens to those blocks), or in mmap(MAP_SHARED) mappings, or behind some other data-sharing mechanism.

So if you've got software that scales well by adding threads as you add CPU cores, you've probably got software that could just as efficiently have been written as processes instead of threads, and been less error-prone to boot.

-- Brandon
_______________________________________________
libev mailing list
libev@lists.schmorp.de
http://lists.schmorp.de/cgi-bin/mailman/listinfo/libev