Re: Re[2]: FFI: number of worker threads?
On 6/21/06, Simon Peyton-Jones [EMAIL PROTECTED] wrote:

> New worker threads are spawned as needed. You'll need as many of them as you have simultaneously-blocked foreign calls. If you have 2000 simultaneously-blocked foreign calls, you'll need 2000 OS threads to support them, which probably won't work.

2000 OS threads definitely sounds scary, but it can work. Linux NPTL threads scale well up to 10K threads, and the stack address space would be sufficient on 64-bit systems.

I am thinking about some p2p applications where each peer maintains a huge number of TCP connections to other peers, but most of these connections are idle. Unfortunately the default GHC RTS multiplexes I/O using select, which is O(n) and seems to be limited to 1024 descriptors by the fd_set size (FD_SETSIZE).

That makes me wonder whether the current design of the GHC RTS is optimal in the long run. As software and hardware evolve, we will have efficient OS threads (like NPTL) and huge (64-bit) address spaces. My guess is:

(1) It is always a good idea to multiplex GHC user-level threads on OS threads, because it improves performance.

(2) It may not be optimal to multiplex nonblocking I/O inside the GHC RTS, because it is unrealistic to have an event-driven I/O interface that is both efficient (like AIO/epoll) and portable (like select/poll). What is worse, nonblocking I/O still blocks on disk accesses. On the other hand, POSIX threads are portable and can be implemented efficiently on many systems. At least on Linux, NPTL easily beats select!

My wish is to have a future GHC implementation that (a) uses blocking I/O directly provided by the OS, and (b) provides more control over OS threads and the internal worker thread pool. Using blocking I/O would simplify the current design and allow the programmer to take advantage of high-performance OS threads. If non-blocking I/O is really needed, the programmer can use customized, Claessen-style threads wrapped in modular libraries. Some of my preliminary tests show that Claessen-style threads can do a much better job of multiplexing asynchronous I/O.

> If you think you have only a handful of simultaneously-blocked foreign calls, but you still get runaway worker threads, please do make a reproducible test case and file a bug report.

Yes, I will try to make a reproducible test case soon.

> Once you get answers, can I ask either or both of you to type in what you learned to the GHC user-documentation Wiki? That way things improve! The place to start is here http://haskell.org/haskellwiki/GHC under Collaborative documentation. There's already a page for Concurrency and for FFI, so you can add to those. Thanks

Certainly!

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Re[2]: FFI: number of worker threads?
On 6/21/06, Duncan Coutts [EMAIL PROTECTED] wrote:

> On linux, epoll scales very well with minimal overhead. Using multiple OS threads to do blocking IO would not scale in the case of lots of idle socket connections, you'd need one OS thread per socket.

On Linux, OS threads can also scale very well. I have done an experiment using pipes and NPTL where most connections are idle: performance scales linearly up to 32K file descriptors and 16K threads.

> The IO is actually no longer done inside the RTS, it's done by a Haskell worker thread. So it should be easier now to use platform-specific select() replacements. It's already different between unix/win32. So I'd suggest the best approach is to keep the existing multiplexing non-blocking IO system and start to take advantage of more scalable IO APIs on the platforms we really care about (either select/poll replacements or AIO).

It is easy to take advantage of epoll, and it shouldn't be that hard to bake it in. The question is about flexibility: do we want it to be edge-triggered or level-triggered? Even with epoll built in, disk performance cannot keep up with NPTL unless AIO is also built in. But AIO is more complicated. It bypasses the OS caching; the Linux AIO even requires the use of certain kinds of file systems.

My idea is that not everybody needs high-performance, asynchronous or nonblocking I/O. For those who really need it, it is worth (or even necessary) writing their own event loops, and event-driven programming in Haskell is not that difficult using CPS monads.
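To make the "CPS monads" remark concrete, here is a minimal sketch of a continuation-passing thread monad in the style of Claessen's "poor man's concurrency monad". It is an illustration only, not the code used in the experiments mentioned above: the names `C`, `Action`, `atom`, `fork`, `run` and `demo` are made up for this example, and the scheduler is a plain round-robin queue with no I/O multiplexing at all.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- A thread is a chain of atomic IO steps (hypothetical types for this sketch).
data Action
  = Atom (IO Action)    -- run one IO step, then continue
  | Fork Action Action  -- split into two threads
  | Stop                -- thread finished

-- Computations are written in continuation-passing style.
newtype C a = C { runC :: (a -> Action) -> Action }

instance Functor C where
  fmap f (C m) = C (\k -> m (k . f))

instance Applicative C where
  pure x = C (\k -> k x)
  C f <*> C m = C (\k -> f (\g -> m (k . g)))

instance Monad C where
  C m >>= f = C (\k -> m (\a -> runC (f a) k))

-- Lift a single IO action into one atomic step.
atom :: IO a -> C a
atom io = C (\k -> Atom (fmap k io))

-- Start a new user-level thread.
fork :: C () -> C ()
fork m = C (\k -> Fork (runC m (const Stop)) (k ()))

-- Round-robin scheduler: run one step of the first thread, requeue it.
schedule :: [Action] -> IO ()
schedule [] = return ()
schedule (a : rest) = case a of
  Atom io  -> do { next <- io; schedule (rest ++ [next]) }
  Fork x y -> schedule (rest ++ [x, y])
  Stop     -> schedule rest

run :: C a -> IO ()
run m = schedule [runC m (const Stop)]

-- Two threads logging into a shared list, interleaved by the scheduler.
demo :: IO [String]
demo = do
  ref <- newIORef []
  let say s = atom (modifyIORef ref (s :))
  run $ do
    fork (say "a1" >> say "a2")
    say "b1"
    say "b2"
  reverse <$> readIORef ref
```

Calling `demo` in GHCi returns the log in the order the scheduler ran the steps. A real event loop would replace the round-robin queue with a wait on epoll or select and resume only the threads whose descriptors are ready.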
FFI: number of worker threads?
Hello,

The paper "Extending the Haskell FFI with Concurrency" mentions the following in Section 6.3:

> GHC's run-time system employs one OS thread for every bound thread; additionally, there is a variable number of so-called worker OS threads that are used to execute the unbound (lightweight) threads.

How does the runtime system determine the number of worker threads? Is the number hardcoded in the RTS or dynamically adjustable? Can a programmer specify it as an RTS option or change it using an API?

I would like to use a large number (say, 2000) of unbound threads, each calling a blocking, safe foreign function via an FFI import. What is supposed to happen if all the worker threads are used up? I tried this in the recent GHC 6.5 and got some kind of "runaway worker threads?" RTS failure message when more than 32 threads are used. Is it a current limitation of the RTS, or should I file a bug report for it?

Thanks,
Peng
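For reference, the scenario can be sketched roughly as follows. This is an illustration under assumptions, not the program that triggered the failure: it uses POSIX `sleep` as a stand-in for the blocking foreign function, and `c_sleep` and `blockedCalls` are names invented here. It needs to be built with `-threaded` for the calls to block concurrently.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import Foreign.C.Types (CUInt (..))

-- A "safe" import: other Haskell threads may keep running while this call
-- blocks. POSIX sleep() is only a stand-in for a real blocking function.
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

-- Fork n lightweight threads that each block in the foreign call for
-- `secs` seconds, wait for all of them, and return how many completed.
blockedCalls :: Int -> CUInt -> IO Int
blockedCalls n secs = do
  dones <- forM [1 .. n] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO (c_sleep secs >> putMVar done ())
    return done
  mapM_ takeMVar dones
  return (length dones)
```

With the threaded RTS, `blockedCalls 2000 5` would have up to 2000 foreign calls blocked at the same time, which is the case the question is about.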
readChan and unGetChan?
Suppose the following happens:

(1) Thread A calls readChan on an empty channel and waits.
(2) Thread B puts something to the read-end of the channel using unGetChan.

When a GHC program does this, both threads are blocked! Is this the behaviour we really want for unGetChan, or should we fix the implementation of Control.Concurrent.Chan?

Thanks,
Peng
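The blocking can be reproduced with a small model of a channel built from two MVars. This is a simplified sketch written for illustration, assuming the usual design in which the reader holds the read-end MVar while it waits for an item; it is not a copy of the library source, and the primed names (`Chan'`, `readChan'`, `unGetChan'`) and `unGetBlocks` are invented here.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import System.Timeout (timeout)

-- Simplified model: a linked list of MVar cells plus read/write ends.
type Stream a = MVar (Item a)
data Item a = Item a (Stream a)
data Chan' a = Chan' (MVar (Stream a)) (MVar (Stream a))

newChan' :: IO (Chan' a)
newChan' = do
  hole <- newEmptyMVar
  r <- newMVar hole
  w <- newMVar hole
  return (Chan' r w)

-- The reader takes the read-end MVar and keeps it while waiting for an item.
readChan' :: Chan' a -> IO a
readChan' (Chan' r _) =
  modifyMVar r $ \readEnd -> do
    Item x rest <- readMVar readEnd  -- blocks here on an empty channel
    return (rest, x)

-- unGet also needs the read-end MVar, so it waits behind a blocked reader.
unGetChan' :: Chan' a -> a -> IO ()
unGetChan' (Chan' r _) x = do
  newEnd <- newEmptyMVar
  modifyMVar_ r $ \readEnd -> do
    putMVar newEnd (Item x readEnd)
    return newEnd

-- True if unGetChan' is still blocked after 0.2 s while a reader waits.
unGetBlocks :: IO Bool
unGetBlocks = do
  ch <- newChan' :: IO (Chan' Int)
  _ <- forkIO (readChan' ch >> return ())
  threadDelay 100000                      -- give the reader time to block
  r <- timeout 200000 (unGetChan' ch 1)
  return (r == Nothing)
```

In this model the reader can only be released by a write to the other end of the channel, so an unGet issued while a reader is waiting cannot wake it.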
Allocating aligned memory?
In GHC, how can I allocate a chunk of memory aligned to some block size (say, 512 or 1024 bytes)? I tried to specify it in the alignment method of the Storable typeclass, but that does not seem to work. Is Storable.alignment really used in GHC? If so, is there a code example that allocates aligned memory in this way?

For the moment I am using the C function memalign() like this:

    foreign import ccall "static stdlib.h" memalign :: CInt -> CInt -> IO (Ptr CChar)

    do ptr <- memalign alignment size
       fptr <- newForeignPtr finalizerFree ptr

Is it safe to do so?

Thanks,
Peng
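One possible alternative that stays inside Haskell is sketched below. It assumes a GHC whose base library provides `GHC.ForeignPtr.mallocForeignPtrAlignedBytes` and `Foreign.Marshal.Alloc.allocaBytesAligned`, both of which take the size first and the alignment second; `isAligned` and `alignedDemo` are helper names invented for this example, and it does not settle whether the memalign approach above is safe.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (Ptr, ptrToIntPtr)
import GHC.ForeignPtr (mallocForeignPtrAlignedBytes)

-- Check that an address is a multiple of the requested alignment.
isAligned :: Int -> Ptr a -> Bool
isAligned align p = toInteger (ptrToIntPtr p) `mod` toInteger align == 0

-- Allocate `size` bytes aligned to `align` bytes in two ways and report
-- whether both addresses honour the alignment.
alignedDemo :: Int -> Int -> IO Bool
alignedDemo size align = do
  -- Garbage-collected buffer; no explicit finalizer is needed.
  fptr <- mallocForeignPtrAlignedBytes size align :: IO (ForeignPtr Word8)
  heapOk <- withForeignPtr fptr (return . isAligned align)
  -- Temporary buffer that is released when the callback returns.
  tmpOk <- allocaBytesAligned size align $ \p ->
    return (isAligned align (p :: Ptr Word8))
  return (heapOk && tmpOk)
```

For example, `alignedDemo 4096 512` requests a 4 KB buffer aligned to 512 bytes from each allocator and checks the resulting addresses.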
Using GHC with SMP and FFI?
[1] Extending the Haskell Foreign Function Interface with Concurrency
[2] Haskell on a Shared-Memory Multiprocessor

I read the above two papers [1,2] and I have been trying to write an application that uses both FFI and SMP. The first paper [1] shows how FFI is implemented on uniprocessor Concurrent Haskell; the second paper [2] shows how SMP Concurrent Haskell is implemented. However, I found little documentation on using FFI with the latest SMP extension. Beyond what [1] describes, what has changed, and what should a programmer know if he wants to use FFI in a multithreaded program running on SMP machines?

Best,
Peng
Read integer from prompt
Hi,

In the older Hugs, I did this to read an integer from standard input:

    readNum :: IO Integer
    readNum = do { line <- getLine ; readIO line }

However, in Hugs98 it fails with the error message:

    User error: PreludeIO.readIO: no parse

Why? And can anybody tell me how to read an integer in Hugs98?

lipeng
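One way to avoid the exception is to parse with `reads` and handle the failure case explicitly. The sketch below does not explain why `readIO` fails in this Hugs98 setup; it only shows a more forgiving reader. The names `parseInteger` and `readNumRetry` are invented for this example, and the hierarchical module name `Data.Char` is assumed to be available.

```haskell
import Data.Char (isSpace)

-- Parse an Integer, allowing surrounding whitespace; Nothing on bad input.
parseInteger :: String -> Maybe Integer
parseInteger s = case reads s of
  [(n, rest)] | all isSpace rest -> Just n
  _                              -> Nothing

-- Keep prompting until the line parses as an Integer.
readNumRetry :: IO Integer
readNumRetry = do
  line <- getLine
  case parseInteger line of
    Just n  -> return n
    Nothing -> do
      putStrLn "Not an integer, please try again."
      readNumRetry
```

Printing the rejected line with `show line` inside the `Nothing` branch can also reveal stray characters in the input, which is one possible cause of a "no parse" error.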