Re: Bound Threads
Hi all,

I have just spent some time reading through all the discussions and the new threads document, and I would like to propose the addition of a new library function:

    forkOS :: IO () -> IO ThreadID

The function forkOS forks a new Haskell thread that runs in a new OS (or native) thread. With this, I also propose that forkIO always runs a Haskell thread in the same OS thread that the current Haskell thread runs in (i.e. forkIO: same OS thread, forkOS: new OS thread).

Using the new primitive, we can view the new threadsafe keyword as syntactic sugar:

    foreign import threadsafe foo :: Int -> IO Int

    ===

    foo :: Int -> IO Int
    foo i = threadSafe (primFoo i)

    foreign import "foo" primFoo :: Int -> IO Int

where

    threadSafe :: IO a -> IO a
    threadSafe io
      = do result <- newEmptyMVar
           forkOS (do{ x <- io; putMVar result x })
           takeMVar result

Note that forkOS can use thread pooling in its implementation.

The advantage of a separate function forkOS is that we put control back in the user's hands: a programmer can be very specific about which Haskell threads are part of a certain OS thread, and about the OS thread that is used to run a foreign function. In other words, it is absolutely clear to which OS thread a Haskell thread is bound. In this respect, it helps to have further functions that run a Haskell thread in a specific OS thread:

    getOSThread :: ThreadID -> OSThreadID
    forkIOIn    :: OSThreadID -> IO () -> IO ThreadID

I have the feeling that it is not difficult to implement forkOS and family once the runtime system has been upgraded to support multiple OS threads. Wolfgang, you seem to be the expert in the OS thread area; would it be hard?

I am not saying that we should discard the threadsafe keyword, as it might be a useful shorthand, but I think that it is in general a mistake to try to keep the management of OS threads implicit -- don't use new keywords, add combinators to implement them!
I feel that the following has happened: urk, we need some way of keeping Haskell threads running while calling C; we add threadsafe; whoops, sometimes a function expects that it is run in the same OS thread; we add bound; whoops, sometimes functions expect to be run from a specific OS thread... unsolved?? Before we know it, we have added tons of new keywords to solve the wrong problem.

Maybe it is time to take a step back and use a somewhat lower level model with two fork variants: forkIO (in the same OS thread) and forkOS (in a new OS thread). It seems that none of the above problems occur when having explicit control. In general it seems that OS threads are a resource that is too subtle to be managed automatically, as they have a profound impact on how libraries are used and applications are structured.

All the best,
  Daan.

worse is better :-)

___
FFI mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/ffi
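The threadSafe combinator proposed in this message can be tried against present-day GHC, whose Control.Concurrent exports a forkOS of the same shape. The following is only a minimal sketch under assumptions that are not in the mail: exceptions raised by the forked action are re-thrown in the caller, and forkIO is used as a fallback when the program is not linked with -threaded, since forkOS fails in that case.

```haskell
import Control.Concurrent (forkIO, forkOS, rtsSupportsBoundThreads)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)

-- | Run an action in a freshly forked thread and wait for its result.
-- With the threaded RTS the new thread is bound to its own OS thread.
threadSafe :: IO a -> IO a
threadSafe io = do
  result <- newEmptyMVar
  -- Assumption: fall back to forkIO when bound threads are unavailable.
  let spawn = if rtsSupportsBoundThreads then forkOS else forkIO
  -- Capture either the result or the exception so the caller never hangs.
  _ <- spawn (try io >>= putMVar result)
  outcome <- takeMVar result
  case outcome of
    Left e  -> throwIO (e :: SomeException)
    Right x -> return x
```

A caller would wrap a foreign call as `threadSafe (primFoo 3)`. Every call pays for a thread hand-off, which is the cost the rest of this thread argues about.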
Re: Bound Threads
Hi Simon,

> I'd like to point out a concept that I think is being missed here: We never want to specify what OS thread is running a particular Haskell thread. Why not? Because (a) it doesn't matter: the programmer can never tell, and (b) we want to give the implementation freedom to spread Haskell threads across multiple OS threads to make use of multiple real CPUs.

I agree that these are valid points. However, as I said, I don't think we can do (b), i.e. automatic management, in many real-world situations. The above points are mostly useful in a pure Haskell setting. In general, I think that only the programmer knows what strategy to use. In particular, we can provide a fork function that forks off a new Haskell thread that may run in a new OS thread, or on another CPU, basically implementing the above concept for programs that don't care about how the Haskell threads are distributed over OS threads:

    fork :: IO () -> IO ThreadID
    fork io
      = do newOS <- [complex algorithm that determines if a new OS thread is needed]
           if (newOS)
            then forkOS io
            else do threadID <- [complex algorithm that determines in which existing thread we run it]
                    forkIOIn threadID io

Note that we can now implement our really sophisticated distributed algorithms in plain Haskell.

> The point is that you want to specify which OS thread is used to invoke a foreign function, NOT which OS thread is used to execute Haskell code. The semantics that Simon & I wrote make this clear.

This is a good point, and that is also the weakness of the forkOS/forkIO approach: it is less declarative and thus leaves less freedom to the implementation. However, I hope that through functions like fork, we can bring back declarativeness by abstraction.

> If we keep thinking like this, then implementations like Hugs can be single-threaded internally but switch OS threads to call out to foreign functions, and implementations like GHC can be multi-threaded internally and avoid switching threads when calling out to foreign functions.
Ha, this is not true :-) We are saved by your observation that in the Haskell world we can't observe whether we run in a different OS thread or not. Thus a single-threaded Hugs will implement forkOS as forkIO but still attach a different Hugs OS thread identifier to the Haskell thread. When a foreign call is made, it matches the Hugs OS thread identifiers and uses a different OS thread if necessary, maintaining a mapping between the Hugs OS thread identifiers and the spawned OS threads.

>>     threadSafe :: IO a -> IO a
>>     threadSafe io
>>       = do result <- newEmptyMVar
>>            forkOS (do{ x <- io; putMVar result x })
>>            takeMVar result
>
> This forces a thread switch when calling a threadsafe foreign function, which is something I think we want to avoid.

We can refine the implementation to avoid a thread switch when it is the only Haskell thread running in the current OS thread:

    threadSafeEx :: IO a -> IO a
    threadSafeEx io
      = do count <- getHaskellThreadCountInTheCurrentOSThread
           if (count > 1) then threadSafe io else io

> I'm basing this on two assumptions: (a) switching OS threads is expensive and (b) threadsafe foreign calls are common. I could potentially be wrong on either of these, and I'm prepared to be persuaded. But if both (a) and (b) turn out to be true, then worse is worse in this case.

I think you are right on (a), but I also think that we can avoid the switch, just as it can sometimes be avoided when implemented in C in the runtime. Can't say anything about (b).

All the best,
  Daan.

Now, I have an example from the wxHaskell GUI library that exposes some of the problems with multiple threads. I can't say it can be solved nicely with forkOS, so I wonder how it would work out with threadsafe.

The example is a Haskell initialization function that is called via a callback from the GUI library. The Haskell initialization function wants to do a lot of processing but still stay reactive to close events, for example.
Since events are processed in an event loop, new events can only come in by returning from the callback. So, the initialization function forks off a Haskell thread (the processor) to do all the work and returns as soon as possible to the C GUI library. Now, the event loop starts to wait for the next event in C land. The problem is that the processor thread won't run, since we have returned to C-land and the Haskell scheduler can't run.

We can solve it by running the processor thread with forkOS. I can't say it is a particularly nice solution, but it is how it is done in all other major programming languages.

I wonder how the threadsafe keyword can be used to solve this problem. Since the Haskell function is called via a callback, I guess that threadsafe should also apply to wrapper functions -- that is, when the foreign world calls Haskell, we use another OS thread to run the Haskell code. However, I think that we are then forced to use an OS thread context switch??

> Cheers,
> Simon
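The fork function sketched in this message, a library-level policy that picks between forkOS and a lighter fork, can be approximated in present-day GHC. This is a hedged sketch, not the proposal itself: forkIOIn and OSThreadID have no counterpart in GHC, so plain forkIO stands in for "run it in some existing OS thread", and the names Policy, forkWith and everyNth are hypothetical.

```haskell
import Control.Concurrent (ThreadId, forkIO, forkOS, rtsSupportsBoundThreads)
import Data.IORef (atomicModifyIORef', newIORef)

-- | Hypothetical policy: decides whether the next thread gets its own OS thread.
type Policy = IO Bool

-- | Fork a thread, asking the policy whether it should be bound to a new OS thread.
-- forkOS is only used when the RTS supports bound threads (linked with -threaded).
forkWith :: Policy -> IO () -> IO ThreadId
forkWith wantsOSThread io = do
  newOS <- wantsOSThread
  if newOS && rtsSupportsBoundThreads
    then forkOS io
    else forkIO io

-- | Example policy (an assumption for illustration): every n-th fork asks for
-- a new OS thread. The argument n must be positive.
everyNth :: Int -> IO Policy
everyNth n = do
  counter <- newIORef (0 :: Int)
  return $ atomicModifyIORef' counter $ \k ->
    let k' = k + 1 in (k', k' `mod` n == 0)
```

Keeping the policy an ordinary IO action is the point being argued here: the "complex algorithm" lives in plain Haskell and can be swapped without touching the runtime.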
Re: Bound Threads
> I have just spent some time reading through all the discussions and the new threads document and I would like to propose the addition of a new library function.
>
>     forkOS :: IO () -> IO ThreadID

Something like that is already in the proposal, only it's currently called forkBoundThread and it doesn't return the ThreadID (that can be changed, though).

> With this, I also propose that forkIO always runs a Haskell thread in the same OS thread that the current Haskell thread runs in. (i.e. forkIO: same OS thread, forkOS: new OS thread)

In the proposal we wrote:

    The specification shouldn't explicitly require lightweight green threads to exist. The specification should be implementable in a simple and obvious way in Haskell systems that always use a 1:1 correspondence between Haskell threads and OS threads.

The idea was that lightweight (green) threads are an optimization only (do they have any other advantage?), not a language feature, and that implementations of Haskell should not be forced to support a complex thread management system. Your proposal obviously contradicts this.

What is the advantage of explicitly requiring one OS thread to execute (the foreign calls made by) several Haskell threads? So far, I was only able to think of two possible situations:

a) The foreign functions don't care what thread they are called from.
   In that case, I would like the implementation to run my Haskell threads in the most efficient way possible. Currently, that means scheduling them all in one OS thread, but that is an implementation detail that I don't want to care about when I'm writing a normal application. On a four-processor SMP machine, the most efficient way is to run them simultaneously in four OS threads (no implementation currently supports this, but there's experimental code in the GHC repository).

b) The foreign functions do care what thread they are called from.
   In that case I want the implementation to have an exact correspondence between Haskell threads and OS threads.
   I just want to think about one thread, and I don't want to manage some correspondence between Haskell threads and OS threads manually.

> Using the new primitive, we can view the new threadsafe keyword as syntactic sugar:
>
>     foreign import threadsafe foo :: Int -> IO Int
>
>     ===
>
>     foo :: Int -> IO Int
>     foo i = threadSafe (primFoo i)
>
>     foreign import "foo" primFoo :: Int -> IO Int
>
> where
>
>     threadSafe :: IO a -> IO a
>     threadSafe io
>       = do result <- newEmptyMVar
>            forkOS (do{ x <- io; putMVar result x })
>            takeMVar result

That looks dangerous: I want to call both threadsafe imports and unsafe imports from a bound thread, and I expect all foreign calls from a bound thread to be executed from the same OS thread (by the definition of a bound thread). This implementation of threadsafe always uses another (new or pooled) OS thread for the threadsafe call.

>     getOSThread :: ThreadID -> OSThreadID
>     forkIOIn    :: OSThreadID -> IO () -> IO ThreadID

Why should the RTS do inter-OS-thread messaging for us?

> I have the feeling that it is not difficult to implement forkOS and family once the runtime system has been upgraded to support multiple OS threads. Wolfgang, you seem to be the expert on the OS thread area, would it be hard?

It would definitely be more difficult to implement in GHC than the current proposal, but it could be done. In fact I think that implementing it would be more fun for me than having to use it afterwards.

> I am not saying that we should discard the threadsafe keyword as it might be a useful shorthand, but I think that it is in general a mistake to try to keep the management of OS threads implicit -- don't use new keywords, add combinators to implement them!

Management of OS threads _should_ be kept implicit. Ideally, the user should never notice that the GHC runtime is using green threads internally.
> I feel that the following has happened; urk, we need some way of keeping haskell threads running while calling C; we add threadsafe; whoops, sometimes a function expects that it is run in the same OS thread; we add bound; whoops, sometimes functions expect to be run from a specific OS thread... unsolved??

Not unsolved. Use Control.Concurrent.Chan :-)

> Before we know it, we have added tons of new keywords to solve the wrong problem.

The problem being that some Haskell implementations try to optimize concurrency by doing the scheduling themselves. We have to provide hints (threadsafe and bound) to the implementation to specify just how much it is allowed to optimize. We should never be required to explicitly do the optimization in the source code. It will break with SMP implementations (which I expect to be using in a few years), because different optimizations are required: suddenly it will be desirable to have multiple OS threads for performance reasons.

> Maybe it is time to take a step back and use a somewhat lower level model with two fork variants: forkIO (in the same OS thread) and forkOS (in a new OS thread). It seems that none of the
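The one-line suggestion to use Control.Concurrent.Chan for functions that must run from a specific OS thread can be spelled out roughly as follows. This is a sketch under assumptions, not code from the thread: one worker thread owns the foreign library and serves jobs sent to it over a channel. The names Worker, startWorker, onWorker and workerThreadId are hypothetical, and forkIO is used as a fallback when the RTS lacks bound threads.

```haskell
import Control.Concurrent
  (ThreadId, forkIO, forkOS, myThreadId, rtsSupportsBoundThreads)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever, join)

-- | Handle to a worker; every job written to the channel runs on the worker's thread.
newtype Worker = Worker (Chan (IO ()))

-- | Start a worker that executes queued jobs one at a time, forever.
startWorker :: IO Worker
startWorker = do
  jobs <- newChan
  let spawn = if rtsSupportsBoundThreads then forkOS else forkIO
  _ <- spawn (forever (join (readChan jobs)))
  return (Worker jobs)

-- | Run an action on the worker and wait for its result.
-- Limitation of this sketch: an exception in the action kills the worker.
onWorker :: Worker -> IO a -> IO a
onWorker (Worker jobs) io = do
  reply <- newEmptyMVar
  writeChan jobs (io >>= putMVar reply)
  takeMVar reply

-- | Ask the worker which Haskell thread it is; the answer is the same on every call.
workerThreadId :: Worker -> IO ThreadId
workerThreadId w = onWorker w myThreadId
```

With the threaded RTS the worker is a bound thread, so foreign calls made inside `onWorker` all come from one OS thread without the caller tracking thread identities.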
Re: Bound Threads
> In general, I think that only the programmer knows what strategy to use.

Do programmers know? I know about my own program, but do I know about the library that I am going to use? Does it use forkOS or forkIO? What will be the consequences if it uses forkIO and I do a lengthy foreign call? Does the library writer know about my program? I fear I'll end up wrapping every call to *Haskell* libraries in your threadSafe combinator, just to be sure that the library and my program don't interfere. I'm very afraid of having some long debugging sessions once we have this feature.

> Now, I have an example from the wxHaskell GUI library that exposes some of the problems with multiple threads. I can't say it can be solved nicely with forkOS, so I wonder how it would work out with threadsafe:

In both cases, threadsafe and forkOS, we would have two OS threads, and we would have OS thread context switches between them. So it will get us nowhere to count the OS thread context switches involved (incidentally, handling GUI events is an area where we can easily afford the cost of OS thread context switches).

You would mark the call to the wxWindows event loop as threadsafe (actually, in the CVS version of GHC, safe is currently a shorthand for threadsafe, as nobody has yet provided a logically sound and meaningful definition of the semantics that safe should have instead). In case wxWindows makes the assumption that its functions are invoked _from the same OS thread_ that it used to call your callback, you can add bound to the foreign export statement for your callback. Then you would use just forkIO and everything would work. (If wxWindows doesn't allow access from multiple threads, then of course you can't call wx functions from the thread you just forked. But that's the same no matter which proposal we follow.)

> The problem is that the processor thread won't run since we have returned to C-land and the haskell scheduler can't run.
Threadsafe means that you don't need to care about things like that.

> [...] but it is how it is done in all other major programming languages.

Only other languages don't require you to manually keep track of a correspondence between lightweight and heavyweight threads.

> Since the haskell function is called via a callback, I guess that threadsafe should also apply to wrapper functions -- that is, when the foreign world calls haskell, we use another OS thread to run the haskell code.

Sorry, no clue what you mean... could you elaborate? threadsafe just applies to imports, not to exports and wrappers. bound applies to exports and wrappers, but not to imports. Should it be different?

Cheers,
Wolfgang
Re: Bound Threads
(I keep forgetting to correctly fill out the To: and Cc: fields before sending my reply... so here's the copy that should have been sent to the list 10 minutes ago...)

Daan Leijen wrote:

> Hi Wolfgang, I feel like you are beating my proposal to death here, and I find it hard to react individually to all your remarks.

Sorry, maybe I shouldn't shoot all my arguments at you at once :-)

> I'll try to focus on the main issue: You are worried that the forkOS and forkIO distinction is too primitive and that it rules out sophisticated scheduling on SMP processors for example.

That is closely related to the point that I don't want implementation details (that several Haskell threads can run in one OS thread) to be part of a defined interface that has to be supported by all implementations.

My main point is probably my strong phobia of safe calls: they are not safe to use. FFI newbies are repeatedly getting bitten by the fact that safe calls block everything else. There is no natural reason for a call to block other threads; it's a consequence of an implementation detail (lightweight threads). The intention behind threadsafe is to make this implementation detail invisible, and therefore harmless. The forkOS/forkIO proposal seems to rely on having foreign calls that block other threads (that run in the same OS thread)... or doesn't it?

> Maybe, the forkOS/forkIO approach is flawed, but I think we should only rule it out when we can provide a convincing example where only the keyword approach would work, and where we can't use combinators to achieve the same effect.

That's unfair ;-) --- I could also claim the reverse and say that we stick with threadsafe/bound until we have a convincing example... Using combinators sounds good, but there are things where combinators are not automatically the best choice (we're usually not using combinators to implement laziness, either), and we don't know yet whether this is the case here or not.
So let's just get on with the discussion.

> About the example: [snip] That is amazing :-) [...]

Well, of course, it's none of your business to know. Different Haskell runtimes could use completely different schemes, and you wouldn't even notice. In GHC, the second OS thread is spawned earlier, the first time a threadsafe call is made. When the threadsafe call is made, GHC

a) makes sure that there is a second OS thread available (some overhead the first time) and
b) makes sure that all non-bound [in the current implementation: all] Haskell threads are executed by the second OS thread from now on.

Now if a bound callback is invoked [not yet implemented], the bound callback is executed in the thread that the wrapper was called in (the first OS thread). All other Haskell threads (if any) continue to run in the second OS thread. If forkIO is called, the new Haskell thread is just added to the list of threads to be run by the second OS thread; no new OS thread has to be spawned, and forkIO doesn't have to know about threadsafe or about bound.

If a non-bound callback is invoked [available now in the CVS HEAD], the callback is executed along with the other background Haskell threads in the second OS thread.

> Since the RTS can't guess whether to use new OS threads or not at forkIO, I assumed that you could mark wrapper functions (callbacks) with a threadsafe attribute. If this is not the case, I don't understand how the implementation could work in this particular example.

It doesn't need to guess; threadsafe is just for imports, and it works anyway. See above, and if I haven't explained it clearly, try http://www.cse.unsw.edu.au/~chak/haskell/ghc/comm/rts-libs/multi-thread.html (which I really should update, it's outdated), or ask again.

Maybe we need a more exact specification of how forkOS/forkIO should behave, especially with respect to foreign calls blocking other threads.
Could you elaborate on how you would expect normal (safe) foreign calls to behave in different situations?

Cheers,
Wolfgang
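The distinction drawn in this message between bound and non-bound Haskell threads can be observed in present-day GHC through Control.Concurrent. The snippet below is a small sketch that assumes only that module's isCurrentThreadBound, runInBoundThread and rtsSupportsBoundThreads; the helper names inForkIO and boundReport are hypothetical. It checks that a thread created with forkIO is not bound, while an action run through runInBoundThread is.

```haskell
import Control.Concurrent
  (forkIO, isCurrentThreadBound, rtsSupportsBoundThreads, runInBoundThread)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | Run an action in a fresh forkIO thread and return its result.
inForkIO :: IO a -> IO a
inForkIO io = do
  box <- newEmptyMVar
  _ <- forkIO (io >>= putMVar box)
  takeMVar box

-- | First component: is a forkIO thread bound? Second: is a bound thread bound?
-- Without -threaded, runInBoundThread would fail, so the second answer is False.
boundReport :: IO (Bool, Bool)
boundReport = do
  viaForkIO <- inForkIO isCurrentThreadBound
  viaBound  <- if rtsSupportsBoundThreads
                 then runInBoundThread isCurrentThreadBound
                 else return False
  return (viaForkIO, viaBound)
```

This matches the scheme described above in that forkIO does not need to know anything about bound threads: only the threads that entered Haskell through a bound route carry an OS-thread association.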