Re: [Haskell] select(2) or poll(2)-like function?
On Sun, Apr 17, 2011 at 01:44:50PM -0700, Don Stewart wrote:
> `forkIO` is based on epoll. So threadWaitFD and friends are using epoll.

Or (on non-Linux systems) on kqueue or poll, as I learned from grep(1) and the followups here.

(And sorry for the noise; I really didn't expect such flamebait.)

At least now I know what should be changed for frameworks like wai/warp to listen on multiple sockets.

Ciao, Kili

___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] select(2) or poll(2)-like function?
Mike Meyer wrote:
> [...]

In case you don't have a subscription to haskell-cafe: I have replied there, because this discussion does not belong on the general Haskell list.

Greets, Ertugrul

--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/
Re: [Haskell] select(2) or poll(2)-like function?
On 18 April 2011 17:48, Mike Meyer wrote:
> On Mon, 18 Apr 2011 17:07:53 +0100 Colin Adams wrote:
> > On 18 April 2011 16:54, Ertugrul Soeylemez wrote:
> > > > Well, *someone* has to worry about robustness and scalability. Users notice when their two minute system builds start taking four minutes (and will be at my door wanting me to fix it) because something didn't scale fast enough, or have to be run more than once because a failing component build wasn't restarted properly. I'm willing to believe that haskell lets you write more scalable code than C, but C's tools for handling concurrency suck, so that should be true in any language where someone actually thought about dealing with concurrency beyond locks and protected methods. The problem is, the only language I've found where that's true that *also* has reasonable tools to deal with scaling beyond a single system is Eiffel (which apparently abstracts things even further than haskell - details like how concurrency is achieved or how many concurrent operations you can have are configured when you start an application, *not* when writing it). Unfortunately, Eiffel has other problems that make it undesirable.
> > >
> > > I can't make a comparison, because I don't know Eiffel.
> >
> > I do, and I don't recognize what the OP is referring to - I suspect he meant Erlang.
>
> No, I meant Eiffel. In particular, the Simple Concurrent Object Oriented Programming system was described in OOSC but is "not yet part of the official standard" (one of those "other problems" I mentioned).

SCOOP has yet to be implemented in a released compiler. Eiffel Software are currently incorporating it into version 6.8, which will be released in a month or two.

Note there is known to be an overhead associated with this level of abstraction.

--
Colin Adams
Preston, Lancashire, ENGLAND
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, 18 Apr 2011 17:07:53 +0100 Colin Adams wrote: > On 18 April 2011 16:54, Ertugrul Soeylemez wrote: > > > > > > > > > Well, *someone* has to worry about robustness and scalability. Users > > > notice when their two minute system builds start taking four minutes > > > (and will be at my door wanting me to fix it) because something didn't > > > scale fast enough, or have to be run more than once because a failing > > > component build wasn't restarted properly. I'm willing to believe that > > > haskell lets you write more scalable code than C, but C's tools for > > > handling concurrency suck, so that should be true in any language > > > where someone actually thought about dealing with concurrency beyond > > > locks and protected methods. The problem is, the only language I've > > > found where that's true that *also* has reasonable tools to deal with > > > scaling beyond a single system is Eiffel (which apparently abstracts > > > things even further than haskell - details like how concurrency is > > > achieved or how many concurrent operations you can have are configured > > > when you start an application, *not* when writing it). Unfortunately, > > > Eiffel has other problems that make it undesirable. > > > > I can't make a comparison, because I don't know Eiffel. > > I do, and I don't recognize what the OP is referring to - I suspect he meant > Erlang. No, I meant Eiffel. In particular, the Simple Concurrent Object Oriented Programming system was described in OOSC but is "not yet part of the official standard" (one of those "other problems" I mentioned). At the code level, you declared variables as referencing "separate" objects, meaning their methods ran concurrently when invoked. These declarations interacted with preconditions and method arguments, so the compiler could deal with locks, wait conditions, and synchronization behind the scenes. Processors (i.e. 
threads, or processes, or even processes on remote systems) were assigned to the program when it started, and the RTS dealt with assigning objects to processors and making sure the communications worked properly.

http://www.mired.org/consulting.html
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
Re: [Haskell] select(2) or poll(2)-like function?
Redirecting to haskell-cafe@, where this kind of long discussion belongs. On Mon, Apr 18, 2011 at 9:07 AM, Colin Adams wrote: > > > On 18 April 2011 16:54, Ertugrul Soeylemez wrote: >> >> > >> > Well, *someone* has to worry about robustness and scalability. Users >> > notice when their two minute system builds start taking four minutes >> > (and will be at my door wanting me to fix it) because something didn't >> > scale fast enough, or have to be run more than once because a failing >> > component build wasn't restarted properly. I'm willing to believe that >> > haskell lets you write more scalable code than C, but C's tools for >> > handling concurrency suck, so that should be true in any language >> > where someone actually thought about dealing with concurrency beyond >> > locks and protected methods. The problem is, the only language I've >> > found where that's true that *also* has reasonable tools to deal with >> > scaling beyond a single system is Eiffel (which apparently abstracts >> > things even further than haskell - details like how concurrency is >> > achieved or how many concurrent operations you can have are configured >> > when you start an application, *not* when writing it). Unfortunately, >> > Eiffel has other problems that make it undesirable. >> >> I can't make a comparison, because I don't know Eiffel. > > I do, and I don't recognize what the OP is referring to - I suspect he meant > Erlang. > > -- > Colin Adams > Preston, Lancashire, ENGLAND > () ascii ribbon campaign - against html e-mail > /\ www.asciiribbon.org - against proprietary attachments > > ___ > Haskell mailing list > Haskell@haskell.org > http://www.haskell.org/mailman/listinfo/haskell > > ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] select(2) or poll(2)-like function?
On 18 April 2011 16:54, Ertugrul Soeylemez wrote: > > > > > Well, *someone* has to worry about robustness and scalability. Users > > notice when their two minute system builds start taking four minutes > > (and will be at my door wanting me to fix it) because something didn't > > scale fast enough, or have to be run more than once because a failing > > component build wasn't restarted properly. I'm willing to believe that > > haskell lets you write more scalable code than C, but C's tools for > > handling concurrency suck, so that should be true in any language > > where someone actually thought about dealing with concurrency beyond > > locks and protected methods. The problem is, the only language I've > > found where that's true that *also* has reasonable tools to deal with > > scaling beyond a single system is Eiffel (which apparently abstracts > > things even further than haskell - details like how concurrency is > > achieved or how many concurrent operations you can have are configured > > when you start an application, *not* when writing it). Unfortunately, > > Eiffel has other problems that make it undesirable. > > I can't make a comparison, because I don't know Eiffel. > I do, and I don't recognize what the OP is referring to - I suspect he meant Erlang. -- Colin Adams Preston, Lancashire, ENGLAND () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] select(2) or poll(2)-like function?
Mike Meyer wrote: > > You also don't need Emacs/Vim, if all you want is to write a simple > > plain text file. There is nothing wrong with concurrency, because > > you are confusing the high level model with the low level > > implementation. Concurrency is nothing but a design pattern, and > > GHC shows that a high level design pattern can be mapped to > > efficient low level code. > > Possibly true. The question is - can it be mapped to a design that's > as robust and scalable as the ones I'm used to working on? Well, my Haskell programs run for weeks without problems and can handle a lot of load easily. > > In Haskell you should not use explicit, manual OS threading/forking > > for the same reason you shouldn't write machine code manually. > > That's a good thing - providing it doesn't compromise robustness and > scalability. Well, any implementation of threading, which compromises robustness or scalability can be considered broken, and Haskell developers try hard not to deliver broken packages. After all, Haskell is all about safety. Of course, Haskell's RTS is not perfect, but comparing costs will show that using the RTS is cheaper than writing your own hand-optimized scheduler. > > Perhaps Haskell is the wrong language for you. How about > > programming in C/C++? I think you want more control over low level > > resources than Haskell gives you. But I suggest having a closer > > look at concurrency. > > Personally, I don't want to have to worry about low-level resources, > or even concurrency. Having to do so feels to much like having to > explicitly allocate and free memory, or worry about register > allocations. But if I have to do those things to get robustness and > scalability until the languages start being able to deal with it, then > I need the RTS to get out of the way and let me do my job. Then again, Haskell is the wrong language. Don't expect any explicit OS threading builtin any time soon, if ever. 
You have to trust the compiler and RTS to generate efficient handle/thread handling just as well as you would have to trust a C compiler to generate efficient machine code. And reinventing the wheel is certainly a step away from robustness. Since most Haskell programmers are using GHC, it follows naturally that most Haskell developers rely on its RTS. If it were as badly broken as you seem to believe it is, then people would not use it. > If I'm using a value that needs protection from concurrent access > without providing that protection, I want the system give me an > error. At run-time is acceptable, but compile time is better. I want > the system to make sure the concurrent protection mechanisms work > properly - no deadlocks, no stuck process, etc - without my having to > do anything but indicate which values need such protection. Use concurrency with a lockfree communication abstraction like STM. A concurrency system is only a proper concurrency system, if you don't have to care about locking. Of course checking for deadlocks is undecidable in general, but GHC is pretty good at finding deadlocked threads and throws an exception, when it does. > > When writing concurrent code you don't care about how the RTS maps > > it to processes and threads. GHC chose threads, probably because > > they are faster to create/kill and consume less memory. But this is > > an implementation detail the Haskell developer should not have to > > worry about. > > So - what happens when a thread fails for some reason? I'm used to > dealing with systems that run 7x24 for weeks or even months on > end. Hardware hiccups, network failures, bogus input, hung clients, > etc. are all just facts of life. I need the system to keep running > properly in the face of all those, and I need them to disrupt the > world as little as possible. If the hardware goes bad, there isn't much you can do. But all other scenarios are handled well. You will want to write exception handlers. 
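A minimal sketch of the deadlock detection mentioned above, assuming GHC and only the base library. The name `waitOnUnreachable` is made up for illustration. The thread blocks on an MVar that no other thread can ever fill, and the runtime is expected to notice this and raise `BlockedIndefinitelyOnMVar` in the blocked thread, where an ordinary exception handler can deal with it. Detection is tied to garbage collection and is best-effort, so it is better treated as a safety net than as a design tool.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar)
import Control.Exception (BlockedIndefinitelyOnMVar (..), try)

-- Block on an MVar that nobody else holds a reference to.
-- 'try' turns the exception raised by the runtime into a 'Left'.
-- (waitOnUnreachable is a hypothetical name, not a library function.)
waitOnUnreachable :: IO (Either BlockedIndefinitelyOnMVar ())
waitOnUnreachable = do
  box <- newEmptyMVar :: IO (MVar ())
  try (takeMVar box)

main :: IO ()
main = do
  result <- waitOnUnreachable
  case result of
    Left BlockedIndefinitelyOnMVar ->
      putStrLn "the runtime reported a thread blocked indefinitely"
    Right () ->
      putStrLn "unexpectedly received a value"
```

The same mechanism exists for STM (`BlockedIndefinitelyOnSTM`), so a handler around a worker thread can log or restart instead of hanging silently.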
> Given that the RTS has taken control over this stuff, I sort of expect > it to take care of noticing a dead process and restarting it as > well. All of which is fine by me. The RTS is not responsible for reawakening, only for noticing death. Note that the RTS is not a server framework. You can take notice of the death of a thread and restart it yourself. > > In other words: Robustness and scalability should not be your > > business in Haskell. You should concentrate on understanding and > > using the concurrency concept well. And just to encourage you: I > > write productive concurrent servers in Haskell, which scale very > > well and probably better than an equivalent C implementation would. > > Reason: A Haskell thread is not mapped to an operating system > > thread (unless you used forkOS). When it is advantageous, the RTS > > can well decide to let another OS thread continue a running Haskell > > thread. That way the active OS threads are always utilized as > > efficiently as possible. It would be a pai
Re: [Haskell] select(2) or poll(2)-like function?
Mike Meyer wrote: > > To add a bit more. The most common use of select/epoll is to > > simulate the concurrency because the natural way of doing it > > fork/pthread_create etc are too expensive. I dont know of any other > > reason why select/epoll exits. > > You know, I've *never* written a select/kqueue loop because > fork/pthread/etc. were to expensive (and I can remember when fork was > cheap). I always use it because the languages I'm working in have > sucky tools for dealing with concurrency, so it's easier just to avoid > the problems by not writing concurrent code. Have you ever written concurrent code in Haskell? Because ... > > If fork was trivial in terms of overhead, then one would rather > > write a webserver as > > > > forever do > > accept the next connection > > handle the request in the new child thread/process > > Only if you also made the TCP/IP connection overhead trivial so you > could stop with HTTP/1.0 and not deal with HTTP/1.1. Failing that, the > most natural way to do this is: > > forever do > accept the next connection > handle requests from connection in new child > wait for next events > if one is a client request, start the response > if one is a finished response, return it to the client > if one is something else, something broke, deal with it. > > I.e, an event-driven loop for each incoming connection running in it's > own process. ... it seems that you have a completely wrong impression of cheap concurrency. You are still connecting Haskell threads to operating system threads and resources somehow. It's called "cheap concurrency" for a good reason. There is nothing wrong with creating tens or even hundreds of threads per client, even when you have hundreds of clients at the same time. Please don't think of Haskell threads as some concrete memory/execution object, because they are really not. 
They are a design pattern, and the resulting code will be the ordinary threaded, epoll-based code, just like you would write it without concurrency, and likely even better, as I noted in the other reply.

> > This is because it is a natural ``lift'' of the client handling code to many clients (While coding the handling code one need not worry about the other threads).
>
> Still true - at least if you don't try and create a thread for each request on a connection. If you do that, then the threads on a connection have to worry about each other. Which is why the event loop for the second stage is more natural than creating more threads.

In Haskell there are many very easy-to-use communication constructs. You will like MVars and STM for this. Done properly, concurrency with communication constructs is easier to use than event-driven client management.

> > GHC's runtime with forkIO makes this natural server code efficient. It might use epoll/kqueue/black magic/sale of souls to the devil I don't care.
>
> But doesn't remove the need for some kind of event handling tool in each thread.

If you want to call STM event handling, you can, but I think that this interpretation doesn't really fit. Event handling involves polling/waiting. STM does not. For example, one perfectly fine application is this: there is a variable holding a list of all clients, and one thread scans this list in an infinite loop trying to find clients which have a certain bit set. For each client found, the thread performs some reaction to this and resets the bit. There is no waiting. The thread really runs an infinite loop. Now one might think that this will waste CPU cycles. But it does not, because it's STM.

Greets, Ertugrul

--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/
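The client-scanning loop described above can be sketched as follows. This is an illustration under assumptions, not code from the thread: the `Client` record, its field names and `nextFlagged` are invented, the client list is fixed rather than stored in a variable, and `GHC.Conc` from base is used so that no extra package is needed (real code would normally import `Control.Concurrent.STM` from the stm package). The reason the loop does not burn CPU is `retry`: a transaction that finds no flagged client is suspended by the runtime until one of the variables it read is written.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever)
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, retry, writeTVar)

-- Hypothetical client record: a name plus the "bit" the scanner looks for.
data Client = Client
  { clientName     :: String
  , needsAttention :: TVar Bool
  }

-- Return the first flagged client and reset its flag.
-- If no client is flagged, 'retry' suspends the transaction until one of
-- the flags read so far changes.
nextFlagged :: [Client] -> STM String
nextFlagged [] = retry
nextFlagged (c : cs) = do
  flagged <- readTVar (needsAttention c)
  if flagged
    then writeTVar (needsAttention c) False >> return (clientName c)
    else nextFlagged cs

main :: IO ()
main = do
  clients <- mapM (\n -> Client n <$> newTVarIO False) ["alice", "bob", "carol"]
  served  <- newEmptyMVar
  -- The scanner is written as an infinite loop with no explicit waiting.
  _ <- forkIO $ forever $ atomically (nextFlagged clients) >>= putMVar served
  atomically $ writeTVar (needsAttention (clients !! 1)) True
  takeMVar served >>= putStrLn . ("served " ++)
  atomically $ writeTVar (needsAttention (clients !! 2)) True
  takeMVar served >>= putStrLn . ("served " ++)
```

Whether one calls the suspended transaction "waiting" is a matter of terminology; the point is that the waiting is done by the runtime, not by code in the loop.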
Re: [Haskell] select(2) or poll(2)-like function?
Hi Mike,

On Mon, Apr 18, 2011 at 12:00 PM, Mike Meyer wrote:
> > It's useful to use non-determinism (i.e. concurrency) to model a server processing multiple requests. Since requests are independent and shouldn't impact each other we'd like to model them as such. This implies some level of concurrency (whether using threads and processes).
>
> But because the requests are independent, you don't need concurrency in this case - parallelism is sufficient. The unix process model works quite well. Compared to a threaded model, this is more robust (if a process breaks, you can kill and restart it without affecting other processes, whereas if a thread breaks, restarting the process and all the threads in it is the only safe option) and scalable (you're already doing ipc, so moving processes onto more systems is easy, and trivial if you design for it). The events handled by a single process are simple enough that your callback/event spaghetti can line up in nice, straight strands.

Threads and processes are both concurrent programming models. You get parallelism if you map the threads/processes to more than one CPU core.

Processes (and to some extent threads) only scale (well) horizontally: given more processes, you can handle more concurrent requests. However, they don't scale vertically: given more processes, you don't handle a particular request any faster*.

Since GHC's scheduler deals with both CPU-bound threads and I/O-bound threads, you can use it to process single requests faster. A typical CPU-bound activity is page rendering, which you could break up like this:

    renderPage = firstPart `par` secondPart `pseq` combine firstPart secondPart
      where
        firstPart = render ...
        secondPart = render ...

This would run in GHC's per-CPU-core thread pool, if there are free CPU resources to do so.

Regarding robustness, I think Simon covered most of that. GHC's RTS gives you the building blocks you need to write Erlang-style process monitoring/restart.
* In theory you could use IPC to implement parallel algorithms using processes, but it's hard to achieve real world performance gains this way as the overhead is quite large.

Johan
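A self-contained version of the `renderPage` idea, as a sketch only. `render` and `combine` are stand-ins invented here (the original elides them), and `par`/`pseq` are taken from `GHC.Conc` in base to avoid a dependency; they are more commonly imported from `Control.Parallel` in the parallel package. Explicit parentheses are used so the grouping does not depend on operator fixity.

```haskell
import GHC.Conc (par, pseq)

-- Stand-in for real rendering work (hypothetical): sum a range and show it.
render :: Int -> String
render n = show (sum [1 .. n])

-- Stand-in for assembling the page from its parts (hypothetical).
combine :: String -> String -> String
combine a b = a ++ "|" ++ b

-- Spark the first part, evaluate the second part in the current thread,
-- then combine. The spark is only a hint: if no core is free, the first
-- part is simply evaluated later by whoever needs it.
renderPage :: Int -> Int -> String
renderPage m n =
  firstPart `par` (secondPart `pseq` combine firstPart secondPart)
  where
    firstPart  = render m
    secondPart = render n

main :: IO ()
main = putStrLn (renderPage 1000 2000)  -- prints 500500|2001000
```

To actually use more than one core, the program needs to be built with `-threaded` and run with `+RTS -N`; without that the result is the same, just computed sequentially.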
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, Apr 18, 2011 at 2:25 PM, Mike Meyer wrote:
> Only if you also made the TCP/IP connection overhead trivial so you could stop with HTTP/1.0 and not deal with HTTP/1.1. Failing that, the most natural way to do this is:
>
> forever do
>     accept the next connection
>     handle requests from connection in new child
>         wait for next events
>         if one is a client request, start the response
>         if one is a finished response, return it to the client
>         if one is something else, something broke, deal with it.
>
> I.e, an event-driven loop for each incoming connection running in it's own process.

I'm not sure if I would call this "event driven", depending on what's hiding behind the sentences above. Here's how I've written servers that deal with keep-alive, errors, etc.

    server sock = forever $ do
        clientSock <- accept sock
        forkIO $ talk clientSock

    talk sock = do
        maybeReq <- read sock
        case maybeReq of
            Nothing  -> ... -- error while reading/parsing. Close socket or send 400 Bad Request
            Just req -> do
                resp <- handler req
                send sock resp
                when (isKeepAlive req) $ talk sock

The above code is executed efficiently using epoll, but the programming model is one with blocking system calls.

Johan
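The shape of that server can be tried out without a network. The sketch below replaces the socket with an in-memory stand-in, so everything named `Conn`, `scriptedConn`, `Request`, `reqPath` and the response strings is an assumption made for illustration; only the structure of `server` and `talk` follows the code above. A real server would back `Conn` with a socket library.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, modifyMVar, modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Monad (forever)

data Request = Request { reqPath :: String, isKeepAlive :: Bool }
type Response = String

-- Hypothetical connection abstraction standing in for a client socket.
data Conn = Conn
  { readReq   :: IO (Maybe Request)  -- Nothing: read/parse error or end of input
  , sendResp  :: Response -> IO ()
  , closeConn :: IO ()
  }

handler :: Request -> IO Response
handler req = return ("200 OK " ++ reqPath req)

-- One lightweight thread per accepted connection.
server :: IO Conn -> IO ()
server acceptConn = forever $ do
  conn <- acceptConn
  forkIO (talk conn)

-- Blocking-style loop for a single connection, including keep-alive.
talk :: Conn -> IO ()
talk conn = do
  maybeReq <- readReq conn
  case maybeReq of
    Nothing  -> sendResp conn "400 Bad Request" >> closeConn conn
    Just req -> do
      resp <- handler req
      sendResp conn resp
      if isKeepAlive req then talk conn else closeConn conn

-- In-memory connection that replays a fixed list of requests and hands
-- the collected responses over when the connection is closed.
scriptedConn :: [Request] -> IO (Conn, MVar [Response])
scriptedConn reqs = do
  pending <- newMVar reqs
  sent    <- newMVar []
  done    <- newEmptyMVar
  let pop = modifyMVar pending $ \queue -> case queue of
        []       -> return ([], Nothing)
        (r : rs) -> return (rs, Just r)
      push resp = modifyMVar_ sent (return . (resp :))
      finish    = readMVar sent >>= putMVar done . reverse
  return (Conn pop push finish, done)

main :: IO ()
main = do
  (conn, done) <- scriptedConn [Request "/a" True, Request "/b" False]
  _ <- forkIO (talk conn)
  takeMVar done >>= mapM_ putStrLn
```

Note that `talk` contains no event dispatch at all: it reads, handles, writes and loops, and the runtime decides when the thread actually runs.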
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, 18 Apr 2011 17:05:12 +0530 Piyush P Kurur wrote: > On Mon, Apr 18, 2011 at 12:59:07PM +0200, Ertugrul Soeylemez wrote: > > Svein Ove Aas wrote: > To add a bit more. The most common use of select/epoll is to simulate > the concurrency because the natural way of doing it fork/pthread_create etc > are too expensive. I dont know of any other reason why select/epoll exits. You know, I've *never* written a select/kqueue loop because fork/pthread/etc. were to expensive (and I can remember when fork was cheap). I always use it because the languages I'm working in have sucky tools for dealing with concurrency, so it's easier just to avoid the problems by not writing concurrent code. > If fork was trivial in terms of overhead, then one would rather write a > webserver > as > > forever do > accept the next connection > handle the request in the new child thread/process Only if you also made the TCP/IP connection overhead trivial so you could stop with HTTP/1.0 and not deal with HTTP/1.1. Failing that, the most natural way to do this is: forever do accept the next connection handle requests from connection in new child wait for next events if one is a client request, start the response if one is a finished response, return it to the client if one is something else, something broke, deal with it. I.e, an event-driven loop for each incoming connection running in it's own process. > This is because it is a natural ``lift'' of the client handling code to many > clients (While coding the handling code one need not worry about the other > threads). Still true - at least if you don't try and create a thread for each request on a connection. If you do that, then the threads on a connection have to worry about each other. Which is why the event loop for the second stage is more natural than creating more threads. > GHC's runtime with forkIO makes this natural server code efficient. It might > use epoll/kqueue/black magic/sale of souls to the devil I don't care. 
But doesn't remove the need for some kind of event handling tool in each thread.

http://www.mired.org/consulting.html
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, Apr 18, 2011 at 07:55:57AM -0400, Mike Meyer wrote:
> On Mon, 18 Apr 2011 12:56:39 +0200 Ertugrul Soeylemez wrote:
> > You also don't need Emacs/Vim, if all you want is to write a simple plain text file. There is nothing wrong with concurrency, because you are confusing the high level model with the low level implementation. Concurrency is nothing but a design pattern, and GHC shows that a high level design pattern can be mapped to efficient low level code.
>
> Possibly true. The question is - can it be mapped to a design that's as robust and scalable as the ones I'm used to working on?

I have not written any such server, so take my response with a pinch of salt, but I believe that the forkIO-based solution does indeed scale to 10K clients. Internally the runtime uses epoll or kqueue to simulate the concurrency, so I don't see why it should be slower. Maybe you can check with the developers of Network.Wai or the Happstack server. They might be in a better position to explain.

> [snip]
> > Perhaps Haskell is the wrong language for you. How about programming in C/C++? I think you want more control over low level resources than Haskell gives you. But I suggest having a closer look at concurrency.
>
> Personally, I don't want to have to worry about low-level resources, or even concurrency. Having to do so feels to much like having to explicitly allocate and free memory, or worry about register allocations. But if I have to do those things to get robustness and scalability until the languages start being able to deal with it, then I need the RTS to get out of the way and let me do my job.

I think it would be a good idea to prototype what you want to do and give it a try. Most likely you would be surprised by the efficiency.

> If I'm using a value that needs protection from concurrent access without providing that protection, I want the system give me an error. At run-time is acceptable, but compile time is better.

Try STM. It is a great abstraction for shared state.

Regards
ppk
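As a small illustration of the STM suggestion, here is a sketch that assumes GHC with only base (`GHC.Conc` is used to avoid a dependency; the usual import is `Control.Concurrent.STM` from the stm package). The account-style `transfer` function and the numbers are invented for the example. The relevant property is that `readTVar` and `writeTVar` only type-check inside `STM`, so touching the shared values outside a transaction is rejected at compile time, and each `atomically` block takes effect as a whole.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM_)
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)

-- Move an amount between two shared counters in one atomic step.
-- (transfer is a hypothetical helper, not a library function.)
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  a <- readTVar from
  b <- readTVar to
  writeTVar from (a - amount)
  writeTVar to (b + amount)

main :: IO ()
main = do
  x    <- newTVarIO 1000
  y    <- newTVarIO 0
  done <- newEmptyMVar
  -- Ten threads each perform 100 transfers of 1 with no explicit locks.
  replicateM_ 10 $ forkIO $ do
    replicateM_ 100 (atomically (transfer x y 1))
    putMVar done ()
  replicateM_ 10 (takeMVar done)
  balances <- atomically ((,) <$> readTVar x <*> readTVar y)
  print balances  -- prints (0,1000)
```

No thread can observe a state where the amount has left one counter but not yet arrived in the other, which is the kind of guarantee that otherwise has to be built by hand with locks.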
Re: [Haskell] select(2) or poll(2)-like function?
On 18/04/2011 12:55, Mike Meyer wrote: On Mon, 18 Apr 2011 12:56:39 +0200 Ertugrul Soeylemez wrote: Mike Meyer wrote: The unix process model works quite well. Compared to a threaded model, this is more robust (if a process breaks, you can kill and restart it without affecting other processes, whereas if a thread breaks, restarting the process and all the threads in it is the only safe option) and scalable (you're already doing ipc, so moving processes onto more systems is easy, and trivial if you design for it). The events handled by a single process are simple enough that your callback/event spaghetti can line up in nice, straight strands. When writing concurrent code you don't care about how the RTS maps it to processes and threads. GHC chose threads, probably because they are faster to create/kill and consume less memory. But this is an implementation detail the Haskell developer should not have to worry about. So - what happens when a thread fails for some reason? I'm used to dealing with systems that run 7x24 for weeks or even months on end. Hardware hiccups, network failures, bogus input, hung clients, etc. are all just facts of life. I need the system to keep running properly in the face of all those, and I need them to disrupt the world as little as possible. Given that the RTS has taken control over this stuff, I sort of expect it to take care of noticing a dead process and restarting it as well. All of which is fine by me. The RTS can't manage things at that level, because it doesn't know what robustness model you want. So failures in the I/O library results in exceptions, and you get to decide what to do. If a thread dies due to an exception, then you are responsible for what happens from then on - typically you would have a top-level exception handler that notifies some higher-level thread what happened. It's true that Haskell doesn't give you as much help here as you would get in Erlang/OTP, but it's all readily programmed up. 
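The "top-level exception handler that notifies some higher-level thread" can be sketched with `forkFinally` from `Control.Concurrent`. Everything else here is an assumption for illustration: `supervise`, its retry-count policy and the deliberately flaky worker are invented, and a production supervisor would want back-off, logging and a way to stop.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (ErrorCall (..), SomeException, throwIO)
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- Run the worker in its own thread. forkFinally guarantees that the
-- thread's outcome (a result or an exception) is reported to us.
-- On failure, restart the worker up to 'retries' more times.
-- (supervise is a hypothetical helper, not a library function.)
supervise :: Int -> IO a -> IO (Either SomeException a)
supervise retries worker = do
  outcome <- newEmptyMVar
  _ <- forkFinally worker (putMVar outcome)
  result <- takeMVar outcome
  case result of
    Left _ | retries > 0 -> supervise (retries - 1) worker
    _                    -> return result

-- Hypothetical worker that fails on its first two attempts.
flaky :: IORef Int -> IO String
flaky attempts = do
  n <- atomicModifyIORef' attempts (\k -> (k + 1, k + 1))
  if n < 3
    then throwIO (ErrorCall ("attempt " ++ show n ++ " failed"))
    else return ("succeeded on attempt " ++ show n)

main :: IO ()
main = do
  attempts <- newIORef 0
  result   <- supervise 5 (flaky attempts)
  putStrLn (either (\e -> "gave up: " ++ show e) id result)
  -- prints: succeeded on attempt 3
```

The restart policy lives in ordinary user code, which matches the point above: the runtime reports the death of a thread, and the program decides what robustness model to build on top of that.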
Haskell *does* give you some important guarantees though. Threads never just die without receiving an exception first. If a thread blocks on an unreachable resource then it gets an exception, so you get some help dealing with deadlocks. We don't need to do this. We can keep a concurrent programming model and get the execution efficiency of an event driven model. This is what GHC's I/O manager achieves. On top of that we also get parallelism for free. Another way to look at it is that GHC provides the scheduler (using a thread for the event loop and a separate worker pool) that you end up writing manually in event driven frameworks. So my question is - can I still get the robustness/scalability features I get from the unix process model using haskell? In particular, it seems like ghc starts threads I don't ask it to, and using both threads& forks for parallelism causes even more headaches than concurrency (at least on unix& unix-like systems), so just replicating the process model won't work well. Do any of the haskell parallel processing tools work across multiple systems? Effectively no (unless you want to use the terribly outdated GPH project), but that's a shortcoming of the current RTS, not of the design patterns you use in Haskell. By design Haskell programs are well suited for an auto-distributing RTS. It's just that no such RTS exists for recent versions of the common compilers. So is anyone working on such a package for haskell? I know clojure's got some people working on making STM work in a distributed environment, but that's outside the goals of the core team. Take a look at "Haskell for the Cloud", Jeff Epstein, Andrew Black and Simon Petyon Jones: http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote.pdf In other words: Robustness and scalability should not be your business in Haskell. You should concentrate on understanding and using the concurrency concept well. 
And just to encourage you: I write productive concurrent servers in Haskell, which scale very well and probably better than an equivalent C implementation would. Reason: A Haskell thread is not mapped to an operating system thread (unless you used forkOS). When it is advantageous, the RTS can well decide to let another OS thread continue a running Haskell thread. That way the active OS threads are always utilized as efficiently as possible. It would be a pain to get something like that with explicit threading and even more, when using processes. Well, *someone* has to worry about robustness and scalability. Users notice when their two minute system builds start taking four minutes (and will be at my door wanting me to fix it) because something didn't scale fast enough, or have to be run more than once because a failing component build wasn't restarted properly. I'm willing to believe that haskell lets you write more scalable code than C, but C's tools for handling concurrency suck, so that should be true in any lan
Re: [Haskell] select(2) or poll(2)-like function?
Please can this discussion be moved to haskell-cafe? http://www.haskell.org/haskellwiki/Mailing_Lists Ta. Jeremy On 18 Apr 2011, at 12:55, Mike Meyer wrote: [...]
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, Apr 18, 2011 at 12:59:07PM +0200, Ertugrul Soeylemez wrote: > Svein Ove Aas wrote: > > > And I've often wanted a select-equivalent in Haskell. It'd be simple, > > it'd help, so why not? > > Because perhaps it's just an illusion that it would help. I don't see > any advantage in using explicit polling. Use concurrency. > To add a bit more: the most common use of select/epoll is to simulate concurrency, because the natural ways of getting it (fork, pthread_create, etc.) are too expensive. I don't know of any other reason why select/epoll exists. If fork were trivial in terms of overhead, one would rather write a webserver as: forever do { accept the next connection; handle the request in a new child thread/process }. This is because it is a natural ``lift'' of the client-handling code to many clients (while coding the handler one need not worry about the other threads). GHC's runtime with forkIO makes this natural server code efficient. It might use epoll/kqueue/black magic/sale of souls to the devil/whatever to achieve this; I don't care. And in case you want shared state, just add STM and that is it. Regards ppk
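[Editor's sketch] The "lift the single-client handler to many clients" idea above, with shared state kept in STM, might look roughly like the following. This is only a sketch under stated assumptions: `Client`, `handleClient` and `serveAll` are hypothetical names, the clients are simulated by a plain list instead of real accepted connections, and the `stm` package (shipped with GHC) is assumed to be available.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVarIO)
import Control.Monad (forM_, replicateM_)

-- | Hypothetical stand-in for an accepted client connection.
newtype Client = Client Int

-- | Handler written as if it served a single client only.
-- The shared counter is updated atomically through STM.
handleClient :: TVar Int -> Client -> IO ()
handleClient served (Client _n) = atomically (modifyTVar' served (+ 1))

-- | "Lift" the handler to many clients: one lightweight thread per
-- client, then wait until every thread has reported completion.
serveAll :: TVar Int -> [Client] -> IO ()
serveAll served clients = do
  done <- newEmptyMVar
  forM_ clients $ \client ->
    -- forkFinally signals completion even if the handler throws.
    forkFinally (handleClient served client) (\_ -> putMVar done ())
  replicateM_ (length clients) (takeMVar done)

main :: IO ()
main = do
  served <- newTVarIO 0
  serveAll served (map Client [1 .. 10000])
  readTVarIO served >>= print -- prints 10000
```

The handler never mentions the other threads; the only coordination is the `TVar` update. In a real server the list of clients would be replaced by an accept loop.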
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, 18 Apr 2011 12:56:39 +0200 Ertugrul Soeylemez wrote: > Mike Meyer wrote: > > On Mon, 18 Apr 2011 11:07:58 +0200 > > Johan Tibell wrote: > > > On Mon, Apr 18, 2011 at 9:13 AM, Mike Meyer wrote: > > > > I always looked at it the other way 'round: threading is a hack to > > > > deal with system inadequacies like poor shared memory performance > > > > or an inability to get events from critical file types. > > > > > > > > Real processes and event-driven programming provide a more robust, > > > > understandable and scalable solutions. > > > > > > > > > > We need to keep two things separate: threads as a way to achieve > > > concurrency and as a way to achieve parallelism [1]. > > > > Absolutely. Especially because you shouldn't have to deal with > > concurrency if all you want is parallelism. Your reference [1] covers > > why this is the case quite nicely (and is essentially the argument for > > "understandable" in my claim above). > > You also don't need Emacs/Vim, if all you want is to write a simple > plain text file. There is nothing wrong with concurrency, because you > are confusing the high level model with the low level implementation. > Concurrency is nothing but a design pattern, and GHC shows that a high > level design pattern can be mapped to efficient low level code. Possibly true. The question is - can it be mapped to a design that's as robust and scalable as the ones I'm used to working on? > In Haskell you should not use explicit, manual OS threading/forking for > the same reason you shouldn't write machine code manually. That's a good thing - providing it doesn't compromise robustness and scalability. > > > It's useful to use non-determinism (i.e. concurrency) to model a > > > server processing multiple requests. Since requests are independent > > > and shouldn't impact each other we'd like to model them as > > > such. This implies some level of concurrency (whether using threads > > > and processes). 
> > > > But because the requests are independent, you don't need concurrency > > in this case - parallelism is sufficient. > Perhaps Haskell is the wrong language for you. How about programming in > C/C++? I think you want more control over low level resources than > Haskell gives you. But I suggest having a closer look at concurrency. Personally, I don't want to have to worry about low-level resources, or even concurrency. Having to do so feels too much like having to explicitly allocate and free memory, or worry about register allocations. But if I have to do those things to get robustness and scalability until the languages start being able to deal with it, then I need the RTS to get out of the way and let me do my job. If I'm using a value that needs protection from concurrent access without providing that protection, I want the system to give me an error. At run-time is acceptable, but compile time is better. I want the system to make sure the concurrent protection mechanisms work properly - no deadlocks, no stuck process, etc - without my having to do anything but indicate which values need such protection. > > The unix process model works quite well. Compared to a threaded model, > > this is more robust (if a process breaks, you can kill and restart it > > without affecting other processes, whereas if a thread breaks, > > restarting the process and all the threads in it is the only safe > > option) and scalable (you're already doing ipc, so moving processes > > onto more systems is easy, and trivial if you design for it). The > > events handled by a single process are simple enough that your > > callback/event spaghetti can line up in nice, straight strands. > When writing concurrent code you don't care about how the RTS maps it to > processes and threads. GHC chose threads, probably because they are > faster to create/kill and consume less memory. But this is an > implementation detail the Haskell developer should not have to worry > about. 
So - what happens when a thread fails for some reason? I'm used to dealing with systems that run 7x24 for weeks or even months on end. Hardware hiccups, network failures, bogus input, hung clients, etc. are all just facts of life. I need the system to keep running properly in the face of all those, and I need them to disrupt the world as little as possible. Given that the RTS has taken control over this stuff, I sort of expect it to take care of noticing a dead process and restarting it as well. All of which is fine by me. > > > We don't need to do this. We can keep a concurrent programming model > > > and get the execution efficiency of an event driven model. This is > > > what GHC's I/O manager achieves. On top of that we also get > > > parallelism for free. Another way to look at it is that GHC provides > > > the scheduler (using a thread for the event loop and a separate > > > worker pool) that you end up writing manually in event driven > > > frameworks. > > > > So my question is - can I still get the robustness/scalabilit
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, Apr 18, 2011 at 10:06 AM, Svein Ove Aas wrote: > And I've often wanted a select-equivalent in Haskell. It'd be simple, > it'd help, so why not? http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Concurrent.html#v:threadWaitRead Ta-da! It's select(), or something smarter in GHC 7 (epoll() on Linux, kqueue() on BSD/OSX). G -- Gregory Collins
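[Editor's sketch] `threadWaitRead` blocks only the calling Haskell thread, so a select-like "wait for whichever descriptor becomes readable first" can be approximated by forking one waiter per descriptor. The helper name `waitAnyRead` is hypothetical, and the demo assumes a POSIX system with the `unix` package (for `createPipe` and `fdWrite`); treat it as an illustration, not as how any library does it.

```haskell
import Control.Concurrent (forkIO, killThread, threadWaitRead)
import Control.Concurrent.MVar (newEmptyMVar, takeMVar, tryPutMVar)
import Control.Monad (forM, void)
import System.Posix.IO (createPipe, fdWrite)
import System.Posix.Types (Fd)

-- | Block until at least one descriptor is readable and return one
-- that is. An empty list would block forever, so callers should
-- pass at least one descriptor.
waitAnyRead :: [Fd] -> IO Fd
waitAnyRead fds = do
  ready <- newEmptyMVar
  waiters <- forM fds $ \fd ->
    -- tryPutMVar: only the first readable descriptor wins.
    forkIO (threadWaitRead fd >> void (tryPutMVar ready fd))
  winner <- takeMVar ready
  mapM_ killThread waiters -- stop the waiters that are still blocked
  return winner

main :: IO ()
main = do
  (quietRead, _quietWrite) <- createPipe -- nothing is ever written here
  (busyRead, busyWrite) <- createPipe
  _ <- fdWrite busyWrite "x"
  ready <- waitAnyRead [quietRead, busyRead]
  putStrLn (if ready == busyRead then "busy pipe is readable" else "unexpected descriptor")
```

In most programs the forked thread would simply go on to handle its descriptor, which is the "use concurrency" advice given elsewhere in this thread; collecting the result in one place is only needed when a single loop must decide what to do next.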
Re: [Haskell] select(2) or poll(2)-like function?
Svein Ove Aas wrote: > And I've often wanted a select-equivalent in Haskell. It'd be simple, > it'd help, so why not? Because perhaps it's just an illusion that it would help. I don't see any advantage in using explicit polling. Use concurrency. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/
Re: [Haskell] select(2) or poll(2)-like function?
Mike Meyer wrote: > On Mon, 18 Apr 2011 11:07:58 +0200 > Johan Tibell wrote: > > On Mon, Apr 18, 2011 at 9:13 AM, Mike Meyer wrote: > > > I always looked at it the other way 'round: threading is a hack to > > > deal with system inadequacies like poor shared memory performance > > > or an inability to get events from critical file types. > > > > > > Real processes and event-driven programming provide a more robust, > > > understandable and scalable solutions. > > > > > > > We need to keep two things separate: threads as a way to achieve > > concurrency and as a way to achieve parallelism [1]. > > Absolutely. Especially because you shouldn't have to deal with > concurrency if all you want is parallelism. Your reference [1] covers > why this is the case quite nicely (and is essentially the argument for > "understandable" in my claim above). You also don't need Emacs/Vim, if all you want is to write a simple plain text file. There is nothing wrong with concurrency, because you are confusing the high level model with the low level implementation. Concurrency is nothing but a design pattern, and GHC shows that a high level design pattern can be mapped to efficient low level code. Middle/high level languages follow the philosophy that the compiler is smarter than you, when it comes to generating machine code. Haskell takes this to its conclusion: The compiler and RTS are smarter than you, when it comes to generating machine code and managing system resources, including file descriptors and threads. In Haskell you should not use explicit, manual OS threading/forking for the same reason you shouldn't write machine code manually. > > It's useful to use non-determinism (i.e. concurrency) to model a > > server processing multiple requests. Since requests are independent > > and shouldn't impact each other we'd like to model them as > > such. This implies some level of concurrency (whether using threads > > and processes). 
> > But because the requests are independent, you don't need concurrency > in this case - parallelism is sufficient. Perhaps Haskell is the wrong language for you. How about programming in C/C++? I think you want more control over low level resources than Haskell gives you. But I suggest having a closer look at concurrency. > The unix process model works quite well. Compared to a threaded model, > this is more robust (if a process breaks, you can kill and restart it > without affecting other processes, whereas if a thread breaks, > restarting the process and all the threads in it is the only safe > option) and scalable (you're already doing ipc, so moving processes > onto more systems is easy, and trivial if you design for it). The > events handled by a single process are simple enough that your > callback/event spaghetti can line up in nice, straight strands. When writing concurrent code you don't care about how the RTS maps it to processes and threads. GHC chose threads, probably because they are faster to create/kill and consume less memory. But this is an implementation detail the Haskell developer should not have to worry about. > > We don't need to do this. We can keep a concurrent programming model > > and get the execution efficiency of an event driven model. This is > > what GHC's I/O manager achieves. On top of that we also get > > parallelism for free. Another way to look at it is that GHC provides > > the scheduler (using a thread for the event loop and a separate > > worker pool) that you end up writing manually in event driven > > frameworks. > > So my question is - can I still get the robustness/scalability > features I get from the unix process model using haskell? In > particular, it seems like ghc starts threads I don't ask it to, and > using both threads & forks for parallelism causes even more headaches > than concurrency (at least on unix & unix-like systems), so just > replicating the process model won't work well. 
Do any of the haskell > parallel processing tools work across multiple systems? Effectively no (unless you want to use the terribly outdated GPH project), but that's a shortcoming of the current RTS, not of the design patterns you use in Haskell. By design Haskell programs are well suited for an auto-distributing RTS. It's just that no such RTS exists for recent versions of the common compilers. In other words: Robustness and scalability should not be your business in Haskell. You should concentrate on understanding and using the concurrency concept well. And just to encourage you: I write productive concurrent servers in Haskell, which scale very well and probably better than an equivalent C implementation would. Reason: A Haskell thread is not mapped to an operating system thread (unless you used forkOS). When it is advantageous, the RTS can well decide to let another OS thread continue a running Haskell thread. That way the active OS threads are always utilized as efficiently as possible. It would be a pain to get something like
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, 18 Apr 2011 11:07:58 +0200 Johan Tibell wrote: > On Mon, Apr 18, 2011 at 9:13 AM, Mike Meyer wrote: > > I always looked at it the other way 'round: threading is a hack to > > deal with system inadequacies like poor shared memory performance or > > an inability to get events from critical file types. > > > > Real processes and event-driven programming provide a more robust, > > understandable and scalable solutions. > > > > We need to keep two things separate: threads as a way to achieve concurrency > and as a way to achieve parallelism [1]. Absolutely. Especially because you shouldn't have to deal with concurrency if all you want is parallelism. Your reference [1] covers why this is the case quite nicely (and is essentially the argument for "understandable" in my claim above). > It's useful to use non-determinism (i.e. concurrency) to model a server > processing multiple requests. Since requests are independent and shouldn't > impact each other we'd like to model them as such. This implies some level > of concurrency (whether using threads and processes). But because the requests are independent, you don't need concurrency in this case - parallelism is sufficient. The unix process model works quite well. Compared to a threaded model, this is more robust (if a process breaks, you can kill and restart it without affecting other processes, whereas if a thread breaks, restarting the process and all the threads in it is the only safe option) and scalable (you're already doing ipc, so moving processes onto more systems is easy, and trivial if you design for it). The events handled by a single process are simple enough that your callback/event spaghetti can line up in nice, straight strands. > We don't need to do this. We can keep a concurrent programming model and get > the execution efficiency of an event driven model. This is what GHC's I/O > manager achieves. On top of that we also get parallelism for free. 
Another > way to look at it is that GHC provides the scheduler (using a thread for the > event loop and a separate worker pool) that you end up writing manually in > event driven frameworks. So my question is - can I still get the robustness/scalability features I get from the unix process model using haskell? In particular, it seems like ghc starts threads I don't ask it to, and using both threads & forks for parallelism causes even more headaches than concurrency (at least on unix & unix-like systems), so just replicating the process model won't work well. Do any of the haskell parallel processing tools work across multiple systems? thanks, http://www.mired.org/consulting.html Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org > 1. http://ghcmutterings.wordpress.com/2009/10/06/parallelism-concurrency/
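[Editor's sketch] On the "kill and restart the broken piece" part of the robustness question: in GHC an uncaught exception in a thread started with forkIO ends only that thread, not the program, so a restart policy can be written as ordinary code. The following is a minimal sketch, not something proposed in the thread; `supervise` and its restart limit are hypothetical, and it deals only with exceptions, not with the process-level isolation (separate address spaces, crashes in foreign code) that the unix model provides.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeAsyncException (..), SomeException, fromException, throwIO, try)
import Control.Monad (when)
import Data.IORef (atomicModifyIORef', newIORef)

-- | Run a worker; if it dies with a synchronous exception, run it
-- again, at most the given number of extra times. Asynchronous
-- exceptions (e.g. from killThread) are rethrown, not retried.
supervise :: Int -> IO a -> IO (Either SomeException a)
supervise restartsLeft worker = do
  outcome <- try worker
  case outcome of
    Right value -> return (Right value)
    Left err
      | isAsync err -> throwIO err
      | restartsLeft > 0 -> supervise (restartsLeft - 1) worker
      | otherwise -> return (Left err)
  where
    isAsync e = case fromException e of
      Just (SomeAsyncException _) -> True
      Nothing -> False

main :: IO ()
main = do
  attempts <- newIORef (0 :: Int)
  -- A deliberately flaky worker: fails on its first two runs.
  let flaky = do
        n <- atomicModifyIORef' attempts (\k -> (k + 1, k + 1))
        when (n < 3) $ ioError (userError ("attempt " ++ show n ++ " failed"))
        return n
  done <- newEmptyMVar
  -- The supervised worker runs in its own thread; main is unaffected
  -- by its failures and only waits for the final outcome.
  _ <- forkIO (supervise 5 flaky >>= putMVar done)
  result <- takeMVar done
  putStrLn (either (\e -> "gave up: " ++ show e)
                   (\n -> "succeeded on attempt " ++ show n)
                   result)
  -- prints: succeeded on attempt 3
```

A hung worker (as opposed to a failing one) would additionally need a time limit, for example via `System.Timeout.timeout`; that is left out here.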
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, Apr 18, 2011 at 9:13 AM, Mike Meyer wrote: > I always looked at it the other way 'round: threading is a hack to > deal with system inadequacies like poor shared memory performance or > an inability to get events from critical file types. > > Real processes and event-driven programming provide a more robust, > understandable and scalable solutions. > > We need to keep two things separate: threads as a way to achieve concurrency and as a way to achieve parallelism [1]. It's useful to use non-determinism (i.e. concurrency) to model a server processing multiple requests. Since requests are independent and shouldn't impact each other we'd like to model them as such. This implies some level of concurrency (whether using threads or processes). Now, naively mapping this (concurrent) *programming model* to an *execution model*, i.e. OS threads, might give us less than optimal performance. At this point what often happens (e.g. in NodeJS and other event driven web frameworks) is that the programming model is thrown out with the execution model. This results in callback-spaghetti and lack of separation of concerns [2]. We don't need to do this. We can keep a concurrent programming model and get the execution efficiency of an event driven model. This is what GHC's I/O manager achieves. On top of that we also get parallelism for free. Another way to look at it is that GHC provides the scheduler (using a thread for the event loop and a separate worker pool) that you end up writing manually in event driven frameworks. -- Johan 1. http://ghcmutterings.wordpress.com/2009/10/06/parallelism-concurrency/ 2. http://lamp.epfl.ch/~imaier/pub/DeprecatingObserversTR2010.pdf - This discusses the observer pattern, which is very closely related to how event driven frameworks work (i.e. using callbacks).
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, Apr 18, 2011 at 10:06 AM, Svein Ove Aas wrote: > And I've often wanted a select-equivalent in Haskell. It'd be simple, > it'd help, so why not? > > But good luck using multiple cores like that. The one paradigm that > makes no sense in Haskell is worker threads (since the RTS does that > for us); instead of fetching a worker, just forkIO instead. > > Unless you're trying for LIFO or similar for overload handling, in > which case.. umh. Right. Can we have select, please? > > ..I'm sure I started this off trying to explain why you don't need it. > If you really want an event-driven programming model, use GHC.Event. -- Johan
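[Editor's sketch] For readers who do want the callback style, registering interest in a descriptor through GHC.Event might look roughly like this. The sketch assumes a recent `base` (where `registerFd` takes a `Lifetime` argument, which the 2011 version did not), a program linked with `-threaded` (otherwise `getSystemEventManager` is expected to return `Nothing`), and the `unix` package for the demo pipe; `onReadable` is a hypothetical helper name.

```haskell
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import GHC.Event (Lifetime (OneShot), evtRead, getSystemEventManager, registerFd)
import System.Posix.IO (createPipe, fdWrite)
import System.Posix.Types (Fd)

-- | Register a one-shot callback that fires when the descriptor
-- becomes readable. Returns False when no system event manager is
-- running. The callback runs on the event manager's thread, so it
-- should be short and must not block.
onReadable :: Fd -> IO () -> IO Bool
onReadable fd callback = do
  maybeManager <- getSystemEventManager
  case maybeManager of
    Nothing -> return False
    Just manager -> do
      _key <- registerFd manager (\_fdKey _event -> callback) fd evtRead OneShot
      return True

main :: IO ()
main = do
  (readEnd, writeEnd) <- createPipe
  fired <- newEmptyMVar
  registered <- onReadable readEnd (putMVar fired ())
  if registered
    then do
      _ <- fdWrite writeEnd "ping"
      takeMVar fired -- wait until the callback has run
      putStrLn "callback ran"
    else putStrLn "no system event manager (try linking with -threaded)"
```

Compared with the forkIO style, the control flow is split between the registration site and the callback, which is the "callback-spaghetti" trade-off discussed elsewhere in this thread.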
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, Apr 18, 2011 at 8:13 AM, Mike Meyer wrote: > On Mon, 18 Apr 2011 11:31:08 +0530 > Piyush P Kurur wrote: >> >> It is unfortunate that the usual fork and even pthread_create is not light >> weight enough for programming such high performance servers. The select >> based programming is more a hack than anything IMNSHO. > > I always looked at it the other way 'round: threading is a hack to > deal with system inadequacies like poor shared memory performance or > an inability to get events from critical file types. > > Real processes and event-driven programming provide a more robust, > understandable and scalable solutions. > > And I've often wanted a select-equivalent in Haskell. It'd be simple, it'd help, so why not? But good luck using multiple cores like that. The one paradigm that makes no sense in Haskell is worker threads (since the RTS does that for us); instead of fetching a worker, just forkIO instead. Unless you're trying for LIFO or similar for overload handling, in which case.. umh. Right. Can we have select, please? ..I'm sure I started this off trying to explain why you don't need it. -- Svein Ove Aas
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, 18 Apr 2011 11:31:08 +0530 Piyush P Kurur wrote: > > It is unfortunate that the usual fork and even pthread_create is not light > weight enough for programming such high performance servers. The select > based programming is more a hack than anything IMNSHO. I always looked at it the other way 'round: threading is a hack to deal with system inadequacies like poor shared memory performance or an inability to get events from critical file types. Real processes and event-driven programming provide more robust, understandable and scalable solutions. http://www.mired.org/consulting.html Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
Re: [Haskell] select(2) or poll(2)-like function?
On Mon, Apr 18, 2011 at 12:47:33AM +0200, Johan Tibell wrote: > In other words, it's reasonable to fork off tens of thousands of threads and > expect good performance. Yes, I think so. Besides, from the point of view of programming servers, e.g. a web server, the forkIO-based solution is more natural. It is unfortunate that the usual fork and even pthread_create are not lightweight enough for programming such high-performance servers. Select-based programming is more a hack than anything, IMNSHO. The lightweight threads of GHC are hence a boon, besides other goodies like STM, Channels etc. Disclaimer: I have never programmed large servers on GHC nor benchmarked any of the servers that are available. But I believe stuff like the Wai server (used by yesod) uses forkIO and has very good performance. Regards ppk
Re: [Haskell] select(2) or poll(2)-like function?
In other words, it's reasonable to fork off tens of thousands of threads and expect good performance. On Apr 17, 2011 10:46 PM, "Don Stewart" wrote: > `forkIO` is based on epoll. So threadWaitFD and friends are using epoll. > > On Sun, Apr 17, 2011 at 1:29 PM, Matthias Kilian wrote: >> Hi, >> >> is there something like select(2) or poll(2) available in the >> standard (HP) libraries? I hoogled around a little bit but didn't >> find anything. (Something like this will be crucial for networking >> stuff listening on v4 and v6 sockets at the same time) >> >> Ciao, >>Kili
Re: [Haskell] select(2) or poll(2)-like function?
`forkIO` is based on epoll. So threadWaitFD and friends are using epoll. On Sun, Apr 17, 2011 at 1:29 PM, Matthias Kilian wrote: > Hi, > > is there something like select(2) or poll(2) available in the > standard (HP) libraries? I hoogled around a little bit but didn't > find anything. (Something like this will be crucial for networking > stuff listening on v4 and v6 sockets at the same time) > > Ciao, > Kili
[Haskell] select(2) or poll(2)-like function?
Hi, is there something like select(2) or poll(2) available in the standard (HP) libraries? I hoogled around a little bit but didn't find anything. (Something like this will be crucial for networking stuff listening on v4 and v6 sockets at the same time) Ciao, Kili
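[Editor's sketch] Following the replies in this thread, the v4-plus-v6 case does not need select(2) in Haskell: one accept loop per listening socket, each in its own thread, does the same job. The sketch below assumes the third-party `network` package (Network.Socket) and `bytestring`; the names `openListener`, `acceptLoop` and `listenEverywhere`, the port "3000" and the greeting are all illustrative choices, and error handling for hosts without IPv6 is omitted.

```haskell
import Control.Concurrent (forkFinally, forkIO)
import Control.Monad (forever, void, when)
import qualified Data.ByteString.Char8 as B
import Network.Socket
import Network.Socket.ByteString (sendAll)

-- | Open one listening socket for a resolved address.
openListener :: AddrInfo -> IO Socket
openListener addr = do
  sock <- socket (addrFamily addr) (addrSocketType addr) (addrProtocol addr)
  setSocketOption sock ReuseAddr 1
  -- Keep the v6 socket from also claiming the v4 port, so that a
  -- separate v4 socket can be bound next to it.
  when (addrFamily addr == AF_INET6) $ setSocketOption sock IPv6Only 1
  bind sock (addrAddress addr)
  listen sock 128
  return sock

-- | Accept forever, one lightweight thread per connection.
acceptLoop :: (Socket -> IO ()) -> Socket -> IO ()
acceptLoop handler sock = forever $ do
  (conn, _peer) <- accept sock
  void (forkFinally (handler conn) (\_ -> close conn))

-- | Listen on every wildcard address the resolver returns for the
-- port (typically one v4 and one v6), one accept loop per socket.
listenEverywhere :: String -> (Socket -> IO ()) -> IO ()
listenEverywhere port handler = do
  let hints = defaultHints { addrFlags = [AI_PASSIVE], addrSocketType = Stream }
  addrs <- getAddrInfo (Just hints) Nothing (Just port)
  socks <- mapM openListener addrs
  case socks of
    [] -> ioError (userError "no addresses to listen on")
    (first : rest) -> do
      mapM_ (forkIO . acceptLoop handler) rest
      acceptLoop handler first -- the calling thread serves the first socket

main :: IO ()
main = listenEverywhere "3000" (\conn -> sendAll conn (B.pack "hello\n"))
```

The handler is written once and is unaware of which socket a connection arrived on; the runtime's I/O manager multiplexes the blocking `accept` calls.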