On 18/04/2011 12:55, Mike Meyer wrote:
> On Mon, 18 Apr 2011 12:56:39 +0200
> Ertugrul Soeylemez <e...@ertes.de> wrote:
>> Mike Meyer <m...@mired.org> wrote:
>>> The unix process model works quite well. Compared to a threaded model,
>>> this is more robust (if a process breaks, you can kill and restart it
>>> without affecting other processes, whereas if a thread breaks,
>>> restarting the process and all the threads in it is the only safe
>>> option) and scalable (you're already doing ipc, so moving processes
>>> onto more systems is easy, and trivial if you design for it). The
>>> events handled by a single process are simple enough that your
>>> callback/event spaghetti can line up in nice, straight strands.
>> When writing concurrent code you don't care about how the RTS maps it to
>> processes and threads.  GHC chose threads, probably because they are
>> faster to create/kill and consume less memory.  But this is an
>> implementation detail the Haskell developer should not have to worry
>> about.
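
(As an aside, the cheapness of GHC threads is easy to demonstrate. The
following sketch forks 100,000 Haskell threads and waits for them all;
the thread count and the MVar-per-thread completion signal are just for
illustration:

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Monad (replicateM)

    -- Fork 100,000 lightweight Haskell threads; each signals completion
    -- through its own MVar, and the main thread waits for all of them.
    -- A GHC thread starts with roughly a kilobyte of heap, not an OS stack.
    main :: IO ()
    main = do
      dones <- replicateM 100000 $ do
        done <- newEmptyMVar
        _ <- forkIO (putMVar done ())
        return done
      mapM_ takeMVar dones

Doing the same with OS threads or processes would exhaust most systems.)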

> So - what happens when a thread fails for some reason? I'm used to
> dealing with systems that run 7x24 for weeks or even months on
> end. Hardware hiccups, network failures, bogus input, hung clients,
> etc. are all just facts of life. I need the system to keep running
> properly in the face of all those, and I need them to disrupt the
> world as little as possible.

> Given that the RTS has taken control over this stuff, I sort of expect
> it to take care of noticing a dead process and restarting it as
> well. All of which is fine by me.

The RTS can't manage things at that level, because it doesn't know what
robustness model you want. So failures in the I/O library result in
exceptions, and you get to decide what to do. If a thread dies due to an
exception, then you are responsible for what happens from then on -
typically you would have a top-level exception handler that notifies
some higher-level thread of what happened. It's true that Haskell
doesn't give you as much help here as you would get in Erlang/OTP, but
it's all readily programmed up.
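
For example, a minimal supervisor is only a few lines. This is a
sketch: the worker, the simulated failure, and the one-second restart
delay are all made up for illustration.

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Exception (SomeException, try)

    -- Run an action; if it throws, log the exception and restart it
    -- after a short back-off.  A clean exit stays down.
    supervise :: String -> IO () -> IO ()
    supervise name worker = loop
      where
        loop = do
          r <- try worker
          case r of
            Right () -> return ()
            Left e   -> do
              putStrLn (name ++ " died: " ++ show (e :: SomeException))
              threadDelay 1000000   -- back off for one second
              loop

    main :: IO ()
    main = do
      _ <- forkIO (supervise "worker" (ioError (userError "simulated crash")))
      threadDelay 5000000           -- let the demo run for a few restarts

An Erlang-style supervisor hierarchy is just this idea applied
recursively, with a policy for giving up after too many restarts.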

Haskell *does* give you some important guarantees though. Threads never just die without receiving an exception first. If a thread blocks on an unreachable resource then it gets an exception, so you get some help dealing with deadlocks.
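
Concretely, the exception GHC delivers in that case is
BlockedIndefinitelyOnMVar (or BlockedIndefinitelyOnSTM on the STM side),
and you can catch it like any other. A sketch:

    import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar)
    import Control.Exception (BlockedIndefinitelyOnMVar (..), catch)

    main :: IO ()
    main = do
      mv <- newEmptyMVar :: IO (MVar ())
      -- No other thread can ever fill mv, so this takeMVar can never
      -- succeed; the RTS notices at GC time and throws an exception
      -- instead of letting the thread hang silently.
      takeMVar mv `catch` \BlockedIndefinitelyOnMVar ->
        putStrLn "blocked indefinitely: recovering from the deadlock"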

>>>> We don't need to do this. We can keep a concurrent programming model
>>>> and get the execution efficiency of an event driven model. This is
>>>> what GHC's I/O manager achieves. On top of that we also get
>>>> parallelism for free. Another way to look at it is that GHC provides
>>>> the scheduler (using a thread for the event loop and a separate
>>>> worker pool) that you end up writing manually in event driven
>>>> frameworks.
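
That style looks like this in practice: a thread-per-connection echo
server, sketched with the old Network module API (the port number and
the echo behaviour are arbitrary):

    import Control.Concurrent (forkIO)
    import Control.Exception (SomeException, try)
    import Control.Monad (forever)
    import Network (PortID (..), accept, listenOn)
    import System.IO (Handle, hClose, hGetLine, hPutStrLn)

    -- One cheap Haskell thread per client; GHC's I/O manager multiplexes
    -- them all over epoll/kqueue, so a blocking hGetLine does not tie up
    -- an OS thread.
    main :: IO ()
    main = do
      sock <- listenOn (PortNumber 9000)
      forever $ do
        (h, _host, _port) <- accept sock
        forkIO (echo h)

    echo :: Handle -> IO ()
    echo h = do
      _ <- try (forever (hGetLine h >>= hPutStrLn h))
             :: IO (Either SomeException ())
      hClose h   -- EOF or error: drop the connection

The code reads like blocking thread-per-client C, but scales like a
hand-written event loop.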

>>> So my question is - can I still get the robustness/scalability
>>> features I get from the unix process model using haskell? In
>>> particular, it seems like ghc starts threads I don't ask it to, and
>>> using both threads & forks for parallelism causes even more headaches
>>> than concurrency (at least on unix & unix-like systems), so just
>>> replicating the process model won't work well. Do any of the haskell
>>> parallel processing tools work across multiple systems?

>> Effectively no (unless you want to use the terribly outdated GPH
>> project), but that's a shortcoming of the current RTS, not of the design
>> patterns you use in Haskell.  By design Haskell programs are well suited
>> for an auto-distributing RTS.  It's just that no such RTS exists for
>> recent versions of the common compilers.

> So is anyone working on such a package for haskell? I know clojure's
> got some people working on making STM work in a distributed
> environment, but that's outside the goals of the core team.

Take a look at "Haskell for the Cloud" by Jeff Epstein, Andrew Black and
Simon Peyton Jones:

http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote.pdf
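
The model is Erlang-style message passing between processes that can
live on different machines. Roughly, from memory of the paper (so the
names may not match the released library exactly):

    -- Core primitives, types abbreviated:
    send   :: Serializable a => ProcessId -> a -> ProcessM ()
    expect :: Serializable a => ProcessM a
    spawn  :: NodeId -> Closure (ProcessM ()) -> ProcessM ProcessId

    -- A toy process that waits for a (sender, message) pair and replies;
    -- a remote peer would reach it via spawn and send.
    pong :: ProcessM ()
    pong = do
      (from, msg) <- expect
      send from ("pong: " ++ (msg :: String))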

>> In other words:  Robustness and scalability should not be your business
>> in Haskell.  You should concentrate on understanding and using the
>> concurrency concept well.  And just to encourage you:  I write
>> production concurrent servers in Haskell, which scale very well and
>> probably better than an equivalent C implementation would.  Reason:  A
>> Haskell thread is not mapped to an operating system thread (unless you
>> use forkOS).  When it is advantageous, the RTS can well decide to let
>> another OS thread continue a running Haskell thread.  That way the
>> active OS threads are always utilized as efficiently as possible.  It
>> would be a pain to get something like that with explicit threading, and
>> even more so when using processes.
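
The forkIO/forkOS distinction in a sketch (the printed strings are just
labels, and note that forkOS requires a program built with ghc -threaded):

    import Control.Concurrent (forkIO, forkOS, threadDelay)

    main :: IO ()
    main = do
      -- Ordinary green thread: the RTS may migrate it between OS threads.
      _ <- forkIO (putStrLn "lightweight thread")
      -- Bound thread: pinned to its own OS thread, only needed when a
      -- foreign library depends on thread-local state (OpenGL, some GUIs).
      _ <- forkOS (putStrLn "bound thread")
      threadDelay 100000   -- give both a chance to run before main exits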

> Well, *someone* has to worry about robustness and scalability. Users
> notice when their two minute system builds start taking four minutes
> (and will be at my door wanting me to fix it) because something didn't
> scale fast enough, or have to be run more than once because a failing
> component build wasn't restarted properly. I'm willing to believe that
> haskell lets you write more scalable code than C, but C's tools for
> handling concurrency suck, so that should be true in any language
> where someone actually thought about dealing with concurrency beyond
> locks and protected methods. The problem is, the only language I've
> found where that's true that *also* has reasonable tools to deal with
> scaling beyond a single system is Eiffel (which apparently abstracts
> things even further than haskell - details like how concurrency is
> achieved or how many concurrent operations you can have are configured
> when you start an application, *not* when writing it). Unfortunately,
> Eiffel has other problems that make it undesirable.

I'm interested in understanding what problems you're referring to. What kind of scaling are you interested in - number of clients, number of cores, or something else? What is it about Haskell threads that you are worried might not scale?

Cheers,
        Simon

_______________________________________________
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell
