Re: [Haskell-cafe] Control.Concurrent.forkIO versus Control.Parallel.par

2008-07-28 Thread Mario Blazevic

Sterling Clover wrote:
I think a better way to look at it is that Haskell has two separate 
mechanisms for different *notions* of concurrency -- forkIO for actual 
concurrent computation which needs explicit threads and communication 
(and within that, either semaphore-based communication with MVars or 
transactional control with TVars and STM), and par for parallelism which 
is to express computations that are innately parallel. See, e.g. the GHC 
users manual which defines them as such:


...


	Yes, I do understand the distinction. My problem is that I'm working on 
a new concurrency mechanism, in the form of a monad transformer. It 
should allow user to specify that particular monadic computation should 
be run in parallel. It appears that will be possible only if the 
underlying monad is IO, because I can't get par to work.


In any case, I suspect that your second parallelize function doesn't 
work right because \x -> x >>= return is an effective no-op, modulo 
strictness characteristics of >>=. And in any case, it can't be 
evaluated until it is called in a particular monadic "environment" which 
is provided, sequencing and all, via liftM2. One can't parallelize in an 
arbitrary monad in any case, at least without making a number of 
decisions. E.g., what's the resultant state after two parallel 
computations are run in a state monad?


	I see the problem now, thanks. I wonder if it would make sense to add a 
new defaulted method to Monad class, perhaps a variant of the existing 
sequence


parallelSequence :: [m a] -> m [a]
parallelSequence = sequence

	Then monads that have a way of forking and recombining parallel 
computations could override the method.


So if you're using concurrency with a monad transformer, you probably 
might want to start by stripping back the layers of the concurrent part 
of your algorithm to the minimum possible, and then explicitly managing 
passing state into the various forked computations, which can then be 
wrapped in as many runReaderT or such calls as necessary.


	I don't have any state to pass, the question is simply whether two 
monadic values can be run in parallel and then recombined. I can see why 
that's impossible for State, Cont, and probably some other monads.


On another, general, note, unless you're very careful, mixing IO into 
your algorithm will probably result in very underperformant parallel 
code, since it will be IO rather than processor bound.


	I know, the idea was to let the user control which concurrent 
computations should be run in parallel, if resources allow.




On Jul 27, 2008, at 10:49 PM, Mario Blažević wrote:



Hello. I have a question about parallel computation in Haskell. 
After browsing the GHC library documentation, I was left with 
impression that there are two separate mechanisms for expressing 
concurrency: Control.Parallel.par for pure computations and 
Control.Concurrent.forkIO for computations in IO monad.


This dichotomy becomes a problem when one tries to use concurrency 
from a monad transformer, though I'm sure that's not the only such 
situation. One cannot assume that the base monad is IO so forkIO 
cannot be used, while Control.Parallel.par won't run monads. My first 
solution was to replace the base monad class for the monad transformer 
by the following ParallelizableMonad class:


 


class Monad m => ParallelizableMonad m where
   parallelize :: m a -> m b -> m (a, b)
   parallelize ma mb = do a <- ma
  b <- mb
  return (a, b)

instance ParallelizableMonad Identity where
   parallelize (Identity a) (Identity b) = Identity (a `par` (b `pseq` 
(a, b)))


instance ParallelizableMonad IO where
   parallelize ma mb = do va <- newEmptyMVar
  vb <- newEmptyMVar
  forkIO (ma >>= putMVar va)
  forkIO (mb >>= putMVar vb)
  a <- takeMVar va
  b <- takeMVar vb
  return (a, b)
 



I tested this solution, and it worked for IO computations in the sense 
that they used both CPUs. The test also ran slower on two CPUs that on 
one, but that's beside the point.


Then I realized that par can, in fact, be used on any monad, it just 
needs a little nudge:


 


parallelize :: m a -> m b -> m (a, b)
parallelize ma mb = let a = ma >>= return
b = mb >>= return
in a `par` (b `pseq` liftM2 (,) a b)
 



However, in this version the IO monadic computations still appear to 
use only one CPU. I cannot get par to parallelize monadic 
computations. I've used the same command-line options in both 
e

Re: [Haskell-cafe] Control.Concurrent.forkIO versus Control.Parallel.par

2008-07-27 Thread Luke Palmer
On Mon, Jul 28, 2008 at 2:49 AM, Mario Blažević <[EMAIL PROTECTED]> wrote:
> parallelize :: m a -> m b -> m (a, b)
> parallelize ma mb = let a = ma >>= return
>b = mb >>= return
>in a `par` (b `pseq` liftM2 (,) a b)

See Sterling's reply for an actual answer to your question, but note
that one of the monad laws is:

   m >>= return = m

(i.e. return is a right identity of bind)

That means your code can be reduced to:

   parallelize ma mb = ma `par` (mb `pseq` liftM2 (,) ma mb)

Which, as Sterling points out, is *not* doing what you think it is.

Luke
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Control.Concurrent.forkIO versus Control.Parallel.par

2008-07-27 Thread Sterling Clover
I think a better way to look at it is that Haskell has two separate  
mechanisms for different *notions* of concurrency -- forkIO for  
actual concurrent computation which needs explicit threads and  
communication (and within that, either semaphore-based communication  
with MVars or transactional control with TVars and STM), and par for  
parallelism which is to express computations that are innately  
parallel. See, e.g. the GHC users manual which defines them as such:


* Parallelism means running a Haskell program on multiple processors,  
with the goal of improving performance. Ideally, this should be done  
invisibly, and with no semantic changes.


* Concurrency means implementing a program by using multiple I/O- 
performing threads. While a concurrent Haskell program can run on a  
parallel machine, the primary goal of using concurrency is not to  
gain performance, but rather because that is the simplest and most  
direct way to write the program. Since the threads perform I/O, the  
semantics of the program is necessarily non-deterministic.


(http://www.haskell.org/ghc/docs/latest/html/users_guide/lang- 
parallel.html)


In any case, I suspect that your second parallelize function doesn't  
work right because \x -> x >>= return is an effective no-op, modulo  
strictness characteristics of >>=. And in any case, it can't be  
evaluated until it is called in a particular monadic "environment"  
which is provided, sequencing and all, via liftM2. One can't  
parallelize in an arbitrary monad in any case, at least without  
making a number of decisions. E.g., what's the resultant state after  
two parallel computations are run in a state monad?


So if you're using concurrency with a monad transformer, you probably  
might want to start by stripping back the layers of the concurrent  
part of your algorithm to the minimum possible, and then explicitly  
managing passing state into the various forked computations, which  
can then be wrapped in as many runReaderT or such calls as necessary.


On another, general, note, unless you're very careful, mixing IO into  
your algorithm will probably result in very underperformant parallel  
code, since it will be IO rather than processor bound. Again the  
point from the GHC manual that "the primary goal of using concurrency  
is not to gain performance, but rather because that is the simplest  
and most direct way to write the program" seems appropriate.  
Additionally, many have found it easier at this stage to get good  
performance out of writing parallel code with concurrent mechanisms  
rather than `par`, because careless use of `par` will tend to add as  
much overhead in spark creation as is saved with multiprocessing,  
while an explicit work queue can be easier to reason about.


Regards,
S.

On Jul 27, 2008, at 10:49 PM, Mario Blažević wrote:



Hello. I have a question about parallel computation in Haskell.  
After browsing the GHC library documentation, I was left with  
impression that there are two separate mechanisms for expressing  
concurrency: Control.Parallel.par for pure computations and  
Control.Concurrent.forkIO for computations in IO monad.


This dichotomy becomes a problem when one tries to use  
concurrency from a monad transformer, though I'm sure that's not  
the only such situation. One cannot assume that the base monad is  
IO so forkIO cannot be used, while Control.Parallel.par won't run  
monads. My first solution was to replace the base monad class for  
the monad transformer by the following ParallelizableMonad class:


-- 
--

class Monad m => ParallelizableMonad m where
   parallelize :: m a -> m b -> m (a, b)
   parallelize ma mb = do a <- ma
  b <- mb
  return (a, b)

instance ParallelizableMonad Identity where
   parallelize (Identity a) (Identity b) = Identity (a `par` (b  
`pseq` (a, b)))


instance ParallelizableMonad IO where
   parallelize ma mb = do va <- newEmptyMVar
  vb <- newEmptyMVar
  forkIO (ma >>= putMVar va)
  forkIO (mb >>= putMVar vb)
  a <- takeMVar va
  b <- takeMVar vb
  return (a, b)
-- 
--


I tested this solution, and it worked for IO computations in the  
sense that they used both CPUs. The test also ran slower on two  
CPUs that on one, but that's beside the point.


Then I realized that par can, in fact, be used on any monad, it  
just needs a little nudge:


-- 
--

parallelize :: m a -> m b -> m (a, b)
parallelize ma mb = let a = ma >>= return
b = mb >>= return
in a `par` (b `pseq` liftM2 (,) a b)
---

[Haskell-cafe] Control.Concurrent.forkIO versus Control.Parallel.par

2008-07-27 Thread Mario Blažević

Hello. I have a question about parallel computation in Haskell. After 
browsing the GHC library documentation, I was left with impression that there 
are two separate mechanisms for expressing concurrency: Control.Parallel.par 
for pure computations and Control.Concurrent.forkIO for computations in IO 
monad.

This dichotomy becomes a problem when one tries to use concurrency from a 
monad transformer, though I'm sure that's not the only such situation. One 
cannot assume that the base monad is IO so forkIO cannot be used, while 
Control.Parallel.par won't run monads. My first solution was to replace the 
base monad class for the monad transformer by the following ParallelizableMonad 
class:


class Monad m => ParallelizableMonad m where
   parallelize :: m a -> m b -> m (a, b)
   parallelize ma mb = do a <- ma
  b <- mb
  return (a, b)

instance ParallelizableMonad Identity where
   parallelize (Identity a) (Identity b) = Identity (a `par` (b `pseq` (a, b)))

instance ParallelizableMonad IO where
   parallelize ma mb = do va <- newEmptyMVar
  vb <- newEmptyMVar
  forkIO (ma >>= putMVar va)
  forkIO (mb >>= putMVar vb)
  a <- takeMVar va
  b <- takeMVar vb
  return (a, b)


I tested this solution, and it worked for IO computations in the sense that 
they used both CPUs. The test also ran slower on two CPUs that on one, but 
that's beside the point.

Then I realized that par can, in fact, be used on any monad, it just needs a 
little nudge:


parallelize :: m a -> m b -> m (a, b)
parallelize ma mb = let a = ma >>= return
b = mb >>= return
in a `par` (b `pseq` liftM2 (,) a b)


However, in this version the IO monadic computations still appear to use only 
one CPU. I cannot get par to parallelize monadic computations. I've used the 
same command-line options in both examples: -O -threaded and +RTS -N2. What am 
I missing?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe