Re: [Haskell-cafe] Re: Parallel foldl doesn't work correctly

2009-12-13 Thread Philip Beadling
On Sat, 2009-12-12 at 13:46 +, Ben Millwood wrote:
 On Sat, Dec 12, 2009 at 10:08 AM, Maciej Piechotka
 uzytkown...@gmail.com wrote:
  If operation is associative it can be done using divide et impera
  spliting list in half and operating on it pararerlly then split in half
  etc.
 

Thank you very much for the replies.

I've come to the conclusion that, yep, you can't (directly) parallelise
of fold operation, as fold guarantees order of processing.

With something like map the runtime is free to continue sparking
function application to each element without waiting for the result.
So we spark f x, force evaluation of the remainder of the xs and
recurse.
I'm *guessing* at a detailed level when we are creating the output list,
haskell can concat each result element before f x returns due to
laziness - that is, haskell doesn't need to wait for evaluation of f x,
before continuing?

With fold, and specifically with foldl (+), this isn't the case as (+)
is strict on both arguments and thus it cannot continue until each
sparked evaluation has completed and combined with the accumulator.  If
(+) was not strict on both arguments, I'm not sure if could solider
on... assuming I've understood map correctly!?


Writing it out long hand (sorry if this is tedious!), we have:

using :: a - Strategy a - a
using x s = s x `seq` x

rwhnf :: Strategy a 
rwhnf x = x `seq` ()  

parList :: Strategy a - Strategy [a]
parList strat [] = ()
parList strat (x:xs) = strat x `par` (parList strat xs)

parMap :: Strategy b - (a - b) - [a] - [b]
parMap strat f = (`using` parList strat) . map f 


'using' applies a strategy to an item, and then returns the item.
'parList' is a (combinator) strategy which applies an atomic strategy to
each element in the list *in parallel* (for example forcing each element
to WHNF).

So for parMap we have xs passed into 'map f' - the result is then passed
to 'using' which will force application of 'f' on each element in
parallel by way of 'parList'.  No forced evaluation is dependant on a
previous evaluation.

Now for parFoldl - a crude and wrong representation for my purposes
could be:

parFoldl :: Num b = Strategy b - (a - b) - [a] - b
parFoldl strat f = sum . (`using` parList strat) . map f

This isn't really a fold of course, but it is doing roughly the same
thing, it's summing the results of applying function 'f' to each element
in a list.

The problem here is that sum will only allow one spark at a time,
because

sum [] = 0
sum (x:xs) = x + sum xs

So we get something like:
0 + (x4 + (x3 + (x2 + (x1

For example the result for (x4 + previous) can only be evaluated after
x3, x2 and x1 have been evaluated.  This means it won't spark evaluation
on x4 until (x3 + ) has been evaluated, thus only one core is ever
used.

I believe fold is just the general case of sum and the same logic
applies.


I suppose my questions are:

Have I got this right, if not very succinct!?  

Is it purely the strictness of (+) that causes this situation?

Ignoring DPH, is it possible to write a parallel fold avoiding something
like the technique below?


Anyhow, a workaround similar to those suggested I came up with is to
divide the folds up across the cores and then sum the sub-folds - this
produces approximately double the performance across two cores:

import Control.Parallel.Strategies (parMap,rwhnf)
import Data.List (foldl')
import Data.List.Split (chunk)
import GHC.Conc (numCapabilities)


-- Prepare to share work to be 
-- done across available cores
chunkOnCpu :: [a] - [[a]]
chunkOnCpu xs = chunk (length xs `div` numCapabilities) xs
 
-- Spark a fold of each chunk and
-- sum the results. Only works because
-- for associative folds.
foldChunks :: ([a] - a) - (a - b - a) - a - [[b]] - a
foldChunks combineFunc foldFunc acc = 
  combineFunc . (parMap rwhnf $ foldl' foldFunc acc)

-- Some pointless work to keep thread busy
workFunc :: Int - Int
workFunc 1 = 1
workFunc x = workFunc $ x - 1

-- Do some work on element x and append
foldFunc :: Int - Int - Int
foldFunc acc x = acc + workFunc x 

testList = repeat 10
answer =  foldChunks sum foldFunc 0 $ chunkOnCpu (take 50 testList)

main :: IO()
main = print answer













___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Parallel foldl doesn't work correctly

2009-12-13 Thread Philip Beadling
 -- Prepare to share work to be 
 -- done across available cores
 chunkOnCpu :: [a] - [[a]]
 chunkOnCpu xs = chunk (length xs `div` numCapabilities) xs
  
 -- Spark a fold of each chunk and
 -- sum the results. Only works because
 -- for associative folds.
 foldChunks :: ([a] - a) - (a - b - a) - a - [[b]] - a
 foldChunks combineFunc foldFunc acc = 
   combineFunc . (parMap rwhnf $ foldl' foldFunc acc)


I should probably point out that use of chunk above isn't a good idea in
anything beyond a toy example.  If you have used a list comprehension to
create your input then splitting it like the above results in thunks
that grow with list size as chunk forces generation of the list.  This
rapidly negates any advantage gained from processing across 1 core!
This is easily solved - just alter the generating function to create a
*list* of list comprehensions equal in length to the number of cores you
wish to process across, rather than create one list that is split across
the cores later.



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Parallel foldl doesn't work correctly

2009-12-12 Thread Ben Millwood
On Sat, Dec 12, 2009 at 10:08 AM, Maciej Piechotka
uzytkown...@gmail.com wrote:
 If operation is associative it can be done using divide et impera
 spliting list in half and operating on it pararerlly then split in half
 etc.

I implemented something like this as an exercise:

http://benmachine.co.uk/parconcat.hs

It took me a little while to get everything to par as it should and
I'm still not sure I'm doing it in the most efficient way, but there
it is.

(If the output is nonsense, you might try changing hPutStrLn stderr
into putStrLn so that it's buffered and arrives in blocks).
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe