Re: Advice wanted on parallel processing
>>>>> j waldmann waldm...@imn.htwk-leipzig.de writes:

 >> (I am not updating in place). The move generator produces a new
 >> board for each move.

 > Well, this is sound design, but current memory managers may not be
 > up to it. If you check the (board) game programming literature,
 > you'll find that engine authors take great efforts to bypass
 > automatic memory management (do all updates on the current board
 > in-place, and write their own malloc/free for game tree nodes).

I know it. In fact I wrote a program for this game about 10 years ago,
in C++, which did all updates in place. It wasn't very good (although,
being the only one that implemented the rules correctly - as far as I
could tell; they are very complicated - it was necessarily the best).
Now I want to have another go in Haskell. Reading about DPH inspired me
to give it a try, although I'm not attempting to use DPH yet. Probably
I was too naive in thinking I was going to be able to exploit
parallelism without pain.

 > This becomes even more important when trying to exploit concurrency.
 > In theory, that's all very nice (you can evaluate independent
 > branches of the game tree in parallel) but in practice, you're
 > doomed if your memory allocator/collector is still essentially
 > single-threaded (that is, blocks all threads).

OK, that's interesting. GHC's collector is still stop-the-world at the
moment, right? I read in the recent paper that it is intended to remedy
that soon, in which case I can try again to see if that improves
matters.
--
Colin Adams
Preston Lancashire
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Advice wanted on parallel processing
I've just managed to build ghc 6.11 (Thanks Simon). I did this for two
reasons, one of which is that I want to try to improve the speed of the
AI for the Chu Shogi program I am writing by making use of parallel
processing.

I have a 4-core Xeon running Fedora Linux 10 (AMD64). I have a
repeatable scenario for timings. With ghc 6.10.1, no threaded runtime,
the computer takes 51 or 52 seconds to move (experimental variation up
to 1.2 seconds). With ghc 6.11, it takes the same time. If I switch on
the threaded runtime, then it rises slightly (perhaps not significantly
- I measure 53-54 seconds). With 2, 3 or 4 processors, I measured
(one-off, so not reliable) 65, 55 and 58 seconds respectively.

Well, that doesn't really matter, provided I can get the time back with
interest by exploiting parallelism. My first thought for an easy win is
the move generator. At present I have this code:

-- | Generate all legal moves from state.
--
-- The best moves (according to some heuristic) should come first
generate_moves :: (Non_interactive_state, Maybe Move)
               -> [(Non_interactive_state, Maybe Move)]
generate_moves (state, _) =
    let colour = case is_black_to_play state of
                   True  -> Black
                   False -> White
        pieces' = all_pieces_of_colour (board state) colour
        unsorted = concatMap (generate_moves_for_piece state) pieces'
        sorted = sortBy best_move unsorted
        moves = mapMaybe new_move sorted
    in {-trace (concat (intersperse "\n" (map (print_move . fromJust . snd) moves)))-}
       moves
  where
    new_move mv = case update_from_move (state, Nothing) mv of
                    Nothing             -> Nothing
                    Just (st', Nothing) -> Just (st', Just mv)

and my idea was to run the call to generate_moves_for_piece in parallel
over the pieces' list. I thought I could do that simply by replacing
concatMap with parFlatMap, but discovered I need to supply an extra
argument - a Strategy. Not knowing what this should be (where's the
best place to read up on this?) I tried specifying rwhnf.

My timings then are:

-N1 83 seconds
-N2 57 seconds
-N3 58 seconds
-N4 60 seconds

so a complete failure. Where should I go from here?
--
Colin Adams
Preston Lancashire
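The parallel-concatMap idea can be sketched with today's
Control.Parallel.Strategies API (from the parallel package), where
parListChunk and rdeepseq play the role that parFlatMap and a Strategy
argument played in 2009. The piece and move types below are toy
stand-ins, not the poster's code:

```haskell
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

-- Toy stand-in (hypothetical): a "piece" is an Int and its "moves"
-- are a small list derived from it.
movesFor :: Int -> [Int]
movesFor p = [p * 10 + k | k <- [1 .. 3]]

-- Parallel concatMap: evaluate the per-piece move lists in parallel,
-- in chunks, fully (rdeepseq), then concatenate the results.
parConcatMap :: NFData b => Int -> (a -> [b]) -> [a] -> [b]
parConcatMap chunk f xs =
  concat (map f xs `using` parListChunk chunk rdeepseq)

main :: IO ()
main = print (parConcatMap 4 movesFor [1 .. 8])
```

Chunking matters here: sparking one tiny move list per piece mostly
buys scheduling overhead, whereas a handful of chunks keeps all cores
busy with coarse pieces of work.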
Re: Advice wanted on parallel processing
On Wednesday, 18 March 2009 15:28, Colin Paul Adams wrote:
> I've just managed to build ghc 6.11 (Thanks Simon). I did this for
> two reasons, one of which is I want to try to improve the speed of
> the AI for the Chu Shogi program I am writing by making use of
> parallel processing.
>
> [...]
>
> and my idea was to run the call to generate_moves_for_piece in
> parallel over the pieces' list. I thought I could do that simply by
> replacing concatMap with parFlatMap, but discovered I need to supply
> an extra argument - a Strategy. Not knowing what this should be
> (where's the best place to read up on this?) I tried specifying
> rwhnf.
Control.Parallel.Strategies, docs and source?

generate_moves_for_piece produces a list. rwhnf forces this list just
enough to see whether it's [] or (_:_) (rwhnf x = x `seq` ()), and that
doesn't get enough work done in each thread to compensate for the
overhead. Try using rnf after writing an NFData instance for Move.

If e.g.

data Move = Move {from :: Position, to :: Position}

, the instance would be

instance NFData Move where
    rnf (Move f t) = rnf f `seq` rnf t `seq` ()

That might require NFData instances for Position and its components,
but writing these should be mechanical.

> My timings then are:
>
> -N1 83 seconds
> -N2 57 seconds
> -N3 58 seconds
> -N4 60 seconds
>
> so a complete failure. Where should I go from here?

HTH,
Daniel
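Daniel's instance, spelled out as a self-contained sketch using
Control.DeepSeq (the deepseq package); Position and its fields are
invented here for illustration:

```haskell
import Control.DeepSeq (NFData (..), deepseq)

-- Hypothetical stand-ins for the poster's types.
data Position = Position Int Int
data Move = Move { from :: Position, to :: Position }

-- Forcing both fields gives rnf its "evaluate everything" meaning.
instance NFData Position where
  rnf (Position r f) = rnf r `seq` rnf f

instance NFData Move where
  rnf (Move f t) = rnf f `seq` rnf t

main :: IO ()
main = Move (Position 0 0) (Position 1 1) `deepseq` putStrLn "fully evaluated"
```

With these instances in scope, rnf (or rdeepseq in the modern API) is a
Strategy that forces each Move completely, so the spawned computation
actually does the work instead of handing an unevaluated thunk back to
the main thread.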
Re: Advice wanted on parallel processing
>>>>> "Daniel" == Daniel Fischer daniel.is.fisc...@web.de writes:

  Daniel> generate_moves_for_piece produces a list. rwhnf forces this
  Daniel> list enough to see if it's [] or (_:_) (rwhnf x = x `seq`
  Daniel> ()), that doesn't get enough work done in each thread to
  Daniel> compensate the overhead. Try using rnf after writing an
  Daniel> NFData instance for Move.

  Daniel> If e.g.

  Daniel> data Move = Move {from :: Position, to :: Position}

  Daniel> , the instance would be

  Daniel> instance NFData Move where
  Daniel>     rnf (Move f t) = rnf f `seq` rnf t `seq` ()

It made no difference to the timings whatsoever.

Anyway, at that point I decided to revert to the first rule of
performance tuning, and profiled the program for time. The move
generator was using less than 3% of the CPU time, so that was clearly a
waste of time on my part. The one routine that stood out was this one
(about 35% CPU time, with 0% attributed to children):

-- | Value of one rank of the board
rank_value :: (Int, Array Int Square) -> Int
rank_value (rank_coord, rank') =
    sum (map (cell_value rank_coord) (assocs rank'))

The array has 12 elements. So I tried changing the map to parMap rnf.
This gave timings of:

-N1 93 seconds
-N2 105 seconds
-N3 206 seconds

at which point I decided I hadn't got a clue what was going on.
--
Colin Adams
Preston Lancashire
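Those timings are what fine-grained sparking looks like: with only 12
elements per rank, each spark costs more than the work it covers, and
adding capabilities only adds contention. A cheaper sequential fix is a
strict fold, sketched here with hypothetical stand-ins for Square and
cell_value:

```haskell
import Data.Array (Array, assocs, listArray)
import Data.List (foldl')

-- Hypothetical stand-ins for the poster's types and heuristic.
type Square = Int

cell_value :: Int -> (Int, Square) -> Int
cell_value rank (file, sq) = sq + rank + file  -- placeholder scoring

-- Strict, sequential version: foldl' accumulates without building a
-- chain of thunks, and spawns nothing for this tiny workload.
rank_value :: (Int, Array Int Square) -> Int
rank_value (rank_coord, rank') =
  foldl' (\acc cell -> acc + cell_value rank_coord cell) 0 (assocs rank')

main :: IO ()
main = print (rank_value (1, listArray (0, 11) [1 .. 12]))  -- prints 156
```

If parallelism is to pay off at all, it has to be introduced above this
function, over work units that take milliseconds rather than
nanoseconds.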
Re: Advice wanted on parallel processing
On 2009 Mar 18, at 16:34, Colin Paul Adams wrote:
> The one routine that stood out was this one (about 35% CPU time, with
> 0% attributed to children):
>
> -- | Value of one rank of the board
> rank_value :: (Int, Array Int Square) -> Int
> rank_value (rank_coord, rank') =
>     sum (map (cell_value rank_coord) (assocs rank'))
>
> The array has 12 elements.

How many times do you call it? Perhaps the real optimization you need
is to memoize.
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH
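Brandon's memoization suggestion could be sketched with a Data.Map
cache keyed on the rank's contents; the key type, the scoring function,
and the threading of the cache are all illustrative choices, not the
poster's design:

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-ins for the poster's types and heuristic.
type Square = Int
type Cache = Map.Map (Int, [Square]) Int

rankValue :: Int -> [Square] -> Int
rankValue rank cells = sum (zipWith (\file sq -> sq + rank + file) [0 ..] cells)

-- Look up a cached value, computing and inserting on a miss.
rankValueMemo :: Cache -> Int -> [Square] -> (Int, Cache)
rankValueMemo cache rank cells =
  case Map.lookup key cache of
    Just v  -> (v, cache)
    Nothing -> let v = rankValue rank cells
               in (v, Map.insert key v cache)
  where key = (rank, cells)

main :: IO ()
main =
  let (v1, c1) = rankValueMemo Map.empty 1 [1 .. 12]  -- miss: computed
      (v2, _)  = rankValueMemo c1 1 [1 .. 12]         -- hit: cached
  in print (v1, v2)
```

Whether this wins depends on how often identical ranks recur during
search; comparing the whole key costs O(rank length), so the cache only
pays if recomputing a rank is noticeably more expensive than that.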
Re: Advice wanted on parallel processing
>>>>> "Brandon" == Brandon S Allbery KF8NH allb...@ece.cmu.edu writes:

 >> The array has 12 elements.

  Brandon> How many times do you call it? Perhaps the real optimization
  Brandon> you need is to memoize.

Very many times indeed. But it is a different array on most occasions
(I am not updating in place). It represents one rank of the board on
which the game is played. The move generator produces a new board for
each move.
--
Colin Adams
Preston Lancashire
Re: Advice wanted on parallel processing
On 2009 Mar 18, at 16:59, Colin Paul Adams wrote:
> >> The array has 12 elements.
>
>   Brandon> How many times do you call it? Perhaps the real
>   Brandon> optimization you need is to memoize.
>
> Very many times indeed. But it is a different array on most occasions
> (I am not updating in place). It represents one rank of the board on
> which the game is played. The move generator produces a new board for
> each move.

It might be helpful for you to show more of the program. One thing to
keep in mind about parallelization is that the level at which it
happens matters: if individual calculations across the list are fast,
all you gain by parallelizing it is MP synchronization/locking
overhead. On the other hand, it's possible that if the caller can be
rearranged so that rank_value is computed on another CPU while it's
doing something else, you could gain quite a bit. Or maybe the caller
is itself at the right level to parallelize.
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH
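Parallelizing at a coarser level - the top-level branches of the game
tree rather than cells of a rank - can be sketched with par and pseq
(exported from GHC.Conc in base, and re-exported by Control.Parallel in
the parallel package). The branch scoring function is a synthetic
stand-in for a deep search:

```haskell
import GHC.Conc (par, pseq)

-- Stand-in for a deep search of one top-level branch: a big enough
-- chunk of work that spark overhead is amortized.
score :: Int -> Int
score n = sum [n * k `mod` 97 | k <- [1 .. 100000]]

-- Spark each branch's score while walking down the rest of the list;
-- pseq makes sure the tail is demanded before the cons is returned.
parScores :: [Int] -> [Int]
parScores []       = []
parScores (b : bs) = let v  = score b
                         vs = parScores bs
                     in v `par` (vs `pseq` (v : vs))

main :: IO ()
main = print (maximum (parScores [1 .. 8]))
```

Run with the threaded runtime (ghc -threaded, +RTS -N4) the sparks can
be picked up by idle capabilities; without it the program still
produces the same answer, just sequentially.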
Re: Advice wanted on parallel processing
> (I am not updating in place). The move generator produces a new board
> for each move.

Well, this is sound design, but current memory managers may not be up
to it. If you check the (board) game programming literature, you'll
find that engine authors take great efforts to bypass automatic memory
management (do all updates on the current board in-place, and write
their own malloc/free for game tree nodes).

This becomes even more important when trying to exploit concurrency. In
theory, that's all very nice (you can evaluate independent branches of
the game tree in parallel) but in practice, you're doomed if your
memory allocator/collector is still essentially single-threaded (that
is, blocks all threads). That's language independent (it's a property
of the run-time system).

Of course in a more declarative language it should be easier for the
compiler to analyze the source code and replace malloc/free by
something better (no allocation by immediate re-use, or easy
deallocation by some stack regime).

J.W.
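The in-place style the game-programming literature favours is available
in Haskell without giving up a pure interface: a mutable unboxed board
inside ST, with make/unmake instead of copying. A minimal sketch - the
six-square board and the single move are invented for illustration:

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, getElems, newListArray, readArray, writeArray)

-- In-place make/unmake of one move on a mutable board: no new board
-- is allocated per move, which is the point Waldmann raises.
makeUnmake :: [Int]
makeUnmake = runST $ do
  board <- newListArray (0, 5) [7, 0, 0, 0, 0, 0] :: ST s (STUArray s Int Int)
  p <- readArray board 0
  writeArray board 0 0     -- make: lift the piece off square 0 ...
  writeArray board 3 p     -- ... and put it down on square 3
  mid <- getElems board    -- inspect the position with the move made
  writeArray board 3 0     -- unmake: restore the original position
  writeArray board 0 p
  pure mid

main :: IO ()
main = print makeUnmake  -- prints [0,0,0,7,0,0]
```

The trade-off against the persistent design is exactly the one the
thread is circling: make/unmake keeps the allocator and collector
quiet, but sharing one mutable board across parallel branches of the
search is no longer free.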