Re: [Haskell-cafe] Re: Parallel combinator, performance advice
Hi Gleb,

>> I've just found that QSem _ISN'T_ valid below 0, so the above code
>> won't actually work with QSem as it stands.
>
> I may be missing something, but I guess this code will work if 'newQSem
> (-n)' is replaced with 'newQSemN n', every 'signalQSem sem' with
> 'signalQSemN sem 1' and 'waitQSem sem' with 'waitQSemN sem n'.

Yep, that seems valid. However, the new solution requires an MVar to store the number of things we're waiting for anyway, so we might as well use that to also wake up the main thread. Semaphores aren't more efficient than MVars; they are in fact implemented in terms of MVars, so there is no benefit.

Thanks

Neil

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
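Neil's point, that the counting MVar can double as the wake-up for the caller, might look like the sketch below. This is a guessed reconstruction, not the thread's actual code: the real version uses a shared thread pool, which is replaced here by plain forkIO.

```haskell
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad
import Data.IORef

-- Sketch: one MVar counts outstanding tasks; whichever task brings the
-- count to zero also fills 'done', waking the caller.  No semaphore needed.
parallel_ :: [IO ()] -> IO ()
parallel_ []   = return ()
parallel_ acts = do
    count <- newMVar (length acts)
    done  <- newEmptyMVar
    forM_ acts $ \act -> forkIO $ do
        act
        c <- modifyMVar count $ \c -> return (c - 1, c - 1)
        when (c == 0) $ putMVar done ()
    takeMVar done

main :: IO ()
main = do
    r <- newIORef (0 :: Int)
    parallel_ [ atomicModifyIORef r (\n -> (n + i, ())) | i <- [1..10] ]
    readIORef r >>= print   -- 55
```

Note the caller still blocks in takeMVar, so this sketch shares the thread-exhaustion problem discussed later in the thread when parallel_ is nested.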
[Haskell-cafe] Re: Parallel combinator, performance advice
Neil Mitchell wrote:
> I've just found that QSem _ISN'T_ valid below 0, so the above code
> won't actually work with QSem as it stands.

I may be missing something, but I guess this code will work if 'newQSem (-n)' is replaced with 'newQSemN n', every 'signalQSem sem' with 'signalQSemN sem 1' and 'waitQSem sem' with 'waitQSemN sem n'.
[Haskell-cafe] Re: Parallel combinator, performance advice
Neil Mitchell wrote:
> I guess the "nested calls to parallel_" bit is the part of the spec
> that makes everything much harder!
>
> Thanks
>
> Neil

Yes. Much more annoying. But the problem here is generic: to avoid it you must never allow all threads to block at once. A nested parallel_ call is exactly such a blocking job, and you solved this with the 'idempotent' trick. Your solution works by blocking all but one thread:

1a) Worker thread 1 executes parallel_ with some jobs
1b) These get submitted to the work queue 'chan'
1c) Worker thread 1 starts on those same jobs, ignoring the queue
1d) Worker thread 1 reaches the job being processed by thread 2
1e) Worker thread 1 blocks in modifyMVar until that job is finished

2a) Worker thread 2 grabs a job posted by thread 1, which itself calls parallel_
2b) This batch of jobs gets submitted to the work queue 'chan'
2c) Worker thread 2 starts on those same jobs, ignoring the queue
2d) Worker thread 2 reaches the job being processed by thread 3
2e) Worker thread 2 blocks in modifyMVar until that job is finished

3... 4... 5...

And now only one thread is still working, and it has to work in series.

I think I can fix this...
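For reference, the 'idempotent' trick being discussed can be sketched as below. This is an assumed reconstruction, not the thread's actual code: the wrapper runs the underlying action at most once, and a second caller blocks in modifyMVar_ until the first run finishes, which is exactly the blocking the trace above describes.

```haskell
import Control.Concurrent.MVar
import Data.IORef

-- Wrap an action so it runs at most once, no matter how many threads
-- (the submitter or a pool worker) execute the wrapper.  The MVar holds
-- the remaining work: the first caller runs it and leaves a no-op behind;
-- a concurrent second caller blocks inside modifyMVar_ until it is done.
idempotent :: IO () -> IO (IO ())
idempotent act = do
    var <- newMVar act
    return $ modifyMVar_ var $ \a -> a >> return (return ())

main :: IO ()
main = do
    r <- newIORef (0 :: Int)
    once <- idempotent (modifyIORef r (+ 1))
    once
    once            -- second run is a no-op
    readIORef r >>= print   -- 1
```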
[Haskell-cafe] Re: Parallel combinator, performance advice
Hi

>>> The problem I'm trying to solve is running system commands in
>>> parallel.
>
>> "system commands" means execution of external commands or just system
>> calls inside Haskell?

Calls to System.Cmd.system, i.e. running external console processes. It's a make system I'm writing, so virtually all the time is spent in calls to ghc etc.

To Bulat: I should have been clearer with the spec. The idea is that multiple calls to parallel_ can execute at once, and a function executing inside parallel_ can itself call parallel_. For this reason I need one top-level thread pool, which requires unsafePerformIO. If I create a thread pool every time, I end up with more threads than I want.

> Your parallel_ does not return until all operations are finished.
>
>> parallel_ (x:xs) = do
>>     ys <- mapM idempotent xs
>>     mapM_ addParallel ys
>>     sequence_ $ x : reverse ys
>
> By the way, there is no obvious reason to insert "reverse" there.

There is a reason :-) Imagine I do:

    parallel_ [a,b,c]

That's roughly doing (if b' is idempotent b):

    enqueue b'
    enqueue c'
    a
    b'
    c'

If while executing a the thread pool starts on b', then after I've finished a, I end up with both threads waiting for b', and nothing doing c'. If I do a reverse, then the thread pool and I are starting at different ends, so if we do collide then I know it's something important to me that the thread pool started first. It's still not ideal, but it happens less often.

> What I meant was something like:
>
>> para [] = return ()
>> para [x] = x
>> para xs = do
>>     q <- newQSemN 0
>>     let wrap x = finally x (signalQSemN q 1)
>>         go [y]    n = wrap y >> waitQSemN q (succ n)
>>         go (y:ys) n = addParallel (wrap y) >> (go ys $! succ n)
>>     go xs 0
>
> This is nearly identical to your code, and avoids creating the MVar for
> each operation. I use "finally" to ensure the count is correct, but if a
> worker thread dies then bad things will happen. You can replace finally
> with (>>) if speed is important.
Consider a thread pool with 2 threads and the call:

    parallel_ [parallel_ [b,c], a]

You get the sequence:

    enqueue (parallel_ [b,c])
    a
    wait on parallel_ [b,c]

While you are executing a, a pool thread starts:

    enqueue b
    c
    wait for b

Now you have all the threads waiting, and no one dealing with the work queue. This results in deadlock.

I guess the "nested calls to parallel_" bit is the part of the spec that makes everything much harder!

Thanks

Neil
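The parallel_ quoted earlier, with the idempotent and reverse tricks, can be made self-contained roughly as follows. The Chan-based queue and the single worker thread are stand-ins for the global thread pool, which is an assumption of this sketch rather than the thread's real machinery.

```haskell
import Control.Concurrent
import Control.Monad
import Data.IORef

-- Run an action at most once; concurrent callers block until it is done.
idempotent :: IO () -> IO (IO ())
idempotent act = do
    var <- newMVar act
    return $ modifyMVar_ var $ \a -> a >> return (return ())

-- Spare jobs are made idempotent, enqueued for the pool, and also run
-- locally in reverse order, so the pool and the caller start from
-- opposite ends of the list.
parallel_ :: Chan (IO ()) -> [IO ()] -> IO ()
parallel_ _     []     = return ()
parallel_ queue (x:xs) = do
    ys <- mapM idempotent xs
    mapM_ (writeChan queue) ys
    sequence_ $ x : reverse ys

main :: IO ()
main = do
    queue <- newChan
    void $ forkIO $ forever $ join (readChan queue)   -- one pool worker
    r <- newIORef (0 :: Int)
    parallel_ queue [ atomicModifyIORef r (\n -> (n + i, ())) | i <- [1..10] ]
    readIORef r >>= print   -- 55
```

Because the caller runs every wrapper itself and each wrapper blocks until its job has completed somewhere, parallel_ returns only when all jobs are done, without any per-call semaphore.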
[Haskell-cafe] Re: Parallel combinator, performance advice
Neil Mitchell wrote:
> Sorry, accidentally sent before finishing the response! I also noticed
> you sent this directly to me, not to -cafe, was that intentional?

The mail/news gateway makes it look like that, but I also sent to the mailing list.

> You mean something like:
>
>     parallel_ xs = do
>         sem <- createSemaphore (length xs)
>         enqueue [x >> signalSemaphore sem | x <- xs]
>         waitOnSemaphore sem
>
> I thought of something like this, but then the thread that called
> parallel_ is blocked, which means if you fire off N threads you only
> get N-1 executing. If you have nested calls to parallel, then you end
> up with thread exhaustion. Is there a way to avoid that problem?
>
> Thanks
>
> Neil

Your parallel_ does not return until all operations are finished.

> parallel_ (x:xs) = do
>     ys <- mapM idempotent xs
>     mapM_ addParallel ys
>     sequence_ $ x : reverse ys

By the way, there is no obvious reason to insert "reverse" there.

What I meant was something like:

> para [] = return ()
> para [x] = x
> para xs = do
>     q <- newQSemN 0
>     let wrap x = finally x (signalQSemN q 1)
>         go [y]    n = wrap y >> waitQSemN q (succ n)
>         go (y:ys) n = addParallel (wrap y) >> (go ys $! succ n)
>     go xs 0

This is nearly identical to your code, and avoids creating the MVar for each operation. I use "finally" to ensure the count is correct, but if a worker thread dies then bad things will happen. You can replace finally with (>>) if speed is important. This is also lazy, since the length of the list is not forced early.
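A self-contained rendering of the para above, with addParallel approximated by plain forkIO (an assumption of this sketch; the real addParallel would hand the job to the shared pool):

```haskell
import Control.Concurrent
import Control.Concurrent.QSemN
import Control.Exception (finally)
import Control.Monad (void)
import Data.IORef

-- Stand-in for the thread pool's enqueue.
addParallel :: IO () -> IO ()
addParallel = void . forkIO

-- Count jobs as they are launched; the caller runs the last job itself,
-- then waits for (succ n) signals, one per wrapped job including its own.
para :: [IO ()] -> IO ()
para []  = return ()
para [x] = x
para xs  = do
    q <- newQSemN 0
    let wrap act    = act `finally` signalQSemN q 1
        go [y]    n = wrap y >> waitQSemN q (succ n)
        go (y:ys) n = addParallel (wrap y) >> (go ys $! succ n)
        go []     _ = return ()   -- unreachable: para handles [] and [x]
    go xs 0

main :: IO ()
main = do
    r <- newIORef (0 :: Int)
    para [ atomicModifyIORef r (\n -> (n + i, ())) | i <- [1..10] ]
    readIORef r >>= print   -- 55
```

The explicit parentheses around (go ys $! succ n) matter: ($!) binds more loosely than (>>), so without them the expression does not parse as intended.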
[Haskell-cafe] Re: Parallel combinator, performance advice
You create one MVar for each task in order to ensure all the tasks are done. This is pretty heavyweight.

You could create a single Control.Concurrent.QSemN to count the completed tasks, starting with 0. Each task is followed by signalQSemN with a value of 1 (I would use "finally"). As parallel_ launches the tasks it can count their number, then it would call waitQSemN for that quantity to have finished.

-- Chris