Re: [Haskell-cafe] Re: Parallel combinator, performance advice

2009-04-08 Thread Neil Mitchell
Hi Gleb,

>> I've just found that QSem _ISN'T_ valid below 0, so the above code
>> won't actually work with QSem as it stands.
>
> I may be missing something, but I guess this code will work if 'newQSem
> (-n)' is replaced with 'newQSemN n', every 'signalQSem sem' with
> 'signalQSemN sem 1' and 'waitQSem sem' with 'waitQSemN sem n'.

Yep, that seems valid. However, the new solution requires an MVar to
store the number of things we're waiting for anyway, so we might as
well use that to also wake up the main thread. Semaphores aren't more
efficient than MVars; they are in fact implemented in terms of
MVars, so there is no benefit.
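A minimal sketch of that combined-MVar idea (hypothetical names; `enqueue`
stands for whatever submits a job to the shared pool, as elsewhere in this
thread):

```haskell
import Control.Concurrent.MVar
import Control.Monad (forM_, when)

-- Sketch: one MVar counts outstanding jobs; a second, empty MVar is
-- filled by whichever worker finishes last, waking the caller.
parallelWith :: (IO () -> IO ()) -> [IO ()] -> IO ()
parallelWith _       [] = return ()
parallelWith enqueue xs = do
    count <- newMVar (length xs)   -- jobs still running
    done  <- newEmptyMVar          -- filled when count reaches zero
    forM_ xs $ \x -> enqueue $ do
        x
        n <- modifyMVar count $ \n -> return (n - 1, n - 1)
        when (n == 0) $ putMVar done ()
    takeMVar done                  -- blocks until the last job signals
```

Note this still blocks the calling thread, so it inherits the nested-call
caveats discussed later in the thread.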

Thanks

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Parallel combinator, performance advice

2009-04-08 Thread Gleb Alexeyev

Neil Mitchell wrote:


I've just found that QSem _ISN'T_ valid below 0, so the above code
won't actually work with QSem as it stands.


I may be missing something, but I guess this code will work if 'newQSem 
(-n)' is replaced with 'newQSemN n', every 'signalQSem sem' with 
'signalQSemN sem 1' and 'waitQSem sem' with 'waitQSemN sem n'.




[Haskell-cafe] Re: Parallel combinator, performance advice

2009-04-07 Thread ChrisK
Neil Mitchell wrote:
> 
> I guess the "nested calls to parallel_" bit is the part of the spec
> that makes everything much harder!
> 
> Thanks
> 
> Neil

Yes.  Much more annoying.

But the problem here is generic.  To avoid it you must never allow all threads
to block at once.

The parallel_ function is such a job, so you solved this with the 'idempotent'
trick.  Your solution works by blocking all but one thread.

1a) Some worker thread 1 executes parallel_ with some jobs
1b) These get submitted the work queue 'chan'
1c) worker thread 1 starts on those same jobs, ignoring the queue
1d) worker thread 1 reaches the job being processed by thread 2
1e) worker thread 1 blocks until the job is finished in modifyMVar

2a) Worker thread 2 grabs a job posted by thread 1, that calls parallel_
2b) This batch of jobs gets submitted to the work queue 'chan'
2c) worker thread 2 starts on those same jobs, ignoring the queue
2d) worker thread 2 reaches the job being processed by thread 3
2e) worker thread 2 blocks until the job is finished in modifyMVar

3...4...5...

And now only 1 thread is still working, and it has to work in series.

I think I can fix this...



[Haskell-cafe] Re: Parallel combinator, performance advice

2009-04-07 Thread Neil Mitchell
Hi

> > The problem I'm trying to solve is running system commands in
> > parallel.

> "system commands" means execution of external commands or just system
> calls inside Haskell?

Calls to System.Cmd.system, i.e. running external console processes.
It's a make system I'm writing, so virtually all the time is spent in
calls to ghc etc.

To Bulat: I should have been clearer with the spec. The idea is that
multiple calls to parallel_ can execute, and a function executing
inside parallel_ can itself call parallel_. For this reason I need one
top-level thread pool, which requires unsafePerformIO. If I create a
thread pool every time, I end up with more threads than I want.
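The usual idiom for such a top-level pool is a global Chan guarded by a
NOINLINE pragma; a sketch (the names and worker count are placeholders, not
the actual make system's code):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
import Control.Monad (forever, replicateM_)
import System.IO.Unsafe (unsafePerformIO)

-- One process-wide job queue; NOINLINE stops GHC duplicating the Chan.
queue :: Chan (IO ())
{-# NOINLINE queue #-}
queue = unsafePerformIO newChan

-- Fork n workers that repeatedly pull and run jobs from the queue.
startPool :: Int -> IO ()
startPool n = replicateM_ n $ forkIO $ forever $ do
    job <- readChan queue
    job

addParallel :: IO () -> IO ()
addParallel = writeChan queue
```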

> Your parallel_ does not return until all operations are finished.
>
>> parallel_ (x:xs) = do
>>     ys <- mapM idempotent xs
>>     mapM_ addParallel ys
>>     sequence_ $ x : reverse ys
>
> By the way, there is no obvious reason to insert "reverse" there.

There is a reason :-)

Imagine I do parallel_ [a,b,c]

That's roughly doing (if b' is idempotent b):

enqueue b'
enqueue c'
a
b'
c'

If while executing a the thread pool starts on b', then after I've
finished a, I end up with both threads waiting for b', and nothing
doing c'. If I do a reverse, then the thread pool and me are starting
at different ends, so if we lock then I know it's something important
to me that the thread pool started first. It's still not ideal, but it
happens less often.
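For readers following along, the 'idempotent' trick referred to here is
presumably something of this shape (my reconstruction, not Neil's actual
code): wrap a job so that whichever of the caller or the pool runs it first
does the work, and the other blocks until it is done.

```haskell
import Control.Concurrent.MVar

-- Sketch: turn an action into one that runs at most once; a second
-- caller blocks in modifyMVar_ until the first has finished.
idempotent :: IO () -> IO (IO ())
idempotent act = do
    lock <- newMVar (Just act)
    return $ modifyMVar_ lock $ \st -> case st of
        Just a  -> a >> return Nothing   -- first caller: run the job
        Nothing -> return Nothing        -- already run: nothing to do
```

A thread reaching a job that another thread is already running blocks in
modifyMVar_ until it completes, which is the blocking behaviour described
earlier in the thread.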

> What I meant was something like:
>>
>> para [] = return ()
>> para [x] = x
>> para xs = do
>>   q <- newQSemN 0
>>   let wrap x = finally x (signalQSemN q 1)
>>       go [y] n = wrap y >> waitQSemN q (succ n)
>>       go (y:ys) n = addParallel (wrap y) >> (go ys $! succ n)
>>   go xs 0
>
> This is nearly identical to your code, and avoids creating an MVar for each
> operation.  I use "finally" to ensure the count is correct, but if a worker
> thread dies then bad things will happen.  You can replace finally with (>>)
> if speed is important.

Consider a thread pool with 2 threads and the call parallel_ [parallel_ [b,c],a]

You get the sequence:
enqueue (parallel_ [b,c])
a
wait on parallel_ [b,c]

While you are executing a, a thread pool starts:
enqueue b
c
wait for b

Now you have all the threads waiting, and no one dealing with the
thread pool. This results in deadlock.

I guess the "nested calls to parallel_" bit is the part of the spec
that makes everything much harder!

Thanks

Neil


[Haskell-cafe] Re: Parallel combinator, performance advice

2009-04-07 Thread ChrisK
Neil Mitchell wrote:
> Sorry, accidentally sent before finishing the response! I also noticed
> you sent this directly to me, not to -cafe, was that intentional?

The mail/news gateway makes it look like that, but I also sent to the mailing 
list.

> You mean something like:
>
> parallel_ xs = do
>    sem <- createSemaphore (length xs)
>    enqueue [x >> signalSemaphore sem | x <- xs]
>    waitOnSemaphore sem
>
> I thought of something like this, but then the thread that called
> parallel_ is blocked, which means if you fire off N threads you only
> get N-1 executing. If you have nested calls to parallel, then you end
> up with thread exhaustion. Is there a way to avoid that problem?
>
> Thanks
>
> Neil

Your parallel_ does not return until all operations are finished.

> parallel_ (x:xs) = do
>     ys <- mapM idempotent xs
>     mapM_ addParallel ys
>     sequence_ $ x : reverse ys

By the way, there is no obvious reason to insert "reverse" there.

What I meant was something like:
>
> para [] = return ()
> para [x] = x
> para xs = do
>   q <- newQSemN 0
>   let wrap x = finally x (signalQSemN q 1)
>       go [y] n = wrap y >> waitQSemN q (succ n)
>       go (y:ys) n = addParallel (wrap y) >> (go ys $! succ n)
>   go xs 0

This is nearly identical to your code, and avoids creating an MVar for each
operation.  I use "finally" to ensure the count is correct, but if a worker
thread dies then bad things will happen.  You can replace finally with (>>) if
speed is important.

This is also lazy since the length of the list is not forced early.



[Haskell-cafe] Re: Parallel combinator, performance advice

2009-04-07 Thread ChrisK
You create one MVar for each task in order to ensure all the tasks are done.
This is pretty heavyweight.

You could create a single Control.Concurrent.QSemN to count the completed tasks,
starting with 0.

Each task is followed by signalQSemN with a value of 1.  (I would use "finally").

As parallel_ launches the tasks it can count their number, then it would call
waitQSemN for that quantity to have finished.
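Spelled out as code (a sketch; `addParallel` is the job-queue submission
function discussed elsewhere in this thread, here taken as a parameter):

```haskell
import Control.Concurrent.QSemN
import Control.Exception (finally)

-- Sketch: count the tasks while launching them, signalling one unit
-- per completion, then wait for that many units.
parallel_ :: (IO () -> IO ()) -> [IO ()] -> IO ()
parallel_ addParallel xs = do
    q <- newQSemN 0
    let launch n []     = waitQSemN q n                -- all launched: wait
        launch n (y:ys) = do
            addParallel (y `finally` signalQSemN q 1)
            launch (n + 1) ys
    launch (0 :: Int) xs
```

As Neil points out in this thread, the caller still blocks in waitQSemN, so
nested calls can exhaust a fixed-size pool.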

-- 
Chris
