Hello Everyone,

My last implementation of -j had a fatal flaw that could lead to a stack 
overflow for large numbers of tasks. I discovered this after prototyping a 
version with drake's feature of turning every task into a multitask.

I have a new version with a rewritten MultiTask implementation that removes 
that flaw, is smaller, and fits easily into the Rake code. In addition, I've 
added an optional --multitask (-m) flag to turn every task into a multitask 
(in direct homage to drake).

   https://github.com/michaeljbishop/rake

I've again opened a pull request to merge the change into the master branch 
(as always, allowing for further changes to match the style and simplicity of 
the original):

  https://github.com/jimweirich/rake/pull/113

Questions and comments, as always, are welcome.

Sincerely,

Michael Bishop


---


## PROBLEM SUMMARY (THE CASE FOR -j and -m)

Rake can be unusable for builds invoking large numbers of concurrent external 
processes.

## PROBLEM DESCRIPTION

Rake makes it easy to maximize concurrency in builds with its "multitask" 
function. When Rake is used to build non-Ruby projects, it often needs to 
execute shell commands to process files. Unfortunately, when executing a 
multitask, Rake spawns a new thread for each prerequisite. This shouldn't cause 
problems when the build code is pure Ruby (with green threads), but when the 
tasks execute external processes, the sheer number of spawned processes can 
make the machine thrash. Additionally, Ruby can reach the maximum number of 
open files (presumably because it is reading stdout for all those processes).

## SOLUTION SUMMARY

This request includes the code to add support for a `--jobs NUMBER (-j)` 
command-line option to specify the number of simultaneous tasks to execute.

  * To maintain backward compatibility, not passing `-j` reverts to the old 
behavior of unlimited concurrent tasks.

As a nod to [drake](http://drake.rubyforge.org), a `--multitask (-m)` flag is 
also included which, when supplied, turns every task into a multitask.

## SOLUTION

Rather than spawning a new thread per prerequisite, `MultiTask` now sends its 
prerequisites to a `WorkerPool` object. `WorkerPool.new(n).execute_blocks` has 
the same semantics as `Thread.new`...`join`, but caps the thread count at `n`.
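To make that contract concrete, here is a minimal sketch, assuming a throwaway fixed-size pool rather than the `WorkerPool` in the pull request. `run_capped` is a hypothetical helper: it runs an array of lambdas on at most `n` threads and returns only once all of them have finished, just as `Thread.new`...`join` would, apart from the cap. The usage below records the peak number of blocks running at once.

```ruby
require 'thread'

# Hypothetical helper (not Rake's WorkerPool): runs every block using at
# most +n+ threads and returns once all blocks have finished.
def run_capped(blocks, n)
  queue = Queue.new
  blocks.each { |b| queue << b }
  workers = Array.new([n, blocks.size].min) do
    Thread.new do
      loop do
        begin
          block = queue.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break # queue drained, this worker is done
        end
        block.call
      end
    end
  end
  workers.each(&:join) # like Thread.new ... join: wait for everything
end

# Usage: ten sleeping blocks, but never more than three running at a time.
lock = Mutex.new
running = 0
peak = 0
blocks = Array.new(10) do
  lambda do
    lock.synchronize { running += 1; peak = [peak, running].max }
    sleep 0.01
    lock.synchronize { running -= 1 }
  end
end
run_capped(blocks, 3)
puts peak # never more than 3
```

A pool that simply drains a queue like this is enough for a flat list of blocks. It is not enough once blocks themselves wait on nested blocks, which is the problem the Details section addresses.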

### Core Change

    threads = @prerequisites.collect { |p|
      Thread.new(p) { |r| application[r, @scope].invoke_with_call_chain(args, invocation_chain) }
    }
    threads.each { |t| t.join }

...becomes...

    @@wp ||= WorkerPool.new(application.options.thread_pool_size)

    blocks = @prerequisites.collect { |r|
      lambda { application[r, @scope].invoke_with_call_chain(args, invocation_chain) }
    }
    @@wp.execute_blocks blocks


To support `-m`, the `MultiTask` implementation has moved to 
`Task#invoke_prerequisites_concurrently` and is called from 
`MultiTask#invoke_prerequisites`. This enables concurrent behavior for `Task` 
when `-m` is used.
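The resulting dispatch can be illustrated with trimmed-down, hypothetical stand-ins for `Task` and `MultiTask`. This is a sketch, not Rake's code: the `Options` struct, the option name `always_multitask`, and the recording of `[prerequisite, mode]` pairs are assumptions for illustration, and plain threads stand in for the `WorkerPool`.

```ruby
require 'thread'

# Hypothetical option holder; +always_multitask+ stands for the -m flag.
Options = Struct.new(:always_multitask)

# Trimmed-down stand-in for Rake's Task: only prerequisite dispatch is shown,
# and "invoking" a prerequisite just records the mode it was invoked in.
class Task
  attr_reader :log

  def initialize(prerequisites, options)
    @prerequisites = prerequisites
    @options = options
    @log = Queue.new # thread-safe record of [prerequisite, mode] pairs
  end

  def invoke_prerequisites
    if @options.always_multitask
      invoke_prerequisites_concurrently
    else
      @prerequisites.each { |name| @log << [name, :serial] }
    end
  end

  # The former MultiTask body, now available to every Task.
  # (Plain threads here; the change described above uses a WorkerPool.)
  def invoke_prerequisites_concurrently
    @prerequisites.map { |name| Thread.new { @log << [name, :concurrent] } }.each(&:join)
  end
end

# MultiTask keeps its meaning by always taking the concurrent path.
class MultiTask < Task
  def invoke_prerequisites
    invoke_prerequisites_concurrently
  end
end

# Returns the distinct modes in which +task+ invoked its prerequisites.
def modes_of(task)
  task.invoke_prerequisites
  Array.new(task.log.size) { task.log.pop.last }.uniq
end

p modes_of(Task.new(%w[a b], Options.new(false)))      # => [:serial]
p modes_of(Task.new(%w[a b], Options.new(true)))       # => [:concurrent]
p modes_of(MultiTask.new(%w[a b], Options.new(false))) # => [:concurrent]
```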

### Details

`WorkerPool#execute_blocks` adds the passed-in blocks to a queue, ensures there 
are enough threads to execute them (under the maximum), and sleeps the current 
thread until the blocks are processed.
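Those three steps can be sketched with a hypothetical `SimpleWorkerPool` class; this is an assumption-laden simplification, not the contents of `lib/rake/worker_pool.rb`. Each block is wrapped so that it counts itself off when finished, workers are started on demand up to `max`, and the caller waits on a `ConditionVariable` until its counter reaches zero. It deliberately leaves out nesting support, shutdown (its workers live until the process exits), and error handling, so it still has the problem raised just below.

```ruby
require 'thread'

# Hypothetical, simplified pool: no nesting support and no shutdown.
class SimpleWorkerPool
  def initialize(max)
    @max = max
    @queue = Queue.new
    @lock = Mutex.new
    @threads = []
  end

  # Runs every block on the pool and returns once all of them have finished.
  def execute_blocks(blocks)
    remaining = blocks.size
    done = ConditionVariable.new
    @lock.synchronize do
      # 1. Queue the blocks, each wrapped to count itself off when finished.
      blocks.each do |b|
        @queue << lambda do
          begin
            b.call
          ensure
            @lock.synchronize { done.signal if (remaining -= 1).zero? }
          end
        end
      end
      # 2. Start only as many workers as needed, never more than @max.
      ([@max, blocks.size].min - @threads.size).times { @threads << spawn_worker }
      # 3. Sleep the calling thread until its blocks have all run.
      done.wait(@lock) until remaining.zero?
    end
  end

  private

  def spawn_worker
    Thread.new { loop { @queue.pop.call } }
  end
end

pool = SimpleWorkerPool.new(2)
results = Queue.new
pool.execute_blocks((1..5).map { |i| -> { results << i * i } })
squares = Array.new(results.size) { results.pop }.sort
p squares # => [1, 4, 9, 16, 25]
```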

This creates a few potential problems:

> What if all of the blocks then called `#execute_blocks`? Wouldn't that sleep 
> all the threads?

Yes, it would. This is solved by having `#execute_blocks` remove the current 
thread from the thread pool just before it sleeps and create a new one in its 
place. When all the blocks are processed, the current thread is added back to 
the pool (adjusting for the maximum size). As a result, the pool always has 
enough available threads to make progress.

> When do the threads shut down?

`WorkerPool#execute_blocks` knows how many threads are waiting for their blocks 
to be processed. If, upon waking, it finds that no threads are waiting on 
blocks, it shuts down the thread pool.
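Both answers above can be combined into one sketch. It uses a hypothetical `NestingWorkerPool` class and is a reading of the description rather than the pull request's code. A pool thread that calls `execute_blocks` leaves the `Set` of pool threads while it sleeps and a replacement is started, so at most `max` pool threads are runnable (sleeping callers are extra). On waking, it rejoins the pool if there is room and otherwise retires after its current job, and the caller that wakes to find no one else waiting shuts the pool down. It assumes the outermost call comes from a thread outside the pool, and it omits error handling.

```ruby
require 'thread'
require 'set'

# Hypothetical nesting-safe pool; names and details are assumptions.
class NestingWorkerPool
  def initialize(max)
    @max = max
    @queue = Queue.new
    @lock = Mutex.new
    @threads = Set.new # pool threads currently counted against @max
    @waiting = 0       # callers sleeping inside execute_blocks
  end

  def execute_blocks(blocks)
    return if blocks.empty?
    remaining = blocks.size
    done = ConditionVariable.new
    stopping = []
    @lock.synchronize do
      @waiting += 1
      blocks.each do |b|
        @queue << lambda do
          begin
            b.call
          ensure
            @lock.synchronize { done.signal if (remaining -= 1).zero? }
          end
        end
      end
      # Leave the pool while sleeping so a replacement can take this slot.
      was_worker = @threads.delete?(Thread.current)
      ([@max, blocks.size].min - @threads.size).times { @threads << spawn_worker }
      done.wait(@lock) until remaining.zero?
      @waiting -= 1
      if was_worker
        # Rejoin if there is room; otherwise retire after the current job.
        if @threads.size < @max
          @threads << Thread.current
        else
          Thread.current[:retire] = true
        end
      end
      if @waiting.zero?
        # No caller is waiting any more: shut the pool down.
        stopping = @threads.to_a
        @threads.clear
      end
    end
    stopping.each { @queue << :shutdown }
    stopping.each(&:join)
  end

  private

  def spawn_worker
    Thread.new do
      loop do
        job = @queue.pop
        break if job == :shutdown
        job.call
        break if Thread.current[:retire]
      end
    end
  end
end

pool = NestingWorkerPool.new(2)
results = Queue.new
outer = (1..3).map do |i|
  lambda do
    # Each outer block submits two nested blocks and waits for them.
    pool.execute_blocks((1..2).map { |j| -> { results << [i, j] } })
  end
end
pool.execute_blocks(outer)
puts results.size # => 6
```

With a naive fixed pool of two threads, the first two outer blocks would occupy both workers while waiting for nested blocks that nothing is left to run; here each waiting worker is replaced, so the nested blocks still make progress.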

### Statistics

     ---LINES--     ----LOC---
      old   new      old   new   File Name
     ----------     ----------   ----------
      598   605      477   484   lib/rake/application.rb
       16    13       11     8   lib/rake/multi_task.rb
      327   341      210   222   lib/rake/task.rb
            111             80   lib/rake/worker_pool.rb
     4264  4393     2696  2792   TOTAL
     --------------------------------------
           +129            +96   SUMMARY

## TESTS

Tests are included for all new functionality.

## REQUIREMENTS

The Ruby version requirements remain the same. `lib/rake/worker_pool.rb` 
requires two standard libraries: `thread` and `set`.



_______________________________________________
Rake-devel mailing list
http://rubyforge.org/mailman/listinfo/rake-devel
