James M. Lawrence wrote:
Though it would be a privilege to see my code merged into Rake, of
course a case would have to be made for it. What are the problems it
solves? I am not a good person to make the case, as I have no
experience with non-trivial use of 'multitask'. But fortunately
others here do.
1. When compiling C/C++ code, a compiler is usually launched per source
file. Each runs separately from the others, so they can be parallelized.
As C/C++ compilations are usually slower than Java ones (for example), and
since multi-core servers are common these days, this is an important
improvement.
2. Multi-module builds can benefit in the same way, so a project with 2
Java modules can build them in parallel.
3. It makes Rake more compatible with make, which helps non-Ruby people
(C/C++ people) choose it.
4. multitask does not do a good job, for two reasons:
a. It cannot be controlled from outside (there is no equivalent to -j), so
on 1, 2 or 4 CPUs you get the same number of threads (depending on the
number of prerequisites).
b. It runs a thread for each prerequisite, so it can thrash the
system, for example running 30 threads on 2 CPUs (again, the main use
case here being C/C++ with 30 source files).
5. Alternatives that give better control but are still implementations
of Task are wrong, since you eventually lose control of the number of
threads. I have a JobTask implementation that can be configured with
how many jobs to run. Now imagine I use it to compile modules in
parallel, and I have 2 top modules, each with 2 sub-modules: I end up
with 3 job tasks (the top one and 2 sub ones), each running 2 threads,
so 6 threads, which again brings us to thrashing the system. It is hard
to control overall that only x threads exist. (Currently I use my job
task only for the "leaf" prerequisites.)
6. Both multitask and JobTask suffer from the fact that the threads are
independent. Usually, if there's a compilation error, I want to stop the
build and signal the error to the user. But when there are several
independent threads, it is hard to keep track of them all (think several
multitasks/job tasks) so as to stop them (cleanly) when one fails.
- By the way, this is possible with Drake, right? So if one thread fails,
the others are signaled to quit when their current execution block
finishes (cleanly).
7. Current Rake relies on recursion to execute tasks. For a large build
this may create a deep recursion stack that exhausts Ruby's execution
stack (for a 20-deep execution, the Ruby stack depth was ~800, which
caused segfaults on Linux even when the stack size was unlimited).
Ittay
After all, Drake may be a totally misguided project. That would be OK
with me, as my primary interest was in CompTree. The implementation
of Drake is trivial, given CompTree.
Here are Ittay's points from a previous thread:
1. If some top-level tasks run in parallel, and each of them
recursively runs other tasks, and one of the bottom tasks fails, it
is impossible to stop the other tasks, short of a very ugly abort of
all threads.
2. Tasks that run in parallel can't tell when another task's execution
has failed. They may read a wrong timestamp from the failed task.
CompTree uses Thread.abort_on_exception = true. If something goes
wrong, why shouldn't we abort all threads? I don't yet see the issue
here.
That said, CompTree does give us an option. A Rake task is translated
into a computation node which discards its result. But we don't have
to discard the result. We could wrap the task inside begin/rescue,
returning the Exception instance as the result. CompTree 'computes'
the exception.
# top-level
result = driver.compute(root_node, :num_threads => n)
# nil.is_a?(Exception) is false, so no separate nil check is needed
raise result if result.is_a?(Exception)
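The begin/rescue idea can be sketched as a node function that returns nil on success or the caught exception. If a child has already failed, the function passes that exception up instead of running its own actions. `node_function` is a hypothetical name. The assumption that a node's function receives its children's results as arguments is only for illustration.

```ruby
# Hypothetical wrapper: turn a task's actions into a node function whose
# result is nil on success or the Exception it caught.
def node_function(actions)
  lambda do |*child_results|
    # A failed prerequisite: propagate its exception, skip our own actions.
    failed = child_results.find { |r| r.is_a?(Exception) }
    return failed if failed
    begin
      actions.each(&:call)
      nil
    rescue StandardError => e
      e
    end
  end
end
```

At the top level, the root's result is then either nil or the first exception, which the snippet above re-raises.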
3. Threads are created per prerequisite task, rather than being a fixed
number (based initially on the number of cores/CPUs), which causes
thrashing.
CompTree uses a fixed number of threads.
4. Even if a thread pool is used, dependency information is
still hard to take into account. Imagine a task has 2 prerequisites,
where one depends on the other. When the tasks are added to a thread
pool, they may be invoked in two different threads, but one waits on the
other, so that thread is not utilized. Maybe add a "distance" method
which calculates how far one task is from the other in the dependency
graph (returning nil if they are not dependent), so that when adding
tasks to queues, each task is added to the queue whose current tasks
have the minimal distance (nil being infinite).
In CompTree, whenever a node finishes its computation, the tree is
scanned for nodes waiting to be computed. Available nodes are handed
out to the available threads. Those threads which didn't get a node
are put to sleep. No soup for you, and go to bed. Repeat.
Thus CompTree operates at "max capacity" at all times. Given N
threads, if at any time N computations are not running, it is because
the graph topology demands it (child node results are not
available). In short, I believe it does what you want.
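Below is a minimal sketch of the scheduling loop described above, written from that description rather than from CompTree's source. `Node` and `compute` here are hypothetical, not CompTree's API. N threads share one lock, and a node is ready when every child has a result. A thread that finds nothing ready waits on a condition variable until some node finishes and wakes everyone to rescan. Error handling is omitted, so a raising function would leave the other threads waiting.

```ruby
# Hypothetical node: a name, child nodes, and a function of the children's results.
Node = Struct.new(:name, :children, :function)

def compute(root, num_threads)
  # Collect every node reachable from the root, keyed by name.
  nodes = {}
  pending = [root]
  until pending.empty?
    n = pending.pop
    next if nodes.key?(n.name)
    nodes[n.name] = n
    pending.concat(n.children)
  end

  results = {}
  running = {}
  lock = Mutex.new
  cond = ConditionVariable.new

  # Scan the tree for a node that is not done, not running, and whose
  # children all have results. Must be called while holding the lock.
  ready = lambda do
    nodes.values.find do |n|
      !results.key?(n.name) && !running[n.name] &&
        n.children.all? { |c| results.key?(c.name) }
    end
  end

  workers = Array.new(num_threads) do
    Thread.new do
      loop do
        node = nil
        args = nil
        lock.synchronize do
          until results.key?(root.name) || (node = ready.call)
            cond.wait(lock) # nothing available: go to bed
          end
          if node
            running[node.name] = true
            args = node.children.map { |c| results[c.name] }
          end
        end
        break unless node # the root is computed, so this thread is finished
        value = node.function.call(*args) # run outside the lock
        lock.synchronize do
          results[node.name] = value
          running.delete(node.name)
          cond.broadcast # wake sleeping threads to rescan the tree
        end
      end
    end
  end
  workers.each(&:join)
  results[root.name]
end
```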
As you can see, I can't help myself from using graph terminology, as
my contact with Rake has been rather superficial. A quick rosetta
stone:
node <--> task
child <--> prerequisite
parent <--> ?? what do you call this ??
compute <--> invoke
function <--> @actions.each { |act| act.call }
function.call <--> execute
result <--> N/A
node.name <--> task.name.to_sym
It is not clear whether these issues would be solved better within
Rake itself, or whether CompTree should be used to solve them. What
are the other issues, and how would you solve them?
James M. Lawrence
_______________________________________________
Rake-devel mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/rake-devel
--
Ittay Dror <[EMAIL PROTECTED]>