Re: Testing Kristian's MNG-3004 branch

Kristian Rosenvold Wed, 06 Jan 2010 03:16:02 -0800

Cool.  I'll do the *simple* clarifications first:

Without a threads argument it behaves like a totally standard M3. All
integration tests pass, and I spent a lot of energy to make sure I
didn't break anything. So in answer to (1), it builds maven3 without
threading. With threading is a different story ;)

With any kind of threading argument it switches to a different
algorithm. This is where I think we need to discuss a little ;)

I have split DefaultLifecycleExecutor into a number of smaller plexus
components (essentially just a series of "extract class" operations).
This allowed me to encapsulate the "execution strategy" into a separate
component. So now there are two versions: LifecycleModuleBuilder and
LifecycleWeaveBuilder.

In this process I removed your original implementation, simply because 
it allowed me to work freely in simplifying my own implementation (and I
truly believe I managed to make some good simplifications). I also
considered that I'd re-add your implementation as a third strategy
when/if needed - it'll only take me an hour or two.

So I think this is really the main topic that should be discussed; I
will tentatively propose that both threaded strategies should be
included because:

At the moment Dan's original implementation is really the
non-experimental threading implementation. It analyzes the reactor
dependencies and schedules modules according to the
ProjectDependencyGraph. The main downside to this patch is that it is
probably quite close to exhausting the available potential in terms of
how much concurrency can be achieved. But, as far as I can see, it will
work for every single project right out of the box (parts of MNG-2802
still need fixing but we should sort this out first). Implemented as a
"LifecycleThreadedBuilder" it could very well serve as a third execution
strategy.
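
To make the module-level idea concrete, here is a minimal sketch (not
Dan's actual code) of scheduling whole modules once their upstream
modules have finished. The class name ModuleLevelScheduler, the plain
Map standing in for the ProjectDependencyGraph, and the "build" that
only records the module name are all assumptions made for illustration.
It also assumes the graph is acyclic, as a reactor is.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: one task per module, started only after all of
// its upstream modules have completed.
final class ModuleLevelScheduler {
    // module name -> upstream module names (stand-in for ProjectDependencyGraph)
    private final Map<String, List<String>> upstream;
    private final Map<String, CompletableFuture<Void>> futures = new HashMap<>();
    private final List<String> finished =
            Collections.synchronizedList(new ArrayList<String>());

    ModuleLevelScheduler(Map<String, List<String>> upstream) {
        this.upstream = upstream;
    }

    // Called only from the submitting thread, so the futures map needs no locking.
    private CompletableFuture<Void> schedule(String module, ExecutorService pool) {
        CompletableFuture<Void> existing = futures.get(module);
        if (existing != null) {
            return existing;
        }
        List<CompletableFuture<Void>> deps = new ArrayList<>();
        for (String up : upstream.getOrDefault(module, Collections.<String>emptyList())) {
            deps.add(schedule(up, pool));
        }
        // The "build" here just records the module; a real build would run its lifecycle.
        CompletableFuture<Void> future = CompletableFuture
                .allOf(deps.toArray(new CompletableFuture[0]))
                .thenRunAsync(() -> finished.add(module), pool);
        futures.put(module, future);
        return future;
    }

    // Builds every module and returns the order in which modules completed.
    List<String> buildAll(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> all = new ArrayList<>();
            for (String module : upstream.keySet()) {
                all.add(schedule(module, pool));
            }
            CompletableFuture.allOf(all.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(finished);
    }
}
```

With a graph where "core" depends on "api", any completion order
returned by buildAll lists "api" before "core"; independent modules may
finish in either order. The concurrency ceiling mentioned above shows up
here too: a module never starts before its slowest upstream module is
completely done.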

I think the LifecycleWeaveBuilder pretty much does what we discussed,
but I'll try to summarize since I now have the power of detailed
knowledge, something I didn't have when starting this journey. As I
progressed I also managed to see a few simplifications/abstractions that
I may not have communicated to the list (I'm including some references
to changes in the patched code for those wanting to look)

It basically views all reactor modules as eligible for execution
immediately, and starts execution with the requested number of threads,
one thread per module.

Each ExecutionPlanItem now potentially has a "schedule" attached to it,
that describes external requirements that must be met before the item 
can be executed. This schedule is determined by a declarative
representation attached to class "DefaultLifecycles" (method
DefaultLifecycles.createExecutionPlanItem). 

The current dependencies that can be expressed in this declarative
representation are:

- outputDependant: The execution of this mojo depends on the output of
  the upstream executions of the same mojo according to the
  ProjectDependencyGraph (much like what Dan does in his solution, but
  expressed at the ExecutionPlanItem level instead of the module level).
- mojoSynchronized: Synchronizes execution on the mojo's class, meaning
  that only one instance can be running at any time.
- forkable: Means this execution can be forked in an additional thread.
  This scheduling is the source of the "violate lifecycle" concept; it's
  the only scheduling that'll allow out-of-order execution of lifecycle
  phases for a given module. I'm still not sure if this should be
  present.

These three constraints can be attached either to a lifecycle phase or a
specific mojo. 
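
For readers who prefer code, the following is a rough sketch of how
such a schedule could gate an execution plan item. It is not the code
in the branch: the classes Schedule, PlanItem and BuildState, the
"module:mojo" key format and the canStart method are all hypothetical
names chosen for the example. Only the three constraint names come from
the description above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of schedule checks on execution plan items.
final class WeaveScheduleSketch {

    // Declarative constraints; names follow the three listed above.
    static final class Schedule {
        final boolean outputDependant;
        final boolean mojoSynchronized;
        final boolean forkable; // permits an extra thread; not a start precondition

        Schedule(boolean outputDependant, boolean mojoSynchronized, boolean forkable) {
            this.outputDependant = outputDependant;
            this.mojoSynchronized = mojoSynchronized;
            this.forkable = forkable;
        }
    }

    // One mojo execution in one module; schedule may be null (unconstrained).
    static final class PlanItem {
        final String module;
        final String mojo;
        final Schedule schedule;

        PlanItem(String module, String mojo, Schedule schedule) {
            this.module = module;
            this.mojo = mojo;
            this.schedule = schedule;
        }
    }

    // Build progress; a real implementation would need thread-safe state.
    static final class BuildState {
        final Set<String> completed = new HashSet<>();    // entries like "api:compile"
        final Set<String> runningMojos = new HashSet<>(); // mojo names currently executing
    }

    // Returns true if the item's schedule allows it to start right now.
    static boolean canStart(PlanItem item, List<String> upstreamModules, BuildState state) {
        if (item.schedule == null) {
            return true; // no schedule attached: runs unconstrained
        }
        if (item.schedule.outputDependant) {
            for (String up : upstreamModules) {
                // wait for the same mojo to finish in every upstream module
                if (!state.completed.contains(up + ":" + item.mojo)) {
                    return false;
                }
            }
        }
        if (item.schedule.mojoSynchronized && state.runningMojos.contains(item.mojo)) {
            return false; // another instance of this mojo is still running
        }
        return true;
    }
}
```

In this sketch an item with no schedule always passes, which mirrors the
"totally unconstrained" default. The forkable flag is carried but not
consulted by canStart, since it grants extra freedom rather than adding
a precondition.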

This is really just an abstraction of the wave-mode Dan originally
described, but if there are no schedules attached, everything will just
run totally unconstrained. The schedules enforce the dependencies and
the XYZ restrictions.

This implementation builds some (fairly standard) projects, but still
fails on others. This should mostly be because the schedule is not
complete or there is another type of constraint that also needs to be
expressed in the schedule. 

The "weave" builder is fast, and still has some interesting untapped
potential in terms of achievable concurrency. It is also largely unknown
if all builds can be built with a single strategy for concurrency, or if
multiple concurrency schedules can/will be needed. The current
implementation could easily be modified to allow multiple schedules
("concurrency profiles") or even user-specified concurrency profiles
from the command line or an external XML document. There are also a
large number of fairly exciting enhancements that *could* be done; I can
make a long list of exciting stuff that could probably fill 3 months of
development time.

Personally I think these questions should be answered:
- Do we need just *one* implementation of a concurrent execution
strategy?

I am unsure of this, but personally I feel multiple implementations
may distract us from reaching a truly great end result. Then again,
I am personally not able to specify *exactly* what such a truly great
result consists of. And Dan's implementation does the job safely.

- Given that parallel execution is an "alternate" mode that may have
additional constraints, does 3.0 need something that is guaranteed to
work for the vast majority of projects? I think Dan's implementation
does this already, while the Weave build would probably get there around
3.1-3.2 if it is exposed as "experimental" through 3.0.

I'd also suggest that we re-package Dan's original implementation into
the modularized DefaultLifecycleExecutor. I also think the existence of
2 separate implementations confirms concurrency as doable ;) I am fairly
sure that my implementation/testing also clears Dan's implementation
from any really nasty concurrency issues.

Kristian

If anyone's still reading:

The only significant difference between the weave mode and the regular
mode is that the complete execution plan is determined up-front. As a
consequence of this the ReactorArtifactRepository (line 83) is forced to
use compile output from upstream modules when in weave mode, which means
jar files from other modules are not used in weave mode. This is also
the reason for the problem with the Antrun plugin, I believe. I'll have
to go jogging (skiing), to come up with how to solve this.

I'll look at the eclipse:eclipse issue ASAP.

On Tue, 2010-01-05 at 17:39 -0800, Dan Fabulich wrote:
> 1) I'm encountering some integration failures in my build at work when 
> using -Dmaven.threads.experimental=1; I'll try to turn them into proper 
> bugs in the next few days.
> 
> 2) In the documentation on http://github.com/krosenvold/maven3/ it says 
> that it does not yet build Maven 3.  Does this mean that it's unable to 
> build Maven 3 in multithreaded mode?  It seems to build OK for me in 
> non-multithreaded mode.
> 
> 3) I noticed that the branch seems to be unable to run "mvn 
> eclipse:eclipse" on itself; the "Apache Maven 3.x" project is marked 
> "SUCCESS" but then all the subsequent projects are skipped.  Is this 
> known?  I filed it as http://github.com/krosenvold/maven3/issues/#issue/2
> 
> 4) A point of terminology: I think the branch is now using the term "weave 
> mode" differently from the way I'd meant it when I introduced the term.
> 
> As I understood it, "weave mode" means running compile X compile Y compile 
> Z, then test X test Y test Z, as opposed to the default non-weave behavior 
> to compile X and test X, then compile Y and test Y, then compile Z and 
> test Z.
> 
> Is that what you think it means?  Or is that what "violate lifecycle" means?
> 
> It seems like the branch uses "weave mode" to refer to any multithreaded 
> build, which means that it's now not possible in Kristian's branch to do 
> the coarse-grained multithreading I had working in my MNG-3004 branch. 
> Do I understand that correctly?
> 
> -Dan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@maven.apache.org
For additional commands, e-mail: dev-h...@maven.apache.org
