As I understand it, there's room for several different
types of solution here:

Starting with the single-machine solution: I now understand that
you could start forking downstream builds straight after
"compile" in a reactor build, or perhaps after "install" in other cases.

In this scenario I think each module depends on all upstream
modules successfully reaching "install" before proceeding to "deploy".
I really think it's important to avoid leaking artifacts that do not
have their own (and all upstream) lifecycle requirements fulfilled.
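
Just to make that gating concrete: below is a minimal sketch (plain
java.util.concurrent; all class and method names are hypothetical, not
actual Maven APIs) of how a scheduler could let a module block until
all of its upstream modules have reached "install":

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CountDownLatch;

    // Hypothetical sketch: one latch per module is released when that
    // module reaches "install"; a downstream module awaits all of its
    // upstream latches before it may proceed towards "deploy".
    class ModuleScheduler {
        private final Map<String, CountDownLatch> installed =
                new ConcurrentHashMap<String, CountDownLatch>();

        private CountDownLatch latchFor(String moduleId) {
            CountDownLatch latch = new CountDownLatch(1);
            CountDownLatch existing = installed.putIfAbsent(moduleId, latch);
            return existing != null ? existing : latch;
        }

        // Called by the build thread of a module once "install" is done.
        void onInstallCompleted(String moduleId) {
            latchFor(moduleId).countDown();
        }

        // Called by a downstream module before it starts "deploy".
        void awaitUpstreamInstalled(List<String> upstreamModuleIds)
                throws InterruptedException {
            for (String id : upstreamModuleIds) {
                latchFor(id).await();
            }
        }
    }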

When it comes to clustering there may be several approaches:
if you decide to publish artifacts through "deploy" to any kind
of repo, I believe those artifacts must have all lifecycle
requirements met, which to my current understanding is orthogonal
to local out-of-order execution.

Wouldn't it be feasible to distribute the "local" and perhaps
"transacted local" repo inside the cluster using network
file sharing? One would still have to solve serialization issues
and the use of installed artifacts in a reactor build..?
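
For illustration, here is roughly how I picture the "transacted local"
repo; a minimal sketch, assuming a per-build staging directory that is
only promoted into the shared (possibly NFS-mounted) repo when the
build succeeds. All names are made up; none of this is existing Maven
code:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    // Hypothetical sketch: installs are written to a private staging
    // area; "commit" promotes the staged files into the shared repo,
    // "rollback" simply discards the staging area.
    class TransactedLocalRepo {
        private final Path sharedRepo; // e.g. an NFS mount shared by the cluster
        private final Path staging;    // per-build private staging area

        TransactedLocalRepo(Path sharedRepo, Path staging) {
            this.sharedRepo = sharedRepo;
            this.staging = staging;
        }

        // Where a plugin would write an artifact during the build.
        Path stagePathFor(String relativeArtifactPath) {
            return staging.resolve(relativeArtifactPath);
        }

        void commit() throws IOException {
            List<Path> staged;
            try (Stream<Path> walk = Files.walk(staging)) {
                staged = walk.filter(Files::isRegularFile)
                             .collect(Collectors.toList());
            }
            for (Path src : staged) {
                Path dest = sharedRepo.resolve(staging.relativize(src).toString());
                Files.createDirectories(dest.getParent());
                // Note: atomicity across an NFS boundary is exactly the
                // kind of serialization issue that would still need solving.
                Files.move(src, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }

        void rollback() throws IOException {
            try (Stream<Path> walk = Files.walk(staging)) {
                walk.sorted(Comparator.reverseOrder()) // children before parents
                    .forEach(p -> p.toFile().delete());
            }
        }
    }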

The clustering case seems like a much harder task than achieving
full local concurrency. I did some fairly extensive measurements
with my current build when I set up concurrent Spring/JUnit testing:

Missing concurrency in classloading is the most important reason
why unit tests run slowly (classloading is strictly a synchronized
business until JDK 7). By running tests out-of-order on my local
unit-test build I am fairly certain I could reduce the run-time
of "mvn clean install" to something much closer to "mvn
-Dmaven.test.skip=true clean install" (80 -> 25 seconds in my case).
This is even before I start parallelizing the individual modules.
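
As an illustration of what I mean by out-of-order tests, this is the
kind of thing I have been experimenting with; a rough sketch using
plain JUnit 4 and an executor, with placeholder test classes:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.junit.Test;
    import org.junit.runner.JUnitCore;
    import org.junit.runner.Result;

    // Rough sketch: run independent test classes concurrently so the
    // synchronized parts of classloading overlap with test execution.
    public class ParallelTestRunner {

        // Placeholder test classes, just so this compiles standalone.
        public static class FooTest { @Test public void ok() {} }
        public static class BarTest { @Test public void ok() {} }

        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (final Class<?> testClass
                    : new Class<?>[] { FooTest.class, BarTest.class }) {
                pool.submit(new Runnable() {
                    public void run() {
                        Result result = JUnitCore.runClasses(testClass);
                        System.out.println(testClass.getName()
                                + ": failures=" + result.getFailureCount());
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }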

I must confess that I've yet to see a build that really needs
clustering for any reason other than running tests or other individual
tasks (javadoc, site etc.). I think I'd be inclined to just distribute
those specific tasks in a cluster. If you actually had a decent model of
inter-lifecycle-phase dependencies ("requiredForStarting" between phases),
you could probably achieve good results by keeping lifecycle execution
centralized but distributing plugin execution?

I suppose I may be narrow-minded on this last one...
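
To be a bit more concrete about that last idea, here is a minimal
sketch of the "requiredForStarting" model, assuming a simple ordered
phase enum (again, nothing here is existing Maven code):

    import java.util.EnumMap;
    import java.util.Map;

    // Hypothetical sketch: before a module may start a given phase,
    // every upstream module must have reached at least the phase
    // mapped to it here. The enum is declared in (reordered) lifecycle
    // order, so compareTo gives the "has reached at least" test.
    enum Phase { VALIDATE, COMPILE, PACKAGE, INSTALL, TEST, INTEGRATION_TEST, DEPLOY }

    class PhaseDependencyModel {
        // phase in this module -> minimum phase required of upstream modules
        private final Map<Phase, Phase> requiredForStarting =
                new EnumMap<Phase, Phase>(Phase.class);

        PhaseDependencyModel() {
            // Reactor build: fork downstream "compile" once upstream has compiled.
            requiredForStarting.put(Phase.COMPILE, Phase.COMPILE);
            // Never "deploy" before all upstream modules have reached "install".
            requiredForStarting.put(Phase.DEPLOY, Phase.INSTALL);
        }

        boolean mayStart(Phase phase, Phase leastAdvancedUpstreamPhase) {
            Phase required = requiredForStarting.get(phase);
            return required == null
                    || leastAdvancedUpstreamPhase.compareTo(required) >= 0;
        }
    }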

I will be starting to look at the DefaultLifecycleExecutor with thoughts
of out-of-order execution, and maybe dabble around a little.

Kristian

fr., 20.11.2009 kl. 06.29 -0800, skrev Dan Fabulich:
> I've been meaning to reply to your earlier emails (it's been a busy week); 
> to this I'll just say that moving the "test" phase after the "install" 
> phase is a fascinating idea, which I personally like, but it seems like a 
> big violation of the contract for the lifecycle, and I suspect it won't be 
> popular. :-(
> 
> I've long felt that there should be a phase for testing after "install" 
> for similar reasons.  This might be SLIGHTLY more popular since users 
> would need to explicitly cause their tests to run during this phase.
> 
> What about users doing multi-machine builds?  Earlier this week I wrote 
> that users desiring to do multi-machine parallelism should deploy their 
> builds to a remote repository shared between the machines.  Should their 
> tests run post-deploy?
> 
> -Dan
> 
> 
> Kristian Rosenvold wrote:
> 
> > I've been thinking further about parallelity within maven. The proposed
> > solution to MNG-3004
> > achieves parallelity by analyzing inter-module dependencies and scheduling
> > parallel dependencies in parallel.
> >
> > A simple further evolution of this would be to collect and download all
> > external dependencies
> > for all modules immediately.
> >
> > But this idea has been rummaging in my head while jogging for a week or so:
> >
> > Would it be possible to achieve super-parallelity by describing
> > relationships between phases of the build, and even reordering some of the
> > phases ? I'll try to explain:
> >
> > Assume that you can add transactional ACID (or maybe just AID) abilities
> > towards the local
> > repo for a full build. Simply put: All writes to a local repo are done in a
> > per-process-specific instance of the repo, that can be rolled back if the
> > build fails (or pushed to the local repo if
> > the build is ok)
> >
> > If you do that you can re-order the life-cycle for most builds to be
> > something like this:
> >
> > validate
> > compile
> > package
> > install
> > test
> > integration-test
> > deploy
> >
> > Notice that I just moved all the "test" phases after the "install" phase.
> > Theoretically you could start any subsequent modules immediately after
> > "install" is done. Running of tests is really the big killer in most
> > multi-module projects I see.
> >
> > Since your commit "push" towards the local repo only happens at the very end
> > of the build, you
> > will not publish artifacts when tests are failing (at least not project
> > output artifacts)
> >
> > You could actually make this a generic model that describes different kinds
> > of
> > dependencies between lifecycle phases of different modules. The dependency I
> > immediately
> > see is "requiredForStarting" - which could be interpreted as meaning that
> > any upstream
> > dependencies must have reached at least that phase before the phase can be
> > started
> > for this project. I'm not sure if there's any value in a generic model, but
> > my perspective
> > may be limited to what I see on a daily basis.
> >
> > Would this be feasible ?
> >
> 
> 


