Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Kristian Rosenvold Sat, 21 Nov 2009 23:46:50 -0800

I've looked over the code and thought a bit further about the
constraints involved, and given that:


- Multi module reactor builds are the only interesting targets of
multithreading.
- Reactor builds do not use the "install" output of their upstream
dependencies (I was not aware of that ;)

You do not have to reorder anything at all. An implementation
could just:
A) Immediately fork one thread per module, for all modules.
B) For the phases compile, install and deploy, a given module can
only proceed when all its upstream dependencies have completed the same
phase.
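A minimal sketch of A) and B), using only java.util.concurrent and nothing from Maven's own lifecycle executor. The class, its methods and the "module:phase" latch keys are hypothetical. Phase bodies are just logged, and failure propagation is left out, so a failed upstream would leave its downstream threads waiting.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch: one thread per module (A); for gated phases a module
// waits until all of its upstream modules have completed that same phase (B).
class PhaseGatedReactor {
    // Phases that require upstream completion, per rule B.
    static final Set<String> GATED = Set.of("compile", "install", "deploy");

    final Map<String, List<String>> upstreams;   // module -> its upstream modules
    final List<String> phases;                    // ordered lifecycle phases
    final Map<String, CountDownLatch> done = new ConcurrentHashMap<>(); // "module:phase" -> latch
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    PhaseGatedReactor(Map<String, List<String>> upstreams, List<String> phases) {
        this.upstreams = upstreams;
        this.phases = phases;
        for (String m : upstreams.keySet())
            for (String p : phases)
                done.put(m + ":" + p, new CountDownLatch(1));
    }

    void runModule(String module) throws InterruptedException {
        for (String phase : phases) {
            if (GATED.contains(phase))
                for (String up : upstreams.get(module))
                    done.get(up + ":" + phase).await(); // block until upstream finished this phase
            log.add(module + ":" + phase);           // stand-in for executing the phase's mojos
            done.get(module + ":" + phase).countDown(); // release downstream modules waiting on us
        }
    }

    // Rule A: fork one thread per module immediately, then wait for all of them.
    void runAll() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(upstreams.size());
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String m : upstreams.keySet())
                futures.add(pool.submit(() -> { runModule(m); return null; }));
            for (Future<?> f : futures) f.get();
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the pool has one thread per module, every module can block on its upstream latches without starving the others; ungated phases such as test run as soon as the module itself gets there.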
There's still a chance of leaking artifacts to the local repository if an
upstream deploy fails after install, and the general idea of a
transacted repo would still be nice for staying consistent.
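A transacted local repository could look roughly like the following hypothetical sketch: installs are written to a per-build staging directory (here simply a temp directory) and are moved into the real local repository only on commit(), or discarded on rollback(). The class and method names are made up for illustration and are not Maven APIs. The commit is a plain sequence of file moves, so it gives all-or-nothing behaviour for failed builds but is neither atomic across files nor crash-safe.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

// Hypothetical "transacted local repository": installs go to a per-build
// staging directory and only reach the real local repository on commit().
class TransactedLocalRepo {
    final Path local;     // the real local repository (e.g. ~/.m2/repository)
    final Path staging;   // per-process repository instance

    TransactedLocalRepo(Path local) throws IOException {
        this.local = local;
        this.staging = Files.createTempDirectory("txn-repo");
    }

    // Stand-in for the install step: write the artifact under its repository path.
    void install(String repoPath, byte[] content) throws IOException {
        Path target = staging.resolve(repoPath);
        Files.createDirectories(target.getParent());
        Files.write(target, content);
    }

    // Reads see this build's own staged artifacts first, then the real repository.
    Optional<Path> resolve(String repoPath) {
        Path staged = staging.resolve(repoPath);
        if (Files.exists(staged)) return Optional.of(staged);
        Path real = local.resolve(repoPath);
        return Files.exists(real) ? Optional.of(real) : Optional.empty();
    }

    // Build succeeded: move every staged file into the real local repository.
    void commit() throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.walk(staging)) {
            files = s.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path f : files) {
            Path target = local.resolve(staging.relativize(f).toString());
            Files.createDirectories(target.getParent());
            Files.move(f, target, StandardCopyOption.REPLACE_EXISTING);
        }
        deleteStaging();
    }

    // Build failed: discard everything that was staged.
    void rollback() throws IOException {
        deleteStaging();
    }

    private void deleteStaging() throws IOException {
        List<Path> paths;
        try (Stream<Path> s = Files.walk(staging)) {
            paths = s.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) Files.deleteIfExists(p); // children sort before their parents
    }
}
```

Downstream modules in the same build would resolve through resolve(), so they see upstream artifacts before anything reaches the shared local repository.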
 
I'm still a bit unsure about B) above; it may be a bit limiting in terms
of other usage scenarios. I'm also a bit unsure how that would fit in with
all the other activities in the lifecycle. An alternative would be to
make a declarative representation of phase interdependencies that could
express multiple types of concurrency interdependencies. (But I
consistently only see one dependency type:
upstreamMustFinishBeforeThisCanStart...?)
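To make the declarative alternative concrete, here is a hypothetical sketch that models only the single dependency type mentioned above (called "requiredForStarting" in the quoted message below). The class, record and method names are illustrative assumptions, not anything that exists in Maven; each rule says which phase every upstream module must have reached before this module may start a given phase.

```java
import java.util.*;

// Hypothetical declarative model of inter-module phase dependencies, with a
// single dependency type: "upstream must reach phase X before this module
// can start phase Y".
class PhaseDependencyModel {
    record RequiredForStarting(String phase, String upstreamPhase) {}

    final List<String> lifecycle;                                  // ordered phases
    final Map<String, RequiredForStarting> rules = new HashMap<>(); // phase -> rule

    PhaseDependencyModel(List<String> lifecycle) {
        this.lifecycle = lifecycle;
    }

    // Declare that starting `phase` requires every upstream to have completed `upstreamPhase`.
    PhaseDependencyModel require(String phase, String upstreamPhase) {
        rules.put(phase, new RequiredForStarting(phase, upstreamPhase));
        return this;
    }

    // completed: the last phase each upstream module has finished (absent = none yet).
    boolean canStart(String phase, List<String> upstreams, Map<String, String> completed) {
        RequiredForStarting rule = rules.get(phase);
        if (rule == null) return true; // no constraint declared for this phase
        int needed = lifecycle.indexOf(rule.upstreamPhase());
        for (String up : upstreams) {
            String last = completed.get(up);
            if (last == null || lifecycle.indexOf(last) < needed) return false;
        }
        return true;
    }
}
```

Rule B then becomes three declarations (compile, install and deploy each requiring the same upstream phase), and a different scheduling policy is a different set of rules rather than different executor code.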

Would it float?

Kristian


On Sat, 21.11.2009 at 11:40 +0000, Stephen Connolly wrote:
> In m3 (which is what we are talking about) AFAIK we can have a
> listener that waits for the start of the deploy phase
> and/or the end of execution.
> 
> With a customized install plugin, we could just install to the
> "transaction" repository. The listener can then block until the
> criteria have been met (allowing other modules to progress). That would
> achieve what you're after... namely, produce the artifacts for
> consumption by the other modules before running test and
> integration-test. Once the criteria have been met, we either fail the
> module or we move the artifacts from the "transactional" local repo to
> the real local repo and allow the lifecycle to continue.
> 
> -Stephen
> 
> 2009/11/21 Kristian Rosenvold <[email protected]>:
> > I understand that there's room for several different
> > types of solution here:
> >
> > Starting with the single-machine solution: I now understand that
> > you could start forking downstream builds straight after
> > compile in a reactor build, maybe after install in other cases.
> >
> > In this scenario I think each module is dependent on all upstream
> > modules successfully achieving "install" before proceeding to "deploy".
> > I really think it's important to avoid leaking artifacts that do not
> > have their own (and all upstream) lifecycle requirements fulfilled.
> >
> > When it comes to clustering there may be several approaches.
> > If you decide to publish artifacts through "deploy" to any kind
> > of repo, I believe they must have all lifecycle requirements met,
> > which to my current understanding seems orthogonal to local
> > out-of-order execution.
> >
> > Wouldn't it be feasible to distribute the "local" and perhaps
> > "transacted local" repo inside the cluster using network
> > file sharing? One would still have to solve serialization issues
> > and using installed artifacts in a reactor build..?
> >
> > The clustering case seems like a much harder task than achieving
> > full local concurrency. I did some fairly extensive measurements
> > with my current build when I set up concurrent spring/junit testing:
> >
> > Missing concurrency in classloading is the most important reason
> > why unit tests run slowly (classloading is strictly a synchronized
> > business until JDK 7). By running tests out of order in my local
> > unit test build, I am fairly certain I could reduce the run time
> > of "mvn clean install" to something much closer to that of
> > "mvn -Dmaven.test.skip=true clean install" (80->25 seconds in my case).
> > This is even before I start parallelizing the individual modules.
> >
> > I must confess that I've yet to see a build that really needs
> > clustering for any other reason than running tests or other individual
> > tasks (javadoc, site, etc.). I think I'd be inclined to just distribute
> > those specific tasks in a cluster. If you actually had a decent model of
> > inter-lifecycle phase dependencies (requiredForStarting between phases),
> > you could probably achieve good results by keeping lifecycle execution
> > centralized but distributing plugin execution?
> >
> > I suppose I may be narrow-minded on this last one...
> >
> > I will be starting to look at the DefaultLifeCycleExecutor with thoughts
> > of out-of-order execution, maybe dabble around a little.
> >
> > Kristian
> >
> > On Fri, 20.11.2009 at 06:29 -0800, Dan Fabulich wrote:
> >> I've been meaning to reply to your earlier emails (it's been a busy week);
> >> to this I'll just say that moving the "test" phase after the "install"
> >> phase is a fascinating idea, which I personally like, but it seems like a
> >> big violation of the contract for the lifecycle, and I suspect it won't be
> >> popular. :-(
> >>
> >> I've long felt that there should be a phase for testing after "install"
> >> for similar reasons.  This might be SLIGHTLY more popular since users
> >> would need to explicitly cause their tests to run during this phase.
> >>
> >> What about users doing multi-machine builds?  Earlier this week I wrote
> >> that users desiring to do multi-machine parallelism should deploy their
> >> builds to a remote repository shared between the machines.  Should their
> >> tests run post-deploy?
> >>
> >> -Dan
> >>
> >>
> >> Kristian Rosenvold wrote:
> >>
> >> > I've been thinking further about parallelism within Maven. The proposed
> >> > solution to MNG-3004 achieves parallelism by analyzing inter-module
> >> > dependencies and scheduling independent modules in parallel.
> >> >
> >> > A simple further evolution of this would be to collect and download all
> >> > external dependencies for all modules immediately.
> >> >
> >> > But this idea has been rummaging around in my head while jogging for a
> >> > week or so:
> >> >
> >> > Would it be possible to achieve super-parallelism by describing
> >> > relationships between phases of the build, and even reordering some of
> >> > the phases? I'll try to explain:
> >> >
> >> > Assume that you can add transactional ACID (or maybe just AID) abilities
> >> > to the local repo for a full build. Simply put: all writes to the local
> >> > repo are done in a per-process instance of the repo, which can be rolled
> >> > back if the build fails (or pushed to the local repo if the build is ok).
> >> >
> >> > If you do that, you can reorder the lifecycle for most builds to be
> >> > something like this:
> >> >
> >> > validate
> >> > compile
> >> > package
> >> > install
> >> > test
> >> > integration-test
> >> > deploy
> >> >
> >> > Notice that I just moved all the "test" phases after the "install" phase.
> >> > Theoretically you could start any subsequent modules immediately after
> >> > "install" is done. Running of tests is really the big killer in most
> >> > multi-module projects I see.
> >> >
> >> > Since your commit "push" to the local repo only happens at the very
> >> > end of the build, you will not publish artifacts when tests are failing
> >> > (at least not project output artifacts).
> >> >
> >> > You could actually make this a generic model that describes different
> >> > kinds of dependencies between lifecycle phases of different modules. The
> >> > dependency I immediately see is "requiredForStarting", which could be
> >> > interpreted as meaning that any upstream dependencies must have reached
> >> > at least that phase before the phase can be started for this project.
> >> > I'm not sure if there's any value in a generic model, but my perspective
> >> > may be limited to what I see on a daily basis.
> >> >
> >> > Would this be feasible?
> >> >
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
> >
> >
> 


