Re: Testing Kristian's MNG-3004 branch

Kristian Rosenvold Thu, 07 Jan 2010 01:18:52 -0800

On Wed, 2010-01-06 at 18:36 -0800, Dan Fabulich wrote:
> Kristian Rosenvold wrote:
> 
> > In this process I removed your original implementation, simply because
> > it allowed me to work freely in simplifying my own implementation (and I
> > truly believe I managed to make some good simplifications). I also
> > considered that I'd re-add your implementation as a third strategy
> > when/if needed - it'll only take me an hour or two.
> 
> If it'd only take you an hour or two, I suggest we try it.  I can take a 
> crack at it if you're busy, but I'm not quite seeing where to plug in at 
> the moment.
> 
> > as far as I can see it will work for every single project right out of 
> > the box
> 
> Yes, that would be nice :-)


I will re-add your stuff, and I will also set it up to use my output
demultiplexer that causes output to appear in "normal" order. We'll just
try to close this (hugely interesting IMO!) discussion first. "An hour
or two" usually maps to 4-5 hours real time, so I want to try aiming for
the best solution.

> 
> 
> As for "weave" mode:
> 
> > Each ExecutionPlanItem now potentially has a "schedule" attached to it, 
> > that describes external requirements that must be met before the item 
> > can be executed. This schedule is determined by a declarative 
> > representation attached to class "DefaultLifecycles" (method 
> > DefaultLifecycles.createExecutionPlanItem).
> 
> That sounds more aggressive than what I was imagining.  I was only 
> imagining splitting the work into lifecycle phases; if I understand you 
> correctly, you're splitting to individual mojo executions, which is much 
> more fine-grained.

Actually I'm saying that the schedule controls this splitting. We could
support multiple (even user-defined) schedules. See the earlier mail I
sent today.
 
> 
> So I was imagining running the entire compile phase of project X 
> sequentially in a thread, concurrent with the compile phase of project Y; 
> dependent project Z wouldn't begin compiling until X and Y had finished 
> compiling, but X and Y could begin testing while Z was compiling.

Yes, this is what I initially did. And it worked - it still works. This is
what the current solution still does, but I got a little greedy - maybe
too much, but that's easily fixable ;) 

There is one very important restriction I ended up with; in all normal
scheduling cases each single module builds on 1 thread only. The only
thing that can happen to this thread is that it may wait for some
upstream ExecutionPlanItems to complete. Which (if any)
ExecutionPlanItem you're waiting for is controlled by the
ProjectDependencyGraph *and* the schedule in use.
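
A minimal sketch of that restriction, not the code in the branch: each
module gets exactly one thread, and the only blocking point is a wait on
upstream work. The class and method names are made up for illustration,
the lifecycle is cut down to two phases, and the "schedule" is assumed to
be the simplest one (a phase waits for the same phase in every upstream
module). A cyclic dependency map would deadlock here; the real
ProjectDependencyGraph is assumed to rule that out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Illustration only: none of these names exist in the MNG-3004 branch.
final class OneThreadPerModule {
    // Abbreviated lifecycle, just enough to show the weaving.
    static final List<String> PHASES = List.of("compile", "test");

    /**
     * Builds each module on exactly one thread. The map gives, for every
     * module, the modules it depends on; each dependency must also be a key.
     * Returns "module:phase" entries in the order the phases finished.
     */
    static List<String> build(Map<String, List<String>> upstream) {
        Map<String, CountDownLatch> finished = new ConcurrentHashMap<>();
        for (String module : upstream.keySet()) {
            for (String phase : PHASES) {
                finished.put(module + ":" + phase, new CountDownLatch(1));
            }
        }
        List<String> log = new CopyOnWriteArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (String module : upstream.keySet()) {
            Thread t = new Thread(() -> {
                try {
                    for (String phase : PHASES) {
                        // The only thing that can happen to this thread:
                        // it waits for the same phase in each upstream module.
                        for (String dep : upstream.get(module)) {
                            finished.get(dep + ":" + phase).await();
                        }
                        log.add(module + ":" + phase); // stand-in for running the mojos
                        finished.get(module + ":" + phase).countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "build-" + module);
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the build", e);
        }
        return log;
    }

    public static void main(String[] args) {
        // X and Y are independent; Z depends on both.
        Map<String, List<String>> upstream = Map.of(
                "X", List.of(), "Y", List.of(), "Z", List.of("X", "Y"));
        System.out.println(build(upstream));
    }
}
```

The exact order varies from run to run, but Z:compile can never be logged
before X:compile and Y:compile, while X:test and Y:test are free to run
while Z is still compiling.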


> It seems like you're splitting the phase up into pieces, too; so 
> individual mojos of project X could run concurrently with each other, 
> which I was *not* imagining.

No, not for the normal case. Actually you *can* make this happen if you
enable "force lifecycle violations", which will permit you to fork a
single "something" (mojo or all the mojos in a phase) as a thread of its
own. I am very unsure whether this mode should be included at all,
especially since I have been unable to get it to give me significantly
"more" concurrency. (I think this is because at the place I tried this,
we are already saturating all the CPU there is - I was trying surefire
in the unit test phase.)

> The problem with splitting down to the execution item is that then you 
> need to know the dependencies between execution items (if any); the 
> dependencies need to be expressed in mojo metadata, etc.

> But if we only split down to lifecycle phase, well, we know the ordering 
> of lifecycle phases, and as long as we run them in that order, we're 
> guaranteed correctness, and we don't have to add mojo metadata, right?

Yes. 

I think that any successful "weave" implementation needs to be solved
mostly without additional mojo metadata. Even though I can fantasize
about proxying parts of the maven model, that's mostly "maven 5.0"
territory ;)

> 
> (Well, except maybe declaring that certain mojos need to be 
> synchronized...?)


As I already said, the current schedule describes the "synchronized"
metadata-aspect of selected mojos; I needed this because they had
non-thread safe interactions with parts of the maven model. So already
there I lost it ;)


I have tried to make an implementation that reduces this problem to
describing the semantics of the "waiting" dependencies in a proper way.
When, and what, should control if a phase or mojo is allowed to start
execution?


> To put names on this, I think there are three "granularities" under 
> consideration:
> 
> * Project granularity (my first attempt in the MNG-3004 branch)
> * Phase granularity (my intended description of "weave" mode)
> * Mojo granularity (your highly concurrent implementation)

I think we should stick to the first two, because they describe
significantly different directions of execution. As I hope I made clear,
I am dubious about the effect/value of letting a single module be built
by more than one thread. But I just had to try ;)

To be any better than "Project Granularity", weave mode needs to be
running in different phases for different modules concurrently.
Otherwise it degrades to become almost the same as "Project
Granularity". From your example above, X and Y are testing while the
downstream Z is still in compile.



> (In practice, I think you would have something very close to phase 
> granularity just by assuming that every mojo was "output dependent."  Or 
> am I misunderstanding?)

You're understanding it ;) The only adjustment that seems to be needed
is to make the grammar for expressing the dependencies a bit broader:

Currently you can only say "compile" is outputDependent upon itself,
meaning it'll wait for "compile" in all upstream projects to finish
before proceeding. We also need to be able to specify the explicit
target of the dependency, so you could say "test" is outputDependent on
"compile" in all upstream modules.

If you additionally add an outputDependency on itself to ALL subsequent
phases, I think that should solve most of the problems with this model.
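
One way to picture the broader grammar is a table from each phase to the
upstream phase it has to wait for. This is only a sketch under
assumptions: the map layout, the class name and mustWaitFor are
hypothetical, the phase list is abbreviated, and it is not the
declarative representation attached to DefaultLifecycles.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a made-up data shape, not the format used in the branch.
final class ScheduleSketch {
    // phase -> phase that must have finished in every upstream module
    static final Map<String, String> OUTPUT_DEPENDENCIES = Map.of(
            "compile", "compile",   // what can be expressed today: depends on itself
            "test", "compile",      // the new part: an explicit target phase
            "package", "package",   // subsequent phases depend on themselves
            "install", "install");

    /** Returns the "module:phase" items that must finish before the phase may start. */
    static Set<String> mustWaitFor(String phase, List<String> upstreamModules) {
        Set<String> blockers = new TreeSet<>();
        String target = OUTPUT_DEPENDENCIES.get(phase);
        if (target != null) {
            for (String module : upstreamModules) {
                blockers.add(module + ":" + target);
            }
        }
        return blockers; // empty for phases without an entry
    }

    public static void main(String[] args) {
        // prints [X:compile, Y:compile]
        System.out.println(mustWaitFor("test", List.of("X", "Y")));
    }
}
```

With a table like this, "test" in Z only waits for "compile" in X and Y,
so Z's tests do not have to wait for the upstream tests to finish.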


Which brings me to another high-level concern here;

All I *really* want/need is to run test in the exact manner you are
describing. In reality I'm not sure if the later-phase concurrencies
(war, install etc) provide any real value, and some may even have
negative contributions in some cases (I/O thrashing due to concurrency
comes to mind). I run my builds on a ramdisk in Linux, so I don't suffer,
but those poor Windows users with their crappy IO and virus controls are
not as lucky.

I am tempted to reduce/trim the schedule descriptions to fit just this
ONE use case perfectly. I'd still be running massively concurrent /until/
ONE usecase perfectly. I'd still be running massively concurrent /until/
the compile phase, but after that just concurrently schedule the tests
(as Dan describes above), and immediately after that just proceed with
regular sequential reactor building. 

I haven't used the shade plugin, but would that be needed in the test
phase of subsequent modules?
 
> My long-term goal is that Maven should run by default in "concurrent" mode 
> where threads = 1; optionally, users can crank up the number of threads 
> *without* changing the execution strategy.

Good long-term goal. Maybe even default to threads = numCores;

Kristian





