I would claim that our current build analysis is always "smart", simply
because of the way its culprit-identification algorithm is written.
The first question is: "What counts as an acceptable integration build
failure by our standard?"
Obviously, the answer is that if the build fails in a module nobody is
working on (in other words, the failure is caused by an inter-module
dependency), then it's acceptable.
The second question is: "What is the algorithm that identifies the culprit?"
The answer is that if the build fails in a module, the algorithm tries
to find a person who has committed to that module but has not run the
tests on that module.
So, from these two answers you can see that the build analysis will
always be "smart", precisely because it does not use dependency
information during the analysis.
My hypothesis is that since inter-module dependency build failures are
rare, our current approach of requiring only local testing on the
module you are working on is a happy medium.
Cheers,
Cedric
Philip Johnson wrote:
Last night's build failure resulted in the following analysis:
--On Monday, February 13, 2006 6:00 AM -1000 Hackystat Administrator
<[EMAIL PROTECTED]> wrote:
Integration build for Hackystat-7 project on 13-Feb-2006 FAILED.
Module: hackyApp_Cgqm
Failure Types: Compilation
Plausible Culprit: Unknown
How culprit is identified: Build failure reason is unclear, and no one
has made any commit. Perhaps hackystat sensor data is incomplete, or it
is caused by an external error on the integration build box.
Failure Messages:
* C:\HackystatSource\hackyApp_Cgqm\src\org\hackystat\app\cgqm\testbase\ABaseRemoteProjectTestClass.java::63::cannot find symbol
The compilation failures revolve around the ProjectManager class,
which Hongbing committed recently, so it looks like this is something
Hongbing should investigate.
But there's a more interesting hypothesis that occurs to me.
I'm wondering if the Build Analysis mechanism has become "smart
enough" in the following sense: the build failures for which the
mechanism can identify a culprit are basically those that our standard
development process should have prevented from reaching the
integration build. Conversely, the build failures for which the
mechanism cannot identify a culprit are those that we generally allow
as "acceptable" integration build failures.
This goes back to the following basic premise of our process: the
only way to guarantee that developer-induced integration build
failures are prevented would be to force everyone to do a 'freshStart
all.junit' on _all_ modules prior to _every_ commit. We've said that
this is too heavyweight: the cost in productivity for this kind of
process (even if we could get everyone to do it) is higher than the
cost in productivity for occasional integration build failures.
On the other hand, if the nightly build fails constantly, then that
slows things down too much because people can't trust the repository
to contain a working version.
So, our goal has been to find a "happy medium": a level of process in
which people make "reasonable" efforts to test their changes before
committing, reducing integration build failures to the point where the
failures that remain are "justifiable", because the cost of the local
testing needed to prevent them would be too high.
So, what I'm starting to wonder is whether our build analysis
mechanism has actually become a valid measure for "reasonable": in
other words, if it can identify the culprit, then the culprit should
have prevented the failure, but if it can't identify the culprit, then
the failure is caused by a sufficiently indirect sequence of events
that we can view the daily build mechanism as being the most efficient
way to uncover it.
To test this, we simply need to start evaluating the daily build
analysis from the culprit/no culprit perspective. For example, I would
claim that today's failure is "reasonable", in that Hongbing would
have had to do a full test of the entire system to catch it.
Your thoughts?
Cheers,
Philip